Platform : MS SQL Server 2008 R2 Integration Services, IBM iSeries DB2 V6R1
The SQL Server Integration Services (SSIS) is a sophisticated Extract-Transform-Load (ETL) tool that eases data analysts to perform data import/export from a system to another one and do some data transformation. Like any other ETL tool existing in the market nowadays, SSIS is commonly used in data warehouse to collect the data from multiple sources and put in one consolidated big database.
One of the well-known databases is the IBM iSeries DB2 which is widely operated in many financial companies (banking, insurance, etc) because of its fantastic capability to store and process giant records. In this tutorial, I will explain the steps to pull data from iSeries DB2 using SSIS.
Installing the ODBC Driver
As you already know, in general there are two common interfaces used to connect a database: ODBC and OLE DB. In this occasion we will use the first one. The DB2 ODBC driver used is part of a software called IBM iSeriess Access. When this software is installed on a computer (could be a PC or a server), besides an ODBC driver there are also other tools like the 5250 emulator, data transfer tools, documentation etc – however for now we only want to focus on the ODBC driver. It is important for you to consult with the server vendor what version of iSeriess Access that is compatible with the operating system of the computer and the iSeries machine. In my case, I use the following operating systems:
- MS Windows Server 2008 64 bit
- IBM iSeries V6R1
In that environment, the version of IBM iSeriess Access used is V5R4M0 and we find the ODBC driver is working well with MS SQL Server 2008 R2.
So installing the IBM iSeriess Access is the first step must be done before going any further. Please carefully examine that the ODBC driver is installed appropriately and able to connect the DB2 database successfully.
If you have the previous version of IBM iSeriess Access (e.g. V5R3M0 or V5R2M0), there is a big chance that the program cannot be installed on MS Windows 2008 operating system. If you insist to install it, some components will fail to install. It is suggested to seek the appropriate version as explained above or the newer one if exists.
Creating a DSN
When using an ODBC driver, a definition of a Data Source Name (DSN) is necessary to specify the database parameters such as the server name, database name, user ID etc.
- Go to Control Panel à Administrative Tools –> Data Sources (ODBC)
- In the ODBC Data Source Administration window, select User DSN or System DSN and press Add to create a new data source. The System DSN is actually more preferred rather than the User DSN. Later the NOTE box in the following page will explain the reason.
- Select iSeries Access ODBC Driver from the list.
- In the General tab, please specify the Data source name and the System name (the server name of the iSeries machine).
- In the Server tab, type the SQL default library. This is the library that will be used after the user is successfully connected to the server.
You may set the Connection type as Read-Only or leave it as default. Normally the SQL command sent to the server is only SELECT commands that will not change the data; besides the iSeries sysadmin is usually very strict concerning the data security. Although the connection is Read/Write, but if the user right given is read-only then you still will not be able to change the data after all.
- Press OK to save the new DSN.
When creating a new DSN, usually the most preferred type is the System DSN rather the other DSN type, i.e. User DSN and File DSN. The reason is because the System DSN works for anyone using that system, no matter who logs onto the machine, the system DSN can be seen by all users. For example you log in to the server to create the DSN, but eventually the user who will use the DSN is the SQL service account. If the User DSN is used, most likely the SQL service account will not find the DSN you’ve created.
The System DSN normally works fine in most of the cases. However I find a pretty strange thing in one of my SQL Server installations. If I create a System DSN or User DSN then the DSN cannot be found by the SSIS package although it is there in the ODBC Data Source Administrator – however if I create a File DSN, then it will appear. So specifically for this server, I use the File DSN.
Honestly I cannot explain this anomaly because in other computer we use System DSN and works fine. My suggestion is to use a System DSN as far as it works – but if it doesn’t, then try to use the File DSN.
Creating the Data Flow Task
After the DSN is done, let’s proceed to open the SQL Server Business Intelligence Development Studio application to design a SSIS package.
- Create a new Integration Services Project and give a name as you wish.
- Now we are going to create a data connection for the IBM iSeries DB2. Do a right-click some where around the Connection Managers pane to display a pop-up menu. Select New ADO/NET Connection.
- Choose the provider as .Net Providers/Odbc Data Provider, and select the DSN you’ve just created from the list box. Do not forget to specify the User name and the Password. See the picture below.
If you cannot find the DSN, you probably have the same problem with mine as explained in the NOTE box earlier (about the anomaly). If that is the case, then you should choose the User Connection string option followed by pressing the Build button. In the next window, select the DSN from the File DSN (that’s why you must create the File DSN first). Upon finished, the connection string is automatically built for you. The Build button is very helpful as the DSN cannot be selected, the database connection string must be typed – but this kind button do that for us from the defined File DSN.
- Click the Test Connection button to verify the configuration. Press OK when everything works fine.
After finished creating the IBM iSeries DB2 data source connection, you may continue with the destination data connection, for example SQL Server, Text or Excel. Typically the OLE DB Connection is used for the SQL Server data connection. When all data connections have been properly set up, then the SSIS package is ready to be developed.
- Drag a Data Flow Task from the Toolbox to the design pane and then double-click to open it.
- Now we want to specify the data source of the Data Flow task, in this case is the IBM iSeries DB2. Drag an ADO NET Source and double-click to open it. Select the data source from the ADO NET Connection created earlier.
- In the Data Flow task, normally the source connection can be linked directly to the destination connection; i.e. the source columns are mapped one by one to the destination columns. However when the iSeries DB2 is used as the source data, the link should not be made directly because of the incompatibility data type between iSeries DB2 and SQL Server. We need a middle part in between those connections that convert the data type from the source to the destination.
To do that, drag a Data Conversion from the Toolbox to the design pane. Actually not all columns need to be converted. We must pay attention on source columns only those with the text data type while the other columns with numeric data type typically do not need to be converted. The input columns need to be mapped as String (DT_STR) or Unicode string (DT_WSTR) becoming output alias. Don’t forget to specify the length too.
This document does not explain in detail how to do the data conversion. Please refer to the SQL Server Online Book or the other resource how to do that.
Right after the Data Conversion, drag a destination connection and link it with the previous component as shown in the following picture. For the purpose of easier explanation in this document, let’s assume the destination connection is SQL Server.
If the destination is a SQL Server database, then SQL Server Destination or OLE DB Destination can be chosen.
The rest of the project are the same with the other SSIS package, please complete it by your self.
Executing the SSIS Package in a SQL Server Job
The reason why I need to add this last section in the explanation because there is one special trick that must be done relating to the iSeries DB2 source data – if you don’t follow this step, the SSIS package cannot be called from the SQL Server Job. You might not find this advice any where; I give it freely as a compliment for your keenness to read my writing.
The typical problem that possibly arises when a SSIS package is executed by the SQL Server Job is the login error. I know it sounds ridiculous since we already specify the password for the DB2 user when creating the package. But some how it seems the SQL Server Job looses the password or cannot retrieve the stored password when execute the SSIS package. This error will not occur when the SSIS package is run directly in the SQL Server Business Intelligence Development Studio and in the SQL Server Integration Service, that’s why it is quite confusing.
Please be sure to import the SSIS package first into the SQL Server Integration Service as we will call the package in a SQL Server Job.
Here is the secret.
- You already notice that when creating a step in the SQL Server Job, the SSIS package is referred using the SQL Server Integration Services Package type and the name of the package is defined in the Package column. Those configurations are made in the General tab as depicted in the following picture.
Those are common configurations in any SSIS package execution step.
- Switch to the Data sources tab, place the check marks on all existing data connections.
- Go to the Command line tab, and choose Edit the command line manually. As you do that, the Command line text box becomes enabled and you are able to edit the command line.
Find the text “User ID=…” or “uid=…”, then add the text “Pwd=<enter the password>” manually. The picture below shows the illustration. Don’t forget to add the “;” character to separate the parameters.
That’s all. Save the Job and it’s ready for action.