How to load data from Excel into Oracle Data Integrator (ODI) 12.2.1 using OpenLink JDBC — ODBC Bridge Drivers
The Oracle Data Integrator (ODI) is a Graphical Extract, load and transform (ELT) tool for the integration of remote data sources into warehouse for use in business intelligent systems. The OpenLink JDBC-ODBC Bridge driver enables the connectivity between JDBC applications and ODBC accessible data sources across multiple operating system types.
There is often a need to read data from Excel spreadsheets and integrate it with different target technologies in Data Integration projects. The challenge is that MS Excel only supports ODBC connectivity whereas Oracle Data Integrator (ODI) and many other tools only support JDBC connectivity natively. This is where the Sun JDBC-ODBC Bridge packaged with Sun Java Development Kit (JDK) proved very useful to make the bridge between ODBC and JDBCS. However, starting with JDK 1.8 (required for ODI 12.2.1) the Sun JDBC-ODBC Bridge has been deprecated.
In this Blog post the Pre-built ODI Linux VM is used, providing a pre-configured out of the box working instance of ODI 12C running against an Oracle 12C database instance.
Install ODI Linux VM
Follow the instruction on the ODI 12C Linux VM Getting Started guide , to perform the installation on one of the supported Virtual Box guest OS’es (Windows, Linux , Mac OS X), as a result of which the running VM provided the ODI Studio application
Microsoft Excel ODBC DSN Setup
On the Windows 32 or 64bit machine an Excel ODBC Driver DSN needs to be setup for connecting to the target Excel spreadsheet. Given the following sample SpreadSheet:
The first the Spreadsheet needs to be prepared such that ODI can harvest its metadata, by defining a name for the cells in the MS Excel Spreadsheet to the accessed, in this case we call is HEALTH_DATA:
The steps being:
- Select all columns
- Go to “Formulas”
- Select “Define Name”
- Type a name for the table (metadata)
The Excel ODBC DSN Can now be created using the Microsoft ODBC Administrator:
Note for 64bit ODBC Drivers it is located in %windir%\system32\odbcad32.exe and for 32bit ODBC Drivers it is located in %windir%\SysWOW64\odbcad32.exe
We will use the 64bit ODBC Driver in this example, but the steps are exactly the same for the 32bit ODBC Driver.
Go to the System DSN tab of the ODBC Administrator and click on the Add button to add a new DSN:
Select the Microsoft Excel Driver from the list of ODBC Drivers presented:
Give the DSN a name and use the Select WorkBook button to point to the Excel Workbook (Spreadsheet) to be used ie the HEALTH_DATA spreadsheet in this case:
Click the OK button to save the configured DSN:
The Excel ODBC DSN setup is now complete and ready for use with the OpenLink JDBC-ODBC Bridge.
OpenLink JDBC-ODBC Bridge Installation
As we are using the ODI Linux VM and the target Excel Spreadsheet resides on a Windows machine, the OpenLink Multi-Tier JDBC-ODBC Bridge Driver for Linux glibc 2.12 as the client and Windows (32 or 64 bit) as the server, must be use to provide the required connectivity. Links to download the required 32 or 64 bit components are including 15 day evaluation license are:
- Windows 64bit Multi-Tier JDBC-ODBC Bridge Download
- Windows 32bit Multi-Tier JDBC-ODBC Bridge Download
OpenLink Multi-Tier JDBC-ODBC Bridge Driver Architecture diagram:
Note the OpenLink also provide a Single-Tier JDBC-ODBC Bridge Driver, which can be used if ODI and the Excel Spreadsheet (ODBC Driver) are installed on the same machine ie Windows.
OpenLink Multi-Tier JDBC-ODBC Bridge — Windows ODBC Database Agent installation
The Windows Multi-Tier ODBC Bridge Database Agent needs to be installed to provide the connection to the Excel ODBC DSN configured in the previous section.
This requires the installation of the OpenLink Request Broker (wabrzzzz.msi)and ODBC Bridge Agent (waaozzzz.msi) MSI installers from the download server links above:
Following the Windows Multi-Tier Installation Guide for details of installation, ensuring that the evaluation license provided is select when prompted for it during the installation. A Typical installation type can be performed in both case installing the necessary components required.
Once complete the OpenLink Request Broker (oplrqb.exe) will automatically be running as a Windows service, listening for incoming connection requests from the OpenLink Data Access Clients (i.e. JDBC/ODBC/ADO.Net etc).
OpenLink Multi-Tier JDBC-ODBC Bridge — Linux JDBC Client installation
The OpenLink Mutli-Tier JDBC Client for Java 8 (opljdbc4_2.jar) is required for use with the ODI Linux VM and can be downloaded and installed on the Linux VM following the instructions in the OpenLink Generic JDBC Driver Installation and Configuration .
Open a Linux terminal shell and run the following command to installer the JDBC Client components in a location of your choice:
[oracle@ODIGettingStarted ~]$ mkdir uda
[oracle@ODIGettingStarted ~]$ cd uda
[oracle@ODIGettingStarted ~]$ wget http://opldownload-staging.s3.amazonaws.com/uda/components/7.0/x86_64-generic-linux-glibc212-64/opljdbc.taz
[oracle@ODIGettingStarted uda]$ sh install.sh
Started installation on : ODIGettingStarted
Extracting (opljdbc.taz) …
Setting up CLASSPATH for Java connectivity.
You can choose between the following Java JDK versions:
1) JDK 1.1.x
2) JDK 1.2.x
3) JDK 1.3.x
4) JDK 1.4.x
5) JDK 1.5.x
6) JDK 1.6.x
7) JDK 1.7.x
8) JDK 1.8.x
Please enter your choice [ENTER=6] : 8
Creating scripts …
End of installation
[oracle@ODIGettingStarted uda]$ ls -ltr jdk1.8
-rw-r — r — . 1 oracle oracle 263873 Jul 28 2015 opljdbc4_2.jar
-rw-r — r — . 1 oracle oracle 168464 Jul 28 2015 megathin4_2.jar
Copy the OpenLink Java 8 JDBC Client jar (opljdbc4_2.jar) to the $HOME/.odi/oracledi/userlib directory on Unix based (Linux, Mac OS X etc) systems:
[oracle@ODIGettingStarted uda]$ cp jdk1.8/opljdbc4_2.jar $HOME/.odi/oracledi/userlib
[oracle@ODIGettingStarted uda]$ ls -ltr $HOME/.odi/oracledi/userlibtotal 272
-rw-r — — -. 1 oracle oracle 332 Oct 2 2013 additional_path.txt
drwxr-x — -. 2 oracle oracle 4096 Jun 19 2014 component
drwxr-x — -. 4 oracle oracle 4096 Jul 1 19:32 pattern
-rw-r — r — . 1 oracle oracle 263873 Aug 4 10:33 opljdbc4_2.jar
Note on Windows systems these files would be placed in the ..\AppData\Roaming\odi\oracledi\userlib directory
The OpenLink JDBC Client installation and configuration for use with ODI is now complete.
ODI Studio Setup
Start the ODI 12c Studio application by double click on on the icon on the Linux VM desktop:
Check ODI Studio is actually picking up the OpenLink JDBC Client (opljdbc4_2.jar) for use on startup, by going to the Help -> About –> External Components menu and type “opl” to filter the list. You should see the following:
If you don’t, then make sure you have copied the jar file to the right folder.
ODI Topology — Connection
The next step is to configure the Microsoft Excel technology module in the Topology navigator, to use the OpenLink JDBC Driver (opljdbc4_2.jar).
This requires a connection be made to the ODI repository which can be done by clicking on the Connect to Repository icon in the left frame window, at which point you will be prompted for a Wallet Password, that is welcome1 by default and then for the ODI Login credentials that are user = SUPERVISOR and Password = SUPERVISOR by default. The ODI initialisation will take place and the connection completed:
Click on the blue factory icon in the top right of the Topology tab in the left Frame wndow and ensure the option to Hide Unused Technologies in not checked, otherwise you would see the Microsoft Excel technology module:
Add New Excel Data Server module:
Give Data Server a new in the Definitions tab
Add the OpenLink JDBC Driver name and connection string parameters (jdbc:openlink://hostname:5000/SVT=ODBC/DATABASE=ODBCDSN) for connecting to the Request Broker on thewindows-hostname on port 5000 , with the connect params SVT = ODBC and Database = ExcelDSN ; with ExcelDSN being the name of the Excel ODBC Driver DSN created on the Windows machine.
Click on the File -> Save menu item to save the Data Server and right click on it and select the Test Connection menu item to test the connection to the Excel Spreadsheet on Windows:
From the Toplogy tab right click on the newly create Excel Data Server and select the New Physical Schema menu item to create a physical schema for the data server:
In the Definitions tab use the default name given for the schema or a name of your choosing
In the Context tab create a context name for the physical schema and associate it with a Logical Schema name of your choice:
In the Logical Architecture drop down in the left frame Window create a new Logical Schema for the Excel data server and associate it with the physical Schema name created in the previous step
Select the Models drop down from the left frame menu and right click on it to add a new Model giving it a name of your choosing. The Code name is automatically generated based on the Name. Select the Microsoft Excel module from the Technology drop down menu and then from the Logical Schema drop down select the logical Schema created in the previous step
From the Selective Reverse-Engineering tab tick the 4 check box options presented as below; then click that File -> Save menu item to save the options.
Click on the Reverse Engineer tab in the Model toolbar menu. Oracle Data Integrator launches a reverse-engineering process for the selected datastores. A progress bar indicates the progress of the reverse-engineering process. At the end of the process you will see the Metadata for the Excel Spreadsheet defined on Windows has been imported.
Right Click on the imported SpreadSheet, HEALTH_DATA in this case select the View Data menu option:
The data in the SpreadSheet will now be dispalyed in the right pane window.
There you have it the Excel SpreadSheet data available in ODI 12c …