Connecting DSX — RStudio to on-prem DB2 for z/OS database
IBM Data Science Experience provides an environment that brings together all of a data scientist’s tools into one location. It includes the most popular Open Source tools and IBM value-add functionality with integration of community and social features to make data scientists more successful.
A wide variety of cloud data sources are easily connected to DSX, or a user could upload data from DB2 on z/OS (by creating a copy) into the environment. However, the ability to access to data in-place on the system of record, has great appeal.
In this guide, I will outline the steps needed to configure DSX to access an on-prem DB2 for z/OS database using JDBC and provide the ability to utilize RStudio to perform advanced analytics operations on this data.
Here is an outline of the steps we will go through in the guide to set up the DSX environment for DB2 for z/OS connectivity:
- Examine the RStudio interface.
- Identify the JDBC driver files and configure the initial environment with RStudio.
- If required, setup a Bluemix Secure Gateway.
- Use the RJDBC package to connect with a DB2 for z/OS database, and run a query.
Data Science Experience — RStudio Interface
(Note: The screenshots in this tutorial are from a Mac OSX environment running Firefox. If you are running from Windows or Linux, there may some differences in the appearance of the native dialogs and browser interfaces.)
Once setup with your account, click on the DSX Logo at the top of your screen, and this will take you to the main Community page. Once there, if you click the far left drop-down menu, you will see your project options, including RStudio. Click on the the RStudio option, and this will open up your RStudio instance.
If you have opened DSX RStudio before, the instance opened will have retained any history or enviroment settings that you had saved/autosaved from before. If you are familiar with RStudio and have used a native version of RStudio on Windows/Mac OSX, it will look exactly the same. The only difference is that the underlying infrastucture is running in a Bluemix virtual machine. There is limited access to the underlying operating system, but we can access the filesystem to upload the JDBC driver and license file and configure the connection for DB2 for z/OS.
The RStudio interface should look similar to what is shown below……
(For a complete guide to using RStudio, please see the documentation linked to in the ‘Help’ Section)
Next, we will confirm that the correct packages are installed/loaded that enable JDBC access to our on-prem database.
In the bottom right panel of the RStudio desktop you will see the ‘Packages’ tab, and within that is an ‘Install’ tab. If you click “Install”, you will get a dialog box that allows you to load the required packages from the CRAN repo. You will need to install ‘RJDBC’. If the ‘Install dependencies’ box is checked, you will also be installing ‘rJava’ and ‘DBI’ (which load automatically). Once you install these packages, you should see them in the User Library. If they are unchecked, please check them, and they will be enabled in the environment.
Next, we will need to upload the appropriate DB2 for z/OS data server driver .jar files to the DSX environment. The files we will need are: db2jcc.jar and db2jcc_license_cisuz.jar
If you already have a licensed version of DB2 Connect installed, these driver files are already available on one of your existing client environment setups. Ask your DB2 system administrator for the location of these files in your client environment.
This guide is based on DB2Connect Version 11.1 for Linux, Unix and Windows (client) and DB2 for z/OS Unlimited Edition (server). If you are not currently licensed for DB2Connect, see the Appendix at the end of this guide for instructions on how to obtain the correct data server drivers and license files that are required.
From the RStudio desktop (shown above), go to the lower right panel again, to the “Files” section. Then, click “Upload” and you will get a dialog box that allows you to browse your local file system to upload the driver files. For both db2jcc.jar and db2jcc_license_cisuz.jar, upload these to the current directory of your DSX/RStudio environment.
Connecting to the z Server over TCPIP — Secure Gateway
With the JDBC driver files in place, we now need to configure access to the server which is running the DB2 for z/OS instance. The assumption for most IBM customers is that their target z server is running on a secure internal network, which is accessed externally through a VPN or from an internal private network access point. If this is the case, you will also need to set up a Bluemix Secure Gateway, that will allow the Bluemix/DSX cloud app to access your z server through a firewall, using the same VPN/internal network acces point that you use for other client connections from a desktop.
For more general information on Bluemix Secure Gateways, please go to: https://console.ng.bluemix.net/docs/services/SecureGateway/secure_gateway.html
If you do not need to configure a Secure Gateway, you can skip to the “Connecting to DB2/zOS with JDBC” section.
To configure the secure gateway, please go to https://console.ng.bluemix.net/dashboard/services/, log in to Bluemix with the same credentials you are using for DSX, and then use the drop-down menu on the left to go to the ‘Services’ and then ‘Dashboard’ menu.
Once within the Services dashboard, click the “Create Service +” button.
Next, we will go to “Integrate” and then choose the “Secure Gateway” service from the menu.
Leave the service “Unbound” and click “Create”.
And, then “+ Add Gateway’
Choose a name for your gateway, uncheck both of the boxes listed, and then click “Add Gateway”.
Now, we can setup the destination details, and the client. Click on the gateway.
Then click on “+ Add Destination”.
Choose “On-Premises” and click Next.
On the next screen add your host name/IP , and the DB Port of your DB2z Subsystem. Click ‘Next’
Then, select TCP protocol on the next screen, and click “Next” again.
The next screen setting is related to server side authentication. If you typically connect with just a user and password, then choose “None”. If you are not sure whether or not your server requires additional authentication, ask your System Administrator, and you can change this later. Click ‘Next’.
You can skip over the next setting, unless you want to privatize the IP address and port or your connection. Click ‘Next’.
To complete the destination settings for the back-end server, we just need to give a name for our connection. Choose a name for your destination that describes the DB2 Subsystem you are connecting to, and then click “Finish”.
Now let’s install the client for our secure gateway. Choose “+ Add Clients”
Choose your preferred method of hosting the gateway client. For this demonstration, we will be using IBM Installer — Mac OSX version on a laptop that has a VPN connection to access an on-prem z server. The details for different OS installs and Using Docker or IBM DataPower are covered in detail in the online instructions. For specific install details, See the “Bluemix Documentation” link.
At this time, copy the Gateway ID, for input later into the gateway client command-line interface.
After Downloading the client, double-clicking the image, you might have to go to your ‘Finder’ to see the mount point for the image.
Click on the mount-point and then drag the ‘ibm’ folder into the Applications Folder.
Now, we can go into the ‘IBM’ folder within Applicatons, and double-click to run the segw.command to open the CLI for the gateway.
Enter the gateway ID you copied earlier, and then enter ‘none’ for the security token.
Next, note the web address of the gateway interface — localhost:9003/dashboard
Enter that web address in your web browser.
Below, we see the gateway web UI. You can see all the details of your secure gateway from this interface. There is one more setting, for the access control list. Please click “Access Control List”, and then enter your destination hostname/IP and port once again.
Below, once you have added the host/IP and port, you can close the browser window, or minimize it.
We are almost done. The Secure Gateway config is complete.
Now we will return to the main Secure Gateway dashboard to copy the cloud hostname/port for the gateway. It will be used for the JDBC connection string in DSX/RStudio. From the Services Dashboard, click on the gateway you created.
On this screen, click on the ‘settings gear’ of the destination server connection details. A dialog box will open.
Copy the ‘Cloud Host : Port’ address to use in your JDBC connection in DSX/RStudio.
Return to the RStudio desktop, and we can setup the JDBC driver and run a query.
Connecting to DB2/zOS with JDBC
In the RStudio desktop, we are now ready to enter the JDBC commands at the console prompt to setup the connection to our DB2 for z/OS database. Make sure the RJDBC, DBI, and rJava packages have loaded.
Also, at this time, make sure you are running any VPN connection that is normally required to access your backend system from a desktop client.
Enter the following commands in the RStudio console, noting the results you get for the classpath display:
drv <- JDBC("com.ibm.db2.jcc.DB2Driver","/home <br> /rstudio/db2jcc.jar")
(* note the addition of the license file….its position in the classpath is important *)
Originally published at datascience.ibm.com on February 9, 2017.