Using dplyr & dbplyr with R language and Oracle Database

Maarten Smeets
Aug 23, 2017 · 8 min read

R uses data extensively. Data often resides in a database. In this blog I will describe installing and using dplyr, dbplyr and ROracle on Windows 10 to access that data using R.

Accessing the Oracle Database from R

dplyr makes the most common data manipulation tasks in R easier. dplyr can use dbplyr. dbplyr provides a transformation from the dplyr verbs to SQL queries. dbplyr 1.1.0 is released 2017–06–27. See here. It uses the DBI (R Database Interface). See here. This interface is implemented by various drivers such as ROracle. ROracle is an Oracle driver based on OCI (Oracle Call Interface) which is a high performance native C interface to connect to the Oracle Database.

Installing ROracle on Windows 10

I encountered several errors when installing ROracle in Windows 10 on R 3.3.3. The steps to take to do this right in one go are the following:

  • Determine your R platform architecture. 32 bit or 64 bit. For me this was 64 bit
  • Download and install the oracle instant client with the corresponding architecture (here). Download the basic and SDK files. Put the sdk file from the sdk zip in a subdirectory of the extracted basic zip (at the same level as vc14)
  • Download and install RTools (here)
  • Set the OCI_LIB64 or OCI_LIB32 variables to the instant client path
  • Set the PATH variable to include the location of oci.dll
  • Install ROracle (install.packages(“ROracle”) in R)

Encountered errors

You probably tried to install the ROracle package which Oracle provides on an R version which is too new (see here). This will not work on R 3.3.3. You can compile ROracle on your own or use the (older) R version Oracle supports.

This can be done by installing RTools (here). This will install all the tools required to compile sources on a Windows machine.

Next you will get the following question:

If you say y, you will get the following error:

In order to fix this, you can download and install the Oracle Instant Client (the basic and SDK downloads). Mind that when running a 64 bit version of R, you also need a 64 bit version of the instant client. You can check with the R version command. In my case: Platform: x86_64-w64-mingw32/x64 (64-bit). Next you have to set the OCI_LIB64 variable (for 64 bit else OCI_LIB32) to the specified path. After that you will get the error as specified below: Next it will fail with something like:

This is caused when oci.dll from the instant client is not in the path environment variable. Add it and it will work! (at least it did on my machine). The INSTALL file from the ROracle package contains a lot of information about different errors which can occur during installation. If you encounter any other errors, be sure to check it.

How a successful 64 bit compilation looks

Testing ROracle

You can read the ROracle documentation here. Oracle has been so kind as to provide developer VM’s to play around with the database. You can download them here. I used ‘Database App Development VM’.

After installation of ROracle you can connect to the database and for example fetch employees from the EMP table. See for example below (make sure you also have DBI installed).

This will yield the data in the EMP table.

Using dplyr

dplyr uses dbplyr and it makes working with database data a lot easier. You can see an example here. Installing dplyr and dbplyr in R is easy:

Various functions are provides to work with data.frames, a popular R datatype in combination with data from the database. Also dplyr uses an abstraction above SQL which makes coding SQL for non-SQL coders more easy. You can compare it in some ways with Hibernate which makes working with databases from the Java object world more easy. Some functions dplyr provides:

  • filter() to select cases based on their values.
  • arrange() to reorder the cases.
  • select() and rename() to select variables based on their names.
  • mutate() and transmute() to add new variables that are functions of existing variables.
  • summarise() to condense multiple values to a single value.
  • sample_n() and sample_frac() to take random samples.

I’ll use the same example data as with the above sample which uses plain ROracle

The output is something like:

If I now want to select specific records, I can do something like:

Which will yield

A slightly more complex query:

Will result in the number of employees per department:

You can see the generated query by:

Will result in

If I want to take a random sample from the dataset to perform analyses on, I can do:

Which could yield something like:

Executing the same command again will result in a different sample.

Finally

There are multiple ways to get data to and from the Oracle database and perform actions on them. Oracle provides Oracle R Enterprise. Oracle R Enterprise is a component of the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition. You can create R proxy objects in your R session from database-resident data. This allows you to work on database data in R while the database does most of the computations. Another feature of Oracle R Enterprise is an R script repository in the database and a feature to allow execution of R scripts from within the database (embedded), even within SQL statements. As you can imagine this is quite powerful. More on this in a later blog!

Originally published at javaoraclesoa.blogspot.com on August 23, 2017.

Oracle Developers

Aggregation of articles from Oracle & partners engineers, Groundbreaker ambassadors & the developer community on all things Oracle Cloud and its technologies. The views expressed are those of authors solely and do not necessarily reflect Oracle's. Contact @jimgris or @brhubart

)

Maarten Smeets

Written by

Oracle Developers

Aggregation of articles from Oracle & partners engineers, Groundbreaker ambassadors & the developer community on all things Oracle Cloud and its technologies. The views expressed are those of authors solely and do not necessarily reflect Oracle's. Contact @jimgris or @brhubart

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade