Intro to Teradata’s R package — tdplyr Part 1

Published in

Teradata Data Science Capabilities

3 min readOct 28, 2019

Intro

tdplyr is Teradata’s Vantage R Client library. tdplyr is a fast, consistent tool for working with data frame like objects within Teradata and it takes advantage of the Big Data and Machine Learning analytics capabilities of Vantage. Teradata in the past few months has made available and published updates to tdplyr on our discussed them on the GitHub site

Getting Started

A good place to start is by checking out this Teradata TechBytes video on Youtube. You can also check out our documentation site for the latest info.

Installation

The Teradata R package (tdplyr) contains proprietary code and cannot be offered on CRAN. It is available from Teradata’s R package repository, run the following script to download and install in your R environment.

Rscript -e "install.packages('tdplyr',repos=c('https://teradata-download.s3.amazonaws.com','https://cloud.r-project.org'))"

The above command also installs the package dependencies dplyr, dbplyr, DBI, magrittr and teradatasql packages which are available from CRAN or Teradata’s R package repository.

Some Background

In the latest release, we are recommending using Teradata SQL Driver for R over the ODBC driver. The driver is maintained in a separate package teradatasql. The new driver implements the DBI Specification and moves us in a more flexible direction with respect to R applications interacting with Vantage. Some features I am very excited about include support for more data types and enhanced data type mappings and an interface to FastLoad as well as password encryption. At the time of publication, FastExport is not supported, but it is on the roadmap.

Connecting to Vantage

While there are a few ways to connect the easiest is to use the native driver.

library(tdplyr) 
# Create context using the Teradata Native Driver.
 
con <- td_create_context(host = <host-name>, uid=<username>, pwd=<password>, dType = "native")

We know data scientists often get access to data files from various sources. wanted to make it easier for you to move them from the client-side to Vantage. This example uses the copy_to() function to create a table named “iris_flowers” in the “TDUserDB” database of Vantage. The table contains the same data as the “iris” dataset and can be manipulated using Vantage commands.

copy_to(con, iris, name="iris_flowers", overwrite=FALSE)

copy_to() function will work for small datasets if you are wanting to copy a larger dataset (> 100,000 rows) it would be best to use the FastLoad protocol. Here is an example on to use FastLoad to load a large file into Teradata Vantage with R.

Interacting with tables

Once you have a connection to a Teradata Vantage system you can create virtual dataframes using tbl() command. The result is a tdplyr tibble object referencing an existing table in a Teradata Vantant system.

The following command creates a tdplyr tibble object “tddf_iris” from the existing table “iris_flowers” in Vantage.

tddf_iris <- tbl(con, "iris_flowers")

You can create a tibble of a table from your non-default database by using the in_schema() command.

itddf_iris <- tbl(con, in_schema("diff_database", "iris_flowers"))

dplyr-like Verbs

We are still expanding tdplyr for support of more dplyr capabilities. Here is a list of the current dplyr-like verbs for release 16.20.00.04.

In the next post, I will give some examples of the usage of dplyr-like functionality.