Accessing Snowflake with RStudio via ODBC on SPCS

Gabriel Mullen · Snowflake · Dec 18, 2023

Code example repo here.

In a previous post, I deployed an RStudio container into Snowpark Container Services. That lets me run R code directly in Snowflake, but I still need to connect back to Snowflake to grab data. When I attempted to use the dbConnect syntax from the original article I was following, I ran into some issues and had to shore them up. I wanted to share those additional steps.

My first issue was that I didn’t have the Snowflake ODBC driver installed, which was obvious in hindsight. So we need to install the driver and an ODBC driver manager, and provide the ODBC configuration and the driver’s location, all as part of the Docker build process. The first step is to define the driver information. To get it, I actually had to install the driver first with the following, then used docker run to open a shell in the image and find to locate where the driver library was installed.

RUN wget https://sfc-repo.snowflakecomputing.com/odbc/linux/3.1.4/snowflake-odbc-3.1.4.x86_64.deb
RUN dpkg -i snowflake-odbc-3.1.4.x86_64.deb
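
That discovery step looked roughly like this (the image tag is a placeholder for whatever you tagged your intermediate build):

# Open a throwaway shell in the freshly built image (tag is hypothetical)
docker run -it --rm --entrypoint /bin/bash rstudio-test:dev

# Inside the container, locate the installed driver library
find / -name "libSnowflake.so" 2>/dev/null
# -> /usr/lib/snowflake/odbc/lib/libSnowflake.so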

Once I knew where the driver lived, I populated the ODBC configuration files as follows.

odbc.ini

[ODBC Data Sources]
snowodbc = SnowflakeDSIIDriver


[snowodbc]
Driver = /usr/lib/snowflake/odbc/lib/libSnowflake.so
Description =
server = va_demo103.us-east-1.snowflakecomputing.com
role = gmullen_rl

odbcinst.ini

[ODBC Drivers]
SnowflakeDSIIDriver=Installed

[SnowflakeDSIIDriver]
APILevel=1
ConnectFunctions=YYY
Description=Snowflake DSII
Driver=/usr/lib/snowflake/odbc/lib/libSnowflake.so
DriverODBCVer=03.52
SQLLevel=1
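
With both files in place under /etc inside the image, you can sanity-check the registration using unixODBC’s odbcinst tool:

# Show where unixODBC expects its configuration files
odbcinst -j

# List registered drivers; SnowflakeDSIIDriver should appear
odbcinst -q -d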

So the final Dockerfile to build the image is as follows. We use the ADD command to copy local files into the image, and the RUN command to execute commands during the build. We need to run apt update so apt has an accurate package list; otherwise it can’t find the unixodbc package we need. We then install unixodbc to interface with the driver API, and odbcinst1debian2 to provide the correct version of the ODBC driver manager. Finally, we install the Snowflake driver and the necessary R packages in advance of uploading (docker push) to Snowflake.

FROM rocker/rstudio:4.3.2

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <cboettig@ropensci.org>"

RUN /rocker_scripts/install_tidyverse.sh
ADD odbc.ini /etc/odbc.ini
ADD odbcinst.ini /etc/odbcinst.ini
RUN apt update
RUN apt install -y unixodbc odbcinst1debian2
RUN wget https://sfc-repo.snowflakecomputing.com/odbc/linux/3.1.4/snowflake-odbc-3.1.4.x86_64.deb
RUN dpkg -i snowflake-odbc-3.1.4.x86_64.deb
RUN R -e "install.packages(c('keyring','DBI','odbc','dplyr','dbplyr'), repos = 'https://cran.rstudio.com/')"

Then we build the Docker image and push it to the Snowflake image repository.

docker build --platform linux/amd64 --tag sfsenorthamerica-va-demo103.registry.snowflakecomputing.com/gmullen_db/public/gmullen_img_repo/tidyverse:v2 -f dockerfiles/tidyverse_4.3.2.Dockerfile .
docker push sfsenorthamerica-va-demo103.registry.snowflakecomputing.com/gmullen_db/public/gmullen_img_repo/tidyverse:v2
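
If the push fails with an authentication error, you likely need to log in to your Snowflake image registry first (the registry host and username here are from my example; substitute your own):

docker login sfsenorthamerica-va-demo103.registry.snowflakecomputing.com -u gmullen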

Then we create the SERVICE in SPCS. I was getting lazy after iterating on this so many times, so I stopped modifying a separate YAML file and defined the specification inline in the service like so:

CREATE SERVICE tidyverse
  IN COMPUTE POOL gmullen_pool
  FROM SPECIFICATION $$
spec:
  container:
  - name: tidyverse
    image: sfsenorthamerica-va-demo103.registry.snowflakecomputing.com/gmullen_db/public/gmullen_img_repo/tidyverse:v2
    env:
      DISABLE_AUTH: true
  endpoint:
  - name: e1
    port: 8787
    public: true
$$;
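
Once the service is created, you can confirm the container came up and grab the public endpoint URL to reach RStudio in a browser:

SELECT SYSTEM$GET_SERVICE_STATUS('tidyverse');
SHOW ENDPOINTS IN SERVICE tidyverse;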

Finally, now that I have the ODBC driver set up, I can make a connection to Snowflake:

> library(DBI)
> library(odbc) # Connect to ODBC
> library(dplyr) # Dplyr for data manipulation

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

> library(dbplyr) # dbplyr for connection to database

Attaching package: ‘dbplyr’

The following objects are masked from ‘package:dplyr’:

ident, sql

> con <- dbConnect(odbc::odbc(), Driver="SnowflakeDSIIDriver",
+ Server = "sfsenorthamerica-va_demo103.snowflakecomputing.com",
+ UID = "gmullen", PWD = rstudioapi::askForPassword("Database password:"),
+ Database = "gmullen_db", Warehouse = "gmullen_vwh", Schema = "public")

Finally, I am able to query some sample data.
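
For example, a quick sanity check against the SNOWFLAKE_SAMPLE_DATA share might look like this (the table here is an assumption; most accounts have this share mounted, but any table you can read will do):

library(DBI)

# Hypothetical sanity check: query the TPC-H sample data share
df <- dbGetQuery(con, "
  SELECT c_name, c_acctbal
  FROM snowflake_sample_data.tpch_sf1.customer
  LIMIT 10
")
head(df)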

Hope that helps anyone trying to connect RStudio on SPCS; it might also be handy for any container you’re spinning up that needs to access data via ODBC.

Addendum:
My colleague Rebecca O'Connor took the extra step of authenticating with the OAuth token that SPCS mounts inside the container at /snowflake/session/token, which avoids prompting for a password. Here is that extra bit of code.

library(DBI)
library(odbc)   # ODBC driver interface
library(dplyr)  # data manipulation

# SPCS mounts an OAuth token for the service's session at this path
token_path <- '/snowflake/session/token'
token <- readLines(token_path)

# Connect using the "snowodbc" DSN defined in /etc/odbc.ini
sfConn <- DBI::dbConnect(odbc::odbc(), "snowodbc",
  uid = "JUPYTER",
  authenticator = "OAUTH",
  token = token,
  warehouse = "CONTAINER_HOL_WH",
  database = "FLOOD_AND_HYDROLOGY_DATA",
  schema = "FLOODS_AND_HYDROLOGY"
)
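
With the connection in hand, dbplyr lets you treat Snowflake tables as lazy data frames (the table name below is hypothetical; substitute one from your schema):

library(dplyr)
library(dbplyr)

# Lazily reference a table; dplyr verbs are translated to Snowflake SQL
gauge_data <- tbl(sfConn, in_schema("FLOODS_AND_HYDROLOGY", "GAUGE_READINGS"))  # hypothetical table
gauge_data %>% count() %>% collect()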

Gabriel Mullen is a Sales Engineer at Snowflake. All content is solely that of the author.