Connecting to snowflake tables using R — an intuitive approach
I spent a good part of my Monday trying to find the best way to interface the data stored in snowflake tables to R. While I did find a lot of articles, code snippets in the web, none of them were tailor made to suit my use case which was — to load a data table from Snowflake as a dataframe in R by establishing a connection via a third party JDBC driver, in a corporate setting.
I use DBeaver, a SQL client and a database administration tool for interacting and interfacing with data warehouse(s) such as snowflake, SAP HANA etc. In this specific use case, I had to retrieve a sub table from snowflake for analysis in R. Since I had already connected to Snowflake using Dbeaver, I wanted to leverage this connection for loading the data in R, instead of installing new drivers. Below is the step by step approach that I followed.
1. Commencing the JDBC connection via R. Use the snowflake JDBC driver installed by Dbeaver. Here I am using the JDBC function in R — which creates a new DBI driver that can be used to start JDBC connections.
Pro Tip: Make sure your client driver is updated to the latest version and you are connected to your corporate vpn.
2. Establishing the connection to snowflake. Use the dbConnect function to connect to snowflake going through the appropriate authorization procedure. You will need the corporate snowflake URL, port number, and also the user name and password for authentication purposes.
Pro Tip: To access the specified the database in the URL you need to append the user name and the role parameter in the URL using “&” to specify that the particular user and role has usage privileges on the given database. Failing to do so may lead to the connection pointing to the “INFORMATION SCHEMA” which is the default table for the database.
3. Selecting the right table. More often than not the snowflake warehouse would house numerous databases and it is not always possible to remember the exact name of the database that you want to access. Here the SHOW DATABASE SQL command would come in handy, followed by the USE DATABASE command. You can execute the SQL queries in R using the dbGetQuery function.
4. Saving to a dataframe in R. Once you are connected to the right database, you can write a SQL query to extract the relevant data. You can save the output of the query to a data frame in R.
I always prefer to bank on the user friendliness of R and want my code to be as intuitive as possible. Hope this approach helps the readers.