Arrow Database Connectivity (ADBC) support for Snowflake

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

6 min readJun 22, 2023

Anurag Gupta Dewey Dunnington Matt Topol Kae Suarez

Today, we’re excited to announce that Snowflake is starting to experiment with Arrow Database Connectivity (ADBC) support for Snowflake. ADBC makes it quicker and easier to connect to Snowflake using languages that don’t already have bespoke Snowflake connectors. Language-specific bindings can now be created to make these connections much easier — ADBC allows connecting from any programming language that supports ADBC API to use the ADBC driver for connecting to Snowflake API. Using the ADBC driver, Snowflake can now be queried efficiently right away from any language that can natively call C functions, such as R, Rust, Ruby, Java, C#, and C++.

Arrow Database Connectivity (ADBC) is an API standard for bulk columnar database access that uses the Arrow in-memory format for result sets and query parameters. ADBC is a sub-project of Apache Arrow. It is similar to existing standards like ODBC and JDBC, but the data stays in columnar format. This makes it ideal for any bulk columnar analytics workflows, avoiding the cost of transposing data to a row-oriented format and back, making it much more efficient than ODBC/JDBC.

Key benefits

Arrow columnar memory format has become the de facto open-source standard for high-performance analytics for Snowflake customers.
Performance: Highly efficient Arrow-native bulk columnar support. No unnecessary data transpositions.
Multi-Language support: Quick and easy path to support new languages like Rust, R, C++, etc.
Developers can install drivers as packages without needing administrator privileges.

How does the ADBC Driver work?

The ADBC Driver (the Driver) for Snowflake is written on top of the Snowflake Go connector. The Driver provides a C interface for interacting with a Snowflake database and retrieving results in Arrow in-memory format. Under the hood, the queries from a client are sent through the ADBC Interface to the ADBC Driver for Snowflake, which does the heavy lifting of interacting with Snowflake’s REST API. If this sounds like ODBC, you’d be correct. The concept is virtually the same, except that ADBC returns Arrow data. It retrieves Arrow data from Snowflake and returns Arrow data to the user without the user having to handle all the conversions and paying the time cost. Using the Driver with Snowflake means Arrow-in/Arrow-out.

Example: Using Snowflake from R

Let’s consider using Snowflake from R. Before now, you’d typically use the R ODBC package with the Snowflake ODBC driver or might have to implement the connectivity layer yourself that would use Snowflake’s APIs. To remedy this and show the usefulness of ADBC, we’ve added R bindings for ADBC and created an R package you can download.

Essentially, Snowflake ADBC support means that R users have an easy-to-install, performant Snowflake Go connector. You can get started with just a few lines:

# When nanoarrow 0.2 .0 + ADBC Libraries 0.5 .0 are released
# remotes::install_github("apache/arrow-adbc/r/adbcsnowflake", build = FALSE)
library(adbcdrivermanager)
uri < -Sys.getenv("ADBC_SNOWFLAKE_TEST_URI")
db < -adbc_database_init(adbcsnowflake::adbcsnowflake(), uri = uri)
con < -adbc_connection_init(db)
con |>
  read_adbc("SELECT R_REGIONKEY, R_NAME FROM REGION ORDER BY R_REGIONKEY") |>
  as.data.frame()
# > R_REGIONKEY R_NAME
# > 1 0 AFRICA
# > 2 1 AMERICA
# > 3 2 ASIA
# > 4 3 EUROPE
# > 5 4 MIDDLE EAST

Just as Snowflake’s Python connector allows Snowflake users to leverage the wide ecosystem of Python-based web frameworks like Django, Streamlit, Flask, and more; the accessible R–Snowflake connector enabled by ADBC lets users build Snowflake-native web apps with popular R-based web frameworks like Shiny. Getting started with Snowflake and Shiny is also just a few lines of code. Snowflake ADBC support means R users have an easy-to-install, performant Snowflake Go connector.

Here is a short demo of the R based web application built using Shiny querying data from a Snowflake database:

Please see the code below or link to the Github gist.

static int addMetadata(AVFrame * frame) {
  AVDictionary * frame_metadata = av_frame_get_metadata(frame);
  int frame_dict_count = 0;
  AVDictionaryEntry * tag = NULL;

  av_dict_set( & frame_metadata, "test0", "test", 0);
  av_dict_set_int( & frame_metadata, "test1", 0, 0);

  frame_dict_count = av_dict_count(frame_metadata);
  av_log(ctx, AV_LOG_INFO,
    "Metadata count: %d\n", frame_dict_count);
  while ((tag = av_dict_get(frame_metadata, "", tag, AV_DICT_IGNORE_SUFFIX)))
    av_log(ctx, AV_LOG_INFO, "%s=%s\n", tag -> key, tag -> value);

  av_frame_set_metadata(frame, frame_metadata);

}
# Until nanoarrow 0.2 .0 is released
# remotes::install_github("apache/arrow-nanoarrow/r", build = FALSE)
# Until ADBC Libraries 0.5 .0 is released
# remotes::install_github("apache/arrow-adbc/r/adbcdrivermanager", build = FALSE)
# remotes::install_github("apache/arrow-adbc/r/adbcsnowflake", build = FALSE)
library(shiny)
library(adbcdrivermanager)
library(gt)
# Set up the connection
uri < -Sys.getenv("ADBC_SNOWFLAKE_TEST_URI")
db < -adbc_database_init(adbcsnowflake::adbcsnowflake(), uri = uri)
con < -adbc_connection_init(db)
# Get initial regions dropdown content
regions < -con |>
  read_adbc("SELECT R_REGIONKEY, R_NAME FROM REGION ORDER BY R_REGIONKEY") |>
  as.data.frame()
# Define user interface
ui < -fluidPage(
  titlePanel("TCPH Nations by Region queried from Snowflake"),
  sidebarLayout(
    sidebarPanel(
      selectInput("region", "Region", setNames(regions$R_REGIONKEY, regions$R_NAME)),
    ),
    mainPanel(
      tags$p("Snowflake SQL query:"),
      htmlOutput("query", container = tags$pre),
      tags$p("Output:"),
      gt_output("nations_table")
    )
  )
)
# Define server logic
server < - function (input, output) {
  # Show the query we 're about to issue
  output$query < -renderText({
    region_id < -input$region
    sprintf(
      "SELECT N_NATIONKEY, N_NAME FROM NATION WHERE N_REGIONKEY = %d",
      as.integer(region_id)
    )
  })
  # Render the customers table
  output$nations_table < -render_gt({
    # Get the region_id from the regions dropdown
    region_id < -input$region
    # Query Snowflake
    nations < -con |>
    read_adbc(
      sprintf(
        "SELECT N_NATIONKEY, N_NAME from NATION WHERE N_REGIONKEY = %d",
        as.integer(region_id)
      )
    ) |>
    as.data.frame()
    # Render a pretty table using gt()
    nations |>
    gt() |>
    opt_interactive(use_compact_mode = TRUE)
  })
}
# Run the application
shinyApp(ui = ui, server = server)

Interacting from R with the ADBC Driver for Snowflake

For R users looking to experiment like the example above, the ADBC project builds an R package that includes a statically-linked version of the Snowflake driver and automatically coordinates the build and link process to ensure compatibility with the user’s environment. This means that the only configuration an R user has to worry about is `install.packages()`.

For R users looking to work with the ADBC Driver for Snowflake’s C interface, it gives a shared object exposing a single function: `AdbcDriverInit()`. Given the ability to build and/or install a shared object compatible with your R installation, that shared object is all that is needed to load a driver from R: the `adbcdrivermanager` package can accept a path to a shared object with the appropriate initializer function exposed. This architecture is similar to ODBC, where drivers are shared objects with a driver manager that knows how to load them.

Expanding access to Snowflake from other languages,

The ADBC Driver for Snowflake implements C API such that the interface can be used from other languages. Many languages already have ADBC driver bindings, like Java, C, Go, R, or Python. C# and Rust is currently in the works.

Snowflake adopted Arrow for faster database access clients, keeping the data in standard columnar data structures all the way, and now added ADBC support for cross-language API support working closely with the Voltron Data Team. Voltron Data helps enterprises design and build composable data systems with open standards like ADBC, Arrow, Ibis, Substrait, and more.

Get started today!

As ADBC support is still in its experimental phase, it isn’t covered by official Snowflake support today. However, we’d love to use this as an opportunity to start the conversation around ADBC support in Snowflake.

If you want to try using the ADBC Driver for Snowflake from R, like the example above, here’s a link to the Github Gist to get started.

To try connecting to Snowflake from Go using the ADBC Driver for Snowflake, you can fetch the package using the command below.

go get github.com/apache/arrow-adbc/go/adbc/driver/snowflake

If adding connectivity to Snowflake from another programming language is of interest to you or if you would like to contribute to the efforts, please reach out to us via Github: http://github.com/apache/arrow-adbc or developers@snowflake.com

Additional resources

For more information on ADBC, please see the documentation: ADBC 0.5.0 documentation

For Snowflake Drivers, please see: Drivers | Snowflake Documentation