Plotly and NVIDIA Partner to Integrate Dash and RAPIDS

Published in

Plotly

4 min readMay 19, 2020

We’re pleased to announce that Plotly and NVIDIA are partnering to bring GPU-accelerated Artificial Intelligence (AI) & Machine Learning (ML) to a vastly wider audience of business users. By integrating the Plotly Dash frontend with the NVIDIA RAPIDS backend, we are offering one of the highest performance AI & ML stacks available in Python today. This is all open-source and accessible in a few lines of Python code.

NVIDIA’s CEO Jensen Huang mentioned some of the early fruits of this partnership in the first minute of his GTC 2020 Kitchen Keynote last week, and today we’re more formally announcing our partnership.

A typical business intelligence (BI) dashboard or analytical application combines graphs, maps, and controls to provide interactive access to queries and AI models running on large, complex datasets. Any organization delivering goods or services at scale will have millions of records to analyze, spread out in time and space and across various more-abstract dimensions. Building a performant application on top of such a dataset usually requires a multi-team, multi-week effort and results in a complex, multi-tiered architecture. New technologies like Dash and RAPIDS are changing this landscape, empowering individual Python developers to easily and quickly build analytical applications that are more performant than their complex counterparts.

This COVID-19 Dash + RAPIDS app aggregates over 300M rows in GPU memory (source code on Github)

Plotly’s Dash is an open-source framework developed by Plotly, which enables developers to build interactive, data-rich analytical web applications in pure Python, with no Javascript required. Traditional “full stack” analytical application development is done in teams with some members specializing in back-end/server technologies like Python, some specializing in front-end technologies like React and some specializing in data science. Dash provides a tightly-integrated back-end and front-end all controlled from simple Python functions. This means that data science teams producing models and analyses no longer need to rely on back-end specialists to expose these models to the front-end via APIs, and no longer need to rely on front-end specialists to build user interfaces to connect to these APIs.

RAPIDS is a collection of open-source libraries developed with NVIDIA to accelerate Python data science workloads by running them on GPUs rather than CPUs. RAPIDS provides massively sped-up, drop-in replacements for the most popular Python data science libraries, such as Pandas, Scikit-Learn and NetworkX. RAPIDS shortens the hypothesis-query-validation loop for data scientists by quickly running queries over full disaggregated datasets on local and remote workstations, rather than having to wait for results, or spending time aggregating or sampling datasets, or loading them into remote databases to get to the desired level of query performance. This means that with RAPIDS, data scientists can be less reliant on data engineering or database administration specialists, who are usually responsible for these aggregation and extract-transform-load (ETL) tasks.

Using Dash and RAPIDS together, individual data scientists no longer need to compromise between the iron triangle of performance, aggregation and time-to-delivery. They can now work independently across the entire analytical stack, from raw data to user interface, to quickly deliver applications to their users. A Dash + RAPIDS application is typically less than a thousand lines of easy-to-read, pure-Python code to create a smoothly-interactive interface for users, and can run on single or multi GPU-powered nodes. These interfaces can transparently aggregate data for big-picture overviews at a whole-business scale, while allowing for intuitive slice, dice, and drill-down operations that operate just as fast, on the same raw dataset. Whereas in the past, in order to get good performance on a single node, users could only access pre-aggregated data at a state or county, or daily or hourly level, applications can now re-aggregate data on the fly at any level, or grant access to individual records, to uncover fine-grained patterns that are normally obscured by aggregates. RAPIDS can also be used to quickly train, retrain and execute machine-learning models on these same large, fine-grained datasets.

The RAPIDS team has just published an article detailing how they built a Dash + RAPIDS app that allows interactive exploration of a 300-million-row dataset: one row per person living in the US. Plotly and the RAPIDS team will be publishing more articles in the coming days and weeks showcasing some of the fruits of our collaboration, so we encourage you to follow Plotly on Medium or Twitter (and RAPIDS on Medium or Twitter) to catch those updates. In the meantime, contact us if you have any questions about Dash or would like to learn more about the partnership!

Plotly and NVIDIA Partner to Integrate Dash and RAPIDS

Written by Plotly