Accelerating Cross Filtering with cuDF

Data Visualization Fosters Complex Communication 💬

RAPIDS is all about equipping data scientists with enterprise-grade tools and GPU performance. Because visualization is a key part of a data scientist’s toolbox, we are naturally working on ways to accelerate that experience too.

I’m a huge advocate of data visualization. It’s one of the closest things we have to a universal language, one that can distill complex, well-supported ideas and communicate them to a broad range of audiences. And although the types of visualizations out there are as varied as the types of data, the ability to quickly explore concurrent views of multivariate data is universally useful. In short, we wanted cross filtering.
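For readers new to the idea, here is a minimal sketch of what cross filtering means, written in plain Python so it runs anywhere. It is not cuXfilter code: the `CrossFilter` class, its methods, and the toy loan rows are all hypothetical and exist only for illustration. The key behavior is that each chart's counts respect the filters set on every other column, but not the filter on its own column.

```python
from collections import Counter

# Hypothetical toy dataset: each row stands for one loan record.
ROWS = [
    {"state": "CA", "year": 2005},
    {"state": "CA", "year": 2007},
    {"state": "TX", "year": 2005},
    {"state": "TX", "year": 2008},
    {"state": "NY", "year": 2007},
]


class CrossFilter:
    """Illustrative only: holds one optional filter per column."""

    def __init__(self, rows):
        self.rows = rows
        self.filters = {}  # column name -> predicate on that column's value

    def set_filter(self, column, predicate):
        self.filters[column] = predicate

    def clear_filter(self, column):
        self.filters.pop(column, None)

    def counts(self, column):
        """Count rows per value of `column`, applying every filter
        except the one set on `column` itself."""
        active = [(c, p) for c, p in self.filters.items() if c != column]
        return Counter(
            row[column]
            for row in self.rows
            if all(p(row[c]) for c, p in active)
        )


xf = CrossFilter(ROWS)
xf.set_filter("year", lambda y: y >= 2007)  # e.g. brushing a year chart

state_counts = xf.counts("state")  # narrowed by the year filter
year_counts = xf.counts("year")    # unaffected by its own filter
```

Skipping a chart's own filter is what lets the brushed chart keep showing its full distribution while every other view narrows around the selection.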

Inspired by the original JavaScript library, we wanted to extend the idea with GPU acceleration and support for multi-gigabyte datasets while keeping a straightforward JavaScript API. Simple, right? Introducing cuXfilter (ku-cross-filter), a proof of concept built on the RAPIDS cuDF library.

GTC Mortgage Visualization Demo using cuXfilter 🗺️

cuXfilter debuted during the RAPIDS launch at GTC EU 2018, and it is now at a point where we would love to see what the open source community can make of it. The GTC mortgage visualization demo is a good showcase of the versatility of cuDF and cuXfilter; it is included in the repository, and links to its dataset are on the demos page. In that single-GPU demo, we show smooth, sub-second interaction with more than 146 million rows of data using deck.gl, React, Victory charts, and other open source JavaScript libraries.

cuXfilter Architecture Overview 🏢

Because getting deep access to a GPU from a browser is not trivial, cuXfilter needs both a full backend and a client-side API. We use a Sanic server to access Python cuDF functions, an Express server running on Node.js to handle chaining calls, and a socket.io connection to the cuXfilter API, which can connect to any JavaScript-based visualization library. So, while updating a chart’s data by moving a slider is a simple front-end interaction, it actually sets off a series of chained actions in the backend that make GPU cross filtering behave as expected.

We’ve just started exploring ways to use RAPIDS for visualization, both for data computation and for rendering acceleration. cuXfilter is our first experiment. We started this project because we wanted to see how we could use the impressive deck.gl / kepler.gl visualization libraries, and we are eager to keep working with the Uber viz team. Still, visualization is a big space, and we want to engage and integrate further with the wider open source data visualization ecosystem, especially Python libraries. Whether you’re a developer or an end user of a library, we’d love to hear from you on how we can continue to grow!

RAPIDS ⚡GPU Acceleration

Want to make your own visualizations with cuXfilter? Have some ideas on how to improve the architecture or apply GPU acceleration? We want to hear them, so raise some issues or PRs on the cuXfilter GitHub. Curious about other ways you can contribute? Find out more at rapids.ai.