Published in RAPIDS AI

Time Moves Forward — RAPIDS Hits Its Three Year Milestone

RAPIDS Release 21.10

October 15, 2018, is a day that some of us on the RAPIDS team, and some team alumni, will remember for the rest of our lives. This was the day RAPIDS went live at GTC Munich. The vision for RAPIDS grew out of two observations: the PyData ecosystem was becoming the lingua franca of data science, and then-recent advances in neural networks, which quickly grew to billions of parameters in a single model, had demonstrated that GPUs could deliver enormous speedups on data-centric problems. RAPIDS was built by data scientists for data scientists.

Release 21.10 continues to work toward the vision of making GPU speed available to everyone, whether you are quickly experimenting with engineering features for a model, doing ETL for the world’s most data-intensive enterprises, solving NLP problems, or offering a revolutionary end-to-end suite of tools for doing cyber security data science.

Let’s get into the RAPIDS core library updates.

RAPIDS cuDF (DataFrames)

For everyone who has to work with dates and times, this release brings a large batch of cuDF features to make your life easier, no matter what speed you’re going. New datetime properties include Series.dt.is_quarter_start, Series.dt.is_quarter_end, Series.dt.is_month_start, Series.dt.is_month_end, and Series.dt.is_leap_year. There is now support for rolling and groupby.rolling variance and standard deviation, as well as groupby first and last aggregations. You can also calculate days_in_month, use nulls in the Time Series Generator, and use Series.ceil() on datetime series.

RAPIDS cuML (Machine Learning)

Your time as a data scientist has always been central to the cuML team’s mission, and this release includes several speedups to existing methods. GLM is now faster thanks to an improved eigendecomposition algorithm. Random Forest has gained a Poisson impurity criterion and has been refactored to be faster than ever. Exact Nearest Neighbors is faster thanks to a 2-dimensional Random Ball Cover algorithm. This release also graduates the hierarchical DBSCAN (HDBSCAN) implementation from “experimental” to fully supported status. Check out the associated HDBSCAN blog for more details.
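
The post does not describe how cuML’s eigendecomposition solver is implemented on the GPU. As a rough illustration of the underlying math only, the sketch below fits ordinary least squares (the simplest GLM) by eigendecomposing the small Gram matrix X^T X with NumPy on the CPU. The function name `ols_via_eig` is hypothetical, and the sketch assumes X has full column rank.

```python
import numpy as np


def ols_via_eig(X, y):
    """Least-squares weights w solving (X^T X) w = X^T y via eigendecomposition.

    Sketch only: assumes X^T X is symmetric positive definite (X has full
    column rank), so every eigenvalue is strictly positive.
    """
    gram = X.T @ X
    # eigh is specialized for symmetric matrices: X^T X = V diag(lam) V^T.
    eigvals, eigvecs = np.linalg.eigh(gram)
    xty = X.T @ y
    # w = V diag(1/lam) V^T X^T y, computed without forming an explicit inverse.
    return eigvecs @ ((eigvecs.T @ xty) / eigvals)


rng = np.random.default_rng(0)
# Intercept column plus two random features (synthetic data for illustration).
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_w = np.array([0.5, 2.0, -1.0])
y = X @ true_w
w_hat = ols_via_eig(X, y)
print(w_hat)
```

Working with the d x d Gram matrix instead of the full n x d data is cheap when there are many more rows than features, but it squares the condition number of the problem, which is why SVD- or QR-based solvers are often preferred for ill-conditioned data.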

Beyond improvements to existing algorithms, there are new features and methods. ARIMA now supports missing observations and padding; Random Forest now supports vector leaf prediction; the Forest Inference Library (FIL) now supports categorical variables; cuML now has complete Naive Bayes capabilities with the addition of Categorical Naive Bayes; and there are three new distance metrics: Kullback–Leibler divergence, Jensen–Shannon divergence, and the Russell–Rao coefficient.
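
For readers unfamiliar with the three new metrics, here is a minimal NumPy sketch of their textbook definitions. It is not cuML’s implementation, and the function names are hypothetical. Divergences are in nats (natural log). SciPy’s `scipy.spatial.distance.jensenshannon` returns the square root of the Jensen–Shannon divergence computed here (the Jensen–Shannon distance), while its `russellrao` function computes the same dissimilarity as the last function below.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms with p_i == 0 contribute 0 by convention; the result is infinite
    if q_i == 0 for some p_i > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence: KL of p and q to their mixture m, averaged."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def russell_rao_distance(u, v):
    """Russell-Rao dissimilarity for boolean vectors: (n - c_TT) / n,
    where c_TT counts positions at which both vectors are True."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    n = u.size
    c_tt = int(np.count_nonzero(u & v))
    return (n - c_tt) / n


print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint supports give ln(2)
print(russell_rao_distance([1, 0, 1, 1], [1, 1, 0, 1]))
```

KL divergence is not symmetric, so D(p || q) generally differs from D(q || p); the Jensen–Shannon divergence symmetrizes it by comparing both distributions to their mixture, which also keeps it finite.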

RAPIDS cuGraph (Graph Analytics)

The cuGraph team is releasing pylibcugraph, a Python library that supports using cuGraph as a backend for other Python libraries (e.g., CuPy). The Sorensen coefficient is now available to users, and the team has continued to improve memory use and performance and to clean up the codebase.
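
The Sorensen (or Sørensen–Dice) coefficient scores how similar two vertices are by the overlap of their neighborhoods: 2|N(u) ∩ N(v)| / (|N(u)| + |N(v)|). The pure-Python sketch below shows the textbook definition on a small undirected graph; it does not use cuGraph, the helper names are hypothetical, and cuGraph’s exact conventions (for example, for weighted graphs or self-loops) may differ.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def sorensen(adj, u, v):
    """Sorensen coefficient of the neighborhoods of u and v:
    2 * |N(u) & N(v)| / (|N(u)| + |N(v)|)."""
    nu = adj.get(u, set())
    nv = adj.get(v, set())
    denom = len(nu) + len(nv)
    if denom == 0:
        return 0.0  # convention chosen here for two isolated vertices
    return 2 * len(nu & nv) / denom


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
adj = build_adjacency(edges)
print(sorensen(adj, 1, 3))  # → 0.8
```

The coefficient is related to the Jaccard index J by S = 2J / (1 + J), an increasing function, so both rank vertex pairs in the same order; Sorensen simply weights the shared neighbors more heavily.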

RAPIDS Node.js (Visualisation)

We’re excited to have Allan Enemark presenting “GPU Accelerating Node.js with the Node-RAPIDS Data Science Framework” at NodeConf in mid-October. He and his team have continued to make strides in bringing the speed of GPUs to the Node community since the initial release in July. If you use Node, or know others who do, spread the word about this amazing new tool for speeding things up.

Conclusion

A lot of data science and statistics is about making a good guess about what will happen in the future. As we look forward to celebrating the four-year anniversary of RAPIDS this time next year, our best guess is that the tools and software will be more comprehensive and faster. You can join us on this journey! Request features or (even better!) contribute your thinking and code on GitHub. Follow us on Twitter. Check out the RAPIDSFire podcast. I’ll talk to you in December. Until then, keep moving forward.

RAPIDS is a suite of software libraries for executing end-to-end data science & analytics pipelines entirely on GPUs.

Paul Mahler

Data Scientist and Technical Product Manager. Triathlon enthusiast.
