Spark 3.0 + NVIDIA GPUs — The Impossible Is Now Possible

Paul Lashmet
Product AI
Dec 25, 2021

Scaling out ML models used to be a time-intensive endeavor. Using fraud analytics as an example, this post shows how combining Spark 3.0 with NVIDIA GPUs makes fast scale-out possible with very few code changes.

The Opportunity — Spark

Apache Spark is a powerful engine for unified, large-scale, horizontally scalable data ingestion, data engineering, and machine learning. It scales easily to hundreds of nodes and can handle virtually any type of enterprise data transformation and compute workload. But in certain applications, the data is so enormous that processing it effectively, economically, and in a timely fashion is extremely difficult.

The Challenge — Time

Fraud analytics is one of those problems, especially on bank wire transfer data, which is notoriously complex. A wire transfer doesn't always go directly from Bank A to Bank B. It often traverses a global network of banks, so while Bank C might not be the recipient of a wire transfer, it can still be party to the transaction as an intermediary. The risk is that once any bank wires money into any account as part of a fraudulent transfer that has not yet been identified, that bank becomes responsible.
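To make the intermediary problem concrete, here is a minimal sketch, assuming each wire transfer is modeled as an ordered route of bank identifiers from originating bank to beneficiary bank. The `WireTransfer` class and the `roles` and `exposure` helpers are hypothetical illustrations, not any bank's actual data model. The point is that a bank's exposure covers every transfer it touches, not only the ones it receives.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WireTransfer:
    """Hypothetical wire transfer: an ordered route of banks plus an amount."""
    transfer_id: str
    route: tuple[str, ...]  # originating bank first, beneficiary bank last
    amount: float


def roles(transfer: WireTransfer, bank: str) -> set[str]:
    """Return the role(s) a bank plays in a single transfer (empty if none)."""
    found: set[str] = set()
    if transfer.route[0] == bank:
        found.add("originator")
    if transfer.route[-1] == bank:
        found.add("beneficiary")
    # Any bank strictly between the two ends is a middleman.
    if bank in transfer.route[1:-1]:
        found.add("intermediary")
    return found


def exposure(transfers: list[WireTransfer], bank: str) -> dict[str, set[str]]:
    """Map each transfer ID the bank touches to the role(s) it plays in it."""
    result: dict[str, set[str]] = {}
    for t in transfers:
        r = roles(t, bank)
        if r:
            result[t.transfer_id] = r
    return result
```

With a route such as ("Bank A", "Bank C", "Bank B"), `exposure` reports Bank C as an intermediary even though it is neither the sender nor the recipient. At production volumes, the same question has to be answered across millions of routes joined with other data, which is where the scale problem described here comes from.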

Wire transfer records contain a wealth of unstructured information that can be extracted, cleaned, and classified. But to support effective fraud analytics, that internal data must be combined with external data sets in near real time.
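The extract, clean, and classify steps can be sketched in a few lines. This is a minimal illustration under two stated assumptions: the unstructured part of a record uses a hypothetical `KEY: value` text layout (real wire message formats differ and are far richer), and the external data set is a hypothetical watchlist of party names. The function names are illustrative only, and exact-name matching is the simplest possible screening rule.

```python
from __future__ import annotations

import re

# Hypothetical "KEY: value" layout, one field per line. Only spaces/tabs are
# trimmed around the value so an empty value never swallows the next line.
FIELD_PATTERN = re.compile(r"^(?P<key>[A-Z]+):[ \t]*(?P<value>.*?)[ \t]*$", re.MULTILINE)


def extract_fields(raw: str) -> dict[str, str]:
    """Extract KEY: value pairs, collapsing internal whitespace and upper-casing values."""
    fields: dict[str, str] = {}
    for m in FIELD_PATTERN.finditer(raw):
        fields[m.group("key")] = " ".join(m.group("value").split()).upper()
    return fields


def classify(fields: dict[str, str], watchlist: set[str]) -> str:
    """Combine internal fields with an external list: flag if any party is listed."""
    parties = {fields.get("ORDERER"), fields.get("BENEFICIARY")}
    return "review" if parties & watchlist else "clear"
```

Cleaning (whitespace and case normalization) happens before classification so that the internal record and the external list are compared on the same terms.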

Imagine the Possibilities

What if you could make your existing Spark jobs fast enough to mitigate fraud without having to rewrite them? One way to do this is to use Spark 3.0 or later together with NVIDIA's RAPIDS suite of software libraries, which makes it possible to build end-to-end data pipelines that run entirely on GPUs. Real-time data engineering becomes feasible because RAPIDS includes the familiar DataFrame API for transforming large-scale data. Combined with a multi-node, multi-GPU infrastructure, this provides GPU-accelerated data transformations at a scale that was previously too costly and difficult to achieve. This newfound scale makes the impossible possible and, more importantly, helps prevent wire fraud.
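Here is a minimal sketch of what "without having to rewrite them" can look like in practice. It assumes the RAPIDS Accelerator for Apache Spark plugin jar is already on the classpath and that the cluster manager exposes GPUs as a Spark resource. The configuration keys (`spark.plugins`, `spark.rapids.sql.enabled`, `spark.executor.resource.gpu.amount`, `spark.task.resource.gpu.amount`) are documented Spark 3.x and RAPIDS Accelerator settings. The helper `rapids_spark_conf` and its default values are illustrative assumptions, and a real deployment usually also needs a GPU discovery script and memory tuning.

```python
from __future__ import annotations


def rapids_spark_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> dict[str, str]:
    """Build Spark settings that route supported DataFrame/SQL operators to GPUs.

    Only the session configuration changes; the DataFrame transformation code
    itself is left as it is.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS Accelerator SQL plugin through Spark 3's plugin API.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turn on GPU execution of supported SQL/DataFrame operators.
        "spark.rapids.sql.enabled": "true",
        # Spark 3 GPU-aware scheduling: GPUs requested per executor...
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # ...and the fraction of one GPU each task needs, so that up to
        # tasks_per_gpu tasks can share a single GPU concurrently.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }
```

Each key/value pair can be passed to `SparkSession.builder.config(key, value)` in PySpark, or as `--conf key=value` to `spark-submit`. The existing DataFrame code that ingests and transforms wire transfer records stays as it is, and operators the plugin cannot run on the GPU fall back to the CPU.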

Paul Lashmet is a business integration architect and financial services subject matter expert.