
The End-to-End Data Analytics Workflow Requires Generality

I came across an article from NVIDIA about its TPCx-BB benchmark results on the A100 GPU. As a data scientist, I was immediately intrigued because I’m a big fan of the Transaction Processing Performance Council (TPC) benchmarks, which provide reasonable and objective performance metrics. The TPC also has clear rules about how its benchmarks are run and how results are reported, which ensures that results from different vendors can be compared directly. I’ll say more about this later, but first let’s talk about the end-to-end data analytics workflow.

I’ve drawn a rough sketch of the end-to-end data analytics workflow based on…


There’s a Better Way to Do Large-Scale Graph Analytics

Benchmarking isn’t my favorite topic, but I take a passing interest in graph analytics benchmarking.

I’ll occasionally dissect benchmarks that I think are inaccurate or misleading.

And I’ll also dissect benchmarks that only tell part of the story. I was half-listening to Jensen Huang’s NVIDIA GTC 2020 keynote from May 14, 2020, when one of his performance claims caught my attention. At about the 19:30 mark of Part 6, the presentation turns to large-scale graph analytics and claims that a DGX A100 rack can compute PageRank (PR) on a 128-billion-edge web graph at 688 billion edges per second. I…
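For readers unfamiliar with the algorithm behind that claim, here is a minimal power-iteration sketch of PageRank on a toy directed graph. The damping factor, node names, and iteration count are my own illustrative choices (and the toy graph has no dangling nodes, which a production implementation would have to handle); this is nothing like the 128-billion-edge web graph from the keynote.

```python
# Minimal PageRank sketch via power iteration (damping factor d = 0.85).
# Assumes every node has at least one outgoing edge (no dangling nodes).
def pagerank(edges, num_iters=50, d=0.85):
    nodes = {n for e in edges for n in e}
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    # Start with a uniform rank distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(num_iters):
        # Teleport term, then spread each node's rank across its out-edges.
        new = {n: (1.0 - d) / len(nodes) for n in nodes}
        for src, dst in edges:
            new[dst] += d * rank[src] / out_deg[src]
        rank = new
    return rank

# Toy graph: a -> b, b -> c, c -> a, a -> c.
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
```

Benchmarks like the one in the keynote measure how fast a system can push the edge-traversal step of this computation, typically reported in traversed edges per second.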


Or, Why Benchmark Reproducibility Matters

Benchmarking the Louvain Algorithm

If you’ve read my last two articles, Measuring Graph Analytics Performance and Adventures in Graph Analytics Benchmarking, you know that I’ve been harping on graph analytics benchmarking a lot lately. You also know that I use the GAP Benchmark Suite from the University of California, Berkeley, because it’s easy to run, tests multiple graph algorithms and topologies, provides good coverage of the graph analytics landscape, and, most important, gives comprehensive, objective, and reproducible results. However, GAP doesn’t cover community detection in social networks.

The Louvain algorithm [1] for finding communities in large networks is a possible candidate to…


It’s Important to Use a Benchmark for Its Intended Purpose

With all the attention graph analytics is getting lately, it’s increasingly important to measure its performance in a comprehensive, objective, and reproducible way. I covered this in a previous article, in which I recommended using an off-the-shelf benchmark like the GAP Benchmark Suite from the University of California, Berkeley. There are other graph benchmarks, of course, like LDBC Graphalytics, but they can’t beat GAP for ease of use. There’s significant overlap between GAP and Graphalytics, but the latter is an industrial-strength benchmark that requires a special software configuration.

Personally, I find benchmarking boring. But it’s unavoidable when I need performance…


The Diverse Landscape of Graph Analytics Requires a Comprehensive Benchmark

What Is Graph Analytics And Why Does It Matter?

A graph is a good way to represent a set of objects and the relations between them (Figure 1). Graph analytics is the set of techniques used to extract information from the connections between those entities.
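To make the definition concrete, here is a minimal sketch of a graph as a plain adjacency list, with one basic analytics question asked of it. The entities (people and "follows" relations) are my own illustrative example.

```python
from collections import defaultdict

# Objects (people) and relations (who follows whom) as an edge list.
follows = [("ana", "bob"), ("bob", "cho"), ("ana", "cho")]

# Adjacency list: each person maps to the people they follow.
adjacency = defaultdict(list)
for src, dst in follows:
    adjacency[src].append(dst)

# A simple graph-analytics question: who is followed the most?
in_degree = defaultdict(int)
for _, dst in follows:
    in_degree[dst] += 1

most_followed = max(in_degree, key=in_degree.get)  # "cho"
```

Real graph analytics scales questions like this (centrality, communities, shortest paths) to graphs with billions of edges, which is why comprehensive benchmarks matter.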

Henry Gabb

Senior Principal Engineer at Intel Corporation
