How to Benchmark ANN Algorithms

An investigation into the performance of various approximate nearest-neighbor algorithms.

Braden Riggs
GSI Technology

--

By Braden Riggs and George Williams (gwilliams@gsitechnology.com)

Approximate Nearest-Neighbors for new voter party affiliation. Credit: http://scott.fortmann-roe.com/docs/BiasVariance.html

Introduction:

The field of data science is rapidly changing as new and exciting software and hardware breakthroughs are made every single day. Given the rapidly changing landscape, it is important to take the appropriate time to understand and investigate some of the underlying technology that has shaped and will shape, the data science world. As an undergraduate data scientist, I often wish more time was spent understanding the tools at our disposal, and when they should appropriately be used. One prime example is the variety of options to choose from when picking an implementation of a Nearest-Neighbor algorithm; a type of algorithm prevalent in pattern recognition. Whilst there are a range of different types of Nearest-Neighbor algorithms I specifically want to focus on Approximate Nearest Neighbor (ANN) and the overwhelming variety of implementations available in python.

My first project with my internship at GSI Technology explored the idea of benchmarking ANN algorithms to help understand how the choice of implementation can change depending on the type and size of the dataset. This task…

--

--

Braden Riggs
GSI Technology

Australian Data Scientist/Enthusiast | Developer Advocate@ Dolby.io | in/briggs599 | @BradenRiggs1