ACM ReQuEST @ ASPLOS’18: an interactive dashboard with all the submissions

Breathing life into AI/ML marketplaces

We have grown to like online marketplaces such as Amazon and eBay, where the variety of goods, competitive pricing and honest customer feedback all make for a gratifying shopping experience. Given the growing interest in Artificial Intelligence (AI) and Machine Learning (ML) technologies, it is only natural to extend the marketplace concept to products and services in this space.

In fact, AI/ML marketplaces already exist. For example, Amazon Web Services (AWS) provides a marketplace to help its customers subscribe to and securely deploy stable containerized algorithms and models for popular categories including Computer Vision, Natural Language Processing and Speech Recognition. At the time of this writing, this marketplace offers 243 free, free-trial and paid packages from 43 vendors. In addition, vendors provide information to guide customers' choice of model via product details such as descriptions, usage instructions and per-hour pricing for each instance type.

Even with the available information, customers may find it daunting to make the right selection from the marketplace. For example, searching for ResNet50, a popular image classification model, returns several choices:

- “GluonCV ResNet50 Classifier” by AWS reaches “the state-of-the-art performance with accuracy of 79.15% vs. 75.3% in the original paper”.

- “MXNet ResNet50 Inference” by Intel is “accelerated with MKL-DNN” and is “faster and higher speed”.

- “Xilinx ML Suite” uses f1 FPGA instances.

Given the lack of quantitative data for comparison and the lack of customer feedback, customers may wonder which choice will meet their particular service-quality targets under given budget constraints. For example, would using FPGA instances be faster and/or cheaper than using CPU instances? Add to this possible questions around retraining and deployment options, and it is clear that further steps are needed to help customers make informed decisions.
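As a back-of-envelope illustration of the kind of comparison customers currently have to do by hand, the sketch below converts an instance's hourly price and measured throughput into a cost per million inferences. All numbers are hypothetical and for illustration only, not actual marketplace data.

```python
def cost_per_million_inferences(price_per_hour: float, images_per_second: float) -> float:
    """Dollars to run one million inferences at a given hourly price and throughput."""
    seconds_needed = 1_000_000 / images_per_second
    return price_per_hour * seconds_needed / 3600

# Hypothetical instances: a CPU instance vs. an FPGA instance.
cpu_cost = cost_per_million_inferences(price_per_hour=1.53, images_per_second=250)
fpga_cost = cost_per_million_inferences(price_per_hour=1.65, images_per_second=1200)
print(f"CPU:  ${cpu_cost:.2f} per 1M images")
print(f"FPGA: ${fpga_cost:.2f} per 1M images")
```

With these made-up numbers, the slightly pricier FPGA instance wins on cost per inference because of its much higher throughput; whether that holds in practice is exactly the kind of question unified benchmarking data would answer.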

These issues are certainly not unique to AWS or other ML marketplaces. We have been grappling with similar issues in various industrial and academic contexts for many years. If you consider the diversity and ever-changing nature of ML models, datasets, frameworks, software environments, hardware platforms and so on, it is hard not to despair of ever seeing light at the end of the tunnel!

Community-driven omni-benchmarking to the rescue!

We believe that the above issues can be gradually solved with help from the community. The idea is simple: make benchmarking so easy that it can be crowdsourced!

To test this idea, we co-organized the first ACM ReQuEST tournament at ASPLOS'18, where we invited the community to submit complete implementations (code, data, scripts, etc.) for the popular ImageNet image classification challenge. Every submitted implementation (artifact) was put through a rigorous artifact evaluation process. We also created an automated, customizable and reproducible workflow for every artifact to unify the evaluation of accuracy, latency (seconds per image), throughput (images per second), peak power consumption (Watts), price and other metrics. We published the unified workflows on GitHub and added snapshots to the ACM Digital Library. Importantly, rather than declaring a single winner, we decided to plot all the results on a public interactive dashboard. Using the dashboard, anyone can apply their own criteria to explore the solution space and look for suitable solutions (e.g. to find the most energy-efficient solution achieving the required accuracy).
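The dashboard-style exploration above boils down to filtering submissions on a Pareto frontier over the metrics a customer cares about. The sketch below shows a minimal version of that idea for two metrics (accuracy vs. energy per image); the submission names and numbers are hypothetical, not actual tournament results.

```python
from typing import NamedTuple

class Result(NamedTuple):
    name: str
    accuracy: float          # top-1 accuracy, higher is better
    joules_per_image: float  # energy cost, lower is better

def pareto_frontier(results):
    """Keep results not dominated on (accuracy, energy) by any other result."""
    frontier = []
    for r in results:
        dominated = any(
            o != r
            and o.accuracy >= r.accuracy
            and o.joules_per_image <= r.joules_per_image
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return frontier

# Hypothetical submissions.
submissions = [
    Result("cpu-fp32",  0.753, 4.0),
    Result("cpu-int8",  0.751, 1.1),
    Result("fpga-int8", 0.749, 0.3),
    Result("gpu-fp16",  0.750, 2.5),  # dominated by cpu-int8
]

for r in pareto_frontier(submissions):
    print(r.name)

# A customer's own criterion: the most energy-efficient frontier point
# that still achieves at least 75% accuracy.
best = min(
    (r for r in pareto_frontier(submissions) if r.accuracy >= 0.75),
    key=lambda r: r.joules_per_image,
)
print(best.name)
```

Each additional metric (latency, throughput, price) just adds a dimension to the same dominance check, which is why unified measurements across all artifacts matter.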

Working with vendors, we easily deployed two Collective Knowledge (CK) workflows in the AWS cloud: one showing ~50x speedups of Intel-optimized Caffe over vanilla Caffe with negligible accuracy loss for 8-bit inference on a CPU instance, and one demonstrating accelerated inference on a Xilinx FPGA. In addition, a proof-of-concept integration of CK with a marketplace was developed and demonstrated.

CK workflows naturally help vendors deploy optimized packages in the cloud, but it’s only the beginning. Importantly, running different CK-instrumented packages on different cloud instances generates a wealth of experimental performance data. Finally, predictive models and interactive visualizations distilled from the performance data can completely transform the marketplace experience for customers!

What’s next?

We are only at the beginning of a long-term journey, so we invite you to join our community effort to help bring order to the ML chaos!

For example, we are working on applying the above ReQuEST methodology and modular CK infrastructure to automate and unify other benchmarking efforts. Due to the broad community interest surrounding ML, we hope to reach a critical mass of reusable, portable components. This would allow researchers to build upon such components in their reproducible and interactive papers, while eventually solving the thorny issue of benchmarking techniques across the latest software and hardware. With over 30,000 ML papers expected to be published in 2019, but only about 1 in 7 coming with any means to reproduce them, the time is ripe to disrupt the status quo!

Finally, our approach is not limited to ML or traditional computing systems: we have successfully applied it in several other domains, while sharing code and data as reusable CK components.

Please get in touch if you would like to know more and join our growing community!

Engineer, researcher and entrepreneur, passionate about optimally using valuable resources such as computer systems… and human talent.
