We have grown to like online marketplaces such as Amazon and eBay, where the variety of goods, competitive pricing and honest customer feedback all make for a gratifying shopping experience. Given the growing interest in Artificial Intelligence (AI) and Machine Learning (ML) technologies, it is only natural to extend the marketplace concept to products and services in this space.
In fact, AI/ML marketplaces already exist. For example, Amazon Web Services (AWS) provides an ML & AI marketplace that lets customers subscribe to and securely deploy stable containerized algorithms and models for popular categories including Computer Vision, Natural Language Processing and Speech Recognition. At the time of this writing, the marketplace contains 243 free, free-trial and paid packages from 43 vendors. In addition, vendors guide customers' choice of model via product details such as descriptions, usage instructions and per-hour pricing for each instance type.
Even with the available information, customers may find it daunting to make the right selection from the marketplace. For example, searching for ResNet50, a popular image classification model, returns several choices:
- “GluonCV ResNet50 Classifier” by AWS reaches “the state-of-the-art-performance with accuracy of 79.15% vs. 75.3% in the original paper”.
- “MXNet ResNet50 Inference” by Intel is “accelerated with MKL-DNN” and is “faster and higher speed”.
- “Xilinx ML Suite” uses f1* FPGA instances.
Lacking quantitative data for comparison as well as customer feedback, customers may wonder which choice will meet their particular service quality targets under given budget constraints. For example, would using FPGA instances be faster and/or cheaper than using CPU instances? Add to this possible questions around retraining via AWS SageMaker or deployment via AWS SageMaker Neo, and it is clear that further steps are needed to help customers make informed decisions.
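To see why the faster-versus-cheaper question has no obvious answer, consider a back-of-the-envelope calculation that converts throughput and hourly price into cost per inference. The throughput and price figures below are made-up placeholders for illustration only, not measured data; real numbers would have to come from benchmarking each package on each instance type.

```python
# Hypothetical CPU-vs-FPGA comparison; all figures are illustrative
# placeholders, NOT real marketplace or benchmark data.

def cost_per_1k_images(images_per_second: float, price_per_hour: float) -> float:
    """Dollar cost of classifying 1,000 images at a sustained throughput."""
    images_per_hour = images_per_second * 3600
    return price_per_hour / images_per_hour * 1000

# Placeholder figures: a cheaper-but-slower CPU instance vs. a
# pricier-but-faster FPGA instance.
cpu_cost = cost_per_1k_images(images_per_second=100, price_per_hour=0.40)
fpga_cost = cost_per_1k_images(images_per_second=500, price_per_hour=1.65)

print(f"CPU:  ${cpu_cost:.4f} per 1,000 images")
print(f"FPGA: ${fpga_cost:.4f} per 1,000 images")
```

With these (invented) numbers the FPGA instance would be both faster *and* slightly cheaper per image despite its higher hourly price, which is exactly the kind of non-obvious trade-off that only measured data can settle.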
These issues are by no means unique to the AWS marketplace. We have been grappling with similar issues in various industrial and academic contexts for many years. Considering the diversity and ever-changing nature of ML models, datasets, frameworks, software environments, hardware platforms and so on, it is hard not to despair, or to see any light at the end of the tunnel!
Community-driven omni-benchmarking to the rescue!
We believe that the above issues can be gradually solved with help from the community. The idea is simple: make benchmarking so easy that it can be crowdsourced!
To test this idea, we co-organized the 1st ACM ReQuEST tournament at ASPLOS in 2018, where we invited the community to submit complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. Our goal was to put every submitted implementation (artifact) through the established Artifact Evaluation process. We also created an automated, customizable and reproducible Collective Knowledge workflow for every artifact to unify evaluation of accuracy, latency (seconds per image), throughput (images per second), peak power consumption (Watts), price and other metrics. We published the unified workflows on GitHub and added snapshots to the ACM Digital Library. Importantly, rather than declaring a single winner or naming-and-shaming, we decided to plot all the results on a public interactive dashboard. Using the dashboard, anyone can apply their own criteria to explore the solution space and look for Pareto-optimal solutions (e.g. to find the most energy efficient solution achieving the required accuracy).
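The kind of exploration the dashboard enables can be sketched in a few lines: given a table of results with several metrics, filter out every solution that is dominated by another one on all criteria. The entries below are illustrative placeholders (maximizing accuracy, minimizing hourly cost), not real tournament results.

```python
# Minimal sketch of Pareto-optimal filtering over benchmark results.
# Each entry is (name, accuracy %, cost $/hour); higher accuracy and
# lower cost are better. All figures are made-up placeholders.

def pareto_front(results):
    """Keep entries not dominated by any other entry
    (i.e. no other entry is at least as accurate AND at least as cheap,
    and strictly better on one of the two)."""
    front = []
    for name, acc, cost in results:
        dominated = any(
            other_acc >= acc and other_cost <= cost
            and (other_acc > acc or other_cost < cost)
            for _, other_acc, other_cost in results
        )
        if not dominated:
            front.append((name, acc, cost))
    return front

results = [
    ("solution-a", 79.2, 1.65),
    ("solution-b", 75.3, 0.40),
    ("solution-c", 74.0, 0.90),  # dominated by solution-b
]
print(pareto_front(results))  # solution-c drops out
```

Customers with different accuracy or budget thresholds would then pick different points from this front, which is precisely why publishing the whole solution space beats declaring a single winner.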
Working with a Senior Consultant at AWS, we easily deployed two CK workflows in the AWS cloud: one from Intel showing ~50x speedups of Intel-optimized Caffe over vanilla Caffe with negligible accuracy loss for 8-bit inference on an Intel Skylake CPU, and another one based on TVM on a Xilinx FPGA. In addition, a proof-of-concept integration of CK with AWS SageMaker was developed and presented at the O’Reilly AI conference in London.
CK workflows naturally help vendors deploy optimized packages in the cloud, but that is only the beginning. Importantly, running different CK-instrumented packages on different cloud instances generates a wealth of experimental performance data. Predictive models and interactive visualizations distilled from this data can completely transform the marketplace experience for customers!
We are only at the beginning of a long-term journey, so we invite you to join our community effort to help bring order to the ML chaos!
For example, we are working on applying the above ReQuEST methodology and modular CK infrastructure to automate and unify the MLPerf benchmark. Given the broad community interest in ML, we hope to reach the critical mass needed to create open repositories of reusable ML/SW/HW components. Researchers could then build upon such components in their reproducible and interactive papers, eventually solving the thorny issue of benchmarking techniques across the latest software and hardware. With over 30,000 ML papers predicted to be published in 2019, but only 1 in 7 expected to come with any means of reproducing them, the time is ripe to disrupt the status quo!
Finally, our approach is not limited to ML or traditional computing systems: we have successfully applied it in several reproducible competitions on quantum computing while sharing code and data as reusable CK components.
Please get in touch if you would like to know more and join our growing community!