Introducing Redpoint’s ML Workflow Landscape

Published in Memory Leak · 5 min read · Jul 10, 2018

ML is not only a hot buzzword but is also becoming an expected component of modern businesses. Companies that want to stay competitive invest in ML infrastructure that supports the data prep, training, serving, and management of ML algorithms. Overarching ML workflow themes include accelerating time to value and optimizing models in production. We delineate sub-category trends below. As with any new market, heightened interest results in a deluge of options for the operator. We categorize ~280 offerings, from academic research to open source projects to commercial products (both startup and established), to provide a comprehensive picture of the ML workflow landscape. We are excited about innovation in the space and look forward to speaking with startups offering ML-focused solutions.

Businesses adopt ML technology to remain competitive. According to McKinsey, businesses leveraging AI have “an insurmountable advantage” over rivals and 71% of extensive AI adopters expect a 10% increase in revenue. Deloitte predicts the number of ML pilots and implementations will double in 2018 compared to 2017, and double again by 2020. The C-suite’s emphasis on AI and ML will only increase.

Historically, we’ve seen advancements in software pressure infrastructure to modernize, generating large new markets. Previously, data-hungry analytics applications drove the big data market, which IDC estimates reached $130.1B in 2016 and will grow to more than $203.0B in 2020, a ~12% CAGR.

We see ML as the same forcing function. These algorithms demand new and improved workflows and infrastructure. IDC forecasts that AI and ML spend will expand from $12B in 2017 to $58B by 2021, a nearly 50% CAGR.
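The growth rates quoted for these forecasts can be reproduced from the start and end values with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below is only a sanity check on the arithmetic; the function name `cagr` is our own, and the dollar figures are the ones cited in the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.12 means 12% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text, in billions of dollars.
big_data = cagr(130.1, 203.0, 2020 - 2016)
ai_ml = cagr(12.0, 58.0, 2021 - 2017)

print(f"Big data, 2016 to 2020: {big_data:.1%}")
print(f"AI and ML, 2017 to 2021: {ai_ml:.1%}")
```

Both results land close to the rounded figures in the text: roughly 12% for big data and a little under 50% for AI and ML.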

Focusing on the ML workflow layer, Gartner estimates that the data science platform market will increase from $3.0B in 2017 to $4.8B by 2021, a roughly 12.5% CAGR over the period. As data science platforms represent only one component of the total ML workflow market, we believe the TAM is significantly larger.

ML workflows build on big data pipelines that include data sources, ingest, processing, and data management. ML pipelines also require data prep, training, serving, and model management. Solutions that accelerate a model’s time to value and optimize it in production provide tremendous value. Below we delineate interesting trends across each of the workflow stages.

Data prep: According to CrowdFlower, data scientists spend ~82% of their time dealing with data. Cleaning and organizing data represents the largest chunk at 60%, which suggests improvements in this area could have a significant impact. Clean data alone is not enough; ML algorithms also need a lot of it. We see the rise of synthetic data and third-party data sharing platforms to help expand companies’ training data.

Feature engineering “attempts to increase the predictive power of learning algorithms by creating features from raw data that help facilitate the learning process.” We hear most teams perform feature engineering manually but are looking for software to automate the process. Additionally, automation standardizes feature engineering, so features are defined the same way across training and production and in the same language in a single platform. Feature selection can have a large impact on the predictive power of ML algorithms. Autogenerated features need to produce models that match or surpass the accuracy of individuals’ manual work. Ideally software solutions allow domain experts to curate the features.
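To make the point about consistent definitions concrete, here is a minimal sketch of a single feature function shared by training and serving. Everything in it is an assumption for illustration: the field names `timestamp` and `amount`, the three derived features, and the function name `build_features` do not come from any particular product.

```python
import math
from datetime import datetime


def build_features(raw: dict) -> dict:
    """Turn one raw event into model-ready features.

    The field names ("timestamp", "amount") are hypothetical.
    """
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "hour_of_day": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),  # Saturday is 5, Sunday is 6
        "log_amount": math.log1p(raw["amount"]),
    }


# The same function is called on historical rows at training time ...
training_rows = [
    {"timestamp": "2018-07-07T23:15:00", "amount": 120.0},
    {"timestamp": "2018-07-10T09:30:00", "amount": 8.5},
]
training_features = [build_features(row) for row in training_rows]

# ... and on a live request at serving time, so the definitions cannot drift.
live_features = build_features({"timestamp": "2018-07-10T09:30:00", "amount": 8.5})
```

Because both paths go through one definition, a feature computed for a live request matches the feature the model saw for the same raw input during training.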

Training: Training models is often expensive and slow. With local training, the whole model and data set fit in the memory of a single machine with multiple cores. Larger models and data sets require distributed training, which stores the data or model across multiple machines. Data parallelism distributes the data across multiple machines, while model parallelism splits the model itself across machines. Despite distributed training, our channel checks suggest training still takes too long, as ML engineers run training jobs over days or overnight. The need to shorten the training period has led to significant investments in ML-specific silicon over the past few years.
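The data parallelism idea can be sketched in a few lines. This toy version is an illustration under simplifying assumptions, not how any specific framework implements it: the "workers" run one after another in a single process, the model is a one-weight linear fit, and the helper names `gradient` and `split` are ours. Model parallelism is not shown.

```python
def gradient(w: float, shard: list) -> float:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data: list, num_workers: int) -> list:
    """Deal examples out to workers round-robin."""
    return [data[i::num_workers] for i in range(num_workers)]


# Toy data generated from y = 3x; eight examples, so two workers get four each.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = split(data, num_workers=2)

w = 0.0
for step in range(200):
    # Each worker computes a gradient on its own shard (sequentially here;
    # on a real cluster these calls would run on different machines).
    local_grads = [gradient(w, shard) for shard in shards]
    # Averaging the local gradients stands in for the synchronization step.
    # With equal shard sizes it equals the gradient over the full data set.
    w -= 0.01 * sum(local_grads) / len(local_grads)

print(f"learned weight: {w:.4f}")
```

Each worker only ever touches its own slice of the data, which is what lets the data set grow beyond a single machine's memory; the cost is the communication needed to combine gradients at every step.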

Hyperparameter optimization, the tuning of settings that are fixed before training rather than learned (such as learning rate or network depth) to achieve an optimal model, represents a key training step. We’ve heard replacing a probabilistic model with ML is such a significant improvement to begin with that tweaking to improve accuracy from 80% to 85% is nice but not a main pain point. We’ve seen a plethora of hyperparameter optimization offerings and believe these will be embedded in larger platforms.
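As a minimal sketch of what such tuning involves, the example below runs a plain grid search over one hyperparameter, the learning rate, for a toy one-weight model. Grid search is the simplest possible stand-in and is not meant to represent the method used by any offering on the landscape; the candidate values and the `train` function are assumptions for illustration.

```python
def train(learning_rate: float, steps: int = 20) -> float:
    """Fit y = w * x by gradient descent and return the final training loss."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated from y = 2x
    w = 0.0  # the weight is learned during training
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# The learning rate is a hyperparameter: chosen before training, not learned.
candidates = [0.001, 0.01, 0.1, 0.3]
losses = {lr: train(lr) for lr in candidates}
best_lr = min(losses, key=losses.get)

for lr in candidates:
    print(f"learning_rate={lr}: loss={losses[lr]:.3g}")
print(f"best learning rate: {best_lr}")
```

On this toy problem the smallest rates converge too slowly within 20 steps and the largest one diverges, which is the kind of sensitivity that makes the search worth automating.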

Serving: It is often challenging to transfer a trained model to production. Sometimes there is a handoff from the developer to operations, and it can take weeks for operations to set up the appropriate production environment. Often teams write a series of scripts, glue code, and ad hoc commands to solve the problem. Non-standard approaches reduce how often models can be pushed to production, slowing development cycles. Additionally, it can be hard to scale an algorithm for production environments that may have more data coverage than training environments. Channel checks suggest teams desire solutions that ease serving, and those that interoperate with Kubernetes are well-positioned.

Model management: Model management is a comprehensive category that includes experiment and production tracking, management, monitoring, and collaboration. Some teams deploy multiple versions of a similar model in production for A/B testing; hence, teams require the ability to compare model accuracy, which is ideally tied to a business KPI dashboard. Deployment metrics like throughput, latency, and cost are crucial. Productionized models’ configuration and environments should be captured so models and results can be reproduced. Collaborators should be able to comment on models and gain visibility across the team.

Interpretability helps explain predictions and gives confidence that the model will work as intended. Legal or ethical requirements often demand interpretability. Interpretable models help ML engineers iterate to produce better models: for a bad model, interpretability helps debug it, and for a good one, it helps build trust. Depending on the use case, there can be a trade-off between model interpretability and accuracy. Solutions that facilitate interpretability at either a global or local level present a clear value proposition for ML engineers.
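The difference between local and global interpretability is easiest to see on a linear model, where each feature's contribution to a score is simply its weight times its value. The sketch below assumes a made-up scoring model; the feature names, weights, and helper functions are hypothetical, and non-linear models need more involved techniques than this.

```python
# Hypothetical linear scoring model: score = bias + sum(weight * feature value).
weights = {"income": 0.5, "debt": -1.5, "age": 0.25}
bias = 1.0


def predict(row: dict) -> float:
    return bias + sum(weights[name] * row[name] for name in weights)


def explain_local(row: dict) -> dict:
    """Per-feature contribution to one prediction (a local explanation)."""
    return {name: weights[name] * row[name] for name in weights}


def explain_global(rows: list) -> dict:
    """Mean absolute contribution per feature across many rows."""
    return {
        name: sum(abs(weights[name] * row[name]) for row in rows) / len(rows)
        for name in weights
    }


rows = [
    {"income": 4.0, "debt": 2.0, "age": 8.0},
    {"income": 2.0, "debt": 1.0, "age": 4.0},
]
local = explain_local(rows[0])
overall = explain_global(rows)

print("why this prediction:", local)
print("what matters overall:", overall)
```

The local view answers why one specific prediction came out the way it did, while the global view ranks features by how much they move predictions across the data set.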

Our ML workflow landscape highlights ~280 solutions encompassing established vendors, startups, and open source projects. The exhibit categorizes offerings across twenty-two domains. We acknowledge it is challenging to put many of the businesses in only one category as they are rapidly broadening their scope to include additional spaces.

Platforms that originally focused on data scientists are attempting to expand their reach to the newer ML engineer audience: individuals with strong software engineering skills who are involved in software architecture and design and understand how to run production systems. Simultaneously, ML-first platforms have emerged. Many of the platforms also offer model management, so these segments may converge over time.

Expectations that AI and ML will be part of a modern business are only increasing. Like analytics applications that drove next-generation big data infrastructure, we believe ML algorithms will advance ML pipelines. The total addressable market is growing, and there are numerous trends that can catalyze startups to become enduring businesses.

Written by Astasia Myers