WALTS: Walmart AutoML Libraries, Tools and Services

Kunal Banerjee
Walmart Global Tech Blog
14 min read · Nov 7, 2022
Fig. 1: Overview of an AutoML system.

Authors: Kunal Banerjee, Rahul Bajaj, Sachin Parmar

Automated Machine Learning (AutoML) is an emerging field in machine learning (ML) that searches the candidate model space for a given task, dataset, and evaluation metric, and returns the model that performs best on the supplied dataset as per the given metric. AutoML not only reduces the manpower and expertise needed to develop ML models but also substantially decreases their time-to-market. We have designed an enterprise-scale AutoML framework called WALTS to meet the rising demand for ML in retail and other businesses of interest, and thus help democratize ML within our organization. In this blog, we elaborate on how we explore models from a pool of candidates and show how WALTS has helped us with a business use-case.

To give an overview of the AutoML process, its current landscape, and showcase the benefits of WALTS, we will be covering:
· What is AutoML?
· What are the advantages of AutoML?
· Who are the current players in this field?
· What is WALTS?
· What are the advantages of WALTS?
· What is the architectural overview of WALTS?
· How can WALTS help an organization?

What is AutoML?

AutoML takes the pressure of designing an ML model off its users by coming up with a suitable model on its own. As shown in the figure at the top, AutoML typically takes three inputs from the user: (i) a dataset, (ii) the ML task to be performed on this dataset, and (iii) the metric used to determine which model performed best. For example, the dataset can be ImageNet, the task image classification, and the metric accuracy.

Once given the input, the AutoML tool will explore various ML algorithms — these algorithms may range from classic ones (e.g., linear regression, support vector machine, decision tree) to advanced neural networks. At the end of the exploration, the tool will report the model which performed the best according to the user-supplied metric. This algorithm exploration part, however, is kept as a black box in most of the current AutoML tools.
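The exploration loop just described can be sketched in a few lines. The candidate "models" below are trivial stand-ins for real ML algorithms such as logistic regression or decision trees, and the dataset and metric are toy illustrations, not what any particular AutoML tool actually uses:

```python
# A toy sketch of the AutoML loop: given labeled data, a pool of candidate
# models and a metric (accuracy), score every candidate and report the best.
data = [(x, "big" if x > 5 else "small") for x in range(10)]

candidates = {
    "always_big": lambda x: "big",
    "threshold_3": lambda x: "big" if x > 3 else "small",
    "threshold_5": lambda x: "big" if x > 5 else "small",
}

def accuracy(model):
    # Fraction of examples where the model's prediction matches the label.
    return sum(model(x) == y for x, y in data) / len(data)

scores = {name: accuracy(m) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # threshold_5 1.0
```

A real tool runs the same loop over trained estimators and cross-validated scores, but the shape of the search is the same: evaluate every candidate against the user-supplied metric and return the winner.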

What are the advantages of AutoML?

There are multiple benefits to adopting AutoML, as described below.

1. Democratizing AI: This is perhaps the biggest contribution of AutoML and the primary motivation for conceptualizing it. Now you don’t need to be an ML expert to use ML! With the help of AutoML, anyone can get an ML model that suits their needs without investing in an elaborate model development process.

2. Reduced Time-to-Market: AutoML shortens time-to-market in two ways.

a. Less time spent in development: AutoML relieves its users of the pain of investigating individual ML models and then doing a comparative analysis to pick the best one. Moreover, tuning the hyper-parameters of an ML model is crucial for extracting its best performance, and human model developers typically spend a lot of time on this step; AutoML additionally takes care of tuning the hyper-parameters for each of the models it explores.

b. Less time spent fixing bugs in the code: Since models are developed in an automated fashion, the chances of introducing bugs are much lower than with human-developed models. Consequently, less time is spent on bug fixing and testing the code.

3. Realizing Operational Excellence: Adopting AutoML may have the following additional positive impacts with respect to operational excellence.

a. AutoML injects more standardization into model generation: Since all models generated by an AutoML tool follow a standardized process, it is easier to ensure or enforce any policies an industry may want.

b. Models may be generated with scaling in mind: Some AutoML tools are equipped with techniques to develop models that can be deployed at scale. A data scientist, on the other hand, may have to spend additional effort to scale a model after its development.

4. Reduced Cost: Investing in AutoML may reduce the overall cost of operations for an industry as mentioned below:

a. Less head count needed: AutoML may greatly alleviate industries’ need to hire data scientists and data engineers. Moreover, data scientists are notoriously difficult to retain; AutoML can definitely help solve this pain point [1].

b. An efficient model leads to lower inference cost: The models developed by AutoML are often more efficient (being built for scale and optimized for the underlying hardware) than those developed by citizen data scientists. An efficient model invariably leads to lower inference cost and thus contributes to the company’s bottom line.

Who are the current players in this field?

Auto-WEKA [2], which tried to simultaneously search for the optimal learning algorithm and the best-performing hyper-parameter values, is considered the first AutoML tool. Its scope, however, was limited to classification tasks, and the models and datasets it explored were rather small by today’s standards. Since then, many well-established companies and startups have ventured into this field and made a lot of progress in terms of ML tasks, models, hyper-parameter tuning and more. Some of the big players include Google, Microsoft, Dataiku, DataRobot and H2O.

What is WALTS?

WALTS is an in-house Walmart framework; the name expands to "Walmart AutoML Libraries, Tools and Services". Its algorithm is given below; note that we use Optuna [7] for hyper-parameter optimization.

Fig 2. WALTS AutoML algorithm.

To understand what the salient features of WALTS are and why these differentiated features would be worth building for any organization, we need to delve into the following question.

What are the advantages of WALTS?

It is fair to ask whether, with so many AutoML tools already available, it makes sense to create yet another one. We looked into some of these off-the-shelf AutoML tools and found gaps that we wanted to fill with WALTS. Below we list the advantages of WALTS over other similar tools.

1. Customizability:

Customizability is the biggest benefit that WALTS brings. To illustrate the point, let us look at the example below.

Fig. 3: An example to illustrate customizability of metrics in WALTS.

Suppose there is a collection of balls where 80% of the balls are red while the remaining 20% are blue. Now consider a model which ALWAYS predicts that the color of a ball selected from this collection is red. Clearly, this model has 80% accuracy; however, it is also obvious that the model has learnt nothing, because it blindly makes the same prediction every time. Nevertheless, a user may be tempted to choose this model on the strength of its 80% accuracy, which may seem decent to an inexperienced audience. Such erroneous decisions can be avoided by choosing a more appropriate metric. In cases of class imbalance (here, red balls vastly outnumber blue ones), Balanced Accuracy is a better choice than the more common Accuracy. Balanced Accuracy is defined as the sum of the recall values of each class divided by the number of classes. In this case, the recall for red balls is 1 (all predictions for red balls are correct), whereas the recall for blue balls is 0 (none of the predictions for blue balls is correct), so Balanced Accuracy is (1+0)/2 = 0.5, or 50%. Obviously, 50% does not look as good as 80%, and hence a user is less likely to pick this model.
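The ball example works out in a few lines of plain Python:

```python
# 80 red balls, 20 blue balls, and a model that always predicts "red".
y_true = ["red"] * 80 + ["blue"] * 20
y_pred = ["red"] * 100

# Plain accuracy: fraction of correct predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(cls):
    # Of all examples whose true class is `cls`, how many did we predict?
    idx = [i for i, t in enumerate(y_true) if t == cls]
    return sum(y_pred[i] == cls for i in idx) / len(idx)

# Balanced accuracy: mean of the per-class recalls.
balanced_accuracy = (recall("red") + recall("blue")) / 2

print(accuracy, balanced_accuracy)  # 0.8 0.5
```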

Currently, all off-the-shelf AutoML services that we explored provide a fixed set of metrics with no provision for users to extend it. Some of these metrics can even be misleading, e.g., accuracy when the classes are imbalanced, as explained above. Based on customer requirements, we can easily introduce new metrics into WALTS, such as balanced accuracy, which is better suited for imbalanced classes.

2. Transparency:

Transparency is the second most important benefit that WALTS brings. We again illustrate the point with the following example.

Fig. 4: An example to illustrate transparency offered by WALTS.

Let us consider a case where we provide a dataset to an AutoML tool, choose classification as the task, and pick accuracy as the metric to decide the winning model. Suppose the best-performing ML model gives an accuracy of 82% while taking 100 ms for each prediction. The second-best model falls short by 2% in accuracy in comparison with the best model, but it takes half the time, i.e., 50 ms, per prediction.

Now, what if there are some hard latency requirements for the client?

In such cases, the client may be happy with the second-best model if it fits their latency needs.

Typically, AutoML frameworks report only the winning model. WALTS, in contrast, provides details of all the explored models and their performance, letting the client trade off between different performance characteristics when choosing a model.
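As a sketch, with the kind of full leaderboard WALTS exposes, applying a latency constraint is a one-liner. The model names and numbers below simply mirror the hypothetical example above:

```python
# Hypothetical leaderboard: every explored model with its metrics.
results = [
    {"model": "model_a", "accuracy": 0.82, "latency_ms": 100},
    {"model": "model_b", "accuracy": 0.80, "latency_ms": 50},
]

latency_budget_ms = 60  # the client's hard latency requirement

# Keep only models that satisfy the latency budget, then pick the most
# accurate among them.
feasible = [r for r in results if r["latency_ms"] <= latency_budget_ms]
best = max(feasible, key=lambda r: r["accuracy"])
print(best["model"])  # model_b
```

A tool that reports only the single winner (model_a here) cannot support this kind of client-side filtering at all.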

3. Low Precision Models:

Some of the readers may not be familiar with the term “low precision models”, hence we explain this term with the following illustration borrowed from Wikipedia.

Fig. 5: Difference between IEEE single-precision (FP32) and half-precision (FP16) datatypes.

Conventionally, when we train an ML model, the datatype used to represent real values is "float": according to the IEEE 754 standard, a floating-point number is captured in 32 bits, distributed into sign (1 bit), exponent (8 bits) and fraction (23 bits). For convenience, the IEEE 754 single-precision datatype is abbreviated as FP32.
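The 1 + 8 + 23 bit layout can be inspected directly with the Python standard library:

```python
# Decompose an FP32 number into its sign, exponent and fraction fields.
import struct

def fp32_fields(x):
    # Pack as a big-endian 32-bit float, reinterpret as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31            # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits (biased by 127)
    fraction = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction

# -1.5 = (-1)^1 * 1.1_binary * 2^0, so sign=1, biased exponent=127,
# and the fraction's top bit (2^22 = 4194304) is set.
print(fp32_fields(-1.5))  # (1, 127, 4194304)
```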

By low precision datatype, we mean any datatype that uses less than 32 bits to represent a real number, and by low precision model, we mean any model that uses a low precision datatype to compute and store its parameters. IEEE754 half-precision datatype (FP16) uses 16 bits to represent a real number as shown in the figure above. The idea of using low precision models was proposed in [3].

Using low precision models leads to reduced memory footprint and reduced latency. For example, on an Nvidia V100, peak FLOPS is 15 TF whereas peak OPS in FP16 is 120 TF [4], so there is scope to expedite a workload by 8x by using FP16 instead of FP32; similarly, on an Nvidia A100, peak FLOPS is 19.5 TF whereas peak OPS in FP16 is 312 TF [5], so the scope increases to 16x on this hardware upon adopting FP16. Note that the A100 supports other low precision datatypes, e.g., BFLOAT16 [6], which may provide equal or even greater speed-ups. In our experience, other AutoML frameworks produce models with the FP32 datatype only, and one needs a separate tool, e.g., TensorFlow Lite, for the subsequent conversion to low precision. WALTS, on the other hand, produces low precision models based on the available hardware, e.g., models with the FP16 datatype if an Nvidia V100 is provided.
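The memory half of the benefit is easy to demonstrate with NumPy (the compute speed-up additionally requires hardware support such as the tensor cores mentioned above):

```python
# Casting a parameter tensor from FP32 to FP16 halves its memory footprint.
import numpy as np

weights_fp32 = np.random.randn(1000, 1000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# 4 bytes vs. 2 bytes per parameter.
print(weights_fp32.nbytes, weights_fp16.nbytes)  # 4000000 2000000
```

The cast also loses precision (FP16 keeps only 10 fraction bits), which is why mixed-precision training [3] keeps selected computations in FP32.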

4. Scope for Unique Enhancements:

WALTS being an in-house framework, we can easily extend it to include some unique enhancements. One such example is described below.

Fig. 6: An example to illustrate how sparsity may be handled in WALTS.

It is quite common to come across tabular datasets with missing data. One approach to handling such datasets is to drop the rows with missing data and then train a model; another is to intelligently impute the missing data with probable values before training. The latter approach may be preferred if the level of sparsity in the data is considerable. We found that AutoML tools favor dropping rows with missing entries; we intend to address this shortcoming with our intelligent data-imputation algorithm. We are also exploring other unique enhancements, such as label correction in the presence of noisy data, to make WALTS a superior offering.
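The two approaches can be contrasted in a short NumPy sketch; column-mean imputation here is a simple stand-in for the smarter imputation described above:

```python
# Handling missing values: drop the affected rows vs. impute them.
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [5.0, np.nan],
              [4.0, 6.0]])

# Approach 1: drop every row containing any missing value (loses 2 rows).
dropped = X[~np.isnan(X).any(axis=1)]

# Approach 2: impute each missing value with its column mean (keeps 4 rows).
col_means = np.nanmean(X, axis=0)
imputed = np.where(np.isnan(X), col_means, X)

print(dropped.shape, imputed.shape)  # (2, 2) (4, 2)
```

With highly sparse data, dropping rows can discard most of the dataset, which is exactly the situation where imputation pays off.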

5. Data Sensitivity:

To comply with government policies, a company may be prevented from uploading sensitive data to the cloud or sharing it with third-party vendors. To develop ML models for such datasets, we can use WALTS running on on-premises devices.

6. Reduced Time-to-Explore:

Commercial AutoML tools’ complex search algorithms may find ad-hoc architectures at the cost of trying out a combinatorial number of parameter combinations (layers, connections, etc.). WALTS, in contrast, follows a simple policy of exploring only a pre-defined set of models, which is less time consuming. Moreover, since the models WALTS explores are already popular, clients may be more open to adopting them.

7. Cost Efficiency:

Commercial AutoML tools impose minimum charges even if you use their service for only a few minutes. We, however, do not plan to impose minimum charges for WALTS and will follow a pay-per-use policy.

What is the architectural overview of WALTS?

The architectural overview of WALTS is given below.

Fig. 7: Architectural overview of WALTS.

As shown in this figure, training jobs are triggered via the user interface of WALTS Services, which takes the usual parameters as inputs: an ML task to accomplish, a dataset, and a metric. Additionally, we supply a predetermined configuration (which may be customized by expert users) that includes the maximum number of epochs to train for, the compute power (i.e., the number of CPUs and GPUs), the output file location, etc.

To achieve reliability, efficiency, and scalability, we use Kubernetes [8] and Airflow [9] configured with the Kubernetes executor. Every experiment in a training job runs as a containerized application on a Kubernetes pod acting as an Airflow worker. The training requests with their parameters (e.g., task, models to explore) are converted into an Airflow pipeline (also referred to as a Directed Acyclic Graph, or DAG) whose nodes correspond to the experiments. Each of these nodes is configured with a custom Git operator that sets up the container by checking out the predefined training code with AutoML algorithms from Git and installing the libraries required to execute it. During execution, several algorithms run in parallel depending on the availability of resources. Each experiment executes inside a pod that follows a predefined configuration; these pods are created on the fly, which gives us the capability to handle burstable workloads.

As the system is multi-tenant and jobs may run concurrently, a single job or tenant may try to over-consume resources. To avoid such scenarios, the system puts an upper threshold on the resources a tenant can consume; the per-tenant limit is controlled using the namespace-level ResourceQuota in Kubernetes. If resources are unavailable for a tenant’s training job, the job is queued and starts execution as soon as resources are added for the tenant or already-running experiments complete and release theirs. The Kubernetes infrastructure is shared by all tenants, and we rely on the Kubernetes cluster auto-scaler for scalability: when the system experiences greater load, Kubernetes automatically adds cluster nodes and deploys pods as necessary to handle the increased demand, and when the load decreases, it reduces the nodes and pods dynamically.

Once an experiment is done, all performance logs are saved in an Azure SQL database, from which users can easily retrieve them through queries, typically to compare the metrics of various experiments. Other artifacts, such as models, are stored in GCS buckets on Google Cloud Platform. The storage is managed by MLflow [10], which also acts as a bridge for the user to retrieve the desired logs and artifacts.
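As a toy illustration of how a training request expands into a DAG of experiments, the sketch below builds such a graph and derives a valid execution order with plain Python. The node names are invented, and the real system expresses this as an Airflow pipeline running on Kubernetes:

```python
# A training request as a DAG: a setup node fans out into parallel
# experiment nodes, which all feed a final reporting node.
from collections import deque

# node -> list of downstream nodes
dag = {
    "checkout_code": ["train_svm", "train_tree", "train_nn"],
    "train_svm": ["report"],
    "train_tree": ["report"],
    "train_nn": ["report"],
    "report": [],
}

def topological_order(dag):
    # Kahn's algorithm: repeatedly run nodes whose dependencies are done.
    indegree = {n: 0 for n in dag}
    for downstream in dag.values():
        for n in downstream:
            indegree[n] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in dag[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return order

order = topological_order(dag)
print(order)
```

In the real pipeline the three `train_*` nodes would run concurrently on separate pods, subject to the tenant's ResourceQuota; the topological order only constrains which nodes must finish before others start.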

How can WALTS help an organization?

WALTS can be used to explore and recommend a suitable ML model for many problems within an organization, be it a classification, regression or entity-recognition problem; furthermore, the data may be structured (tabular) or unstructured (image, text). At Walmart, for example, it has been used for spam filtering and dispute categorization for our sellers, among others. Our client teams have acknowledged the following contributions that WALTS has made to their operations:
· Improved productivity and increased scalability.
· Efficient resources and infrastructure management.
· Faster experimentation and shorter go-to-market time.
· Bridging the skill gaps and reducing the scope of error in applying machine learning algorithms.

In fact, development time was reduced by up to 2 months for some teams. A detailed experiment, in which we compared nine models developed by WALTS against the same models coded by human experts, revealed that the minimum reduction in development time for a given model was 7.4%, the maximum was 15.1%, and the average across all models was 10.4%. We expect similar savings in development time and enhanced performance for other use-cases as well.

Conclusion

For Walmart, there is huge potential to provide a superior customer experience and enhance our operations by leveraging the power of machine learning. AutoML can help handle the exponential growth in data with minimal human intervention. WALTS aims to be a one-stop shop for training popular machine learning models to perform generic tasks such as classification, regression and entity recognition. We plan to add more tasks and models in the future based on user demand. Instead of reinventing the wheel, teams can leverage WALTS to build fast prototypes with minimal effort. It helps bridge the skill gap, reduce bugs and errors, and save resource and infrastructure costs. WALTS uses state-of-the-art optimization to train models faster, thereby reducing go-to-market time. We are already seeing its benefits for various Walmart business use-cases, such as spam detection and dispute categorization for Walmart sellers, and hope to see it grow further with time.

References:

[1] Dataiku, “The importance of AutoML for augmented analytics,” Tech. Rep., 2020. Available: https://pages.dataiku.com/hubfs/PDF/Whitepaper/Importance_of_AutoML-for-Augmented-Analytics.pdf

[2] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto- WEKA: Combined selection and hyperparameter optimization of classification algorithms,” in KDD, 2013, pp. 847–855.

[3] P. Micikevicius, S. Narang, J. Alben, G. F. Diamos, E. Elsen, D. García, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, H. Wu, “Mixed Precision Training,” in ICLR, 2018.

[4] Nvidia, “Nvidia Tesla V100 GPU Accelerator,” 2018. Available: https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf

[5] Nvidia, “Nvidia A100 Tensor Core GPU,” 2020. Available: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf

[6] D. D. Kalamkar, D. Mudigere, N. Mellempudi, D. Das, K. Banerjee, S. Avancha, D. T. Vooturi, N. Jammalamadaka, J. Huang, H. Yuen, J. Yang, J. Park, A. Heinecke, E. Georganas, S. Srinivasan, A. Kundu, M. Smelyanskiy, B. Kaul, P. Dubey, “A Study of BFLOAT16 for Deep Learning Training,” in CoRR abs/1905.12322, 2019.

[7] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next- generation hyperparameter optimization framework,” in KDD, 2019, pp. 2623–2631.

[8] Kubernetes, “Production-grade container orchestration,” 2022. Available: https://kubernetes.io/

[9] Airflow, “Apache airflow,” 2022. Available: https://airflow.apache.org/

[10] MLflow, “An open source platform for the machine learning lifecycle,” 2022. Available: https://mlflow.org/


Kunal Banerjee is a Principal Data Scientist at Walmart where he works on AI/ML. He did his PhD in Computer Science & Engineering from IIT Kharagpur.