Leveraging Azure ML and Streamlit to Build and User Test Machine Learning Ideas Quickly: From Idea to Production in a Day

Henkel Data & Analytics
Henkel Data & Analytics Blog
Jul 30, 2024


By Florian Roscheck.

What does it take to build a really good data science product? How can you leverage your data science talent as efficiently as possible to provide user value quickly? How can you build, deploy, and get user feedback — in a day? In this article, I share what Henkel’s data science and MLOps teams have learned about opening up rapid experimentation opportunities in data science with Azure Machine Learning and Streamlit. This article summarizes my talk at PyCon DE & PyData Berlin 2024.

What to build? A Minimum Viable Product!

When you try to generate value fast, when you have just a single day, one of the most important questions to answer is what to focus on. If you build something, what should you build? Let’s assume that value here means value to someone, to an actual user. At Henkel, this could, for example, be chemistry experts in our research laboratories. While value to an individual user is a start, significant value is often only achieved when a product is scaled to provide value to many users. For a nuanced view on value, also check out the article “Business Value in Agile Organizations”.

If you have built and rolled out a product before, you know that scaling it across a large user base is hard. In my experience, it is a task too ambitious to aim for in a single day. Instead, it makes more sense to focus on a small niche, maybe even a single user, but to build out the product development process so that it is ready to serve the needs of many users once the solution gains traction and is ready to scale.

This is where the Minimum Viable Product comes in. A Minimum Viable Product is a product that has enough features to be usable and can collect user feedback. A minimum viable product is focused on learning about user needs. This means that there is no space for anything that does not help the user succeed.

Value Creation Through the Build-Measure-Learn Loop and the Data Flywheel

For data science use cases, two main process concepts can help kick off a great product: The build-measure-learn loop and the data flywheel.

Concepts for Developing Minimum Viable Product for Data Science Use Cases, showing the build-measure-learn loop and data flywheel patterns

The outer ring in the diagram is the build-measure-learn loop. This is a concept popularized by Eric Ries’ Lean Startup Methodology. Its purpose is to focus your efforts on building products that fill actual user needs. In addition to the “build, measure, learn” actions, the loop also includes code, data, and ideas as assets. Note that you do not necessarily need code to work through the build-measure-learn loop; many ideas can be tested without it. Hopefully, before you touch any code, you have already learned enough to be confident that to get to the next learning from user feedback, you need an actual code asset.

The central idea of the build-measure-learn loop is that, based on ideas you have gathered about what the user wants, you build out a code asset. You then deploy it and get feedback; you measure how the user feels about the value the product provides. You can also say: you user test your solution. You go through the loop relatively fast, say in two-week sprints (cycles). In data science, measuring often also involves machine learning model evaluation metrics, like prediction errors, and so on — if this is where your product is set to provide value. With these data, you can then learn what you should improve to provide even more user value. This is about discovery. In the idea phase, you focus on how to provide value. When you have generated a new idea, you build it out, and the loop continues, iterating in equally fast cycles.

Developing a data science product along the build-measure-learn loop maximizes your chances that the product you are building helps users succeed. Using the build-measure-learn loop is often the only way to really find out what provides value — which is, in my opinion, exactly what defines a great product. What else should you watch out for when building a product in a day? You need to focus on implementing a way to collect and review user feedback. If you do not collect feedback, you cannot learn about user needs and pain points.

There is a second powerful concept for building data science products: The data flywheel. The ideas at the core of the data flywheel are about continuous improvement and organic growth: Happy users invite additional users to use the tool. Users bring or generate data for the data science asset (for example, a machine learning model) at the core of your product. This leads to the model getting better at serving user needs over time, leading to more happy users who then recruit their friends and colleagues to use the tool, and so on. Building along the data flywheel will help your product scale over time — a requirement for eventually creating large-scale business value.

A Time-Saving Stack for Building in One Day

From a slide deck, it sounds amazing: We focus on user needs, collect and learn from user feedback, scale, and infinite business value awaits. Unfortunately, in practice, data scientists often struggle with several time sinks. What should be easy is hard and takes time.

This is where a stable and mature technology stack is valuable: Through simple, streamlined, and well-documented interfaces, we take a lot of implementation pains away. We can build and deploy quickly — so that we can get user feedback, learn, and provide value quickly.

Technical challenges around building a data science product and tools for overcoming them

The figure above shows typical challenges along the data science product development journey. As you progress through the data flywheel and the build-measure-learn loop, you can encounter several bottlenecks preventing you from completing either of the two loops as fast as possible. Let’s discuss these challenges and technical solutions to them next. We will go through environment and collaboration issues, getting lost in modeling, inappropriate user interfaces, and missing feedback about use. All of the solutions presented here are based on our experience in building data science products.

But before we dive in, you may ask yourself: Isn’t there a giant challenge missing? Correct! We will not talk about the biggest challenge in data science: having no data, or insufficient data. Why don’t we talk about it here? Because, from our experience, this is not a challenge that can be solved in a day. When you build a data science product in a day, you should collect the necessary data before you start. This being said, let’s look at solutions for all the other problems.

Environment issues are issues related to Python environments. In many hackathons, our team has lost valuable time trying to install some libraries on different kinds of operating systems, with different package managers. There are many potential points of failure. Package builds can take a long time and need multiple tries. For building in a day, you should avoid this time drag.

Even worse, issues can pop up when collaborating with other data scientists: To go fast, we should all work on the same code base. If we don’t, it will be very time-consuming to merge our work and we might not make it in a day.

Both for environment and collaboration issues, the cloud machine learning platform Azure Machine Learning is a good solution. In Azure Machine Learning, there are predefined environments that you can share with your colleagues. You can also create custom environments. Through integration with Azure DevOps, you can create Git repositories that all team members can read from and write to. Having this managed solution effectively helps to circumvent environment and collaboration issues.
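To make this concrete, here is a minimal sketch of how a shared custom environment could be registered with the Azure ML Python SDK v2. The workspace details, environment name, base image, and conda file path are placeholders for illustration, not the setup from our repository.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

# Connect to the Azure Machine Learning workspace (placeholder identifiers)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Register a custom environment built from an Azure ML base image plus a
# conda file, so every team member works against identical dependencies
custom_env = Environment(
    name="trash-recognizer-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="environment/conda.yaml",
    description="Shared environment for the trash recognizer MVP",
)
ml_client.environments.create_or_update(custom_env)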

One of data scientists’ favorite time drags is to get lost in modeling. Modeling is a lot of fun and tuning hyperparameters to improve machine learning performance bit by bit is very satisfying. By changing inputs, we change outputs and get immediate feedback from our model evaluation pipeline about whether the model has improved.

Unfortunately, getting lost in data and algorithms early in the product development process is time not well spent. Often, a very simple machine learning model is a good starting point. The user experience counts — how good is good enough to get user feedback? Is being better than a random pick already valuable?

Do you want to avoid losing time in modeling? Use automated machine learning as a first approach!

Azure Automated Machine Learning offers a suite of machine learning modeling pipelines for solving the most common data science tasks, like classification and regression. They are easy to use and tend to provide a reasonable first model in a short amount of time.

Coming back to the figure above, one of the things data scientists struggle most with when building an MVP is that their solutions have an inappropriate user interface for interacting with and experiencing the product. What do I mean by that? At Henkel, I work with a lot of excellent chemists. But if I made my machine learning model available to them as an API, many would not be able to use it. Most chemists are used to clickable user interfaces; only a few know how to code. This is not their job, after all. But, I would argue, it is your job as a data scientist to make your work usable. So, how can you bring your machine learning model to light and serve it to end users? The Streamlit framework makes it relatively easy and extremely fast to program good-looking web applications in Python. This is the type of interface many “non-IT” end users will be familiar with and therefore a great way to make your work accessible.

Finally, I know that for most early data science products, implementing a way to get user feedback is not at the top of the agenda. Many early-stage data science products provide no feedback about their use. Earlier, we learned that to build a product users love, it is important to get feedback, completing the build-measure-learn loop. In experimenting with building data science products on the Azure platform, I found an exciting solution for collecting user feedback by leveraging Azure Application Insights and Streamlit. Through this solution, you can even display feedback about the application’s use, such as when a model inference is made, and feedback given directly by the user, in a live dashboard. But, be aware:

The absolutely easiest and cheapest way to implement a user interface in an early data science product phase? You are the interface! Show your customer your modeling work output, listen to their feedback, and get from output to impact quickly!

Application Example: Building a Trash Recognizer App

To get hands-on with the stack, let’s now focus on building out an actual application using the concepts and stack presented above. Note that in this article, I only give you a high-level overview and point out important highlights. Well-documented code, detailed instructions, and more information are available in the GitHub repository. We can subdivide the build process into five phases:

  • Get data
  • Train model
  • Build app
  • Deploy app with model
  • Collect feedback

The app we want to build fulfills the following user needs: Imagine you work at a recycling facility and you want to evaluate a machine learning model for recognizing different types of trash. In the app, you should be able to upload one or more photos and manually inspect the objects that the machine learning model recognizes as trash. Summing it up in a “user story” (a concept in Agile):

As a recycling facility employee, I want to recognize different types of trash in order to better separate trash for recycling.

A picture says more than a thousand words, so here is a screenshot of the app:

Screenshot of the trash recognizer app we want to build (note how the model is far from perfect, but the app is usable and users can give feedback!)

On top, the user can upload one or more pictures. At the bottom, you see the predictions of the machine learning model. The screenshot shows that, yes, the app successfully recognizes an object: The cat! However, the app classifies the cat as trash, which is incorrect. Nevertheless, the computer vision model already recognizes objects, the app is usable, and users can give feedback! You can improve anything later, but the goal is to get this app built and in front of users as soon as possible.

Getting Data Into Azure ML

Before training a model to recognize trash in Azure Machine Learning, we first need to load data onto the platform. In the case at hand, we use the TACO dataset, an open-source image dataset of trash in the environment.

Important concepts and flow of getting data into Azure Machine Learning

In this blog, we have written about how to get data into Azure ML previously, so I will spare you the tiny details. You can find the data preparation notebook in the GitHub repository. However, if this is your first time looking into Azure Machine Learning, here are some general concepts to remember.

Data is stored in Azure ML as “Data Assets”. Data Assets are not the actual data files — the data files are stored in a blob storage (or another type of storage) attached to the Azure Machine Learning workspace. Data assets are like masks for the actual files. They point to the files, but they can also include metadata, like license information or notes you want your colleagues to know about. You can create data assets either through code (see the notebook) or by uploading data through the Azure ML web interface.
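As an illustration, here is a minimal sketch of registering a folder of images as a data asset through the Azure ML Python SDK v2, assuming an MLClient connection (ml_client) like the one sketched earlier. The asset name and local path are placeholders; the actual notebook in the repository also prepares the annotation format that automated ML expects for object detection.

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the local TACO images as a data asset; the files are uploaded to
# the workspace's blob storage, the asset itself only points to them
taco_data = Data(
    name="taco-trash-images",  # placeholder asset name
    version="1",
    type=AssetTypes.URI_FOLDER,
    path="./data/taco",  # local folder, uploaded on registration
    description="TACO open-source dataset of trash in the environment",
)
ml_client.data.create_or_update(taco_data)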

Training a Model on Azure ML

Once you have uploaded the data, you can focus on training a model with Azure Automated Machine Learning. The power of automated machine learning lies in its easy setup and included hyperparameter optimization. You can set up automated machine learning through code or the Azure ML web user interface. Depending on the machine learning problem, Azure ML offers different model types that it will automatically try during hyperparameter optimization. (Hyperparameter optimization means adjusting the inner workings of a model so that the model delivers the desired performance.) The process for training a model via code is straightforward. You can explore it in the notebook on GitHub and find an overview in the picture below.

Submitting a machine learning job in Azure ML

Although automated machine learning sounds very hands-off, you have some control over how Azure runs the automated machine learning job. Knowing how to use this control is powerful, especially when you try to go from idea to production in a single day because you can get the most out of the little time you have.

First, you should know that you can set parameters for the machine learning job based on your domain knowledge. So, for example, if you know that a specific hyperparameter selection strategy will work well for your problem, or if you know that certain ranges of hyperparameters work better than others, you can limit Azure ML to optimize in these ranges only. Second, you should set job limits. For example, you can set a maximum runtime for the machine learning job or a maximum number of trials. This can help you finish your training job on time.
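As a rough sketch, this is what configuring and limiting an automated ML object detection job can look like with the Azure ML Python SDK v2, assuming an ml_client connection as sketched earlier. The compute cluster name, the MLTable data asset references, and the limit values are illustrative placeholders, not the settings from our repository.

from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

# Configure an automated ML object detection job (placeholder names)
job = automl.image_object_detection(
    compute="gpu-cluster",
    experiment_name="trash-recognizer",
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:taco-train:1"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="azureml:taco-val:1"),
    target_column_name="label",
)

# Limit the job so it finishes within the time budget of a one-day build
job.set_limits(timeout_minutes=120, max_trials=4, max_concurrent_trials=2)

# Submit the job and follow its progress in Azure ML Studio
returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)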

Once the training is finished, you have a model! The model is packaged in MLflow format — a universal open-source format for packaging machine learning models. Unfortunately, when packaging the model, Azure ML pins over 200 dependencies down to their patch version (like pandas 2.2.2) in the MLflow model file. With this tight dependency restriction, my team has found the model impossible to run outside the Azure cloud environment, for example on machines running macOS.

Enter ONNX (Open Neural Network Exchange)! As a byproduct of automated machine learning training, along with the MLflow model, Azure produces a so-called “ONNX” file. The ONNX file format is an open standard for deep learning models and helps to run and evaluate such models across different frameworks. Of particular interest to us is that, in Python, an ONNX-packaged model can be loaded with a single dependency: ONNX Runtime. We have the open-source community and its generous sponsors to thank that we can now circumvent environment issues when using our deep learning models.
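To show how little is needed at inference time, here is a minimal sketch of loading the ONNX file with ONNX Runtime. The file name and the input image size are assumptions; the actual preprocessing depends on the model that automated ML selected.

import numpy as np
import onnxruntime as ort

# Load the ONNX model produced by automated ML; onnxruntime is the only
# dependency needed for inference
session = ort.InferenceSession("model.onnx")

# Inspect the expected input signature
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape)

# Run inference on a dummy image batch shaped like the model input
# (the real app preprocesses uploaded photos into this shape)
dummy_batch = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_meta.name: dummy_batch})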

Building a Streamlit App to Serve the Model

In the beginning, I was a skeptical Streamlit user because I had found that, at the time I switched, it worked very differently from the Python app-building frameworks I was used to (Plotly Dash and Voilà at the time). The main difference is that, in Streamlit, the app typically re-runs from the top to the bottom of the Python file whenever the user performs an action, for example, clicking a button. This took a little getting used to.

However, in my team, we have come to know Streamlit as an extremely quick way of developing user-facing data web apps. While these apps might run more slowly and offer a smaller feature set than most professionally developed web apps, they are relatively easy to program and deploy — and are therefore great for building Minimum Viable Products.

Building a Streamlit app can be as quick as in the figure shown below. But if you want to get to know the code to build the Trash Recognizer app, including all the code to load and serve ONNX models, you should have a look at the repository.

A simple Streamlit app and its output in the browser
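Since the figure cannot show every detail here, below is a minimal sketch, in the spirit of the Trash Recognizer, of how little Streamlit code it takes to get a usable interface. The model call is only a placeholder; the full app in the repository wires in the ONNX inference code.

import streamlit as st

st.title("Trash Recognizer")

# The whole script re-runs from top to bottom on every interaction,
# for example when files are uploaded or the button is clicked
uploaded_files = st.file_uploader(
    "Upload one or more photos", type=["jpg", "png"], accept_multiple_files=True
)

if st.button("Recognize trash") and uploaded_files:
    for file in uploaded_files:
        st.image(file)
        # Placeholder: the real app runs the ONNX model here and draws boxes
        st.write(f"Predictions for {file.name} would be shown here.")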

Deploying a Streamlit App on Microsoft Azure

Especially in corporate environments, special care needs to be taken to deploy any data-interfacing apps securely. Since this is not a trivial task, in-depth experience with the Azure platform is needed. Our team has managed to solve the deployment challenge and detailed it in this blog article for you to read.

It is worth noting that the machine learning model is deployed alongside the app via the ONNX Runtime Python module described above. After model training on Azure ML has completed, you need to download the ONNX model file and place it in the project file tree. I have written custom code to use the ONNX file for inference in the deployed app.

Collecting Feedback

We would not be able to measure and learn without collecting feedback. So, let’s get to it! There is an exciting pipeline of tools that you can use to ask the user for feedback and plot it live in a dashboard.

With streamlit-feedback, Python logging, Azure Application Insights, and Azure Dashboards you can create live dashboards from user feedback

Streamlit-feedback is a community-built Streamlit plugin that displays thumbs up/down buttons in a Streamlit app and also lets users add a comment along with their feedback. You can see an example of it in action in the Trash Recognizer screenshot above. Once a user submits feedback, streamlit-feedback can execute a callback function.
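A minimal sketch of how the plugin can be wired up is shown below; the parameter names follow my reading of the streamlit-feedback package, so check its documentation and the repository for the exact usage.

import streamlit as st
from streamlit_feedback import streamlit_feedback

def handle_feedback(feedback: dict) -> None:
    # Called by streamlit-feedback on submit; the dict contains the thumbs
    # score and the optional free-text comment, ready to be logged
    st.session_state["last_feedback"] = feedback

streamlit_feedback(
    feedback_type="thumbs",
    optional_text_label="Tell us what we can improve",
    on_submit=handle_feedback,
    key="trash_recognizer_feedback",
)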

In this callback function, you can use the Python logging module to log the user feedback (see code here and the Azure documentation for more details). For Azure to use the logged data for building a dashboard, the data needs to be sent from your deployed Streamlit application to the Azure Cloud. There is an interesting way to do this: Via Azure Application Insights.

Azure Application Insights can be linked to Python via a custom handler to Python’s logging module (see code). The custom handler will then stream the app-internal logs to the Azure cloud. Credentials and connection details are set up via a connection string.
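One common way to set up such a handler is the AzureLogHandler from the opencensus-ext-azure package, sketched below; the repository may use a different mechanism, and the connection string shown is a placeholder that should come from an environment variable or secret store.

import logging
from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger("trash_recognizer")
logger.setLevel(logging.INFO)

# Stream app-internal logs to Azure Application Insights; the connection
# string carries the credentials and endpoint of the Application Insights
# resource (placeholder value here)
logger.addHandler(
    AzureLogHandler(connection_string="InstrumentationKey=<your-key>")
)

# Log lines like this one later feed the Kusto queries behind the dashboard
logger.info("model loaded")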

Once the data is in the cloud, you can use Azure Dashboards to plot different metrics. One thing I found a little challenging is that Azure Dashboards use a query language called Kusto to query the logs and yield the input data for the plots. It is a query language most data scientists might not be familiar with — but that does not stop the show.

Different types of plots for visualizing feedback (use orange numbers to find code for generating plot, see text)

For example, the Kusto query behind the first plot, which shows the number of model loads per week, is:

traces
| where message contains "model loaded"
| summarize model_loaded = count() by week=bin(timestamp, 7d)
| sort by week desc

The GitHub repository contains queries for all plots shown above. It also includes detailed instructions on how to set up a feedback dashboard on Azure.

Conclusion

You now know what it takes to build and user test machine learning ideas in a day: A focus on building what has user value, a Minimum Viable Product. You want to integrate an opportunity to collect feedback, so you can leverage the build-measure-learn loop to continuously improve your product. To set yourself up for successful scaling, you build your application so that it can grow with a data flywheel.

In this article, we covered both procedural and technical concepts to help you unlock user value with data science quickly.

On the technical side, you embrace the Azure platform for low-friction collaboration and use Azure Automated Machine Learning to get to a first model as quickly as possible. Users can interact with your model through a Streamlit-built frontend that you can deploy securely. Azure Dashboards with feedback collection help you gain insights about your users fast, setting you up to focus on building out the features that turn users into enthusiastic fans of your product.

Looking better! After users complained about the cat being recognized as trash, training the model with additional data improved it. The box the cat is sitting in is paper trash and belongs in the blue trash can. Well done, model!

The only thing missing now to fully make the build-measure-learn loop work for you is a culture that enables you to continuously learn from user feedback. At Henkel, we are enabling this culture through upskilling in agile ways of working, collaborating in cross-functional teams, a focus on feedback, and product thinking.

Whether shampoo, detergent, or industrial adhesive — Henkel stands for strong brands, innovations, and technologies. In our data science, engineering, and analytics teams we solve modern data challenges for the benefit of our customers.
Learn more at henkel.com/digitalization.
