3 Common Problems With Your Machine Learning Product and How to Fix Them
Unlike traditional products, the launch of ML/AI-driven features or products is just the start of a Product Manager’s role. ML models introduce a dependency on data (see our previous post), as well as some level of uncertainty in outcomes. In the real world, this can lead to a number of undesirable outcomes that a Product Manager needs to anticipate and mitigate. In this post, we will discuss how to protect your ML/AI product from time, from adversaries, and from itself.
Protect your ML/AI product from time
The fact that ML learns patterns from data should give a Product Manager pause and lead to the logical follow-up question: What if those patterns change? We call this general problem “model staleness”, which refers to the predictive power of an ML model decreasing over time, as trends or tastes change. Take for example a product that suggests fashion items to shoppers. Due to the short-lived nature of fashion, the product’s suggestions may become irrelevant quickly (although, if you wait long enough, any fashion trend may have an unexpected revival).
To remedy model staleness, you will have to refresh your model over time to allow it to learn from new data. [Figure: a model's predictive power decays over time and recovers with each periodic refresh on new data.]
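One way to operationalize this, as a minimal sketch, is to track the model's accuracy on recent labeled data and trigger a refresh whenever a rolling average drifts too far below a baseline. The baseline and tolerance values below are illustrative assumptions, not recommendations.

```python
# Sketch: decide when a model refresh is due by tracking accuracy on
# recent labeled data. The baseline and tolerance are illustrative.
def needs_refresh(recent_accuracies, baseline=0.90, tolerance=0.05):
    """Return True if rolling accuracy has drifted too far below baseline."""
    if not recent_accuracies:
        return False
    rolling = sum(recent_accuracies) / len(recent_accuracies)
    return rolling < baseline - tolerance

print(needs_refresh([0.91, 0.89, 0.90]))  # still healthy, no refresh needed
print(needs_refresh([0.84, 0.82, 0.80]))  # stale, time to retrain
```

In practice the hard part is getting fresh labeled data to measure against; for many products (clicks, purchases) the labels arrive naturally with a delay.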
However, there are two important considerations when going through a model refresh: 1) Not all ML/AI products are susceptible to staleness in the same way, and 2) refreshing models can introduce unwanted feedback loops.
- How much you need to worry about refreshing your models depends on how “stationary” your ML problem is. A product that distinguishes cats from dogs is relatively immune to model staleness: cats and dogs do not evolve fast enough to render your model useless over the next few years. However, if you are building a product that recommends content to your users (like videos or fashion items), or one that detects fraud, these problems change rapidly and your ML models need to adapt.
- Refreshing models means that you update your ML model with new data. Those data can sometimes be influenced by the model itself. If your model recommends videos, and you refresh your new model based on the clicks on videos that the model recommended, it will not learn how to recommend new videos. In these cases, it is important to consider how to break out of these feedback loops by diversifying the recommendations. To give just one example, a recommendation system should not only show content based on previous click patterns but also include diverse and fresh content based on other criteria.
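The diversification idea above can be sketched as follows: reserve a share of recommendation slots for fresh content chosen outside the model's ranking, so that the refreshed model also observes clicks on videos it never recommended. The slot count and the 20% fresh share below are illustrative assumptions.

```python
import random

# Sketch: blend model-ranked items with fresh content so the refreshed
# model also sees clicks on videos it never recommended. The 20% fresh
# share is an illustrative assumption.
def blend_recommendations(ranked, fresh_pool, n_slots=10, fresh_share=0.2, seed=0):
    rng = random.Random(seed)
    n_fresh = int(n_slots * fresh_share)
    picks = ranked[: n_slots - n_fresh]            # exploit the model's ranking
    candidates = [v for v in fresh_pool if v not in picks]
    picks += rng.sample(candidates, min(n_fresh, len(candidates)))  # inject fresh items
    return picks

recs = blend_recommendations([f"top{i}" for i in range(20)],
                             [f"fresh{i}" for i in range(50)])
print(recs)  # 8 model picks followed by 2 fresh videos
```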
Protect your ML/AI product from adversaries
Adversarial actors, like their attacks, come in all shapes and sizes. Note that bad intent is not even necessary to induce a negative effect on your ML product. Three high-level categories of adversarial behavior to consider are theft of your ML models, exploitation of feedback loops, and coercion of your ML models into bad behavior.
Theft of your ML models
If you expose your ML model directly to users (e.g. a model that predicts housing prices from information a user enters about a specific property), adversaries can effectively steal the model by observing enough input and output combinations.
Most products do not directly expose their ML models to their users, but if you think there’s a chance, have your engineering team start with this paper and think carefully about protections. One way of protecting against such attacks is rate limiting (i.e. not allowing a single user to make enough prediction requests to steal a model).
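A minimal per-user rate limiter along these lines might look like the following sketch. The quota and window values are illustrative, not recommendations; a production limiter would also need shared state across servers.

```python
import time
from collections import defaultdict, deque

# Sketch: per-user sliding-window rate limiter to slow down model
# extraction. The quota and window values are illustrative.
class RateLimiter:
    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # user_id -> request timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[user_id]
        while q and now - q[0] > self.window:  # drop timestamps outside the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True

limiter = RateLimiter(max_requests=3, window_seconds=60.0)
print([limiter.allow("u1", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
```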
Taking advantage of feedback loops
The most common and potent way to influence an ML model’s behavior is to take advantage of feedback loops. Above I described how feedback loops have to be taken into account when refreshing models. They also have the potential to allow adversaries to negatively affect the user experience by tilting the balance in their own favor.
Returning to the video recommendation example, if your ML model recommended videos based on how frequently they are watched after another video, an adversary could bias the model in their favor by compensating many people to navigate to their video directly from other popular videos.
Solving the “false traffic feedback loop” problem requires even more ML: Work with your fraud team to differentiate fraudulent or spammy clicks from “authentic” clicks, and block them (or, at least, don’t include them in your training data).
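As a sketch of that last step, assuming your fraud team exposes a per-click fraud score, you could filter flagged clicks out of the training set before a refresh. The dwell-time heuristic and the 0.8 cutoff below are purely illustrative stand-ins for a real fraud model.

```python
# Sketch: drop clicks flagged by a fraud scorer before a model refresh.
# `fraud_score` stands in for a real fraud model; the cutoff is illustrative.
def filter_training_clicks(clicks, fraud_score, cutoff=0.8):
    return [c for c in clicks if fraud_score(c) < cutoff]

clicks = [
    {"user": "a", "video": "v1", "dwell_seconds": 340},
    {"user": "b", "video": "v1", "dwell_seconds": 1},   # bounced instantly
    {"user": "c", "video": "v2", "dwell_seconds": 95},
]
# Toy heuristic: near-zero dwell time looks like fake traffic.
score = lambda c: 0.95 if c["dwell_seconds"] < 5 else 0.1
print(filter_training_clicks(clicks, score))  # keeps the clicks from users a and c
```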
Coercing your ML models into bad behavior
A well-publicized example of ML-gone-wrong is a chatbot that was trained on online conversations and kept learning from its interactions once it was launched. That bot learned that profanity was a common way to express feelings and, more generally, to parrot the language and points of view of its conversation partners.
The lesson for Product Managers is simple: your ML models will learn from data; if the data are bad, your models will be bad. As a result, you should always put safeguards in place. The general concept we use for such safeguards is a “policy layer” that sits on top of an ML model (or system) and enforces a well-defined policy. Examples include:
- Prevent reinforcing bad habits: Following the chatbot-gone-wrong example from above, it is important to identify and prevent bad habits (e.g. profanity and gender pronoun biases) that may be apparent in the training data. Adversaries can intentionally coerce an ML model into bad habits by providing bad training examples. One way of dealing with this is to run any output of an ML model through a policy layer, like a “bad language detector” (yes, more ML!), before showing it to users.
- No out-of-bounds inputs or predictions: In cases where models take user provided input, unexpected values for those inputs could lead to out-of-bounds predictions (e.g. a model that predicts housing prices could predict a negative value if you specified the number of bedrooms as zero). Depending on the criticality of the model and its output, you may want to validate the inputs to be in a well-defined expected range, or cap the outputs of the model (e.g. to never allow a housing price predictor to produce a negative prediction).
- No errors: This may seem obvious, but in some cases, ML models return errors that should certainly not be communicated to your users. These errors could expose technical details that could be taken advantage of. The general advice here is to fail gracefully. If a model returns an error or an otherwise useless prediction, one of your key roles as a Product Manager is to find a way for your product to still provide value to users, or at least be the least disruptive to the user experience. For example, if your video recommendation model fails, you can still fall back to heuristics like showing the most popular videos or the most co-watched videos.
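The “bad language detector” policy layer from the first bullet can be sketched as a filter that screens model output before it reaches users. A real detector would itself be an ML model; the blocklist and fallback message below are illustrative stand-ins.

```python
# Sketch: a policy layer that screens model output before it reaches
# users. The blocklist stands in for a real "bad language detector"
# model; the word list and fallback message are illustrative.
BLOCKLIST = {"darn", "heck"}  # placeholder for a learned classifier

def policy_layer(model_output, fallback="Sorry, I can't help with that."):
    words = {w.strip(".,!?").lower() for w in model_output.split()}
    if words & BLOCKLIST:
        return fallback
    return model_output

print(policy_layer("Have a nice day!"))  # passes through unchanged
print(policy_layer("Well, darn it!"))    # replaced by the fallback message
```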
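The input validation and output capping from the second bullet might look like this sketch, using the housing-price example. The valid ranges and the toy pricing model are illustrative assumptions.

```python
# Sketch: validate inputs and cap outputs for a housing-price model.
# The ranges and the toy model are illustrative assumptions.
def safe_predict(predict, bedrooms, sqft,
                 bedroom_range=(1, 20), sqft_range=(100, 50_000)):
    if not (bedroom_range[0] <= bedrooms <= bedroom_range[1]):
        raise ValueError(f"bedrooms out of expected range: {bedrooms}")
    if not (sqft_range[0] <= sqft <= sqft_range[1]):
        raise ValueError(f"sqft out of expected range: {sqft}")
    price = predict(bedrooms, sqft)
    return max(price, 0.0)  # never return a negative price

toy_model = lambda beds, sqft: 50_000 * beds + 100 * sqft - 400_000
print(safe_predict(toy_model, 3, 1_500))  # negative raw prediction gets capped to 0.0
```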
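Failing gracefully, as in the third bullet, can be as simple as wrapping the model call and falling back to a popularity heuristic. The function names here are illustrative, not a real API.

```python
# Sketch: fall back to a popularity heuristic when the recommender
# errors out or returns nothing. Names are illustrative.
def recommend_with_fallback(model_recommend, popular_videos, user_id, n=5):
    try:
        recs = model_recommend(user_id)
        if recs:
            return recs[:n]
    except Exception:
        pass  # log the error internally; never surface it to the user
    return popular_videos[:n]  # heuristic fallback keeps the product useful

def broken_model(user_id):
    raise RuntimeError("model server timed out")

print(recommend_with_fallback(broken_model, ["cats", "news", "music"], "u1"))
# → ['cats', 'news', 'music']
```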
Protect your ML/AI product from itself
ML models blindly learn to optimize a given metric based on the data they observe during training. By now it should be well known that, if those data are flawed, the models will be flawed. As a Product Manager, your job is to safeguard your users from mistakes and biased predictions.
- Error analysis: Before launching any product powered by ML/AI, you need to spend considerable time doing error analysis. Note that this is different from the technical errors described above. Error analysis refers to instances where a model makes a prediction that is wrong or unexpected, like a false positive. In short, you should systematically investigate the cases where your model makes the worst predictions and either take steps to improve your model or fail gracefully.
- Tuning thresholds: Let’s say your product is an email spam classifier. The confidence level at which it classifies something as spam is tunable. If it is set too low, it will classify too many emails as spam, resulting in actual emails in your spam folder (false positives). If it is set too high, it will classify too few emails as spam, resulting in spam emails in your inbox (false negatives). It is your responsibility as a Product Manager to assess which one is worse (false positive vs false negative) and which side to err on, if one can be prioritized over the other.
- Exploration vs. Exploitation: In the video recommendation example, we expect highly-ranked videos to be clicked more often, but they were also placed higher on the list precisely because they were expected to be more popular. There is a balancing act between exploiting the model's accurate predictions and periodically adding in some variance (for example, sampling from farther down the recommended list) to ensure the model isn't stuck in a feedback loop.
- Biases: In cases where your data may be biased, there is no single method or technique that will ensure fairness. Take this topic seriously and read through Google AI’s Responsible AI Practices for guidance.
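To make the threshold trade-off from the “Tuning thresholds” bullet concrete, here is a toy sketch that counts false positives and false negatives at different decision thresholds. The scores and labels are made-up data.

```python
# Sketch: how the decision threshold trades false positives against
# false negatives. Scores and labels are illustrative toy data.
def confusion_counts(scores, labels, threshold):
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

scores = [0.95, 0.80, 0.60, 0.40, 0.20]  # model's spam confidence
labels = [1,    1,    0,    1,    0]     # 1 = actually spam

for t in (0.3, 0.5, 0.9):
    fp, fn = confusion_counts(scores, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold shifts errors toward false positives (real mail in the spam folder); raising it shifts them toward false negatives (spam in the inbox).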
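The exploration-vs-exploitation balance can be sketched as epsilon-style sampling: most of the time serve the top-ranked video, and occasionally explore an item from farther down the list. The epsilon value of 0.1 is an illustrative choice.

```python
import random

# Sketch: epsilon-style exploration, occasionally serving an item from
# farther down the ranked list. Epsilon=0.1 is an illustrative choice.
def pick_video(ranked, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.choice(ranked[1:])  # explore: sample below the top slot
    return ranked[0]                   # exploit: serve the top-ranked video

rng = random.Random(42)
picks = [pick_video(["a", "b", "c", "d"], epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count("a") / len(picks))  # roughly 0.9 of picks are the top video
```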
I hope I have successfully convinced you that the ML/AI product launch is only the beginning of the PM’s job. The update cycle of traditional products is driven by user expectations for new features and software release cycles. The added dependency on data and the probabilistic nature of ML/AI predictions lead to a much higher-frequency cycle of updating incremental learning systems. In theory, every interaction with a user provides an opportunity to update the ML model, impacting your product.
These specific challenges need to be addressed by Product Managers, including protecting models from going stale, anticipating adversarial actors, and making sure that ML models themselves keep behaving the way they should. Stay tuned for future posts!
Clemens Mewald is a Product Lead on the Machine Learning X and TensorFlow X teams at Google. He is passionate about making Machine Learning available to everyone. He is also a Google Developers Launchpad mentor.