Questions Answered — Shadow Deployment In Machine Learning

Demetrios Brinkmann
Published in MLOps.community
6 min read · Sep 28, 2020
Please Lord don't let this break everything

This is a thread taken from our MLOps community Slack. With so many different ways of deploying a model, it helps to get some background on the terms involved.

Let us know what you think in the comments or jump on Slack and voice your opinion there.

David Aponte

A simple explanation of shadow deployments: https://alexgude.com/blog/machine-learning-deployment-shadow-mode/

Machine Learning Deployment: Shadow Mode

Deploying machine learning models is hard; Shadow Mode is one way to make testing a little easier.

https://alexgude.com/files/shadow-mode/bricks_at_mit.jpg

Hugh Reid

How does this differ from common DevOps [Blue Green](https://martinfowler.com/bliki/BlueGreenDeployment.html) methods? And what happens to the output from the new model: is it just discarded? How do you show what happens to the downstream processing with the new model, i.e. the full business outcome, not just the model?

bliki: BlueGreenDeployment

Blue-green deployment allows you to upgrade production software without downtime. You deploy the new version into a copy of the production environment and change routing to switch.

https://martinfowler.com/bliki/images/blueGreenDeployment/blue_green_deployments.png
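
To make the comparison concrete, here is a minimal Python sketch of the blue-green idea as the summary above describes it: two environments, one of them live, and a routing switch. The `BlueGreenRouter` class and the lambda "environments" are hypothetical stand-ins for illustration, not any real tool's API.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a deployed environment: takes a request, returns a response.
Handler = Callable[[dict], dict]


class BlueGreenRouter:
    """Sends every request to whichever environment is currently live."""

    def __init__(self, blue: Handler, green: Handler, live: str = "blue") -> None:
        self._envs: Dict[str, Handler] = {"blue": blue, "green": green}
        if live not in self._envs:
            raise ValueError(f"unknown environment: {live}")
        self.live = live

    def handle(self, request: dict) -> dict:
        # Only the live environment ever sees the request.
        return self._envs[self.live](request)

    def switch(self) -> str:
        # Flip all traffic to the idle environment in one step.
        self.live = "green" if self.live == "blue" else "blue"
        return self.live


router = BlueGreenRouter(
    blue=lambda req: {"version": "v1", "score": 0.2},
    green=lambda req: {"version": "v2", "score": 0.3},
)
before = router.handle({"user": 1})  # served by blue
router.switch()
after = router.handle({"user": 1})  # served by green
```

In this sketch the idle environment receives no production traffic until the switch. That appears to be the main contrast with shadow mode, where the candidate model sees live requests but its output is not returned to users.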

Laszlo Sragner

“is it just discarded”

Good question. If no user can be influenced by the effects of the model, what’s the point in running the model in production?

Hugh Reid

If there was a big change to the model then you might gain ops insight into memory or performance, but for incremental changes a FF based approach would work better. https://launchdarkly.com/use-cases/

Feature Flags (Toggles) Use Cases | Feature Flags as a Service

Feature flags (toggles) are powerful control points in your code. Learn about the primary feature flag use cases and LaunchDarkly’s feature flags as a service.

Laszlo Sragner

The amount of engineering resource that these implementations require should be aligned with the perceived benefits. These are multi-sprint problems for engineering; they essentially need to create entirely new features. I’d think the only reason to test a model in production is to evaluate it statistically against the baseline solution. By that time the engineering questions must have been resolved by other means. The prod environment is not there to run experiments for the sake of experiments.
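
One way to read "evaluate statistically against the baseline solution" is a paired comparison of per-request errors. The sketch below is an assumption about what such a check could look like, not something described in the thread: it uses only the standard library, made-up error values, and a paired bootstrap to put a confidence interval on the mean difference in absolute error. The function name `paired_bootstrap_ci` is hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_ci(
    baseline_errors: List[float],
    candidate_errors: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed mean difference, CI lower bound, CI upper bound).

    The difference is candidate minus baseline, so negative values
    favour the candidate model.
    """
    if len(baseline_errors) != len(candidate_errors):
        raise ValueError("errors must be paired per request")
    if not baseline_errors:
        raise ValueError("need at least one paired observation")

    # Per-request differences keep the pairing between the two models.
    diffs = [c - b for b, c in zip(baseline_errors, candidate_errors)]
    rng = random.Random(seed)
    n = len(diffs)

    # Resample the differences with replacement and record each mean.
    resampled_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Made-up absolute errors for the same eight requests under each model.
baseline = [5.0, 7.0, 6.0, 8.0, 5.5, 7.5, 6.5, 9.0]
candidate = [4.0, 6.0, 5.5, 6.5, 5.0, 6.0, 6.0, 7.0]
observed, ci_low, ci_high = paired_bootstrap_ci(baseline, candidate)
```

If the whole interval sits below zero, the candidate's error is lower than the baseline's on this sample. Whether that justifies the engineering cost is the separate judgment raised above.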

David Aponte

@Hugh Reid a blue green deployment looks similar to a “shadow” deployment. Referencing the blog I shared, he describes deploying both models, with the shadow model taking some or all of the requests. Compare that with how Fowler describes a blue green deployment: “..you have two production environments, as identical as possible. At any time one of them, let’s say blue for the example, is live”. They seem to be the same IMO. I’m not the author, but I would imagine he wasn’t trying to address that valid but more downstream question. Just a thought, but if you have dev, stg, and prd environments, you could always validate the model in stage before deploying into prd, if it’s just a simple check. But I guess it depends on your use case. The author states that he found shadow mode useful for models that “have a large effect on some conversion funnel”. I don’t work in ecommerce so I’m not too familiar with how shadow mode would help with that, but I’m sure we can always ask him.

What’s a FF based approach, btw?

Laszlo Sragner

Feature Flag I guess

jan

So, we also used to have shadow mode, but we don’t have it anymore, and this makes things more complex (specifically in a microservice deployment).

1. We don’t have prod data in staging (so I guess that this might be an issue, not only for me).
2. Our model is used by a different component which provides an output. Now we want to know how the output would change if (all else being the same) a different prediction was used.
3. We are using this for delivery time predictions and order delivery planning. This is a situation where you’d prefer to have a check with prod data but without affecting prod traffic.

So in short, in our situation it helps to assess the performance of a model, test integration, and facilitate what-if analyses before making a too costly decision.
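
The pattern jan describes can be sketched roughly as follows. This is a minimal illustration under assumptions, not jan's actual system: `live_model`, `shadow_model`, and `plan_delivery` are hypothetical stand-ins for the production model, the candidate model, and the downstream planning component. The live prediction drives the response, while the shadow prediction is only passed through the same downstream logic and logged for later what-if analysis.

```python
from typing import Callable, Dict, List

Order = Dict[str, float]
Model = Callable[[Order], float]


def live_model(order: Order) -> float:
    # Hypothetical production model: predicted delivery time in minutes.
    return 10.0 + 2.0 * order["distance_km"]


def shadow_model(order: Order) -> float:
    # Hypothetical candidate model under evaluation.
    return 8.0 + 2.5 * order["distance_km"]


def plan_delivery(predicted_minutes: float) -> str:
    # Hypothetical downstream component that consumes the prediction.
    return "express" if predicted_minutes <= 30.0 else "standard"


# In-memory log for the sketch; a real system would write to durable storage.
shadow_log: List[Dict[str, object]] = []


def handle_order(
    order: Order, live: Model = live_model, shadow: Model = shadow_model
) -> str:
    live_pred = live(order)
    decision = plan_delivery(live_pred)  # only this result affects production

    try:
        shadow_pred = shadow(order)
        shadow_log.append(
            {
                "order": order,
                "live_decision": decision,
                # What the downstream output would have been with the new model.
                "shadow_decision": plan_delivery(shadow_pred),
            }
        )
    except Exception as exc:
        # A failing shadow model must never break the live path.
        shadow_log.append({"order": order, "shadow_error": repr(exc)})

    return decision


decision = handle_order({"distance_km": 9.0})
```

Comparing `live_decision` with `shadow_decision` across the log shows how often the downstream output would change, without the shadow result ever reaching a customer. A real service would likely run the shadow call asynchronously so it adds no latency to the live path.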

Hugh Reid

@David Aponte I am glad you posted this, it is good to explore these areas. Blue Green is a mature DevOps pattern that has been around for nominally 10 years, but actually a lot longer than that. It is so commonly used that Azure has it built in (https://docs.microsoft.com/en-us/azure/app-service/deploy-staging-slots). All the logging, switching, reference access etc. has been solved and the infra team know what you are talking about. Feature Flags (https://www.martinfowler.com/articles/feature-toggles.html) have been around for quite a few years and provide a way to operate multiple versions of software functionality at the same time — even in a variety of permutations in a single transaction. These 2 things cover infrastructure and software switching; ML adds a data angle, so I am curious to see if there is a version of “shadow mode” that uses that data angle to deliver something new — rather than remain “in the shadow” of recognised DevOps processes.

Set up staging environments — Azure App Service

Learn how to deploy apps to a non-production slot and autoswap into production. Increase the reliability and eliminate app downtime from deployments.

Feature Toggles (aka Feature Flags)

Feature Flags can be categorized into several buckets; manage each appropriately. Smart implementation can help constrain complexity. (71 kB)
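
As a rough illustration of the software-switching side, here is a minimal in-process feature flag sketch. It is an assumption for illustration only and does not use LaunchDarkly's or any other vendor's API: `FLAGS`, `score_v1`, and `score_v2` are hypothetical names.

```python
from typing import Dict

# Hypothetical flag store; in practice this would come from config or a flag service.
FLAGS: Dict[str, bool] = {"use_new_scoring_model": False}


def score_v1(features: Dict[str, float]) -> float:
    # Hypothetical current model.
    return 0.5 * features["x"]


def score_v2(features: Dict[str, float]) -> float:
    # Hypothetical new model version.
    return 0.5 * features["x"] + 0.1


def score(features: Dict[str, float]) -> float:
    # Both code paths are deployed; the flag decides which one runs.
    if FLAGS.get("use_new_scoring_model", False):
        return score_v2(features)
    return score_v1(features)


old = score({"x": 2.0})  # flag off: v1 path
FLAGS["use_new_scoring_model"] = True  # flipped at runtime, no redeploy
new = score({"x": 2.0})  # flag on: v2 path
```

Unlike blue-green, the switch here happens inside one running deployment, which is why the two are described above as covering software and infrastructure switching respectively.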

David Aponte

Can’t wait to read these! Thanks!

Hugh Reid

As Alex points out, Shadow Mode, Feature Flags and Blue Green come with significant overheads and so good CI/CD is needed to make this kind of thing work. More Reading https://rollout.io/blog/feature-releases-are-the-future-and-they-require-ci-cd-and-feature-flags/

Feature Releases are the Future and They Require CI, CD and Feature Flags — Rollout Blog

Continuous integration, continuous delivery and features flags represent how software development will evolve with future releases.

Nov 27th, 2018

David Aponte

Merge parties sound like a nightmare lol

Damian Brady

A bit late to comment, but that’s a great article. I was going to say it just sounded like A/B testing, but he explained the difference. I like it.

I would also say… I don’t think Production is the wrong place to test experiments.
Sure, you can offload a lot of testing and validation to pre-production environments or even your training/validation pipelines, but there’s no replacement for real-world use.
If something has passed all your tests, rolling it out gradually with effective monitoring is just good DevOps practice. Shadow deployments seem like a really careful, conservative technique.
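
Rolling out gradually is often done by assigning each user to a stable bucket and raising the exposed percentage over time. The sketch below assumes a hash-based bucketing scheme, which is a common approach but not one the thread specifies; `bucket`, `in_rollout`, and the salt value are hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str = "model-v2-rollout") -> int:
    """Map a user to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, percent: int, salt: str = "model-v2-rollout") -> bool:
    """Return True if this user should get the new model at this rollout stage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket(user_id, salt) < percent


users = [f"user-{i}" for i in range(1000)]
exposed_10 = {u for u in users if in_rollout(u, 10)}
exposed_50 = {u for u in users if in_rollout(u, 50)}
```

Because the bucket depends only on the user ID and the salt, anyone exposed at 10% stays exposed at 50%, so individual users do not flip back and forth between models as the rollout widens. The monitoring half of the advice is not shown here.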

David Aponte

Never too late!

Damian Brady

Production is also a test environment

fclesio

“if no user can be influenced by the effects of the model what’s the point in running the model in production”

Thanks for that. This is one of the reasons that, even with tons of papers on offline evaluation of RecSys, I really like to see the system in production first, collect data, and check the results instead of relying on offline data.

Hugh Reid

“Production is also a test environment”

Absolutely. “Testing in Production” is also a common feature of customer-facing systems, where there is a long tail of test cases (think combinations of browsers, apps, plugins, OS etc.) that you cannot get 100% coverage on using simulators:
https://azure.microsoft.com/en-gb/resources/videos/azure-friday-testing-in-production-with-azure-app-service/
https://martinfowler.com/articles/qa-in-production.html

Hugh Reid

But I don’t think we actually reached an end to the line of thinking. In my mind there is still a question: does data add an extra dimension to traditional DevOps methods that calls for a new approach?

Damian Brady

I think it’s another consideration to be sure… and it means some practices and tools aren’t appropriate.

Big thanks to all the community members that participated in the conversation!

The MLOps community is an open and equal space where all are welcome to teach and learn from each other. We share best practices, tips, pains, and questions in Slack and have live meetups to talk with some of the leading innovators in this field. If you would like to get involved, please join Slack and reach out. To hear all of our recorded past meetups, check out our YouTube channel or listen to our podcast.


Demetrios Brinkmann
MLOps.community

Father, Artist, Happy. Creator of MLOps community and Lover of AI Ethics