Before I start, I have to mention that I don’t have any prior experience as the head of a security department. The example is made up for the sake of clarity, to explain the important concepts behind delivering AI products to a production environment. In this article I’ll try to cover some of the aspects that lie between a fresh-out-of-the-oven model and a production environment that handles millions of requests per second.
Suppose you’re the head of the security department at a bank. It’s the beginning of the working week. You enter your office and see a report on your table summing up the past year. You review the contents, and everything seems good — the overall number of transactions has almost doubled since last year. But then you open the “Fraud” section, where things start to go wrong. Fraudulent transactions now make up around 7.2% of all transactions. They grew by 3.4 percentage points over the past year and are expected to grow further, following the exponential curve building up from the previous years. This can no longer last; you have to do something about it.
You convene a joint meeting with the analytical department, and the agenda has only one item — “Fraud”. You open with a quick greeting and immediately switch to the main issue that bothers you. After a short but definitive speech you wrap up with a single question — “What can we do?”. Someone from the analytical department, Steve, mentions that he’s familiar with machine learning techniques and can investigate the data to see what could be done.
“It will take some time, but I will show some results by the end of the week,” he says.
Luckily for us, you have a slightly odd habit — you’re eager for visualisations and like to sketch everything out. You open your notebook and draw the first block.
A couple of days later Steve knocks at your door with good news — he can create a model that will learn to distinguish clean and fraudulent transactions, but he needs time for model building and evaluation. You discuss this with the head of the analytical department, and you both decide to create a new department that will be in charge of everything related to machine learning and artificial intelligence (since you don’t have one). Steve is appointed head of the new department.
In the meantime you confirm the changes on the server side. Now the service responsible for conducting transactions has to execute an intermediate request, sending the transaction data to a new service which will approve or decline it.
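The intermediate check could look roughly like the sketch below. Everything here is hypothetical — the payload fields, the 0.9 threshold, and `score_fn`, which stands in for the HTTP round trip to the new fraud service:

```python
# Hypothetical shape of the intermediate fraud-check request; score_fn
# stands in for the HTTP call to the new approval service.

def check_transaction(txn: dict, score_fn) -> dict:
    """Ask the fraud service to approve or decline a transaction."""
    payload = {"amount": txn["amount"], "merchant": txn["merchant"]}
    score = score_fn(payload)                      # the intermediate request
    decision = "decline" if score >= 0.9 else "approve"
    return {"transaction_id": txn["id"], "decision": decision, "score": score}

# A dummy scorer: flags very large transfers as likely fraud.
def dummy_score(payload):
    return 0.95 if payload["amount"] > 10_000 else 0.05

print(check_transaction({"id": 1, "amount": 25_000, "merchant": "X"}, dummy_score))
# -> {'transaction_id': 1, 'decision': 'decline', 'score': 0.95}
```

The transaction service only sees an approve/decline answer; everything about the model stays behind the new service's boundary.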
You open your notebook and add two more blocks.
The next Monday you visit Steve to see what’s going on. He says he has already built a few models, and all of them show an acceptable level of performance. This is great news — you can already run inference on the historical data. And the development team has done its job too: the underlying infrastructure is ready.
“Steve, you have to expose an HTTP interface to your model”, you start a conversation.
Steve gently notes that he’s not a DevOps engineer and doesn’t have much experience configuring server applications.
“But,” he says, “I’ve built a few pet servers back in university. I’ll see what I can do.”
“Good. I’ll send you someone from the development team to help you out”, you reply.
It seems fitting to complete the diagram with the last block.
The next day you hear an altercation coming from Steve’s department. Roger — the leader of the development team — is horrified by the Flask server Steve has created.
“It works, but what happens when there are 10 requests per second? 100? 1000? It will crash and you won’t even know about it,” you hear while opening the door. “Moreover, how are you going to manage your model? Will you update it via VCS or over an SSH connection each time you develop a new version? What if a model weighs more than 5MB? 100MB? 500MB?”
Steve gets a little frustrated with all of these questions. You decide to intervene.
“Sorry for interrupting you guys, but we also have a minor change of plans. We cannot let all of the production traffic flow through the model on the first run. We need to make sure the model performs well in the long term. At the very least, we have to evaluate it on the first month of blocked and accepted transactions to see how well it does in the real environment. I think 5 percent will be a good starting point.”
Roger clearly had a more cheerful expression when you entered the room.
“Emm, that’s easy to say, not so easy to do,” he breaks the silence. “An ugly, trivial solution would be to put all the model inference logic into the API server. This means Steve would have to hardcode the model into it.”
You notice that Steve has suddenly turned pale.
“The bad news is, this is not scalable, which is definitely a requirement for us,” Roger continues. “Moreover, what if we want to run a second model in parallel? We would also need to collect metadata for each request to identify which requests went through which model.”
Steve exhales and his colour starts to return, while Roger grows even more upset.
“I’ll have to think about this. Surely it will require service meshing and overall model orchestration at a higher level. I’ll come up with updates next week.” Roger is almost out of the room when he notices Steve’s happy face. “Don’t look so shiny, darling. If you spot some recently published article that achieves better results and forces you to, say, change the model significantly along with the web server — know that you cannot throw away your current endpoints without notifying me. There will be model export and runtime preparation for sure. I’ll talk to you after lunch.”
“Seems like this is going to be one hell of a month,” says Steve.
“And when will we serve the model?”, you ask unobtrusively.
“Probably after runtime preparation the model will be deployed; then we can start obtaining predictions.”
Bitterly sighing you open your notebook and make a few corrections. Changes accumulate rapidly.
The next week dragged past you painfully. You’ve talked to Steve, but all he had received were some instructions on model export from Roger. Steve said he had shared a few versions of the model with Roger, but hadn’t heard any news since. On the sync-ups Roger constantly replied that he needed more time.
“The manager is still under development.”
“What is that?”, you asked once.
“That would be a service that manages all the models along with handling all the intermediate actions,” replied Roger.
“Like extracting the models’ metadata and building Docker images ourselves instead of forcing Steve to do that. This will increase fault tolerance and give us more flexibility in managing different models.”
Hmm, looks like you can associate the blocks in the diagram with different teams. Although Roger’s work is not over yet, Steve’s workflow is already well formed. He has done his major work, and future tasks will very likely relate to the already defined blocks. You open your notebook and mark out the first four blocks, labelling them as data science work.
Today marks exactly one month since you started the project. You have already lost all your patience and intended to “politely” point out that this cannot drag on forever, when suddenly Roger enters your office with updates.
“It’s done!”, you hear Roger’s joyful voice. “We’ve created a machine learning environment in our cloud. Steve has already uploaded a few model versions. Now we’re splitting the traffic in a 95/5 proportion: 95 percent goes through a default model which accepts all requests, and the other 5 percent goes through Steve’s model for real predictions. This can even be used for A/B testing. All requests are aggregated and stored within the environment. Once we obtain the ground-truth labels, we can check how well the model performed throughout the month. We will test this setup for a while.”
“That’s fantastic, Roger!” Of course, you’ve already forgotten your intention to crucify him. “But while you were working, I’ve come up with a few ideas regarding model serving. I think we should somehow monitor the models at inference time. Detecting unusual behaviour or something like that would be very helpful.”
Roger winces for a moment.
“Hmm, it’s hard to come up with a solution immediately. Since this concerns inference time, we should probably discuss it with Steve; maybe he has some thoughts on the problem.”
Sounds fair enough. Anyway, you open your notebook and add a few changes.
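The 95/5 split Roger described could be sketched with a deterministic, hash-based router, so that a given customer always lands on the same side of the split and the resulting metadata stays consistent. The share, names and hashing scheme below are illustrative assumptions, not the team’s actual implementation:

```python
# Sketch of a deterministic 95/5 traffic split: hash the customer id into
# [0, 1) and route the low 5% to Steve's real model. All names are made up.
import hashlib

TREATMENT_SHARE = 0.05  # fraction of traffic routed to the real model

def route(customer_id: str) -> str:
    """Deterministically assign a customer to 'baseline' or 'model'."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return "model" if bucket < TREATMENT_SHARE else "baseline"

counts = {"model": 0, "baseline": 0}
for i in range(100_000):
    counts[route(f"customer-{i}")] += 1
print(counts)  # roughly 5% of customers land on the real model
```

Because the assignment is a pure function of the customer id, the same split can later back an A/B test: each logged request carries the arm it was routed to.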
Your assistant bursts into the room.
“We’ve got a massive spike in transaction denials! Customers are flooding technical support asking why their transactions were denied. What the hell happened?!”
You quickly glance at Roger, but judging by the question mark on his face, it’s very unlikely to be his fault.
“Roger, any ideas?”
“I think we should see Steve. Come along,” you say while heading to Steve’s department.
You find Steve making himself a coffee.
“Hi, Steve, how is it going?”, both of you ask simultaneously.
“Great, I’ve uploaded a new model version half an hour ago. Probably it’s already running”, he replies.
“Yeah, we’ve noticed. And you know who else has noticed? About 10,000 customers tearing apart the technical support guys, complaining about transaction denials!”
Steve evaporates almost immediately. You find him drilling into his laptop, mumbling something in a mixed language of curse words and technical jargon.
“What is it, Steve?”, you ask, approaching.
“Seems like I’ve accidentally uploaded a wrong model version.”
Silence reigns in the room for a while.
“A shameful mistake, but it can happen to anyone,” you finally utter. “Roger, can we execute integration tests before a model release?”
“Yeah, I’ll handle that. Steve, you will have to upload test data and a baseline performance percentage that a newer model version has to achieve to be released,” replies Roger.
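The gate Roger proposes could be as simple as the sketch below: a candidate model is released only if it clears a baseline accuracy on Steve’s uploaded test data. The threshold, data and models are illustrative assumptions:

```python
# A sketch of a pre-release gate: block any model version that fails to
# reach a baseline accuracy on a fixed test set. Names are hypothetical.

def accuracy(model, test_data):
    """Fraction of test transactions the model labels correctly."""
    correct = sum(model(x) == y for x, y in test_data)
    return correct / len(test_data)

def release_gate(model, test_data, baseline=0.90):
    """Return True only if the candidate clears the baseline."""
    return accuracy(model, test_data) >= baseline

# Dummy test set: (features, is_fraud) pairs.
test_data = [({"amount": 20_000}, True), ({"amount": 50}, False),
             ({"amount": 15_000}, True), ({"amount": 120}, False)]
good_model = lambda x: x["amount"] > 10_000   # the intended version
bad_model = lambda x: True                    # the accidentally uploaded one

print(release_gate(good_model, test_data))  # True  -> deploy
print(release_gate(bad_model, test_data))   # False -> block the release
```

A wrong upload like Steve’s would have been caught here: a model that denies everything scores 50% on this test set and never reaches production.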
“What’s with the model?”
“Redeployed, everything should be good now”, replies Steve.
“Good.” You release your assistant with a few instructions. A light sense of relief fills the room. Not for long, though; you recall your intentions regarding model monitoring.
“Steve, can we do something about model monitoring? It’d be nice if we could track changes in the production data and inference, and see how the model behaves, to possibly avoid such situations.”
Unlike Roger, Steve is very optimistic about the idea.
“Oh, that sounds great! We can monitor the distributions of the production data, the model predictions and also some system metrics. Monitoring concept drift would also be good practice, I think. Ah! Anomaly detection! We should also push notifications when something goes wrong! I’ll help Roger out with implementing that.”
“Ah, come on! I finished setting up the environment two hours ago.” Roger sullenly leaves the room.
Smiling kindly, you advise Steve to leave Roger alone for a while and research the monitoring topic a bit. Opening the notebook, you add a few corrections along with a few new blocks.
“It’s always good to have a second opinion”, pops up in your mind.
“Steve, do you have any comments?”, you show him your notes.
“What is that? Interesting!” He’s very enthusiastic about your drawings. “Yeah, I see. It’s hard to label the rest of the blocks as DevOps-ish. I’m contributing here and there too. But! You can mix the colours and everyone will be satisfied, I think.”
“Sounds legit. Do you have a blue marker?”
“Looks great! We should also add here…”, Steve starts, but you interrupt him with a quick gesture. “We’ve got enough here to start building our machine learning workflow. For sure there are plenty of nice augmentations to be added, but first things first. Now, let’s go and get things done!”
Thank you for reading the story. We could dive deeper and place, for instance, model sustenance procedures into the loop, but not this time, as I believe the given workflow already provides rich soil for ML workflow augmentations.
The story shows how a machine learning workflow can take shape and evolve within a single company. The industry has developed a vision of how machine learning projects should proceed, but the instrumental part leaves much to be desired, and much to be automated.