Unexplored territory: integrate AI into a microservice architecture

Rutger Bezema
[:RI:] @ REWE digital
6 min read · Nov 18, 2019
Photo by timJ on Unsplash

Today, a vast number of DeepNet implementations is available across many frameworks, and the number of tutorials on computer-vision classification applications grows every day. Yet almost no articles describe the architectural side: how do you integrate a DeepNet implementation into a microservice architecture? Here is a short story of our experience bringing the two together.

The Project

We at R&I @ REWE digital had the opportunity to build a prototype for a product-recognition application in one of our logistics centers. If the prototype recognizes the object in view, it displays product information. If the object is not recognized, it re-trains the model with this new data.

“Ah, sounds like a standard CNN computer-vision problem!”, I hear you think. Well, at least that’s how we interpreted a possible solution.

So we started with a Python 3 script. We decided to use TensorFlow 1.14 and, since we only needed classification, a standard pre-trained ResNet101. Soon we had finished a straightforward “works on my machine” classification solution, able to classify a couple of products. But what if we wanted a more scalable solution, independent of the DeepNet framework?
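To give you an idea, here is a minimal sketch of that first script. It uses ResNet50 as a stand-in, since that is the ResNet variant bundled with TensorFlow 1.14’s Keras API, and a hypothetical image path:

```python
# Minimal "works on my machine" classifier (TensorFlow 1.14).
# ResNet50 stands in for the pre-trained ResNet101 we actually used;
# "product.jpg" is a hypothetical input image.
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")  # pre-trained ImageNet classifier

img = image.load_img("product.jpg", target_size=(224, 224))
batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

predictions = model.predict(batch)
print(decode_predictions(predictions, top=3)[0])  # top-3 (id, label, score)
```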

This is where most computer-vision tutorials end. No word on integrating the DeepNet. No thoughts on how to make the application more self-contained or reliable. In other words: how do you make the application usable in a more complex environment?

One possible solution could be the integration of our application into a microservice architecture. And that’s what we tried!

YAPS: Yet Another Prediction Service

Our application resides inside a classification domain. Let’s analyze our straightforward implementation architecture inside this domain.

The inference defines a bounded context; let’s call it “prediction”. The DeepNet model then implements the prediction context’s business logic. It is interesting to note that, in the case of CNNs, the DeepNet model is simultaneously also the data source. The application has an input vector (the image) as well as an output vector (the inference result). It runs isolated in its own process on an NVIDIA Jetson AGX Xavier.

The one thing missing for the application to fit the definition of a microservice was a self-documenting interface. Hence we added a synchronous REST-JSON interface (Level 2 in the Richardson maturity model) for putting an image and returning the inference result.
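To make this concrete, here is a minimal sketch of what such an interface could look like. The web framework (Flask here) and the endpoint name are illustrative assumptions, and `classify` is a placeholder for the actual TensorFlow inference:

```python
# Illustrative prediction endpoint: PUT an image, get the inference
# result as JSON. Flask and the route name are assumptions, and
# classify() is a placeholder for the real DeepNet inference.
import io

from PIL import Image
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(img):
    """Placeholder for the real DeepNet inference call."""
    return {"label": "unknown", "confidence": 0.0}

@app.route("/prediction", methods=["PUT"])
def predict():
    # The request body carries the raw image bytes.
    img = Image.open(io.BytesIO(request.get_data())).convert("RGB")
    return jsonify(classify(img))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```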

Up to that point, it had been unclear to us whether TensorFlow sessions would stay stable while handling multiple simultaneous requests. As it turns out, the versions we tested, 1.13.1 and 1.14.0, are stable, but not out of the box: we had to write some workarounds. Bastian Schoettner @ REWE digital describes in detail which changes we had to make.
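For a rough idea of what such a workaround looks like, here is the widely used TF 1.x pattern of capturing the graph and session at load time and re-entering them on every request. Treat it as an illustration, not necessarily our exact fix:

```python
# Widely used TF 1.x pattern for serving one model from multiple request
# threads: capture the graph and session at load time and re-enter them
# on every request. Illustrative sketch, not our exact workaround.
import tensorflow as tf
from tensorflow.keras import backend as K

model = tf.keras.applications.ResNet50(weights="imagenet")
graph = tf.get_default_graph()  # graph the model was built in
session = K.get_session()       # session holding its variables

def predict(batch):
    # Without re-entering graph and session, a second request thread
    # fails with "Tensor ... is not an element of this graph".
    with graph.as_default(), session.as_default():
        return model.predict(batch)
```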

The Architecture

Having the core logic in place as a microservice, we still needed to implement some other parts of the domain:

  • A camera proxy service, simplifying access to multiple cameras.
  • A product information service, supplying information about the determined product.
  • A scanning service, enabling the user to scan a product manually.
  • And of course, an inference service, calling the prediction service with different camera inputs.

At the end of the day, our microservice landscape around the prediction service was in place.

Basic microservice architecture with AI component integration: the prediction service fits seamlessly into the microservice landscape.

Retraining on unknown products

One thing was still missing, though: the application should be able to retrain on unknown data! This raises a couple of questions, each with multiple possible solutions:

  1. Where to host all this new data?
  2. Where will the model be (re)trained?
  3. How do we redeploy a retrained model, without breaking the existing logic (regression)?

It is well known that training CNNs requires a lot of data, so data management and data governance are common issues in DeepNet applications. A couple of solutions for storing and managing data already exist. We decided to try out Google’s new AutoML API, since it allows a non-AI expert (in our case, a laborer) to work with the data.

AutoML also allows you to (re-)train and export a re-trained model. Bastian Schoettner @ REWE digital wrote a very good article on our experiences with AutoML; we invite you to have a look at his step-by-step walk-through.

The first Refactoring

Unfortunately, the exported AutoML model was an undocumented MnasNet model saved in a protobuf file. Since our YAPS used a standard ResNet101 model loaded from an HDF5 file, we had to change a lot of code inside the service before we were able to run predictions against it.
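To illustrate the gap: loading a Keras HDF5 model and loading a frozen protobuf graph are two rather different code paths in TensorFlow 1.x. File names and tensor names in this sketch are hypothetical placeholders; the real tensor names depend on the exported AutoML graph:

```python
# Why the model swap forced code changes: HDF5 (Keras API) vs. frozen
# protobuf graph (low-level API) in TF 1.x. File and tensor names are
# hypothetical placeholders.
import numpy as np
import tensorflow as tf

batch = np.zeros((1, 224, 224, 3), dtype=np.float32)  # dummy input

# Before: ResNet101 from an HDF5 file, one line via the Keras API.
keras_model = tf.keras.models.load_model("resnet101.h5")
result = keras_model.predict(batch)

# After: MnasNet from a frozen protobuf graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile("automl_mnasnet.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # "input:0" / "scores:0" stand in for the graph's real tensor names.
    scores = sess.run("scores:0", feed_dict={"input:0": batch})
```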

This is where we first felt we had made the right decision: we only needed local changes in one service, leaving the rest of the application intact. Read about the details of our findings in our article on setting up an NVIDIA Jetson environment with a protobuf file in TensorFlow.

Easy enhancement: Migration

Having answered the first two questions on retraining, we still faced the redeployment task: a classic migration problem.

Remember that we defined the CNN model as business logic and data backend at the same time? Think about it this way: adding a new product to the model also changes the dimension of the output vector. Putting a new model in place can thus be defined as a migration from one data model to the next.

In a production environment with many domains, services should communicate asynchronously; a common piece of infrastructure is an Apache Kafka message bus. In our case, the prediction service could then have consumed a “model” topic and installed the new model from a topic payload. But since we were building a mere prototype, this solution seemed a little oversized to us. (Inside the bounds of our context [prediction], asynchronous communication is not necessary anyway.)

Instead, we favored an additional synchronous PUT operation on our REST interface, supplying the prediction service with a new model and a set of test images for verifying it. Migrating from one model to the other then includes the following steps (a sketch follows the list):

  1. stop the TensorFlow session
  2. save the current model
  3. deploy the re-trained model
  4. start a new TensorFlow-session
  5. run prediction tests with the given test data
    On success: release the service
    On failure: use the old model
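
A minimal sketch of such a migration endpoint could look as follows. It reuses the hypothetical Flask setup from earlier, and all helper functions are placeholders for the real service internals:

```python
# Sketch of the model-migration PUT operation described above. Flask and
# all helper functions are hypothetical placeholders for the real
# service internals.
import shutil

from flask import Flask, jsonify, request

app = Flask(__name__)

CURRENT_MODEL = "model/current.pb"
PREVIOUS_MODEL = "model/previous.pb"

def stop_tensorflow_session():
    """Placeholder: tear down the running TensorFlow session."""

def start_tensorflow_session(model_path):
    """Placeholder: load the model at model_path into a fresh session."""

def run_prediction_tests(test_images):
    """Placeholder: verify the new model against the supplied images."""
    return True

@app.route("/model", methods=["PUT"])
def migrate_model():
    stop_tensorflow_session()                    # 1. stop the session
    shutil.copy(CURRENT_MODEL, PREVIOUS_MODEL)   # 2. save the current model
    request.files["model"].save(CURRENT_MODEL)   # 3. deploy the new model
    start_tensorflow_session(CURRENT_MODEL)      # 4. start a new session
    if run_prediction_tests(request.files.getlist("test_images")):  # 5. test
        return jsonify(status="released")        # success: release
    stop_tensorflow_session()                    # failure: roll back
    shutil.copy(PREVIOUS_MODEL, CURRENT_MODEL)
    start_tensorflow_session(CURRENT_MODEL)
    return jsonify(status="rolled back"), 500
```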

That was easy, again no other services needed any changes!

New boundaries: Encapsulate the TrainEngine access

For obvious reasons (resilience, stability, API stability, etc.) it is good practice to encapsulate any third-party dependency. If you keep your context boundaries clean, these interfaces will emerge on their own.

Since we decided to use AutoML’s backend facilities, we still needed one more service to encapsulate AutoML: the TrainEngine. It has the following functionality (sketched after the list):

  • Store batches of new data in Google Cloud Storage (GCS) buckets
  • Start training a new model using AutoML-API calls
  • Download the new model after training
  • Deploy this model to the YAPS.
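
A rough sketch of those four responsibilities, using the Google Cloud client libraries, could look as follows. All names (bucket, project, dataset, YAPS endpoint) are hypothetical, and the exact AutoML call signatures vary between client-library versions:

```python
# Rough sketch of the TrainEngine's responsibilities. All names are
# hypothetical, and the AutoML call signatures may differ between
# client-library versions.
import requests
from google.cloud import automl, storage

LOCATION = "us-central1"

def store_batch(image_paths, bucket_name="yaps-training-data"):
    """Store a batch of new training images in a GCS bucket."""
    bucket = storage.Client().bucket(bucket_name)
    for path in image_paths:
        bucket.blob(f"images/{path}").upload_from_filename(path)

def start_training(project, dataset_id, display_name):
    """Kick off an AutoML training run; returns a long-running operation."""
    client = automl.AutoMlClient()
    parent = f"projects/{project}/locations/{LOCATION}"
    model = {
        "display_name": display_name,
        "dataset_id": dataset_id,
        "image_classification_model_metadata": {},
    }
    return client.create_model(parent=parent, model=model)

def deploy_to_yaps(model_file, test_images):
    """Push the downloaded model to the YAPS migration endpoint."""
    files = [("model", open(model_file, "rb"))]
    files += [("test_images", open(path, "rb")) for path in test_images]
    return requests.put("http://yaps:8080/model", files=files)
```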

One more time, we felt the advantages of using a microservice architecture: adding this new part of the context to the application led to no changes in any of the existing services. Outstanding!

We now had all the pieces together, ending up with the following architecture:

Microservice architecture with MnasNet and AutoML-Backend

Conclusion

Currently, a lot of tutorials on computer-vision solutions with deep neural networks exist. Almost all of them describe a standalone application in a single-user environment.

We based our application on architectural design paradigms and well-chosen context boundaries. The result is an easy-to-understand and extendable computer-vision solution: clearly structured, isolated microservices communicating over well-defined synchronous REST APIs.

With our prototype, we showed that a scalable DeepNet application is possible. Our solution is independent of the AI framework and the model implementation. We find that applying common architectural patterns (even with AI ;-)) leads to a more understandable application.

Try it out, it’s worth the effort!


Rutger Bezema
[:RI:] @ REWE digital

I’m a digital nomad @ REWE digital, working for Research & Innovation while traveling the world with my family.