Building Complex AI Services That Scale

Vincent Delaitre
Deepomatic
Nov 14, 2016

Introducing DMake, Deepomatic’s open-source tool to manage micro-services.

Deepomatic provides computer vision solutions to solve image-related problems for companies. We take care of everything: problem statement, data annotation, definition and training of the deep learning models, and exposure of those models via our API.

This API allows our clients to automatically perform visual tagging specific to their use cases, detect custom objects in images, and visually query their own database of products for similar objects. Behind it sits a complex infrastructure with many interconnected micro-services.

In this blog post, I will describe the development and deployment processes that we put in place to ensure the quality of our services. It will also be the opportunity to open-source DMake, our internal tool to manage the testing and deployment of our micro-services.

An example: Deepomatic’s visual search engine

Let’s take our visual search engine as an example. A visual search engine translates an image into a mathematical summary called a feature. The idea is to store the features of every indexed image in our client’s database. When a query image arrives, we compute its feature, compare it to the features already in the database, and return the images whose indexed features are closest.
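
To make that concrete, here is a minimal sketch of the indexing and query loop. The `compute_feature` stub is a hypothetical stand-in for the CNN described below (the real one runs on a GPU), and a plain Python list stands in for our RAM-resident, sharded index:

```python
import numpy as np

def compute_feature(image):
    # Hypothetical stand-in for the CNN: here we just hash the input
    # into a fixed-size vector so the example is self-contained.
    rng = np.random.default_rng(abs(hash(image)) % 2**32)
    return rng.standard_normal(128)

index = []  # list of (image_id, L2-normalized feature) pairs

def add_image(image_id, image):
    feature = compute_feature(image)
    index.append((image_id, feature / np.linalg.norm(feature)))

def search(query_image, k=5):
    """Return the k indexed images whose features are closest to the query."""
    q = compute_feature(query_image)
    q /= np.linalg.norm(q)
    ids, feats = zip(*index)
    scores = np.stack(feats) @ q  # cosine similarity, rows are normalized
    best = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in best]

add_image("img-1", "red dress")
add_image("img-2", "blue sofa")
print(search("red dress", k=1))  # -> [('img-1', 1.0)]
```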

We wanted our engine to be fast, reliable and scalable. For it to be fast, it needs (i) to compute the features on graphics cards (GPUs) and (ii) to hold all the features in RAM. We thus need servers with NVIDIA GPUs and servers with lots of RAM. For it to be reliable, all the services need to be replicated across multiple availability zones. And for it to be scalable, each service needs to be able to scale, possibly independently from the others.

For all the above reasons, we decided to divide our search engine into micro-services. Apart from the obvious databases, in-memory stores and queue services, our app has three micro-services:

  • The web app: this is a Django web server that allows the end-user to interact with the search engine via a RESTful API.
  • The feature worker: the goal of this service is to compute the different features of all the images that go through the API. These features depend on the type of images we are interested in (fashion, furniture or generalist), and there might be several features to extract for the same image (e.g. for fashion, we compute specially trained abstract features as well as pre-determined attributes for the fashion item of interest). The features are computed by convolutional deep neural networks (CNNs), which are computationally intensive models. Luckily for us, GPUs can speed up the computation by a factor of 40, so the feature workers run exclusively on GPU-powered servers.
  • The database worker: once features are computed, we need to store them in a database. When searching for an image, we compare its features to the features stored in the database and compute similarity scores that we aggregate together. This requires very specific operations that would be too slow to perform using e.g. map-reduce operations on existing databases, so we had to develop our own in-memory worker, backed by a more traditional database for persistence (we use MongoDB for historical reasons). A minimal sketch of this scoring logic follows the list.
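
As promised above, here is a sketch of the database worker’s scoring step, assuming the features are already computed. A plain in-process dict stands in for the RAM-resident store (MongoDB only provides persistence behind it); the feature-type names and the weighted-sum aggregation are illustrative assumptions, not our exact scheme:

```python
import numpy as np
from collections import defaultdict

# One in-RAM feature matrix per feature type (e.g. "abstract", "attributes").
store = defaultdict(lambda: {"ids": [], "rows": []})

def insert(feature_type, image_id, vector):
    entry = store[feature_type]
    entry["ids"].append(image_id)
    entry["rows"].append(vector / np.linalg.norm(vector))

def search(query_vectors, weights, k=5):
    """Score each feature type separately, then aggregate per image.

    query_vectors: {feature_type: query feature vector}
    weights:       {feature_type: contribution to the final score}
    """
    totals = defaultdict(float)
    for ftype, q in query_vectors.items():
        entry = store[ftype]
        q = q / np.linalg.norm(q)
        scores = np.stack(entry["rows"]) @ q  # cosine similarities
        for image_id, score in zip(entry["ids"], scores):
            totals[image_id] += weights[ftype] * score
    return sorted(totals.items(), key=lambda kv: -kv[1])[:k]
```

Keeping the matrices dense in RAM is what makes these brute-force dot products fast enough; this is exactly the part that would be painful to express as map-reduce jobs on an off-the-shelf database.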

All this runs on Amazon Web Services (AWS), since it was the first provider to offer GPU servers. Summing up the whole architecture in a diagram, here is what it looks like:

Architecture of Deepomatic’s search engine; all servers are replicated across availability zones.

Docker + DMake = ❤

In order to deploy these services, we have used Docker from day one. Docker is a container solution that lets you package your application and its dependencies into Docker images, which can be launched transparently on Linux, Mac OS and Windows. Docker images are built hierarchically, which makes them easy to manage. On top of that, the Docker Hub gives you access to official images for most common services.

Another interesting feature of Docker is that it allows you to link two containers running on the same host so that they can communicate while remaining isolated from the rest of the world. While this is already valuable for security reasons, we found it even more useful for testing.

Testing your application rapidly becomes a nightmare as the number of micro-services grows: unit tests are not enough, and you need integration tests to make sure every service speaks the same language. This is where Docker comes to the rescue. Since all your services are nicely packaged as Docker images, you can launch them all on your build server to run integration tests. By also linking containers for databases and queue servers to the container of each micro-service, you get your full application running on a single node in complete isolation, which makes it easy to fully test a new version of the application.
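
Here is a sketch of what such an integration run can look like, using the current Docker SDK for Python (user-defined networks have since superseded the container links we used in 2016, but the idea is the same); all image names, service names and environment variables are hypothetical:

```python
import docker

client = docker.from_env()

# A private bridge network: containers on it reach each other by name
# while staying isolated from everything else on the build server.
net = client.networks.create("integration-tests", driver="bridge")

mongo = client.containers.run(
    "mongo:3.2", detach=True, name="mongo", network="integration-tests")
worker = client.containers.run(
    "deepomatic/db-worker:candidate",  # hypothetical image name
    detach=True, name="db-worker", network="integration-tests",
    environment={"MONGO_URL": "mongodb://mongo:27017"})

try:
    # Run the test suite in its own container; it finds its dependencies
    # by container name. `run` raises ContainerError on a non-zero exit,
    # which is what fails the build.
    logs = client.containers.run(
        "deepomatic/web:candidate", "pytest tests/integration",
        network="integration-tests",
        environment={"DB_WORKER_URL": "http://db-worker:8000"},
        remove=True)
    print(logs.decode())
finally:
    for container in (worker, mongo):
        container.remove(force=True)
    net.remove()
```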

However, orchestrating such a testing process can become tricky. To manage all the interactions between micro-services, we wrote a tool that specifies how to test, run and deploy each of them. It is called DMake and you can find it → here ←.

Development Process

In order to guarantee the quality of our services and push new features to production without pain, we have integrated GitHub, Jenkins and Docker into a continuous delivery pipeline. We also use Waffle.io as our project management tool for its nice integration with GitHub. Overall, the development process looks like this:

  • The developer picks an issue to fix from the “Ready” column in Waffle and creates a branch off the production-ready branch, named after the issue number (see here). Pushing the new branch to GitHub automatically marks the card in Waffle as “In Progress”.
  • When the fix is ready, the developer opens a pull request on GitHub. This automatically triggers the building and testing of all the modified or required micro-services in Jenkins, with the test status reported back to GitHub.
  • When tests and QA pass, another member of the team reviews the pull request and merges it once she/he has no further comments. This triggers a Jenkins job that builds the Docker images, deploys the new version to the staging environment and marks the issue card as “Done”.
  • When we are happy with the result, we merge the staging branch into the production branch, and Jenkins handles the deployment of our Docker images on AWS.

All these steps are easy to configure with Jenkins 2.x and DMake. We will soon publish a tutorial in the DMake documentation on how to set up a simple yet effective build pipeline. Stay tuned!

Acknowledgment

I would like to finish by thanking the research and development teams at Deepomatic, who made this visual search engine possible and who take care of it on a daily basis, improving its performance.

Alexis, David, Hugo, Matthieu, Pierre and Zoé: thank you for rocking it! :-)

Vincent Delaitre, CTO of Deepomatic.com
