All you need is PyTorch, MLflow, RedisAI and a cup of mocha latte

Sherin Thomas
Jul 28, 2020 · 8 min read


Note: If you are new to PyTorch, MLflow, or RedisAI, I will introduce them briefly where needed, with a few references. However, the main aim of this blog post is not to run through these tools, but rather to provide an example that shows how these tools can be used together to build a spectacular workflow for your deep learning infrastructure.

Another note: At the time of writing this blog, MLflow is at 1.10.0 and does not yet have the torchscript flavor integrated. The example here requires this pull request to be merged in order to enable torchscript support. Anyhow, things should be official soon, so we can get started.

There are a good number of blogs talking about how to build a model using PyTorch and a smaller portion of them talking about how to deploy a PyTorch model using X. But the deployment story is still a headache for DevOps engineers (or the modern-day MLOps colleagues) for multiple reasons such as:

  • Linking development and experiment management to deployment
  • Deploying to a high-performance runtime that can serve many requests per second, so that you max out the hardware and handle backpressure properly
  • Providing horizontal scaling
  • Providing high availability

RedisAI tries to ease the process by building high-performance inference features on top of Redis, the well-known, highly scalable in-memory database. MLflow, on the other hand, has gained popularity as a good companion for training machine learning / deep learning models and managing the entire life cycle, from training to validation to deployment on serving infrastructure. Starting with its 1.9 release, MLflow supports deploying models trained within the MLflow environment to multiple deployment targets through plugins. At this point, we thought it’d be nice to deploy a model to RedisAI from MLflow with just one command, and went on to create a plugin for that. It was a great experience to interact with the MLflow developers in the process. This blog post walks through a specific workflow that uses MLflow to train a PyTorch model and the new plugin system to deploy it to a RedisAI instance for serving.

For those of you who want to explore and play around with the code, the complete source code is available on Github. Once you have cloned the repository, you might want to set up a conda environment in the following way:

cd path/to/the/root/of/the/repository
conda env create -f env.yml
conda activate redisai_mlflow_demo

This will set up a new conda environment named redisai_mlflow_demo and install all the dependencies into it.

For this blog we want to simulate a real-world use case, so we are creating an application that has a web UI (with an uncanny similarity to Google Docs) where you’ll be able to write the beginning of a story and let the AI write the rest for you. For the sake of this example, we pre-populate the UI with a piece from the book series “A Song of Ice & Fire”.

Building a transformer model

We’ll start by building a training loop in PyTorch for a transformer model that can complete a story given an incipit. Now you might have several questions related to the previous statement. If you are asking “what is a model”, there’s some ground you should cover before you continue reading. If you are asking “what is PyTorch”, it is one of the most popular deep learning libraries, with a large following among researchers and rapidly increasing adoption in production. If the question is “what is a transformer”, that is a type of neural network architecture that has become the state of the art in the “deep learning on text” world. While it’s not mandatory to know PyTorch or transformers to follow this blog, it’s nice to know a bit about them so that you understand the terminology that comes up later.

As of mid-2020, there’s a great availability of pre-trained transformer models, popularized by the great Huggingface model zoo. For the blog post, I think it’s quite reasonable to choose GPT-2 as a good candidate for the task, but feel free to experiment with other transformer models like BERT and descendants. Did I say pre-trained? Well, it would be a major roadblock (and a personal financial disaster) to be training a billion+ parameter model from scratch for a blog post.

Still, we want to demonstrate some form of training. It could be for fine-tuning purposes, for instance, or even a bold end-to-end training run just because someone else would be paying for it.

Code for defining the custom GPT2 class, tracing it and saving it

I have a script that is supposed to train the model on the given dataset. For the sake of this example, I will run a dummy training loop on top of a pre-trained GPT-2 obtained from Huggingface. Last, I will export it as a TorchScript model. Well, I know you must have asked yourself “can I save the native PyTorch model instead of TorchScript?” Nope. RedisAI is an optimized runtime with no ties to Python, and as such it expects a TorchScript model for serving.
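To make the export step a bit more concrete, here is a minimal sketch of tracing a module to TorchScript and reloading it. TinyLM is a hypothetical stand-in for the GPT-2 wrapper (it is not the class from the repository), chosen so that the snippet runs without downloading any weights; the vocabulary and hidden sizes are arbitrary.

```python
import io

import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for the GPT-2 wrapper: token ids in, logits out."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Returning a single plain tensor keeps tracing straightforward.
        return self.head(self.embed(input_ids))


model = TinyLM().eval()
example = torch.randint(0, 100, (1, 8))  # (batch, sequence) of token ids

# torch.jit.trace records the operations executed for this example input.
traced = torch.jit.trace(model, example)

# Serialize to an in-memory buffer; pass a file path instead to write to disk.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

# The reloaded TorchScript module should reproduce the eager outputs.
with torch.no_grad():
    same = torch.allclose(model(example), restored(example))
print("eager and TorchScript outputs match:", same)
```

The reloaded module no longer needs the Python class definition, which is the property a Python-free runtime relies on.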

Managing the ML life cycle with MLflow

MLflow is one of the popular machine learning life cycle management tools, and it comes with several integrations with existing engineering tools and infrastructure systems such as AWS Sagemaker or AzureML. The beauty of it is that it comes in the form of an easy-to-learn library and requires minimal changes to an existing codebase in order to take advantage of it. Better yet, you can choose to use experiment tracking, or logging, or deployment, or all of them: MLflow is not a monolith. It has extensive documentation and an always-active community around it. In version 1.9, MLflow made a plugin system available for integrating any deployment infrastructure with MLflow. What does this mean exactly? It means that you can potentially target any ML/DL runtime, not just the ones that are natively supported by MLflow, to deploy a model tracked by MLflow into production. One of those, and the primary reason why I embarked on this whole endeavor, is RedisAI.

I have made a few changes to the script we’ve seen above to use MLflow to log the params and save the model as an MLflow-model. (MLflow keeps an internal representation of the model files and their metadata, and plugins are designed to work only with MLflow-models.)

The whole project, including the MLProject file, is available in the Github repository. It is also important to note that we use MLflow to trigger the execution of the pipeline rather than calling the script directly with the Python interpreter.

“Training” would take a few minutes, depending on the speed of your WiFi, since you need to download the gpt2 model. Once the pipeline has run, you’ll be able to see the run ID in the terminal, which in our case is bd3c13f7b3b74d67a1dc61e29b8cfce8. You can also see the same run in the MLflow UI, which can be launched with the command mlflow ui.

MLflow UI entry for the above run ID

The hard part is over. We have built the training loop and run the training script (which usually takes minutes, hours, days, weeks, or even months to finish, depending on your data/model size).

Deploying the model to RedisAI

In this section, I’ll show you how you can ease the potentially daunting deployment task (the complexity escalates quickly if you need to scale up your infrastructure to serve a huge load) with the plugin system I have been bragging about and with the help of RedisAI. For those who are wondering, RedisAI is a high-performance production runtime built jointly by Tensorwerk and RedisLabs. It runs your deep learning model, built using PyTorch, TensorFlow, TensorFlow Lite, or any model that can be executed by ONNXRuntime (these already cover a lot of use cases, and other backends may be added in the future). I think the major benefit of RedisAI as the production inference engine is that you get all the features provided by Redis itself, like scalability, fault tolerance, serving performance, in-memory caching, and the client ecosystem, coupled with fast DL/ML runtimes.

I already have a blog (a bit old) talking about the features of RedisAI if you want to look around, and feel free to ask any questions in the community forum.

For this example to work, you’d need a RedisAI instance running on the same host you are running the example from. We can quickly set that up using the official Docker image:

docker run -p 6379:6379 redisai/redisai:latest

We can then deploy our MLflow model to RedisAI using the plugin (the initial conda environment setup would already have the plugin installed, but in case you want to explore the plugin, take a look at the Github page).

mlflow deployments create -t redisai -m runs:/bd3c13f7b3b74d67a1dc61e29b8cfce8/model --name gptmodel

Here the -t redisai argument will pick up the RedisAI plugin and offload the create task to the plugin. Note that the run ID we are using here is the one we copied from the output of the mlflow run . command.

Voila!! As the output states, a TorchScript deployment is created in RedisAI and is ready to serve.
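If you redeploy often, it can help to assemble that command from the run ID instead of editing it by hand. The helper below is a small illustrative sketch: deployment_command is a hypothetical function (not part of MLflow or the plugin), and it only uses the flags that appear in the command above.

```python
import shlex


def deployment_command(run_id, name, target="redisai", artifact_path="model"):
    """Build the `mlflow deployments create` argument list for a tracked run.

    Hypothetical helper: it only mirrors the -t, -m and --name flags used above.
    """
    # MLflow addresses a logged model as runs:/<run id>/<artifact path>.
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return ["mlflow", "deployments", "create", "-t", target, "-m", model_uri, "--name", name]


cmd = deployment_command("bd3c13f7b3b74d67a1dc61e29b8cfce8", "gptmodel")

# shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command.
print(shlex.join(cmd))
```

Keeping the command as a list means it can be passed straight to subprocess.run(cmd, check=True) without any shell quoting concerns.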

Now we can run our Flask web server that comes with the UI and see the result in action. Run the command below and go to the URL shown in the output.

The URL will take us to the same page you saw at the start of this blog, the only difference being that the “start” button at the top of the page works now, as we have the deployment running in RedisAI and ready to serve requests.

Wrapping up!!

This blog post is an attempt to show how far the deep learning engineering ecosystem has grown and how far we have come in making the deployment story as easy as possible. However, the ecosystem is still very much in flux, and different firms and communities are making numerous attempts to ease the process. I’ll be making different variants of this blog post to cover TensorFlow and ONNXRuntime models used with MLflow in the near future. So stay tuned! And thanks for bearing with me so far!


