Serving Machine Learning Model in Production — Step-by-Step Guide-2

Ilnur Garifullin
Oct 5, 2018


In the previous article we deployed and served a seq2seq model that could run inference on raw incoming strings and generate answers. The main issue with that model was that it could not recognize words with attached periods or commas, or words that are not in lowercase. In those situations the model replaced such words with <UNK> tokens.
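To make the failure mode concrete, here is a toy illustration. It assumes that the original model split its input on whitespace and looked up each piece in a fixed lowercase vocabulary. The vocabulary below is invented for the example and is not the model's real one.

```python
from typing import List

# Toy vocabulary: hypothetical, for illustration only.
VOCAB = {"hello", "there", ",", "!"}


def naive_tokens(sentence: str) -> List[str]:
    """Split on whitespace only and map out-of-vocabulary pieces to <UNK>."""
    return [word if word in VOCAB else "<UNK>" for word in sentence.split()]


print(naive_tokens("Hello, there!"))  # ['<UNK>', '<UNK>']
print(naive_tokens("hello there"))    # ['hello', 'there']
```

Every piece of the first sentence is in the vocabulary once it is separated and lowercased. Nothing survives, though, because the capital letter, the comma, and the exclamation mark stay attached to the words.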

In this article we will address this problem by adding a pre-processing stage that properly tokenizes incoming sentences and passes them on to the original model. All the code presented in this article is available on GitHub.


The prerequisites are the same as those defined in the previous article.


This issue can be resolved by adding a stage to the application that properly tokenizes incoming strings and passes them on. This can be done in any language, but we will stick with Python and the nltk library. Inside the conversation-tensorflow directory, let’s create another folder called tokenizer, which will hold our Python model.

$ mkdir tokenizer
$ cd tokenizer

The Python model should contain a processing script, the model’s contract and, if needed, its Python dependencies.


Let’s start with the actual processing script, which handles the manipulation of the incoming string. First, create a src folder and place the script file in it.

Regarding point [1]: for word tokenization, the nltk library uses its own language-specific vocabularies. So the first time you use nltk after installing it, you need to execute the following lines inside a Python shell:

>>> import nltk
>>> nltk.download('punkt')

This command will create a folder in the home directory and put all the vocabularies there. However, the infrastructure that runs ML Lambda’s models has no access to anything outside the container (a constraint imposed for security reasons), so this approach cannot be applied. To overcome this, we can download these vocabularies ourselves and pack them with the model. The purpose of [1] is simply to modify the list of search paths so that nltk can find the packed vocabularies inside the model’s container. By default, ML Lambda places all of the model’s files inside the /model/files directory.

Next, we just tokenize our sentences [2] and create an output tensor [3].

Model Dependencies

As mentioned above, we also need to provide the nltk_data folder. To do that, install the nltk library locally (pip install nltk), execute nltk.download('punkt') inside a Python shell, and then copy the nltk_data folder from your home directory to the model’s folder.
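The copy step can be scripted. The sketch below assumes that nltk_data was downloaded into your home directory, as described above. pack_nltk_data is a hypothetical helper name, not part of nltk.

```python
import shutil
from pathlib import Path


def pack_nltk_data(model_dir: Path, source: Path = Path.home() / "nltk_data") -> Path:
    """Copy the downloaded nltk vocabularies into the model's folder."""
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; download 'punkt' first")
    target = model_dir / "nltk_data"
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run after a new download.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Run from the conversation-tensorflow directory, pack_nltk_data(Path("tokenizer")) copies ~/nltk_data into tokenizer/nltk_data.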

We also need to define a requirements.txt for the model if it uses any third-party packages that are not part of the Python Standard Library. In our case, those are nltk and numpy.
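For this model, requirements.txt can be as short as the two package names below. Versions are left unpinned in this sketch. Pinning the versions you tested locally is usually the safer choice.

```text
nltk
numpy
```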


Model’s Contract

Finally, let’s provide the model’s contract, which describes the model’s inputs and outputs. In our case, these are simply an incoming string x and an outgoing string input_data. Create a serving.yaml file and place the following in it:

That’s it, the model is ready for serving. The final file structure should look like this:

tokenizer
├── nltk_data
│   └── ...
├── requirements.txt
├── serving.yaml
└── src
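Before uploading, a quick check of this layout can catch a missing file early. The helper below is a hypothetical convenience and is not part of ML Lambda or the hs CLI. It only checks that the four top-level entries of the tokenizer folder exist with the right kind (file or directory).

```python
from pathlib import Path
from typing import List

# Top-level entries expected in the tokenizer folder.
EXPECTED = {
    "nltk_data": "dir",
    "requirements.txt": "file",
    "serving.yaml": "file",
    "src": "dir",
}


def missing_entries(model_dir: Path) -> List[str]:
    """Return the expected entries that are absent or of the wrong kind."""
    missing = []
    for name, kind in EXPECTED.items():
        path = model_dir / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty list from missing_entries(Path("tokenizer")) means the folder matches the expected layout.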

Model Serving

We assume that you already have a running instance of ML Lambda, a defined hs cluster, and an uploaded seq2seq model. If you don’t, please refer to the previous article to prepare the environment.

Now let’s upload the model to ML Lambda:

$ hs upload

This command uploads all of the model’s data to ML Lambda. Internally, a new Docker image is created and stored in the Docker registry under a specific version. Models are therefore versioned automatically, and you don’t have to manage versions yourself.

Next, let’s update the existing seq2seq application. Go to the Applications page, select the seq2seq application, and click the Edit button. Add a new stage with the tokenizer model before the original TensorFlow model, and update the application.
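Conceptually, the updated application now behaves like function composition: each stage receives the previous stage’s output. The toy sketch below illustrates only that data flow. Both stage functions are hypothetical stand-ins, and the "output" key is invented. It also assumes, as the contract’s output name suggests, that the TensorFlow model reads a field called input_data.

```python
from typing import Callable, Dict, List

Payload = Dict[str, List[str]]
Stage = Callable[[Payload], Payload]


def tokenizer_stage(request: Payload) -> Payload:
    # Stand-in for the tokenizer model: lowercase and split off "," and "!".
    cleaned = [s.lower().replace(",", " ,").replace("!", " !") for s in request["x"]]
    return {"input_data": cleaned}


def seq2seq_stage(request: Payload) -> Payload:
    # Stand-in for the TensorFlow model: echoes the input it would receive.
    return {"output": request["input_data"]}


def run_pipeline(stages: List[Stage], request: Payload) -> Payload:
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        request = stage(request)
    return request
```

Running the stages in the opposite order raises a KeyError, because the stand-in seq2seq stage finds no input_data field. The order of stages matters.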

Note: be careful when adding a new model to the pipeline. Make sure you do this with the Add New Stage button at the bottom of the UI, not by selecting an additional model in the existing stage.

Let’s test whether everything works. We will do that via the REST API.

$ curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"x": [["Hello, there!"]]}' "localhost/gateway/applications/seq2seq/seq2seq"
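The same request can be sent from Python using only the standard library. This sketch assumes the gateway URL and payload shape from the curl call above, with http:// added explicitly because curl assumes it. build_request and query_seq2seq are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

# Gateway URL from the curl example; adjust host and application name as needed.
GATEWAY_URL = "http://localhost/gateway/applications/seq2seq/seq2seq"


def build_request(sentence: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Wrap one sentence in the same JSON shape as the curl call: {"x": [[...]]}."""
    body = json.dumps({"x": [[sentence]]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def query_seq2seq(sentence: str, url: str = GATEWAY_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(sentence, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With ML Lambda running locally, query_seq2seq("Hello, there!") returns the decoded JSON response. The response’s exact structure depends on the application’s output contract.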

This yields the desired result.


For more information about applications’ interfaces, refer to the documentation.


In this article we’ve updated a previously defined application by turning it into a pipeline that consists of a pre-processing model and the seq2seq model itself. The pre-processing model properly tokenizes incoming sentences, which resolves the issue with <UNK> tokens.