Deploying your Keras model

Demo | Sourcecode

My colleague Gidi Shperber and I thought it would be interesting to develop a small server (or possibly a mobile app) that would use deep learning to automatically remove the background of images to create an effect similar to a green screen, and allow people to create studio-like photos, or compose images together. As an initial step, we wanted a minimalistic server that would allow us to test the interest in such a service. As such, I thought it would be interesting to share our experience in deploying such a service.

Once you are done architecting your model, training it and engineering the pipeline, you might want to deploy it as a web-service to run on the server side. In the next blog post we will consider a client side solution.

Our tools

  1. Flask — A Python micro-framework which is easy to set up. Please note that if you need to wrap your service in more logic and a richer web toolbox (user login, asset pipeline, CSS frameworks, database, ORM…), you might want to consider separating the user-facing web server from the Keras model serving, and have the web server talk to the Keras service. In this example both of these functions will live under the same server.
  2. GitLab — GitLab offers free private repositories for individuals and groups, as well as shared CI (Continuous Integration) workers which will help us prepare and build our server’s Docker image.
  3. Heroku — Heroku is a PaaS (Platform as a Service) solution which takes away most of the DevOps work needed to set up a server. This, of course, is a compromise, since Heroku does not offer GPU instances. Ideally we would serve our model from an AWS GPU instance, but that requires more DevOps work and would be a topic for a separate blog post. Nonetheless, considering how easy Heroku is to use, it is worth trying as a first step to see how your model performs there. Heroku offers a few different levels of server compute resources.
  4. Docker — A containerization solution which allows us to set up easily reproducible environments that run the same way on our local dev machine and on production/Heroku. As a side note, Nvidia provides nvidia-docker, a tool that allows for GPU-accelerated containers, so once you’re ready to move your model to an AWS GPU instance, this should be an easy switch. Since this article is directed mostly at data scientists, you can think of Docker as a parallel to virtualenv at the OS level: it lets you install all dependencies, servers, services and configurations into a container and launch it easily on any machine, scaling it up or down.
  5. Git LFS (Large File Storage) — As model weight files are usually larger than a few MB, it is advised not to store them in regular git, but rather to use a separate git tool for your weights. Follow the instructions to install Git LFS locally on your machine.

Let’s look ahead at what our workflow and deployment strategy will look like:

  1. Every code change is committed to the local (development machine) git repository and pushed to the remote git repository which in this case will be hosted by GitLab
  2. Once a change has been pushed to GitLab, GitLab’s CI system will pick up the changes and run a build process that uses the Dockerfile to create a Docker image
  3. The Docker image will be pushed to Heroku’s container registry and used to run your server

Setting up your local Flask server

As a starting point, we will use Heroku’s example for Miniconda deployed using Docker containers.
If you are familiar with Heroku buildpacks, you might notice that deploying with Docker is a bit different. Usually, when deploying code to Heroku, you would rely on Heroku’s buildpacks and do a simple git push heroku for your code, which triggers Heroku’s build system and creates a slug that is used to run your server.
Deploying to Heroku using Docker containers works differently: your local Docker builds an image, which is then pushed to the Heroku registry to be served.

Here are the steps to get started

curl -OL
mv python-miniconda-master background-removal-server
cd background-removal-server
git init
git add .
git commit -m 'initial commit'

Setting up Git LFS

Copy your model H5 file into the webapp/model directory and then run:

git lfs track "webapp/model/*" # Quoted so the shell does not expand the pattern; this creates a .gitattributes file
git add .
git commit -m 'model'

This might be the right time to go to the GitLab homepage, create a new repository for the project and push your models to it:

git remote add origin
git push -u origin master # Watch your models upload with Git LFS

Writing the server logic

First, let’s write a few lines which will preload our Keras model, along with a helper function which we’ll use to predict.

Then create an endpoint which will run the prediction and return the result.
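The two pieces described above could look roughly like the sketch below. It is not the original server code. The model file name, the input size, and the assumption that the model outputs one foreground probability per pixel are all placeholders to adapt to your own model. The /predict route and the file form field match the curl test used later. The heavy libraries are imported inside the functions so that the small helpers stay importable on their own.

```python
import io
import os

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}
MODEL_PATH = os.path.join("model", "model.h5")  # hypothetical file name
INPUT_SIZE = (224, 224)  # (width, height); hypothetical, use your model's input size


def allowed_file(filename):
    """Return True if the upload has an image extension we accept."""
    name, dot, extension = filename.rpartition(".")
    return bool(name and dot) and extension.lower() in ALLOWED_EXTENSIONS


def predict_mask(model, image):
    """Run the model on a PIL image and return a grayscale mask of the same size."""
    import numpy as np
    from PIL import Image

    width, height = INPUT_SIZE
    resized = image.convert("RGB").resize((width, height))
    # Shape (1, height, width, 3), scaled to [0, 1]
    batch = np.asarray(resized, dtype="float32")[np.newaxis] / 255.0
    # Assumption: the model returns one foreground probability per pixel
    probabilities = np.asarray(model.predict(batch))[0].reshape(height, width)
    probabilities = np.clip(probabilities, 0.0, 1.0)
    mask = Image.fromarray((probabilities * 255).astype("uint8"))
    return mask.resize(image.size)


def create_app():
    """Application factory: loads the model once and registers the endpoint."""
    from flask import Flask, abort, request, send_file
    from keras.models import load_model
    from PIL import Image

    app = Flask(__name__)
    model = load_model(MODEL_PATH)  # preloaded at startup, not on every request

    @app.route("/predict", methods=["POST"])
    def predict():
        upload = request.files.get("file")
        if upload is None or not allowed_file(upload.filename or ""):
            abort(400)
        image = Image.open(upload.stream)
        mask = predict_mask(model, image)
        # Paste the original onto a white canvas wherever the mask is set
        result = Image.new("RGB", image.size, "white")
        result.paste(image.convert("RGB"), mask=mask)
        buffer = io.BytesIO()
        result.save(buffer, format="JPEG")
        buffer.seek(0)
        return send_file(buffer, mimetype="image/jpeg")

    return app
```

Recent Flask versions can discover a create_app factory automatically when the module is saved under a name the flask command looks for, such as app.py. Otherwise point FLASK_APP at the module.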

Note: Every time you save your code in the editor, Flask reloads the entire code, which also reloads your model and causes long wait times. To avoid this while iterating on the server side, consider commenting out the model preloading and stubbing the prediction code so that it returns pre-calculated results.
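One way to do the stubbing without commenting code in and out is a small switch. This is only a sketch: the STUB_MODEL environment variable, the StubModel class and the model path are hypothetical names introduced for illustration.

```python
import os


class StubModel:
    """Stands in for the Keras model and returns a fixed, pre-calculated answer."""

    def __init__(self, canned_output):
        self.canned_output = canned_output

    def predict(self, batch):
        # One canned result per input item, mimicking batch-in, batch-out
        return [self.canned_output for _ in batch]


def get_model(use_stub, canned_output=None,
              model_path=os.path.join("model", "model.h5")):
    """Return the stub while iterating on server code, the real model otherwise."""
    if use_stub:
        return StubModel(canned_output)
    # Imported only when needed, since importing Keras and loading weights is slow
    from keras.models import load_model
    return load_model(model_path)


# Hypothetical flag: start the server with STUB_MODEL=1 to skip model loading
USE_STUB = os.environ.get("STUB_MODEL") == "1"
```

With this in place, starting the server as STUB_MODEL=1 FLASK_DEBUG=1 flask run would make reloads quick, and dropping the flag would bring the real model back.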

To run the server use:

cd webapp
FLASK_DEBUG=1 flask run

You can test your server from the CLI with a simple curl command, possibly something like this if you’re returning a JPG output:

curl -o out.jpg -F "file=@/Users/---/Downloads/camvid1.jpg" http://localhost:5000/predict

Now that everything is working and running, let’s commit

git add . 
git commit -m 'working model prediction'

Deploying to Heroku containers using Gitlab-CI

First, maybe an explanation is due: why use a CI in this context? Pushing a Docker image every time you deploy requires uploading the entire image, which includes all the installed Python dependencies and the model weights. This can become heavy and requires decent upload bandwidth. Instead, this can be done automatically, so that every time you push your code to GitLab, a new build process spins up on GitLab. A GitLab shared worker will build your Docker image and push it to the Heroku registry to run. More generally, having a CI makes sure your master branch is always in sync with production and frees you from yet another DevOps task, deployment.

Let’s set up Heroku and create a free server. Make sure the Heroku toolbelt is installed, then run:

heroku create background-removal

This will create an empty server waiting for you to push code.

Now that we have a server ready, let’s add our CI process which will automatically deploy our code.

In order to let GitLab deploy from its CI server, go to your Heroku dashboard and copy the API Key. Then, back in your GitLab project page, under Settings > CI/CD Pipelines, add a variable in the “Secret Variables” section with the key HEROKU_AUTH_TOKEN and paste the API key you copied as its value.

Add a .gitlab-ci.yml file to your project’s root
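As a rough guide to what such a job’s script has to do, the sketch below composes the three Docker commands involved: log in to Heroku’s registry with the token, build the image, and push it. It is written in Python only to make the naming convention explicit. The function names are hypothetical, and the app name is the one created with heroku create above. The registry host and the registry.heroku.com/<app>/<process type> image name follow Heroku’s container registry convention.

```python
import os
import subprocess

REGISTRY = "registry.heroku.com"


def deploy_commands(app_name, token, process_type="web"):
    """Return the docker commands that publish an image for a Heroku app."""
    image = f"{REGISTRY}/{app_name}/{process_type}"
    return [
        # Heroku's registry accepts "_" as the user name with an API token
        ["docker", "login", "--username=_", f"--password={token}", REGISTRY],
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
    ]


def deploy(app_name, token, dry_run=True):
    """Print the commands (token masked) or actually run them."""
    for command in deploy_commands(app_name, token):
        if dry_run:
            print(" ".join(command).replace(token, "***"))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # HEROKU_AUTH_TOKEN is the secret variable configured in GitLab
    token = os.environ.get("HEROKU_AUTH_TOKEN") or "<token>"
    deploy("background-removal", token, dry_run=True)
```

In the CI file itself, the same three commands would sit in the job’s script section. Depending on your Heroku setup, a separate release step (heroku container:release web) may also be needed after the push.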


Once you commit and push this file, open your GitLab project in the browser and check out its pipeline (example)

Gitlab CI log pushing your Keras model server to Heroku using a docker image

Once the build finishes successfully your new server should be live and available at

Note that a free dyno has only 512MB of memory, which may crash your app at times, so you might want to consider switching to a beefier Heroku dyno config. Use heroku logs -t to see what caused an exception or crash.
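Before deploying, it can help to check locally how close the server process gets to that limit. The sketch below uses only the standard library and works on Unix-like systems. The 80% headroom factor is an arbitrary assumption, and the function names are hypothetical.

```python
import resource
import sys

DYNO_LIMIT_MB = 512  # memory of the free dyno mentioned above


def peak_memory_mb():
    """Peak resident memory of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def fits_in_dyno(used_mb, limit_mb=DYNO_LIMIT_MB, headroom=0.8):
    """True if usage stays under a safety fraction of the dyno's memory."""
    return used_mb <= limit_mb * headroom


if __name__ == "__main__":
    used = peak_memory_mb()
    print(f"peak memory: {used:.0f} MB, fits in dyno: {fits_in_dyno(used)}")
```

Calling peak_memory_mb() after the model has been loaded and has served one prediction gives a rough idea of whether the free dyno is realistic for your model.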

Option 2: Run model on the client-side with KerasJS

Running your model in the browser can save you server costs, since the rest of the site can be served as static files, possibly at no cost at all. However, the browser environment is still not as powerful as a beefy GPU AWS instance, and it is much less controlled than a server (i.e. more prone to break because of browser incompatibilities). Major improvements are still on their way thanks to WebAssembly and WebGPU (see webdnn).

We will review this option in the next post.