TensorFlow model deployment options

Dmitry Yemelyanov
Riga Data Science Club
3 min read · Sep 1, 2020

Moving a TensorFlow model from a Python notebook to production can be a challenging task. Read this post to learn about the three major deployment options, along with their strengths and weaknesses!

Client-side

TensorFlow.js is a JavaScript library that runs TensorFlow models directly in a web browser.
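
As a minimal sketch, a trained Keras model can be exported to the TensorFlow.js web format with the tensorflowjs Python package (the file paths below are placeholders, not part of any particular project):

```python
# Sketch: export a trained Keras model for in-browser inference.
# Assumes `pip install tensorflowjs`; paths are placeholders.
import tensorflow as tf
import tensorflowjs as tfjs

model = tf.keras.models.load_model("my_model.h5")
tfjs.converters.save_keras_model(model, "web_model/")
# web_model/ now contains model.json plus weight shards, which the
# browser loads with tf.loadLayersModel("web_model/model.json").
```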

Pros:

  • No network latency for model inference
  • Offline inference once the model is downloaded
  • Privacy-friendly: user data never leaves the device, which helps with GDPR compliance
  • No server costs

Cons:

  • Initial model loading time (especially for large models)
  • Device storage and battery consumption limitations
  • Many modern TensorFlow features are not supported

Serverless

Deploy the TensorFlow model as a serverless application, for example with AWS Lambda or Google Cloud Functions.
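
For illustration, here is a minimal sketch of an HTTP Google Cloud Function (the entry point name, model path, and JSON schema are my assumptions, not a definitive implementation). Note that the model is loaded at module level, so that cost is paid once per cold start and warm invocations reuse the loaded model:

```python
# Sketch of an HTTP Cloud Function serving a TensorFlow model.
# Entry point name, model path, and JSON schema are placeholders.
import tensorflow as tf

# Loaded once per container instance (the "cold start");
# subsequent warm invocations reuse the already-loaded model.
model = tf.keras.models.load_model("model/")

def predict(request):
    """HTTP entry point; `request` is a Flask Request object."""
    payload = request.get_json(silent=True)
    inputs = tf.constant(payload["instances"], dtype=tf.float32)
    return {"predictions": model(inputs).numpy().tolist()}
```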

Pros:

  • Cheap (pay-as-you-go)
  • Scalable
  • Reliable

Cons:

  • Response latency on a “cold start”: the function needs time to initialize and load its dependencies. This effect can be mitigated by keeping a constant pool of “warmed-up” functions, but that requires additional effort to set up.
  • Function execution timeout limit
  • Environment memory limitations
  • Deployment package size limitations

Server-based

Deploy a full-fledged server that serves the model via an API, built with Flask, FastAPI, or TensorFlow Serving.

[Figure: TensorFlow Serving on a Google Kubernetes cluster]
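
As a rough sketch, a model could be wrapped with FastAPI like this (the model path, endpoint, and request schema are assumptions; TensorFlow Serving would instead expose its own REST/gRPC API with built-in versioning):

```python
# Sketch: serving a TensorFlow model behind a FastAPI endpoint.
# Paths and field names are placeholders.
from typing import List

import tensorflow as tf
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = tf.keras.models.load_model("model/")  # loaded once at startup

class PredictRequest(BaseModel):
    instances: List[List[float]]

@app.post("/predict")
def predict(req: PredictRequest):
    inputs = tf.constant(req.instances, dtype=tf.float32)
    return {"predictions": model(inputs).numpy().tolist()}
```

Run it with `uvicorn main:app`; behind a load balancer or on Kubernetes it scales horizontally.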

Pros:

  • Lower latency (compared to serverless)
  • Scalable infrastructure and costs (if set up properly)
  • API and model versioning out-of-the-box (TensorFlow Serving)

Cons:

  • Requires solid DevOps skills to set up
  • Maintenance time
  • An idle server will still cost you money

Conclusion

TensorFlow.js lags well behind the main TensorFlow library, and the lack of feature support is its main issue. Unless your model uses only basic layers, be prepared for this option to cause some trouble.

Serverless is a great choice for proof-of-concept projects or a startup’s minimum viable product. This option may also suit you well if you are not worried about latency and “cold starts”, or are ready to deal with them by “warming up” functions beforehand. In that case, most importantly, make sure your model fits within the memory and timeout limits of your platform!

The server-based option is the best for professional projects where high request throughput is expected. It requires certain skills to set up, but will pay off with great response times and equal scalability if configured properly. Evaluate the likely daily request count for your deployment first: if it is low, even an efficient and scalable infrastructure with a single idle virtual machine might cost more than a serverless deployment.
