TensorFlow model deployment options

Dmitry Yemelyanov
Riga Data Science Club
3 min read · Sep 1, 2020

Moving a TensorFlow model from a Python notebook to production can be a challenging task. Read this post to learn about the three major deployment options, along with their strengths and weaknesses!

Client-side

TensorFlow.js is a JavaScript library that runs TensorFlow models directly in a web browser.
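
As a minimal sketch, a trained Keras model can be exported to the TensorFlow.js web format with the tensorflowjs Python package (the file paths below are placeholders, not part of any particular project):

```python
# Sketch: export a trained Keras model for in-browser inference.
# Assumes `pip install tensorflowjs`; paths are placeholders.
import tensorflow as tf
import tensorflowjs as tfjs

model = tf.keras.models.load_model("my_model.h5")
tfjs.converters.save_keras_model(model, "web_model/")
# web_model/ now contains model.json plus weight shards, which the
# browser loads with tf.loadLayersModel("web_model/model.json").
```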

Pros:

  • No network latency for model inference
  • Offline inference once the model is downloaded
  • Privacy-friendly: user data never leaves the device, which helps with GDPR compliance
  • No server costs

Cons:

  • Initial model loading time (especially for large models)
  • Device storage and battery consumption limitations
  • Many modern TensorFlow features are not supported

Serverless

Deploy the TensorFlow model as a serverless application, for example with AWS Lambda or Google Cloud Functions.
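
For illustration, here is a minimal sketch of an HTTP Google Cloud Function (the entry point name, model path, and JSON schema are my assumptions, not a definitive implementation). Note that the model is loaded at module level, so that cost is paid once per cold start and warm invocations reuse the loaded model:

```python
# Sketch of an HTTP Cloud Function serving a TensorFlow model.
# Entry point name, model path, and JSON schema are placeholders.
import tensorflow as tf

# Loaded once per container instance (the "cold start");
# subsequent warm invocations reuse the already-loaded model.
model = tf.keras.models.load_model("model/")

def predict(request):
    """HTTP entry point; `request` is a Flask Request object."""
    payload = request.get_json(silent=True)
    inputs = tf.constant(payload["instances"], dtype=tf.float32)
    return {"predictions": model(inputs).numpy().tolist()}
```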

Pros:

  • Cheap (pay-as-you-go)
  • Scalable
  • Reliable

Cons:

  • Response latency on a “cold start”: the function needs time to initialize and load its dependencies. This effect can be mitigated by keeping a constant pool of “warmed-up” functions, but that requires additional effort to set up.
  • Function execution timeout limit
  • Environment memory limitations
  • Deployment package size limitations

Server-based

Deploy a full-fledged server that serves the model via an API, built with Flask, FastAPI, or TensorFlow Serving.

[Figure: TensorFlow Serving on a Google Kubernetes cluster]
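
As a rough sketch, a model could be wrapped with FastAPI like this (the model path, endpoint, and request schema are assumptions; TensorFlow Serving would instead expose its own REST/gRPC API with built-in versioning):

```python
# Sketch: serving a TensorFlow model behind a FastAPI endpoint.
# Paths and field names are placeholders.
from typing import List

import tensorflow as tf
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = tf.keras.models.load_model("model/")  # loaded once at startup

class PredictRequest(BaseModel):
    instances: List[List[float]]

@app.post("/predict")
def predict(req: PredictRequest):
    inputs = tf.constant(req.instances, dtype=tf.float32)
    return {"predictions": model(inputs).numpy().tolist()}
```

Run it with `uvicorn main:app`; behind a load balancer or on Kubernetes it scales horizontally.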

Pros:

  • Lower latency (compared to serverless)
  • Scalable infrastructure and costs (if set up properly)
  • API and model versioning out-of-the-box (TensorFlow Serving)

Cons:

  • Requires solid DevOps skills to set up
  • Maintenance time
  • An idle server will still cost you money

Conclusion

TensorFlow.js lags well behind the main TensorFlow library, and the lack of feature support is its main issue. Unless your model uses only basic layers, be prepared for this option to cause some trouble.

Serverless is a great choice for proof-of-concept projects or a startup’s minimum viable product. This option may also suit you well if you are not worried about latency and “cold starts”, or are ready to deal with them by “warming up” functions beforehand. In that case, most importantly, make sure your model fits within the memory and timeout limits of your platform!

The server-based option is the best for professional projects where high request throughput is expected. It requires certain skills to set up, but will pay off with great response times and equal scalability if configured properly. Evaluate the likely daily request count for your deployment first: if it is low, even an efficient and scalable infrastructure with a single idle virtual machine might cost more than a serverless deployment.
