TensorFlow model deployment options
Moving a TensorFlow model from a Python notebook to production can be a challenging task. Read this post to learn about the three major deployment options available, their strengths, and their weaknesses!
Client-side
TensorFlow.js is a JavaScript library that allows you to run TensorFlow models directly in a web browser.
Pros:
- No network latency for model inference
- Offline inference once the model is downloaded
- Privacy-friendly: data never leaves the device, which helps with GDPR compliance
- No server costs
Cons:
- Initial model download and loading time (especially for large models)
- Device storage and battery consumption limitations
- Many modern TensorFlow features are not supported
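Deploying client-side starts with converting the trained model into the format TensorFlow.js can load. A minimal sketch of assembling the official `tensorflowjs_converter` command (the paths below are hypothetical, and the tool itself comes from `pip install tensorflowjs`):

```python
import subprocess

def build_convert_command(saved_model_dir, output_dir):
    # tensorflowjs_converter turns a TensorFlow SavedModel into the
    # model.json + sharded-weights layout that tf.loadGraphModel() reads
    # in the browser.
    return [
        "tensorflowjs_converter",
        "--input_format=tf_saved_model",
        saved_model_dir,
        output_dir,
    ]

cmd = build_convert_command("./saved_model", "./web_model")
# subprocess.run(cmd, check=True)  # uncomment once tensorflowjs is installed
```

The resulting `./web_model` directory is then served as static files next to your page, so the browser fetches it like any other asset.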
Serverless
Deploy the TensorFlow model as a serverless application, for example with AWS Lambda or Google Cloud Functions.
Pros:
- Cheap (pay-as-you-go)
- Scalable
- Reliable
Cons:
- Response latency on a “cold start”: this approach needs time to initialize the environment and load dependencies before the first request is served. The effect can be mitigated by keeping a constant pool of “warmed-up” functions, but that requires additional setup effort.
- Function execution timeout limit
- Environment memory limitations
- Deployment package size limitations
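The cold-start cost above is why serverless handlers conventionally load the model at module scope: that code runs once per container, so warm invocations skip the expensive step. A minimal sketch in the AWS Lambda handler style, where the model is a stand-in (a real function would call something like `tf.keras.models.load_model` on a hypothetical path):

```python
import json

def _load_model():
    # Stand-in for an expensive TensorFlow load; in a real function this
    # would be e.g. tf.keras.models.load_model("/opt/model")  (hypothetical path)
    return lambda rows: [sum(row) for row in rows]

# Module scope executes once per container ("cold start"); subsequent
# warm invocations reuse MODEL without reloading it.
MODEL = _load_model()

def handler(event, context=None):
    # Lambda proxy-style event: the request body is a JSON string.
    instances = json.loads(event["body"])["instances"]
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": MODEL(instances)}),
    }
```

Keeping within the platform's memory and package-size limits usually means shipping a slimmed-down runtime (e.g. `tensorflow-cpu` or a TensorFlow Lite interpreter) rather than the full library.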
Server-based
Deploy a full-fledged server that serves the model via an API, using Flask, FastAPI, or TensorFlow Serving.
Pros:
- Lower latency (compared to serverless)
- Scalable infrastructure and costs (if set up properly)
- API and model versioning out-of-the-box (TensorFlow Serving)
Cons:
- Requires solid DevOps skills to set up
- Maintenance time
- An idle server still costs you money
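Once TensorFlow Serving is up, clients talk to its documented REST endpoint (`/v1/models/<name>[/versions/<v>]:predict` with an `{"instances": ...}` JSON payload). A standard-library-only sketch of building such a request (host, model name, and input values below are hypothetical):

```python
import json
from urllib import request

def make_predict_request(host, model_name, instances, version=None):
    # TensorFlow Serving REST path: /v1/models/<name>[/versions/<v>]:predict
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode()
    return request.Request(
        f"http://{host}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = make_predict_request("localhost:8501", "my_model", [[1.0, 2.0, 3.0]])
# with request.urlopen(req) as resp:          # needs a running server
#     print(json.load(resp)["predictions"])
```

Pinning `version` is what gives you the out-of-the-box model versioning mentioned above: old and new model versions can be served side by side from the same server.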
Conclusion
TensorFlow.js lags well behind the main TensorFlow library, and its limited feature support is the biggest issue. Unless your model uses only basic layers, be prepared for this option to cause you some trouble.
Serverless is a great choice for proof-of-concept projects or a startup’s minimum viable product. This option may also suit you well if “cold start” latency is acceptable, or if you are ready to deal with it by “warming up” functions beforehand. In that case, most importantly, be careful to fit within the memory and timeout limits of your platform!
The server-based option is the best choice for professional projects where high request throughput is expected. It requires certain skills to set up, but pays off with great response times and scalability if configured properly. Evaluate the expected number of requests per day for your deployment first: if it is low, even an efficient, scalable infrastructure with a single idle virtual machine may cost more than a serverless deployment.