Great post, and thanks for the contribution on serving multiple model versions.
For scalability and failover, I would advise using Kubernetes. You can build the Docker image of TensorFlow Serving yourself or just download the one from Google. With Kubernetes, you can deploy the models with one command and take advantage of load balancing and fault tolerance.
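As a minimal sketch of what that one command looks like (the deployment name, model name, and model path here are placeholders, not from the post), a Deployment plus a Service using the public `tensorflow/serving` image can be applied with `kubectl apply -f tf-serving.yaml`:

```yaml
# tf-serving.yaml — hypothetical example; adjust names, replicas, and model paths.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
spec:
  replicas: 3                      # Kubernetes keeps 3 pods running (fault tolerance)
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
      - name: tf-serving
        image: tensorflow/serving  # the prebuilt image from Google
        args:
        - "--model_name=my_model"               # placeholder model name
        - "--model_base_path=/models/my_model"  # placeholder model path
        ports:
        - containerPort: 8500      # gRPC
        - containerPort: 8501      # REST
---
apiVersion: v1
kind: Service
metadata:
  name: tf-serving
spec:
  type: LoadBalancer               # spreads requests across the replicas
  selector:
    app: tf-serving
  ports:
  - name: grpc
    port: 8500
  - name: rest
    port: 8501
```

If a pod dies, the Deployment replaces it, and the Service keeps routing traffic to the healthy replicas.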
Actually, we have implemented a Cloud Machine Learning service, much like Google Cloud ML, so anyone can submit training jobs or create a model service very easily.