Great article!
I was trying out TensorFlow Serving to host a pre-trained Inception model on Google Cloud (a machine with 1 K80 GPU, 4 CPUs, and 15 GB RAM). I saw that Serving was using the CUDA capabilities.
On my machine, a single-image inference took nearly 12 seconds! Is that expected, or am I doing something wrong? What machine configuration do you use for your Amazon EC2 instances, and what approximate response time do you get for a single request?
Thanks in advance!