Introducing Go360's Vision APIs: leverage GPU computing via web services
Democratizing computer vision
Many of us have, in some form or another, started mucking around with GPU-accelerated computing. OK, maybe not everyone, but more and more developers are leveraging GPU computing these days. Anyone who has used GPU acceleration in libraries like OpenCV or TensorFlow understands the pain of setting it up, and that pain is often amplified when you try to containerize your environment with Docker. The setup overhead is hell, but once the container is running, you have a hardware-accelerated environment that is killer at crunching numbers. I always wondered why this process is so complicated.
Internally at Go360, I don't want our engineers worrying about setting up CUDA drivers and configuring containers that are specific to those drivers. NVIDIA Docker containers do not function properly unless the host OS and the container have the same driver version installed. It also doesn't help that Intel is behind OpenCV while CUDA is NVIDIA's proprietary framework; behind the scenes, each is working on making life difficult for the other. So even under ideal circumstances, there are booby traps that require expert tinkering to get reproducible results. And lo and behold, that system is often still not compatible with the target architecture you are developing for.
Azure, AWS, and Google Cloud were too expensive for our needs: hundreds of devices, low-latency requirements, and high frame rates.
Therefore, I have decided to democratize my work and see whether it is useful for others who have endured this pain as well. Rather than building support for every combination of GPU, OS, and CPU architecture, Go360 has decided to abstract the computation pipelines and make them available via simple primitives everyone understands. HTTP, TCP, and UDP are protocols any device can speak, and it is quite easy to send data payloads over them.
The architecture for the WebSocket interface is quite simple. We send an image to a given URL and port along with a set of flags; each flag selects a model to run. The models are then processed sequentially if you want all the results consolidated on a single image (synchronized but higher latency), or asynchronously in parallel if you want each result as soon as possible (lower latency but not synchronized).
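To make the flag mechanism concrete, here is a minimal sketch of what a client-side payload could look like. The field names (`models`, `mode`, `image`), the flag names (`objects`, `people`, `faces`), and the `build_request` helper are all hypothetical; the actual Go360 wire format is not described here.

```python
import base64
import json

# Hypothetical flag names -- the real API's flags may differ.
MODEL_FLAGS = {"objects", "people", "faces"}

def build_request(image_bytes: bytes, flags: list[str], synchronized: bool = False) -> str:
    """Encode an image plus model flags into one JSON payload for the socket.

    synchronized=True asks the server to run the selected models sequentially
    and consolidate all results on a single image (higher latency);
    synchronized=False asks for each model's result as soon as it is ready
    (lower latency, but results arrive unsynchronized).
    """
    unknown = set(flags) - MODEL_FLAGS
    if unknown:
        raise ValueError(f"unknown model flags: {unknown}")
    return json.dumps({
        "models": flags,
        "mode": "sync" if synchronized else "async",
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })

# One message requesting object and face detection, consolidated.
payload = build_request(b"\x89PNG...", ["objects", "faces"], synchronized=True)
```

The same payload could be sent over a WebSocket, raw TCP, or an HTTP POST body, which is the point of keeping the interface to such simple primitives.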
In our roadmap, we plan to make the Vision APIs publicly accessible with API keys. Our initial models will include object recognition, people detection, and face detection.
In our second phase, we plan to deploy 3D structure registration, single-image localization, OCR, and logo detection.
We also plan to incorporate the ability to move the compute closer to the data, at the edge, in the future.
Additionally, the storage/cache layer will also be implemented using Redis plus distributed file systems. This lets individual tasks focus solely on compute, memory, and network: they network-mount the storage or use the Redis cache, which makes the uptime of the service much higher.
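The caching idea can be sketched as a simple cache-aside pattern: key each result by model name and image hash, and only run the expensive GPU call on a miss. This is an illustration, not the service's actual implementation; `InMemoryCache` stands in for a `redis.Redis` client, whose `get`/`set` methods have the same shape.

```python
import hashlib

class InMemoryCache:
    """Stand-in exposing the subset of the redis-py interface used below
    (get/set). In production this would be a redis.Redis client instead."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value):
        self._store[key] = value

def cached_inference(cache, image_bytes: bytes, model: str, run_model) -> bytes:
    """Cache-aside lookup: a task only touches compute, memory, and network,
    and never recomputes a model result for an image it has already seen."""
    key = f"{model}:{hashlib.sha256(image_bytes).hexdigest()}"
    result = cache.get(key)
    if result is None:
        result = run_model(image_bytes)  # the expensive GPU call
        cache.set(key, result)
    return result

# Demonstration with a fake model that records how often it actually runs.
calls = []
def fake_model(img: bytes) -> bytes:
    calls.append(img)
    return b"detections"

cache = InMemoryCache()
first = cached_inference(cache, b"imgdata", "faces", fake_model)
second = cached_inference(cache, b"imgdata", "faces", fake_model)
```

Because the second call is served from the cache, the model runs only once; swapping the stand-in for a real Redis client moves that benefit across task restarts, which is what drives the uptime gain.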
Stay tuned for more updates as we ramp up the service. We plan to make the APIs publicly available starting in August, and in the meantime we are working with a few select partners to test them out.
Written by Sravan Puttagunta