Reducing GPU Workload Costs on Oracle Cloud using Kubernetes

Netra, Inc.
Published in Netra Blog
4 min read, May 1, 2018

Netra develops image and video recognition APIs to help enterprises index and make sense of their visual data. We use containers to run our GPU workloads, which gives us the flexibility to deploy across almost every major cloud provider.

Netra was recently accepted into the Oracle Scaleup Ecosystem program and granted technical support and cloud resources to help scale our solution on Oracle’s Cloud platform.

We’re still in the process of moving our production environment from other cloud platforms onto Oracle Cloud, but we wanted to share a few things we’ve learned along the way. We’ll discuss what we found when using Terraform to deploy Kubernetes, and the performance and cost advantages of GPU hardware on Oracle Cloud compared to other cloud platforms we’ve previously used.

Deployment on Oracle Cloud using Terraform

Setting up Kubernetes with GPU support on Oracle is a breeze using the Oracle-maintained terraform-kubernetes-installer. The prerequisites (such as Terraform itself and the Terraform OCI provider) are listed in the project README along with installation instructions.

The trick is to use the right Oracle Linux image (the latest one that supports GPUs at the time of writing is Oracle-Linux-7.4-Gen2-GPU-2018.02.21-1). We used Kubernetes 1.9.6.

One caveat is that it only seems to work with bare metal machines, such as the BM.GPU2.2. Note that the Oracle Terraform script is designed for the Oracle Cloud Infrastructure (OCI) service, where the US-ASHBURN-1 data center currently supports only P100 and V100 bare metal GPU instances, with virtual machines coming soon.

Performance and Pricing Comparison

Once we were up and running, we ran a performance test to see how the new setup on Oracle Cloud compared to a similarly specced setup on Google Cloud, which we’ve historically used.

We deployed 2X P100 instances running Netra’s Logo Recognition model in each cloud environment. The test measured the time to process 1000 images in each environment. Processing an image involves downloading, decoding, and classifying it with Netra’s Logo Recognition model.
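For readers who want to run a similar test, here is a minimal sketch of a timing harness for the download, decode, and classify steps described above. It is not our actual benchmark code: the `benchmark` function and its three callable parameters are hypothetical names chosen for illustration, and the stand-in stages at the bottom only exist so the sketch runs without a network connection or a GPU.

```python
import time
from typing import Callable, Iterable


def benchmark(
    urls: Iterable[str],
    download: Callable[[str], bytes],
    decode: Callable[[bytes], object],
    classify: Callable[[object], object],
) -> dict:
    """Time the download -> decode -> classify pipeline over a batch of URLs."""
    count = 0
    start = time.perf_counter()
    for url in urls:
        raw = download(url)    # fetch the image bytes
        image = decode(raw)    # turn the bytes into a model-ready image
        classify(image)        # run the (hypothetical) recognition model
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "images": count,
        "total_seconds": elapsed,
        "ms_per_image": 1000.0 * elapsed / count if count else 0.0,
    }


if __name__ == "__main__":
    # Stand-in stages (assumptions): replace with real download/decode/model calls.
    urls = [f"https://example.com/{i}.jpg" for i in range(1000)]
    result = benchmark(
        urls,
        download=lambda url: url.encode(),
        decode=lambda raw: raw,
        classify=lambda image: "logo",
    )
    print(result)
```

Timing the whole loop, rather than only the model call, matches the definition of "processing an image" used here, so network and CPU decoding time count toward the result.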

Networking costs may vary subject to bandwidth usage. Oracle GPU specs and pricing can be found here.

On Oracle Cloud, the setup processed each image in a matter of milliseconds, about 30% faster than on Google Cloud. The additional CPU resources on the Oracle instance were likely the main contributor to that speed advantage. But the biggest advantage we see is pricing: Oracle Cloud is 62% cheaper than Google Cloud, and it comes with more CPU resources alongside the 2X P100s. Higher performance and lower pricing than Google Cloud... now that’s a value proposition!
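The two percentages can be combined into a rough cost-per-image comparison. The sketch below is a back-of-the-envelope calculation, not a measured result: `relative_cost_per_image` is a hypothetical helper, and because "30% faster" can be read either as 1.3x the throughput or as 30% less time per image, both readings are shown as labeled assumptions.

```python
def relative_cost_per_image(price_ratio: float, throughput_ratio: float) -> float:
    """Cost per image on provider A relative to provider B.

    price_ratio: hourly price of A divided by hourly price of B.
    throughput_ratio: images per hour on A divided by images per hour on B.
    """
    if price_ratio <= 0 or throughput_ratio <= 0:
        raise ValueError("ratios must be positive")
    return price_ratio / throughput_ratio


# "62% cheaper" means Oracle's hourly price is 0.38x Google's.
price_ratio = 1.0 - 0.62

# Assumed reading 1: "30% faster" means 1.3x the throughput.
as_throughput = relative_cost_per_image(price_ratio, 1.30)

# Assumed reading 2: "30% faster" means each image takes 30% less time.
as_latency = relative_cost_per_image(price_ratio, 1.0 / 0.70)

print(f"{as_throughput:.3f} {as_latency:.3f}")
```

Under either reading, the cost per processed image on Oracle Cloud comes out at roughly 0.27 to 0.29 times the Google Cloud figure, before networking costs.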

Note: this analysis only compares Google Cloud to Oracle Cloud, although we’ve also previously deployed on Azure and AWS. We did not include Azure because, at $4.14 / hour for 2X P100s, it is considerably more expensive than Google Cloud. We did not include AWS because it does not offer P100s, although it does offer the more powerful V100 GPUs.

What’s Next

Oracle Cloud clearly beats Google Cloud on both performance and price, two major variables for us when choosing a cloud provider. That said, there are a few Google Cloud features that we’ll miss. Google Cloud makes it easy to manage cloud resources and to automatically scale up new instances on demand.

One Google Cloud feature that we’ve recently grown to love is auto-scaling, which keeps us from overpaying for unused resources. Once the lag in our API queue builds up to a set threshold, Google Cloud automatically adds resources to help chew through the spike in volume. It also automatically scales resources back if they operate below a set utilization level for a certain amount of time.
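The scaling rule described above can be sketched as a small decision function. This is an illustration of the idea only, not Google Cloud’s autoscaler or its API: `ScalingPolicy`, `decide`, and every threshold value are hypothetical, and adding or removing one instance per check is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScalingPolicy:
    lag_threshold: int          # queue lag that triggers a scale-up
    min_utilization: float      # below this, the pool counts as idle
    idle_periods_required: int  # consecutive idle checks before a scale-down
    min_instances: int = 1
    max_instances: int = 10


def decide(
    policy: ScalingPolicy,
    instances: int,
    queue_lag: int,
    utilization: float,
    idle_periods: int,
) -> Tuple[int, int]:
    """Return (new_instance_count, new_idle_periods) for one periodic check."""
    # Scale up as soon as the queue lag reaches the threshold.
    if queue_lag >= policy.lag_threshold:
        return min(instances + 1, policy.max_instances), 0
    # Scale down only after utilization has stayed low for long enough.
    if utilization < policy.min_utilization:
        idle_periods += 1
        if idle_periods >= policy.idle_periods_required:
            return max(instances - 1, policy.min_instances), 0
        return instances, idle_periods
    # Healthy utilization: keep the pool as is and reset the idle counter.
    return instances, 0


if __name__ == "__main__":
    policy = ScalingPolicy(lag_threshold=100, min_utilization=0.3,
                           idle_periods_required=3)
    print(decide(policy, instances=2, queue_lag=150,
                 utilization=0.9, idle_periods=0))
```

Requiring several consecutive idle checks before scaling down is what stands in for "a certain amount of time" here; it keeps a brief lull from removing capacity that a spike would immediately need again.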

But Oracle Bare Metal resources provide such a unique cost optimization opportunity today that we’ll be moving our entire infrastructure onto Oracle Cloud. This includes not just our image recognition models but also our API messaging broker, Kafka, and our logging.

In the future we’re looking forward to seeing features like auto-scaling and a deeper integration with Kubernetes become available on Oracle Cloud.

Netra develops image and video recognition APIs to help enterprises structure and make sense of their visual media. Netra’s API ingests photo or video URLs and, within milliseconds, automatically tags them for visual content such as brand logos, objects, scenes, and people with demographic classification. If you’re interested in learning more, visit our website or say hello at info@netra.io!
