GCP in 2018: Machine Learning Hardware Advances

Image: Google and NVIDIA

Machine learning is a compute-intensive task. Scaling horizontally in the cloud shortens training by dividing the work among multiple machines. However, scaling vertically with newer and better hardware still has a huge impact on speed and performance. Google acts on this by adding new-generation GPUs every couple of months, but in 2018 they took it a step further and released their TPUs to the public!

Let’s begin with the good old GPUs

Google started 2018 with a bang by announcing Preemptible GPUs at a 50% discount compared to the normal price. These GPUs work the same as normal GPUs, but keep in mind that they can be preempted (shut down) at any time, with a 30-second warning to allow a proper shutdown, and that they can be used for a maximum of 24 hours. It might take a bit of effort to code around these limitations, but the price difference is worth it! In June Google cut the price of preemptible GPUs even further!
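To give an idea of what "coding around these limitations" can look like, here is a minimal sketch of a training loop that saves a checkpoint when it is asked to stop. It assumes the preemption notice reaches the training process as a SIGTERM signal (for example, forwarded by a shutdown script); that is an assumption about your setup, not something Google guarantees for every configuration. The names PreemptionGuard, train, and save_checkpoint are hypothetical.

```python
import signal


class PreemptionGuard:
    """Records a stop request so the training loop can exit cleanly."""

    def __init__(self):
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Only set a flag here; the training loop decides when to stop.
        self.stop_requested = True

    def install(self):
        # Assumption: the shutdown warning arrives as SIGTERM.
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)


def train(total_steps, guard, save_checkpoint, start_step=0):
    """Run steps until finished or until a stop is requested.

    Returns the next step to run, so a restarted job can resume from it.
    """
    step = start_step
    while step < total_steps and not guard.stop_requested:
        # ... one training step would go here ...
        step += 1
    # Save once on the way out, whether we finished or were interrupted.
    save_checkpoint(step)
    return step
```

In a real job you would call guard.install() once at startup and pass a save_checkpoint function that writes to durable storage, since the local disk of a preempted instance should not be relied on. The checkpoint also has to finish within the 30-second window.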

Image: NVIDIA

In February Google added support for GPUs in Kubernetes Engine as well. This makes it easier to scale your ML pipelines based on demand and usage and to apply resource quotas to specific namespaces. Built-in scheduling rules ensure that pods that don't need a GPU don't get scheduled onto a GPU-attached node, and Stackdriver integration adds logging and monitoring on top.
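As a rough sketch of how a pod asks for a GPU, the snippet below builds a pod manifest that requests one GPU through the nvidia.com/gpu resource limit and prints it as JSON, which kubectl accepts as well as YAML. The pod name and container image are hypothetical placeholders, and the helper gpu_pod_manifest is made up for this example.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1):
    """Build a minimal pod manifest that requests GPUs via resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    # GPUs are requested as a limit on the nvidia.com/gpu resource.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("trainer", "gcr.io/my-project/trainer:latest")
print(json.dumps(manifest, indent=2))
```

Pods that do not set this limit are the ones kept off the GPU nodes by the scheduling rules mentioned above.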

Back in 2017 we had the NVIDIA Tesla K80 and P100 GPUs available on GCP. In 2018 Google added the V100 in April and the P4 in August, and announced the T4 in November (currently in alpha). So why do we need so many different GPUs? Let’s compare!

Comparison of current offering of GPUs on GCP

The GPUs listed in the table above are sorted from older to newer. Prices are listed as they currently are, not the original prices on release. Let’s do a small calculation to get an idea of whether this is a good deal or not. The Tesla V100 costs somewhere in the range of $10–12k at retail. Taking $11k and a rate of $0.74 per hour, you could rent one for about 14,864 hours, or 619 days of training 24/7, before you reach the retail price. And that is without the cost of electricity and cooling, of course. Not a bad deal!
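The break-even calculation above is easy to redo for other cards or prices. A small sketch, using the $11k retail estimate and the $0.74 hourly rate from the text (the function name break_even is just for illustration):

```python
def break_even(retail_price_usd, hourly_rate_usd):
    """Return (hours, days) of 24/7 rental that add up to the retail price."""
    hours = retail_price_usd / hourly_rate_usd
    return hours, hours / 24


hours, days = break_even(11_000, 0.74)
print(f"{int(hours)} hours, or {int(days)} days of training 24/7")
```

Plugging in the low and high ends of the retail range ($10k and $12k) moves the break-even point, but not the conclusion.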

So, why would we need any of the more expensive GPUs? The K80 is really cheap, and its performance isn’t that bad compared to the other cards listed. The reason is that the K80 doesn’t support lower-precision floating-point or integer operations. Deep learning workloads don’t really need 32-bit precision, and lower-precision calculations are cheaper, so the newer GPUs can deliver more operations per second (OPS). You need to weigh this trade-off against your specific workload.
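To make the precision trade-off a bit more concrete, this sketch uses only Python's standard struct module to compare a 32-bit float with a 16-bit float: how many bytes each value takes and how much rounding error you get when storing 0.1. It only illustrates storage size and rounding, not GPU throughput.

```python
import struct


def roundtrip(value, fmt):
    """Pack a Python float into the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# "<f" is IEEE 754 single precision, "<e" is IEEE 754 half precision.
for name, fmt in (("FP32", "<f"), ("FP16", "<e")):
    size = struct.calcsize(fmt)
    error = abs(roundtrip(0.1, fmt) - 0.1)
    print(f"{name}: {size} bytes per value, rounding error for 0.1 = {error:.1e}")
```

Half the bytes per value means half the memory traffic, which is part of why lower-precision operations are cheaper; the price is a larger rounding error that many deep learning workloads can tolerate.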


TPU?

One Cloud TPU v2 (Image: Google)

The TPU, or Tensor Processing Unit, is a specialized Google-designed hardware accelerator for ML workloads in TensorFlow. In February Google released TPUs in beta on GCP. In June they announced preemptible pricing, making them a whopping 70% cheaper at around $2 an hour compared to the normal $6.50/h. This makes it possible to train ResNet-50 on ImageNet from scratch for just $7.50 using the TPU.
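A quick sanity check on those numbers: the sketch below applies the 70% discount to the normal rate and then works out how many TPU hours $7.50 buys. It assumes the $7.50 figure refers to the preemptible rate, which the text suggests but does not state outright.

```python
def discounted_rate(normal_rate_usd, discount):
    """Hourly rate after applying a fractional discount (0.70 means 70% off)."""
    return normal_rate_usd * (1 - discount)


preemptible = discounted_rate(6.50, 0.70)
# Assumption: the $7.50 training cost was paid at the preemptible rate.
implied_hours = 7.50 / preemptible

print(f"preemptible rate: ${preemptible:.2f}/h")
print(f"TPU time that $7.50 buys: {implied_hours:.1f} h")
```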

Each TPU 2.0 delivers about 180 TFLOPS; the TPU 3.0 would be around 360 TFLOPS. (These figures are best compared with something in between the FP16 and INT8 columns in the table above, but real performance is not as simple as comparing these numbers.)

TPU 3.0 Pod (Image: Google)

Google combines multiple TPUs into one cluster, a so-called TPU Pod. A TPU Pod V2 contains 64 TPU 2.0 boards, for a total of 11.5 PFLOPS per Pod. With the TPU Pod V3 Google claims to achieve 100+ PFLOPS.
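The Pod figure follows directly from the per-board number. A short sketch of the arithmetic, using only the values quoted in the text (the 100 PFLOPS for the V3 Pod is Google's claimed lower bound, not a measurement):

```python
TFLOPS_PER_BOARD = 180    # one TPU 2.0 board, as quoted above
BOARDS_PER_POD_V2 = 64

pod_v2_pflops = TFLOPS_PER_BOARD * BOARDS_PER_POD_V2 / 1000  # TFLOPS to PFLOPS
pod_v3_pflops = 100       # claimed "100+" for the V3 Pod

print(f"TPU Pod V2: {pod_v2_pflops:.2f} PFLOPS")
print(f"TPU Pod V3 vs V2: at least {pod_v3_pflops / pod_v2_pflops:.1f}x")
```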

Let’s give that some thought. 100 peta floating-point operations per second…

100 000 000 000 000 000

Operations EVERY SECOND

A fun comparison to grasp this number: if you take 100 petaseconds, so 100 × 10¹⁵ seconds, you get close to the age of the Earth (~143 Ps)!
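If you want to check that comparison yourself, here is the conversion as a small sketch. It assumes an age of the Earth of about 4.54 billion years and a year of 365.25 days; neither number comes from the text above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EARTH_AGE_YEARS = 4.54e9   # assumption: commonly cited estimate

# Age of the Earth expressed in petaseconds (1 Ps = 10^15 s).
earth_age_ps = EARTH_AGE_YEARS * SECONDS_PER_YEAR / 1e15

# The other direction: how many years is 100 petaseconds?
hundred_ps_in_years = 100e15 / SECONDS_PER_YEAR

print(f"Age of the Earth: about {earth_age_ps:.0f} Ps")
print(f"100 Ps: about {hundred_ps_in_years / 1e9:.2f} billion years")
```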