Dobermanifesto is a video microblogging network exclusively for pets. Animal-based videos can be uploaded worldwide and sent anywhere to be viewed and experienced.
This group reached out to me because they were noticing that their throughput and latency on the Google Cloud Platform wasn't what they'd hoped, given their recent boom in... pup-ularity. (HA!)
The problem itself is a very common one: while transferring data to/from their GCE backends, their observed bandwidth was not as high as they were hoping for:
Ruff try at confirming
What's odd about this is that I wasn't getting the same performance either; in fact, mine was worse!
Clearly I was doing something wrong with my tests, so I asked their company for a more detailed set of reproduction steps.
I updated my test to use 1 vCPU instead and, behold, got almost the same numbers they did:
*Facepalm* Nooooooooow I remember.
The #Cores -> Gbps correlation
The Compute Engine documentation states:
Outbound or egress traffic from a virtual machine is subject to maximum network egress throughput caps. These caps are dependent on the number of vCPUs that a virtual machine instance has. Each core is subject to a 2 Gbits/second (Gbps) cap for peak performance. Each additional core increases the network cap, up to a theoretical maximum of 16 Gbps for each virtual machine
Which means that the more virtual CPUs in a guest, the more networking throughput you will get.
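That rule is simple enough to sketch as a tiny helper. The 2 Gbps-per-vCPU figure and the 16 Gbps ceiling come straight from the documentation quoted above; the function name is mine:

```python
def egress_cap_gbps(vcpus: int) -> int:
    """Theoretical egress cap: 2 Gbps per vCPU, up to a 16 Gbps ceiling."""
    return min(2 * vcpus, 16)

# Caps for a few common n1-standard sizes.
for vcpus in (1, 2, 4, 8, 16):
    print(f"n1-standard-{vcpus}: {egress_cap_gbps(vcpus)} Gbps cap")
```

So a 1 vCPU instance tops out at a theoretical 2 Gbps, which lines up with the numbers both of us were seeing.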
To figure out what this looks like in practice, I set up groups of instances at a number of different core counts, all in the same zone, and ran iperf between them a bunch of times.
You can clearly see that as the core count goes up, so do the average and max throughput, and even with our simple testing, we can see the hard 16 Gbps limit on the larger machines.
NOTE: if you run iperf with multiple parallel threads (~8 or so), you can exceed 10 Gbps, up to about 16 Gbps today, using an n1-standard-16 or larger.
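For reference, a multi-threaded run like that looks roughly like the following. This is a sketch, not my exact invocation; `RECEIVER_INTERNAL_IP` is a placeholder for the server instance's internal IP, and `-P 8` asks the iperf client for 8 parallel streams:

```shell
# On the receiving instance: start an iperf server.
iperf -s

# On the sending instance: run for 30 seconds with 8 parallel client threads.
iperf -c RECEIVER_INTERNAL_IP -t 30 -P 8
```

A single stream often won't saturate the cap on the larger machines, which is why the parallel flag matters here.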
The fix is in!
The Dobermanifesto team took a look at the pricing list, the network throughput graphs I generated for them, and some profiling of their CPU usage, and decided to go with an n1-standard-4 machine, which gave them almost a 4x increase in average throughput while remaining cheaper than the n1-standard-8 machines.
One of the nice things about their move to the larger machine is that it actually runs less frequently. It turns out their machines were spending a lot of time staying awake just to transfer data. With the new machine size, their instances had more downtime, allowing the load balancer to reduce the total number of instances on a daily basis. So on one hand, they ended up paying for a higher-grade machine, but on the other hand, they needed fewer core-hours on a monthly basis.
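The back-of-the-envelope math behind that tradeoff looks something like this. The hourly rates and instance-hour counts below are hypothetical placeholders for illustration only, not Dobermanifesto's real numbers or GCP's real prices:

```python
# Hypothetical hourly rates; substitute current GCP pricing.
HOURLY_RATE = {"n1-standard-1": 0.05, "n1-standard-4": 0.20}

def monthly_cost(machine_type: str, instance_hours: float) -> float:
    """Total monthly spend for a given machine type and instance-hours used."""
    return HOURLY_RATE[machine_type] * instance_hours

# Faster transfers mean each instance finishes sooner, so the fleet
# accumulates far fewer instance-hours per month overall.
before = monthly_cost("n1-standard-1", 4000)  # many small, always-busy instances
after = monthly_cost("n1-standard-4", 800)    # fewer, faster instances

print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo")
```

With numbers like these, the pricier per-hour machine still comes out ahead once the reduced instance-hours are factored in.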
Which goes to show that once your performance directly impacts the bottom line, there are a lot of nuanced tradeoffs to consider.
So keep calm, profile your code, and always remember that #perfmatters.