
Internal IP vs External IP Performance

Colt McAnlis
Jul 27, 2017

We’re not getting the throughput on our VMs. I think the bald guy in those videos is lying.

Calls with customers are fun.

Google has a fantastic crew of dedicated people to help you get to the bottom of your cloud problems. I was lucky enough to sit in on a call with “Gecko Protocol,” a B2B company offering a custom, lightweight networking protocol built for gaming and other real-time graphics systems.

They had reached out to our support team because they were seeing lower-than-expected throughput on their backend machines, which were responsible for transferring and transcoding large video and graphics files.

Here’s the graph they shared with us:

[Figure: Gecko Protocol's throughput graph]

Truth is, those numbers are a lot lower than I’d expect. Let’s dig in and see what’s going on.


Simulating the same test

[Figure: results of the simulated test]

Running the same test myself, I measured 1.95 Gbits/sec, much higher than what Gecko Protocol was seeing in their graphs. To sanity-check things, I jumped on a quick video chat with their engineering team and asked them to reproduce this test.
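For readers who want to reproduce this kind of comparison, here is a minimal single-stream TCP throughput sketch in Python. It is not the tool used in the post (the post doesn't name one), and the function names and chunk size are assumptions for illustration. The idea is to run `receive` on the destination VM and `send` on the source VM, pointing it first at the destination's internal IP and then at its external IP; `loopback_demo` only checks that the harness works on a single machine.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (illustrative choice)


def receive(server: socket.socket) -> float:
    """Accept one TCP connection, drain it, and return throughput in Gbits/sec."""
    conn, _ = server.accept()
    total = 0
    with conn:
        start = time.perf_counter()
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty bytes: the sender closed the connection
                break
            total += len(data)
        elapsed = time.perf_counter() - start
    return total * 8 / elapsed / 1e9  # bytes -> bits -> Gbits/sec


def send(target_ip: str, port: int, seconds: float) -> None:
    """Stream zero bytes to target_ip:port for roughly `seconds` seconds."""
    payload = b"\x00" * CHUNK
    with socket.create_connection((target_ip, port)) as sock:
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            sock.sendall(payload)


def loopback_demo(seconds: float = 0.5) -> float:
    """Run both ends in one process over 127.0.0.1 to check the harness."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = server.getsockname()[1]
    result = {}
    rx = threading.Thread(target=lambda: result.update(gbps=receive(server)))
    rx.start()
    send("127.0.0.1", port, seconds)
    rx.join()
    server.close()
    return result["gbps"]


if __name__ == "__main__":
    print(f"loopback: {loopback_demo():.2f} Gbits/sec")
```

Between two VMs, the receiver would bind with something like `socket.create_server(("0.0.0.0", port))` on a port your firewall rules allow. A single Python stream will usually top out well below what a large VM can push, so for numbers like the ones in this post a dedicated tool such as iperf with several parallel streams is a better fit; the sketch only shows what such a test measures: bytes received, times 8, divided by elapsed time.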

After about 20 minutes of “I still don’t see the same numbers,” the cause of the problem suddenly became clear.

External vs Internal IP

I switched my test over to the external IP and got the same, much slower, results that Gecko Protocol was seeing.

[Figure: test results over the external IP]

In this test, the internal IP delivered 1.066 Gbits/sec more throughput than the external IP.
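As a rough worked example, assume the 1.95 figure from the internal-IP run and the 1.066 difference are in the same unit (Gbits/sec, the unit used later in the post) and come from the same machine configuration. The external-IP run then works out to roughly:

$$
\begin{aligned}
T_{\text{ext}} &= T_{\text{int}} - \Delta = 1.95 - 1.066 = 0.884\ \text{Gbits/sec},\\
\frac{T_{\text{int}}}{T_{\text{ext}}} &= \frac{1.95}{0.884} \approx 2.2
\end{aligned}
$$

Under those assumptions, the internal IP moved a bit more than twice as much data as the external IP in this test.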

At this point, the team quickly scrambled: one of their engineers realized they were using external IPs for all their backends, even when transferring data within the same zone. With a throughput gap that large, the bottleneck was clear.
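The fix boils down to address selection: when two backends can talk privately, send to the internal IP. Below is a minimal Python sketch of that rule. The `Vm` record, its fields, and `pick_target_ip` are hypothetical names for illustration, not part of any Google Cloud library, and the sketch assumes firewall rules already permit traffic between the VMs. It checks for a shared VPC network rather than a shared zone, since VMs on the same network can reach each other on internal IPs across zones; the same-zone case from this post is a subset of that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vm:
    """Hypothetical VM record; field names are illustrative, not a GCP API type."""
    name: str
    network: str                      # VPC network of the VM's primary interface
    internal_ip: str
    external_ip: Optional[str] = None


def pick_target_ip(src: Vm, dst: Vm) -> str:
    """Return the address src should use when sending data to dst.

    VMs on the same VPC network can reach each other on internal IPs
    (assuming firewall rules allow it), so the internal address is
    preferred; the external IP is only a fallback.
    """
    if src.network == dst.network:
        return dst.internal_ip
    if dst.external_ip is None:
        raise ValueError(f"{dst.name} is on another network and has no external IP")
    return dst.external_ip


if __name__ == "__main__":
    # Example addresses: 10.x is private space, 203.0.113.x is a documentation range.
    encoder = Vm("encoder-1", "backend-net", "10.128.0.2", "203.0.113.10")
    storage = Vm("storage-1", "backend-net", "10.128.0.3", "203.0.113.11")
    print(pick_target_ip(encoder, storage))  # 10.128.0.3
```

Making the internal address the default and the external one an explicit fallback means a misconfiguration shows up as an error rather than as silently slower transfers.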

A bigger boat

Having just tracked down a throughput problem with the Dobermanifesto group, I decided to test a higher-core-count instance to see whether that would get us closer to the numbers Gecko Protocol was seeing.

Sure enough, a same-zone transfer on a 16-vCPU machine over the external IP showed exactly the bandwidth Gecko Protocol was seeing:

[Figure: 16-vCPU same-zone test over the external IP]

When we switched to the internal IP for the same test on the larger machine, the bandwidth went through the roof:

[Figure: 16-vCPU same-zone test over the internal IP]

With the right CPU configuration and a same-zone transfer, the internal IP delivered 14.21 Gbits/sec more than the external IP.

All Done!
