Examining cross-region communication speeds in AWS

Comparing bandwidth speeds between regions connected through VPC Peering and Software VPNs

As organizations make a focused effort to migrate workloads to AWS, something that many of them may be considering is the use of multiple AWS regions for hosting resources. Utilizing multiple AWS regions provides benefits when you:

  1. Have globally distributed end users and want to provide them with low latency access regardless of their location
  2. Want to build a highly-available system immune to the effects of regional AWS failure

At the time of writing, the AWS cloud spans 18 geographically distinct regions (orange circles in Image 1), with plans announced for 4 more (green circles in Image 1) in the near future:

Image 1: AWS’s Global Infrastructure Footprint

With an AWS solution that leverages multiple regions, you’ll often find that resources in one region need to communicate with resources in another. As an example, you may have implemented a ‘Shared Services’ setup featuring a hub and spoke network topology as shown in Image 2:

Image 2: Illustration of hub and spoke network connectivity

Until late last year, connecting resources in different regions required some additional thought. One option was to create redundant (if following best practices) IPSec tunnels using a software VPN product of your choice. A second was to create a dedicated transit VPC hosting EC2-based VPN appliances. However, towards the end of 2017, AWS introduced a third option with the announcement of cross-region VPC peering support. This was a feature that AWS users had been requesting for years, and the announcement caught the attention of many.

Release of this new functionality sparked my curiosity: Other than simplifying the setup of cross-region communication, would cross-region VPC peering have any impact on the speed and/or consistency of the bandwidth between resources hosted in different regions compared to that of a software VPN based solution? And if so, to what degree?

Infrastructure Setup

To try and answer this, I configured the following resources shown in Image 3 to test bandwidth between my chosen hub region of us-east-1 (Northern Virginia) and spoke regions ap-southeast-2 (Sydney) and eu-west-2 (London):

Image 3: Resources created in 3 AWS Regions to test bandwidth

I decided to test bandwidths from us-east-1 to two different AWS regions to examine bandwidth variability caused by geographic separation. Although I could almost guarantee that the bandwidth between us-east-1 and ap-southeast-2 would be lower than between us-east-1 and eu-west-2, I wanted to investigate by how much.

Looking at the architecture in Image 3, you can see that there are three EC2 instances in the us-east-1 region. Two of these (Test Instances A and B) were used to send traffic to Test Instances C and D, and the third was configured as an Openswan software VPN. Image 3 also shows that the subnets in which Test Instances A and B reside are associated with different route tables. Test Instance A sends traffic destined for 10.80.0.0/16 and 10.102.0.0/16 through the VPC peering connections (pcx-aaaaaaaa and pcx-bbbbbbbb), whereas Test Instance B sends traffic for the same CIDR ranges through the software VPN instance (i-aaaaaaaa).

VPC Peering Configuration

Configuration of the two VPC peering connections (pcx-aaaaaaaa and pcx-bbbbbbbb) took a matter of minutes. Connections were initiated from the us-east-1 side using the ‘Peering Connections’ form in the AWS console as shown in Image 4:

Image 4: VPC Peering Connection Setup
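The same setup can be scripted with the AWS CLI. Below is a minimal sketch of the request/accept pair, assuming hypothetical VPC IDs; this is an illustration of the two-step workflow, not the exact commands used in the post:

```shell
# Request a peering connection from the us-east-1 (hub) side to a spoke VPC
# in ap-southeast-2. All IDs passed to these functions are hypothetical.
request_peering() {
  aws ec2 create-vpc-peering-connection \
    --region us-east-1 \
    --vpc-id "$1" \
    --peer-vpc-id "$2" \
    --peer-region ap-southeast-2
}

# Accept the pending request from the ap-southeast-2 side.
accept_peering() {
  aws ec2 accept-vpc-peering-connection \
    --region ap-southeast-2 \
    --vpc-peering-connection-id "$1"
}
```

For example, `request_peering vpc-11111111 vpc-22222222` followed by `accept_peering pcx-aaaaaaaa` would establish the connection shown above.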

After accepting the VPC peering connection from ap-southeast-2 and eu-west-2, the only other step required was to ensure that the route table associated with Subnet A was configured to send traffic destined for 10.80.0.0/16 and 10.102.0.0/16 to the VPC peering connection. Image 5 shows the route table configuration for 10.80.0.0/16:

Image 5: Route Table Configuration
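The routing step can also be done from the AWS CLI rather than the console; a minimal sketch, with hypothetical route table and peering connection IDs:

```shell
# Send traffic for a remote VPC's CIDR block through a peering connection.
# Arguments (all hypothetical placeholders): route table ID, CIDR, pcx ID.
add_peering_route() {
  aws ec2 create-route \
    --route-table-id "$1" \
    --destination-cidr-block "$2" \
    --vpc-peering-connection-id "$3"
}
# Example: add_peering_route rtb-01234567 10.80.0.0/16 pcx-aaaaaaaa
```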

Software VPN Configuration

For the software VPN, I launched m5.large EC2 instances running Amazon Linux in the us-east-1, ap-southeast-2 and eu-west-2 regions. This instance type features Enhanced Networking enabled by default and offers network performance of up to 10 Gbps. One important step to remember when setting up an EC2 instance for use as a VPN server: I disabled the 'Source/Destination' check within the instance networking options on all three servers.
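That last step can also be done with the AWS CLI; a minimal sketch, with a hypothetical instance ID:

```shell
# Disable the Source/Destination check so the instance can forward traffic
# it neither originated nor is the final destination for (required for any
# EC2-based router/VPN). The instance ID argument is a hypothetical placeholder.
disable_src_dst_check() {
  aws ec2 modify-instance-attribute \
    --instance-id "$1" \
    --no-source-dest-check
}
# Example: disable_src_dst_check i-aaaaaaaa
```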

For the software VPN product, I decided to use Openswan as it’s recommended by AWS as a suitable open-source product and one that I had worked with previously. Full configuration of the IPSec tunnels using Openswan is outside the scope of this post, but there are several articles that do a great job of explaining it step-by-step. An example of the IPSec configuration file to create the VPN tunnel between the us-east-1 and ap-southeast-2 AWS regions is shown in Image 6:

Image 6: Example IPSec VPN Tunnel Configuration File
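For readers without access to the image, a representative Openswan tunnel definition looks roughly like the following. The IPs and the hub CIDR are placeholders, and the exact parameters here are illustrative rather than a copy of the file in Image 6:

```
conn useast1-to-apsoutheast2
    type=tunnel
    authby=secret
    left=%defaultroute
    leftid=<Elastic IP of us-east-1 VPN instance>
    leftsubnet=<hub VPC CIDR>
    leftnexthop=%defaultroute
    right=<Elastic IP of ap-southeast-2 VPN instance>
    rightsubnet=10.80.0.0/16
    pfs=yes
    auto=start
```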

In addition to configuring VPN tunnels within IPSec configuration files, I ensured that the security groups associated with the Openswan EC2 instances in ap-southeast-2 and eu-west-2 were configured to receive traffic from the Openswan EC2 instance in us-east-1. Similar to the configuration for VPC peering (see Image 5), the routing table associated with Subnet B in us-east-1 was configured to send traffic for 10.80.0.0/16 and 10.102.0.0/16 to the Openswan EC2 instance.
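The security group rules can be sketched with the AWS CLI. IPsec needs IKE (UDP 500), NAT traversal (UDP 4500) and ESP (IP protocol 50); the IDs and source address below are hypothetical placeholders:

```shell
# Allow IPsec traffic into a spoke VPN instance from the hub VPN instance.
# Arguments (hypothetical): security group ID, source public IP in CIDR form.
allow_ipsec_from_hub() {
  local sg="$1" src="$2"   # e.g. sg-01234567 203.0.113.10/32
  aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol udp --port 500  --cidr "$src"
  aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol udp --port 4500 --cidr "$src"
  aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol 50 --cidr "$src"
}
```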

iPerf3 Tool

To test the bandwidth between instances in the hub and spoke regions, I followed an AWS recommendation and utilized the iPerf3 network benchmarking tool. iPerf3 is an extremely powerful tool that performs active measurements to determine the maximum achievable bandwidth on IP networks.

The product documentation for iPerf3 describes a huge number of flags and parameters that can be configured to test very specific networking scenarios. Full disclosure: my usage of iPerf3 within this post is primitive and barely scratches the surface of the available functionality.

The iPerf3 package was installed on all test instances from the command line via the 'epel' yum repository. iPerf3 runs in either 'client' or 'server' mode; I ran it in server mode on Test Instances C and D and in client mode on Test Instances A and B. Server mode (-s flag) simply listens for incoming connections on a specific port number, which I kept at the default of 5201. Client mode (-c flag) accepts a number of customizable parameters.
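For reference, the install was a one-liner on each instance. This is a sketch that assumes the EPEL repository is available to yum, as it was on Amazon Linux at the time:

```shell
# Install iPerf3 from the EPEL yum repository (Amazon Linux).
install_iperf3() {
  sudo yum install -y iperf3 --enablerepo=epel
}
```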

Running iPerf3 in server mode on Test instances C and D was done using:

iperf3 -s

and an example of running iPerf3 in client mode on Test instances A and B was:

iperf3 -c <Private IP of Test Instance C or D> -i 1 -t 10 -P 1

where:

-i = the interval in seconds between periodic bandwidth reports

-t = the time in seconds to transmit for

-P = the number of simultaneous connections to the server to make


Test Scenarios

As mentioned above, my use of the iPerf3 tool was extremely limited in scope and by no means an exhaustive test. My goal was to examine how bandwidth between test instances was affected as the number of parallel connections fluctuated. I kept the parameters for both interval (-i) and time (-t) consistent, while varying the number of parallel connections (-P) between values of 1 and 16. My chosen values of P were 1, 2, 4, 8 and 16.
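The sweep over P values described above can be scripted in a few lines; a minimal sketch (the server address passed in is a hypothetical placeholder):

```shell
# Run the bandwidth test once for each parallel-connection count,
# keeping the interval (-i) and duration (-t) fixed as in the post.
run_sweep() {
  local server="$1"              # private IP of Test Instance C or D
  for p in 1 2 4 8 16; do
    iperf3 -c "$server" -i 1 -t 10 -P "$p"
  done
}
# Example: run_sweep 10.80.1.169
```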

For example, the command used to test available bandwidth between us-east-1 and ap-southeast-2 over 10 seconds, measured in 1 second intervals with 8 parallel connections was:

iperf3 -c 10.80.1.169 -i 1 -t 10 -P 8

Each command was run a total of 15 times, spread across 3 separate days. The results presented below are the average of the 15 values collected for each value of P, with connection speeds reported in megabits per second.
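The averaging step is easy to automate, though I should note this is my own sketch rather than the method used in the post: iPerf3's -J flag emits JSON, from which jq can pull the summary throughput, and awk can then average the collected per-run values:

```shell
# Summary throughput for one run (bits/s) via iPerf3's JSON output:
#   iperf3 -c <server> -t 10 -P 8 -J | jq '.end.sum_sent.bits_per_second'
# Averaging the collected per-run values (in Mbit/s; sample numbers shown):
printf '%s\n' 512.3 498.7 505.1 |
  awk '{ sum += $1; n++ } END { printf "%.1f Mbit/s\n", sum / n }'
# → 505.4 Mbit/s
```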


Results

Table 1 and Table 2 show the average bandwidth speeds measured over the VPC peered and software VPN connections from us-east-1 to ap-southeast-2 and eu-west-2 respectively:

Table 1: Bandwidth results for us-east-1 → ap-southeast-2
Table 2: Bandwidth results for us-east-1 → eu-west-2

Comparing the two tables side-by-side, the major takeaway is that available bandwidth from us-east-1 → eu-west-2 is considerably higher than from us-east-1 → ap-southeast-2. This is understandable given that London is far closer to Northern Virginia than Sydney is, but it's interesting to see that bandwidth is 2–3 times higher across the board. When looking to set up a DR site in an alternate region, understanding the available bandwidth between those sites, along with the specific requirements of your application, is crucial to providing an expected level of service.

Switching our focus to examine results within each table, note that average bandwidth values over the peered and software VPN connections are comparable when the number of parallel connections is low. With P=1 and P=2, results are very similar. At P=4, however, the values start to separate as the peered connection's bandwidth grows at a steady rate. As P continues to increase, the gap between the available bandwidths of the two connection types continues to widen. The graphs in Image 7 and Image 8 illustrate the bandwidth of the peered (red line) and software VPN (blue line) connections as the number of parallel connections increases.

Image 7: Graph of average bandwidth speeds from us-east-1 to ap-southeast-2
Image 8: Graph of average bandwidth speeds from us-east-1 to eu-west-2

Examining Tables 1 and 2, the standard deviations of the peered connection bandwidth measurements are much lower than those of the software VPN, indicating that the peered connection provides more consistent network performance. One explanation is that the Openswan server adds an additional source of latency to the connection. Tables 1 and 2 also show that the number of TCP retransmissions over the software VPN is consistently higher and begins at a lower value of P, indicating a higher loss of TCP segments due to network congestion or packet corruption.

A second point of note is that the us-east-1 → eu-west-2 standard deviations are considerably higher than the values for us-east-1 → ap-southeast-2. Although the us → eu connection has greater bandwidth, both the peered and software VPN connections displayed greater inconsistency in their measured network speeds. The higher number of retransmissions within the us → eu connection indicates that times of network congestion and packet corruption may have caused such variability in measured performance.


Summary

The introduction of cross-region VPC peering in AWS caught the attention of many within the AWS community, including myself. Up until then, cross-region communication had to be configured using EC2-hosted VPN solutions.

In addition to reducing the complexity of configuring cross-region communication, cross-region VPC peering reduces infrastructure costs as it removes the requirement for redundant, EC2-hosted software VPN instances. Two m5.large EC2 instances cost around $150 per month, and when organizations require this setup in multiple AWS accounts across a number of geographical regions, costs can quickly add up. In contrast, VPC peering is a free service whose only associated cost comes from data transfer charges; such charges apply equally between software VPN EC2 instances.

In addition to the benefits of reduced cost and complexity that cross-region VPC peering brings, I was interested to discover whether a peered connection would increase the speed and consistency of connections between resources. Summarizing the results discussed in the previous section, the basic tests I performed using the iPerf3 tool suggest that it does. In addition to enjoying superior bandwidth, EC2 instances connected through VPC peering benefit from a more consistent connection than those connected via a software VPN solution. All things considered, cross-region VPC peering is a very welcome addition to the AWS feature set.