Benchmarking network latency on AWS C5 instances
And how it compares with Azure Standard_H16r
Earlier this year we published a benchmark that compared the performance of four cloud providers for distributed-memory calculations. The original text is available here.
Back then we found that network latency could be a problem for scaling distributed-memory calculations on AWS. Earlier this month, however, the next generation of compute-optimized instances (C5) from AWS went into production, promising improvements in the network layer. We tested them by running a standard Message Passing Interface (MPI) latency benchmark in the same compute environment as the one described in the article above. The results are included below.
In short: AWS is back in the game. Network latency has improved significantly, and preliminary compute benchmarks we ran internally confirm it.
AWS c5.18xlarge
Here are the results for two c5.18xlarge instances:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes  #repetitions      t[usec]   Mbytes/sec
0 1000 4.40 0.00
1 1000 4.85 0.21
2 1000 4.87 0.41
4 1000 4.87 0.82
8 1000 4.89 1.63
16 1000 4.86 3.29
32 1000 4.88 6.56
64 1000 4.88 13.11
128 1000 4.92 26.01
256 1000 4.99 51.30
512 1000 5.02 102.05
1024 1000 5.02 203.82
2048 1000 5.02 407.68
4096 1000 5.61 730.32
8192 1000 6.01 1362.72
16384 1000 7.03 2330.25
32768 1000 11.15 2938.57
65536 640 24.16 2712.74
131072 320 31.64 4142.52
262144 160 59.07 4437.71
524288 80 108.69 4823.81
1048576 40 193.53 5418.30
2097152 20 363.35 5771.71
4194304 10 690.70 6072.54
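As a quick sanity check, the Mbytes/sec column above is just the message size divided by the measured one-way time, with 1 Mbyte taken as 10^6 bytes (bytes per microsecond equals 10^6 bytes per second). A small Python sketch, using values copied from the rows above:

```python
def bandwidth_mb_s(nbytes: int, t_usec: float) -> float:
    # Bytes per microsecond equals 10^6 bytes per second, so the
    # table's Mbytes/sec column is simply nbytes / t_usec.
    return nbytes / t_usec

# Spot-check rows from the c5.18xlarge table; the results agree with
# the printed Mbytes/sec values up to the rounding of t[usec].
for nbytes, t_usec in [(1024, 5.02), (65536, 24.16), (4194304, 690.70)]:
    print(f"{nbytes:>8} bytes: {bandwidth_mb_s(nbytes, t_usec):9.2f} MB/s")
```

The small discrepancies (e.g. 203.98 computed vs. 203.82 printed for 1024 bytes) come from the benchmark rounding t[usec] to two decimals in its output.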
Azure Standard_H16r
Below are the results for two Azure Standard_H16r instances, which performed best in the Linpack benchmark from the comparison mentioned above.
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes  #repetitions      t[usec]   Mbytes/sec
0 1000 3.22 0.00
1 1000 3.23 0.31
2 1000 3.24 0.62
4 1000 3.24 1.23
8 1000 3.31 2.41
16 1000 3.24 4.94
32 1000 2.61 12.26
64 1000 2.63 24.32
128 1000 2.75 46.49
256 1000 2.79 91.84
512 1000 2.90 176.46
1024 1000 3.22 317.82
2048 1000 3.82 535.50
4096 1000 4.95 827.23
8192 1000 6.24 1312.82
16384 1000 8.02 2043.16
32768 1000 10.97 2986.38
65536 640 16.33 4013.73
131072 320 28.27 4636.91
262144 160 56.58 4633.06
524288 80 98.20 5338.97
1048576 40 185.56 5650.75
2097152 20 349.18 6006.00
4194304 10 686.60 6108.82
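Pulling the headline numbers out of the two tables, a short Python comparison (all figures copied from the benchmark output above):

```python
# Zero-byte latency and 4 MB bandwidth, taken from the two PingPong
# tables above (c5.18xlarge vs. Standard_H16r).
aws = {"t0_usec": 4.40, "bw_4mb": 6072.54}
azure = {"t0_usec": 3.22, "bw_4mb": 6108.82}

latency_ratio = aws["t0_usec"] / azure["t0_usec"]
bw_ratio = azure["bw_4mb"] / aws["bw_4mb"]
print(f"zero-byte latency: AWS is {latency_ratio:.2f}x Azure's")
print(f"4 MB bandwidth: within {100 * (bw_ratio - 1):.1f}% of each other")
```

So Azure's RDMA-backed H16r still leads on small-message latency (about 1.37x lower), while large-message bandwidth is essentially identical on the two platforms.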
Conclusion
The new networking layer (i.e., the Elastic Network Adapter) introduced with C5 does improve latency. If you had already crossed AWS off your list of high-performance computing vendors to partner with, it may be worth another look.