How I Maximized the Speed of My Non-Gigabit Internet Connection

Tips from an engineer at Ookla

Brennen Smith
Speedtest by Ookla
7 min read · May 2, 2017

Join Brennen at his Reddit Ask Me Anything

My name is Brennen Smith, and as the Lead Systems Engineer at Speedtest by Ookla, I spend my time wrangling servers and internet infrastructure. My daily goals range from designing high performance applications supporting millions of users and testing the fastest internet connections in the world, to squeezing microseconds from our stack — so at home, I strive to make sure that my personal internet performance is running as fast as possible.

I live in an area with a DOCSIS ISP that does not provide symmetrical gigabit internet — my download and upload speeds are not equal. Instead, I have an asymmetrical plan with 200 Mbps download and 10 Mbps upload — this nuance considerably impacted my network design because asymmetrical service can more easily lead to bufferbloat.

We will cover bufferbloat in a later article, but in a nutshell, it’s an issue that arises when an upstream network device’s buffers are saturated during an upload. This causes immense network congestion, latency spikes above 2,000 ms, and an overall poor internet experience. The solution is to shape outbound traffic to a speed just under the sending maximum of the upstream device so that its buffers never fill up. My ISP is notorious for bufferbloat issues due to the low upload performance, and the problem is prevalent even on their provided routers.
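
If you want to check your own link for bufferbloat before touching any router settings, a minimal sketch like the Python below works: it compares latency while the line is idle against latency while you manually run a large upload. It uses TCP connect time to www.speedtest.net as a rough stand-in for ICMP round-trip time (raw ICMP needs elevated privileges); the probe host and sample count are arbitrary choices for illustration, not part of my setup.

import socket
import statistics
import time

# Rough bufferbloat check: compare idle latency against latency while a large
# upload is running. TCP connect time is used as a stand-in for ICMP RTT.
PROBE = ("www.speedtest.net", 443)  # any reliably reachable host works

def median_connect_ms(samples=10):
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection(PROBE, timeout=2):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

idle = median_connect_ms()
input("Start a large upload (cloud backup, file sync), then press Enter...")
loaded = median_connect_ms()
print(f"idle: {idle:.0f} ms, under load: {loaded:.0f} ms "
      f"({100 * (loaded - idle) / idle:+.0f}% change)")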

As a result, I needed the ability to shape traffic at over 200 Mbps, which ruled out MIPS- or ARM-based routers: they don’t have the CPU horsepower to route above ~150 Mbps without hardware offload (I was actually using Tomato on an Asus AC68U at the time). Very few routers can shape a single direction of traffic in software, so I had to find a solution that could handle bi-directional shaping at over 200 Mbps. While many of the Ookla Engineering team use Ubiquiti EdgeRouters, their CPUs limit their traffic shaping performance to the following:

  • ERLite-3 and ERPoe-5: below 60 Mbps most likely will work, above 200 Mbps most likely will not work
  • ER-8: below 160 Mbps most likely will work, above 450 Mbps most likely will not work
  • ERPro-8: below 200 Mbps most likely will work, above 550 Mbps most likely will not work
  • ER-X and ER-X-SFP: below 100 Mbps most likely will work, above 250 Mbps most likely will not work

Editor’s note: Since this article has been published, it is now possible on recent firmwares to perform traffic shaping in a single direction on the EdgeRouter platform.

Requirements

Thus, my router requirements were as follows:

  1. x86-64 based hardware with a TDP under 15 W.
  2. Strong support for native IPv6; many studies have shown IPv6 leads to a faster web browsing experience.
  3. Ability to perform Point to Point VPN and Split VPN tunnels.
  4. 802.1Q VLAN Tagging — I run three separate logical networks that correspond to respective SSIDs on the APs:
  • LAN network: the normal network we use in daily life. It has access to the split VPN tunnels, Sonos devices, the FreeNAS storage server, and the Xen hypervisors.
  • GUEST network: the network we place guests on; it has no access to other networks/resources and is inbound rate limited to 100 Mbps.
  • IOT network: the network for IoT devices; it has no access to any other networks beyond the WAN and is inbound rate limited to 5 Mbps. This is split off for security reasons, and the IoT devices we use (TP-Link Smart Home, alarm system, security cameras) handle NAT without issue. (Note: these are affiliate links.)

I grabbed this small x86-64 server and combined it with 4 GB of Kingston DDR3L and a 32 GB ADATA SSD. The key points of this server are its fewer but higher-frequency cores and its four Intel GbE NICs. Intel NICs have some of the best support in the open source world, and I highly recommend staying far away from the cheaper brands. This machine doesn’t have AES-NI or Intel QuickAssist, but it hasn’t had any issues with encryption/decryption at line rate for the VPNs.

The actual routing

Once assembled, I installed PFSense 2.3 for handling the actual routing. For those who haven’t used PFSense, it’s an incredible routing operating system based on FreeBSD. It easily met the requirements above and vastly surpassed them. I was able to apply CODELQ AQM shaping to outbound traffic to prevent bufferbloat, along with splitting the ISP-provided IPv6 /60 into /64s for my three VLANs.
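
For the curious, the /60-to-/64 split is simple arithmetic; here’s a small sketch using Python’s ipaddress module. The prefix shown is a placeholder from the IPv6 documentation range, not my actual delegation.

import ipaddress

# Placeholder prefix from the documentation range; substitute the /60 your ISP
# actually delegates.
delegated = ipaddress.ip_network("2001:db8:0:10::/60")

# A /60 contains 2**(64 - 60) = 16 possible /64s; assign one per VLAN.
for name, subnet in zip(["LAN", "GUEST", "IOT"], delegated.subnets(new_prefix=64)):
    print(f"{name}: {subnet}")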

In my research and testing, I also evaluated IPCop, VyOS, OPNSense, Sophos UTM, RouterOS, OpenWRT x86, and Alpine Linux to serve as the base operating system, but none were as well supported and full featured as PFSense. The closest runner-up was VyOS, as I love its declarative CLI interface and read-only primary/backup partition system, but a few reasons blocked me from using it:

  • VyOS doesn’t support IPv6 Prefix Delegation in the stable branch.
  • The stable branch is based on Debian Squeeze, which is quite old. There’s a Debian Jessie version, but it’s considered experimental.
  • Sadly, their development team and pace have shrunk considerably since the initial Vyatta fork.

PFSense isn’t without its issues, but it’s perfect for my use case. The biggest issue I had was the default DNS configuration. On PFSense, the DNS server (unbound) is set to function as a recursive resolver rather than a forwarding server. While this might have a security benefit in edge cases, the performance impact on lookups is substantial: web browsing felt jerky because domain-sharded assets resolved slowly.
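
A quick way to feel this from a client is to time lookups against whatever resolver the router hands out. Here’s a rough sketch; the hostnames are placeholders standing in for the kind of domain-sharded assets a page pulls in.

import socket
import time

# Time lookups through the system's configured resolver. Note the OS may cache
# results, so test against names you haven't resolved recently.
hosts = ["example.com", "www.example.com", "static.example.com", "cdn.example.com"]

for host in hosts:
    start = time.perf_counter()
    try:
        socket.getaddrinfo(host, 443)
        print(f"{host}: {(time.perf_counter() - start) * 1000:.1f} ms")
    except socket.gaierror as err:
        print(f"{host}: lookup failed ({err})")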

QoS Settings

Some people have asked what QoS settings I use in PFSense. I avoided the default wizard QoS settings because, in general, I try to avoid proto/port classification. The majority of traffic on the modern web is TCP 80/443 with a smattering of UDP 53, so HFSC class-based QoS isn’t as effective as it used to be. However, every case is different, so I’d love to hear about your rule setups.

I essentially emulated FQ-CoDel by placing a FAIRQ scheduler in front of a CODELQ queue: FAIRQ balances the streams fairly, and CoDel drops packets when backoff is necessary, so the combination has been highly effective in high-contention scenarios. For the very curious, here’s a representation of the QoS tree I have set up in PFSense:

WAN - Scheduler: FAIRQ | BW: 12531 Kbps
└── WAN_main - Options: Codel Active Queue | BW: 12531 Kbps
GUEST_LAN - Scheduler: FAIRQ | BW: 100 Mbps
└── GUEST_LAN_main - Options: Codel Active Queue | BW: 100 Mbps
IOT_LAN - Scheduler: FAIRQ | BW: 5 Mbps
└── IOT_LAN_main - Options: Codel Active Queue | BW: 5 Mbps
LAN - (No limits/queues)

Why 12,531 Kbps?

For the eagle eyes out there — why was my upload speed shaped to 12,531 Kbps when my connection is 10 Mbps up?

The answer is twofold. First, DOCSIS connections are often over-provisioned to make sure that, even with loss in the cable/modem, users will still hit the speeds they pay for: running a Speedtest on my 10 Mbps connection without shaping actually revealed ~13 Mbps. Second, I needed to find the point that maximized the upload speed while not filling the buffers of the upstream device.

To find this point, many tutorials recommend “take a Speedtest, and then subtract 20%.” I argue that this is incorrect, as a flat percentage may be too much or not enough. To find the optimum point, I essentially followed this pseudocode:

Turn on QoS with upload set to the expected speed
Start a few massive uploads
while (true):
    if (UDP loss > 0.5% and ICMP latency change is impactful):
        reduce QoS upload speed by 1%
    else:
        increase QoS upload speed by 1%
This can easily be done by hand and takes about five minutes of tweaking. Thus, I settled on 12,531 Kbps as the highest upload speed possible without any impact on my service.
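
If you’d rather script that loop than eyeball it, here’s a rough Python sketch of the same idea under a few assumptions: it uses only the latency signal (no UDP loss probe), it approximates latency with TCP connect times as in the earlier bufferbloat check, and it just prints the suggested rate instead of applying it, since updating the shaper is router-specific (in PFSense it’s the bandwidth field on the queue).

import socket
import statistics
import time

PROBE = ("www.speedtest.net", 443)  # probe target; any reliable host works
BASELINE_MS = 20.0                  # idle latency measured beforehand; adjust

def latency_ms(samples=5):
    """Median TCP connect time to the probe target, in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection(PROBE, timeout=2):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

def tune(rate_kbps, rounds=30):
    """Nudge the shaping rate by 1% per round while uploads saturate the link."""
    for _ in range(rounds):
        if latency_ms() > 2 * BASELINE_MS:  # "impactful" latency change
            rate_kbps *= 0.99               # back off
        else:
            rate_kbps *= 1.01               # push up
        print(f"set shaper to {rate_kbps:.0f} Kbps and re-test")
        time.sleep(5)                       # give the shaper time to settle
    return rate_kbps

if __name__ == "__main__":
    # Start from the advertised upload speed while a few large uploads run.
    print(f"Converged on ~{tune(10_000):.0f} Kbps")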

Distribution to client devices

The router then trunks to an HP ProCurve 1810G switch, which passes tagged VLAN traffic to three Ubiquiti UniFi AC Pro APs spread around the house. Untagged traffic goes to other Ethernet-based devices.

PFSense has great monitoring tools to measure the health and quality of a connection, but I wanted to track the speed of my connection. I built a little Node and React app called speedlogger that takes a Speedtest every 8 hours and plots it in a pretty graph.
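
speedlogger itself is Node and React, but if you only want the logging half, a rough Python stand-in using the community speedtest-cli package (not the author’s app, and not an official Ookla client; the file name and interval below are just illustrative) looks like this:

import csv
import time
from datetime import datetime, timezone

import speedtest  # community speedtest-cli package: pip install speedtest-cli

INTERVAL_S = 8 * 3600      # run every 8 hours
LOGFILE = "speedlog.csv"   # illustrative path

while True:
    s = speedtest.Speedtest()
    s.get_best_server()
    s.download()
    s.upload()
    result = s.results.dict()
    with open(LOGFILE, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            round(result["download"] / 1e6, 2),  # Mbps
            round(result["upload"] / 1e6, 2),    # Mbps
            round(result["ping"], 1),            # ms
        ])
    time.sleep(INTERVAL_S)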

Was it worth it?

Absolutely.

As with any experiment, conclusions need to be backed by data. To validate that the network performed smoothly under heavy load, I ran the following experiment:

  1. Ran a ping6 against speedtest.net to measure latency.
  2. Turned off QoS to simulate a “normal router”.
  3. Started multiple simultaneous outbound TCP and UDP streams to saturate my outbound link.
  4. Turned on QoS with the settings above and repeated step 3.

As you can see from the plot below, without QoS my connection latency increased by ~1,235%. However, with QoS enabled, the connection stayed stable during the upload and I wasn’t able to determine a statistically significant delta.

That’s how I maximized the speed on my non-gigabit internet connection. What have you done with your network?

If you made it to the end of this article, you’re probably pretty nerdy like us. We are looking for skilled engineers and developers; if that’s up your alley, check out our postings on Workable.
