A Ping That Saved Me From Madness

Tuan Pham-Barnes
Published in Literally Literary
7 min read, Aug 11, 2020

Follow my journey into troubleshooting internet disruption issues after switching service providers.

Photo by Joshua Sukoff on Unsplash

Now that we are in an active-pandemic world, those lucky enough to continue working remotely rely on fast and reliable internet service. I’m a software developer by trade and run multiple video meetings a day to keep in touch with our engineering team. All of our systems and tools are in the cloud these days, and we expect internet service to be as dependable as electricity and running water.

Working at home now requires me to be the “I.T.” guy. When the service goes down, I hear about it from one of my kids before I even notice it myself.

“Dad, the internet is down!”

Of course, if “the internet” were down, we’d have bigger problems, but I know what they mean. The connection to our service provider has been severed, so they have to pause their online lives until I can solve it.

Troubleshooting Basics

The first thing I do is to turn off the wifi on my phone and switch to cellular LTE to check if there is a localized outage. Having cleared that, I check the lights on the modem to confirm they are all green. If not, a reboot gets us back online.

The same goes for the mesh pucks we have around the house: if one shows red, a reboot usually restores the connection.

“Ok, it’s back up!”

I feel like the hero and go back to what I was doing, and they return to their TikTok, Instagram, Fortnite, and Netflix sessions.

Other issues expose the kids’ lack of understanding of wireless coverage areas, channel interference, and range. To remediate their complaints, I have tuned the wireless network for decent coverage up to our front street and back into the alley.

Slowness in the system is now attributed to multiple streams of traffic on the network, splitting the bandwidth across dozens of devices, including our security cameras and electronic home assistants.

More Speed == More Bandwidth?

We were choking on our bandwidth with the increased traffic of video meetings, especially during school days with simultaneous remote learning and remote work conference calls. Now that fiber was available in my area, I decided to change from cable internet service to fiber service. Full-duplex 1 Gbps instead of 100 Mbps, with a 22% cost saving, seemed like a no-brainer.

Ordering the service on their website was straightforward and smooth, including scheduling when a technician would install the equipment. I couldn’t wait to see how fast I could pull down source code and navigate our virtual instances.

The install took a couple of hours with no issues. I was elated as I plugged my laptop into the ethernet and saw 950 Mbps download and 930 Mbps upload. I connected to the wifi network and saturated the 5 GHz band at 430 Mbps.

A quick test navigating to reddit.com, medium.com, and cnn.com was good. I streamed a Netflix show and Apple Music and then tested our digital electronic assistants and security cameras. Everything checked out and seemed to be working well.

Speed !== Reliability

After a few hours, the fiber nirvana came to a screeching halt when I heard a yell from upstairs.

“Dad, the internet is down!”

This was the start of my downward spiral into madness. The Netflix client on our TV would abruptly unload itself at random times. Sites would take an unusual amount of time to load, including Google searches. Zoom meetings would freeze with no audio. Our digital assistants would not answer due to no internet connection. Our security camera feeds would blank out, go offline, and return at random intervals.

There were no reported localized or wider outages, and sustained speed tests returned typical results. Internal diagnostic tests on the modem all passed. The lights on both the fiber ONT box and the modem indicated normal operation.
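A speed test measures throughput over a short window, so it can easily miss drops that last only a few seconds. One way to make intermittent failures visible is to log a cheap probe (for example one ping per second) and collapse the failures into outage windows. The following is a minimal sketch of that bookkeeping only; the `Sample` type and the `find_outages` name are hypothetical, and how the probe results are collected is left open.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (timestamp in seconds, did the probe succeed?)
Sample = Tuple[float, bool]


def find_outages(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse runs of consecutive failed probes into (start, end) windows.

    An outage starts at the first failed probe and ends at the first
    successful probe after it. If the log ends while still failing, the
    window ends at the last failed probe.
    """
    outages: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_fail: Optional[float] = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp
            last_fail = timestamp
        elif start is not None:
            outages.append((start, timestamp))
            start = None
    if start is not None and last_fail is not None:
        outages.append((start, last_fail))
    return outages
```

Feeding it a day of probe results gives a list of drop windows that can be lined up against what the family was doing at the time, which a passing speed test cannot show.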

Resetting the modem seemed to stabilize the connection for a short period, and then the dropped and delayed connections returned. I was not going back to cable internet and paying more for less.

I was determined to solve this!

Try All the Things!

A process of elimination is the first step in troubleshooting the issue. I unplugged the wifi network and plugged ethernet directly into my laptop. Maybe a device on the wifi network was flooding it with invalid packets, causing a faux DDoS attack.

With my laptop as the only device connected and a web browser as the single application running, a stable connection would have given weight to that theory. It didn’t pan out; the problem persisted.

Maybe the DHCP server was the issue and was not renewing IP addresses correctly? I changed to using a static IP address on my laptop, within the defined range. No dice, same issue.

I disabled the DHCP server altogether on the modem and kept the static IP address. Nope, no change.

Could it be the default DNS server causing long lag times? Pinging it returned decent responses under 15 ms. I changed it to Google’s DNS server at 8.8.8.8 and even tried Cloudflare’s DNS at 1.1.1.1. All were returning good responses on pings, but the delays and disconnections continued.
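For anyone who wants to repeat the resolver comparison with numbers, here is a minimal sketch, assuming that timing a TCP connection to port 53 is an acceptable stand-in for ping (it needs no special privileges). The function names `tcp_connect_ms` and `summarize` are hypothetical; the two resolver addresses are the ones mentioned above.

```python
import socket
import statistics
import time
from typing import Dict, List, Optional


def tcp_connect_ms(host: str, port: int = 53, timeout: float = 2.0) -> Optional[float]:
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return None
    return (time.perf_counter() - start) * 1000.0


def summarize(samples: List[Optional[float]]) -> Dict[str, float]:
    """Summarize a list of latency samples, where None marks a lost probe."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "lost": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else float("nan"),
        "max_ms": max(ok) if ok else float("nan"),
    }
```

Calling `summarize([tcp_connect_ms("8.8.8.8") for _ in range(5)])` and the same for `"1.1.1.1"` gives a quick side-by-side. A low median with a high `max_ms` or a nonzero `lost` count would point at intermittent trouble rather than a slow resolver.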

I decided to place the modem into an IP bypass mode to use my router. After some considerable research, I was able to expose the WAN IP address to my router and grant LAN IP addresses through my router’s DHCP server. It was a failed exercise; no change.

How about downgrading the firmware on the modem? I went through the process of flashing the firmware through a few minor revisions and down through a major revision. It failed again.

It was time to call the service provider and convince them to send me a new modem. They claimed all the tests on their end passed, and they did not see any problems. I sent them a log of the errors, and they decided to send a new one.

I was convinced that the first modem was a lemon and that a replacement would solve all our issues. We dealt with the current problems by restarting the modem a few times throughout the day and waited for the new one to arrive.

A few days later, the new self-install modem arrived, and I swapped it in. My frustration continued. Did I get two lemons in a row, or was that model just flawed?

Never Give Up

I was annoyed but not defeated. There had to be some combination of settings and configuration I had not tried. While diving deep into forums discussing the model (BGW210-700) and all of the issues related to it, I came across a one-sentence comment that would have been easy to overlook, but it struck me as weird.

“When the issue happens, even visiting the modem’s internal administration pages has a delay.”

That was odd. I tested it, and it proved correct. Why would visiting a page served by a local web server on the modem be slow? It should be almost instant. After the initial load, the rest of the pages were fast. I had noticed this while changing the configuration multiple times during troubleshooting. Does the server go into an idle or sleep mode, and does that cause the problem?

Let’s test it. I sat on the administration page and continuously hit refresh for over 5 minutes, and the delays and disconnections STOPPED! WTF?
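The "slow first load, fast afterwards" pattern can be checked with a stopwatch in code. This is a minimal sketch, assuming the admin pages are served over plain HTTP at the modem's gateway address; `ADMIN_URL`, `fetch_seconds`, and `first_load_is_slow` are hypothetical names, and the factor of 5 is an arbitrary threshold.

```python
import statistics
import time
import urllib.request
from typing import List

# Assumption: the modem's admin pages answer on plain HTTP at this address.
ADMIN_URL = "http://192.168.1.254/"


def fetch_seconds(url: str, timeout: float = 10.0) -> float:
    """Fetch the page once and return how long the full download took."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def first_load_is_slow(timings: List[float], factor: float = 5.0) -> bool:
    """True when the first request took more than `factor` times the median of the rest."""
    if len(timings) < 2:
        return False
    return timings[0] > factor * statistics.median(timings[1:])
```

Collecting something like `[fetch_seconds(ADMIN_URL) for _ in range(5)]` after leaving the modem alone for a while, and passing the list to `first_load_is_slow`, would show whether the first request stands out. `urlopen` raises an `OSError` subclass when the page cannot be reached, so a real run should wrap the call accordingly.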

The Ping

If refreshing the page keeps the admin web server awake, then would a constant ping to the server work? I opened up a terminal window on my MacBook and typed in:

ping 192.168.1.254

And I let it run continuously. Lo and behold, it worked: it kept the server from idling or sleeping. There was rarely a disconnection or delay. The ultimate test was to run the Netflix client on our TV and stream a movie. I randomly chose a show and let it run. Two hours in, and it was still playing without crapping out. The security cameras stayed online as additional proof that this hack worked.
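The same keep-awake loop can be scripted so it logs failures instead of scrolling past them in a terminal. This is a minimal sketch, assuming the system `ping` command is on the PATH; it shells out to it rather than crafting ICMP packets, which would need elevated privileges. The names `build_ping_command`, `ping_once`, and `keep_awake` are hypothetical, and the one-second interval is a guess rather than a measured requirement.

```python
import platform
import subprocess
import time
from typing import List

GATEWAY = "192.168.1.254"  # the modem address used in the ping command above


def build_ping_command(host: str, system: str = "") -> List[str]:
    """Build a single-echo ping command; Windows uses -n where others use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def ping_once(host: str, timeout: float = 5.0) -> bool:
    """Send one echo request and report whether ping exited successfully."""
    try:
        result = subprocess.run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def keep_awake(host: str, interval: float = 1.0) -> None:
    """Ping the host forever, printing a timestamp whenever a ping fails."""
    while True:
        if not ping_once(host):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "no reply from", host)
        time.sleep(interval)
```

Running `keep_awake(GATEWAY)` reproduces the terminal hack with a failure log on top. The exit code of `ping` is a reasonable success signal on macOS and Linux; on Windows it is less trustworthy, so treat that branch as untested.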

Coming from the vantage point of a software developer, why would there be such a mechanism? Were they thinking about saving energy or reducing heat from a process continually running for a seldom-used feature? Regardless of the reason, why is it so tightly coupled to the modem’s core job of routing traffic to the internet?

This is probably a design flaw in the firmware that needs to be addressed in the next upgrade. I hope it’s not a hardware issue and that new firmware will fix it. In the meantime, I will continue to run this temporary hack.

What’s Next?

I have an old spare Android tablet connected through wifi to our network, constantly pinging the modem’s administration address to keep it from idling. A tablet seems like overkill for running a simple ping. It would be a perfect fit for a Raspberry Pi project, though.

Stay tuned for any updates if I decide to tackle that project. I hope this article helps other users who have experienced the same issues. Or is this a unique problem surfaced by my network architecture?

Troubleshooting an intermittent issue is rarely trivial, and it will take you down unexpected paths. My experience hunting down defects in software engineering prepared me to investigate all clues, regardless of how insignificant or unrelated they may seem. This ping has saved my sanity, and I was able to play the hero for my kids again.

Tuan Pham-Barnes
Literally Literary

I write code, flash fiction, commentary, and poetry; sometimes my code reads like poetry and my fiction becomes flash commentary!