Pokémon Go Fest: A Lesson in Bad Error Handling, Testing, and Planning

I’ll admit, I haven’t played Pokémon Go for a long time now. And, I didn’t go to Pokémon Go Fest. So, if you’re looking for a guy with first-hand experience from the event, I’m not that guy. But, I am a Chief Technical Officer at a software company. And, I am a music conductor who puts on events for thousands of people. I do know a little about software and event planning.
Niantic does both VERY wrong! And, it cost them almost a half million dollars today. They refunded everyone their entry fee at the Pokémon Go Fest. But, from the game’s launch a year ago until now, these problems have cost them a whole lot more than that in lost revenue and bad publicity.
Check out the headlines from today: “Pokémon GO Fest Plagued By Mile-Long Lines And Connectivity Issues In Chicago” and “Pokémon Go Fest attendees to get refunds as technical issues break the event.”
Read some of the articles, and you’ll feel like you’re transported back to when Pokémon Go launched.
Remember? It crashed all the time!
Oh, never mind. It still does!
Error Handling
The problem is bad error handling. If you’re not a programmer, bear with me. I won’t be in the code very long.
From what digging I was able to do, it appears that Niantic wrote Pokémon Go in Java, C++ and C#. There are also a little Objective-C and a few other languages sprinkled in.
Guess what? In every one of those languages, you can check for errors and give the user an error message instead of crashing. As a matter of fact, every modern programming language has a way to handle errors. Even archaic languages like COBOL from 1959 include error handling capabilities.
Here’s how you check for an error in C#:
try
{
    // do something that might crash
}
catch (Exception)
{
    // handle the error and retry
}
It works much the same in Java and C++. It’s not that hard. Whenever you do something that might run into an error, handle the error in the code.
Connecting to a cell tower? Check for an error and retry if it fails.
Looking for a GPS signal? Check for an error and retry if it fails. Or, you can use the last known good position if it was within a certain amount of time. Better to use stale GPS data than croak.
Need to connect to Niantic servers to get data? Check for an error and retry if it fails.
I see a pattern here. Check for an error and retry!
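To make the pattern concrete, here is a minimal sketch in Java, one of the languages mentioned above. It is not Niantic's code. The `Retry` class, the `withRetry` method and its parameters are names I made up for illustration. I also added a growing delay between attempts, which is my own assumption, so that thousands of phones don't hammer a struggling server in lockstep.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation a limited number of times. */
final class Retry {
    private Retry() {}

    /**
     * Runs the operation until it succeeds or maxAttempts is reached.
     * The wait between attempts starts at initialDelayMillis and doubles each time.
     */
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (InterruptedException e) {
                // Do not retry if the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                lastError = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        // Every attempt failed: surface the last error so the caller can show a message.
        throw lastError;
    }
}
```

A caller would wrap the risky call, for example `Retry.withRetry(() -> fetchNearbyPokemon(), 5, 200)`, where `fetchNearbyPokemon` stands in for whatever network call the app makes. If every attempt fails, the caller still gets an exception it can turn into a friendly error message instead of a crash.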
Now, to be clear, I’m sure there are thousands of places where Pokémon Go code is performing these operations. It can be a massive pain to handle errors in a huge codebase like Pokémon Go. But, it is possible. And, even though error handling isn’t fun or sexy, it is necessary.
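The "use the last known good position" idea from the GPS example above can be sketched the same way. Again, this is Java and it is only an illustration. `LastKnownLocation`, `Fix` and the maximum-age setting are hypothetical names and choices of mine, not anything from the real game.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical cache that remembers the most recent good GPS fix. */
final class LastKnownLocation {
    /** A position plus the time it was recorded. */
    static final class Fix {
        final double latitude;
        final double longitude;
        final Instant recordedAt;

        Fix(double latitude, double longitude, Instant recordedAt) {
            this.latitude = latitude;
            this.longitude = longitude;
            this.recordedAt = recordedAt;
        }
    }

    private final Duration maxAge;
    private Fix lastGood;

    LastKnownLocation(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Call this whenever the GPS returns a usable position. */
    void update(Fix fix) {
        lastGood = fix;
    }

    /** Returns the last good fix if it is no older than maxAge, otherwise empty. */
    Optional<Fix> current(Instant now) {
        if (lastGood == null) {
            return Optional.empty();
        }
        Duration age = Duration.between(lastGood.recordedAt, now);
        return age.compareTo(maxAge) <= 0 ? Optional.of(lastGood) : Optional.empty();
    }
}
```

When the GPS lookup fails, the app would ask `current(Instant.now())`. If it gets a fix back, the game keeps going on slightly stale data. If it gets nothing, the game can show a "searching for GPS" message rather than crash.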
Testing

But, exception handling alone isn’t enough. We need tests. Tests allow us to exercise the code we’ve written and make sure it all works as expected.
What happens if the game loses connection to a cell tower in the middle of a battle? Write a test for that and make sure it doesn’t crash the game. Tests allow you to simulate every possible error that could happen ahead of time. Then you can write your code to handle the error with grace, not a crash.
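Here is roughly what a test for the dropped-connection case could look like. It is a sketch in plain Java with no test framework, and every name in it (`ServerConnection`, `DroppingConnection`, `Battle`, `TurnResult`) is hypothetical. I have no idea how the real battle code is structured. The point is the technique: swap the real network for a fake that fails on cue, then check that the game reports the problem instead of blowing up.

```java
import java.io.IOException;

/** Hypothetical network boundary; the real game's API is not known here. */
interface ServerConnection {
    String send(String message) throws IOException;
}

/** Test double that works for a few calls, then simulates a dropped connection. */
final class DroppingConnection implements ServerConnection {
    private int remainingGoodCalls;

    DroppingConnection(int goodCalls) {
        this.remainingGoodCalls = goodCalls;
    }

    @Override
    public String send(String message) throws IOException {
        if (remainingGoodCalls <= 0) {
            throw new IOException("simulated: connection lost");
        }
        remainingGoodCalls--;
        return "ack:" + message;
    }
}

/** Hypothetical battle logic that reports a lost connection instead of crashing. */
final class Battle {
    enum TurnResult { OK, CONNECTION_LOST }

    private final ServerConnection connection;
    private int turnsCompleted;

    Battle(ServerConnection connection) {
        this.connection = connection;
    }

    TurnResult attack() {
        try {
            connection.send("attack");
            turnsCompleted++;
            return TurnResult.OK;
        } catch (IOException e) {
            // Keep the battle state intact so the player can resume after reconnecting.
            return TurnResult.CONNECTION_LOST;
        }
    }

    int turnsCompleted() {
        return turnsCompleted;
    }
}

final class BattleConnectionLossTest {
    public static void main(String[] args) {
        Battle battle = new Battle(new DroppingConnection(2));
        check(battle.attack() == Battle.TurnResult.OK, "first turn should succeed");
        check(battle.attack() == Battle.TurnResult.OK, "second turn should succeed");
        // The connection drops here, in the middle of the battle.
        check(battle.attack() == Battle.TurnResult.CONNECTION_LOST, "drop should be reported");
        check(battle.turnsCompleted() == 2, "completed turns must not be lost");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

In a real project this would live in a test framework and run on every build, but the shape stays the same.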
Let’s face it. Niantic isn’t testing their game very well. My son plays Pokémon Go on his iPad. I’ll admit, this isn’t a typical use case. His iPad doesn’t have GPS and it doesn’t have a cell connection. So, he connects it to my wife’s Wi-Fi hotspot and plays it that way. All kinds of wonky stuff can happen that way. The location data Niantic gets from the device comes through my wife’s Wi-Fi rather than a real GPS and is sketchy.
Guess how often Pokémon Go errors on him? Every single time he plays for more than a few minutes! Sometimes the whole app crashes. Recently he’s been going on raids and it will freeze when it’s time to catch the raid boss at the end. He’ll end up with nothing at the end of a raid almost half the time. No boss, no rare candy, no items, no nothing.
Sure, my son’s case isn’t typical. But neither was Pokémon Go Fest. If Niantic were testing all those weird “corner cases”, there wouldn’t be nearly as many issues. Good testing can find both server and in-app errors. You can test for every possible error!
At EndFirst, we shoot for 100% code coverage. That means that there is a test that will exercise every single line of code in the codebase. It’s not rocket science, it’s good coding practice. And, it saves you from getting a black eye when your software crashes. Some people argue that it costs too much money to test everything. Well, I submit Niantic as proof that it is too costly NOT to test!
Planning

Planning and testing go hand-in-hand.
Niantic knew how many people they’d have at Pokémon Go Fest. They knew from a full year’s worth of experience how much data Pokémon Go consumes.
Did they call the cell companies to see if they would be able to handle 20,000 people at once? Did they bring in the AT&T “Cell on Wheels” which they use at Lollapalooza (also held at Grant Park)? Did they use Verizon “small cells” to boost capacity? Even with these upgrades, people complain that Internet connection during Lollapalooza is sketchy. At Lollapalooza, people are happy to be able to call, text, and upload a picture or two. Not so at Pokémon Go Fest. It’s all about the data for Pokémon Go Fest.
What if it was too costly to upgrade cell reception for the festival? They could’ve at least made enough Wi-Fi capacity available. Tech conventions do this all the time. Without a stable and strong data connection, Niantic doomed Pokémon Go Fest.
But, it wasn’t only the connectivity issues. Players had to wait in a line over a mile long to actually get into the event according to reports. Having to wait in line for 4+ hours to get into an event seems ridiculous. While 20,000 is a lot of people, it is a paltry number compared to the 400,000 that attended Lollapalooza in 2016. Grant Park can definitely handle 20,000 people! There are lots of ways to reduce lines and wait times.
Worst of all though, Niantic servers couldn’t even handle the load they had on them. They should’ve planned for 20,000 people slamming their servers at once from one location. And, they should’ve tested out that load. But, it wasn’t only the load at Grant Park that caused servers to crash. Niantic encouraged people all around the world to play Pokémon Go today to help unlock Pokémon. Niantic had to have been dealing with 10–100x their usual server load today. They should’ve been able to test and prepare for that.
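Testing load ahead of time doesn't have to be exotic. The sketch below, in Java, fires a batch of simulated requests at a handler from many threads at once and counts how many succeed. Everything here is an assumption for illustration: the `LoadTest` class, the `run` method and the idea that a request is a `Callable<Boolean>` are mine. A real load test would hit a staging copy of the servers over the network, usually with a dedicated tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical in-process load test: many simulated players, one shared handler. */
final class LoadTest {
    /** Outcome counts for one run. */
    static final class Result {
        final int succeeded;
        final int failed;

        Result(int succeeded, int failed) {
            this.succeeded = succeeded;
            this.failed = failed;
        }
    }

    /**
     * Sends totalRequests copies of the request using up to `concurrency` threads.
     * A request counts as failed if it returns false (or null) or throws.
     */
    static Result run(Callable<Boolean> request, int totalRequests, int concurrency)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < totalRequests; i++) {
                tasks.add(request);
            }
            int succeeded = 0;
            int failed = 0;
            // invokeAll blocks until every simulated request has finished.
            for (Future<Boolean> future : pool.invokeAll(tasks)) {
                try {
                    if (Boolean.TRUE.equals(future.get())) {
                        succeeded++;
                    } else {
                        failed++;
                    }
                } catch (ExecutionException e) {
                    failed++;
                }
            }
            return new Result(succeeded, failed);
        } finally {
            pool.shutdown();
        }
    }
}
```

You would run this at the expected load, then at 10x and 100x, and watch where the failure count stops being zero. That number is what you plan capacity around.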
Niantic is a software company, yet the server issues they were having today were also due to their code. Their CEO admitted to authentication issues and a bug in their code. Based on my experience in software, I’d bet they’d seen both of these issues before Pokémon Go Fest. They chose to ignore them and hoped things would be fine.
Crossing your fingers and hoping for the best is not a plan.
Fixing issues, planning for load and testing like crazy is a real plan.
Conclusion
To be fair, I’m sure Niantic did planning and testing to prepare for today. They also use error handling because not every problem causes crashes. And, Pokémon Go is more stable than when they launched a year ago. But, they failed today because they didn’t plan, test and handle errors well enough.
Software is hard to write and test well.
Errors are never fun to handle, but they must be handled!
Events are hard to plan and execute well.
I hope Niantic can learn to do these things better. If they can, the next Pokémon Go Fest will go off without a hitch.
If Niantic would like some help, I’m available!
