I almost lost 10,000 users

I made a big mistake when deploying new features for Toby, the Chrome extension I've been working on. Here is what I learned in the process, and how to avoid making the same mistakes.


Tuesday, Sept 28, 10:00 am

My team and I were stoked. All the new features were ready for the new version of Toby, the extension I had first released on Product Hunt a couple of weeks earlier, on September 13. The launch went pretty well, bringing Toby thousands of users over those two weeks. We were proud and happy. Users sent us feedback, suggestions and lots of words of support, and I wanted to keep them satisfied and keep the momentum going for as long as I could. So I fixed bugs and added some key features the community had asked for. We thought they would absolutely love it.

The new version of Toby, featuring Saving Session and a bunch of other cool things.

As I was preparing to deploy the new version, I realized I needed to replace the screenshot on the Chrome Store, because the UI had changed with new buttons and functionality. The screenshot should include the browser window without showing the other extensions I use, so to capture the right shot, I decided to uninstall a couple of them.

As soon as I uninstalled Grammarly (awesome one, btw), I was redirected to a web page with a form asking why I had decided to stop using it. Now that, I thought, was pretty clever.

If a user is uninstalling something, for whatever reason, that moment is super precious: it is the best time to collect feedback, bug reports or general views on the product. It is the precise moment the user is not satisfied with the current solution or feels it doesn’t meet their needs, so putting them in touch with the developers right then is a very smart choice.

Upon uninstalling Grammarly, the user is redirected to a form and is asked to provide feedback

Grammarly's uninstall form was simple, yet personal and easy to fill out. It showed users that the team cares about their individual experience and gave users a chance to express their reasons or frustration. It allowed them to be heard. Well done!

It took me less than 30 seconds of googling to find out that opening a URL when a user uninstalls Toby was technically quite simple. All I had to do was add one line of code to my extension:

chrome.runtime.setUninstallURL("http://myURLhere.com"); // placeholder: the page Chrome opens after the extension is uninstalled

Of course, I would also have to create the page and the form itself, but that'd be super straightforward.
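A small variation on that one-liner, sketched here under assumptions: append the installed version to the uninstall URL, so every feedback email also says which release the user was running. The form address `FEEDBACK_BASE_URL` and the `v` query parameter are hypothetical; `chrome.runtime.setUninstallURL` and `chrome.runtime.getManifest` are the real extension APIs, and the guard simply lets the pure helper run outside Chrome.

```javascript
// Hypothetical form address and query parameter name; not Toby's real URL.
const FEEDBACK_BASE_URL = "https://example.com/uninstall";

// Builds the uninstall URL, tagging it with the installed version so each
// feedback entry says which release the user was running.
function buildUninstallUrl(baseUrl, version) {
  const url = new URL(baseUrl);
  url.searchParams.set("v", version); // keeps any existing query parameters
  return url.toString();
}

// Register the URL only when running inside the extension (e.g. the background script).
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.setUninstallURL) {
  const version = chrome.runtime.getManifest().version;
  chrome.runtime.setUninstallURL(buildUninstallUrl(FEEDBACK_BASE_URL, version));
}
```

For example, `buildUninstallUrl("https://example.com/uninstall", "0.2.0")` returns `"https://example.com/uninstall?v=0.2.0"`.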

So I made the best decision of the day: I decided to create a similar form for Toby and include it in that release. It wasn’t fancy, but if users removed the new version of Toby, they’d see a page saying we were deeply sad to see them leave and asking for feedback. They could write a quick message, optionally add their name and email, and every entry would be sent straight to my email inbox. Pretty neat, eh?

When users uninstall Toby, they see this form.

I hoped the form would help me understand why users leave, but obviously I didn't want it to be used much: I'd much rather see users installing Toby than getting rid of it. After pushing the form to the site, I was confident everything was ready for the update. I had tested the new version a million times and was eager to use it myself with my own Toby lists. It seemed like all was good to go and it'd be another success.


Tuesday, Sept 28, 11:00 pm

Since most of our users were in the United States, I figured late night would be a good time to release. I live in Vancouver, Canada (Pacific Time), so I thought publishing at around midnight would let most users wake up the next day to a fresh new Toby on their computers.

The production build had been uploaded to the Chrome Store earlier, so it was ready to be sent to thousands of users across the globe. I was one click away. I remember feeling nervous for a couple of minutes. Then I took a deep breath, clicked the button and boom, it was done.

Or at least I thought so.


Tuesday, Sept 28, 11:59 pm

The Chrome Store takes up to an hour to publish changes. Only then do browsers start updating automatically, one by one, a process that can take several hours because each browser checks for new extension versions only periodically.

About an hour after publishing Toby, I was getting ready to sleep when my watch notified me that I had a new email from the website. The subject read: “Toby uninstall”, which meant it was sent from the web form I had set up earlier. The message was “it’s crashing”.

That was certainly not the first message I expected to read after the release.

I barely had time to process that email when a second one came, 30 seconds after the first. “The new version it f**ckd up”, it read. It was also from the uninstall form. And then another one. And another one. And over the next few minutes, I had ten uninstall emails telling me the extension was broken.

I quickly opened my laptop and desperately tried to figure out why so many people said it was crashing. Everything was working perfectly for me, I thought, as I checked the new version in my development environment. Then I switched to my personal Chrome profile, which runs the production version I use daily, and saw it hadn’t been updated to the new Toby yet. Then I remembered I could force my browser to check for updates by clicking “Update extensions now”.

I must have clicked that button five or six times until Toby was finally updated.

When I opened a new tab, what I saw was horrifying. Toby was refreshing endlessly, making the new tab screen completely unusable and ugly. No wonder I was receiving so many emails past midnight! Users were left with no choice but to uninstall or disable the extension completely. Of all the bugs it could have had, this was possibly one of the worst.

Toby's bug captured and tweeted by Maxime Quillévéré

Even looking at this gif right now feels agonizing.

My first step was trying not to panic. Before spending any time on why it happened or how to fix the horrendous bug, I had to put all that aside and focus 100% on reverting the changes as quickly as possible and containing the spread of the buggy version. I didn't want to lose thousands of users, and I knew that would happen if I didn't act quickly.


Wednesday, Sept 29, 12:15 am

Thanks to version control with Git, reverting changes is quite simple and can be done very quickly. I had tags in place on my master branch, so I could roll back to a version I knew was working perfectly: version 0.1.5. The buggy version was Toby 0.2.0. Unfortunately, the Chrome Developer Dashboard does not allow developers to revert to a previous version, so even though I rolled Toby back to 0.1.5, I had to relabel it version 0.2.1 in order to publish it again, making the store treat it as more recent than the buggy one. With that in place, users' browsers would hopefully stop getting 0.2.0 and get 0.2.1, the old code, instead.

In a matter of minutes, I had the good old Toby on the Chrome Store again. Phew!
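Why the relabel works: as the dashboard won't let you go back, the rollback has to carry a version number higher than the one already published. Chrome extension versions are dot-separated integers, compared part by part as numbers. A minimal sketch of that comparison (`compareVersions` is an illustrative helper, not a Chrome API):

```javascript
// Illustrative helper (not a Chrome API): compares two extension version
// strings made of dot-separated integers, e.g. "0.2.1". Returns a negative
// number, zero, or a positive number, like a sort comparator.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing parts count as 0
    if (diff !== 0) return diff;
  }
  return 0;
}

const published = "0.2.0"; // the buggy release already on the store
console.log(compareVersions("0.1.5", published) > 0); // false: re-uploading 0.1.5 goes backwards
console.log(compareVersions("0.2.1", published) > 0); // true: same old code, newer label
console.log(compareVersions("0.10.0", "0.9.9") > 0);  // true: parts compare as numbers, not text
```

The last line is the reason to compare numerically: as plain strings, "0.10.0" would sort before "0.9.9".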

There were only two more problems:

  1. The Chrome Store takes up to an hour to make changes available, which meant users would keep getting the bad version for some time.
  2. Browsers only check for new versions every 5 hours, so users who already had the buggy version would not get the fix for several hours.

For the next hour or so, I just waited anxiously for the Chrome Store to publish my changes. I was receiving a BUNCH of uninstall emails from people who were super frustrated with the buggy behavior. At that point, there was nothing else I could do other than wait and hope for the best.

Through Mixpanel, I could follow analytics in real time and see the people whose extension had been updated to the bad version. The number was increasing by the minute, I kept receiving emails, and I was getting more and more nervous as hundreds of people were hit by the bug.

The graph shows that the number of unique affected users peaked at 497 by 12 am. In total, 651 users were affected.

But at around 1am, the buggy version was replaced with the old Toby, now relabeled as 0.2.1, and those numbers stopped rising. The spread was contained. No more users would get the bug, at least. And at that point, I could only hope that the 651 people affected by the bug would get the automatic fixes as quickly as possible, even though I knew that would most likely take 5+ hours.

I obviously continued to receive emails about the problems for several hours, and I tried to address each one of them.

Some were extreme, like this one guy who told me that Toby basically "ruined his life completely" because all his important tabs were there and he had to uninstall it.

And some were incredibly supportive and brought a smile to my face, like this one user who said she "loves Toby almost more than breakfast foods" and couldn't wait to see it get fixed and try all the new features soon.


Wednesday, Sept 29, 3:45 am

After a few hours of dealing with the stress of the failed release, responding to emails and tweets, and working on fixes, I figured out what had happened and could finally piece together the events of the night.

By far the most requested feature was the ability to use Toby alongside other extensions that take over the new tab, such as Momentum. For that, people wanted to choose whether Toby opens on every new tab or not. I honestly think Toby works best on every new tab (because then it's always in your face and makes you organize your tabs constantly and become more productive), but we wanted to address the users' request.

So, I added a preferences menu with some customization options, including the option to turn off "Toby opens on every new tab". The default is on.

This worked perfectly fine for new installations of Toby, and I could verify that through the hundreds of times I tested the new version.

However, it didn't work for users who already had Toby to begin with and were simply upgrading.

This was my aha moment. I had tested the new version quite well, but I had never tried getting the automatic update from the old version to the new one through the Chrome Store on a staging version.

The logic behind optionally opening Toby was something like this:

if (tab is a new tab AND user wants to open Toby on a new tab) {
  openToby();
}

However, for users who already had the previous version, the new tab was also Toby itself. So in practice, the following was running:

if (tab is Toby AND user wants to open Toby) {
  openToby(); // openToby() causes this condition to trigger again
}

Users who got the automatic update were trapped in an infinite loop.
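To make the loop concrete, here is a small runnable model of the two conditions above. Everything in it is hypothetical (the URLs, `countLoads`, and both predicates); it only mirrors the pseudocode: when Toby's own page counts as "a new tab", opening Toby re-triggers the rule forever, so the model caps the reloads at 100 to show that the tab never settles. The guarded predicate is one possible fix, not necessarily the one that shipped in 0.2.2.

```javascript
// Illustrative model only: URLs and names are hypothetical, not Toby's code.
const NEW_TAB_URL = "chrome://newtab/";
const TOBY_URL = "chrome-extension://example-id/toby.html";

// Buggy check: for upgraded users the new tab page *was* Toby, so Toby's own
// page also counted as "a new tab".
const buggyIsNewTab = (url) => url === NEW_TAB_URL || url === TOBY_URL;

// Guarded check: only the browser's real new tab page qualifies.
const guardedIsNewTab = (url) => url === NEW_TAB_URL;

// Applies the "open Toby on a new tab" rule every time the tab loads a page
// and returns how many loads happen before the tab settles (or hits the cap).
function countLoads(isNewTab, wantsToby, maxLoads = 100) {
  let url = NEW_TAB_URL;
  let loads = 0;
  while (loads < maxLoads) {
    loads += 1;
    if (isNewTab(url) && wantsToby) {
      url = TOBY_URL; // openToby(): the tab reloads with Toby's page
    } else {
      return loads; // rule didn't fire, so the tab is stable
    }
  }
  return loads; // hit the cap: in the real browser this never stops
}

console.log(countLoads(buggyIsNewTab, true));   // 100 (capped: endless reload)
console.log(countLoads(guardedIsNewTab, true)); // 2 (redirect once, then stop)
```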

At 3:45 am the problem was finally fixed, but I was incredibly tired, so I decided to call it a day and release the fix after some rest. I had already reverted the version, so the bug could no longer spread, and I'd have had no energy to act quickly if something else went wrong.


Wednesday, Sept 29, 10:00 am

The new version, 0.2.2, was ready. The reload bug was fixed and all the nice features were there. I decided to try deploying it again.

This time, though, instead of publishing to 100% of users at once, I decided to roll the new version out gradually, which the Chrome Web Store documentation calls Controlled Rollout. I started by deploying to only 5% of users.

You can do that easily from the Chrome Developer Dashboard.

Controlled Rollout — deploy to a percentage of all users.

As soon as Mixpanel's Live View showed regular activity from users who had updated to version 0.2.2, such as adding tabs or editing lists, I knew it was working and free of the reloading bug.

So I increased the deployment to 25% of users. After some time and some good feedback, I increased it to 50%, and then finally to 100% of users.
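The Chrome Developer Dashboard picks the percentage of users for you, and this post doesn't describe how it chooses them. As a generic illustration of how percentage rollouts usually stay consistent, the sketch below hashes a hypothetical user ID into a fixed bucket from 0 to 99 (FNV-1a is just a simple, well-known hash choice), so moving from 5% to 25% to 50% only ever adds users. This is not the Chrome Web Store's actual mechanism.

```javascript
// Generic percentage-rollout sketch; NOT how the Chrome Web Store works internally.

// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // as an unsigned 32-bit integer
}

// Each user lands in a fixed bucket 0-99, so raising the percentage only adds
// users: everyone who got the update at 5% still has it at 25%.
function inRollout(userId, percent) {
  return fnv1a(userId) % 100 < percent;
}

const stages = [5, 25, 50, 100]; // the steps used for Toby 0.2.2
const users = Array.from({ length: 10000 }, (_, i) => `user-${i}`); // hypothetical IDs
for (const pct of stages) {
  const n = users.filter((u) => inRollout(u, pct)).length;
  console.log(`${pct}% stage: ${n} of ${users.length} users`);
}
```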


Accounting for losses

In total, 651 people were affected. 
Out of those, 575 actually saw the C-R-A-Z-Y bug.
Out of those, 258 uninstalled Toby (in the worst case scenario, all because of the bug).
Out of those, 49 filled out the uninstall form and told us why.
Those who waited and got the fixes didn't lose any data.

Now, considering that we had about 10,000 active users at the time, it doesn't sound like it was such a big deal to lose a couple hundred. But it felt wrong and disappointing, and I'm glad I was able to act quickly.
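A quick back-of-the-envelope check of those numbers, using only the counts above: each step as a share of the one before it, plus the worst case (every uninstall caused by the bug) against roughly 10,000 active users. The rates are computed here, not reported by Mixpanel.

```javascript
// Counts from "Accounting for losses" above.
const funnel = [
  ["affected", 651],
  ["saw the reload bug", 575],
  ["uninstalled Toby", 258],
  ["filled out the uninstall form", 49],
];

// Percentage of the previous step that reached each step, to one decimal.
function stepRates(steps) {
  return steps
    .slice(1)
    .map(([, count], i) => ((100 * count) / steps[i][1]).toFixed(1));
}

stepRates(funnel).forEach((rate, i) => {
  console.log(`${funnel[i + 1][0]}: ${rate}% of previous step`);
});

// Worst case against about 10,000 active users.
const activeUsers = 10000;
console.log(`worst-case loss: ${((100 * 258) / activeUsers).toFixed(2)}% of active users`);
```

That prints 88.3%, 44.9% and 19.0% for the three steps, and a worst-case loss of 2.58% of active users: painful, but a small car accident rather than a train wreck.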

My friend and colleague Mack Flavelle sent me a well-phrased Slack message:

This could've been a TRAIN WRECK. Turned out to be a small car accident.

Lessons learned (TL;DR)

If you're building a Chrome extension, and especially if you already have users depending on it:

1. Make sure you test it well.

Don't just test the new version; test what happens when users update to it. Don't just test locally: publish a staging version of your extension to the Chrome Store, install it, publish the updated version to staging, and make sure you test what happens when the automatic update arrives. You can restrict the staging listing to test accounts.

Toby has Production and Staging versions on the Chrome Store.
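Alongside a staging listing, the "old data meets new code" case can also be unit-tested. A minimal sketch, assuming settings live under a hypothetical `settings` key in `chrome.storage.local` (which needs the "storage" permission) and that `openOnNewTab` is the new preference: when `chrome.runtime.onInstalled` fires with `reason === "update"`, fill in defaults for anything the old version never stored. This complements, and does not replace, testing real automatic updates through the store.

```javascript
// Hypothetical settings shape; key names and defaults are illustrative.
const DEFAULT_SETTINGS = { openOnNewTab: true };

// Adds any preference the previous version didn't have, without overwriting
// values the user already chose.
function migrateSettings(stored) {
  return { ...DEFAULT_SETTINGS, ...(stored || {}) };
}

// Inside the extension: migrate stored settings when Chrome reports an update.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onInstalled) {
  chrome.runtime.onInstalled.addListener((details) => {
    if (details.reason === "update") {
      chrome.storage.local.get("settings", (items) => {
        chrome.storage.local.set({ settings: migrateSettings(items.settings) });
      });
    }
  });
}
```

Because `migrateSettings` is a pure function, the upgrade path can be checked with plain assertions: an old install with no settings gets `{ openOnNewTab: true }`, while a user who already turned the option off keeps `false`.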

2. Set an uninstall URL. Send users to a web form. Ask them why.

When users uninstall your extension, that's the most valuable moment to capture their frustration and feedback and to understand why people are leaving your app. As I explained above, it takes one line of code to configure. This is the one thing that made all the difference in alerting me quickly that something was wrong.

3. In the case of failure, don't panic. Just act as quickly as possible.

Everyone makes mistakes. Take responsibility, do your best to revert and bring a solution as quickly as possible. Be brave enough to face users and explain what happened. Some people will be pissed at the tiniest errors; others will be understanding and even offer to help. Just accept it for what it is and make the best of it.

4. Tell your story, the good and the bad.

The first thing I did after all this was over was to present it to the whole team. I admitted the failure and everything that had been at stake. At the same time, I walked them through my thought process, showed them what I did wrong and what I did right, and let others learn from the experience. By writing this blog post and sharing the story again, I hope it helps others too.

Final words

At the time of writing, about 28,000 people have installed and tried Toby. While we're hoping to reach thousands more, I am proud of what we've accomplished so far, and grateful and happy for all the support, feedback and ideas we get all the time.

Toby helps me be more productive, manage my tabs with ease, and ultimately get things done. If you'd like to try Toby, install it here and send me your thoughts: hello@gettoby.com.