Status Blog
The status blog is one of those unsexy, boring pieces of startup infrastructure that you need to have but no one really wants to do. Your site is going to be down for a 1-hour database migration – someone needs to update the status blog.
Nowadays you would also put downtime notices on Twitter. But this may not work as well if the service you are running is Twitter. Also, if your service is Blogger, you should find another blog hosting provider for your status blog.
Those are a couple of things I learned from experience, as is this story about updating the Twitter status blog at the request of the US State Department.
I’ve always been the guy who updates the status blog.
As a young product manager it was an easy, useful thing I could do for the dev team. Less altruistically, by being the Official Status Blog Updater, I had a legitimate reason to ask the dev team for updates during a crisis. “I’m trying to save us all a lot of pain, tell me what’s going on and you won’t have to tell everyone else a thousand times.”
With Twitter, updating the status blog became something of an art form. During the worst of our scaling challenges in 2008 and 2009, I was continually finding new ways to say “I’m really sorry that things aren’t working better. Believe me, I want them to work better too and we are working very hard to get things fixed. Here’s what we know so far…”
By mid-summer of 2009 things actually had started to get a bit better. We had moved to a new hosting provider – NTTA – which solved a lot of problems by providing us greater transparency into how the system was actually performing. And we were routinely upgrading the hardware configuration of the site to add machines and work through our bottlenecks.
One of the larger changes we needed to make was to move off of an old, slower network. This would be great for the service but would require downtime. It would also require a coordinated effort between our ops and engineering teams as well as the network ops team at NTTA. Scheduling the date of the network move proved incredibly difficult and there were a couple of attempts that got waved off.
timeanddate.com
Finally, on the morning of Monday, June 15th, 2009, we announced that there would be a 90-minute downtime that night at 9:45p Pacific. You can read my status blog update here. Some points I would make critiquing my own work from 3 1/2 years ago:
1. The update should have been posted sooner. We were only giving folks 12 hours and 24 hours is preferred. Longer than 24 hours isn’t great either because in addition to updating the status blog we also put a notice in the web UI linking to the post and you don’t want DOWNTIME NOTICE in front of people for too long.
2. Linking the time of the outage to timeanddate.com so folks could see what time it would be for them: PRO STYLE.
3. Scheduling the downtime during a major pro-democracy protest in Iran: NOT PRO STYLE.
It turns out that on Friday, June 12, 2009 Iran held presidential elections in which Mahmoud Ahmadinejad won a disputed victory over pro-democracy opposition candidates. Protests erupted in Iran over the weekend and, in response, Iran shut off Internet access to suppress the protestors’ ability to organize and communicate. However, the mobile operators remained up and since Twitter worked over SMS, it became an invaluable tool for coordination. On the morning of Monday, June 15, one of the opposition leaders, Mir-Hossein Mousavi, planned to make his first post-election appearance in Tehran’s Freedom Square before 2 million pro-democracy protestors.
If you look at the pro-style timeanddate.com notification in my blog post, you’ll see that 9:45p Pacific time on Monday, June 15 corresponds to 9:15a Tehran time the next morning.
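For anyone who wants to check that conversion by hand, here is a minimal Python sketch. It is not how timeanddate.com does it. It assumes fixed UTC offsets for June 2009 (UTC-7 for Pacific Daylight Time, UTC+4:30 for Iran Daylight Time), and the constant and function names are made up for this example. A real tool would look the offsets up in the IANA time zone database instead of hard-coding them.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for June 2009, when daylight time was in effect
# in both places. Hard-coded here only to keep the sketch self-contained.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")
IRAN_DAYLIGHT = timezone(timedelta(hours=4, minutes=30), "IRDT")


def convert(moment: datetime, target: timezone) -> datetime:
    """Return the same instant expressed in the target zone."""
    return moment.astimezone(target)


# Announced start of the maintenance window: 9:45p Pacific, Monday, June 15, 2009.
downtime_start = datetime(2009, 6, 15, 21, 45, tzinfo=PACIFIC_DAYLIGHT)
tehran_start = convert(downtime_start, IRAN_DAYLIGHT)

print(downtime_start.isoformat())  # 2009-06-15T21:45:00-07:00
print(tehran_start.isoformat())    # 2009-06-16T09:15:00+04:30
```

Under those assumed offsets, the window opens at 9:15a in Tehran on June 16, the calendar day after the Pacific date in the announcement.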
#nomaintenance
After the initial status blog post, we immediately started getting tweets from users about the planned downtime. People were pleading with us to not take the maintenance window because of the importance of the pro-democracy protest. The hashtag #nomaintenance was the top trend by midday.
We were sympathetic to these complaints. But we were pretty convinced we had to keep the maintenance window as planned. First, we really needed to get off the old network. There was good evidence that if we didn’t, our current network would not be able to support the increased usage we were getting. And the downtime caused by an unplanned maintenance could potentially last days.
Second, we had real qualms about wading into geopolitical affairs and saying “We think these protests are so important that we want to enable them to go forward.” One of our core beliefs was that it was the users who would determine what to do with the product. We didn’t want there to be the perception that we were grandstanding for a cause. Our goal was to be a platform for open communication but we didn’t have a side. (And we certainly didn’t kid ourselves into thinking that we knew enough about Iranian politics to know who the right side even was.)
You’ll see I updated the status blog around 2p affirming our position. (That’s also pro-style by the way, posting as an update instead of changing what was there.)
You’ve got mail
But then in the late afternoon Biz got an email from a contact at the State Department. They wrote to us saying, in effect, we think that Twitter is playing a critical role in the pro-democracy protests and this is the largest civil unrest seen in Iran since the fall of the Shah. We are not telling you what to do but without exaggeration we think it is vital that you not take this downtime.
Biz and I found ourselves in the position of getting on the phone with our Ops Manager and I remember Biz saying, “So I know we absolutely need to take this downtime and there’s no way to reschedule it, but what if we did?”
One of the wrinkles was that we were jeopardizing not only our own uptime by moving the maintenance window but also that of other services sharing our network. Fortunately our hosting provider and team found a way to be flexible and all that was left was to update the blogs.
Twitter was around 30 or 40 people at that point, I think. We were in our second office on Bryant St. in SOMA. I remember saying to Biz as we walked back to our desks to hammer out the blog posts, “Doesn’t it seem weird that we’re dealing with the State Department and pro-democracy protests in Iran and the only tool at our disposal is a blog post? We’re basically working this out with a text box on the Internet.”
Text boxes on the Internet, man, I tell you what.