Firefighting as an empathy tool
In my day job, one of my favourite activities is firefighting. Working together to figure out why something isn’t working and fix it. It requires the ability to dive into complex technical issues in a short amount of time. You may also need to have the overall vision of how a piece of your stack functions. But I’ve recently come to realise that it’s not just the technical side of things I enjoy, and that’s what this blog post is about.
During the DDoS on Dreamhost, the Elm package servers were affected. Essentially, the domain nameserver records for package.elm-lang.org were made invalid. This meant that nobody could download anything, breaking all tests and builds. As you can imagine, that’s a pretty big issue.
Several users came in to report the issue. When something like this happens, I usually take the role of informing others of what’s happening and of any potential workaround. I also let the users reporting the issue know the expected downtime. This is pretty similar to what I do in my day job.
A couple of users started discussing potential long-term fixes in the same space. I did my best to cut that conversation short early on, in a firm tone. My reasoning was that the majority of users needed to be kept aware of how to solve their issues right now, rather than how package management could be completely redone. Both have a time and place, but I felt right there and then was not it.
One of the users in question reached out to me, and let me know that they felt like my response was lacking in empathy. They were dealing with a huge blocker as a result of this downtime. I explained my view on why I responded the way I did, and offered them a short video chat to resolve their issues. They were pretty happy to wait for it to be resolved, and understood my position.
To me, one thing was clear from our interactions — I communicated poorly during a time of high stress for the person on the other end.
This user was dealing with the issue, unable to do anything. My response in cutting their conversation short took away the chance they had to feel like they were doing something. That leaves me with the question of “How should I communicate this in future?”, something that’s been on my mind most of the day.
Thinking about when fires happen in my day job, I realised a few differences. I know the people I work with and I know how to communicate “this isn’t productive” in a way that doesn’t offend. I’m in a position where there is something that I can do about the issue at hand. I usually have a fire document where long-term solutions can be added later or as we fix each piece, without needing to discuss them in depth at the time. The fire document can be used to direct questions or thoughts. It can also be the place where any support-related needs get answered.
A fire writeup usually contains information on what happened, who was affected, who is fixing it, the timeline, the short term fix and the long term fix, along with various questions.
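As a rough sketch, that list could be captured as a small Python template that prints a skeleton to paste into a shared document. The `FireWriteup` class, its field names and the placeholder text are all made up for illustration; they don't come from any real tool, and the sections can be swapped for whatever a team actually tracks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FireWriteup:
    """Hypothetical skeleton for a fire writeup; all names are illustrative."""

    what_happened: str
    who_is_affected: str
    who_is_fixing: str
    short_term_fix: str = "TBD"
    long_term_fix: str = "TBD"
    timeline: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the writeup as plain text, one titled section per field."""

        def bullets(items: List[str]) -> str:
            # Empty lists still get a visible placeholder so the section shows up.
            return "\n".join(f"- {item}" for item in items) or "(none yet)"

        sections = [
            ("What happened", self.what_happened),
            ("Who is affected", self.who_is_affected),
            ("Who is fixing it", self.who_is_fixing),
            ("Timeline", bullets(self.timeline)),
            ("Short-term fix", self.short_term_fix),
            ("Long-term fix", self.long_term_fix),
            ("Questions", bullets(self.questions)),
        ]
        return "\n\n".join(f"{title}\n{body}" for title, body in sections)


# Placeholder values only; fill these in as the fire unfolds.
writeup = FireWriteup(
    what_happened="(one or two sentences on what broke)",
    who_is_affected="(which users or systems)",
    who_is_fixing="(who is working on it)",
)
writeup.timeline.append("(first report, with a time)")
print(writeup.render())
```

Starting the fix fields at "TBD" and the lists at empty means the document can be opened the moment the fire starts and filled in as things become clear.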
I’m a big fan of fire writeups. Especially as a communication aid during the fire. I think next time an open source fire happens, I’ll open a fire document for that too. Users can be directed there for “how do I fix this and what happened?” questions, meanwhile users who want to discuss things further can be directed to a channel. Nobody needs to feel left out, and all questions get answered.
I actually did this previously, for another issue that I fixed during one of my previous day jobs: https://github.com/elm-community/Manifesto/issues/41. The problem with using GitHub as a platform for fire writeups is that issues are essentially static: they aren’t alive in the same way Google Docs, Dropbox Paper or Quip documents are. That makes it good for writing things up afterwards, but I consider writing things up during the fire just as important. Informing others of what is happening is vital during any incident, and there’s no reason why that shouldn’t apply to open source too.
So, tl;dr — let’s use a collaborative platform for figuring out fires together. We should make sure that short term issues are addressed, and all the possible workarounds or solutions are in one place. We should also identify long term solutions eventually, but also understand that discussing potential long term solutions can be productive for some people. Most of all, we should be considerate that when something goes wrong, it feels bad for everyone involved.
