A series of mistakes — why support procedures are important

A support case gone wrong, and how it could’ve been avoided if everyone had just followed common processes

Once upon a time, while developing an interface that sends information between our system and the customer’s system, a developer made a mistake.

A basic function of this API was to update the number of users in various parts of the system in real time. When a user logs in somewhere, the number for that specific area increases by 1, and when a user logs out, the number decreases by 1. Simple, right? Well, the aforementioned mistake was also very simple: the reported number decreased only if the user logged off after performing an action in the system. So users who came and went without doing anything were reported as still being in the system long after they were gone…
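For the technically curious, here is a minimal sketch of what a bug like this might look like. It is not the actual code from our system: the class `AreaPresence`, the flag `require_action_to_decrement`, and the helper `simulate` are all hypothetical names invented for illustration, and the real interface was certainly more involved.

```python
from collections import Counter


class AreaPresence:
    """Hypothetical tracker of how many users are logged in to each area."""

    def __init__(self, require_action_to_decrement=False):
        # When True, this reproduces the bug described above:
        # a logout is only counted if the user performed an action.
        self.require_action_to_decrement = require_action_to_decrement
        self.counts = Counter()
        self.sessions = {}  # user_id -> {"area": str, "acted": bool}

    def login(self, user_id, area):
        self.sessions[user_id] = {"area": area, "acted": False}
        self.counts[area] += 1

    def act(self, user_id):
        self.sessions[user_id]["acted"] = True

    def logout(self, user_id):
        session = self.sessions.pop(user_id)
        if self.require_action_to_decrement and not session["acted"]:
            return  # bug: an idle user is never subtracted from the count
        self.counts[session["area"]] -= 1


def simulate(buggy):
    """Two users log in; only one acts; both log out. Returns the final count."""
    presence = AreaPresence(require_action_to_decrement=buggy)
    presence.login("alice", "lobby")
    presence.login("bob", "lobby")
    presence.act("alice")
    presence.logout("alice")
    presence.logout("bob")  # bob came and went without doing anything
    return presence.counts["lobby"]


print(simulate(buggy=True))   # 1: bob is still reported as present
print(simulate(buggy=False))  # 0: both users are gone
```

In the buggy run the idle user is never subtracted, so the reported number drifts upward over time, which is exactly the kind of thing that ends up on a customer’s website.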

The customer was understandably upset since most of these numbers appeared on their website and therefore provided wrong information to the users.

And here starts a comedy of errors.

Mistake by customer — bypassing tech support

This customer is big, so they have their own account manager. Having him as the primary contact for many things, they emailed him instead of going through tech support, as they’re supposed to.

Mistake by account manager — basically doing everything wrong

The account manager replied immediately that our product team, which had been added to the email thread, was on holiday. He suggested that the customer wait a few days. This is wrong on many levels. First, he didn’t properly understand the severity of the issue. Second, he shed responsibility by pushing the issue to the product team. And third, he did not suggest escalating to the support team.

And he did all that in a one-line email. Impressive!

Mistake by product manager — becoming the middleman

A member of the product team, although on holiday, saw the email and replied that he had just forwarded the issue to the development team and would report back once he had results. First and foremost, it’s not his job to deal with production issues. He tried to bypass procedures by going directly to development (who would indeed ultimately be the ones to fix the issue), but in the process he lost context and, even more importantly, the ability to communicate across all channels. He placed himself in the middle as the only person with all the information, and since he couldn’t follow up (holidays…), the process came to a halt.

So, how did it get fixed?

The product manager actually sent the issue to the correct people in the development team, but lacking any real context, all they did was identify the problem and create a ticket for the fix. No maintenance was scheduled.

A day passed, and the customer, growing impatient, sent another email stressing the severity of the problem. At that moment I, in the role of product team leader, saw the thread and immediately suggested that, since the fix might take a bit more time, the customer should consider removing the misleading information from the website. The goal here was incident severity management: if we can’t fix the problem completely in a short amount of time, at least we can lower its impact.

Interestingly, the customer didn’t like that. The next morning they had an angry call with some members of our management, and then replied to my suggestion with “this is a production issue and we usually fix problems instead of just removing functionalities from the site”. By that point I had already received a stressed call from a member of the management group, asking for help in escalating the fix.

I made a short call to fully understand the problem, then went to the developers and agreed on a quick fix from their side. I then replied to the customer that I had made my suggestion precisely because this was a production issue, and hinted that we would manage to fix it that day even though it hadn’t been escalated properly through tech support. It was fixed by the end of the day, to the full satisfaction of the customer.


Lessons learned?

It seems like most of the issues described in this case study stem from one basic fault — not following procedures.

Sometimes, this is done due to the need for human interaction. The customer prefers to go through the account manager (or other known personnel on the service provider’s side) instead of sending a request to an abstract group called “tech support”. Because frankly — who really wants to deal with tech support? Let me speak with John, whom I’ve been meeting with regularly, not with some unnamed tech guy.

But there are reasons for procedures. Tech support is 100% dedicated to solving issues. They are the fastest to respond. They log everything. They handle the issue from start to end. And in the case of a critical issue with a high-profile customer, they will assist 24/7. Plus, they actually do have names.

Sometimes, procedures are skipped out of pure convenience, or even incompetence. The account manager could’ve easily followed the process by escalating to tech support. With a bit more effort, he could’ve even done more and involved a team member who wasn’t on holiday to help him escalate to the right people. But he didn’t.

And then the product manager made the same mistake, only for the opposite reason: he didn’t go through tech support because he wanted to give the customer personalized treatment, the kind they are so keen to receive. But the product manager was not fully available to assist, and wasn’t well versed in the processes needed to deliver the fix quickly. In the end, the special treatment he tried to provide was what delayed the fix by more than 2 days. The road to hell is paved with good intentions…

Summary

The customer might always be right, but that doesn’t mean they can expect the problem to fix itself just because they shout and bang on tables. They share the responsibility to actively assist with issues that come up. If a customer does not go through the properly defined procedures, they cannot really complain that they are not getting the agreed-upon service.

The service provider’s personnel should also follow procedures, because procedures are there for a reason. They should act according to their job description, neither shedding responsibilities nor taking on ones that aren’t theirs. This is the best way to ensure proper care for any support issue.

You know what, they don’t have to do everything right all the time. They should just try harder not to do everything wrong…