To err is human

Tao-Sheng Chen
ShopBack Tech Blog
4 min read · Nov 4, 2018

A few days ago, I read a message on Slack in which one of our talented engineers repented a typo that had deleted a whole table in the testing environment.

The story was simple, and it reminded me of the time I even thought about hara-kiri after making pretty much the same mistake, but in a production database.

It goes like this: you plan to delete a single record by its id.

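But it is midnight, there is too much caffeine in your body, the fingers of your right hand reach shorter than you expect, and you hit Enter without thinking… then you might end up with the typo of - instead of =. In databases such as MySQL, where a non-zero number counts as true, that condition matches every row except the one you meant to delete.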

It definitely drives the whole world crazy!
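
To see how small the slip is, here is a minimal sketch in Python using the standard-library `sqlite3` module. The `invoices` table, its columns, and the id value 3 are my own hypothetical stand-ins, not the real statement from that night. SQLite, like MySQL, treats a non-zero number in a WHERE clause as true, so it reproduces the behavior.

```python
import sqlite3


def remaining_ids(where_clause):
    """Run DELETE with the given WHERE clause on a fresh 5-row table
    and return the ids that survive."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO invoices (id, amount) VALUES (?, ?)",
        [(i, 100.0 * i) for i in range(1, 6)],
    )
    # String formatting is used here only to compare the two clauses;
    # real code should never build SQL this way.
    conn.execute(f"DELETE FROM invoices WHERE {where_clause}")
    rows = conn.execute("SELECT id FROM invoices ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]


intended = remaining_ids("id = 3")  # deletes only the row with id 3
typo = remaining_ids("id - 3")      # deletes every row where id - 3 is non-zero

print(intended)  # [1, 2, 4, 5]
print(typo)      # [3]
```

One character apart, and the only row left standing is the one you wanted to remove.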

I did almost the same thing around 12 years ago. Even worse, it was in a production database. That table was for invoices and held valuable information for running a 10B USD business! Even dogeza could not relieve anyone’s pain. For a mistake at that level, maybe hara-kiri was the only way to maintain honor and reputation.

I am still alive, which means the mistake was mitigated quickly and the damage was kept under control. How I got out of a disaster of my own making could be another article with three chapters. In short, I was lucky enough to reach the right person at the right time, and my company was one that really encouraged innovation, even the stupid actions that come with it.

Everybody makes mistakes and every programmer creates bugs. Learning from mistakes is part of human nature; it does not even require teaching.

So, is this one of those common articles that encourage readers to fix their mistakes, take the lessons learnt, and make the whole world better afterwards? The kind of story you could then write into your resume or use to answer a top-N interview question?

Well, it is NOT. At least, not the entire story.

Here we want to value the talents who prevent disasters, rather than the heroes who step in to fix them.

Many articles discuss how to increase your visibility in an organization. But knowing how to “win with ease” is harder and requires deeper knowledge and skills, and it is much more critical in a growing company like ours.

The lowest-level method of preventing human error is the carrot-and-stick approach. It is very easy to practice and looks fair from many points of view. It does work in some systems; however, it never works in a software development environment. I don’t even want to mention it in my article, because mentioning it means I somehow know about it.

The middle-level method of preventing human error is through rules, training, and improvements that focus on people. It works in some scenarios, and sometimes there is no better option. It can go further: in software, for example, it is possible to automate the rules. Static code scanning is a similar concept.
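
As a rough illustration of what an automated rule could look like, here is a small Python sketch that flags a DELETE or UPDATE whose WHERE clause is missing or contains no comparison at all. The function name `is_risky` and the regex heuristic are my own assumptions for this example; it is not a real SQL parser and not a description of any tool we actually run.

```python
import re

# Operators and keywords that indicate an actual comparison in a WHERE clause.
COMPARISON = re.compile(r"(=|<|>|\bIN\b|\bLIKE\b|\bBETWEEN\b|\bIS\b)", re.IGNORECASE)


def is_risky(statement):
    """Return True if a DELETE/UPDATE has no WHERE clause,
    or a WHERE clause without any comparison (for example `id - 3`)."""
    sql = statement.strip().rstrip(";")
    # Only data-changing statements are checked.
    if not re.match(r"(?i)^(DELETE|UPDATE)\b", sql):
        return False
    parts = re.split(r"(?i)\bWHERE\b", sql, maxsplit=1)
    if len(parts) < 2:
        return True  # no WHERE clause at all
    return COMPARISON.search(parts[1]) is None


print(is_risky("DELETE FROM invoices WHERE id - 3"))  # True
print(is_risky("DELETE FROM invoices WHERE id = 3"))  # False
```

A check like this could sit in front of an admin console or a review step. It will miss plenty of cases, which is exactly why rules alone are only the middle level.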

The advanced level of preventing human error is through structure and system. The concept is to design systematically, so that the final goal is reached by managing both success and failure. It doesn’t mean failure won’t happen; it means errors and failures will be fully managed and won’t cause disasters, just as a healthy and strong human body can protect itself from viruses and bacteria. To reach that level in a software system, especially an online service, we need to value those who contribute to these fundamental missions. For example, how we organize a fast-response team to deal with an incident is sometimes as critical as how we do the fix; how we systematically prevent errors is as critical as how we iron out a bug at midnight to resolve critical user feedback. Those values are easily ignored in some organizations, but NOT at ShopBack. We value the heroes who save us from nightmares, but we also value the makers and architects who make sure the nightmares never actually happen.
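
To make "managed failure" a little more concrete, here is one possible sketch in Python with the standard-library `sqlite3` module: a delete helper that runs inside a transaction and rolls back if more rows were affected than expected. The names `guarded_delete`, `TooManyRowsError`, and `max_rows`, as well as the `invoices` table, are hypothetical and exist only for this example; a real system would combine such a guard with backups, permissions, and review.

```python
import sqlite3


class TooManyRowsError(Exception):
    """Raised when a statement touches more rows than the caller expected."""


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Execute a DELETE, then commit only if it affected at most max_rows rows."""
    cur = conn.cursor()
    cur.execute(sql, params)  # sqlite3 opens a transaction implicitly here
    if cur.rowcount > max_rows:
        conn.rollback()  # undo the delete; the typo never reaches the data
        raise TooManyRowsError(
            f"{cur.rowcount} rows affected, expected at most {max_rows}"
        )
    conn.commit()
    return cur.rowcount


# Hypothetical 5-row table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO invoices (id, amount) VALUES (?, ?)",
    [(i, 100.0 * i) for i in range(1, 6)],
)
conn.commit()

try:
    guarded_delete(conn, "DELETE FROM invoices WHERE id - 3")  # the typo
except TooManyRowsError as exc:
    print("blocked:", exc)
count_after_typo = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]

deleted = guarded_delete(conn, "DELETE FROM invoices WHERE id = ?", (3,))
count_after_fix = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(count_after_typo, deleted, count_after_fix)  # 5 1 4
```

The typo still happens, but the system absorbs it: the mistake becomes an error message instead of an incident.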
