Incidentally #5

Puneet Awasthi
3 min read · Nov 13, 2023

How outages happen and how to prevent them.

Welcome to the fifth installment of Incidentally. You can find older articles by following the chain starting at incidentally-4.

I have worked in technology operations for years and have witnessed costly mistakes that caused not only stress for the responding teams but also financial loss, regulatory compliance issues, and customer dissatisfaction that cost the company revenue.

Today's topic is distinct from the previous ones, and I acknowledge that it's not an easy one. Nevertheless, with rigor, discipline, and proper investment, we can develop more robust systems that are less likely to cause significant incidents.

We never had a tester without a middle name!

Example 5: Testing

Testing is an essential aspect of software development. Writing code can be a challenging and rewarding task for developers, but ensuring that the code works with all inputs, API interfaces, and environmental conditions is a formidable challenge. Here are some tips to improve your testing and avoid lethal bugs.

Test Early: Shift-left is a popular phrase that applies to software testing as well. The earlier you test the code, the more likely you are to catch any pesky little bugs. One way to achieve this is TDD (Test-Driven Development), where you write the test cases first and then write the code to make them pass. According to the National Institute of Standards and Technology (NIST), identifying and fixing a defect becomes 30–40 times more difficult as the software progresses through the five broad phases of development.
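To make the test-first idea concrete, here is a minimal sketch in Python. The function split_full_name and its behavior are hypothetical, invented for this illustration. In TDD the two test methods would be written first and would fail until the function was implemented.

```python
import unittest


def split_full_name(full_name: str) -> tuple[str, str, str]:
    """Return (first, middle, last); middle is '' when there is none.

    Hypothetical function, written only after the tests below existed.
    """
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a first and a last name")
    first, last = parts[0], parts[-1]
    middle = " ".join(parts[1:-1])  # empty string when only two parts
    return first, middle, last


class SplitFullNameTest(unittest.TestCase):
    # In TDD these tests come first and drive the implementation above.
    def test_name_with_middle(self):
        self.assertEqual(
            split_full_name("Ada King Lovelace"), ("Ada", "King", "Lovelace")
        )

    def test_name_without_middle(self):
        self.assertEqual(split_full_name("Ada Lovelace"), ("Ada", "", "Lovelace"))
```

Saved as a module, this can be run with python -m unittest. The second test is the kind of case that is easy to miss if every person in your test data happens to have a middle name.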

Test Enough: Testing everything with complete confidence is not possible, so it is important to decide how much testing is adequate. You should determine the number of unit tests and the code coverage percentage that are mandatory for your teams. It is crucial to have an appropriate set of smoke tests and regression tests that will keep you out of trouble. Automation can be of great help here, as it allows you to scale up without adding more testers. Additionally, you can set a goal for how many tests must succeed before a release is signed off as production-ready.
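A sign-off goal like this can be written down as a simple gate. The sketch below is only an illustration: the SuiteResult type, the is_release_ready function, and the default thresholds (100% pass rate, 80% coverage) are assumptions, and every team should pick its own numbers.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    """Summary of one test run (hypothetical shape, for illustration)."""
    passed: int
    failed: int
    coverage_percent: float


def is_release_ready(
    result: SuiteResult,
    min_pass_rate: float = 1.0,   # assumed threshold: every test must pass
    min_coverage: float = 80.0,   # assumed threshold: 80% code coverage
) -> bool:
    """Return True only when both the pass rate and coverage goals are met."""
    total = result.passed + result.failed
    if total == 0:
        # A run with no tests proves nothing, so it never qualifies.
        return False
    pass_rate = result.passed / total
    return pass_rate >= min_pass_rate and result.coverage_percent >= min_coverage
```

For example, is_release_ready(SuiteResult(passed=99, failed=1, coverage_percent=85.0)) returns False under these defaults, because one failed test is enough to block the release.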

Manage Your Test Suites: Take a comprehensive approach to testing and keep building your test library. As the product evolves, the number of test cases should increase. Periodically remove unused test cases and update those that are impacted by changes in functionality. Failed tests must not be ignored, as they can lead to serious issues. If a test fails, you should either investigate and resolve the issue it found or remove the test case altogether. One challenging scenario is an ‘environmental transient failure.’ In such cases, carefully decide how many exceptions you will allow, while ensuring that enough testing is still performed to maintain the quality of the product.
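One possible way to keep transient failures visible instead of silently ignoring them is to retry a bounded number of times and label the outcome. This is a sketch under assumptions: classify_test and the limit of three attempts are hypothetical, and it is not a substitute for fixing the underlying environment.

```python
from typing import Callable


def classify_test(test_fn: Callable[[], None], max_attempts: int = 3) -> str:
    """Run test_fn up to max_attempts times and classify the outcome.

    Returns "passed" if the first attempt succeeds, "flaky" if a later
    attempt succeeds after at least one failure, and "failed" if every
    attempt fails. The limit of 3 is an assumed value, not a standard.
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            test_fn()
        except Exception:  # includes AssertionError from failed checks
            failures += 1
            continue
        return "passed" if failures == 0 else "flaky"
    return "failed"
```

Counting the "flaky" results per release gives you a number to review, so the exceptions you allow stay a deliberate decision.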

Test for Negative Scenarios: You want to test negative scenarios as much as you test positive ones. This could include testing for bad inputs (Incidentally #2), handling of stale or lost connections, and the infamous timeouts. Once the code runs in production, it is likely to encounter scenarios you have never seen and users more creative than you could imagine. It is therefore critical to test the ways in which your system can break. Testing for security gaps and data leakage belongs in this category too, and there is a whole field of Application Security Testing to guide you on that front.

Continuous Improvement: Always look to get a little better. Write that one extra test, delete that unused function, and automate a couple of manual test cases while you are in there. Getting just 1% better every day compounds to more than 37 times better in a year! This mindset of ongoing improvement will help improve the product, shape the culture of your teams, and have an amplified impact over time.
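The 1% figure is plain compounding, and it is easy to check. The variable names below are just for illustration.

```python
daily_gain = 0.01   # 1% better each day
days = 365

# Each day multiplies the previous day's level by 1.01.
factor = (1 + daily_gain) ** days
print(f"{factor:.1f}x better after {days} days")  # prints "37.8x better after 365 days"
```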

So that is all for now. The streaming platforms always seem to have six episodes in a series, and I do not know if there is science behind it. But I will follow them and close out the Incidentally series with the sixth installment. Look out for the grand finale!
