The Art of Debugging: Fixing and Reflection Process (Part 4)

Sopheak Hang
7 min read · Jun 18, 2024

Posts in this Series

  1. Introduction
  2. Bug Reproduction
  3. Diagnosis Techniques
  4. Fixing and Reflection Process (This Post)
Source: Sources of Insight

In the previous post, you learned how to track down bugs using techniques suited to different situations. Assuming you have now identified the issue, it is time to move on to the last two steps of the debugging process: “Fix” and “Reflect”.

Fixing

When fixing bugs, keep the following principles in mind to ensure effective and efficient resolution:

1. Fix the cause, not symptoms

Address the root cause rather than just cleaning up its immediate effects. Fixing only the symptoms gives you a temporary solution at best: the underlying issue remains unresolved and can lead to recurring problems in the future.

For instance, if your system suddenly slows down without any significant increase in load, you might think of scaling up the servers to resolve the problem. However, that should be treated only as a temporary measure to calm the situation down. Server capacity is not the cause. You still need to find and fix the real cause of the performance issue, which could be a recent code change, an inefficient implementation, a database indexing problem, etc.

2. Ensure the tests pass after the fix

Don’t apply the fix and deploy it to production as soon as it’s done. Running a set of tests is a must to confirm that the fix works and breaks nothing else:

  • Test the reported issue itself to confirm the bug no longer occurs after the fix.
  • Run the existing unit tests and any other automated tests to ensure the fix introduces no regressions.
  • Add new test cases to the automated tests so that those specific scenarios cannot reintroduce the issue unnoticed.
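The last point can be sketched as a small regression test. This is only an illustration under stated assumptions: `PriceCalculator` and its discount bug are hypothetical, and the checks use plain Java so the sketch runs without a test framework. In a real project they would more likely live in your existing JUnit or TestNG suite.

```java
/** Regression checks that pin the behavior fixed by a (hypothetical) bug fix. */
final class PriceCalculatorRegressionTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Scenario from the hypothetical bug report: a 15% discount was ignored.
        check(PriceCalculator.applyDiscount(2000, 15) == 1700, "15% off 2000 should be 1700");
        // Boundary cases around the fix, so the bug cannot quietly return.
        check(PriceCalculator.applyDiscount(2000, 0) == 2000, "0% discount should change nothing");
        check(PriceCalculator.applyDiscount(2000, 100) == 0, "100% discount should be free");
        System.out.println("All regression checks passed");
    }
}

/** Hypothetical class under test, shown after the fix. */
final class PriceCalculator {
    /** Returns the price in cents after applying a percentage discount (0 to 100). */
    static long applyDiscount(long priceCents, int discountPercent) {
        if (discountPercent < 0 || discountPercent > 100) {
            throw new IllegalArgumentException("discountPercent must be between 0 and 100");
        }
        // The assumed buggy version computed discountPercent / 100 * priceCents,
        // where integer division turned every discount below 100% into 0.
        return priceCents - (priceCents * discountPercent) / 100;
    }
}
```

Each check encodes one scenario from the bug report or its boundaries, so a future change that reintroduces the bug fails immediately.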

3. Correct any issues caused by the bug

This principle highlights the need to address any fallout from a bug. Bugs can mess up data, corrupt files, or disrupt processes. After fixing the bug, it’s important to find and correct these issues to get the system back to normal. Two important points to consider:

  • Identify Affected Areas: Determine which aspects of your system were impacted by the bug. This could include data inconsistencies, corrupted files, or any other disruptions.
  • Restore Integrity: Implement measures to correct these issues. This could involve cleaning up or correcting data, restoring files from backups, or rerunning processes that were disrupted.
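The two points above can be sketched as a small repair routine. Everything here is an assumption made for illustration: the `Order` record, the idea that the bug stored wrong order totals, and the rule that a total must equal the sum of its line items. A real cleanup would run against your actual data store, ideally after taking a backup.

```java
import java.util.List;

final class OrderRepair {
    /** Hypothetical record: the total stored by the buggy code plus the line items it should match. */
    record Order(String id, long storedTotalCents, List<Long> itemCents) {
        long expectedTotalCents() {
            return itemCents.stream().mapToLong(Long::longValue).sum();
        }
    }

    /** Identify affected areas: orders whose stored total disagrees with their items. */
    static List<Order> findAffected(List<Order> orders) {
        return orders.stream()
                .filter(o -> o.storedTotalCents() != o.expectedTotalCents())
                .toList();
    }

    /** Restore integrity: return corrected copies of the affected orders. */
    static List<Order> repair(List<Order> affected) {
        return affected.stream()
                .map(o -> new Order(o.id(), o.expectedTotalCents(), o.itemCents()))
                .toList();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("A-1", 500, List.of(200L, 300L)),  // consistent
                new Order("A-2", 0, List.of(150L, 250L)));   // damaged by the bug
        List<Order> fixed = repair(findAffected(orders));
        fixed.forEach(o -> System.out.println(o.id() + " corrected to " + o.storedTotalCents()));
    }
}
```

Keeping detection and repair as separate steps lets you review the list of affected records before changing anything.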

Sometimes a proper fix takes a long time while the issue is critical and urgent. In that case, you can put a temporary workaround in place to buy time to work on the proper fix.

Reflection

Everything is fixed!! Is it done now? No, the work isn’t over yet.

To prevent future issues and improve the overall quality of your system, follow these crucial steps:

1. Explain the Root Cause:

Make sure that you thoroughly understand the root cause and can explain it well. This includes how the bug originated, under what conditions it is triggered, and why it behaved the way it did.

2. Share Your Findings with the Team:

Share your findings, including the root cause and the fix, with your team. This ensures everyone is aware of the issue and its solution, and it fosters a collaborative environment: teammates can learn from the mistake and avoid it in the future, or know how to spot and fix it quickly if they run into it again.

3. Document Your Findings and Fix:

Document everything related to the bug: the symptoms, root cause, diagnosis technique, steps taken to fix it, and any changes made to the code. (If you use Jira, code changes can be linked to the Jira ticket, which makes them easy to trace from the document.) This document serves as a valuable reference for future debugging.

4. Identify Other Instances of the Same Problem:

Check whether similar issues exist elsewhere in the system that have not been reported or fixed yet. Sometimes the same bug exists in different parts of the ecosystem. Review the code with the relevant people on the team to find any other instances that need attention.

5. Improve the Logs and User Error Messages

If you struggled during the diagnosis step because information was missing from the logs, add that detail to the logs so that similar bugs can be identified more easily next time.
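As a rough sketch of what "adding detail" can look like, the example below logs the identifiers needed to trace a failure instead of a bare message. It uses the standard `java.util.logging` API; the `PaymentService` class, its `charge` method, and the field names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PaymentService {
    private static final Logger LOG = Logger.getLogger(PaymentService.class.getName());

    /** Builds a message carrying the identifiers someone will need during diagnosis. */
    static String describeFailure(String orderId, String userId, long amountCents) {
        return String.format("Payment failed: orderId=%s userId=%s amountCents=%d",
                orderId, userId, amountCents);
    }

    /** Hypothetical operation; a real one would call a payment gateway. */
    static boolean charge(String orderId, String userId, long amountCents) {
        try {
            if (amountCents <= 0) {
                throw new IllegalArgumentException("amountCents must be positive, got " + amountCents);
            }
            return true;
        } catch (IllegalArgumentException e) {
            // Too vague to diagnose from: LOG.warning("Payment failed");
            // Better: include the context and pass the exception so the stack trace is logged.
            LOG.log(Level.WARNING, describeFailure(orderId, userId, amountCents), e);
            return false;
        }
    }

    public static void main(String[] args) {
        charge("order-42", "user-7", 0);
    }
}
```

Be careful not to log sensitive data such as passwords or full card numbers when adding context.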

If the bug was due to an end-user mistake or misunderstanding, consider improving your error messages. Clearer error messages guide users better and reduce the likelihood that they run into the same issue again. They can also reduce the number of support requests.

Below is a template for an effective error message:

[Title]: Brief, Clear Summary of the Issue
[Description]: Explain What Happened
[Solution]: Provide Steps to Resolve the Issue
[Additional Info]: Optional Technical Details or Reference Code
[Contact]: Offer a Way to Get Further Assistance

Example:

Title: Unable to Save Your Changes

Description: We encountered an issue while trying to save your changes. This might be due to a temporary server problem or a network issue.

Solution: Please try the following steps:
1. Check your internet connection.
2. Try saving your changes again in a few minutes.
3. If the issue persists, refresh the page and try again.

Additional Info: Error Code: 500-SAVE-001

Contact: If you continue to experience this issue, please contact our support team at support@example.com.
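One way to keep messages consistent with this template is to build them from a small structured type. The sketch below is an illustration only: the `UserErrorMessage` class and its `render` method are hypothetical names, and the layout simply mirrors the example above.

```java
import java.util.List;

final class UserErrorMessage {
    /** Fields mirror the template; errorCode is optional and may be null. */
    record Message(String title, String description, List<String> steps,
                   String errorCode, String contact) {

        /** Renders the message in the same order as the template. */
        String render() {
            StringBuilder out = new StringBuilder();
            out.append("Title: ").append(title).append("\n\n");
            out.append("Description: ").append(description).append("\n\n");
            out.append("Solution: Please try the following steps:\n");
            for (int i = 0; i < steps.size(); i++) {
                out.append(i + 1).append(". ").append(steps.get(i)).append('\n');
            }
            // The technical details section is optional, so skip it when no code is given.
            if (errorCode != null && !errorCode.isBlank()) {
                out.append("\nAdditional Info: Error Code: ").append(errorCode).append('\n');
            }
            out.append("\nContact: ").append(contact);
            return out.toString();
        }
    }

    public static void main(String[] args) {
        Message message = new Message(
                "Unable to Save Your Changes",
                "We encountered an issue while trying to save your changes.",
                List.of("Check your internet connection.",
                        "Try saving your changes again in a few minutes."),
                "500-SAVE-001",
                "If you continue to experience this issue, please contact support@example.com.");
        System.out.println(message.render());
    }
}
```

Because every message goes through the same type, a missing solution step or contact line becomes visible at the call site rather than in production.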

6. Close the Issue with Other Stakeholders:

Once you have fixed the bug and completed all related tasks, close the loop with everyone involved. Update stakeholders on what happened, what you did to stop it from coming back, and any improvements you made. For external parties such as customers or partners, the update does not need to be as detailed. This kind of clear communication keeps everyone informed and builds trust among all the stakeholders.

Bonus Tips to Boost Your Debugging Skill

Well, I still have more to say. Here are a few more tips to help you boost your overall debugging skills.

1. Don’t catch an unexpected exception and dismiss the original one.

[Image: Bad practice of using try-catch]
[Image: Log with the full exception]

The main goal of a try-catch block is to handle the error gracefully, not to dismiss it completely. So when you catch an exception, especially an unexpected one, always log the full exception so that you can see the full stack trace, which helps during bug diagnosis. Without that log, the only information you get is “Something went wrong!”, and you have no idea what that something is.
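A minimal sketch of the contrast, not the author's original code: the class and method names are hypothetical, and the logging uses the standard `java.util.logging` API. The first variant swallows the exception; the second logs it in full and rethrows with the original attached as the cause.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ExceptionLoggingDemo {
    private static final Logger LOG = Logger.getLogger(ExceptionLoggingDemo.class.getName());

    /** Hypothetical operation that fails on malformed input. */
    static int parseQuantity(String raw) {
        return Integer.parseInt(raw.trim());
    }

    /** Bad: the original exception is dismissed, so the stack trace is lost. */
    static int quantityBad(String raw) {
        try {
            return parseQuantity(raw);
        } catch (Exception e) {
            System.out.println("Something went wrong!");
            return 0;
        }
    }

    /** Better: log the full exception, then rethrow with the original as the cause. */
    static int quantityBetter(String raw) {
        try {
            return parseQuantity(raw);
        } catch (RuntimeException e) {
            // Passing the exception object makes the logger print its stack trace.
            LOG.log(Level.SEVERE, "Failed to parse quantity from input: " + raw, e);
            throw new IllegalStateException("Could not read quantity", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(quantityBad("abc"));
        try {
            quantityBetter("abc");
        } catch (IllegalStateException e) {
            System.out.println("Cause preserved: " + e.getCause());
        }
    }
}
```

Whether to rethrow or recover after logging depends on the caller; the point is that the original exception is never thrown away silently.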

2. Learn common built-in exceptions/issues

[Image: Common exceptions in Java]

Why is understanding common exceptions useful?

By understanding what each exception type means and what commonly causes it, you can quickly spot the issue and apply the fix.

For instance, a NullPointerException occurs when you access a field or call a method on a reference that is null. If you know that, you can look for the variables in the code block that are most likely to be null and add a null check accordingly.
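A small sketch of that reasoning, with a hypothetical lookup table: `Map.get` returns null for a missing key, so calling a method on the result is the likely source of the exception, and that is where the null check belongs.

```java
import java.util.Map;

final class NullCheckDemo {
    /** Hypothetical data: only "alice" has an email on record. */
    static final Map<String, String> EMAILS = Map.of("alice", "Alice@Example.com");

    /** Throws NullPointerException for an unknown user, because get() returns null. */
    static String emailUnsafe(String user) {
        return EMAILS.get(user).toLowerCase();
    }

    /** Checks the one value that can be null before using it. */
    static String emailSafe(String user) {
        String email = EMAILS.get(user);
        if (email == null) {
            return "unknown";
        }
        return email.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(emailSafe("alice"));
        System.out.println(emailSafe("bob"));
        try {
            emailUnsafe("bob");
        } catch (NullPointerException e) {
            System.out.println("Unsafe lookup failed with NullPointerException");
        }
    }
}
```

Returning a placeholder is only one option; depending on the case, throwing a more descriptive exception or returning an `Optional` may suit the caller better.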

3. Do incremental and bottom-up development

When developing a big system or module, it is crucial to adopt an incremental approach and test after each sub-module is added. This keeps the development process manageable and helps programmers pinpoint issues promptly without being overwhelmed by a long list of errors all at once.

4. Understand the whole architecture of the system you are working with

Debugging goes beyond examining your own code. Understanding the big picture, namely the entire system architecture, is crucial for developers who want to reach the next level of debugging mastery, even when some parts are not their responsibility. Sometimes the reported issue has nothing to do with your department or the part of the stack you own. If you know the full architecture, you may be able to quickly pinpoint the part that is causing the issue, which can save you and your team a lot of time.

For instance, a slow image load might indicate a problem with the Content Delivery Network (CDN) rather than your website code itself.

Congratulations on making it to the end of this series. Please keep in mind that there is no silver bullet that solves every kind of problem. Only by understanding the various techniques, their use cases, and their processes can you be flexible and confident about what to do when issues arise.

Thanks for reading!

If you find this series useful, please show your support with several claps 👏 !

Sopheak Hang

Tech lead / Scrum Master / Freelancer / Tech Mentor / Data Science and AI enthusiast