Mentorship via Debugging: How to Turn a Code Bug Into A Learning Experience

Published in Extra Credit-A Tech Blog by Guild

8 min read · Aug 6, 2020

⛰ 🐛 🛫 🛬 🐛 🏖

Here at Guild, we like to foster a welcoming and inclusive engineering environment. In this series we will explore some ways to be an effective mentor as well as an effective mentee. For this segment, let’s use bug hunting as a tool for mentorship! Bug hunting is an invaluable skill, and it becomes an even greater challenge when you are dealing with distributed systems.

Bugs are a part of all systems, and tracking them down is a part of our jobs as developers. But when a bug is no longer a single fly buzzing around a program and becomes a fly stuck in a web of systems, debugging becomes more cumbersome and tedious. In order to successfully hunt it down, we need to enhance our understanding of what it means to trace bugs through systems.

Photo by Guillaume de Germain on Unsplash

A systems model is an essential tool for tracing complex problems. Being able to visualize the system, either mentally or with a diagram, provides a “big picture” context that allows you to identify spaces where bugs can hide. Teaching newer engineers how to trace bugs is a great way to build out mental models for your systems and to level up people on your team. Let’s think of an example.

Imagine that a user reports that they are having trouble logging in. A likely instinct is to check the authentication code. This is a logical first assumption, as that code is directly responsible for the actions taken by the user in order to access your system. In a monolithic architecture, we could trace the call stack through one system; but when services are abstracted and microservices adopted, we need a mental model in order to properly map out what may be going wrong. We need to know how the code we think is responsible for the bug is interacting with and affecting other code and systems around it.

Let’s use a hypothetical authentication flow in which we have many applications, as well as many services driving the data and logic behind these applications.

At the highest level, this happens:

User visits a login page, enters credentials, and authenticates into Application A.

If we dig slightly deeper, we understand that the process looks more like this:

Code is run to validate that the user exists in our database, passwords match, and the user has all the data we need to establish a connection into Application A.
The validation returns true, so grant the user a JSON Web Token (JWT, pronounced “JOT”), establish a session, and carry on with your day.

But thanks to our mental model, we know that reality looks much closer to this.

User visits a login page, enters credentials, and authenticates into Application A (So we have a Single Page App that renders the login form, and handles all client-side validations. We can call this Service One)
Service One posts data to a backend service to look up the user and validate their credentials (So, we now have another server-side application handling database lookups and validations as well. We call this Service Two)
Service Two does some querying, finds the user, formats the User object into JSON, and passes it back to Service One.
Service One receives this data, formats it into a JWT, and hands it over to Application A.
But wait! Application A needs some additional data in order to fully render an experience for the user! Application A then queries Service Three, which finds the missing data and passes it back to Service Two, which then re-formats the JWT, and then the JWT is passed back to Application A. Finally we are good to go!

Phew. That’s a lot of services doing a lot of work for an authentication flow! Without an understanding of how all services interact and work together, it’s much harder to determine the root cause. Sure, we can keep making educated guesses about where to target our search, but with a clear mental model of the system, the task becomes less daunting. If you were part of the initial buildout of this system, or helped in architecting it, finding a bug may be much easier; however, imagine what this feels like as a brand new person on the team, or a junior engineer just starting out… it’s confusing!
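One lightweight way to make that mental model concrete is to write the flow down as data. The sketch below is only an illustration: the `Hop` class, the `LOGIN_FLOW` list, and the helper functions are hypothetical names, and the hops simply restate the steps listed above rather than describing any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One step in the flow: who sends what to whom."""
    source: str
    target: str
    payload: str


# Hypothetical encoding of the login flow described in this post.
LOGIN_FLOW = [
    Hop("User", "Service One", "credentials"),
    Hop("Service One", "Service Two", "credentials (POST)"),
    Hop("Service Two", "Service One", "User object as JSON"),
    Hop("Service One", "Application A", "JWT"),
    Hop("Application A", "Service Three", "query for additional data"),
    Hop("Service Three", "Service Two", "additional data"),
    Hop("Service Two", "Application A", "re-formatted JWT"),
]


def components(flow):
    """Return every component in the flow, in order of first appearance."""
    seen = []
    for hop in flow:
        for name in (hop.source, hop.target):
            if name not in seen:
                seen.append(name)
    return seen


def describe(flow):
    """Return one numbered, human-readable line per hop."""
    return [
        f"{i}. {hop.source} -> {hop.target}: {hop.payload}"
        for i, hop in enumerate(flow, start=1)
    ]


for line in describe(LOGIN_FLOW):
    print(line)
```

Even a list this small gives a newer teammate something to point at: every hop is a place where a bug can hide, and `components(LOGIN_FLOW)` is the list of systems they may need access to.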

In a way, I think of debugging as a git bisect of our system: we start with the outlying layers and drill inward from there. With git bisect, we use a binary search to find the commit that introduced the bug. In a system bisect, we can use a binary-like search to find which system contains the bug. The mental concepts are the same; only the mechanisms are slightly different. And having a model of our systems allows us to bisect more effectively.
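To make the analogy concrete, here is a minimal sketch of a system bisect. It is not a real tool: `bisect_flow`, `CHECKPOINTS`, and `simulated_probe` are hypothetical names, and the probe stands in for whatever manual check you would actually run at each point (reading a log, inspecting a response, and so on). It also assumes that once something has gone wrong, every later checkpoint looks wrong too, which is the same assumption git bisect makes about commits.

```python
def bisect_flow(checkpoints, looks_healthy):
    """Return the first checkpoint where looks_healthy() is False.

    Returns None if every checkpoint looks healthy. Assumes that once
    the flow is broken, all later checkpoints are broken as well.
    """
    lo, hi = 0, len(checkpoints)
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_healthy(checkpoints[mid]):
            lo = mid + 1  # the problem, if any, is further downstream
        else:
            hi = mid      # the problem is here or further upstream
    return checkpoints[lo] if lo < len(checkpoints) else None


# Hypothetical checkpoints along the login flow, in order.
CHECKPOINTS = [
    "login form submitted",
    "Service One posts credentials",
    "Service Two returns user JSON",
    "Service One issues JWT",
    "Application A queries Service Three",
    "Service Three returns extra data",
    "Application A receives re-formatted JWT",
]

probed = []


def simulated_probe(checkpoint):
    """Pretend check: everything before index 4 looks fine."""
    probed.append(checkpoint)
    return CHECKPOINTS.index(checkpoint) < 4


first_bad = bisect_flow(CHECKPOINTS, simulated_probe)
print(first_bad)    # Application A queries Service Three
print(len(probed))  # 3
```

With seven checkpoints, the search needs three checks instead of up to seven, which matters when each check means digging through a different service with a mentee at your side.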

Effective bisecting can help get around an age-old truth about engineering: what you may think is obvious is not, in fact, obvious. Yes, sadly, your assumptions after months or years of working in a system can sometimes leave out important and relevant details for unearthing your bug. So, walk through the basics as part of your process, and train teammates to do the same.

Now, when debugging, context is king. If the engineer debugging this issue has been around for a while, or helped architect some of the system, it is much easier for them to hold on to a mental model while going through the debugging process. Maybe they know that certain users are always missing data Y when they come from a certain population — in fact they have that ticket lingering in the icebox ready to fix this exact issue! So I suggest two things here:

1. Diagram and document everything. I know it may sound redundant, but strong documentation comes in two forms: written and visual. Use both, please! Some people are visual learners, and having the design of your systems in this format is extremely helpful in turning everyone on your team into effective debuggers. Others will prefer to read about the design of the system, so having written documentation is equally valuable. Some folks, like myself, prefer both. It is nice to be able to visualize a system with an architecture diagram, while at the same time having solid documentation around each available service. By visualizing the flow, I know that Service One makes a POST request to Service Two. But what endpoints are available on Service Two? How is authentication handled on that backend? And what gets returned if something is amiss? These are the types of things that documentation and diagrams can help with! So please, write good documentation, but also help other people who learn differently with visual diagrams. When you are the only person who understands a system fully, and you are out of the country for two weeks on vacation, you will thank your past self.

An example of visual documentation for the authentication flow we described above.

2. Pair with people! If you are the resident expert on a particular system, and a hairy bug comes in or a system is down, can you grab a more junior person on your team and have them ride along as you debug? This works twofold:

a) You are sharing a lot of knowledge with your team and showing someone who doesn’t have as much experience how you debug. We all do this slightly differently, and there are a lot of techniques that can be shared! Maybe you love networking, but your more junior teammate doesn’t know much about it, so show them! They will get stronger and you will grow to understand them more.

b) You are jaded. Yeah, you, the super senior. Other people on the team, especially less experienced or newer members, have good questions and fresh eyes. They see things that you overlook, and they can give you a different vantage point. Pairing on bug triaging is a valuable exercise for you and your teammates. Help them grow and share the knowledge. Sure, maybe this isn’t always feasible, but if some non-mission-critical bug pops up, use it as a mentoring opportunity and level up your friends. Again, when you are on vacation, you will thank your past self.

Not sure where exactly to start? Here’s a list of the common places I check when I’m debugging:

Did a bad deploy cause this? Walk new teammates through how to find deploys and versions, how to roll back bad deploys when they happen, and how to do a postmortem or retro on things when they go wrong.
Did the bug reporter send a screenshot? Can you see what browser they are using? Is there a chance that the problem is browser specific? Does the screenshot show what type of error they received? Little bits of knowledge that feel unimportant or obvious could turn out to be relevant, and they might not be apparent to someone newer to your system. For mentoring, my motto is: Ask, don’t assume. Asking clarifying questions helps you, the debugger, clarify the problem statement for yourself while also clarifying things for your mentee.
Where are your logs? You have logs, right? Logs can be noisy and hard to trace. Show your teammates the little tips and tricks you’ve picked up along the way. If there is a log aggregator (Splunk, Datadog, CloudWatch, whatever), show teammates how to query it effectively. If your logging is set up to tag events with a piece of unique data, find that data and show new developers how to format those queries to get the most value out of your logs.
What is handling your networking? Do you have proxies and DNS and workers set up? Some engineers are not as familiar with networking and may not realize that it could be an issue. Show them your networking code, where it lives, what it does, etc. Again, what you think is obvious isn’t always crystal clear for others, or even for yourself!
Is something down? No, seriously, is AWS having issues in us-west-2? Is a third-party tool having issues? Sometimes the simpler the issue, the more easily it can be overlooked. Sanity check your third-party systems!
Reproduce the bug. Spin it up locally and try to reproduce. Get a runtime debugger involved. Are you using Rails? Pry can be your best friend. Are you using React? React Developer Tools and good old-fashioned `console.log` can be life savers. By reproducing the bug locally, we bring the logs to our machines and have strong tools to drill into each bit of code and find the weak link.
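As a small illustration of the logging tip above, here is a sketch of following one request across services by its unique tag. Everything in it is an assumption made for the example: the logs are JSON lines, each event carries `request_id`, `timestamp`, `service`, and `level` fields, timestamps are ISO 8601 strings in the same time zone (so they sort correctly as text), and `trace_request` and `SAMPLE_LOGS` are made-up names with made-up data. A real log aggregator has its own query language for this.

```python
import json


def trace_request(log_lines, request_id):
    """Return the parsed events tagged with request_id, oldest first."""
    events = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(event, dict) and event.get("request_id") == request_id:
            events.append(event)
    # ISO 8601 timestamps in one time zone sort correctly as strings.
    return sorted(events, key=lambda e: e.get("timestamp", ""))


# Hypothetical sample data, deliberately out of order and a little noisy.
SAMPLE_LOGS = [
    '{"timestamp": "2020-08-06T12:00:02Z", "service": "service-two", '
    '"request_id": "abc123", "level": "error", "message": "user lookup failed"}',
    '{"timestamp": "2020-08-06T12:00:01Z", "service": "service-one", '
    '"request_id": "abc123", "level": "info", "message": "POST credentials"}',
    "plain text line that is not JSON",
    '{"timestamp": "2020-08-06T12:00:01Z", "service": "service-one", '
    '"request_id": "zzz999", "level": "info", "message": "POST credentials"}',
]

trace = trace_request(SAMPLE_LOGS, "abc123")
first_error = next((e for e in trace if e.get("level") == "error"), None)

for event in trace:
    print(event["timestamp"], event["service"], event["level"], event["message"])
```

Walking a mentee through output like this, one service at a time, is a quick way to show them both how the request travels and where it first went wrong.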

So, there you have it: using debugging as a means of mentoring. These are not hard-and-fast rules, but I think some of these practices and ideas can really help other engineers on your team get a better grasp of your systems, while also providing immediate value to users. Imagine a junior engineer has just joined your team, and in the first week you’ve shown them the lay of the land while also fixing a bug affecting users. That’s a win-win scenario that really helps empower your teammates. It shows them that, yes, we all write bugs; and yes, you really can do this!

Written by Tyler Long