Debugging the Production Web

Jennifer Rullmann
Jul 25, 2016

The ability to diagnose and debug production bugs through the browser is an important skill for any front-end engineer, and one I think will become increasingly important as more teams and projects do continuous deployment. Here are a few ideas to make it easier.

Diagnosing Production Bugs

When a user reports a bug, they often don’t provide enough context for you to understand what caused the problem. These strategies do not depend on the user to make detailed bug reports.

Tip #1: Don’t point your local to production

A lot of developers, especially those who have only worked on small projects, are tempted to point their local environment to the production APIs (or database). This is a very convenient and fast way to diagnose a bug or verify a fix. But it is hard to know how much differs between the two environments. Something certainly differs, or you would be able to reproduce the bug locally, and those differences can change production data in ways you don’t intend.

I’m sure that, like me, you’ve seen this go badly, usually in unforeseen ways. Just don’t do it.

Tip #2: Send logs from the browser to the server

Logs should be your first strategy when investigating a bug, but the default front-end logging mechanisms provided by the browser don’t send the data to a place you can inspect later.

Speaking of inspecting later, grepping through text files is painful. Send your logs to a full logging solution, such as Loggly or Splunk. They let you do things like click the SessionID on the log entry you’re looking at to find every other entry from that user’s session. It’s like seeing a story of the user’s experience as they used your application.

What to Log

  • XHR Network Failures

Most modern web applications load data via XHR. If one of those requests fails — times out or returns a non-success status code — you should be able to correlate that event with a bug report. Nearly all HTTP libraries have a mechanism to inject a failure handler. For example, axios has interceptors. Use this to send a request to your server with details of the failing request and its response.

Note that this suggestion is less beneficial for teams that own both the front-end and the API server, as they can easily add server-side logging of non-successful requests. There is still some benefit however: logging of requests that never made it to the API server.
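Here’s a minimal sketch of what such a failure handler could look like. To keep it free of dependencies it wraps a fetch-style function rather than using axios interceptors, so treat it as the same idea in a different shape. Every name in it (`withFailureLogging`, `HttpClient`, `NetworkFailureLog`, the `send` callback) is my own assumption, not something from axios or any other library.

```typescript
// All names below are hypothetical; they are not part of axios or any other library.
interface HttpResponse {
  status: number;
  text(): Promise<string>;
  clone(): HttpResponse; // the real fetch Response also offers clone()
}

interface RequestOptions {
  method?: string;
  body?: string;
}

type HttpClient = (url: string, options?: RequestOptions) => Promise<HttpResponse>;

interface NetworkFailureLog {
  type: "Network exception";
  timestamp: string;
  sessionId: string;
  request: { url: string; method: string; body?: string };
  response?: { status: number; body: string };
  error?: string; // set when no response arrived at all (timeout, offline, ...)
}

// Wraps a client so every failed request is reported to `send` before the
// caller sees it. The caller's own error handling is left untouched.
function withFailureLogging(
  client: HttpClient,
  sessionId: string,
  send: (entry: NetworkFailureLog) => void
): HttpClient {
  return async (url, options = {}) => {
    const base = {
      type: "Network exception" as const,
      timestamp: new Date().toISOString(),
      sessionId,
      request: { url, method: options.method ?? "GET", body: options.body },
    };

    const response = await client(url, options).catch((err: unknown) => {
      send({ ...base, error: err instanceof Error ? err.message : String(err) });
      throw err; // rethrow so callers still see the failure
    });

    if (response.status < 200 || response.status >= 300) {
      // Read a clone so the caller can still consume the original body.
      const body = await response.clone().text();
      send({ ...base, response: { status: response.status, body } });
    }
    return response;
  };
}
```

In a browser, the global `fetch` should fit the `HttpClient` shape, and `send` would typically post the entry to your own logging endpoint. Note that the second branch is the one that covers requests that never reached the API server.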

  • JavaScript Errors

An application’s run-time errors are critically important for diagnosing problems, but most of them stay trapped in the user’s browser! Log these errors to the server and enjoy the bug-diagnosing experience your back-end dev friends take for granted.
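As a rough sketch, the browser’s `error` and `unhandledrejection` events are the usual hook for this. The event names and the `message`, `filename`, `lineno`, `colno`, `error`, and `reason` fields are the standard ones; everything else here (`installErrorLogging`, `RuntimeErrorLog`, the `send` callback) is a name I made up for illustration.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface BrowserErrorEvent {
  message?: string;
  filename?: string;
  lineno?: number;
  colno?: number;
  error?: unknown; // the thrown value on "error" events
  reason?: unknown; // the rejection value on "unhandledrejection" events
}

interface ErrorEventSource {
  addEventListener(
    type: "error" | "unhandledrejection",
    listener: (event: BrowserErrorEvent) => void
  ): void;
}

interface RuntimeErrorLog {
  type: "JavaScript error";
  timestamp: string;
  sessionId: string;
  message: string;
  stack?: string;
  location?: string; // "file:line:column" when the browser provides it
}

function describeThrown(value: unknown): { message: string; stack?: string } {
  if (value instanceof Error) {
    return { message: value.message, stack: value.stack };
  }
  return { message: String(value) };
}

// Registers listeners for uncaught errors and unhandled promise rejections
// and forwards a compact description of each to `send`.
function installErrorLogging(
  source: ErrorEventSource,
  sessionId: string,
  send: (entry: RuntimeErrorLog) => void
): void {
  const report = (thrown: unknown, location?: string): void => {
    send({
      type: "JavaScript error",
      timestamp: new Date().toISOString(),
      sessionId,
      ...describeThrown(thrown),
      location,
    });
  };

  source.addEventListener("error", (event) => {
    const location = event.filename
      ? `${event.filename}:${event.lineno ?? 0}:${event.colno ?? 0}`
      : undefined;
    report(event.error ?? event.message, location);
  });

  source.addEventListener("unhandledrejection", (event) => {
    report(event.reason);
  });
}
```

In a real page you would pass `window` as the source (a cast may be needed to satisfy these simplified types). Stack traces from minified bundles are hard to read on their own, so this pairs well with the other fields you log.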

Don’t worry, you can have run-time errors logged too!
  • User Actions

Sometimes finding the source of an error is not enough, and you need to know what the user did before the problem manifested. You need to know what actions the user took to get the browser into the problematic state.

Many modern applications track user actions, sending them to an analytics tool like Google Analytics or Mixpanel. While these tools are useful for high-level questions (Are people putting things in their shopping cart? What pages has this user visited?), their data is not detailed enough for diagnosing bugs. What you need in this situation is a log of every user action and how it changed the model.

Given idempotent view code and a log of the model, you can figure out exactly what was rendered to the user’s browser. In Production. This is HUGE.

There are a couple of challenges with this technique, mainly centered around the verbosity of user actions. But the possibilities of what can be done with the data are mind-blowing; see Tip #3 in Verifying Fixes when You Can Only Reproduce on Production for one.
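To make this concrete, here’s a sketch of an action logger, assuming a Redux-style store where every user action is a plain object passed through middleware. The `store => next => action` shape is the standard Redux middleware signature; the local types are stand-ins so the snippet runs without Redux installed, and `createActionLogger`, `ActionLog`, and `send` are hypothetical names.

```typescript
// Structural stand-ins for Redux's types; with Redux installed you would
// use its own Middleware type instead. All other names are hypothetical.
interface Action {
  type: string;
  [param: string]: unknown;
}

type Dispatch = (action: Action) => unknown;

interface StoreApi<S> {
  getState(): S;
}

type ActionLog<S> =
  | { type: "Model updated"; sessionId: string; timestamp: string; action: Action }
  | {
      type: "Error updating model";
      sessionId: string;
      timestamp: string;
      action: Action;
      previousState: S;
      error: string;
    };

// Logs every action that reaches the store. When the update throws, the
// state from just before the attempt is attached to the log entry.
function createActionLogger<S>(sessionId: string, send: (entry: ActionLog<S>) => void) {
  return (store: StoreApi<S>) => (next: Dispatch) => (action: Action): unknown => {
    const previousState = store.getState();
    const stamp = { sessionId, timestamp: new Date().toISOString(), action };
    try {
      const result = next(action);
      send({ type: "Model updated" as const, ...stamp });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      send({ type: "Error updating model" as const, ...stamp, previousState, error });
      throw err; // keep the original failure visible to the app
    }
  };
}
```

With Redux, the returned function is the kind of thing you would hand to `applyMiddleware`. Given the verbosity problem, a real version would probably buffer entries and send them in batches rather than one request per action.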

How to Log

Here are the kinds of things you want to see in your log:

{timestamp} Type: Network exception. Request: {url, method, params, body, etc} Response: {status code, body, etc} SessionID: {unique identifier for this user’s session}

{timestamp} Type: Error updating model. Action: {description of the update requested and any params}. Previous state: {dump of the model before the update was attempted}. Error: {the actual JavaScript run-time error thrown by your model code} SessionID: {unique identifier for this user’s session}

{timestamp} Type: Error updating view. Props: {the properties received by the view when it tried to update} Error: {the actual JavaScript run-time error thrown by your view code} SessionID: {unique identifier for this user’s session}

{timestamp} Type: Model updated. Action: {description of the update requested and any params}. SessionID: {unique identifier for this user’s session}
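One way to keep these four shapes consistent is to describe them as a single typed union and format them in one place. This is only a sketch: the field labels mirror the lines above, while the type and function names (`LogEntry`, `formatLogLine`, and so on) are my own.

```typescript
// Field labels mirror the log lines above; type and function names are hypothetical.
interface Stamp {
  timestamp: string;
  sessionId: string;
}

type LogEntry = Stamp &
  (
    | { type: "Network exception"; request: unknown; response: unknown }
    | { type: "Error updating model"; action: unknown; previousState: unknown; error: string }
    | { type: "Error updating view"; props: unknown; error: string }
    | { type: "Model updated"; action: unknown }
  );

const show = (value: unknown): string => JSON.stringify(value);

// Renders the part of the line that differs per entry type.
function formatDetails(entry: LogEntry): string {
  switch (entry.type) {
    case "Network exception":
      return `Request: ${show(entry.request)} Response: ${show(entry.response)}`;
    case "Error updating model":
      return `Action: ${show(entry.action)}. Previous state: ${show(entry.previousState)}. Error: ${entry.error}`;
    case "Error updating view":
      return `Props: ${show(entry.props)} Error: ${entry.error}`;
    case "Model updated":
      return `Action: ${show(entry.action)}.`;
  }
}

// Every line starts with the timestamp and type and ends with the SessionID.
function formatLogLine(entry: LogEntry): string {
  return `${entry.timestamp} Type: ${entry.type}. ${formatDetails(entry)} SessionID: ${entry.sessionId}`;
}
```

If your logging service accepts structured JSON, you may prefer to send the `LogEntry` object as-is and skip the string formatting entirely.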

Here are a few recommendations for achieving these kinds of logs:

  • Make your logging DRY, but tailored to the class of problem encountered. This is much easier to do if you have a clean separation of concerns (MVC, MVVM, or any other strong pattern) so you can inject a generic error handler per area of concern.
  • Prefer idempotent code. If your view code always renders the same HTML for a given set of parameters, then all you need to know is the parameters passed to it to reproduce a problem. There are several libraries and frameworks available today that help you to achieve this. My favorites are React and Redux.
  • Create a unique session identifier for the user, and include that with any logs sent to the server. Include this SessionID automatically when the user submits a support request, so you can easily find the associated logs. If your backend is micro-service based, use the same SessionID in each service’s logs for major bonus points.
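The session identifier and the DRY part of these recommendations can be sketched together: create the ID once, then hand out a logger that stamps it onto every entry. All names here (`createSessionId`, `createLogger`, `Transport`) are assumptions for illustration, and the ID scheme is deliberately simple.

```typescript
// Hypothetical helpers; none of these names come from a library.

// Builds an ID from the current time plus some randomness. It is not
// cryptographically strong; it only needs to tell sessions apart in logs.
function createSessionId(now: number = Date.now(), random: () => number = Math.random): string {
  return `${now.toString(36)}-${random().toString(36).slice(2, 10)}`;
}

interface LogPayload {
  type: string;
  [field: string]: unknown;
}

type StampedPayload = LogPayload & { timestamp: string; sessionId: string };
type Transport = (entry: StampedPayload) => void;

// Returns a logging function that adds the timestamp and SessionID itself,
// so call sites only describe what happened.
function createLogger(
  sessionId: string,
  transport: Transport,
  clock: () => Date = () => new Date()
): (payload: LogPayload) => void {
  return (payload) => {
    transport({ ...payload, timestamp: clock().toISOString(), sessionId });
  };
}
```

You would create the ID once when the page loads, pass the resulting logger to your network, model, and view error handlers, and attach the same ID to support requests and to the headers of API calls so back-end services can log it too.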

Tip #3: Prettify minified assets

Your favorite browser will prettify minified assets, adding whitespace so they’re much more readable. This is tremendously helpful when pausing execution on a breakpoint (either one you set manually or by having the browser stop on exceptions), so you can inspect the call stack, local state, and function parameters. You can often figure out how to reproduce the issue locally with this simple technique. And since it doesn’t depend on special tools, you can apply this on any computer.

Tip #4: Tamper with the assets in your browser (danger zone!)

You can use a proxy server like Fiddler to change the assets requested by your browser. Use this approach to replace minified assets with dev versions, inject more logging, and more.

Warning: you can do bad things to production data with this technique. The risk is much lower than pointing your local to production, because the change you introduce is very small. But because this approach uses the production APIs it is possible to do something that you later regret.

Be very cautious when tampering with production assets

Verifying Fixes when You Can Only Reproduce on Production

So you’ve diagnosed the issue and you have a potential fix, but you want to test the change. How do you do this if you can only reproduce the issue in production?

Tip #1: Don’t point your local to production

See Tip #1 in Diagnosing Production Bugs.

Tip #2: Just ship the fix

If you have a potential fix in mind but can’t verify it anywhere except production, just ship it. Your automated test suite will stop the deploy if you broke something, and if the fix doesn’t work, you can always revert it and try again.

This is one of the many beautiful things about having an automated test suite that you trust.

Tip #3: Replay the user’s actions on production in your local

Given idempotent view code and detailed logs (see User Actions under Tip #2 in Diagnosing Production Bugs), you can replay the user’s actions in your own browser and see exactly what the user saw.

SliderMonitor for redux-devtools: play, pause, and rewind user actions.

First replay the user’s actions on your local to verify that you can reproduce the problem. Then use the full power of your development tools to tweak the code and replay the user’s actions until you can no longer reproduce it. I’m so excited about this technique that it’s worth saying again: you can replay the user’s actions in your own browser and see exactly what the user saw.

The only tool I’m aware of that gives you this powerful feature is Redux. It’s trivial to add middleware that sends every user action and associated params to your server to log. Using Dan Abramov’s redux-devtools, you can later replay these logs from production in your own browser.
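Stripped of the tooling, replay is just folding the logged actions through the same reducer and rendering each resulting state. Here’s a toy sketch of that idea; it does not use the redux-devtools API, and the cart model, action names, and functions (`cartReducer`, `renderCart`, `replay`) are all invented for the example.

```typescript
// A toy model, invented for this example.
interface CartState {
  items: string[];
}

type CartAction =
  | { type: "ADD_ITEM"; sku: string }
  | { type: "REMOVE_ITEM"; sku: string };

// Pure reducer: the next state depends only on the previous state and the action.
function cartReducer(state: CartState, action: CartAction): CartState {
  switch (action.type) {
    case "ADD_ITEM":
      return { items: [...state.items, action.sku] };
    case "REMOVE_ITEM":
      return { items: state.items.filter((sku) => sku !== action.sku) };
    default:
      return state;
  }
}

// Deterministic view: the same state always renders the same output.
function renderCart(state: CartState): string {
  return state.items.length === 0 ? "Cart is empty" : `Cart: ${state.items.join(", ")}`;
}

// Replays logged actions and returns every intermediate state, starting
// with the initial one, so you can step through what the user saw.
function replay<S, A>(reducer: (state: S, action: A) => S, initial: S, logged: readonly A[]): S[] {
  const history: S[] = [initial];
  let current = initial;
  for (const action of logged) {
    current = reducer(current, action);
    history.push(current);
  }
  return history;
}

// Actions as they might come back from the production logs.
const loggedActions: CartAction[] = [
  { type: "ADD_ITEM", sku: "A1" },
  { type: "ADD_ITEM", sku: "B2" },
  { type: "REMOVE_ITEM", sku: "A1" },
];

const screens = replay(cartReducer, { items: [] }, loggedActions).map(renderCart);
console.log(screens.join(" | "));
```

Because both the reducer and the view are pure, rerunning this after a code change shows right away whether the rendered output for the same actions has changed.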

I’m hoping redux-devtools and its monitors continue to mature, as there are a lot of exciting possibilities with this model.

Tip #4: Tamper with the loaded assets to inject your fix (danger zone!)

Using the same approach as Tip #4 in Diagnosing Production Bugs, you can modify the static assets from production before they reach your browser.

Warning: you can do bad things to production data with this technique. The risk is much lower than pointing your local to production, because the change you introduce is very small. But because this approach uses the production APIs it is possible to do something that you later regret.

Summary

The tooling available to front-end web developers gets better every year. I think the best is still to come, and that deterministic views will be at the center of it.
