by Todd H. Gardner
Editor’s Note: The hardest bugs to fix are the bugs you never see. You may not see them, but perhaps thousands, even millions of your users do see them every day, in the middle of workflows that are critical to their success with your app.
Once in a while we find products or services that we highly recommend and invite the company behind them to participate in a sponsored post. I’m pleased to bring you this sponsored post for TrackJS by Todd H. Gardner.
I enjoyed reading it, and learned a thing or two. I think you will, too.
Logging Our First Error
To capture errors from our application, we’ll need to add the TrackJS tracker agent to the page. It wraps our application code and the browser’s APIs to catch errors and send them off to be normalized and logged.
If you haven’t already, grab a free trial account of TrackJS; it will show you an installation snippet to add to your page.
This doesn’t have to be in the `<head>`, but it should probably be the first script on the page. Otherwise, any errors that happen before the tracker loads will never get reported! There is also an NPM package, so you can bundle the tracker with the rest of your app.
Newer browsers will give you an instance of `Error` with a stack trace, but many older browsers will only provide the message, file, and line. Error monitoring libraries like TrackJS can improve error collection by injecting listeners in more places where errors are likely to occur.
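The gap between old and new browsers is visible in the arguments passed to the global `window.onerror` handler. The sketch below is not TrackJS code: it assumes a hypothetical `report()` sink for your own collector, normalizes whatever the handler receives, and illustrates "listening in more places" with a hypothetical `wrapCallback()` helper that catches errors inside callbacks, where the full `Error` object is still available.

```javascript
// Hypothetical sink: replace with code that sends the record to your collector.
function report(record) {
  console.log("error report", record);
}

// Build a consistent record from the arguments of window.onerror.
// Older browsers may pass only (message, source, lineno); newer ones
// also pass the column and the Error object itself.
function normalizeOnError(message, source, lineno, colno, error) {
  return {
    message: error && error.message ? error.message : String(message),
    file: source || null,
    line: lineno || null,
    column: colno || null,
    stack: error && error.stack ? error.stack : null
  };
}

// Wrap a callback so an error thrown inside it is reported with the full
// Error object, then rethrown so the page behaves exactly as before.
function wrapCallback(fn) {
  return function () {
    try {
      return fn.apply(this, arguments);
    } catch (err) {
      report(normalizeOnError(String(err), null, null, null, err));
      // Mark it so the global handler below does not report it twice.
      if (err && typeof err === "object") err.__reported = true;
      throw err;
    }
  };
}

// Global fallback for everything the wrappers do not cover.
if (typeof window !== "undefined") {
  window.onerror = function (message, source, lineno, colno, error) {
    if (!(error && error.__reported)) {
      report(normalizeOnError(message, source, lineno, colno, error));
    }
    return false; // let the browser log the error as usual
  };
}
```

An agent would apply something like `wrapCallback` to functions handed to `setTimeout`, `addEventListener`, and similar APIs; how TrackJS does this internally is not shown here.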
Now that we’ve got something listening to our app for errors, let’s make sure it’s working. Open up your browser console and try something like this:
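A minimal version of that console test, assuming only a standard browser console (the message text is arbitrary):

```javascript
// Throw from a timer callback so the error escapes the console's sandbox
// and reaches the page's global error handlers.
setTimeout(function () {
  throw new Error("Testing error monitoring");
}, 0);
```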
You should see an error being sent off in your Network debugger, and if you’re using TrackJS it should pop into your Recent error list.
You may be wondering why we needed to put a `setTimeout()` in our example. Most browser debuggers create a sandbox around their console to prevent errors from leaking into the page. However, in our case, that’s exactly what we’re trying to do! The `setTimeout()` allows us to inject our error into the page to be executed on the next event loop cycle.
The Anatomy of an Error
Wait, what about the stack? Strangely enough, the error stack trace is a non-standard property! Internet Explorer before version 10 and Safari before version 6 do not include it at all. Even in newer browsers, the structure and syntax of the stack trace differ.
TrackJS will automatically normalize the structure of your stack traces. If you are building something yourself, take a look at StackTrace.js, which provides similar normalization.
Unfortunately, there is no standard way to describe a network error. TrackJS records a network error as the method and URL of the request, along with the status code of the response. To protect sensitive data, it does not capture request or response bodies or headers. If you are building your own tools, you may want to capture these, depending on the context of your application.
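To see why normalization is needed, compare a V8 (Chrome) frame such as `at loadUser (https://example.com/app.js:10:15)` with the Firefox and Safari form `loadUser@https://example.com/app.js:10:15`. The function below is a simplified sketch that handles only those two shapes. It is not the StackTrace.js API, and real stacks contain more variants (`eval` frames, native code, async markers) than it covers.

```javascript
// Parse one stack line in V8 or Firefox/Safari format into a common shape.
// Returns null for lines it does not recognize (e.g. the "Error: msg" header).
function parseFrame(line) {
  const trimmed = line.trim();
  // V8: "at fn (url:line:col)" or "at url:line:col"
  let m = /^at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/.exec(trimmed);
  if (!m) {
    // Firefox/Safari: "fn@url:line:col" (fn may be empty)
    m = /^(.*?)@(.+?):(\d+):(\d+)$/.exec(trimmed);
  }
  if (!m) return null;
  return {
    functionName: m[1] || "<anonymous>",
    fileName: m[2],
    lineNumber: Number(m[3]),
    columnNumber: Number(m[4])
  };
}

// Normalize a whole stack string into an array of frames.
function parseStack(stack) {
  return String(stack || "")
    .split("\n")
    .map(parseFrame)
    .filter(Boolean);
}
```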
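A sketch of recording only the method, URL, and status, assuming the page uses `fetch` (code from the era of this post relied more on `XMLHttpRequest`, which an agent would wrap in the same spirit). `wrapFetch` and the `onNetworkError` callback are hypothetical names, and treating status codes of 400 and above as errors is an assumption. Bodies and headers are deliberately never read.

```javascript
// Wrap a fetch implementation so failed requests are recorded as
// { method, url, status }. Status 0 marks a request that failed before
// any response arrived (offline, DNS failure, blocked, etc.).
function wrapFetch(fetchImpl, onNetworkError) {
  return function (input, init) {
    const method = ((init && init.method) || (input && input.method) || "GET").toUpperCase();
    const url = typeof input === "string" ? input : (input && input.url) || String(input);
    return fetchImpl(input, init).then(
      function (response) {
        if (response.status >= 400) {
          onNetworkError({ method: method, url: url, status: response.status });
        }
        return response;
      },
      function (err) {
        onNetworkError({ method: method, url: url, status: 0 });
        throw err; // keep the original failure visible to the caller
      }
    );
  };
}

// In a browser, replace the global fetch with the wrapped version.
if (typeof window !== "undefined" && typeof window.fetch === "function") {
  window.fetch = wrapFetch(window.fetch.bind(window), function (record) {
    console.warn("network error", record);
  });
}
```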
Tip: Since TrackJS records messages passed into the console, you could add your own request and response bodies and headers by writing them out into the console.
TrackJS will automatically record messages passed into `console.error`. If you want to do the same in your own tooling, or attach additional context to messages passed into the console, you can wrap the console functions.
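Wrapping a console method means keeping a reference to the original, recording each call, and then delegating so messages still appear in developer tools. This is a sketch with hypothetical `wrapConsole` and `describe` helpers, not how TrackJS implements it; the optional `getContext` function marks where you might attach extra information, such as the current route.

```javascript
// Turn one console argument into readable text. Error objects serialize to
// "{}" with JSON.stringify, so use their stack or message instead.
function describe(value) {
  if (typeof value === "string") return value;
  if (value instanceof Error) return value.stack || String(value);
  try {
    const json = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch (e) {
    return String(value); // e.g. circular structures
  }
}

// Wrap the named console methods so each call is passed to onMessage,
// with optional extra context, before reaching the original method.
function wrapConsole(target, levels, onMessage, getContext) {
  levels.forEach(function (level) {
    const original = target[level];
    if (typeof original !== "function") return; // skip methods that don't exist
    target[level] = function () {
      const args = Array.prototype.slice.call(arguments);
      onMessage({
        level: level,
        message: args.map(describe).join(" "),
        context: getContext ? getContext() : undefined,
        timestamp: Date.now()
      });
      return original.apply(target, args);
    };
  });
}
```

In a page you might call `wrapConsole(console, ["log", "warn", "error"], send, getRoute)`, where `send` and `getRoute` are your own functions. Keep `onMessage` from logging through a wrapped method, or it will recurse.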
Enhancing Error Context
TrackJS records tons of this context automatically, and gives you a powerful hook to define your own custom metadata about your application state. But what is really interesting is the Telemetry Timeline.
Telemetry is everything your application does before an error happens: changing state, issuing network requests, or responding to user actions. The TrackJS agent records this telemetry so that it can present a timeline of the activity leading up to a problem. It’s a really great way to visualize how an error occurred.
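At its core, a telemetry timeline can be a bounded, in-memory log of recent events that gets attached to each error report. The sketch below assumes a hypothetical `createTimeline` helper, a fixed capacity of 30 events, and clicks as the example event source; it illustrates the idea rather than TrackJS's implementation.

```javascript
// A bounded log of recent activity. Once `capacity` is reached, the oldest
// event is dropped, so memory stays flat however long the page is open.
function createTimeline(capacity) {
  const events = [];
  return {
    record: function (type, data) {
      events.push({ type: type, data: data, timestamp: Date.now() });
      if (events.length > capacity) events.shift();
    },
    // A copy, oldest first, safe to attach to an error report.
    snapshot: function () {
      return events.slice();
    }
  };
}

const timeline = createTimeline(30);

// Example source of telemetry: user clicks, recorded in the capture phase.
if (typeof document !== "undefined") {
  document.addEventListener("click", function (event) {
    timeline.record("click", { tag: event.target && event.target.tagName });
  }, true);
}

// Attach the recent activity to each error report.
function buildReport(error) {
  return { message: error.message, stack: error.stack, telemetry: timeline.snapshot() };
}
```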
If you are using an architectural pattern like Flux, this can be a really powerful way to visualize state transitions leading up to an error.
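As an illustration, assuming a Redux-style reducer (Redux is a popular Flux-inspired library; choosing it here is an assumption, not something TrackJS requires), each action can be recorded as it passes through. `withActionTelemetry`, its `record` callback, and the `cart` reducer are hypothetical.

```javascript
// Wrap a reducer so each action's type is recorded before the state
// transition happens. The payload is left out in case it holds user data.
function withActionTelemetry(reducer, record) {
  return function (state, action) {
    record({ type: "action", action: action && action.type, timestamp: Date.now() });
    return reducer(state, action);
  };
}

// Hypothetical reducer for a shopping cart.
function cart(state, action) {
  state = state || { items: [] };
  switch (action.type) {
    case "ADD_ITEM":
      return { items: state.items.concat(action.item) };
    case "CLEAR":
      return { items: [] };
    default:
      return state;
  }
}
```

With Redux you would pass the wrapped reducer to the store as usual, and `record` could push into whatever timeline your error reports include.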
Getting Past `Script Error`
One of the first errors you are likely to log is `Script Error`. This is caused by the browser obfuscating errors from a script on a different origin as part of the Same-Origin Policy. For example, if your scripts are loaded from a CDN, or referenced from a third-party, errors that originate in them would have their details stripped out.
Script Error Sucks. There is no context. No clues. No indication of how your users are impacted. We need to get it out of the way fast to understand our real problems.
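Before you can get rid of it, it helps to recognize it. In current browsers, an error from a cross-origin script typically reaches the page's error handlers with the message `Script error.`, a line number of 0, and no `Error` object. The hypothetical `isOpaqueScriptError` helper below is a heuristic built on that behavior, useful for counting these opaque errors separately instead of mixing them in with your real problems.

```javascript
// Heuristic: cross-origin errors are reduced to "Script error." with no
// line number and no Error object. Some browsers omit the trailing period.
// Accepts an ErrorEvent or any object with the same fields.
function isOpaqueScriptError(event) {
  return /^script error\.?$/i.test(String(event.message || "").trim()) &&
    !event.lineno && !event.error;
}

if (typeof window !== "undefined") {
  window.addEventListener("error", function (event) {
    if (isOpaqueScriptError(event)) {
      console.warn("Opaque cross-origin Script error captured");
    }
  });
}
```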
The most compatible way to handle Script Error is to load scripts from the same origin as the page. You may lose some of the performance benefits of a CDN and multiplexed loading, so you may want to do this temporarily or for only a fraction of your traffic. For example, you could have 10% of your traffic load scripts from the same origin and use that traffic for error monitoring.
For a more thorough discussion of the causes and other solutions, check out the TrackJS Blog on Script Error.
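One way to pick that 10% is to hash a stable identifier, such as a session ID, into a bucket from 0 to 99, so a given visitor consistently gets the same script source. Everything here is hypothetical: the `chooseScriptBase` and `loadAppScript` names, the two base URLs, and the FNV-1a-style hash, which is adequate for bucketing but not for anything security-related.

```javascript
// Hypothetical locations for the same bundle.
const CDN_BASE = "https://cdn.example.com/assets/";
const SAME_ORIGIN_BASE = "/assets/";

// FNV-1a-style 32-bit hash over UTF-16 code units.
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Deterministically send `percent` percent of sessions to same-origin scripts.
function chooseScriptBase(sessionId, percent) {
  return hashString(String(sessionId)) % 100 < percent ? SAME_ORIGIN_BASE : CDN_BASE;
}

// In the page: load the app bundle from the chosen location.
function loadAppScript(sessionId) {
  const script = document.createElement("script");
  script.src = chooseScriptBase(sessionId, 10) + "app.js";
  document.head.appendChild(script);
}
```

Because the choice is deterministic, errors from the same-origin group can be compared directly against the full population.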
Identifying and Fixing Errors
Once we have good contextual errors being reported, we need to actually fix them. Yep, now the hard work begins. It’s up to you to sift through the errors with your unique understanding of your application, your users, and your context.
Generalized Error Causes
I often start by trying to find the general cause of an error first, as it helps narrow the questions needed to debug further. Here are some general causes of errors and some clues you can look for in your error data to identify them.
Browser Compatibility Error
The application does not work correctly in a specific browser (or set of browsers). Perhaps it expects an interface or behavior that does not exist, or the browser has unexpected performance characteristics.
Symptoms: Low cardinality of affected browsers. Only a small number of browsers report the error, especially browsers you have not specifically tested the application with.
Debug With: Use the developer tooling provided by the browser reporting errors to debug compatibility. If the browser does not provide developer tooling, you can always fall back to `console` and `alert`.
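If you can export your error reports, a quick check for this symptom is to count the distinct browsers behind each error message. The record shape (`message`, `browser`) and the threshold are assumptions for illustration, and since browsers word the same failure differently, grouping by raw message is only approximate.

```javascript
// Group reports by message and list the distinct browsers for each.
// Errors seen in only a few browsers are candidates for compatibility bugs.
function lowCardinalityErrors(reports, maxBrowsers) {
  const byMessage = new Map();
  reports.forEach(function (r) {
    if (!byMessage.has(r.message)) byMessage.set(r.message, new Set());
    byMessage.get(r.message).add(r.browser);
  });
  const result = [];
  byMessage.forEach(function (browsers, message) {
    if (browsers.size <= maxBrowsers) {
      result.push({ message: message, browsers: Array.from(browsers).sort() });
    }
  });
  return result;
}
```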
User Configuration Error
The user has customized their network or browser environment in ways that are incompatible with the application. This could include invasive browser extensions that manipulate the document, or network proxies that rewrite content during transmission.
Symptoms: Errors originate from unrecognizable sources not part of your original application.
Debug With: Expand your error context to record the contents of the document and scripts as part of error capture. Compare these with the expected application to determine if they have been manipulated.
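One piece of that expanded context could be the list of scripts actually present in the document, compared against the origins you expect to serve. This is a sketch: `findUnexpectedScripts` and the allowlisted CDN origin are hypothetical, and it only catches injected `<script src>` tags, not inline scripts or content rewritten in place.

```javascript
// Return script URLs whose origin is not in the allowlist. Relative URLs are
// resolved against baseUrl, so same-origin scripts are recognized.
function findUnexpectedScripts(scriptUrls, allowedOrigins, baseUrl) {
  return scriptUrls.filter(function (src) {
    try {
      return allowedOrigins.indexOf(new URL(src, baseUrl).origin) === -1;
    } catch (e) {
      return true; // unparseable URLs are suspicious too
    }
  });
}

if (typeof document !== "undefined") {
  const srcs = Array.from(document.scripts)
    .map(function (s) { return s.src; })
    .filter(Boolean); // inline scripts have an empty src
  const unexpected = findUnexpectedScripts(srcs, [location.origin, "https://cdn.example.com"], location.href);
  if (unexpected.length) console.warn("Unexpected scripts:", unexpected);
}
```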
Network Resilience Errors
The application fails when certain kinds of network failures occur. The internet is not always reliable, especially over mobile networks, and applications can fail in interesting ways when some or all of the page’s assets fail to load.
At the 2016 Google I/O conference, Ilya Grigorik shared metrics showing that 1%-10% of network requests from mobile devices may fail due to connectivity or processing constraints.
Symptoms: Reports of `TypeError` and `ReferenceError` may indicate that foundational interfaces are not available. For example, `$ is not defined`. These can arise if the scripts that provide these foundations fail to load.
Debug With: Check the load and presence of assets along with your error reports. Explore how your application behaves when key assets are removed. If possible, construct reasonable fallbacks and check types before using them. Progressive Enhancement is still the best path.
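The type check can be as small as confirming a dependency exists before touching it. This sketch assumes the page normally loads jQuery from a CDN; `enhanceCart` and the `#cart-form` selector are hypothetical, and the function takes the global object as a parameter so it can be exercised outside a browser.

```javascript
// Only enhance the page if the library actually loaded. Without it, the
// plain HTML form keeps working instead of throwing "$ is not defined".
function enhanceCart(global) {
  const $ = global.jQuery;
  if (typeof $ !== "function") {
    return false; // dependency missing: keep the unenhanced experience
  }
  $("#cart-form").on("submit", function (event) {
    event.preventDefault();
    // Hypothetical: submit the form via AJAX instead of a full page load.
  });
  return true;
}

if (typeof window !== "undefined") {
  enhanceCart(window);
}
```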
Third-Party Change Errors
Symptoms: Errors that begin reporting at a specific date and time that does not correspond to your own release schedule. When integrating with third parties, note that their changes can impact the experiences of your customers.
Debug With: This should be debuggable with standard developer tools, like the Chrome Developer Tools, run in your application.
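A rough way to test this symptom is to find the first day an error appeared and check whether it lines up with one of your own deploys. The input shapes (ISO timestamps for sightings, `YYYY-MM-DD` strings for release days, both in UTC) and the `onsetMatchesRelease` name are assumptions; a real analysis would also account for time zones and gradual rollouts.

```javascript
// Given the ISO timestamps when an error was seen and the UTC days you
// released, return the first day the error appeared and whether it matches.
function onsetMatchesRelease(errorTimestamps, releaseDays) {
  if (errorTimestamps.length === 0) return null;
  const firstDay = errorTimestamps
    .map(function (t) { return new Date(t).toISOString().slice(0, 10); })
    .sort()[0];
  return { firstDay: firstDay, matchesRelease: releaseDays.indexOf(firstDay) !== -1 };
}
```

An onset that matches none of your releases points toward a change outside your control.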
Logical Application Errors
When we run out of everything else to blame, it must be our own code. The timing of real-world asynchronous events or edge cases of the application state may not be accounted for.
Debug With: Record customized context about your application to better understand the current state and timing.
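A minimal sketch of that customized context: a small store of key/value state plus timestamped checkpoints, copied into each error report. `createContext`, `set`, `mark`, and `snapshot` are hypothetical names; TrackJS offers its own hook for custom metadata, which this code does not reproduce.

```javascript
// Track application state and the timing of async milestones so an error
// report can show what the app was doing, and when.
function createContext() {
  const start = Date.now();
  const state = {};
  const marks = [];
  return {
    set: function (key, value) { state[key] = value; },
    mark: function (name) { marks.push({ name: name, msSinceStart: Date.now() - start }); },
    // Copies, so later changes do not alter a report already built.
    snapshot: function () {
      return { state: Object.assign({}, state), marks: marks.slice() };
    }
  };
}

// Hypothetical usage in an app that loads a user, then their cart.
const appContext = createContext();
appContext.set("route", "/checkout");
appContext.mark("user-request-sent");
```

Calling `snapshot()` in your error handler and sending it with the message and stack can reveal whether a response arrived before or after the code that depended on it.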
Go Forth and Build Better Apps!
It’s amazing how much better our apps can be with a little feedback from our production environments. Usually only a handful of subtle problems keep an application from being great. I’d love to help make yours great and crush a few bugs. Grab a free 30-day trial of TrackJS and let’s build a better web.