The Fine Art of JavaScript Error Tracking


A couple of months ago, I had some downtime at work, so I took on one of my goals for the year: implementing error tracking for our JavaScript applications. It’s a bit crazy to think we weren’t monitoring our front end apps until now, especially considering our stack is largely made up of SOA-style Angular and Spine apps. We’d managed to do alright, and technically we did have New Relic passively keeping tabs on browser errors site-wide. However, it was about time we started paying serious attention to the nitty-gritty errors thrown out in the wild.

Goals

Starting out, I didn’t exactly establish a set of clearly defined goals. Overall, I just wanted to find a good service to report exceptions that might come up across the various browsers and devices we support. Roughly speaking, here’s a list of what the service should be able to do for us:

  • It should provide a flexible JavaScript reporting client. We want to be able to control what exceptions we report on, as well as how we report on them.
  • It should provide us with enough information to reproduce errors to help resolve them. Obviously, we want the typical reporting information: stack traces, browser/OS versions, frequency, etc. But related to the first point, we also want to be able to provide extra information ourselves through the reporting client, such as AJAX params, member information, or other environment variables.
  • (It should help us achieve and maintain zero errors.) We have this policy for server-side exceptions, where we try to maintain zero errors and only report on important ones. This way, we can be sure that any error we’re notified of is something we need to stop and address ASAP. I naively set out to shoot for the same policy for JavaScript monitoring, but I quickly found out this was impossible.

Options

There’s a ton of tools out there for error monitoring, and I’m sure most of them would suit our needs described above. I didn’t want to get too caught up in trying out a bunch of different services, so here are the four options I considered:

  • New Relic. We’ve been using New Relic since I started working here to monitor our app ecosystem’s general health, and it’s fantastic. I think it was sometime last year when they announced New Relic Browser, their monitoring service for the front end. The AJAX insights are where this service really shines, but the error reporting aspect of it didn’t give us enough flexibility (there was no reporting client available).
  • Honeybadger. Our back end Rails apps reported exceptions to Honeybadger, and it was pretty good overall. I did try using HB for one of our front end libraries, but it ended up being too noisy since it hooked into window.onerror without filtering anything out.
  • TrackJS. This service looked really promising. It had a flexible reporting client, a beautiful dashboard, and an exciting feature called Telemetry Timeline, which provided context of the events leading up to the thrown exception. I tried it out for a week and, at least for the errors captured during that time, it didn’t seem to do a great job aggregating similar errors, the Telemetry Timeline wasn’t very useful, and it was quite noisy. In hindsight, I probably should have given it more time in production. With a bit of playing around, it may have turned out to be a nice solution to our problem.
  • Sentry. Recommended by our DevOps lead, Sentry held as much promise as TrackJS. It provided flexible reporting clients for a number of platforms (JavaScript, Ruby, Python, to name a few), it had a good-looking dashboard, and it was open source! It was still noisy, but it seemed to aggregate similar errors better in my opinion.
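
To make the noise problem of an unfiltered window.onerror hook concrete, here is a minimal sketch of a filtered handler. It is not the client of any service mentioned above. The names shouldReport, isOwnScript, and installHandler, the ignore pattern, and the example.com host are all hypothetical.

```javascript
// Hypothetical patterns for messages we cannot act on.
const IGNORED_MESSAGES = [
  /^Script error\.?$/, // cross-origin scripts hide their real error details
];

// Hypothetical: only scripts served from these hosts count as "ours".
const OWN_HOSTS = ['example.com'];

function isOwnScript(sourceUrl) {
  try {
    const host = new URL(sourceUrl).hostname;
    return OWN_HOSTS.some((own) => host === own || host.endsWith('.' + own));
  } catch (e) {
    return false; // empty or malformed URL: nothing to trace back to
  }
}

function shouldReport(message, sourceUrl) {
  const text = String(message);
  if (IGNORED_MESSAGES.some((pattern) => pattern.test(text))) return false;
  return isOwnScript(sourceUrl);
}

// `target` is normally `window`; `report` is whatever call sends the error.
function installHandler(target, report) {
  const previous = target.onerror;
  target.onerror = function (message, source, line, column, error) {
    if (shouldReport(message, source)) {
      report({
        message: String(message),
        source: source,
        line: line,
        column: column,
        stack: error && error.stack,
      });
    }
    // Preserve any handler that was installed before ours.
    return previous ? previous.apply(this, arguments) : false;
  };
}
```

In a browser this would be wired up with something like installHandler(window, sendToService), where sendToService stands in for the reporting client’s own call.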

The Winner: Sentry

Sentry was the last service I tried, and it stuck. There’s a bunch of reasons why I liked it the most. I’ll boil it down to three.

First and foremost, the dashboard did the best job surfacing the most important errors using its Priority sort, which is a weighted score of the time of the last seen instance of an error and its frequency. Both its list views and detail views provide us with the information we care about most, in the simplest, most concise manner out of the group. Minimal clutter, maximum readability.
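
I don’t know the exact formula behind the Priority sort, so the following is only a sketch of how a score weighted by last-seen time and frequency could work. The function names, the logarithm, and the weight of 600 seconds are assumptions made for illustration, not Sentry’s implementation.

```javascript
// Hypothetical weight: how many seconds of recency one unit of
// log-frequency is worth.
const FREQUENCY_WEIGHT_SECONDS = 600;

function priorityScore(timesSeen, lastSeenMs) {
  const lastSeenSeconds = Math.floor(lastSeenMs / 1000);
  // The logarithm dampens frequency so that a very chatty but stale
  // error eventually sinks below fresh ones.
  return Math.log(Math.max(timesSeen, 1)) * FREQUENCY_WEIGHT_SECONDS + lastSeenSeconds;
}

// Returns a new array, highest priority first.
function sortByPriority(groups) {
  return groups.slice().sort(
    (a, b) =>
      priorityScore(b.timesSeen, b.lastSeenMs) -
      priorityScore(a.timesSeen, a.lastSeenMs)
  );
}
```

With these assumed weights, an error seen a thousand times an hour ago still outranks one seen once just now, while an error seen twice ten minutes ago does not.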

Secondly, it has a bunch of core and community-driven integrations with other services. For us, that means hooking into HipChat for channel notifications and JIRA to create tickets. In addition to these integrations, Sentry lets us create notification rules. For example, we only want a HipChat notification the first time an error is seen, and only if it’s hit a threshold of x events reported in a given minute.
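
Sentry’s rules are configured in its dashboard, not in code, but the logic of a rule like ours can be sketched in a few lines. This is one reading of the rule (notify once per error, the first time it crosses the per-minute threshold), with a hypothetical createNotifier helper and a made-up threshold standing in for x.

```javascript
// Hypothetical value standing in for the "x" in the rule.
const EVENTS_PER_MINUTE_THRESHOLD = 10;
const WINDOW_MS = 60 * 1000;

// `notify` is whatever sends the alert (a chat message, an email, ...).
function createNotifier(notify, threshold = EVENTS_PER_MINUTE_THRESHOLD) {
  const recentEvents = new Map(); // error id -> timestamps in the last minute
  const alreadyNotified = new Set(); // error ids we have alerted on

  // Call once per reported event; returns true if a notification was sent.
  return function recordEvent(errorId, nowMs) {
    const cutoff = nowMs - WINDOW_MS;
    const timestamps = (recentEvents.get(errorId) || []).filter((t) => t > cutoff);
    timestamps.push(nowMs);
    recentEvents.set(errorId, timestamps);

    if (timestamps.length >= threshold && !alreadyNotified.has(errorId)) {
      alreadyNotified.add(errorId);
      notify(errorId, timestamps.length);
      return true;
    }
    return false;
  };
}
```

Raising or lowering the threshold is the knob that decides how chatty the channel gets.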

Lastly, it’s open source and supports a variety of platforms. With the source code up on GitHub, we could actually run our own Sentry server if we ever have the need to. The paid options are good enough that we’ll probably stick with them. Regardless, we still get the benefit of community-driven improvements and bug fixes. The multi-platform support turned out to be an unexpected benefit, since we ended up moving all of our server-side apps to report to Sentry as well.


Things I Learned

Signal vs. Noise

There was a common theme among the cons of all the services I looked into: noise. All the services dumped an overwhelming number of errors. Some were legitimate, some were out of our control, and many were duplicates of others.

I knew dealing with JavaScript errors was messy, but I didn’t realize just how difficult it was until I dove into it. There’s a great talk by Sentry’s Matt Robenolt on the problems with the current state of JavaScript errors and how the Sentry app tries to deal with them.

https://www.youtube.com/watch?v=e4eE5VeO1_o

The stark reality is that the noise is not going away any time soon. We’ll most likely never reach a state of zero error zen, but that’s okay. That shouldn’t have been a goal in the first place. The real goal for us is to be able to identify actionable and important errors, with some clues on how to reproduce and resolve them.

The State of Stack Traces

Everyone knows JavaScript stack traces blow, but this fact really gets hammered in when you’re sifting through dozens of them in desperate hope of direction. It’s even worse when your JS is minified and mangled in production. I just wanted to briefly mention source maps, which Sentry supports and uses when showing stack traces in its dashboard; they’ve been a nice addition to our build tools. Also worth a mention is the work-in-progress library Zone.js, which is super exciting and coming to Angular 2.0.

How We Use Sentry

Reporting

Our JavaScript Sentry projects are set up to correspond to entire pages on our site that might contain one or more small Angular apps. Each app then integrates the Raven JS client to intercept all Angular exceptions and HTTP response errors. Below is more or less our Raven integration into Angular, run during an Angular config callback.

https://gist.github.com/jico/e4d14ae301ed93132823
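
The gist itself isn’t reproduced here. As a rough idea of what such an integration can look like, here is a sketch assuming Angular 1.x and the raven-js client. The two factory names and the module name app are hypothetical, and the actual gist may differ.

```javascript
// Builds a replacement for Angular's $exceptionHandler that reports to
// Raven before handing off to the original handler.
function makeExceptionHandler(raven, delegate) {
  return function (exception, cause) {
    raven.captureException(exception, { extra: { cause: cause } });
    delegate(exception, cause); // keep Angular's default logging
  };
}

// Builds an $http interceptor that reports failed responses along with
// the request details needed to reproduce them.
function makeHttpErrorInterceptor(raven, $q) {
  return {
    responseError: function (rejection) {
      const config = rejection.config || {};
      raven.captureMessage('HTTP response error', {
        extra: {
          status: rejection.status,
          method: config.method,
          url: config.url,
          params: config.params,
        },
      });
      return $q.reject(rejection); // pass the failure on to the caller
    },
  };
}

// Registration in an Angular 1.x config block (assumes the `angular`
// and `Raven` globals are loaded; 'app' is a placeholder module name):
//
// angular.module('app').config(['$provide', '$httpProvider',
//   function ($provide, $httpProvider) {
//     $provide.decorator('$exceptionHandler', ['$delegate', function ($delegate) {
//       return makeExceptionHandler(Raven, $delegate);
//     }]);
//     $httpProvider.interceptors.push(['$q', function ($q) {
//       return makeHttpErrorInterceptor(Raven, $q);
//     }]);
//   }]);
```

Keeping the handlers as plain factories that receive Raven as an argument makes them easy to exercise with a stub, without loading Angular.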

Identification

Sentry is pretty great at surfacing important JavaScript exceptions. Nonetheless, deciding which errors need to be reported and addressed, and which are low-priority noise, is a fine art of judgement. Here’s a quick example of how we identify actionable JavaScript errors.

We look for the spikes in the stream. Sorting by Priority is usually the best way to filter errors. You can see how noisy the stream is, but looking at the error frequency and the number of users affected, relative to the entire feed, is a good indicator of a major issue.

Let’s dive into the cryptic, vague [object Event] exception.

As you can see, this is one of the more useless error messages. It also doesn’t have any stack trace whatsoever. We then ask ourselves: is this error actionable and high priority? The graph shows it’s happening more than just intermittently, and that it affects browsers we support, so the answer is yes.

Sometimes we’ll come across errors with nearly zero context from which we can try to reproduce them. Or, we’ll find errors with bizarre messages. For the latter, if it’s coming from a single browser version, it’s usually an exception from a browser plugin or third-party library. These are examples of unactionable errors which we simply ignore and mute.
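
The single-browser-version heuristic can be expressed as a small helper. This is only a first-pass hint, not a substitute for the judgement described above, and the event shape, function names, and minimum sample size are all hypothetical.

```javascript
// Hypothetical event shape: { browser: 'Chrome', browserVersion: '43.0' }
function browserVersions(events) {
  return new Set(events.map((e) => e.browser + ' ' + e.browserVersion));
}

// Hypothetical minimum sample size before trusting the hint.
const MIN_EVENTS = 5;

// True when every occurrence comes from one browser version, which
// suggests a plugin or third-party script, not our own code.
function likelyPluginNoise(events) {
  if (events.length < MIN_EVENTS) return false; // too few events to tell
  return browserVersions(events).size === 1;
}
```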

At this point, a JIRA ticket is easily created through the built-in integration (see the Create JIRA Issue link in the sidebar). The person assigned this bug will likely weep in despair, but we’ve done our best to ease the pain. Our Sentry setup aims to provide as much contextual information as possible and lead us in the right direction.


It’s Not Over

We’re very much still refining our JavaScript error handling process. It was a very rocky start, with hundreds of email alerts flooding our inboxes right off the bat. It took some time to tweak our notification threshold to the appropriate setting, and I took some flak for spamming the company. That’s just an example of how we’ll be reactively tuning our process over time.

If you’d like to share some insight into how you or your company handles JavaScript exceptions, I’d love to hear about it! Write a response, tweet at me, get in touch.


PS. If you’re into the sort of things described in this article, Crowdtap is hiring!

Like what you read? Give Jico Baligod a round of applause.