Measuring the Data You Don’t Have

Pavel Studený
Published in Outreach Prague
5 min read · Aug 28, 2023

If you build anything connected to the internet, you should collect usage data from your product for two purposes:

  1. You want to know the current status, primarily reliability.
  2. You want to make data-driven decisions about future improvements. Usage insight is critical to addressing your users’ needs.

This article focuses mostly on reliability. However, you can use the same data for usage insight.

To have a reliable product, you need a mechanism to safely roll new features out and back. There would be no point in having such a mechanism if you could not evaluate how your new feature is doing.

Tools Let You Focus on Your Core Business

First of all, does it crash despite all the testing? At Outreach, one of the tools we use is Rollbar. It can not only show and group stack traces, but also capture user logs leading up to the crash. Rollbar could be stronger in some areas, e.g. in recognizing anonymous TypeScript functions.

After crashes, there is the question of usability. Are people using the new feature? If there are multiple ways to access it, which one is used the most and which the least? People use a new feature because they are trying to achieve something; are they successful in that? Is it fast enough? And maybe a customer approaches you with a specific issue and you want to be able to investigate. You want to cover at least some of these cases, and you need product telemetry to do that.

There are plenty of services that focus on some telemetry aspects.

  • Google Analytics measures, pretty much automatically, how users navigate your web pages.
  • Honeycomb is great at monitoring your backend.
  • Amplitude gives you info about product usage, funnels and conversion.
  • Tableau can visualize data with complex relationships.
  • Databricks is a data mining service.
  • And DataDog has become more or less an industry standard for processing ad-hoc data, allowing you to build monitoring and alerts on top of it.

From Events to Scenarios

At Outreach, we have found DataDog useful for both the frontend and the backend, although we keep monitoring its competitors. DataDog lets your product send ad-hoc structured logs without any prior setup on the DataDog side. The data would look like

{
  eventName: 'redirecting authentication request',
  app: {
    name: 'auth',
    version: '1.2.2'
  },
  level: 'info',
  deployment: 'cluster A',
  ...
}

It’s possible to set up indexing on some of the fields, which brings extra querying possibilities. You can also define metrics on top of the logs, and alerts on top of those metrics that wake you up if certain criteria aren’t met, for example if you receive too many logs with a specified error.
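As a rough illustration of what sending such a log could look like, here is a minimal Python sketch. It assumes DataDog’s public v2 HTTP log intake endpoint and a DD_API_KEY environment variable; the service name, tags and the send_log helper are made up for this example, not a description of our actual pipeline.

import json
import os

import requests  # third-party HTTP client

# Assumed DataDog v2 log intake endpoint; adjust the site if yours differs.
DD_LOGS_INTAKE = "https://http-intake.logs.datadoghq.com/api/v2/logs"

def send_log(event_name, level="info", **extra):
    """Send one ad-hoc structured log entry to DataDog (sketch)."""
    entry = {
        "message": event_name,
        "status": level,
        "service": "auth",                # assumed service name
        "ddtags": "deployment:cluster-a",
        **extra,                          # extra fields become log attributes
    }
    response = requests.post(
        DD_LOGS_INTAKE,
        headers={"DD-API-KEY": os.environ["DD_API_KEY"],
                 "Content-Type": "application/json"},
        data=json.dumps([entry]),
        timeout=5,
    )
    response.raise_for_status()

send_log("redirecting authentication request",
         app={"name": "auth", "version": "1.2.2"})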

Scenarios

We have introduced scenarios: a series of user actions that together represent an activity accomplishing a given task. For example, an import scenario would consist of clicking an import button, collecting the data, processing the data and displaying a result. All logs belonging to one scenario run share a unique ID, i.e. if you perform the same scenario twice, you get two different IDs.

If everything goes well, the scenario finishes with a success and the last log in the scenario marks a successful completion. If anything fails, e.g. the data processing ends with an error, the scenario ends with a failure. Each of the scenario steps includes the duration from the start. A series of scenario logs would look similar to

{id: '1AF3', name: 'import', step: 'Started', duration: 0}
{id: '1AF3', name: 'import', step: 'ImportButtonClicked', duration: 0}
{id: '1AF3', name: 'import', step: 'DataFetched', duration: 412}
{id: '1AF3', name: 'import', step: 'DataProcessed', duration: 456}
{id: '1AF3', name: 'import', step: 'Success', duration: 461}
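To make the mechanics concrete, here is a minimal, hypothetical sketch of a scenario logger. The Scenario class and the send_log placeholder are illustrative names, not our actual code; the important parts are that every step shares the scenario ID and carries the elapsed time since the start.

import time
import uuid

def send_log(event_name, **fields):
    # Placeholder: in practice this would push a structured log to DataDog
    # (see the intake sketch above); here it just prints the entry.
    print({"eventName": event_name, **fields})

class Scenario:
    """Emits step logs that share one ID and a duration from the start (sketch)."""

    def __init__(self, name):
        self.name = name
        self.id = uuid.uuid4().hex[:4].upper()  # unique per scenario run
        self.started = time.monotonic()
        self.step("Started")

    def step(self, step_name):
        elapsed_ms = int((time.monotonic() - self.started) * 1000)
        send_log("scenario step", id=self.id, name=self.name,
                 step=step_name, duration=elapsed_ms)

    def success(self):
        self.step("Success")

    def failure(self):
        self.step("Failure")

# A run similar to the import series above.
scenario = Scenario("import")
scenario.step("ImportButtonClicked")
try:
    scenario.step("DataFetched")
    scenario.step("DataProcessed")
    scenario.success()
except Exception:
    scenario.failure()
    raise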

Keep in mind that for DataDog, or any similar service, each of the scenario logs is independent. That still allows you to observe how many users have started an import, set an alert if the ratio of failed to started imports gets too high, and another alert if the import starts to take longer than expected.

The Missing Data

Now, what happened: a customer reached out to us saying that they could not process emails with Outreach. We checked the data coming from that customer and everything looked great: reliability was over 99%, processing times were acceptable, so what could be wrong? We kept investigating until we had the idea to compare how many scenarios were started, succeeded and failed. You are probably guessing already where this is leading. Out of all started scenarios, about 0.5% failed, 80% succeeded and 19.5% neither failed nor succeeded. Our product was evidently being killed without a chance to send any info about it.

We were curious about the impact on users: is it a few users who never finish, or does it happen every now and then to everybody? Does it depend on the operating system? And so on. The log lines are independent, though. You cannot really search for a starting log line that has no corresponding success line. You can look up a single starting line and check whether there is a failure, a success with a matching ID, or no match at all, but that is fairly difficult to repeat if you have millions of logs.

We needed extra tools. DataDog has an API that lets you connect and download logs matching a query, filtered by a specific customer, scenario step, start time or end time. Such an API lets you limit the downloaded data to a manageable size that fits easily on a local drive.
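A rough sketch of such a download, assuming DataDog’s v2 log search endpoint with cursor-based pagination; the query string, attribute names, time range and output file are made up for illustration.

import json
import os

import requests

SEARCH_URL = "https://api.datadoghq.com/api/v2/logs/events/search"
HEADERS = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    "Content-Type": "application/json",
}

def download_logs(query, time_from, time_to, path):
    """Page through all matching logs and dump them to a local JSONL file (sketch)."""
    cursor = None
    with open(path, "w") as out:
        while True:
            body = {
                "filter": {"query": query, "from": time_from, "to": time_to},
                "page": {"limit": 1000},
            }
            if cursor:
                body["page"]["cursor"] = cursor
            response = requests.post(SEARCH_URL, headers=HEADERS, json=body, timeout=30)
            response.raise_for_status()
            payload = response.json()
            for log in payload.get("data", []):
                out.write(json.dumps(log["attributes"]) + "\n")
            cursor = payload.get("meta", {}).get("page", {}).get("after")
            if not cursor:
                break

# Example: all import scenario logs of one (hypothetical) customer over one day.
download_logs("@name:import @customer:acme",
              "2023-08-01T00:00:00Z", "2023-08-02T00:00:00Z",
              "import_logs.jsonl")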

Then you run a script to process the data. The tool simply collects all the scenario IDs that appear in the data. After that, it removes the IDs that finished and goes through the data again with the remaining ones.
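A minimal sketch of that processing step, assuming the logs were saved as JSON lines by the download above; the field names follow the scenario example and the os attribute is hypothetical.

import json
from collections import Counter

started = {}      # scenario ID -> its 'Started' log
finished = set()  # scenario IDs that reached Success or Failure

with open("import_logs.jsonl") as f:
    for line in f:
        log = json.loads(line)
        if log.get("step") == "Started":
            started[log["id"]] = log
        elif log.get("step") in ("Success", "Failure"):
            finished.add(log["id"])

# Whatever never reached an end state is the missing data;
# slice it by any attribute you logged, e.g. the operating system.
unfinished = [log for sid, log in started.items() if sid not in finished]
print(f"{len(unfinished)} of {len(started)} scenarios never finished")
print(Counter(log.get("os", "unknown") for log in unfinished))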

Alternatively, and even better, if you know a bit of Python, you can use a notebook that does both the querying and the processing.

With this approach, we were finally able to find out that the problem happens on Mac. We found a bug in a specific version of our partner’s code and reported it, and it was fixed fairly promptly, thanks to all the data we provided.

Don’t measure just failures

What was the learning for us? Don’t compare failures with successes; compare the overall number of started scenarios with successes instead. Failures and missing scenario completions still provide extra insight into what’s going on.

In summary, log events give you info about feature usage or errors and scenarios measure how long it takes for users to achieve their goals. You will probably want to set up dashboards to see the current status and alerts to be notified when anything unexpected happens. Have fun!
