Export Your Google App Engine Logs to BigQuery: Introducing Mache

Aleem Mawani
Streak Engineering Blog
3 min read · Jun 9, 2012

Here at Streak we know that our users’ inboxes are central to their day, and we’re honored that they let us help organize their email. But with great placement comes great responsibility: if Streak doesn’t function perfectly, it’s front and center. Like all good web apps, we track every error that occurs, whether on our servers or in our users’ inboxes, so that we can catch problems as soon as possible. We also gather a number of anonymous metrics that help us decide where to focus our efforts on improving the product.

We’ve been using Google App Engine’s built-in logging framework for this purpose. It has worked very well for diagnosing errors as they happen, but we also want a higher-level overview of the information we’re logging. We want to answer questions like:

  • How often do people access their Streak pipelines?
  • Is the newest client version fixing the error our Belgian users were encountering?

We’d also like to look further back in time: did our recent work improve today’s latency compared to last Friday’s? Luckily, Google recently released the BigQuery API for just this kind of high-level, big-data processing. As soon as we saw it, we knew we wanted to use it to figure out where we could best help our users. We just needed a way to export our App Engine logs to BigQuery for processing.

In the grand tradition of scratching our own itch, we’ve been working on a way to do just that. We’ve been pleased with the results so far, and we’re hoping that other App Engine users in the same boat will find it similarly compelling.

We’re proud to announce Mache, an open source Java framework for exporting App Engine logs to BigQuery. It’s available at: http://github.com/StreakYC/mache

Mache is powered by a cron job that runs every few minutes. It exports logs to Google Cloud Storage, where you can store them indefinitely, and then initiates the job to import the logs to BigQuery. It’s extensible: you can define your own log parsers, choose how often to export the logs, and decide how to aggregate them into BigQuery tables.
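To make the "define your own log parsers" idea concrete, here is a minimal, self-contained sketch of how per-column parsers might turn a log entry into one CSV line ready for upload. It is an illustration only and not Mache's actual API: the names LogExportSketch, RequestLogEntry, FieldExtractor, and defaultExtractors are hypothetical, the CSV format is an assumption, and the real framework works with App Engine's own log types. See the GitHub README for the real interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LogExportSketch {

    /** Hypothetical stand-in for one request log entry (not the App Engine type). */
    static final class RequestLogEntry {
        final String path;
        final int httpStatus;
        final String versionId;

        RequestLogEntry(String path, int httpStatus, String versionId) {
            this.path = path;
            this.httpStatus = httpStatus;
            this.versionId = versionId;
        }
    }

    /** Hypothetical parser: turns a log entry into the text of one column. */
    interface FieldExtractor {
        String extract(RequestLogEntry entry);
    }

    /** Maps column name to extractor; LinkedHashMap keeps the column order stable. */
    static Map<String, FieldExtractor> defaultExtractors() {
        Map<String, FieldExtractor> extractors = new LinkedHashMap<>();
        extractors.put("path", entry -> entry.path);
        extractors.put("httpStatus", entry -> Integer.toString(entry.httpStatus));
        extractors.put("versionId", entry -> entry.versionId);
        return extractors;
    }

    /** Quotes a cell only when it contains a comma, a quote, or a newline. */
    static String escape(String value) {
        boolean needsQuotes =
                value.contains(",") || value.contains("\"") || value.contains("\n");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Runs every extractor over the entry and joins the cells into one CSV line. */
    static String toCsvRow(RequestLogEntry entry, Map<String, FieldExtractor> extractors) {
        List<String> cells = new ArrayList<>();
        for (FieldExtractor extractor : extractors.values()) {
            cells.add(escape(extractor.extract(entry)));
        }
        return String.join(",", cells);
    }
}
```

With these assumed extractors, an entry for path /inbox with status 200 on version v2 becomes the line /inbox,200,v2. Adding a column is then a matter of registering one more extractor.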

Figure: An example run of Mache. See the GitHub README for a full explanation.

After you’ve imported your data with Mache, you can query it with the full power of BigQuery. For instance, want to know which URLs are costing you the most to serve?

After adding Mache, just go to the BigQuery browser and run:

SELECT SUM(cost) as total, path FROM [{table name}] GROUP BY path ORDER BY total DESC;

Or are you using App Engine’s traffic splitting feature and want to know whether the new version is serving more errors? Just run:

SELECT SUM(IF(httpStatus >= 400, 1.0, 0.0))/COUNT(*) as errorRate, versionId FROM [{table name}] GROUP BY versionId ORDER BY versionId;
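If the arithmetic in that query is unfamiliar, the following sketch computes the same thing over a handful of in-memory rows: for each version, the number of responses with a status of 400 or above, divided by the total number of responses. The class ErrorRateSketch and its Row type are hypothetical names used only for illustration. BigQuery does this work for you over the full table.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ErrorRateSketch {

    /** Hypothetical row with just the two columns the query reads. */
    static final class Row {
        final String versionId;
        final int httpStatus;

        Row(String versionId, int httpStatus) {
            this.versionId = versionId;
            this.httpStatus = httpStatus;
        }
    }

    /** Mirrors SUM(IF(httpStatus >= 400, 1.0, 0.0)) / COUNT(*), grouped by versionId. */
    static Map<String, Double> errorRateByVersion(List<Row> rows) {
        // For each version, counts[0] is the error count and counts[1] the total.
        Map<String, int[]> countsByVersion = new TreeMap<>();
        for (Row row : rows) {
            int[] counts = countsByVersion.computeIfAbsent(row.versionId, k -> new int[2]);
            if (row.httpStatus >= 400) {
                counts[0]++;
            }
            counts[1]++;
        }

        // TreeMap keeps versions sorted, like ORDER BY versionId.
        Map<String, Double> rates = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : countsByVersion.entrySet()) {
            int[] counts = entry.getValue();
            rates.put(entry.getKey(), (double) counts[0] / counts[1]);
        }
        return rates;
    }
}
```

For example, a version that served four requests, two of them with status 500 or 404, gets an error rate of 0.5.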

If you’re interested, check it out, hack on it, and let us know of any improvements you’d like to share. We think you’ll love the insight it gives you. We do.
