Published in CircleCI, Jul 6, 2016

Testing With Untrustworthy Data Means Nothing: how CircleCI came to grips with declaring bankruptcy on analytics and rebuilt from the ground up

When I joined CircleCI as a Growth Engineer, I came on board with one mission: improve conversion rates. Having freshly transitioned from an ecommerce startup, I had spent the past six months designing and iterating on funnels which turned user acquisitions into conversions. During that time I had learned a pretty basic formula for improving conversion rates:

Enumerate the key funnels
Find an area of weakness in a funnel
Hypothesize why users are falling off
Test that hypothesis
Look at the data to either confirm or deny the hypothesis
Lather, rinse, repeat

So when I joined CircleCI, Rishi (our Growth Project Manager) and I went to work. First we enumerated our key funnels: non-user to user, user to paying customer, and paying customer to higher-paying customer. We used the knowledge and data we had at our disposal to identify what funnels we had and which were underperforming. And last, we launched test after test to try to improve acquisitions, conversions, and upgrades.

But as we started to try and measure the success of these tests, we kept running into the same problems. We either couldn’t trust the data, or we were missing the data we needed to tie the test to the larger picture: the health of the business. After careful consideration, we decided it was time to declare bankruptcy on a broken analytics implementation and build a new one from the ground up.

The Problem(s)

Looking at our defunct analytics implementation we found four main issues:

The data was untrustworthy
The events lacked cohesiveness
We lacked the ability to track org funnels
We lacked the ability to connect demographic and event data

Untrustworthiness

The most obvious, and arguably the biggest, problem with our old analytics implementation was its untrustworthiness. Early on I was told that our event tracking was dropping 15% of all new user signups. Luckily, this was caught thanks to the redundancy of our production data: for any given day, if we had 100 new users in the database, we only had 85 signup events in Mixpanel. A predecessor of mine had done some investigating and believed this was caused by a race condition in how Mixpanel assigns user IDs. However, being a small team that prioritizes well-defined problems, and considering analytics was often deprioritized anyway, this issue had constantly found itself at the bottom of the backlog.

But this made us wonder: what about our other events? Were they also dropping 15%? This specific race condition may have been unique to the signup event, but there could easily be other bugs in our implementation that went unnoticed because we lacked the same sort of redundancy. Even though this was a localized problem, it was a crack in the dam, and it quickly eroded any confidence we had in our event data. At this point we didn’t trust any event we hadn’t implemented ourselves, which had us asking “why not just start over?”
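The signup discrepancy only surfaced because the production database acted as a second source of truth. A minimal sketch of that kind of daily reconciliation check is shown here in Python. The function name, the per-day count dictionaries, and the 5% alert threshold are all illustrative assumptions, not CircleCI's actual tooling.

```python
from typing import Dict


def find_dropped_events(
    db_counts: Dict[str, int],
    event_counts: Dict[str, int],
    max_drop_rate: float = 0.05,  # hypothetical alert threshold
) -> Dict[str, float]:
    """Return the days on which tracked events fall short of database rows.

    Both inputs map a day label to a count; how those counts are fetched
    (database query, analytics export) is left out of this sketch.
    """
    suspicious = {}
    for day, db_total in db_counts.items():
        if db_total == 0:
            continue  # nothing to compare against on this day
        tracked = event_counts.get(day, 0)
        drop_rate = (db_total - tracked) / db_total
        if drop_rate > max_drop_rate:
            suspicious[day] = drop_rate
    return suspicious


# The case from the text: 100 new users in the database, 85 signup events.
db_signups = {"day-1": 100, "day-2": 40}
tracked_signups = {"day-1": 85, "day-2": 40}
report = find_dropped_events(db_signups, tracked_signups)
print(report)
```

A check like this only works for events that have a matching record in production data, which is exactly why the other events could not be verified the same way.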

Lack of Cohesiveness

Another big issue was the complete lack of cohesiveness in the naming schema of our events, which led to me dubbing the old analytics implementation “90 good ideas implemented 90 different ways.” Looking at the code, it was clear that each event had been a bespoke implementation by a lone engineer. And while each made sense in isolation, the combination of them, with no overarching vision, made the aggregate completely unusable.

At one point during our process, while trying to build a funnel, we came across two eerily similar events: “click-plan-update” and “update-plan-clicked”. Looking at the analogous events in Mixpanel showed no clear differentiator, so the only solution for figuring out which to use was to wade into the code to see which event corresponded to which user action.

It’s inefficient for the people using analytics to waste time simply trying to understand the events they’re using. And, in my experience, it means they just won’t use the data.

Tracking Org Funnels

Another flaw in our analytics had nothing to do with the implementation but was just a shortcoming of Mixpanel (and every other event platform in the market): we needed to track funnels from a user level as well as from an organizational level.

While users sign up for the platform, it’s ultimately organizations that upgrade and downgrade. And while small companies may see the same user who first added an organization making plan changes, at bigger companies it’s not unusual for a manager, or even someone from the finance team, to make them.

This led to users with very little interaction on the site enacting substantial monetary changes. So it was not that user’s behavior which led to the upgrade or downgrade, but rather the aggregate behavior of all users who belonged to that organization. But without the ability to associate individual events with an organization, there was no way to correlate user behavior with these changes.

From what we’ve seen, every event-based analytics platform only allows the user to create funnels and correlations based on one property: user ID. However, when trying to predict the behavior of an organization based on its users’ events, we needed to group events on the organization ID property.

This problem could normally be solved by cohorting each organization and following the behavior of that cohort. However, our users can switch organizational context within the same account, and some views are aggregates of all their organizations and therefore should be attributed to none of them (or all of them, depending on the point of view).

This is a problem we have not seen solved by any third-party analytics provider, and one that is core to our ability to prove the effectiveness of different tests and features on our platform.

Partitioned Data Sources

The last problem was our partitioning of event and demographic data. While all of our event data lived in Mixpanel, all of our demographic data (a clone of our production data) lived in our own Postgres instance. This meant there was no way to ask questions of our data which needed a combination of both events and demographics, which was a serious limitation to our analytics.

As a contrived example, suppose we wanted to know: last month, what percent of organizations that upgraded were Ruby shops, and what percent were Python shops? While we knew how many organizations on our platform used Ruby vs Python via our demographic data, and which organizations upgraded last month via our event data, there was no easy way to combine them.

Infrastructure

After we had identified the main problems with our analytics, the next step was finding which third party platforms would help us solve them.

Using a third-party platform inevitably leaves us with a less flexible architecture that sometimes can’t answer certain questions (for example, our earlier problem cohorting by org ID). But as a small (one engineer) Growth team, time is our most valuable asset, and saving ourselves time in both initial development and continued maintenance was a high priority. We were not looking for perfect, just the most value for the least effort.

After evaluating many possible platforms, we landed on 3 to be the bedrock of our analytics.

(Although I won’t go through all the platforms we evaluated here, we have opinions on many, so feel free to email me if you want to chat.)

Amplitude

Our first decision was whether or not we needed to replace Mixpanel. As I explained earlier, we had problems with our specific implementation, but that didn’t necessarily mean another implementation wouldn’t work.

We evaluated a number of options in the market, from Heap to Keen.io, CrazyEgg to Fullstory, but looking at the various analytics implementations it became clear that what we needed was a custom-event-based analytics platform: one where we could selectively create events on the features and user actions we needed to track, then easily set up funnels and correlation analysis via the third party UI. That is, we needed what Mixpanel offered.

However, we then discovered a relatively new player in the analytics space with a similar event-based analytics platform: Amplitude. Besides offering the Mixpanel features we had grown reliant on, such as funnels and event segmentation, they offer additional analysis features such as event flows and one-click cohort creation. And as a kicker, it would cost us a little under half of what we were spending on Mixpanel for the same number of monthly events.

But our issue with our old implementation had not been with Mixpanel’s feature set, or the cost, but with the fact that we were missing events. So when trialing Amplitude, we decided to focus on a specific problem: dropping 15% of user signups.

During the trial, we immediately noticed that Amplitude forced us to set user IDs ourselves, unlike Mixpanel, which had assigned them for us and thereby caused the race condition. Since we had control over the creation and storage of user IDs, we could ensure they were properly set before firing our signup event. We ran the test and Amplitude showed the correct number of signup events, while Mixpanel showed 15% fewer.

We were sold.
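The ordering that fixed the dropped signups (create and store the user ID first, fire the event second) can be sketched as follows. `EventTracker` is a hypothetical stand-in written for this illustration. It is not the Amplitude, Mixpanel, or Segment SDK, and the `signup` event name is likewise assumed.

```python
import uuid
from typing import Any, Dict, List, Optional


class EventTracker:
    """Hypothetical in-memory analytics client, for illustration only."""

    def __init__(self) -> None:
        self.user_id: Optional[str] = None
        self.sent: List[Dict[str, Any]] = []

    def identify(self, user_id: str) -> None:
        self.user_id = user_id

    def track(self, event: str, properties: Optional[Dict[str, Any]] = None) -> None:
        # Refuse to send anything before an ID is set, so an event can never
        # be attributed to a missing or not-yet-assigned user.
        if self.user_id is None:
            raise RuntimeError(f"cannot track {event!r} before identify()")
        self.sent.append(
            {"user_id": self.user_id, "event": event, "properties": properties or {}}
        )


def sign_up(tracker: EventTracker) -> str:
    # The application creates (and would persist) the ID first...
    user_id = str(uuid.uuid4())
    tracker.identify(user_id)
    # ...and only then fires the signup event.
    tracker.track("signup")
    return user_id


tracker = EventTracker()
new_user_id = sign_up(tracker)
print(tracker.sent)
```

Making the missing-ID case a hard error is one possible design choice. It turns a silent data loss into a failure that shows up immediately in development.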

Segment

The next third-party platform we added was not necessarily something we were looking for, but has since become the core of our analytics infrastructure. While implementing Amplitude for our trial, another engineer at CircleCI suggested we use Segment. Being a hub-and-spoke model event data platform, Segment allows users to turn on and off integrations to third-party analytics providers like Amplitude, Google Analytics, Heap, etc. This meant if one provider wasn’t meeting our needs, all we needed to do was turn their switch “off,” and another provider’s switch “on.” The one-time setup, where every integration is as easy as flipping a switch with no engineering required, has turned out to be a huge help as we’ve trialed different platforms for both our Product and Marketing needs (and has significantly lowered the cost of doing so).

The other product Segment offers, and the one that is arguably more impactful, is their Warehouse product. This product converts event data to SQL, converting each event into a table, and each event property into a column in that table. This feature solved two major issues for us.

Being able to track org based funnels

Having our data in SQL allowed us to attach an organization ID to each event we fired in the context of an organization. Since each property becomes a column in that event’s table, we could then group all events by organization ID in SQL and use that to create org-based funnels. This allowed us to start answering questions and drawing correlations about which user actions were leading indicators of an organization upgrading, downgrading, or churning out.
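As a rough illustration of grouping events by organization rather than by user, the sketch below uses an in-memory SQLite database as a stand-in for the Postgres warehouse. The event tables (`build_started`, `plan_upgraded`) and their columns are assumptions made up for the example, not CircleCI's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per event, one column per property (hypothetical names).
    CREATE TABLE build_started (user_id TEXT, organization_id TEXT, sent_at TEXT);
    CREATE TABLE plan_upgraded (user_id TEXT, organization_id TEXT, sent_at TEXT);

    INSERT INTO build_started VALUES
        ('dev-1', 'org-a', '2016-06-01'),
        ('dev-2', 'org-a', '2016-06-02'),
        ('dev-3', 'org-b', '2016-06-03');
    -- The upgrade comes from a finance user who never started a build.
    INSERT INTO plan_upgraded VALUES
        ('finance-1', 'org-a', '2016-06-10');
    """
)

# Funnel step 1: any user in the org started a build.
# Funnel step 2: anyone in the same org upgraded afterwards.
funnel = conn.execute(
    """
    SELECT b.organization_id,
           COUNT(DISTINCT b.user_id) AS active_users,
           MAX(u.organization_id IS NOT NULL) AS upgraded
    FROM build_started AS b
    LEFT JOIN plan_upgraded AS u
           ON u.organization_id = b.organization_id
          AND u.sent_at >= b.sent_at
    GROUP BY b.organization_id
    ORDER BY b.organization_id
    """
).fetchall()
print(funnel)
```

Because the join is on `organization_id`, the upgrade by the finance user is credited to the organization whose developers were active, which a user-ID-only funnel cannot express.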

Being able to combine our event and demographic data

A benefit of Segment’s Warehouse product is that users can choose a hosted database or provide their own. We went with the provide-our-own option, since it allowed us to store our event data in the same Postgres instance as our demographic data. This immediately solved the problem of answering the more complex questions that required the combination of those data sources.
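With both kinds of data in one database, the contrived Ruby vs Python question from earlier becomes a single join. The sketch below again uses in-memory SQLite in place of Postgres. The `organizations` table, its `primary_language` column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Demographic data (hypothetical clone of production data).
    CREATE TABLE organizations (id TEXT PRIMARY KEY, primary_language TEXT);
    -- Event data (hypothetical warehouse table for one event).
    CREATE TABLE plan_upgraded (organization_id TEXT, sent_at TEXT);

    INSERT INTO organizations VALUES
        ('org-a', 'ruby'), ('org-b', 'python'),
        ('org-c', 'ruby'), ('org-d', 'python');
    INSERT INTO plan_upgraded VALUES
        ('org-a', '2016-06-03'), ('org-b', '2016-06-12'),
        ('org-c', '2016-06-20'), ('org-d', '2016-05-28');
    """
)

# Count upgraded organizations per language for one month.
rows = conn.execute(
    """
    SELECT o.primary_language,
           COUNT(DISTINCT o.id) AS upgraded_orgs
    FROM plan_upgraded AS u
    JOIN organizations AS o ON o.id = u.organization_id
    WHERE u.sent_at >= ? AND u.sent_at < ?
    GROUP BY o.primary_language
    ORDER BY o.primary_language
    """,
    ("2016-06-01", "2016-07-01"),
).fetchall()

# Convert the counts into percentages of all upgrades in the window.
total = sum(count for _, count in rows)
shares = {language: 100.0 * count / total for language, count in rows}
print(shares)
```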

It’s our belief that in the natural growth of an analytics team, eventually questions come up that a “one-size-fits-all” third party platform like Mixpanel and Amplitude can’t answer. However, by owning our own data, in its raw form, we are only limited by the data itself, and not the UI on top of it.

We see this as being the long term solution to analytics, and Segment as the platform that enables that.

Looker

The last important piece of our analytics infrastructure was one we already had in place: Looker. Looker is a powerful query and visualization platform that sits on top of an analytics database. While it takes someone with SQL knowledge to set up, once configured it significantly lowers the barrier for users who aren’t fluent in SQL to generate reports and answer bespoke questions with data.

While Looker has been in our technology stack for quite some time, we were never able to use it to query our event data, only our demographic data. With Segment’s Warehouse product enabling us to load our event data into Postgres, we are now able to use Looker to query both. This has made the platform more powerful and significantly more useful.

By investing in initial setup, and making all data available to Looker, we’ve empowered non-technical employees to easily write queries and generate complex reports. Whereas before only Product Managers used Looker, it is starting to be used more broadly by employees in non-technical roles. For example, our Customer Success Managers are now using it to learn more about their customers.

In the future I see Looker as being the crux of CircleCI’s business intelligence.

Next Steps

After coming to terms with how broken our analytics were, and figuring out which third-party providers could help us fix them, it was time for the real work to begin: implementation. While having the right third-party platforms helps in building a strong analytics infrastructure, in the end what matters most is how easy it is for engineers to quickly add events which the data consumers can parse and use.

That, however, is a story for another day, so stay tuned for part 2 coming soon!

Originally published at circleci.com on July 6, 2016.