Introducing Woopra Data Loader

Elie Khoury
Published in Woopra
Nov 19, 2019

You can now import existing data to Woopra to retroactively map the end-to-end user journey.

Today, we’re excited to introduce a new feature that we’ve been working on for over a year: Data Loader.

We hear the following scenario over and over: you have existing customer data in SQL databases, data warehouses, Salesforce, and a multitude of other platforms. You want to understand how this data impacts the user journey, but you have no way of leveraging that historical data. This is where Data Loader comes in.

With Data Loader, customers can now import and convert their existing customer data from various sources into Woopra as actions without involving a developer. This ultimately enables our customers to retroactively map the end-to-end user journey across all channels.

What problems are we solving?

Woopra’s initial approach to data collection has been through front-end trackers, such as our JavaScript Snippet and its underlying HTTP Tracking API. The primary focus has always been on real-time tracking of user behavior across websites, web applications, and mobile apps.

Around 2014, we introduced AppConnect to evolve our data collection systems and collect data from third-party service providers such as Zendesk, Stripe, and others using Webhooks, a technique that was commonly used around that time.

This approach helped our customers collect real-time customer behavioral data across all their channels to create the most comprehensive picture possible of the user experience.

This may have been the only way to collect large amounts of data at the time (I will get more into this later in this post), but this strategy has multiple limitations.

Data Integrity

Front-end trackers are limited mainly because the data is pushed by the user from an uncontrolled environment — the browser. While there is no way around that for front-end tracking, some important milestones such as a “user signing up” are being recorded in the company’s database. It may make more sense to collect data right from the source of truth rather than rely on the front-end to notify our servers that the user has signed up.

Using Webhooks for real-time data collection from service providers can also be unreliable. Webhooks might be skipped at some point for whatever reason and may end up causing problems like broken funnels.

Data Retroactivity

Most of our customers start using our platform years, if not decades, after they launch their business, so they already have existing data.

Tracking user behavior is, unfortunately, an afterthought for many companies. By the time they’re ready to start tracking their users, the traditional tracking methods will only track data from that moment forward.

That creates two significant issues. First, customers are limited to newly collected data and will never be able to leverage the years of data they already have. Second, customers will need to collect data for two to three months before they reach an aha moment.

Developers Must Get Involved

The third issue with traditional data collection methods is that developers have to be involved.

Developers always have bigger fish to fry, and we often face resistance and delays in getting code snippets installed and configured to track custom actions. Any small improvement or new custom action will often have to wait weeks, if not months.

Our friends at Heap introduced an excellent solution to streamline this process by tracking all possible interactions and moving the responsibility of taxonomizing data to the analytics user rather than a developer.

Since then, my colleagues at Woopra and some customers have suggested that we follow that path and implement similar technology. I’ve been hesitant to move forward with this initiative for many reasons, which I won’t get into at length in this post, but mainly because we’re not in the front-end micro-interactions tracking business. It conflicts with our vision to track the end-to-end user journey. I believe that instead of replicating that solution and collecting noise that is irrelevant to the problem we’re trying to solve, we should start looking at the front-end behavioral tracker as yet another channel. We should also partner with best-of-breed services like Segment and Heap to import just the interactions we need to complete the user journey.

This leads me to the following point.

The Convergence of Business Intelligence and Analytics

If you’ve been observing the data space over the past few years, you probably noticed the re-emergence of SQL and relational databases. If you didn’t catch that trend, try this Google search. Just a few years ago, the direction was that SQL is dead, and NoSQL is the future. What happened?

Amazon happened. Amazon introduced Redshift, which redefined what’s possible with relational databases.

Remember the Big Data buzzword? Companies realized last decade how valuable data is. Because storing structured data was technically and organizationally challenging, given the lack of scalability of structured and relational databases, they resorted to Data Lakes, which are designed to accept any unstructured data. They then put the weight on Data Scientists to scrape the data to get a few complex questions answered.

The introduction of a scalable relational database system like Redshift enabled companies to implement a data strategy upfront to collect data in a structured manner. This shift ultimately empowered SQL experts to get questions answered without the need for Data Scientists to write complicated code to parse through unstructured data.

As a result, a new breed of Business Intelligence cloud services emerged, such as Looker and Periscope Data. Just this year, Google announced that it would acquire Looker for $2.6 billion, which is impressive for a relatively new company.

Business Intelligence became more than just a data visualization tool for executives to assume correlations from line charts generated by siloed data sources. Instead, it was finally possible to bring all customer data together using SQL interfaces and make real data-driven and deterministic correlations, which ultimately drove more confident decisions. In other words, companies can now tie individual user data together and find deterministic correlations instead of having to make risky assumptions by observing two sets of aggregated siloed data.

Why is this important? It’s important because the customer analytics space and the business intelligence space are converging. Those two species of data solutions are now solving the exact same problem.

What happened next? Similar to how Webhooks trended in 2013, cloud services have now realized that their value increases when they make their data available in data warehouses. They offer basic analytics, but that data cannot be tied to data from other sources. A marketing automation service can tell you how many emails were sent on a day-to-day basis and how many were opened. That’s great, but it will not tell you how that email campaign is affecting your customer engagement and how a particular audience is resonating with campaign X and not Y.

For that reason, most service providers have launched or are launching the ability to dump their data in some data warehouse. If they don’t, companies like Stitch Data are helping their customers do just that.

Modern data warehouses like Amazon Redshift, Google BigQuery, Snowflake, and others are at war today to be the “final destination” for all company customer data.

Now, to the problems.

Need For Data Experts

Using Business Intelligence solutions today requires data experts — or SQL experts to be more specific. It’s true that companies no longer need to hire dozens of data scientists to answer fundamental attribution questions. However, if your team doesn’t know SQL, they’re still running blind and waiting in the data breadline to validate their efforts and make data-driven decisions.

SQL for User Journeys

The SQL interface is absolute horsesh*t for attribution analysis. While it can help you narrow down complex multi-dimensional data into one simple two-dimensional tabular report, SQL is not the right solution for generating graphs, and user journeys in particular. It doesn’t mean that you can’t do it, but you will probably have to write a three-page SQL query to generate a very basic funnel (e.g., X users did step A, then 50% did step B, then 10% did step C).

It gets even more complicated when you want to analyze a particular initiative and its attribution to the success or failure of the customer.
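To make the funnel example concrete, here is a minimal Python sketch of the ordered, per-user logic a funnel needs, which is the part that tends to be awkward in SQL. The event tuples, step names, and the `funnel_counts` function are hypothetical illustrations, not part of Woopra.

```python
from collections import defaultdict

def funnel_counts(events, steps):
    """Count users who completed each funnel step in order.

    events: iterable of (user_id, timestamp, action) tuples (hypothetical shape).
    steps:  ordered list of action names, e.g. ["A", "B", "C"].
    """
    by_user = defaultdict(list)
    for user_id, ts, action in events:
        by_user[user_id].append((ts, action))

    counts = [0] * len(steps)
    for actions in by_user.values():
        actions.sort()  # chronological order for this user
        next_step = 0
        for _, action in actions:
            # A step only counts if all earlier steps already happened.
            if next_step < len(steps) and action == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return counts

# Made-up sample data for illustration only.
events = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "C"),
    ("u2", 1, "A"), ("u2", 5, "B"),
    ("u3", 2, "B"), ("u3", 4, "A"),  # B happened before A, so it is not step B
    ("u4", 1, "A"),
]
print(funnel_counts(events, ["A", "B", "C"]))  # [4, 2, 1]
```

Four users reach step A, two go on to B, and one completes C. The sequencing rule is a few lines of ordinary code here, while in SQL it typically requires self-joins or window functions for every additional step.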

SQL Is Too Generic

While SQL is mighty, it’s too generic for the average decision-maker at companies.

Suggesting that marketers should learn SQL is like asking Uber drivers to pave the road and build the car from the ground up.

SQL queries can be used for any type of data, including satellite data, traffic lights, weather conditions, and more. But, when it comes to customer analytics, there are two constants that we are dealing with:

  1. The structure of the data: Timestamped user actions and unique identifiers
  2. A finite set of possible questions: Behavioral segmentation, attribution analysis, funnel reports, journey mapping, growth trends, retention and cohorts analysis

So optimizing for a specific data structure and building a friendly drag & drop user interface that abstracts the complexity (and limitations) of SQL queries is arguably the best way forward to democratize customer analytics.

Which finally leads us to the initiative we’re launching.

Launching Data Loader

Our mission at Woopra is democratizing customer data to enable every team member to analyze the effect of their work on the success of the company. A key component to realizing this mission is providing a complete view of the end-to-end user journey.

With the launch of Data Loader, Woopra customers can now import and convert all their existing customer data from various sources into Woopra as actions without involving a developer. This enables customers to retroactively map the end-to-end user journey across all channels.

Here’s How it Works

Woopra has full control over the underlying data architecture. Our proprietary database system is optimized for sophisticated queries like behavioral segmentation, journey analysis, and attribution reporting. As a result, users can get all their customer-related questions answered in seconds, without the need to work with data experts, which is effectively impossible with traditional SQL-based systems.

To make all this possible, every behavioral data point we collect must adhere to the following rules:

  • When: Must have a timestamp — we must know when the user did that particular action
  • What: What is the user doing in this action, and optionally what metadata is associated with this action? (e.g., view page x, create a ticket, use feature X)
  • Who: Must have a unique identifier so we can tie that activity to the right user.

So if we can take all the data you have and identify these three variables, we can successfully import your existing data, convert it to a behavioral footprint data structure, and keep it in sync.
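The three rules can be pictured as a small record type. The sketch below is only an illustration in Python: the `Action` class and its field names are assumptions made for this example and do not describe Woopra's internal data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Action:
    """Hypothetical shape of one behavioral data point."""
    when: datetime  # When: timestamp of the action
    what: str       # What: name of the action, e.g. "create ticket"
    who: str        # Who: unique identifier of the user
    properties: dict = field(default_factory=dict)  # optional metadata

    def __post_init__(self):
        # Reject records that are missing any of the three required parts.
        if not isinstance(self.when, datetime):
            raise ValueError("when must be a datetime")
        if not self.what:
            raise ValueError("what (action name) is required")
        if not self.who:
            raise ValueError("who (unique identifier) is required")

ticket = Action(
    when=datetime(2019, 11, 19, 9, 30, tzinfo=timezone.utc),
    what="create ticket",
    who="user-42",
    properties={"priority": "high"},
)
```

Any source record that can be reduced to this shape can be treated as an action, whatever table or service it came from.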

Let me give you a typical example. A SaaS company has a database table of users, which is their source of truth for authentication and more. Each row in that table holds a unique ID, a unique email, a first name, a last name, and a date_created timestamp.

In this scenario, we can convert entries in that table into a “Signup” action in Woopra as follows:

  • When: date_created
  • What: ‘Signup’
  • Who: ID (unique), email (unique), first name, last name
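As a rough sketch of that mapping, the Python below turns one row of such a users table into a “Signup” action. The column names (`id`, `email`, `first_name`, `last_name`, `date_created`), the sample row, and the output dictionary layout are all assumptions made for illustration. In Data Loader this mapping is configured rather than coded.

```python
from datetime import datetime

def row_to_signup(row):
    """Convert one users-table row (a dict) into a 'Signup' action dict."""
    return {
        "when": datetime.fromisoformat(row["date_created"]),  # When
        "what": "Signup",                                     # What
        "who": {                                              # Who
            "id": row["id"],          # unique identifier
            "email": row["email"],    # unique identifier
            "first_name": row["first_name"],
            "last_name": row["last_name"],
        },
    }

# Made-up row for illustration only.
row = {
    "id": 1001,
    "email": "jane@example.com",
    "first_name": "Jane",
    "last_name": "Doe",
    "date_created": "2019-03-04T10:15:00",
}
action = row_to_signup(row)
print(action["what"], action["when"].isoformat(), action["who"]["email"])
```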

This mapping process can be configured by directly connecting your data source to Woopra. Woopra will automatically load the schema (all the tables and columns). Then our customers, in collaboration with our data consultants, can map all these tables into actions and user properties. On top of that, Woopra offers an ID Graph system to handle all the scattered unique identifiers across all the services and databases you’re importing data from.
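To give a feel for what resolving scattered identifiers involves, here is a small union-find sketch in Python. It is a generic technique chosen for illustration and is not a description of how Woopra's ID Graph is implemented. The `IdGraph` class and the identifier values are hypothetical.

```python
class IdGraph:
    """Toy identity resolver: identifiers that get linked share one user."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        # Register unseen identifiers as their own root.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # Path halving keeps lookups fast as chains grow.
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        return self._find(a) == self._find(b)

graph = IdGraph()
# Made-up identifiers from three different sources.
graph.link(("db_id", 1001), ("email", "jane@example.com"))
graph.link(("email", "jane@example.com"), ("billing_id", "cus_123"))

print(graph.same_user(("db_id", 1001), ("billing_id", "cus_123")))  # True
print(graph.same_user(("db_id", 1001), ("db_id", 2002)))            # False
```

The database ID and the billing ID were never linked directly. They resolve to the same user because both were linked to the same email.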

Once the initial batch is imported, Woopra can look for changes as fast as every minute, to make sure that your data is continuously in sync.
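One common way to keep a copy in sync is to poll for rows changed since the last run, using a high-water mark. The sketch below shows that idea with Python's built-in `sqlite3` module. It is an assumption about how such syncing could work, not a description of Data Loader's mechanism, and the `users` table, its `updated_at` column, and `fetch_changes` are hypothetical.

```python
import sqlite3

def fetch_changes(conn, last_seen):
    """Return rows updated after `last_seen`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, email, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2019-11-01T00:00:00"),
        (2, "b@example.com", "2019-11-02T00:00:00"),
    ],
)

# Initial batch: everything is newer than the empty mark.
first, mark = fetch_changes(conn, "")

# A new row appears; the next poll picks up only that row.
conn.execute(
    "INSERT INTO users VALUES (3, 'c@example.com', '2019-11-03T00:00:00')"
)
second, mark = fetch_changes(conn, mark)
print(len(first), [r[0] for r in second], mark)
```

Running `fetch_changes` on a schedule, for example once a minute, would repeat the second step indefinitely. ISO 8601 timestamps are used so that string comparison matches chronological order.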

One data source at a time and one table at a time, we can build the whole end-to-end journey overnight without wasting our users’ and their developers’ time. What’s impressive is that many of our customers host all that data in one final-destination data warehouse. But even if they don’t, Woopra will connect to your service providers and import existing data the same way, by loading schemas as tables and allowing you to join data to match unique identifiers across multiple data sources.

Even more? We will have a mapping template pre-generated for all the service providers that you choose to load data from. So in most scenarios, loading historical tickets and payments would be as simple as authorizing your helpdesk system and payment gateway.

Our first release includes connections with leading database technologies:

  • Amazon Redshift
  • Google BigQuery
  • Azure
  • Snowflake
  • Cassandra
  • MySQL
  • Postgres

as well as cloud services such as:

  • Salesforce
  • Zendesk
  • Stripe
  • HubSpot
  • And more

We now have the infrastructure to scale our data sources through partners, and we’re open to integrating directly with most data sources upon request.

We can’t wait to see you use the Data Loader.
