The 3 Questions You Must Answer to Build a Valuable Analytics Product

Udemy Tech Blog · Jun 6, 2019

By Katie Pantell & Dan Sulpizi


If you’re building an analytics product, it won’t always be easy to follow the standard product definition process: identifying the user, identifying the problem you’re trying to solve, and brainstorming and iterating on solutions. Many decisions will depend on the data you have available and on its limitations, both in quality and in how it can be collected and processed.

In our experience, it’s best to treat this as an iterative process in which you expect to update, and possibly even redefine, your solutions as you build.

To illustrate, we’ll walk you through our experience building the Instructor Performance tool, Udemy’s first comprehensive instructor analytics dashboard. The Performance tool empowers Udemy instructors to understand their course performance and identify areas for improvement. Ultimately, we landed on a process that worked well for our organization, and we hope these best practices can help you succeed when developing your own data product.

Throughout that journey, it’s important to focus on three key questions:

  1. What problem are you trying to solve?
  2. Who are your users?
  3. How will you adapt to constraints/limitations along the way?

Keeping these three areas in mind throughout the development process is essential to building an intuitive, actionable analytics product.

What problem are you trying to solve, and how can you stay on track?

Don’t allow the data to bias which questions you ask — rather, find the data that answers the questions your users care most about. Take care to understand the story you want to tell. What insights do you want your users to draw from the data?

One of the most interesting challenges we faced in the development of the Instructor Performance product was the Traffic & Conversion section. We envisioned a tool that would enable instructors to understand how visitors arrived at their course landing pages (where students view the course description and decide whether to enroll) and how many of those visitors ultimately chose to purchase the course.

Using visitor and conversion breakdowns and a list of referral links, we aimed for the Traffic & Conversion tool to give instructors specific and actionable insights into the sites and channels that drive traffic to their course landing pages. This required us to define concepts such as a channel attribution strategy (how to categorize site traffic by channel, such as Udemy search, external sites, instructor promotion links, etc.). We hypothesized that the most meaningful information for instructors would be the action a student took just before discovering and enrolling in a course (i.e., the “last touch”).

There were significant differences between how a student started their session on Udemy and how they discovered a specific course. We started thinking about different opportunities to reconstruct hypothetical user journeys, but ultimately revisited our initial goal and determined that this wouldn’t answer the question at hand. Instead of getting distracted by all the data we had at our disposal, we chose to focus on referrers that had led directly to the students’ discovery of the course landing page.
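
To make the “last touch” idea concrete, here’s a minimal sketch; the data model and field names are hypothetical, not Udemy’s production pipeline. Rather than reconstructing the full journey, attribution keeps only the referrer recorded on the landing-page view itself:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PageView:
    url: str
    referrer: str  # empty string when the browser sent no referrer
    viewed_at: datetime

def last_touch_referrer(session: list[PageView], landing_page_url: str) -> str | None:
    """Return the referrer that led directly to the course landing page.

    We deliberately ignore how the session started and keep only the
    most recent view of the landing page: the "last touch".
    """
    landing_views = [v for v in session if v.url == landing_page_url]
    if not landing_views:
        return None
    last_view = max(landing_views, key=lambda v: v.viewed_at)
    return last_view.referrer or None  # None covers cases like links opened in a new tab
```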

Additionally, don’t fall prey to the temptation to reuse internal data definitions or labels that users won’t find self-explanatory. Just because a certain labeling of data serves an internal business purpose doesn’t mean it translates to the product you are building. Feel free to rename labels and group data to fit the product, not the data source. For example, at Udemy we divide our paid channels into several categories, but we chose to display this data to instructors in one group labeled “Ads & affiliates” since the more granular data is not actionable for them.
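
As a sketch of that kind of regrouping (the internal channel names below are invented for illustration), a simple lookup table can translate granular internal categories into the labels instructors actually see:

```python
from collections import Counter

# Invented internal channel names for illustration; the real taxonomy differs.
DISPLAY_GROUPS = {
    "paid_search": "Ads & affiliates",
    "paid_social": "Ads & affiliates",
    "affiliate_network": "Ads & affiliates",
    "udemy_search": "Udemy search",
    "instructor_coupon": "Instructor promotion",
}

def traffic_by_display_group(channel_events: list[str]) -> Counter:
    """Aggregate raw channel events into instructor-facing groups,
    defaulting anything unmapped to a generic external bucket."""
    return Counter(DISPLAY_GROUPS.get(channel, "External sites")
                   for channel in channel_events)
```

The point of keeping the grouping in the product layer is that internal reporting can retain its granular categories untouched.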

When dealing with legacy features that weren’t built with analytics in mind, it is important to be flexible in classifying data to make information digestible.

On the Udemy platform, there are many cross-sell components, from topic pages to Udemy-generated course recommendations. Other teams are constantly iterating, A/B testing, and introducing new cross-sell features. We needed to decide how best to classify these components to ensure instructors can consume their referral data and make informed decisions.

One challenge is that the tracking data for these components was never intended to be consumed by instructors. It was introduced to track the success of each specific feature, without considering how to categorize it relative to other cross-sell components.

We decided to tackle this problem by looking only at the most common categories, which make up the largest share of the traffic. Many components are similar enough to be grouped together, and conversion for similar components can be improved by the same actions: an instructor can improve the content of their course landing page, which is reflected across the cross-sell components, but cannot change the individual components themselves. We therefore chose to provide high-level groupings rather than a more detailed breakdown, making the information more digestible and easier for instructors to act on.
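
One way to express that decision (a sketch, assuming each referral event carries a component name) is to keep only the highest-traffic components and fold the long tail into a single bucket:

```python
from collections import Counter

def bucket_components(component_visits: list[str], top_n: int = 5) -> Counter:
    """Keep the most common cross-sell components and fold the long tail
    of experimental and legacy components into one 'Other' bucket."""
    counts = Counter(component_visits)
    top = dict(counts.most_common(top_n))
    other = sum(n for name, n in counts.items() if name not in top)
    if other:
        top["Other cross-sell"] = other
    return Counter(top)
```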

How can you stay focused on your users’ needs when making data decisions?

Analytics tools that will serve a wide variety of users (with different levels of experience, knowledge, etc.) present a challenge when identifying which data will be actionable for the most users.

Clearly communicating our channel attribution strategy was crucial for all users, whether experienced instructors or beginners. We had to consider power users, who are familiar with our existing channel attribution models used for revenue splitting, as well as newer users, who are not familiar with our channel categorization. The goal was to explain how referrers were grouped into channels, which types of data are missing (e.g., users opening a link in a new tab), and how these “unknown” cases are categorized. Because this attribution model differs from the one used for instructor payments, we also needed to make clear how its categories were defined.

We made similar decisions in other parts of the Instructor Performance tool. For example, we realized that some data points were difficult to compute quickly. By exploring the different types of users who would engage with the product, we discovered that different instructors had different needs regarding data freshness. Instructors with thousands of enrollments don’t generally need their enrollment count to be updated in real time. On the other hand, brand-new instructors with only a handful of students are energized and motivated by real-time updates of new enrollments. Therefore, we chose to pre-compute enrollment values a few times per day for high-volume instructors, while showing real-time data to newer instructors. In this way, we were able to support the needs of various types of users while working within our data processing constraints.
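
A minimal sketch of that tiering, assuming a hypothetical cutoff and placeholder data accessors:

```python
from dataclasses import dataclass

PRECOMPUTE_THRESHOLD = 1_000  # hypothetical cutoff, in lifetime enrollments

@dataclass
class Instructor:
    id: int
    total_enrollments: int

def cached_enrollment_count(instructor_id: int) -> int:
    """Placeholder for a batch-computed value refreshed a few times per day."""
    raise NotImplementedError

def live_enrollment_count(instructor_id: int) -> int:
    """Placeholder for a real-time query against the enrollments store."""
    raise NotImplementedError

def enrollment_count(instructor: Instructor) -> int:
    # High-volume instructors get cheap, slightly stale counts;
    # newer instructors get the real-time updates that motivate them most.
    if instructor.total_enrollments >= PRECOMPUTE_THRESHOLD:
        return cached_enrollment_count(instructor.id)
    return live_enrollment_count(instructor.id)
```

The useful property of this shape is that the freshness policy is a single branch, so the threshold can be tuned without touching either data path.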

Our instructors are highly invested in their success on Udemy, and it is crucial that we help them understand where data comes from, how often the data is updated, and where the data limitations lie.

In addition to clearly communicating the definitions of referral channels and our attribution strategy, we needed to communicate how frequently the traffic and conversion data is updated. We set expectations around data recency by indicating in the UI when data is “unprocessed.”
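
In practice, this can be as simple as comparing each report date against the pipeline’s last completed run (a hypothetical sketch of that UI flag):

```python
from datetime import date

def freshness_flag(report_date: date, last_processed: date) -> str:
    """Label dates the batch pipeline hasn't reached yet, so instructors
    understand why the newest traffic numbers may look incomplete."""
    return "unprocessed" if report_date > last_processed else "processed"
```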

How can you make the most of your limitations/constraints?

Technical constraints can be your friend! We were challenged to think more critically about why we chose certain product definitions, which eventually led us to improve the product.

We faced data size constraints while developing another feature of the Instructor Performance tool, one that would show the other topics and courses an instructor’s current students were enrolled in, helping the instructor identify opportunities to create new courses. However, this generated a massive amount of data, especially for instructors with large student bases, so we chose to limit the data to enrollments within a specific time range. This decision resulted in an improved and more actionable experience for our instructors, since their students’ most recent behavior is the best indicator of their current intent to purchase additional courses.
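
A sketch of that windowing, assuming the feature receives (topic, enrollment timestamp) pairs for an instructor’s students:

```python
from collections import Counter
from datetime import datetime, timedelta

def recent_coenrollment_topics(enrollments: list[tuple[str, datetime]],
                               window_days: int = 90) -> Counter:
    """Count the topics an instructor's students enrolled in recently.

    Limiting to a recent window keeps the data volume manageable and,
    more importantly, reflects students' current purchase intent.
    """
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    return Counter(topic for topic, enrolled_at in enrollments
                   if enrolled_at >= cutoff)
```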

Understanding the limits and accuracy of our data helped guide decisions about scope and offered insights that would drive more value to our users.

While developing the Instructor Performance tool, restrictions around the development time and effort required to recreate every Udemy-sourced URL pushed us to make further improvements to the product. We decided to focus on the most common types of Udemy-sourced traffic and bucket the rest into general categories in order to unblock development and help instructors identify the greatest opportunities to optimize their course landing pages.

If you are able to answer these three essential questions, you’ll be ready to build a valuable analytics product. This is an iterative process and your product will evolve; it’s essential to be flexible and willing to change your plans based on what you find when you dig into the data. One last tip…

Keep a record of your discoveries and develop a set of best practices you and other teams can refer to during future projects.

Here’s the process we identified for building a data product:

  1. Brainstorm on the design
  2. Explore the data
  3. Revisit your designs based on limitations in the data
  4. Develop and test with users

In other words, measure twice and cut once.

Throughout the development process, we learned how to effectively use the data and resources available to us while refining our product requirements to communicate clear, actionable insights to our users. We hope our learnings from this project will help future teams at Udemy and beyond to work more effectively when building new analytics tools.
