Understanding Your Users with Consistent and Reliable Event Data
Understanding how our customers interact with our product is crucial to Airtasker. In the data space we call this user behaviour analytics. User behaviour data is collected as events that are triggered by user actions. Because they’re specific to users, we’re able to understand our customer base in aggregate, drop into specific cohorts, and even review an individual’s journey to understand the challenges faced. User behaviour analytics plays a critical role in feature planning (both development & decommissioning!) and experimentation.
Importantly for Airtasker, we’re also using it to drive an outcomes-focused culture for our product and engineering teams. User behaviour analytics offers a window into the real-world impact of their work, a connection that’s hard to maintain when residing in the often abstract world of software development.
Maximising the value of your user behaviour data requires that it is consistent, both semantically and structurally. Semantically, it’s important to ensure events have the same meaning across platforms (e.g. web, iOS, Android). For example, avoid having a Sign Up Completed event defined as having completed a user profile on iOS, but as providing only the username and password on Android. A lack of consistency makes it difficult to understand how your product performs overall, and to compare performance between platforms.
It’s also imperative to define events with a common structure, so you can interrogate them in a consistent and expected way. This makes it easy for users to understand and work with the data. This is particularly challenging when multiple teams working on common areas of a product generate event data independently, and it typically results in event data sets that are difficult to use. You end up with multiple different events for the same action, casing differences in event and property names (ouch!), and a mixture of similar but only slightly different property values (“logged in” vs “logged_in” anyone?).
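To make the “logged in” vs “logged_in” problem concrete, here is a minimal sketch of how near-duplicate property values could be surfaced in an existing data set. The function names `normalise` and `findNearDuplicates` are hypothetical and only illustrate the idea; this is not something the Airtasker repository is known to contain.

```typescript
// Collapse casing, spaces, hyphens and underscores so near-duplicates share one key.
function normalise(value: string): string {
  return value.trim().toLowerCase().replace(/[\s_-]+/g, "_");
}

// Group observed values by their normalised key and return only the groups
// that contain more than one distinct spelling.
function findNearDuplicates(values: string[]): string[][] {
  const groups: { [key: string]: string[] } = Object.create(null);
  for (const value of values) {
    const key = normalise(value);
    const group = groups[key] || (groups[key] = []);
    if (group.indexOf(value) === -1) {
      group.push(value);
    }
  }
  return Object.keys(groups)
    .map((key) => groups[key])
    .filter((group) => group.length > 1);
}

// Example: three spellings of the same value, plus one unrelated value.
const duplicateGroups = findNearDuplicates(["logged in", "logged_in", "Logged In", "signed_up"]);
```

Each returned group is a set of spellings that probably mean the same thing, which is the kind of inconsistency a shared definition file is meant to prevent in the first place.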
Aware of these challenges at Airtasker, we put thought into how we could solve them when we kicked off a cross-team effort to reset our product instrumentation.
We started by developing a product instrumentation plan that abstracted away specific platform and implementation details, and focused on the product actions the user was taking in the real world. For example, when a user posted a task, the event was Post Task Completed rather than Post Task Button Clicked. A UI-level name like the latter might need to differ on the mobile apps if the UI for posting a task was different (e.g. Post Task Button Pressed). Defining events this way ensures the product and UI can evolve at an implementation level without rendering critical events on the product journey obsolete or misnamed.
Once we had our instrumentation plan, we turned to the challenge of semantic and structural consistency. We felt the best approach was strong data governance enforceable via code. We created a single GitHub repository that defines every event we track, as well as the associated properties. This single source of truth for event definitions provides a company-wide view of the events we collect, enforces consistency and reuse across platforms, and supports a strong data governance process, as new instrumentation can only be added through approved pull requests.
To give you some insight into how we structured the repository, there are two key YAML files: an event definitions file and a model definitions file.
The event definitions file defines every event to be instrumented, along with its structure, attributes, allowed values and so on. For example:
name: Assign Task Accept Offer Initiated
category: Assign Task
description: An event that fires when a poster begins the flow to accept an offer made by a tasker
- parameter_name: initiated_from
- Task Details
- Review Offers
- Offer Chat
- parameter_name: time_since_offer_made_mins
description: The age of the offer, measured in mins (use a floor() approach, i.e. 59 seconds = 0 mins)
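As a rough illustration of how a definition like this can be enforced in code, the sketch below holds the same event as an in-memory object and checks a payload against it. The field names `parameters` and `allowedValues`, the interfaces, and the helper functions are all assumptions made for this example; the actual YAML keys and tooling may differ.

```typescript
// Hypothetical in-memory shape of one entry from the event definitions file.
// `parameters` and `allowedValues` are assumed names, used only for illustration.
interface ParameterDefinition {
  parameterName: string;
  description?: string;
  allowedValues?: string[];
}

interface EventDefinition {
  name: string;
  category: string;
  description: string;
  parameters: ParameterDefinition[];
}

const acceptOfferInitiated: EventDefinition = {
  name: "Assign Task Accept Offer Initiated",
  category: "Assign Task",
  description: "An event that fires when a poster begins the flow to accept an offer made by a tasker",
  parameters: [
    { parameterName: "initiated_from", allowedValues: ["Task Details", "Review Offers", "Offer Chat"] },
    { parameterName: "time_since_offer_made_mins", description: "The age of the offer, measured in mins" },
  ],
};

// Floor-based conversion described in the definition: 59 seconds counts as 0 mins.
function wholeMinutes(seconds: number): number {
  return Math.floor(seconds / 60);
}

// Returns a list of problems; an empty list means the payload matches the definition.
function validatePayload(def: EventDefinition, payload: { [key: string]: unknown }): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(payload)) {
    if (!def.parameters.some((p) => p.parameterName === key)) {
      problems.push("unknown parameter: " + key);
    }
  }
  for (const param of def.parameters) {
    const value = payload[param.parameterName];
    if (value === undefined) {
      problems.push("missing parameter: " + param.parameterName);
    } else if (param.allowedValues && param.allowedValues.indexOf(String(value)) === -1) {
      problems.push("invalid value for " + param.parameterName + ": " + String(value));
    }
  }
  return problems;
}
```

With this in place, a payload using “Offer Chat” passes, while one using “offer_chat” is reported as an invalid value for `initiated_from`.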
The model definitions file defines the models that could be added to an event if that model was in context for the event. The models are our common business entities, like tasks and offers, and the purpose of defining the models was to ensure the model properties were consistently structured across events and platforms. For example:
description: The ID of the poster, sent as a string
description: number of characters entered into field
description: number of characters entered into field
To support all this, on our client platforms (web, iOS and Android) we implemented an Airtasker Analytics library that sends events to Segment and allows developers to trigger an event in a type-safe manner. When developers use the library, they don’t provide the formal event name or model details. Instead, each event is associated with a method that can be called to trigger the event. All the developer has to do is provide the required models and additional parameters as method parameters.
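The shape of such a library can be sketched roughly as follows. Everything named here is hypothetical: the `AnalyticsClient` class, the `TaskModel` interface with its single `posterId` property, the `poster_id` property key, and the `Transport` callback standing in for the code that would forward events to Segment. It only shows the idea of one method per event with typed parameters.

```typescript
// Hypothetical model; the real model properties come from the model definitions file.
interface TaskModel {
  posterId: string;
}

// Allowed values become a union type, so an invalid value is a compile-time error.
type InitiatedFrom = "Task Details" | "Review Offers" | "Offer Chat";

interface TrackedEvent {
  name: string;
  properties: { [key: string]: string | number };
}

// Stand-in for the code that would forward the event to Segment.
type Transport = (event: TrackedEvent) => void;

class AnalyticsClient {
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // One method per event: the caller never types the event name or property keys.
  assignTaskAcceptOfferInitiated(task: TaskModel, initiatedFrom: InitiatedFrom, timeSinceOfferMadeMins: number): void {
    this.transport({
      name: "Assign Task Accept Offer Initiated",
      properties: {
        poster_id: task.posterId,
        initiated_from: initiatedFrom,
        time_since_offer_made_mins: timeSinceOfferMadeMins,
      },
    });
  }
}

// Usage: collect events in an array instead of sending them anywhere.
const sentEvents: TrackedEvent[] = [];
const analytics = new AnalyticsClient((event) => {
  sentEvents.push(event);
});
analytics.assignTaskAcceptOfferInitiated({ posterId: "12345" }, "Review Offers", 0);
```

Passing `"review_offers"` instead of `"Review Offers"` would fail to compile, which is the sense in which the call is type-safe.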
One of the coolest things about this entire project is the code generators, which read the event_definitions YAML and automatically generate these methods. We currently have code generation for event methods in Swift and Kotlin, and aim to have TypeScript definitions available soon. Code generation guarantees consistent event names, property names and property values, and makes it simple for engineers to add new events without mistakes. On top of all this, we implemented a linting library (Cerberus) to ensure that only valid YAML is committed to the repository.
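To give a feel for what such a generator does, here is a minimal sketch that turns one already-parsed event definition into the source text of a TypeScript method. It is not the Swift or Kotlin generator described above: the `GeneratorEventDef` shape, the assumed `type` and `allowedValues` fields, the `this.send(...)` call in the output, and the function names are all assumptions for illustration, and YAML parsing is left out.

```typescript
// Hypothetical, already-parsed form of an event definition.
// `type` and `allowedValues` are assumed field names.
interface GeneratorParamDef {
  parameterName: string;
  type: "string" | "number";
  allowedValues?: string[];
}

interface GeneratorEventDef {
  name: string;
  parameters: GeneratorParamDef[];
}

// "Assign Task Accept Offer Initiated" -> "assignTaskAcceptOfferInitiated"
// "initiated_from" -> "initiatedFrom"
function toCamelCase(text: string): string {
  return text
    .split(/[\s_]+/)
    .filter((word) => word.length > 0)
    .map((word, index) => {
      const lower = word.toLowerCase();
      return index === 0 ? lower : lower.charAt(0).toUpperCase() + lower.slice(1);
    })
    .join("");
}

// Allowed values become a union of string literals; otherwise use the declared type.
function parameterType(param: GeneratorParamDef): string {
  if (param.allowedValues && param.allowedValues.length > 0) {
    return param.allowedValues.map((value) => JSON.stringify(value)).join(" | ");
  }
  return param.type;
}

// Emit the source of one method that forwards the typed arguments under the
// exact property names from the definition.
function generateMethod(def: GeneratorEventDef): string {
  const args = def.parameters
    .map((p) => toCamelCase(p.parameterName) + ": " + parameterType(p))
    .join(", ");
  const properties = def.parameters
    .map((p) => "      " + p.parameterName + ": " + toCamelCase(p.parameterName) + ",")
    .join("\n");
  return [
    "  " + toCamelCase(def.name) + "(" + args + "): void {",
    "    this.send(" + JSON.stringify(def.name) + ", {",
    properties,
    "    });",
    "  }",
  ].join("\n");
}

const generatedSource = generateMethod({
  name: "Assign Task Accept Offer Initiated",
  parameters: [
    { parameterName: "initiated_from", type: "string", allowedValues: ["Task Details", "Review Offers", "Offer Chat"] },
    { parameterName: "time_since_offer_made_mins", type: "number" },
  ],
});
```

Because the event name and property keys are copied straight from the definition, a developer calling the generated method has no opportunity to misspell them.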
As an outcome of all this, our user behaviour analytics has taken a huge step forward. We’re gaining new insights into how users interact with our product, and we’re encouraging a culture that values customer impact. All of this comes together to contribute to our ultimate goal: building a loved product that empowers our community of taskers to realise the full value of their skills.