How Dailymotion improved by collecting billions of events
In a world where quality has become everything, marketing teams are looking for even more user feedback to stabilize and improve their products while maintaining respect and protecting their users’ privacy. That is where the Support and CRM teams traditionally come into play as collectors of this precious source of information. Yet we still need more instant and reliable feedback to encourage Dailymotion’s development. That’s why we’ve decided, in conformity with the applicable personal data protection legislation, to collect billions of “events” every day on our apps. Not thousands… billions. Here’s why it’s definitely a good idea.
Why did we need to collect so many events in the first place?
A few years ago, we were using tools like Google Analytics to study use cases on our applications. It was OK for the purpose of traffic source identification and audience analysis. However, it was not accurate or flexible enough for advanced usage. Getting the display/click ratio on specific features (“recommendation”, “related video”, “search page”…) or monitoring the fluidity of our application worldwide was not supported.
For all the reasons mentioned above, we decided to collect billions of events from our player and website/mobile apps using a homemade event bus. In doing so, we took into account the legal constraints applicable to the processing of personal data. For example, we identified the legal basis for processing data and ensured that adequate technical and operational measures were deployed to provide security and confidentiality for any personal data subjected to that processing.
The use of an in-house event bus for collecting information offered multiple advantages:
- A specialized SaaS product would have been too expensive considering Dailymotion’s heavy traffic load (5000 Player start-ups per second).
- We wanted to have full control over our data and have it securely reside in our own databases.
- We also wanted to limit third-party access to personal information that may be part of analyzed data to ensure its confidentiality.
- We needed flexibility with the ability to define new events and use cases.
How we handle our catalog of events
In a joint effort between several Dailymotion teams (Marketing, Product, Engineering, Business Intelligence…), we defined a catalog of events to capture specific application usage. We refer to this as a tagging plan.
We defined a strong rule on our events: each event is defined by a fact in a context and not by a use case. This helps us correlate different kinds of events and build new funnels of events. Examples of such facts:
- Scroll page
- Click on an element
- API call
- Error log…
These facts are captured in the context of a user session, with a specific id and a specific timestamp (e.g. user_id, session_id, timestamp, page_id, page_version, element_id, element_version, query_id, stack_trace…).
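To make the "fact in a context" rule concrete, here is a minimal Python sketch. The `Event` class, the `make_event` and `funnel` helpers and the fact names are hypothetical and only illustrate the idea; they are not Dailymotion's actual code.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    name: str          # the fact itself, e.g. "click" (hypothetical name)
    created_ts: int    # creation time in milliseconds
    context: dict = field(default_factory=dict)  # where the fact happened


def make_event(name: str, **context) -> Event:
    """Record a fact with its context; no use case is encoded in the event."""
    return Event(name=name, created_ts=int(time.time() * 1000), context=context)


def funnel(events: list[Event], session_id: str, steps: list[str]) -> bool:
    """Return True if the session produced the given facts in that order.

    Assumes `events` is already in chronological order.
    """
    names = iter(e.name for e in events if e.context.get("session_id") == session_id)
    # `step in names` consumes the iterator, so order is enforced.
    return all(step in names for step in steps)


events = [
    make_event("scroll", session_id="s1", page_id="search"),
    make_event("click", session_id="s1", page_id="search", element_id="result_3"),
    make_event("api_call", session_id="s1", query_id="q42"),
]
print(funnel(events, "s1", ["scroll", "click"]))
```

Because the events only describe facts, the same three events can feed any funnel defined later without changing the emitters.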
How did we build our event bus?
The diagram below provides the overall architecture. The design of the event bus is technology agnostic. At Dailymotion, we use Python as the main language for dispatcher/consumer and NSQ for our broker:
The event producers
“Event producers” are emitters on our website and mobile applications. They create the events defined in our tagging plan and send them to Dailymotion servers via HTTPS (POST, with an “application/json” content type). We try to optimize network usage by batching events and compressing data on the client side.
JSON output template
- payload_version: the version of the payload, which allows for future upgrades
- sent_ts: timestamp in ms of the transmission from the client
- events: batched events
- event_name: unique identifier for an event
- event_version: the version of this event, which tells us how to read the “data” field
- created_ts: timestamp in ms of the creation of the event
- data: all “application data”, defined by the event name/version pair and described by a JSON Schema (e.g. user_id, session_id, …)
- stack_ctx: all technical data only used to debug the event bus (e.g. the version of the emitter)
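As an illustration, a producer could assemble and compress such a payload as follows. This is a sketch written in Python for consistency with the rest of the stack (real emitters run in the browser and in mobile apps). The nesting of the per-event fields under "events", the top-level position of "stack_ctx", the `emitter_version` key and the use of gzip are all assumptions; the article does not specify them.

```python
import gzip
import json
import time


def now_ms() -> int:
    return int(time.time() * 1000)


def build_payload(events: list[dict], emitter_version: str) -> dict:
    """Wrap a batch of events in the envelope described by the template."""
    return {
        "payload_version": 1,
        "sent_ts": now_ms(),
        "events": events,
        # Assumption: stack_ctx sits at the top level of the payload.
        "stack_ctx": {"emitter_version": emitter_version},
    }


def encode_payload(payload: dict) -> bytes:
    """Serialize to JSON and compress (gzip is an assumed choice)."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


batch = [
    {
        "event_name": "click",
        "event_version": 1,
        "created_ts": now_ms(),
        "data": {"user_id": "u1", "session_id": "s1", "element_id": "result_3"},
    },
    {
        "event_name": "scroll",
        "event_version": 1,
        "created_ts": now_ms(),
        "data": {"user_id": "u1", "session_id": "s1", "page_id": "search"},
    },
]

body = encode_payload(build_payload(batch, emitter_version="web-1.0.0"))
```

The resulting `body` would be sent as the POST request body, with a header telling the server that the content is compressed.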
The event dispatchers
The “event dispatchers” are our HTTP servers. Their main job is to split all events from a batch, aggregate important information received from an event producer and store each event in a defined broker topic. The event dispatchers never deal with functional tasks; they only work on the events’ format.
JSON output template
- query_ctx: all data bound to the HTTP request (headers, timestamp of sending/reception…)
- stack_ctx: same technical data from the emitter plus technical data from the dispatcher
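The split step can be sketched as below. The function name, the `events.<event_name>` topic naming scheme and the `dispatcher_version` key are hypothetical, and publishing to the broker is deliberately left out; the sketch only shows how one batch becomes one enriched message per event.

```python
def split_batch(payload: dict, query_ctx: dict, dispatcher_version: str) -> list[tuple[str, dict]]:
    """Turn one batched payload into (topic, message) pairs, one per event."""
    # Emitter technical data plus the dispatcher's own technical data.
    stack_ctx = {**payload.get("stack_ctx", {}), "dispatcher_version": dispatcher_version}
    messages = []
    for event in payload["events"]:
        message = {
            **event,                      # event_name, event_version, created_ts, data
            "payload_version": payload["payload_version"],
            "sent_ts": payload["sent_ts"],
            "query_ctx": query_ctx,       # headers, reception timestamp, ...
            "stack_ctx": stack_ctx,
        }
        topic = f"events.{event['event_name']}"  # hypothetical naming scheme
        messages.append((topic, message))
    return messages


incoming = {
    "payload_version": 1,
    "sent_ts": 1_000_050,
    "events": [
        {"event_name": "click", "event_version": 1, "created_ts": 1_000_000, "data": {}},
        {"event_name": "scroll", "event_version": 1, "created_ts": 1_000_020, "data": {}},
    ],
    "stack_ctx": {"emitter_version": "web-1.0.0"},
}
out = split_batch(incoming, {"received_ts": 1_000_090}, dispatcher_version="d-2.3")
```

Note that the dispatcher never inspects the "data" field: it only reshapes the envelope, which matches the rule that dispatchers do no functional work.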
Note: Because the clock set on the client is unreliable, we add a received timestamp to each event. Within a margin of error, “sent_ts” should be equivalent to “received_ts”. If it is not, we apply the delta between the two to “created_ts” to get a more accurate timestamp.
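The correction described in the note amounts to a few lines. In this sketch the function name and the 2-second tolerance are assumptions; the article does not state the actual margin of error.

```python
def corrected_created_ts(created_ts: int, sent_ts: int, received_ts: int,
                         tolerance_ms: int = 2000) -> int:
    """Shift created_ts by the client clock drift when the drift is significant.

    All timestamps are in milliseconds. tolerance_ms is an assumed margin
    that absorbs normal network latency.
    """
    delta = received_ts - sent_ts
    if abs(delta) <= tolerance_ms:
        return created_ts          # client clock is close enough, keep as-is
    return created_ts + delta      # re-express the event time on the server clock


# Client clock one hour ahead of the server:
# the event was created about 10 s before the batch was sent.
print(corrected_created_ts(created_ts=4_590_000, sent_ts=4_599_950, received_ts=1_000_000))
```

The corrected value keeps the original gap between creation and sending, so the relative order of events inside a batch is preserved.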
The event consumers
The “event consumers” are our business workers. Each group of consumers subscribes to the broker’s topics and processes its own specific task. Typically, workers are meant to:
- Store events in different Elasticsearch instances to monitor a release in soft real time or to explore our events for light analysis.
- Store metrics in a dedicated monitoring service (Datadog, Prometheus…) to keep specific metrics and feed alerting services.
- Aggregate data stripped of our users’ personal information and proxy events to external services (Mediametrie, Google Analytics…).
- Store all events in BigQuery for long-term analysis or to feed alerting services.
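As an example of the third kind of worker, the sketch below strips personal fields from events and aggregates them before anything is proxied to an external service. The list of personal fields and the function names are hypothetical, and the call to the external service itself is omitted.

```python
from collections import Counter

# Hypothetical list of fields treated as personal data.
PERSONAL_FIELDS = frozenset({"user_id", "session_id", "ip"})


def strip_personal_data(event: dict) -> dict:
    """Return a copy of the event without personal fields or request context."""
    clean = dict(event)
    clean["data"] = {
        key: value
        for key, value in event.get("data", {}).items()
        if key not in PERSONAL_FIELDS
    }
    clean.pop("query_ctx", None)  # headers may carry IP addresses or cookies
    return clean


def aggregate_by_name(events: list[dict]) -> Counter:
    """Count events per event_name, the only thing forwarded in this sketch."""
    return Counter(e["event_name"] for e in events)


raw = [
    {"event_name": "click", "data": {"user_id": "u1", "element_id": "play"},
     "query_ctx": {"ip": "203.0.113.7"}},
    {"event_name": "click", "data": {"user_id": "u2", "element_id": "play"}},
    {"event_name": "scroll", "data": {"session_id": "s9", "page_id": "home"}},
]
cleaned = [strip_personal_data(e) for e in raw]
counts = aggregate_by_name(cleaned)
```

`strip_personal_data` returns a new dictionary rather than mutating its input, so the same raw event can still be handed unchanged to the other consumer groups.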
5 main benefits of collecting events
Thanks to this event bus, Dailymotion has improved its products in many ways. Here are some of the main benefits of collecting billions of events every day.
1. Feature improvement
When we release a new feature or improve an existing one, we now have access to quick and precise feedback that helps us understand how our work impacts our users. The more feedback we have, the better we can act when a design or an algorithm needs to be optimized. For example, we monitor the efficiency of our search feature through specific events such as a click on a link, with the search keyword as context, or the “rank” of the chosen video.
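A rough idea of how such search events could be turned into metrics is sketched below. The event names (`search_display`, `search_click`) and the `query` and `rank` fields are hypothetical stand-ins for whatever the tagging plan actually defines.

```python
from collections import defaultdict


def search_stats(events: list[dict]) -> dict:
    """Compute, per search keyword, the click/display ratio and mean clicked rank."""
    displays = defaultdict(int)
    clicked_ranks = defaultdict(list)
    for event in events:
        query = event["data"]["query"]
        if event["event_name"] == "search_display":
            displays[query] += 1
        elif event["event_name"] == "search_click":
            clicked_ranks[query].append(event["data"]["rank"])

    stats = {}
    for query, shown in displays.items():
        ranks = clicked_ranks.get(query, [])
        stats[query] = {
            "click_ratio": len(ranks) / shown,
            # A lower mean rank means users find what they want near the top.
            "mean_rank": sum(ranks) / len(ranks) if ranks else None,
        }
    return stats


sample = (
    [{"event_name": "search_display", "data": {"query": "football"}}] * 4
    + [
        {"event_name": "search_click", "data": {"query": "football", "rank": 1}},
        {"event_name": "search_click", "data": {"query": "football", "rank": 3}},
        {"event_name": "search_display", "data": {"query": "cooking"}},
    ]
)
stats = search_stats(sample)
```

Comparing these two numbers before and after a release gives a quick signal on whether a change to the search algorithm helped.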
2. Streaming, monitoring and optimization
We need to ensure that Dailymotion’s streaming experience is always optimal for each user, regardless of the device (desktop, mobile, TV) or browser they’re using. To do so, we rely on metrics sent by specific events. Thanks to this data, our teams can closely monitor the effects of bug fixes and improvements on our end users in terms of video loading, buffering and quality. This is especially helpful for tuning hls.js.
3. API monitoring
It’s no secret that the responsiveness of an app is one of the cornerstones of quality. On a single-page application or a native application, the response time for each request is key. In addition to our backend API response time, we also collect real user monitoring data on the client side. This allows us to evaluate our efficiency at different points in the world, with a minimal set of context data (IP, connection, platform).
4. Application fault analysis
When we release a new version of a client application, the risk of introducing new issues or regressions in existing features is high. To limit this risk, Dailymotion’s teams collect and monitor all exceptions/errors generated by our client-side applications in real time. One of the main advantages of collecting these events is that we can correlate them with previous user actions to reproduce errors on our development stack.
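The correlation between an error and the user actions that preceded it can be sketched as follows. The `error` event name, the default window of three actions and the function name are assumptions made for the example.

```python
from collections import defaultdict


def actions_before_errors(events: list[dict], window: int = 3) -> list[dict]:
    """For each error event, collect the last `window` events of the same session."""
    by_session = defaultdict(list)
    for event in sorted(events, key=lambda e: e["created_ts"]):
        by_session[event["data"]["session_id"]].append(event)

    trails = []
    for session_events in by_session.values():
        for index, event in enumerate(session_events):
            if event["event_name"] == "error":
                previous = session_events[max(0, index - window):index]
                trails.append({"error": event, "previous": previous})
    return trails


log = [
    {"event_name": "click", "created_ts": 20, "data": {"session_id": "s1"}},
    {"event_name": "scroll", "created_ts": 10, "data": {"session_id": "s1"}},
    {"event_name": "error", "created_ts": 30,
     "data": {"session_id": "s1", "stack_trace": "TypeError: ..."}},
    {"event_name": "click", "created_ts": 15, "data": {"session_id": "s2"}},
]
trails = actions_before_errors(log)
```

Each trail lists the facts that led up to the error in chronological order, which is the sequence a developer would replay to reproduce it.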
5. Third-party JS proxying
We realized that third-party JS integrations had a real impact on our page performance by slowing down some of our features. To fix this, we now centralize all data gathering in our event bus, then proxy it server-side to the external services (like Google Analytics). By handling this ourselves, we have regained fluidity in our applications.
The event bus has rapidly become a vital tool for us. Our event catalog is continuously evolving, allowing us to accompany Dailymotion’s transition toward a more Premium model. Collecting billions of events on a daily basis has quickly opened new analysis perspectives and has been key to many of our successes, like the launch of our new video player and the hls.js library.