Apache Kafka Guide #42 TV Show Application example

Paul Ravvich
Apache Kafka At the Gates of Mastery
5 min read · Apr 23, 2024


Hi, this is Paul, and welcome to part #42 of my Apache Kafka guide. Today we will walk through a TV show application example as a practical exercise in applying Apache Kafka.

TV Show Application Task

The company offers a service that lets users stream TV shows and movies on demand. It wants to implement several features to enhance the user experience: users should be able to pick up videos right where they left off, user profiles should be built in real time, new shows should be recommended to users instantly, and all related data should land in an analytics repository. Given these requirements, the question is how one might utilize Kafka to achieve these objectives.

Requirements

  • Users can resume a video from the point in time where they left off
  • Recommend the next show to watch
  • Store analytics data in a long-term store
  • Feed collected user analytics into the user profile in real time

Solution with Apache Kafka

Resuming from a video timestamp

In this architecture, Kafka serves as the central communication hub. The primary topic of interest is “show position,” which tracks how much of a TV show or video a user has watched. A video player embedded in a web browser plays videos for users; as a video progresses, the player periodically sends data about the viewer’s progress to a “video position service.” This service acts as an intermediary, validating and processing the data before forwarding it to Kafka on the “show position” topic, which ensures data accuracy and integrity before anything reaches Kafka.
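A minimal sketch of what the video position service’s producer side might look like (the topic name, JSON layout, and broker address are illustrative assumptions, not part of the original design):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class VideoPositionService {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42";   // hypothetical user
            String showId = "show-7";    // hypothetical show
            long positionSeconds = 1350; // 22m30s into the episode

            // Key by user_id so every update for the same user lands in the
            // same partition and stays in order.
            String value = String.format(
                "{\"show_id\":\"%s\",\"position_sec\":%d}", showId, positionSeconds);
            producer.send(new ProducerRecord<>("show_position", userId, value));
        }
    }
}
```

Because the default partitioner hashes the record key, keying by user_id gives exactly the per-user ordering guarantee discussed in the summary below.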

To enable a feature where users can resume watching a video from where they left off, a “resuming service” is proposed. This service consumes the data from the “show position” topic and maintains a database that records the furthest point each user has reached in any given show. The key aspect of this service is that it only needs to store the most recent viewing position for each user, discarding any previous data points. When a user wishes to continue watching a video after a break, the video player queries the resuming service to retrieve the last known position. The resuming service then provides this information, allowing the video playback to resume from that exact spot. This approach ensures a seamless viewing experience for users, enabling them to pick up right where they left off, regardless of any interruptions.
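A sketch of the resuming service’s consumer loop, assuming the same topic and JSON layout as above, with an in-memory map standing in for the database:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class ResumingService {
    // user_id -> most recent position payload; a real service would persist
    // this per (user, show) in a database.
    private static final Map<String, String> latestPosition = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "resuming-service");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("show_position"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Overwrite: only the most recent position per user matters.
                    latestPosition.put(record.key(), record.value());
                }
            }
        }
    }
}
```

Since only the latest value per key matters here, a variant of this topic keyed by (user, show) would also be a natural candidate for log compaction.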

Recommendations

Now, let’s delve into the notion of generating recommendations. We possess insightful data detailing each user’s interaction with television shows. This encompasses which specific user watched which particular show and the extent of their viewership. Such data is invaluable as it precisely reveals users’ preferences towards different shows. For instance, if a user watches an entire season of a show, it’s a clear indicator of their enjoyment. Conversely, if they discontinue watching after just five minutes and never return, it’s apparent they didn’t find the show appealing.

Considering this, a real-time recommendation engine powered by Kafka Streams appears to be a promising solution. This engine would leverage sophisticated algorithms to analyze user viewership metrics comprehensively. Based on this analysis, it would then generate personalized show recommendations in real time.
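As a rough illustration (topic names and the scoring function are placeholders; a real engine would aggregate viewing history per user rather than map each update independently), a Kafka Streams topology for this engine might look like:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class RecommendationEngine {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "recommendation-engine");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read raw position updates, keyed by user_id.
        KStream<String, String> positions = builder.stream("show_position");

        // Placeholder "algorithm": score candidate shows from viewing metrics.
        KStream<String, String> recommendations =
                positions.mapValues(RecommendationEngine::recommendFor);

        // Write per-user recommendations to a downstream topic.
        recommendations.to("recommendations");

        new KafkaStreams(builder.build(), props).start();
    }

    private static String recommendFor(String positionJson) {
        return "{\"next_show\":\"show-99\"}"; // hypothetical recommendation
    }
}
```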

These recommendations would be relayed to a dedicated recommendation service. When users exit their video player and navigate back to the movie portal website, this service suggests the next television show they should watch. This concept represents just one potential avenue for enhancing user experience through tailored content recommendations.

Analytics

Finally, the data we’re dealing with is of excellent quality and deserves to be used beyond its current home in Kafka. I propose establishing an analytics consumer, potentially leveraging Kafka Connect, to move this data into an analytics store such as Hadoop, where it can be processed further. Such an analytics store could also feed the real-time recommendation engine directly; while I haven’t explicitly illustrated this connection, it’s an important aspect to consider. This outlines the fascinating architecture surrounding Kafka.
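If Kafka Connect is chosen for this consumer, the Confluent HDFS sink connector is one option. A minimal standalone-mode configuration sketch, assuming that connector and a placeholder HDFS address:

```properties
name=analytics-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=show_position
hdfs.url=hdfs://namenode:8020
flush.size=1000
```

Here flush.size controls how many records are batched into each HDFS file; it should be tuned against the actual throughput.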

Summary

Now, let’s delve into some reflections on the show position topic. It naturally invites contributions from producers worldwide, as the television shows garner global viewership. This necessitates a highly distributed approach to accommodate the anticipated high volume of data, with updates transmitted every 20 to 30 seconds. A preliminary estimate suggests at least 30 partitions, although this number warrants empirical validation through measurement.

In selecting a key for this topic, I would opt for user_id. This choice is motivated by the desire to maintain a chronological order of data per individual user, as opposed to a collective sequence across all users. Ensuring an orderly data stream per user is paramount, hence the preference for user_id as the partitioning key.

Turning our attention to the recommendations topic, it would typically be fueled by data from an analytical store. This data serves a dual purpose: it underpins historical analysis and facilitates periodic model training. In contrast with the show position topic, recommendations are a lower-volume affair; the frequency of new recommendations certainly won’t match the every-30-seconds update rate seen with show positions. Given these dynamics, user_id would again serve as the key for the recommendations topic, albeit with a decidedly smaller number of partitions than those allocated for the show position topic, reflecting its lower data volume.

Topic: Show Position

  • This topic can be produced by multiple sources.
  • It should be highly distributed with more than 30 partitions for high-volume data.
  • If a key has to be chosen, “user_id” is the natural selection.

Topic: Recommendations

  • The Kafka Streams recommendation system might source its data from an analytical store for historical learning purposes.
  • It’s likely to be a topic with lower data volume.
  • “user_id” would be the preferred key if one needed to be chosen.
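To make this summary concrete, here is a sketch that creates both topics with the Kafka AdminClient; the partition counts and replication factor are illustrative assumptions pending real measurements:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // High-volume topic: start at 30 partitions, then validate by measuring.
            NewTopic showPosition = new NewTopic("show_position", 30, (short) 3);
            // Lower-volume topic: far fewer partitions needed.
            NewTopic recommendations = new NewTopic("recommendations", 6, (short) 3);

            admin.createTopics(List.of(showPosition, recommendations)).all().get();
        }
    }
}
```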

Thank you for reading until the end.