Data Engineering — Netflix movie suggestions case — stage 2 — Defining the OLAP system

Joaquin Gonzalez Budiño
3 min readJun 7, 2024

--

Now that we’ve established our Online Transaction Processing (OLTP) in the previous stage, system for storing structured data, it’s time to implement an Online Analytical Processing (OLAP) system.

The OLAP system will store detailed information and enable us to perform complex queries and generate comprehensive reports. This will enhance our ability to understand user preferences and refine the recommendation system.

The primary focus of our analysis will be how engagement a movie is. This can be inferred by tracking the number of times a movie has been watched, the movie details and user interactions. These perspectives will serve as dimensions linked to our primary fact. Adding additional dimensions such as performer, interaction, and score enriches our analysis.

  • By including the movie’s performers, we can assess how engagement a movie is by the people who worked on it. In fact, people who like some actors or directors from one movie are used to watch others movie where the same actors and directores are somehow participating.
  • Interaction metrics (e.g., click count, play count) can help to identify patterns in user behavior. We can suppose that if a user start watching a movie without finishing it, it is maybe because the user did not like it.
  • Movie scores from multiple sources (e.g., Netflix, IMDb) allow us to quick view users feedback and opinions with engagement metrics.

These additional dimensions will lead the algorithm to better recommendations. So, our database will be designed following a star schema to facilitate efficient analysis and reporting:

This structure allows for quick data exploration at various levels of detail, from high-level summaries to detailed breakdowns, leveraging OLAP systems’ capabilities for data aggregation and summarization across different dimensions. I.E: summarize the percentage of movies started but not finished and correlate this with user ratings and interaction counts, or determine which streaming platform has the highest engagement for different movie genres. Perhaps Netflix has the highest engagement for documentaries, while Amazon Prime is widely chosed for action movies.

Let’s move now to the next chapter where we are going to define a repository on github to store all the scripts seen up to now and save the further ETL scripts as well.

--

--