
Data Science in the Entertainment Industry.

10 min read · Sep 27, 2022

A comprehensive guide with intuitive, detailed notebooks for understanding data-driven solutions in the entertainment sector. I recommend reading through the business problem and understanding it before skipping to the solutions below 😏

AI robots directing a movie — Can Movies in the future be automated?

Creating blockbuster hits or best-selling video games is not something people can do in a day. A huge amount of resources goes into producing them, whether in the form of millions of dollars or years of work. There is never a lack of options when it comes to watching a show or picking a video game to play, but the majority of them fail to reach their full potential due to various factors. Some of these key factors are:

Failing to understand the target audience: Products often fail to meet customer needs, which leaves the audience dissatisfied.
Poor resource management: Filmmakers often create redundant reboots in an attempt to hold on to their current audience, which fails in the long run. This creates inefficiencies that compromise the stability and success of the organizational workflow.
Lack of exposure: Poor marketing strategies are a major cause of missing out on the untapped global audience.

As Data Scientists, we possess the ability to create intelligence. Terabytes and petabytes of data collected over the years can now fuel Machine Learning and Deep Learning approaches, producing robust data-driven solutions to these major issues and enabling filmmakers and production houses to optimize their products, which in turn generates higher ROI.

Let's look at what these Machine Learning and Deep Learning approaches can be...

1. Predictive Analysis and customer segmentation: Before allocating resources and investing in a product, predictive analysis enables organizations to study their audience and build products accordingly, resulting in reduced costs and better resource management. Customer segmentation is the process of grouping similar customers based on their behavior and trends. Some major benefits of predictive analysis and customer segmentation are:

Low-risk modifications → Making modifications to attract the uninterested groups without losing the existing audience.
Generating customer lifetime value → Allows organizations to track their relationship with their clients so that they do not lose important existing customers.
Improved risk management → Reacting to problems in real time, before they become threatening.
Optimized Marketing → Studying audience behavior and understanding trends in order to create efficient and optimized marketing campaigns.
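
To make the idea of grouping similar customers concrete, here is a minimal sketch of one common segmentation technique, k-means clustering. The article does not name a specific algorithm, so k-means is an assumption here, and the customer features and values are hypothetical.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]
    return labels, centroids

# Hypothetical customers: (hours watched per week, titles purchased per month)
customers = [(1, 0), (2, 1), (1.5, 0), (10, 4), (12, 5), (11, 6)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)   # one segment id per customer
print(centers)  # the "typical" customer of each segment
```

Each segment's centroid can then be read as a profile (for example, casual viewers versus heavy viewers) and targeted separately.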

2. Time Series Analysis and Forecasting: Forecasting is an amazing tool that allows us to study the past and predict the future. Over the last decade, forecasting has played a crucial part in helping businesses set reasonable and measurable goals. Yes, it’s not magic, and it is impossible to predict the future with 100% accuracy, but what forecasting does is give us a temporary foresight superpower 😜. Let's go through some points and understand this superpower…

More effective production scheduling → Forecasting gives production houses a leg-up on planning and production cycles and enables them to operate with more agility, transparency, and flexibility as production environments or schemes change.
Optimized transport logistics and Inventory Management → Changing film locations and transporting tons of equipment is a major part of the investment bill. Forecasting allows for systematic transportation strategies by identifying areas where efficiency can be increased and redundancies eliminated, while keeping the supply situation clear.
Efficient Resource Management → An accurate sales forecast allows companies to efficiently allocate resources for future revenue growth and manage their cash flow.

Time Series allows us to identify patterns, trends, and seasonality in data

3. Sentiment Analysis: Sentiment analysis is a Natural Language Processing (NLP) method for analyzing the sentiment expressed in text. By applying this method to movie reviews and discussions available on the internet, it is possible to gain an overview of wider public opinion and understand the audience's needs and wants more accurately. I have given a detailed explanation of the use case in my Jupyter notebook.

4. Recommendation Systems: It's like when you think the FBI is stalking you because shampoo ads come up right after you search for shampoo on Google 😆. Just like other sectors that rely heavily on recommendation systems to promote products similar to what the customer likes, the entertainment industry can use them to optimize its marketing strategies. Recommendation systems and association rule-based models open up the pathway for upselling and cross-selling.
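
As a rough illustration of what an association rule measures, the sketch below computes the two standard quantities, support and confidence, for a rule of the form "people who watched A also watched B". The watch histories and show names are hypothetical and are not taken from any dataset in the repository.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical watch histories, one set of titles per viewer.
histories = [
    {"Show A", "Show B", "Show C"},
    {"Show A", "Show B"},
    {"Show A", "Show C"},
    {"Show B", "Show D"},
]

print(support(histories, {"Show A", "Show B"}))       # how common the pair is
print(confidence(histories, {"Show A"}, {"Show B"}))  # strength of the rule A -> B
```

A rule with high confidence and reasonable support is a candidate for cross-selling: viewers of the first title are shown the second.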

5. Applications of GANs: This one’s my favorite. A Generative Adversarial Network, or GAN, is a deep learning model that learns from a set of images to generate new images that are similar to that dataset, yet individually different. Explaining how this works would be another Medium post in itself, so let's go through the applications and see how GANs are benefiting the entertainment sector…

Generating animation models → GANs can be used to automatically generate the 3D models required in video games, animated movies, or cartoons by analyzing 2D photos in a short period of time, which saves animators significant time that they can spend on other important tasks.
Editing and Translating Photographs → GANs take photo editing beyond the usual enhancements. They can be used to reconstruct images of faces or completely remove elements from an image. Beyond editing, the more attractive features are text-to-image translation, semantic-image-to-photo translation, and image-to-image translation. A real-life example would be animation and gaming studios using StyleGANs to generate high-quality animated facial features, saving them a lot of time and effort.

These are all facial features that are created using StyleGANs

I have demonstrated how we can apply these Machine Learning and Deep Learning methods in my GitHub Repository.

What Does the Repository Contain…

To keep this article short, I am not going to dive deep into the terminologies and explain them. All the notebooks are very intuitive and easy to understand. Most of them have detailed information with comments to explain the code, so please check out the Repo.

1. Exploratory Data Analysis on Anime Dataset

Exploratory Data Analysis is a crucial part of addressing every problem. In this notebook, I use the MyAnimeList Database, which contains data from 320,000 users and 16,000 animes at myanimelist.net, which is like IMDB for otakus.

The Objective of this notebook: Initially, to form intuitive questions such as: What factors are driving people to watch this show? Why did they leave this particular one on hold? Are seasons affecting the viewership, or is it the format length? We wrangle this data and visualize the distributions to address these questions, giving us a better understanding of the audience.

2. Anime Recommendation Systems

Using the same anime database, we apply Collaborative Filtering to analyze similar animes and similar users by studying their ratings and watch history to recommend similar animes to random users.

The Objective of this notebook: To promote shows/movies by recommending them to people with similar interests. We approach this issue by building a Neural Collaborative Filtering Network that takes user inputs and anime inputs as vectors in two different embedding layers.

The architecture :

Neural Collaborative Filtering Framework

We divide the solution into 3 separate tasks. Task 1 finds similar animes, using the cosine distance to measure the similarity between them. Similarly, in task 2, we find similar users. Finally, in task 3, we study user preferences and, based on the results of tasks 1 and 2, recommend animes that the user would most likely prefer to watch. I have also created a recommendation system based on predicted ratings; in this case, we consider the dot product between the user and anime vectors.
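
The two scoring ideas in these tasks can be sketched in a few lines. This is not the notebook's code: the two-dimensional vectors below are hypothetical stand-ins for the learned embeddings, and the helper names are made up for illustration.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """1.0 means same direction, 0.0 means unrelated (orthogonal) vectors."""
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))

def most_similar(name, embeddings, top_n=2):
    """Rank every other item by cosine similarity to `name`."""
    target = embeddings[name]
    others = [other for other in embeddings if other != name]
    ranked = sorted(others, key=lambda o: cosine_similarity(target, embeddings[o]), reverse=True)
    return ranked[:top_n]

# Hypothetical 2-d anime embeddings (a trained model would learn many more dimensions).
anime_vectors = {
    "Anime A": [1.0, 0.0],
    "Anime B": [0.9, 0.1],
    "Anime C": [0.0, 1.0],
}
user_vector = [0.8, 0.2]  # hypothetical user embedding

print(most_similar("Anime A", anime_vectors))  # tasks 1 and 2: nearest neighbours
# Predicted-rating variant: score each anime by its dot product with the user vector.
scores = {name: dot(user_vector, vec) for name, vec in anime_vectors.items()}
print(max(scores, key=scores.get))
```

Cosine similarity ignores vector length and compares direction only, which is why it suits "find similar items", while the raw dot product also rewards magnitude and is the usual choice for a predicted score.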

The notebook also contains the model and the model weights in a ‘.h5’ format that you can download and import into your code.

3. Time Series Analysis and Forecasting Revenue Growth

As discussed above, forecasting is a crucial tool for organizations. I have used the Cinema Tickets dataset, which contains 15,000 rows of ticket-sales data along with some key features affecting those sales.

The Objective of this notebook: To predict the revenue growth of the cinema. We approach this task with a number of time series analysis tools: simple moving averages (also known as the ‘rolling mean’); the ARIMA and SARIMAX models, which use autocorrelations and moving averages over residual errors in the data to forecast future sales values; and Machine Learning regression algorithms such as XGBoost, Random Forest, and Linear Regression to forecast the total sales. Finally, we use the R² metric (which tells us how well the model fits the data) to evaluate all the models.
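
Two of the simpler pieces mentioned here, the rolling mean and the R² metric, are easy to write out by hand. The sketch below uses plain Python and made-up sales figures; the notebook itself most likely relies on library implementations instead.

```python
def rolling_mean(values, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical daily ticket sales.
sales = [10, 12, 14, 16, 18]
print(rolling_mean(sales, window=3))

# A perfect forecast scores 1.0; always predicting the mean scores 0.0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

R² can also go negative when a model does worse than simply predicting the mean, which is worth keeping in mind when comparing forecasters.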

Visualizing the Forecasted Sales using ARIMA

4. Sentiment Analysis on Movie Reviews

We demonstrate sentiment analysis on the IMDB Dataset of 50K Movie Reviews, which contains, as you guessed, 50,000 movie reviews. This is a case of supervised learning, where the reviews are already labeled as either ‘positive’ or ‘negative’.

The Objective of this notebook: To analyze the sentiment behind the reviews and the keywords that contribute to the labels. We approach this task by using different NLP preparation techniques to wrangle the textual data and convert it into a matrix of vectors, which we then use to train a Logistic Regression model to predict movie reviews as either ‘positive’ or ‘negative’. By doing so, not only can we understand our audience better, but we can also avoid negative marketing by suppressing & addressing the noise.
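
The pipeline described above (text to vectors, then Logistic Regression) can be sketched end to end in plain Python. This is a toy version under stated assumptions: the four reviews are invented, the vectorizer is a bare word count, and the training loop is simple stochastic gradient descent rather than whatever solver the notebook uses.

```python
from math import exp

def vectorize(text, vocabulary):
    """Bag of words: count how often each vocabulary term appears in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def sigmoid(z):
    return 1 / (1 + exp(-z))

def train(X, y, epochs=200, lr=0.5):
    """Logistic regression fitted with stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            pred = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Hypothetical labeled reviews (1 = positive, 0 = negative).
reviews = [
    ("great movie loved it", 1),
    ("terrible plot boring acting", 0),
    ("loved the acting great fun", 1),
    ("boring and terrible", 0),
]
vocabulary = sorted({word for text, _ in reviews for word in text.lower().split()})
X = [vectorize(text, vocabulary) for text, _ in reviews]
y = [label for _, label in reviews]
weights, bias = train(X, y)

def predict(text):
    features = vectorize(text, vocabulary)
    score = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
    return "positive" if score >= 0.5 else "negative"

print(predict("great fun"))
print(predict("boring terrible"))
```

The learned weights double as a rough keyword ranking: terms with large positive weights push a review toward ‘positive’, and terms with large negative weights push it toward ‘negative’.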

Confusion matrix indicating a 90% accurate model

5. CycleGAN | Creating Monet-Esque Images

There are many applications of GANs that I have been studying and would like to test out, but compared to classical Machine Learning algorithms, Deep Learning models take up a lot of memory and computational power. I have managed to show an application of GANs on the ‘I’m Something of a Painter Myself’ dataset on Kaggle. It is part of a Kaggle competition and contains 300 Monet-style paintings and test images in JPEG and TFRecord formats. I would recommend using the TFRecord format, as it is more storage efficient and a great way to experiment with something new.

The Objective of this notebook: To generate new Monet-Esque images similar to the photos in the test photos directory. In this Notebook, we use Data Augmentation techniques to increase the diversity of the training set by applying random (but realistic) transformations such as image rotation, etc.
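
To show what "random (but realistic) transformations" means in practice, here is a minimal sketch of flip and rotation augmentation on a tiny image stored as nested lists. It is an illustration only: the notebook presumably uses TensorFlow image operations, and the image, function names, and seed below are hypothetical.

```python
import random

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image, rng):
    """Apply a random flip followed by a random number of quarter turns."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    for _ in range(rng.randrange(4)):
        image = rotate_90(image)
    return image

# Hypothetical 2x3 grayscale "image" as nested lists of pixel values.
image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(42)  # fixed seed so the run is repeatable
augmented = [augment(image, rng) for _ in range(4)]
for variant in augmented:
    print(variant)
```

Every variant contains exactly the same pixels as the original, just rearranged, so the model sees more varied inputs without any new paintings being collected.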

Model Architecture:

Generative Adversarial Networks are made up of two different neural networks, known as the ‘Discriminator’ and the ‘Generator’.

Discriminator → It is the part of the model that classifies whether an image is real or fake. In an ideal GAN, the Discriminator would output 0.5, meaning it can no longer tell whether the image is real or fake.
Generator → The Generator’s goal is to trick the Discriminator by mapping a random latent distribution to a distribution that is as similar as possible to the original data distribution.
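
For readers who want the math behind that 0.5, this is the standard GAN objective, written with $p_{\text{data}}$ for the real data distribution, $p_z$ for the latent noise distribution, and $p_g$ for the distribution the Generator produces. It is the general textbook formulation, not the notebook's exact loss; a CycleGAN adds further terms, such as a cycle-consistency loss, that are not shown here.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed Generator, the best possible Discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% so when the Generator matches the data perfectly,
p_g = p_{\text{data}} \;\Longrightarrow\; D^{*}(x) = \tfrac{1}{2}
```

In other words, an output of 0.5 is what the Discriminator is left with once real and generated images have become statistically indistinguishable.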

To be fair, GANs are such a beautiful representation of how you can pursue your passion for art even though you suck at drawing.

Results :

The Monet-esque images generated are not part of the original distribution

Note: The notebook is present in the GitHub Repo but I would recommend accessing my Kaggle Notebook as you can utilize Integrated TPU accelerators that allow faster training for TensorFlow models.

Conclusion

As the AI and ML field is still growing, I can hardly imagine what the future holds for the entertainment sector. Maybe we can fantasize about a real-life virtual multiplayer online role-playing game like Sword Art Online, or probably not.. 🤷‍♂️ Even if we don't get one of those, the potential for growth is sky high. Imagine when production houses are able to save ample amounts of time and resources by automating much of the workflow that goes into making films and games; we wouldn’t have to wait a week for an episode.

I also believe it is mandatory for organizations (not just in the entertainment sector) to integrate Data Science approaches into their operational systems in order to keep up with the fierce competition that keeps growing every day.