Blending Data Science with Bollywood

Using the emotional arcs of movies to anticipate the most profitable and viewer-centric blockbusters

Palak Jain
The Research Nest
5 min read · May 19, 2020

With advancing technology and viewers' evergreen interest in cinema, the film industry has become fiercely competitive, with studios racing to release hits that stay in people's hearts for a lifetime. Fans imitate their favorite characters and quote famous dialogues from the mega hits every now and then.

Given the scale of the industry, the pressure is on filmmakers to deliver a product that precisely matches the audience's preferences. What could be better than a model that takes the key factors into account, including IMDb rating, award nominations, box-office revenue, and the number of reviews from viewers and critics, and predicts which category of motion pictures succeeds the most and remains the most talked about?

The Indian film industry is one of the largest producers of films in the world, with a monetary worth of $2.5 billion in the fiscal year 2019. Apart from being a billion-dollar market, it is a prominent storytelling medium. The stories told in motion pictures lead viewers to connect emotionally with the characters and to relate them to their own life experiences. The art of scriptwriting therefore determines a lot about the success of a movie. The Computational Story Laboratory at the University of Vermont devised a methodology that uses NLP to describe the plots of novels with six emotional arcs that can fit all story types. They are:

  1. Rags to riches: Emotional arc depicting a continuous emotional rise.
  2. Riches to rags: Emotional arc depicting a continuous emotional fall.
  3. Man in a hole: Emotional arc depicting an emotional fall that is followed by an emotional rise.
  4. Icarus: Emotional arc depicting an emotional rise that is followed by an emotional fall.
  5. Cinderella: Emotional arc depicting an emotional rise-fall-rise pattern.
  6. Oedipus: Emotional arc depicting an emotional fall-rise-fall pattern.
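To make the six shapes concrete, here is a minimal Python sketch that draws each arc as an idealised cosine curve and labels a sentiment arc by the template it correlates with most strongly. This is only an illustration: the templates, the name `classify_arc`, and the 100-point resolution are our own assumptions, and template matching is a simplification, not the procedure used by the Computational Story Laboratory.

```python
import math

N_POINTS = 100  # illustrative resolution; an assumption, not a value from the study


def _curve(fn):
    """Sample fn at N_POINTS evenly spaced positions from 0.0 to 1.0."""
    return [fn(i / (N_POINTS - 1)) for i in range(N_POINTS)]


# Hypothetical idealised templates: cosine curves with the right number of turns.
ARC_TEMPLATES = {
    "Rags to riches": _curve(lambda x: -math.cos(math.pi * x)),    # rise
    "Riches to rags": _curve(lambda x: math.cos(math.pi * x)),     # fall
    "Man in a hole": _curve(lambda x: math.cos(2 * math.pi * x)),  # fall, rise
    "Icarus": _curve(lambda x: -math.cos(2 * math.pi * x)),        # rise, fall
    "Cinderella": _curve(lambda x: -math.cos(3 * math.pi * x)),    # rise, fall, rise
    "Oedipus": _curve(lambda x: math.cos(3 * math.pi * x)),        # fall, rise, fall
}


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def classify_arc(arc):
    """Return the name of the template that correlates best with the arc."""
    if len(arc) != N_POINTS:
        raise ValueError(f"expected {N_POINTS} points, got {len(arc)}")
    return max(ARC_TEMPLATES, key=lambda name: pearson(arc, ARC_TEMPLATES[name]))
```

Correlation ignores the overall level and scale of the arc, so a mostly gloomy film and a mostly cheerful one can still share the same shape.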

This system of emotional arcs can also be used to cluster movies into the six categories above. To apply this model to Bollywood, we first need a dataset. We collect the subtitle files of various films from different sources and remove duplicates, so that each film can be assigned to one of the six clusters. Once we have the subtitle files, the next step is to collect each film's production budget and domestic gross revenue, expressed in the same currency for every row of data.

Finally, we use IMDb to gather the following information:

  • The release date of the movie
  • Average IMDb rating from 1 to 10 (very bad to excellent)
  • Reviews from critics and viewers
  • Various genre information as stated on the website
  • Awards information
  • Movie length in minutes
  • The director of the movie

After that, cross-check the information from the various sources and build a filtered dataset from their intersection. A movie enters the final dataset only if every item listed above is available for it.
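The merging and filtering step could be sketched as follows in plain Python. The field names, `merge_sources`, and `complete_records` are hypothetical and chosen for illustration. Matching films by normalised title alone is also an assumption: remakes can share a title, so a real pipeline would probably match on title plus release year.

```python
# Hypothetical field names covering the subtitle, finance, and IMDb data.
REQUIRED_FIELDS = (
    "subtitle_path", "budget", "domestic_gross", "release_date",
    "imdb_rating", "review_count", "genres", "awards",
    "runtime_min", "director",
)


def normalise_title(title):
    """Lower-case the title and collapse whitespace so duplicates line up."""
    return " ".join(title.lower().split())


def merge_sources(*sources):
    """Combine records about the same film from several lists of dicts.

    Duplicate entries collapse onto one record; for each field the first
    non-empty value seen is kept.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = normalise_title(record["title"])
            entry = merged.setdefault(key, {"title": record["title"]})
            for field, value in record.items():
                if field != "title" and value not in (None, ""):
                    entry.setdefault(field, value)
    return merged


def complete_records(merged):
    """Keep only films for which every required field is present."""
    return [
        record for record in merged.values()
        if all(field in record for field in REQUIRED_FIELDS)
    ]
```

Calling `complete_records(merge_sources(imdb_rows, finance_rows, subtitle_rows))` would then return the intersection described above, with one row per film.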

Once we have our final dataset, the next step is evaluation. We run sentiment analysis on each subtitle file using Python's TextBlob library to obtain the film's emotional arc and, from it, its arc category. TextBlob finds all the words to which it can assign polarity and subjectivity and averages them together. Polarity describes how negative (-1) or positive (+1) a word is, while subjectivity indicates whether it is a fact (0) or a subjective opinion (+1).
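As a rough sketch of this step, the code below strips a SubRip (.srt) subtitle file down to its spoken lines and scores each line with TextBlob's `sentiment` property. The helper names and the SubRip layout are our assumptions. TextBlob's default analyzer is built for English, so this also assumes English-language subtitles. The scorer is passed in as a parameter so the parsing can be tried without TextBlob installed.

```python
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")  # cue timing line
TAG = re.compile(r"<[^>]+>")                                  # <i>, <b>, <font ...>


def subtitle_lines(srt_text):
    """Return the spoken lines of SubRip text, in order.

    Cue numbers, timing lines, blank lines, and formatting tags are dropped.
    Caveat: a spoken line made only of digits is dropped as well.
    """
    lines = []
    for raw in srt_text.splitlines():
        line = TAG.sub("", raw).strip()
        if not line or line.isdigit() or TIMESTAMP.search(line):
            continue
        lines.append(line)
    return lines


def textblob_scores(text):
    """Return (polarity, subjectivity) for a piece of text using TextBlob."""
    from textblob import TextBlob  # third-party: pip install textblob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def polarity_series(srt_text, scorer=textblob_scores):
    """Polarity of each spoken line, from the first line to the last."""
    return [scorer(line)[0] for line in subtitle_lines(srt_text)]
```

Scoring line by line keeps the order of the dialogue, which is what turns a set of sentiment values into a trajectory over the course of the film.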

The resulting curve for each movie can be divided into a hundred equal parts so that every movie's sentiment arc runs from 0% (the start of the story) to 100% (the end of the story), which makes films of different lengths comparable.
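One simple way to do this, assuming the sentiment curve is a list of per-line polarity values, is to average the values that fall into each of the hundred segments. The function name `resample_arc` is ours, and averaging within segments is one choice among several (interpolation or a sliding window would also work).

```python
def resample_arc(values, n_parts=100):
    """Average `values` into n_parts equal-width segments of story time.

    If the film has fewer scored lines than segments, a segment with no
    lines of its own reuses the line at its starting position.
    """
    if not values:
        raise ValueError("cannot resample an empty sentiment series")
    total = len(values)
    arc = []
    for part in range(n_parts):
        start = part * total // n_parts
        end = (part + 1) * total // n_parts
        if end > start:
            chunk = values[start:end]
            arc.append(sum(chunk) / len(chunk))
        else:
            arc.append(values[start])
    return arc
```

After this step a two-hour epic and a ninety-minute comedy are both described by exactly one hundred numbers, so their arcs can be compared point by point.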

Next, to estimate the success of a movie, we compare each emotional-arc cluster against the factors collected earlier, starting with the data obtained from IMDb. We average the domestic gross revenue of the movies in each cluster and see which cluster earns the highest average. In the same way, we can compute the average movie length, the average number of award nominations, the number of user and critic reviews, and the average rating to see which cluster audiences like the most.
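The per-cluster comparison boils down to a group-by and a mean. Here is a small sketch in plain Python, assuming each movie is a dict with a `cluster` label and numeric fields such as `domestic_gross`. These field and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_cluster(movies, metric):
    """Mean of `metric` for each emotional-arc cluster."""
    groups = defaultdict(list)
    for movie in movies:
        groups[movie["cluster"]].append(movie[metric])
    return {cluster: mean(values) for cluster, values in groups.items()}


def top_cluster(movies, metric):
    """Cluster with the highest average value of `metric`."""
    averages = average_by_cluster(movies, metric)
    return max(averages, key=averages.get)
```

The same two functions work for any numeric column, for example `average_by_cluster(movies, "runtime_min")` or `top_cluster(movies, "imdb_rating")`. Averages are sensitive to a single runaway hit, so it may be worth checking the median as well.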

Producing a motion picture is expensive, so it is important to know how much initial investment is needed and to have a fair estimate of the revenue. Genre is another factor that influences a movie's success. Possible genres include comedy, thriller, romance, action, sci-fi, horror, drama, animation, biography, and adventure. Each motion picture on IMDb has genres associated with it, and a movie can belong to more than one genre. The combination of genre and emotional-arc cluster can be analyzed with a heat map to see which genre, paired with which cluster, draws the most interest from audiences.
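The numbers behind such a heat map can be prepared as a genre-by-cluster grid. The sketch below assumes each movie dict carries a `genres` list, a `cluster` label, and a numeric metric. All names are hypothetical. A movie with several genres contributes to every one of its genre rows.

```python
from collections import defaultdict


def genre_cluster_matrix(movies, metric="domestic_gross"):
    """Average `metric` for every (genre, cluster) pair that occurs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for movie in movies:
        for genre in movie["genres"]:
            key = (genre, movie["cluster"])
            sums[key] += movie[metric]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def as_grid(matrix, genres, clusters, missing=None):
    """Lay the matrix out as rows (genres) by columns (clusters)."""
    return [
        [matrix.get((genre, cluster), missing) for cluster in clusters]
        for genre in genres
    ]
```

The resulting grid can then be handed to a plotting library's heat map function, such as seaborn's `heatmap`, with the genres as row labels and the clusters as column labels.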

Recent advances in data science have given us a broader outlook on human emotions. This knowledge enables us to predict what viewers are keen to watch on the big screen. Using NLP, we can explore the extent to which emotions drive consumer interest in the entertainment industry, and thus examine which kinds of stories achieve the greatest success.

Interestingly, by carefully choosing a blend of genre and production budget, any emotional trajectory may produce a film that is financially successful.

Palak Jain
Student, Indian Institute of Information Technology and Management, Gwalior | Machine Learning Enthusiast | LinkedIn: https://www.linkedin.com/in/palakjain2512