Our Algorithm

Filmify Team
Filmify
Apr 18, 2021

We calculate Diversity Scores to assess the gender and racial diversity of films, based on the directors, producers, writers, and top five billed actors.
Let’s look at the cast, crew, and algorithmic breakdown of The Sun is Also a Star (2019).

We analyze crew members Ry Russo-Young, Elysa Koplovitz Dutton, Leslie Morgenstein, and Tracy Oliver. We also analyze actors Yara Shahidi, Charles Melton, Hill Harper, Gbenga Akinnagbe, and Jake Choi. Russo-Young, Dutton, Oliver, and Shahidi count as diverse for gender. Oliver and all five actors count as diverse for race and ethnicity.
We factor in cast and crew information in our diversity algorithm.
The Diversity Score is calculated by adding the Number of Diverse in Film and 0.25 times the Number of Intersectional in Film. That value is divided by the Number Known in the film. It is then multiplied by 10 and divided by the Percentage of Diverse in the US population.
The general Diversity Score algorithm
The Gender Score for “The Sun is Also a Star” is calculated by adding the Number of Diverse in Film, 4, and 0.25 times the Number of Intersectional in Film, 2. That value is divided by the Number Known in the film, which is 9. It is then multiplied by 10 and divided by the Percentage of Diverse Gender in the US, 0.51. So the Gender Score is 9.8 out of 10. Similarly, the Race/Ethnicity Score is calculated as 6 plus (2 times 0.25), divided by 9, multiplied by 10, and divided by 0.401. The result, about 18.0, is greater than 10, so the score is 10.
Our calculated Gender Score and Race/Ethnicity Score for the film “The Sun is Also a Star”
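
The arithmetic for this film can be checked with a short Python sketch. The function name `diversity_score` and its parameter names are ours for illustration, not Filmify's actual code, and capping the result at 10 is an assumption drawn from the stated 0-to-10 range.

```python
def diversity_score(num_diverse, num_intersectional, num_known, us_share):
    """Return a score on a 0-10 scale (illustrative names; cap at 10 is assumed)."""
    if num_known == 0:
        raise ValueError("no credits with known data to score")
    # Each intersectional person adds a 0.25 bonus on top of being counted as diverse.
    weighted = num_diverse + 0.25 * num_intersectional
    raw = weighted / num_known * 10 / us_share
    return min(raw, 10.0)


# Counts and population shares taken from the worked example in the text.
gender_score = diversity_score(4, 2, 9, 0.51)
race_score = diversity_score(6, 2, 9, 0.401)

print(round(gender_score, 1))  # 9.8
print(race_score)              # 10.0 (raw value is about 18.0 before the cap)
```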

Diversity Score Algorithm

Our scoring algorithm considers each film’s producer(s), director(s), writer(s), and top 5 billed cast members and outputs a Gender Score and a Race/Ethnicity Score, each ranging from 0 to 10.

A score of 10 means that the film’s percentage of known diverse people matches or exceeds the US population’s percentage of diverse people.

  • Each intersectional person (a woman or non-binary person of color) adds a bonus of 0.25. Intersectional people are significantly underrepresented compared to all other groups.
  • People with multiple positions (e.g. producer and director) are counted for each credit. Holding multiple positions affords them more power and control over the production.
  • Every position is weighted equally.
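
As a rough sketch of how the three rules above might turn into counts: the `Credit` record, its field names, and the sample people below are hypothetical, not Filmify's data model. The point is only that each credit is tallied once, so a person holding two positions contributes twice, and a credit that is diverse on both axes also counts as intersectional.

```python
from dataclasses import dataclass


@dataclass
class Credit:
    person: str
    role: str
    gender_diverse: bool  # woman or non-binary person
    race_diverse: bool    # person of color


# Hypothetical credits: "Person A" both produces and directs, so appears twice.
credits = [
    Credit("Person A", "producer", True, True),
    Credit("Person A", "director", True, True),
    Credit("Person B", "writer", False, True),
    Credit("Person C", "actor", True, False),
]


def tally(credits):
    """Count credits (not unique people); every position carries equal weight."""
    return {
        "known": len(credits),
        "gender_diverse": sum(c.gender_diverse for c in credits),
        "race_diverse": sum(c.race_diverse for c in credits),
        "intersectional": sum(c.gender_diverse and c.race_diverse for c in credits),
    }


counts = tally(credits)
print(counts)
```

Here "Person A" accounts for two of the four known credits and two intersectional bonuses, which is how holding multiple positions raises a person's weight in the score.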

Gender Score

The Gender Score is calculated by adding the Number of Women, the Number of Non-Binary People, and 0.25 times the Number of Intersectional in Film. That value is divided by the Number of People Known in the film. It is then multiplied by 10 and divided by the Percentage of Women and Non-Binary People in the US population.
The Gender Score algorithm

Race/Ethnicity Score

  • 40.1% of the US population describe themselves as Asian, Black, Latinx, Middle Eastern/North African, Mixed/Multiple Race, or Native (global indigenous peoples) (US Census 2020). Hollywood films underrepresent these racial and ethnic groups (UCLA Hollywood Diversity Report 2020).
  • We do not count people whose race or ethnicity we do not know towards the Race/Ethnicity Score.
The Race/Ethnicity Score is calculated by adding the Number of Non-White and 0.25 times the Number of Intersectional in Film. That value is divided by Number of People Known in the film. It is then multiplied by 10 and divided by the Percentage of Non-White People in US population.
The Race/Ethnicity Score algorithm
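
One way to handle unknown data, as a hedged sketch: represent race/ethnicity as `True`, `False`, or `None` (unknown) and leave `None` entries out of both the numerator and the denominator. The function name, the tuple layout, and the sample values are assumptions for illustration. The text does not say how the intersectional bonus is treated when a person's gender is unknown, so this sketch awards it only when both fields are known. The cap at 10 is likewise inferred from the 0-to-10 range.

```python
from typing import Optional

US_NON_WHITE_SHARE = 0.401  # population share cited in the text above


def race_ethnicity_score(
    people: list[tuple[Optional[bool], Optional[bool]]]
) -> Optional[float]:
    """Each tuple is (is_non_white, is_woman_or_non_binary); None means unknown."""
    # People whose race/ethnicity is unknown are not counted at all.
    known = [(race, gender) for race, gender in people if race is not None]
    if not known:
        return None  # nothing to score
    non_white = sum(1 for race, _ in known if race)
    intersectional = sum(1 for race, gender in known if race and gender is True)
    raw = (non_white + 0.25 * intersectional) / len(known) * 10 / US_NON_WHITE_SHARE
    return min(raw, 10.0)


# Hypothetical film: five credits, one with unknown race/ethnicity.
sample = [(True, True), (False, False), (None, True), (False, True), (False, False)]
score = race_ethnicity_score(sample)
print(round(score, 1))  # 7.8
```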

Philosophy

While we chose to focus on gender and race, we understand that diversity comes in many forms — we could have accounted for LGBTQ+, age, ability, religion, body type, and more.

Gender and race have the most information readily available, making them easiest to focus on for our project. There is extensive research linking the lack of representation in these categories to stereotyping of diverse people (Sociology Compass 2015).

Using US Census data, we compare a film’s racial and gender makeup to that of the US population to emphasize the lack of diversity in Hollywood. America is known for having a diverse population, and its film industry should reflect that.

In terms of who we consider, we prioritize the producers, directors, and writers because their positions are the most influential in a film’s production. They are also the three most mentioned positions in other algorithms/rating systems (e.g., GradeMyMovie, Reframe Stamp, F-Rating), created with filmmaker/industry input.

We include the top five cast members as well. Although our primary focus is behind-the-camera diversity, on-screen diversity is essential too! We want to ensure that diverse talent is recognized and celebrated in all aspects.

Data Collection

Our platform uses film, crew, and cast data from TMDb.

Our team manually inputs missing gender and race data through online research. We mark data as unknown if we do not find explicit information from IMDb or a valid news source. We do not assume a person’s gender or race/ethnicity based solely on their name or how they look.
