The 2021 movies finally restarting production after COVID-19

Kantida Nanon
Web Mining [IS688, Spring 2021]
Feb 17, 2021

WB has officially announced its lineup of 17 films for 2021, which will be released simultaneously in theaters and at home on HBO Max.

Have you heard? A lot of new and anticipated movies are coming out this year, and they will bring new film trends with them. This post asks three questions. What trends are common across these films? What are movie suggestion groups on Reddit discussing? And why does the answer matter, and to whom? Using data from Reddit, this analysis aims to illustrate the relationships between different groups on the platform and the subreddits that trend within the community.

Image from Filmdaily

On Reddit, the movies2021 hashtag (#movies2021) appears across several communities:

- the movie suggestions community (r/MovieSuggestions), with over 210k members
- the Disney+ community (r/DisneyPlus), with over 103k members
- other popular communities, including Megami Tensei (r/Megaten), Superman (r/superman), Marvel Studios Plus (r/MarvelStudiosPlus), and Justice League movies (r/justiceleague)

To answer these questions, I gathered data from the Reddit community. For this dataset, I collected the movies that Redditors mentioned with the movies2021 hashtag in related subreddits. The most-mentioned subreddits centered on the movie suggestion groups of each fan community. This matters for determining which subreddit communities have the strongest relationships, and the extracted data illustrates the relationships between these groups. This study focuses on the most popular subreddits in the movies2021 hashtag community, including r/MovieSuggestions, r/DisneyPlus, r/Megaten, r/superman, r/MarvelStudiosPlus, and r/justiceleague.

How does the connection work in this community?

Data collection with PRAW

The 4,957 newest posts with the movies2021 hashtag, as of February 13, were collected with the Python Reddit API Wrapper (PRAW). To examine the relationships, or network, in the movies2021 community, I first cleaned the data by removing irrelevant or unrelated records. I then formatted it and prepared separate node and edge data sets. Nodes are the Reddit accounts (Redditors) who participated in the movies2021 hashtag on Reddit by posting or mentioning it. Edges are the actions or connections between those accounts within the movies2021 hashtag.
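Below is a minimal sketch of the collection and cleaning step. It assumes PRAW is installed and that you have registered a Reddit app; the credential strings are placeholders. The `PostRecord` type and the `fetch_posts` and `clean` helpers are hypothetical names for illustration, not the author's code. Reddit listings are generally capped at around 1,000 items per query, so a sample of several thousand posts probably needs multiple queries; the post does not say how that was handled. In this sketch, "cleaning" only means dropping posts from deleted accounts and duplicate posts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PostRecord:
    """One collected post (hypothetical record type for this sketch)."""
    post_id: str
    author: Optional[str]  # None when the account has been deleted
    subreddit: str
    title: str


def fetch_posts(query: str = "movies2021", limit: int = 1000) -> list[PostRecord]:
    """Fetch the newest posts across Reddit that match `query`.

    Needs `pip install praw`; the credentials below are placeholders.
    """
    import praw  # imported here so clean() can be used without PRAW installed

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="movies2021-network-sketch by u/YOUR_USERNAME",
    )
    records = []
    for submission in reddit.subreddit("all").search(query, sort="new", limit=limit):
        # submission.author is None for deleted accounts
        author = submission.author.name if submission.author is not None else None
        records.append(
            PostRecord(submission.id, author, submission.subreddit.display_name, submission.title)
        )
    return records


def clean(records: Iterable[PostRecord]) -> list[PostRecord]:
    """Drop posts by deleted accounts and repeated post IDs, preserving order."""
    seen: set[str] = set()
    kept: list[PostRecord] = []
    for record in records:
        if record.author is None or record.post_id in seen:
            continue
        seen.add(record.post_id)
        kept.append(record)
    return kept
```

With credentials filled in, usage would be `posts = clean(fetch_posts())`.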

Data set in Gephi

After the data manipulation process, I imported the remaining 3,096 nodes and 4,955 edges from the spreadsheet files into Gephi. As mentioned, nodes are the Reddit accounts (in r/MovieSuggestions, r/DisneyPlus, r/Megaten, r/superman, r/MarvelStudiosPlus, r/justiceleague, etc.). Edges are the connections between Reddit accounts (posts, shares, and comments in #movies2021).

Importing the data set into Gephi
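Gephi's spreadsheet (CSV) import expects two tables:

- a nodes table with an `Id` column and an optional `Label` column
- an edges table with `Source` and `Target` columns and optional `Type` and `Weight` columns

The sketch below builds and writes both tables from (source, target) pairs. It assumes, hypothetically, that each pair links a Redditor to the account or subreddit they interacted with. Repeated pairs are merged into a single edge whose `Weight` is the count; Gephi's import dialog can also merge parallel edges for you. `gephi_tables` and `write_gephi_csvs` are illustrative names.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable


def gephi_tables(pairs: Iterable[tuple[str, str]]) -> tuple[list[list], list[list]]:
    """Build node and edge tables with the column headers Gephi's CSV import recognizes."""
    weights = Counter(pairs)  # repeated (source, target) pairs become one weighted edge
    nodes = sorted({node for pair in weights for node in pair})
    node_rows: list[list] = [["Id", "Label"]] + [[n, n] for n in nodes]
    edge_rows: list[list] = [["Source", "Target", "Type", "Weight"]] + [
        [src, dst, "Directed", w] for (src, dst), w in sorted(weights.items())
    ]
    return node_rows, edge_rows


def write_gephi_csvs(pairs: Iterable[tuple[str, str]], out_dir: Path) -> None:
    """Write nodes.csv and edges.csv into out_dir for Gephi's spreadsheet import."""
    node_rows, edge_rows = gephi_tables(pairs)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in (("nodes.csv", node_rows), ("edges.csv", edge_rows)):
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
```

Pre-merging keeps the edge file small. Importing every interaction as its own row and letting Gephi merge them should give the same weights.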

The picture below shows the initial relation graph for this community. Four dense clusters seem to appear, which suggests the most-mentioned subreddits or strongly connected groups in this community.

The initial relation graph

Yifan Hu Algorithm

Because the initial relation graph is somewhat hard to read, I applied the Yifan Hu layout to make it clearer. Now we can see the relationships between pairs of nodes in the picture below. Surprisingly, five dense clusters appear in this community, not the four I initially thought.

The graph with a Yifan Hu layout

Gephi also provides filters and partitions for visualizing this network by degree range, neighbor network, betweenness centrality, closeness centrality, modularity class, and other attributes. The pictures below show the network with the degree range filter, the harmonic closeness centrality partition, the modularity class partition, and the betweenness centrality partition, respectively.

The network with the degree range filter
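Gephi's Degree Range filter keeps the nodes whose degree falls within a chosen interval, along with the edges among them. The sketch below is a rough stand-alone equivalent. It treats the graph as undirected and computes degrees on the full graph before filtering. `degree_range_filter` is a hypothetical helper, and Gephi's own filter may differ in details such as how it handles in-degree and out-degree.

```python
from collections import defaultdict


def degree_range_filter(
    edges: list[tuple[str, str]], low: int, high: int
) -> tuple[set[str], list[tuple[str, str]]]:
    """Keep nodes whose degree is in [low, high] and the edges between kept nodes.

    Degrees are counted on the full, undirected graph before any filtering.
    """
    degree: dict[str, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept = {node for node, d in degree.items() if low <= d <= high}
    return kept, [(u, v) for u, v in edges if u in kept and v in kept]
```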

Surprisingly again, the pictures below show one more dense cluster in this community, bringing the total to six.

The network with the harmonic closeness centrality partition
The network with the modularity class partition
The network with the betweenness centrality partition
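The clusters in the modularity class partition come from Gephi's Modularity statistic, which searches for communities with the Louvain method. The sketch below does not search for communities. It only scores a given partition with Newman's modularity:

Q = sum over communities c of (L_c / m - (d_c / 2m)^2)

Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the summed degree of c's nodes. The sketch assumes an undirected, unweighted graph with a resolution of 1, so it will not necessarily reproduce Gephi's figure. `modularity` is a hypothetical helper name.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity Q of a given partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside: dict[int, int] = defaultdict(int)      # L_c: edges with both ends in c
    degree_sum: dict[int, int] = defaultdict(int)  # d_c: total degree of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For example, two triangles joined by a single edge, split into their two triangles, score 5/14 (about 0.357). Putting every node in one community scores 0.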

Graph characterization

In the graph below, the purple cluster represents the Reddit accounts that participate in the movie suggestion subreddit community, which has the highest degree centrality. Green, orange, and blue similarly represent the accounts that participated in the Justice League, Megaten, and Superman communities. Of the 3,096 nodes, 57 connect different communities; these are called bridge nodes. Within the movies2021 hashtag, 19 Reddit accounts interacted with both the Justice League and Superman communities, shown by the mint green edges. In addition, 6 nodes connect the Disney Plus and Marvel Studios Plus communities.

The graph presenting a bridge node of these communities
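One way to make "bridge node" precise is to assume each node already has a community label, such as its modularity class. A node is then a bridge when its neighbors fall in two or more communities. The helpers below are hypothetical and follow that assumed definition. Counting only Redditor accounts, as in the 19 Justice League and Superman accounts, would need an extra filter on node type.

```python
from collections import defaultdict


def neighbor_communities(
    edges: list[tuple[str, str]], community: dict[str, int]
) -> dict[str, set[int]]:
    """Map each node to the set of communities its neighbors belong to."""
    seen: dict[str, set[int]] = defaultdict(set)
    for u, v in edges:
        seen[u].add(community[v])
        seen[v].add(community[u])
    return dict(seen)


def bridge_nodes(edges: list[tuple[str, str]], community: dict[str, int]) -> set[str]:
    """Nodes whose neighbors span two or more communities (assumed definition)."""
    return {n for n, cs in neighbor_communities(edges, community).items() if len(cs) >= 2}


def nodes_linking(
    edges: list[tuple[str, str]], community: dict[str, int], a: int, b: int
) -> set[str]:
    """Nodes with neighbors in both community a and community b."""
    return {n for n, cs in neighbor_communities(edges, community).items() if {a, b} <= cs}
```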

The picture below shows the statistics of this network:

  • The average degree, the average number of edges connected to a node in this network, is 1.015
  • The average weighted degree, which sums the weights of each node's edges, is 1.6
  • The network diameter, the longest of the shortest paths between any two nodes, is 5
  • The network density, the ratio of actual edges to possible edges in this network, is 0.001
  • The modularity, which measures how strongly the network divides into communities, is 0.793
  • The average path length, the average length of the shortest paths between node pairs, is 3.613
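These statistics can be approximated with breadth-first search. The sketch below treats the graph as undirected and unweighted after merging repeated pairs, with these assumptions:

- Average degree is 2E/N and density is 2E/(N(N - 1)).
- Repeated interactions count only toward the weighted degree.
- Diameter and average path length consider only pairs connected by some path.
- Self-loops are ignored.

Gephi's figures depend on whether the graph was imported as directed, so this illustrates the definitions rather than recovering the exact numbers. `network_stats` is a hypothetical helper.

```python
from collections import defaultdict, deque


def _adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets; parallel edges collapse into one neighbor entry."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def _bfs_distances(adj: dict[str, set[str]], source: str) -> dict[str, int]:
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def network_stats(edges: list[tuple[str, str]]) -> dict[str, float]:
    edges = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    adj = _adjacency(edges)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two connected nodes")
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # distinct undirected edges
    lengths: list[int] = []
    for source in adj:
        dist = _bfs_distances(adj, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return {
        "average_degree": 2 * m / n,
        "average_weighted_degree": 2 * len(edges) / n,  # repeats add weight
        "density": 2 * m / (n * (n - 1)),
        "diameter": max(lengths),
        "average_path_length": sum(lengths) / len(lengths),
    }
```

For a path a-b-c-d with the a-b edge repeated, this gives an average degree of 1.5, an average weighted degree of 2.0, a density of 0.5, a diameter of 3, and an average path length of 5/3.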

Centrality

The picture below highlights the node with the highest closeness centrality. It has the shortest average distance to the other nodes, so it can reach them most quickly. The picture also highlights the node with the highest betweenness centrality. This node lies on the most shortest paths between other pairs of nodes, so many paths must flow through it.

The graph with a closeness centrality and a betweenness centrality filter
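Both centralities can be computed with breadth-first search. The sketch below uses two measures:

- Harmonic closeness: the sum of 1/d over all other reachable nodes, divided by N - 1.
- Betweenness: Brandes' algorithm for unweighted graphs. The totals are halved because each undirected pair is visited from both ends.

The sketch assumes an undirected graph. Gephi's normalization options may give different scales, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets, skipping self-loops."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def harmonic_closeness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Sum of 1/distance to every other reachable node, divided by N - 1."""
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(1 / d for t, d in dist.items() if t != s)
        scores[s] = total / (n - 1) if n > 1 else 0.0
    return scores


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' algorithm: how many shortest paths between other pairs pass through each node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds: dict[str, list[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes back to s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair was counted twice
```

For a star with one hub and three leaves, the hub's betweenness is 3, one for each pair of leaves.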

Each cluster in the graph shows how many people participate in each community within the movies2021 community. The movie suggestion community, in purple, has the highest degree centrality. Green, orange, and blue similarly represent the people who participated in the Justice League, Megaten, and Superman communities. Black shows how many people engaged in the Disney Plus community. The smaller pink cluster represents the Redditors who participated in the Marvel Studios Plus community.

Of the 3,096 nodes, 57 are bridge nodes. These include the 19 Reddit accounts that interacted with both the Justice League and Superman communities within the movies2021 hashtag (the mint green edges). They also include the 6 nodes that connect the Disney Plus and Marvel Studios Plus communities. A few more nodes connect the Disney Plus community with the Megaten, Superman, and Marvel Studios Plus communities. Others connect the movie suggestion community with the Disney Plus, Superman, Marvel Studios Plus, and Megaten communities.

Discussion and limitation

One limitation is the amount of data. This study collected posts from only 3,096 Reddit accounts, on February 13, 2021, which is a small sample for predicting movie trends for the whole year. The results are also limited to posts from the week of the experiment. Because interactions outside this window are missing, the relationships among the communities may appear weaker than they actually are.
