Who supports Donald Trump?
Do you support Donald Trump? Are you Republican or Democrat?
President Trump is running out of time. Joe Biden leads by double digits in national polls, and state-level polling is only slightly closer. In fact, Biden’s lead is so large that traditionally Republican states like Georgia, Iowa, Ohio, and Texas are in play [1]. Moreover, on Reddit, as the previous article showed, users mentioned Joe Biden more often than Donald Trump, which suggests Biden was the more talked-about candidate during the first debate. We have now entered the last two weeks before Election Day, and Donald Trump needs the race to tighten.
The question is how a user’s political stance on social media (Reddit) relates to their personal political preferences. What drives users to mention these two candidates on social media, and how do they do it? What do they really mean when they mention or post about the two candidates on Reddit? My goal is to find a relationship that predicts a user’s political preference from the opinion expressed in their posts.
Mining Interest
As described in the previous article, I retrieved each post’s author name and content from 1,907 posts in the r/donaldtrump, r/joebiden, and r/election2020 subreddits using the Python Reddit API Wrapper (PRAW). As illustrated in the figure below, the network has three main clusters, or communities: r/donaldtrump, r/joebiden, and r/election2020.
Nodes in the election2020 community also participate in the JoeBiden and DonaldTrump communities. Breaking down this network structure lets me assign information about interests to each node. Take a node that sits in both election2020 and JoeBiden: it has participated in, and is interested in, both communities. If it is much more strongly connected to election2020, that user is potentially much more interested in election2020 than in JoeBiden, but still more interested in JoeBiden than in DonaldTrump.
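To make the idea of ranking a node's interests concrete, here is a minimal sketch. It assumes the collected data can be reduced to (author, community) pairs and that the number of posts is a reasonable stand-in for connection strength. The user names and sample rows are hypothetical, not taken from my data set.

```python
from collections import Counter, defaultdict

# Hypothetical (author, community) pairs standing in for the collected posts.
posts = [
    ("user_a", "election2020"),
    ("user_a", "election2020"),
    ("user_a", "election2020"),
    ("user_a", "JoeBiden"),
    ("user_b", "DonaldTrump"),
    ("user_b", "election2020"),
]


def interest_profile(posts):
    """Count posts per community for each author (a simple edge weight)."""
    profile = defaultdict(Counter)
    for author, community in posts:
        profile[author][community] += 1
    return profile


def ranked_interests(profile, author):
    """Return the author's communities ordered from most to fewest posts."""
    return [community for community, _ in profile[author].most_common()]


profile = interest_profile(posts)
print(ranked_interests(profile, "user_a"))
```

A community that never appears for a user has an implicit count of zero, which is how the sketch expresses "less interested in DonaldTrump than in JoeBiden" for user_a.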
Data cleaning and formatting
To extract node and edge attributes from the data set for each community, I clean and format the users’ post content with the Text to Columns feature and the SUBSTITUTE function (=SUBSTITUTE(text, old_text, new_text, [instance_num])) in Excel, and with the replace function in Python.
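The Python side of this cleaning step might look roughly like the sketch below. The specific substitutions (stripping links, decoding "&amp;", collapsing whitespace, lowercasing) are my assumptions about what typical Reddit text needs, and clean_post is a hypothetical helper name.

```python
import re


def clean_post(text):
    """Normalize one post: drop links, fix separators, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    # str.replace plays the same role here as SUBSTITUTE does in Excel.
    text = text.replace("\n", " ").replace("&amp;", "&")
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip().lower()


print(clean_post("Vote  NOW!\nhttps://example.com Trump &amp; Biden"))
```

Lowercasing makes later keyword matching case-insensitive, at the cost of losing emphasis such as all-caps words.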
Missing and incomplete attributes
After formatting and cleaning the data, I found that many rows were missing attributes. For example, some Reddit users participated in the election2020 subreddit, but their post content carried no opinion or meaning. Because my goal is to predict a user’s political preference from the opinion expressed in their posts, rows that lack post content or an opinion are not useful for predicting political preference in the 2020 election. I therefore removed them before running the algorithm.
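Dropping those rows can be sketched as a simple filter. This assumes each row is a dict with a "content" field, and the list of placeholder values is my assumption about how missing text tends to show up, not an exhaustive rule.

```python
# Assumed placeholders for missing text; adjust to what the data contains.
MISSING_MARKERS = {"", "n/a", "nan", "[removed]", "[deleted]"}


def has_content(row):
    """Return True when the row carries usable post text."""
    body = row.get("content")
    return body is not None and body.strip().lower() not in MISSING_MARKERS


# Hypothetical rows for illustration only.
rows = [
    {"author": "user_a", "content": "Republicans need to vote"},
    {"author": "user_b", "content": "N/A"},
    {"author": "user_c", "content": None},
    {"author": "user_d", "content": "   "},
]

kept = [row for row in rows if has_content(row)]
print(len(kept), "of", len(rows), "rows kept")
```

This only catches rows with no text at all. Posts that have text but express no opinion still need a manual or keyword-based check.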
Attribute implementation
Because the mining goal is classification, I extract node attributes from post content that expresses a Reddit user’s political preference or opinion (liberal, conservative, Democrat, Republican, etc.), along with edge attributes (edge weights, tie strength, etc.).
The most common keywords in this network are republicans, vote or election, jobs, win, join, president, bad, america, people, trump 2020, and joebiden. More posts come from the Joe Biden community, as he was mentioned more often in the collected data set. I confirmed this by filtering the keywords by candidate, which breaks down which keywords are associated with each. In the collected data, 934 people were in the Joe Biden community and 558 were in the Donald Trump community.
Keywords like “win” came from both the r/joebiden thread and the r/donaldtrump thread and were mentioned equally often. Keywords like “vote/election” were more common in the r/donaldtrump thread. This suggests that the r/donaldtrump thread paid more attention to voting keywords to influence potential voters, whereas the r/joebiden thread mentioned them less.
Also, words like “america” and “president” were mentioned more often under r/joebiden, while words like “people” and “republicans” were mentioned more often under r/donaldtrump. Among the subreddits, the r/joebiden thread was searched for more than the r/donaldtrump thread. The second most viewed thread was r/election2020, followed by r/donaldtrump.
The previous article showed that more people mentioned the r/joebiden thread. Looking at the data more closely, it appears that even though more people mentioned Joe Biden, they will not necessarily support him. As illustrated in the table below, the keyword “vote/election” appeared over 45 times in the r/donaldtrump thread, compared with 20 times in the r/joebiden thread. So some keywords were mentioned more often on the Donald Trump side. It appears that voters will use their own judgment of each candidate and will not be as heavily influenced by these threads.
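The per-candidate keyword breakdown can be sketched as follows. This assumes the posts are available as (community, text) pairs and counts each post at most once per keyword group. The grouping of "vote" and "election" into one label mirrors how I report them, while the sample posts and the KEYWORDS subset are hypothetical.

```python
import re
from collections import Counter, defaultdict

# A small, assumed subset of keyword groups; each label matches any of its terms.
KEYWORDS = {
    "vote/election": ["vote", "election"],
    "win": ["win"],
    "republicans": ["republicans"],
}


def count_keywords(posts, keywords=KEYWORDS):
    """Count, per community, how many posts mention each keyword group."""
    patterns = {
        label: re.compile(
            r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        for label, terms in keywords.items()
    }
    counts = defaultdict(Counter)
    for community, text in posts:
        for label, pattern in patterns.items():
            if pattern.search(text):
                counts[community][label] += 1
    return counts


# Hypothetical posts for illustration only.
posts = [
    ("donaldtrump", "Get out and vote, Republicans!"),
    ("donaldtrump", "The election is two weeks away"),
    ("joebiden", "We can win this election"),
]
counts = count_keywords(posts)
print(dict(counts["donaldtrump"]), dict(counts["joebiden"]))
```

Matching on whole words means "win" does not match "winning". Whether that is desirable depends on how strictly a keyword should be read.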
Discussion
As illustrated in the figure below, there is some relationship between a user’s political stance on social media (Reddit) and their personal political preferences. People who identify as Republicans support Donald Trump more than Joe Biden, at least in their social media posts. The chart below shows the association between the Republican party, Donald Trump, and Joe Biden. Unfortunately, no one in this experiment stated that they are a Democrat.
Limitations and ethical issues
This study is based on posts collected from 1,011 Reddit accounts between September 30 and October 3, 2020. During data cleaning, I also lost many rows because their values were unknown or N/A. This is a small sample from which to show a relationship between a user’s political preference and the opinion expressed in their posts. The community also produces new data and changes every day. These very limited results reflect people’s opinions only during the week of the experiment.
As for the ethical concerns, we cannot draw conclusions about a group of people, or infer who is going to be the next president, from this experiment. Many people do not have social media accounts. Even among those who do, some do not post what they think or whom they support, because elections and voting are too sensitive a topic to discuss in public. In addition, the data contains a protected characteristic: political preference (Republican or Democrat).
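One rough way to tabulate this kind of association is to count, per community, the distinct authors whose posts contain a stance term. The sketch below assumes rows of (author, community, text) and treats a plain substring match on "republican" as a crude proxy for self-identification. The rows are hypothetical, and a keyword match does not by itself establish whom a person supports.

```python
from collections import Counter


def stance_by_community(rows, stance_terms=("republican",)):
    """Count distinct authors per community whose posts contain a stance term."""
    seen = set()
    counts = Counter()
    for author, community, text in rows:
        lowered = text.lower()
        mentions = any(term in lowered for term in stance_terms)
        if mentions and (author, community) not in seen:
            seen.add((author, community))
            counts[community] += 1
    return counts


# Hypothetical rows for illustration only.
rows = [
    ("u1", "donaldtrump", "Proud Republican voting Trump"),
    ("u1", "donaldtrump", "Republicans will win"),
    ("u2", "joebiden", "I am a republican but voting Biden"),
    ("u3", "donaldtrump", "As a Republican I back the president"),
    ("u4", "joebiden", "Let's go Joe"),
]
counts = stance_by_community(rows)
print(dict(counts))
```

Counting authors rather than posts keeps one very active user from dominating the comparison.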
References
[1] https://projects.fivethirtyeight.com/2020-election-forecast