Can An Algorithm Find Positivity in the Twitterverse?
This post is a part of my Weekly Make series, where every seven days I force myself to create something and put it out in the wild for you to gawk at.
The internet can be a nasty, nasty place. Vicious comments seep their way into even the mildest posts. Throw politics into the mix and you’ve set yourself up to see the worst of humanity.
Twitter facilitates anonymity and has become the chosen one-to-many communication platform for celebrities and politicians, making it the perfect environment for invective.
But I, the naive millennial that I am, believed that there were still communities of good people. So I made an algorithm to find them.
Here’s how I built it.
Traversing the Network
Our goal is to find as many positive users as possible, scored by the average sentiment of their tweets.
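To make "average sentiment" concrete, here’s a minimal sketch of the scoring. The word lists and the function names `tweet_sentiment` and `user_positivity` are made up for illustration; a real sentiment model would replace the toy lexicon, and this isn’t necessarily how my own code does it.

```python
from statistics import mean

# Toy word lists, purely illustrative (an assumption, not a real lexicon).
POSITIVE_WORDS = {"good", "great", "love", "happy", "thanks"}
NEGATIVE_WORDS = {"bad", "hate", "awful", "sad", "angry"}


def tweet_sentiment(text):
    """Score one tweet in [-1, 1]: (positive hits - negative hits) / all hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    total = pos + neg
    # A tweet with no recognized words counts as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


def user_positivity(tweets):
    """Average sentiment over a user's tweets; 0.0 if there are none."""
    return mean(tweet_sentiment(t) for t in tweets) if tweets else 0.0
```

A user whose tweets are half glowing and half furious lands at zero, which is exactly the kind of account we don’t want to rank highly.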
Certain researchers, business partners, and Twitter themselves can perform analysis directly on the entire set of data. For instance, if they want to find a certain group of users and all the connections between them, they only have to write a single query for each.
We, on the other hand, only have access to Twitter’s API, which means we can only get comprehensive data for a single user at a time. Fortunately, Twitter allows a relatively high volume of queries. That means we still have access to a large amount of data, but we need to approach it differently.
In our problem, we start out from a single user and can analyze the sentiment of their tweets. We can see a list of all the users they’ve mentioned, but we can’t measure those users’ positivity yet.
So we have to select one of these connected users to analyze. This new user will also have their own set of connections, so we’ll always have more connections to look at than we have API calls available.
This leads us into a classic probability problem: the multi-armed bandit.
The Multi-Armed Bandit
This problem dates back to World War II, when it consumed so much time and energy that Allied mathematicians suggested dropping it over Germany as an act of intellectual sabotage. It goes like this:
Say you have a set of slot machines, and certain ones give better rewards on average. You have a limited number of total spins and you want to make the most money possible.
You spin each of them once. They give you payouts ranging from nothing to $10. So, here’s your dilemma: that $10 could just be a random fluke and doesn’t mean that slot machine is the best one. But it’s the best information you have for now, so do you continue to use that machine or try other ones?
This is also known as the explore vs. exploit tradeoff. As we explore more users, we’ll notice that certain users tend to be more connected to positive users than others. Do we continue to look at their connections, or take a look at other, unexplored users?
The Strategy
There are many in-depth ways to estimate probability distributions from a limited set of points. My approach is a little bit dumber, and is known as the epsilon-greedy strategy.
With this reinforcement learning strategy, we exploit the user with the most positive connections most of the time; for a small proportion ε of our picks, we explore a random connection instead. While it’s a simple heuristic, it self-adjusts as new connections give more information about an individual user.
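Here’s a rough sketch of what one epsilon-greedy pick could look like. The function name `choose_user`, the `connection_scores` dictionary, and the default `epsilon=0.1` are all assumptions for illustration, not a copy of my actual code.

```python
import random


def average(scores):
    """Mean of a list of scores; 0.0 when nothing has been observed yet."""
    return sum(scores) / len(scores) if scores else 0.0


def choose_user(connection_scores, epsilon=0.1, rng=random):
    """Pick the explored user whose connections we look at next.

    connection_scores maps each explored user to the sentiment scores
    observed so far for the connections reached through them.
    """
    users = list(connection_scores)
    if rng.random() < epsilon:
        # Explore: with probability epsilon, pick a user at random.
        return rng.choice(users)
    # Exploit: otherwise pick the user with the best average so far.
    return max(users, key=lambda u: average(connection_scores[u]))
```

Calling `choose_user({"alice": [0.1, 0.2], "bob": [0.9]}, epsilon=0.0)` always returns `"bob"`, since with ε at zero we never explore. As more scores get appended to a user’s list, their average shifts and the choice adjusts on its own.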
Additionally, we have the benefit of being able to analyze the positivity of the mention itself. When we select a connection, we select the one that has the most positive mention.
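As a sketch of that selection step: assume each mention has already been scored and stored as a `(mentioned_user, mention_sentiment)` pair. The function name `most_positive_mention` and the `visited` set are hypothetical helpers for illustration.

```python
def most_positive_mention(mentions, visited=frozenset()):
    """Return the not-yet-visited user with the most positive mention.

    mentions is a list of (mentioned_user, mention_sentiment) pairs.
    Returns None if every mentioned user was already visited.
    """
    best = {}
    for user, score in mentions:
        if user in visited:
            continue  # Skip users we have already analyzed.
        # Keep only the highest-scoring mention seen for each user.
        best[user] = max(score, best.get(user, score))
    return max(best, key=best.get) if best else None
```

Skipping visited users is my own addition here, so the crawl doesn’t spend API calls on someone it has already scored.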
If you’re interested in the code, it’s incredibly hacky and messy but is available here:
Observations
You know who tend to be the most positive people? Politicians and companies. Even the ones you don’t like. Which makes sense, really — these are the types of Twitter profiles that have to maintain a sense of positivity, genuine or not.
The other thing I noticed was that no matter how nasty and negative the comments coming out of any one Twitter account, that account is still often connected to positive users. In the graph itself, you see users color-coded by the positivity of their immediate connections.
There are also a lot of wholesome tweets that can be found!
As much as we see negativity in our social media bubbles, there’s still a lot of positivity out there. The realDonaldTrump and HillaryClinton graphs reminded me that good people happen on all sides. Maybe, someday soon, that will be the common ground that brings us together.
Thanks so much for reading! If you enjoyed, feel free to follow me and hold down the Applause button. Stay tuned for the next Weekly Make!