Cops Should Beware. Twitter is Tracking Their Every Move

ericyeonpark
8 min read · Jul 28, 2021

On February 26, 2012, Florida neighborhood watch volunteer George Zimmerman fatally shot Trayvon Martin. On July 13, 2013, Zimmerman was acquitted of the charge of second-degree murder, sparking the Black Lives Matter movement, which protests police brutality and racially charged violence against Black people. Nine years later, although the movement has become a global phenomenon, police brutality incidents still make headlines every year, with victims like Eric Garner, Breonna Taylor, George Floyd, and many more engraving their unjust deaths onto hearts across the world.

Human Rights First: Blue Witness is a movement to give the public access to reports of police use-of-force incidents. They created an app that tracks police violence by crowdsourcing incident reports from Twitter, displaying compelling visualizations of those incidents, and maintaining an accessible database where users can look more deeply into each one.

Blue Witness map tracking police incidents across America

Blue Witness has a partnership with Lambda School: every month, a team of students (completing a mandatory internship as the last month of their curriculum) helps develop and improve the app that tracks police violence across the United States. The Lambda School team consists of data scientists (my role), backend engineers, frontend engineers, UX/UI designers, and project managers. I happened to be part of the class where Lambda and Blue Witness were finalizing the app, and our main job was to put the finishing touches on it, especially on the data science side.

Understanding is the First Step

Oftentimes, whether it's tackling racial discrimination or updating the functions of an app, one of the first steps should be trying to understand the problem. Coming into the project, we inherited code that hundreds of students had previously worked on, so it was essential to talk to Blue Witness to understand what they wanted the app to achieve for users, as well as to the previous teams to figure out what their code was accomplishing. After talking to Blue Witness, it was clear that they wanted to be a hub of reliable and eye-opening information showing how prevalent police violence is across the United States.

An example of what information Blue Witness wants to show users

From talking to the previous Lambda data science teams, we learned about two key components of the app:

  • a Twitter scraper that searched for tweets containing any mentions of police violence
  • a machine learning natural language processing (NLP) model called BERT that ranked how violent each incident was and filtered out any non-violent or false incidents

Awareness is the First Step. Action is the Second

Now that we had a better understanding of the problem, it was time to take action. Not later, not never. Now. In terms of the Blue Witness project, the Twitter scraper that we inherited had a couple of problems we needed to solve. First, we had to redefine which keywords the scraper would use to search for tweets about police violence. Since we were using the Tweepy (Twitter) API, it could scrape a maximum of 500 tweets per run, so defining our keywords well was crucial for efficiency. For any keyword we defined, the API would return tweets related to it, not just tweets matching it exactly (e.g., the keyword "police violence" would still return related tweets such as "I saw a police officer hit a person yesterday").

An iteration of the Twitter scraper code

Previously, our only keyword was "police", which would return 500 tweets related to "police"; our BERT model would then filter out which incidents actually contained police violence. We decided to streamline the process and added "police violence", "police abuse", and "police brutality" to the list of keywords. We also made the scraper randomly choose one keyword from the list on every run, making the quality of the tweets it scrapes more reliable on average. We kept "police" as a safety net, in case some tweets about police brutality slipped through the net of our more specific keywords.
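To make the keyword-rotation idea concrete, here's a minimal sketch, assuming Tweepy's standard search endpoint. The function name `fetch_tweets` and the placeholder credentials are illustrative, not the project's actual code:

```python
import random
import tweepy

# Keywords the scraper rotates through; "police" stays as a broad safety net.
KEYWORDS = ["police", "police violence", "police abuse", "police brutality"]

def fetch_tweets(api: tweepy.API, max_tweets: int = 500):
    """Scrape up to max_tweets for one randomly chosen keyword."""
    query = random.choice(KEYWORDS)
    # Cursor pages through results until the per-run cap is reached.
    # (The search method is `API.search` in Tweepy 3.x and
    # `API.search_tweets` in 4.x.)
    return [
        status
        for status in tweepy.Cursor(
            api.search_tweets, q=query, lang="en", tweet_mode="extended"
        ).items(max_tweets)
    ]

# Placeholder credentials; real values come from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
tweets = fetch_tweets(tweepy.API(auth))
```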

Second, the scraper wasn't deployed anywhere, so our team had to manually run the code in order to scrape tweets into the database. After many discussions internally and with the client, we decided the best course of action was to launch the scraper and database on Amazon Web Services (AWS), with the backend team accessing the data through an endpoint that we created. We also created a table in our database using psycopg2 and PostgreSQL to match the table schema the backend needed to successfully parse our scraped data. In the end, we deployed it on AWS, and it currently scrapes and adds tweets to our database every 25 minutes.
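For flavor, here's a minimal sketch of the psycopg2 side, assuming a PostgreSQL instance. The connection string and column names are hypothetical placeholders, not the actual schema agreed with the backend team:

```python
import psycopg2

# Placeholder DSN; the real instance lives on AWS.
conn = psycopg2.connect("postgresql://user:password@host:5432/blue_witness")

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS police_incidents (
    tweet_id   BIGINT PRIMARY KEY,  -- de-dupes reruns of the scraper
    created_at TIMESTAMPTZ,
    tweet_text TEXT,
    source_url TEXT,
    rank       SMALLINT             -- BERT's 0-5 force ranking
);
"""

with conn, conn.cursor() as cur:
    cur.execute(CREATE_TABLE)

def save_incident(tweet_id, created_at, text, url, rank):
    """Insert one ranked tweet, skipping anything already stored."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO police_incidents VALUES (%s, %s, %s, %s, %s)
            ON CONFLICT (tweet_id) DO NOTHING
            """,
            (tweet_id, created_at, text, url, rank),
        )
```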

A successful deployment on AWS

Sometimes You Got to Be Willing to Get Your Hands Dirty

For the BERT model, the previous Lambda classes had already built a fully functioning model. It would analyze a text input and assign it a rank from 0–5:

  • Rank 0 — No police presence
  • Rank 1 — Officer Presence: Police are present, but no force detected.
  • Rank 2 — Empty-hand: Officers use bodily force to gain control of a situation. Officers may use grabs, holds, joint locks, punches and kicks to restrain an individual.
  • Rank 3 — Blunt Force: Officers use less-lethal technologies to gain control of a situation. For example, a baton or projectile may be used to immobilize a combative person.
  • Rank 4 — Chemical & Electric: Officers use less-lethal technologies to gain control of a situation, such as chemical sprays, projectiles embedded with chemicals, or tasers to restrain an individual.
  • Rank 5 — Lethal Force: Officers use lethal weapons to gain control of a situation.

If the model ranked a tweet 2 or above, the scraper would send the tweet to our database. It was therefore essential that the model accurately read any text input and rank each tweet appropriately, so that reliable data reached the administrator for approval before being added to the app. The client had also asked that we get the model to an 85%+ accuracy rate for predicting violent vs. non-violent incidents. However, since BERT is an NLP machine learning model, it needed to be trained on thousands (ideally hundreds of thousands) of labeled examples in order to rank tweets accurately.
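The rank-2 cutoff itself is a one-liner. Here's a sketch of the inference-and-filter step, assuming a fine-tuned checkpoint loaded with Hugging Face transformers; the `saved_model/` path is a hypothetical stand-in:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# "saved_model/" is a hypothetical path to the fine-tuned checkpoint.
tokenizer = BertTokenizer.from_pretrained("saved_model/")
model = BertForSequenceClassification.from_pretrained("saved_model/")
model.eval()

def rank_tweet(text: str) -> int:
    """Return the model's 0-5 force ranking for one tweet."""
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

def should_store(text: str) -> bool:
    # Only rank 2+ (actual use of force) goes to the database.
    return rank_tweet(text) >= 2
```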

In order to achieve an 85%+ accuracy rate, we had to get our hands dirty and manually label the training data ourselves. We found a GitHub repo containing thousands of tweets describing police incidents and hand-labeled each tweet with a rank from 0–5. Over two weeks, a team of 14 students hand-labeled 6,000 tweets to train our BERT model to be more reliable.
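For readers curious what training on those labels looks like, here's a minimal fine-tuning sketch with the Hugging Face Trainer. This is not the team's actual training script; the two-tweet dataset stands in for the 6,000 hand-labeled examples:

```python
import torch
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

class TweetDataset(torch.utils.data.Dataset):
    """Wraps hand-labeled tweets (text + 0-5 rank) for the Trainer."""
    def __init__(self, texts, labels, tokenizer):
        self.encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)

# Stand-ins for the ~6,000 hand-labeled tweets.
train_texts = ["Officer shoved a protester to the ground", "Saw police directing traffic downtown"]
train_labels = [2, 1]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="saved_model", num_train_epochs=3),
    train_dataset=TweetDataset(train_texts, train_labels, tokenizer),
)
trainer.train()
trainer.save_model("saved_model/")
```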

As a result, we achieved strong precision, recall, and accuracy metrics for our model, far surpassing our client's request for an 85% accuracy rate by the end of the month.

BERT model performance metrics

Be Proud of Progress. But Realize Much Work Still Needs to be Done

After my month on the project, I was proud of all that we achieved. We automated and deployed a Twitter scraper, giving Blue Witness a steady stream of new data to reliably surface current police violence incidents. I also helped improve the BERT machine learning model so it provides accurate incidents to Blue Witness, and I drafted and presented the first analytics report on the model to the client, whereas previous teams had only verbally assured the client that the model was working.

However, as proud as I am of these accomplishments, I realize that much work remains to continue improving on and solving the problem. For instance, our recall, precision, and accuracy metrics for predicting each individual rank are still slightly low due to a lack of training data. As the image below shows, ranks 2–4 each have fewer than a hundred data points in the test set.

BERT model performance metrics for each rank

To improve, we would ideally continue training our BERT model on more data, which would also give us more testing data to work with. It should be noted, however, that tweets ranked 2–4 are simply rarer in reality than tweets of the other ranks.

Another aspect we could work on is improving the data science table schema. Currently, our database is based on the backend team's schema specifications; ideally, the database would be modeled on our own specifications, and the backend team would access the data and reshape it on their end. The only reason we shaped it to their specifications was that they had much more on their plate and asked for our cooperation, but we lost some functionality on our end as a result. For example, they asked us to convert the Twitter source URLs to JSON, which turns each one into plain text instead of a clickable link, and a clickable link would be ideal for the API we use to access the data. Ideally, our database would be updated to serve the needs of future data science teams.

Lastly, one other way to improve the app would be to create a Reddit scraper, in addition to the Twitter scraper, that scrapes Reddit posts for police violence incidents. One precaution the team would have to take is developing a system to avoid duplicate incidents between the Twitter and Reddit scrapers; a rough sketch of both ideas follows.
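Since this is only a suggestion for a future team, the following is purely a sketch, assuming the PRAW library for Reddit access. The credentials, subreddit, and hash-based dedup scheme are all my own illustrative choices:

```python
import hashlib
import praw

# Placeholder credentials; a future team would register a Reddit app.
reddit = praw.Reddit(
    client_id="CLIENT_ID",
    client_secret="CLIENT_SECRET",
    user_agent="blue-witness-scraper",
)

seen_hashes = set()  # in production this would live in the shared database

def fingerprint(text: str) -> str:
    """Hash whitespace-normalized, lowercased text so near-identical
    reports match across Twitter and Reddit."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()

for post in reddit.subreddit("all").search("police brutality", limit=100):
    key = fingerprint(post.title)
    if key in seen_hashes:
        continue  # duplicate of a tweet or an earlier post
    seen_hashes.add(key)
    # hand the post off to the BERT ranking/filter step here
```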

At the end of the day, I'm sure I will still be in contact with Lambda and Blue Witness after the project, and I'm excited to see how much the app develops in the future.
