Photo by Alejo Reinoso on Unsplash

Sentiment Analysis to Detect Threatening Tweets in a Collaborative Team

Yang Gao
Omdena
Published in
3 min readOct 9, 2019

--

My journey with Omdena started from when I was looking for Kaggle teammates in order to boost my machine learning experience. Just as the founder of Omdena has written before on the challenges of Kaggle, I had trouble finding even a single teammate. However, through my search, I was able to find Omdena, which immediately connected me to dozens of machine learning engineers and AI enthusiasts.

The problem to be solved

The challenge I participated in was with the NGO Voice4Impact; the goal was to implement a machine-learning algorithm to detect threatening tweets related to gang violence. Gun-related violence was a particular focus and having recently moved to the US, I was eager to contribute to reducing this problem that has become a new reality of my life.

As the task manager for the Sentiment Analysis team, I was leading the team to predict whether the tweets are threatening, including picking models, libraries and preprocessing techniques. On the technical side, we needed to address the challenges of an imbalanced dataset where over 90% of the tweet feed was non-threatening, and the scarcity small size of the labeled dataset. We tested multiple techniques, including loss functions specifically designed for imbalanced datasets, undersampling, transfer learning from existing word embeddings algorithms, and ensemble models. The whole process of trying to improve the performance was a rewarding experience, it was a game of anticipation to see if the accuracy increases with every new method we tried.

On the non-technical side, I needed to utilize a remote collaborative team with different skills to the fullest. This is where I drew on my previous experience in project management. First, I implemented a task sign-up sheet for coordinating efforts, the items ranged from pre-processing techniques (including several word embeddings, n-grams) to model building, with the flexibility of allowing members to add news items they want to try. The undersampled ensemble model architecture I chose lends itself well to the team structure, where individuals can choose to do as little as training one training split, to multiple models on all the splits. As the task manager, I presented our weekly accomplishments and set plans for next week. Even though I see the value of a dedicated project manager who manages timeline, priorities, risks etc., I have found the best results were achieved when a technical team member manages the progress of a technical project. However, this can also lead to the worst results if the team members view project management as an administrative task that should be avoided as much as possible.

As a woman in AI, I found Omdena to be a supportive platform that makes a point in acknowledging the need for more women in AI. There were several women participating in the Voice4Impact challenge, but it would be nice to have more women sign up to be task managers in the future.

I look forward to more women in AI leadership roles, who will help provide more diverse points of views.

Overall, I have enjoyed contributing to the Omdena Voice4Impact challenge.

This experience has helped me land multiple machine learning interviews, and one interviewer specifically noted my team management experience at Omdena.

It shows that Omdena is not only a platform to hone your machine learning skills, but also a platform that helps to take your career to the next level in management.

Want to become an Omdena Collaborator and join one of our tough AI for Good challenges, then apply here.

If you want to receive updates on our AI Challenges, get expert interviews, and practical tips to boost your AI skills, subscribe to our monthly newsletter.

We are also on LinkedIn, Instagram, Facebook, and Twitter.

--

--

Yang Gao
Omdena
Writer for

Machine Learning Engineer | Data Scientist | I write about how data transforms the world, and how it changes my life.