Network Analysis to Find out the Context of a Tweet: A Community Detection Based Approach
As someone who believes in learning by getting my hands dirty, I saw Omdena as a savior when I first came across it on a social networking site. After working on several small ML projects, I was looking for an opportunity to collaborate with a larger group and build something meaningful. I applied for an Omdena challenge as soon as I learned about them and was eventually selected.
One of the things I loved most about the first challenge was my personal learning curve. I noticed that I was learning new things more quickly than I usually do. The reason was the frequent communication among collaborators with different levels of experience, in an environment where that experience was shared selflessly.
For my first challenge, I was assigned the role of a Data Wrangler, where I was really active and received accolades and appreciation from my seniors. The experience of working with the Data Labelling team and then helping fellow teammates run the model on AWS EC2 was outstanding.
This collaborative spirit is one of the reasons I definitely want to keep collaborating in this community.
The challenge: Analyzing tweets to prevent gun violence
For my second challenge, I was assigned the role of an ML Engineer and hence I got an important opportunity to get hands-on experience with a real-world ML project.
The problem statement of this challenge was straightforward: perform sentiment analysis on tweets without profiling the user, so that the results could be used to prevent gang violence in Chicago.
As a subtask of the sentiment analysis, a network analysis team was formed, and I was assigned as its task manager. The main purpose of the task was to help the sentiment analysis team by providing the context of each tweet's text. It was tricky to come up with a quantitative value that would indicate whether the context of a tweet is violent or not. After reading several research papers and studying graph theory, I decided to implement community detection algorithms and combine them with the reservoir of violent signal words to come up with a probability value for each tweet (the probability that a tweet is more prone to using violent words).
5 steps to build an effective network analysis of tweets
1. Using Python's NetworkX, I created a graph from the mentions and authors of the tweets.
Each node represents an account that is either mentioned in a tweet or is the author of a tweet. An edge A → B means B was mentioned in a tweet posted by A.
2. Thousands of tweets were used to create a directed graph, and the communities in the network were detected using the Girvan-Newman algorithm. The PageRank value of each node also made it possible to identify the influential members of the network. This value is not crucial to the network analysis, but it can be useful if one tries to track any gang member who is influential in the network.
3. The members in the communities are either authors or mentions. So, the tweets were then tagged with the community number based on the mention or author names.
4. The total number of signal keywords in all the communities was calculated and so was the total number of signal words for individual communities.
5. The final result was a dataset of tweets that had the community tag and the probability of using violent words, based on the usage of signal words within the community relative to all the communities. For example, in the picture below, members from Community 1 who are authors or mentions in the tweets are more likely to be inclined towards using violent keywords. So, the tweets which contain authors/mentions from this community are contextually more violent.
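The five steps above can be sketched on a toy example. This is only a minimal sketch under stated assumptions, not the project's code: the tweets, the `SIGNAL_WORDS` list, and every function name are hypothetical. Tagging each tweet by its author's community is an assumption, since the steps do not say how a tweet spanning two communities was handled. The probability is taken as a community's share of all signal-word occurrences, which is one reading of step 5. Only the first Girvan-Newman split is used.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import girvan_newman

# Hypothetical toy data; the real tweets and signal-word list are not shown here.
TWEETS = [
    {"author": "a1", "mentions": ["a2", "a3"], "text": "bring the gun tonight"},
    {"author": "a2", "mentions": ["a3"], "text": "they will shoot back"},
    {"author": "a3", "mentions": ["b1"], "text": "see you at the game"},
    {"author": "b1", "mentions": ["b2", "b3"], "text": "great match today"},
    {"author": "b2", "mentions": ["b3"], "text": "do not shoot the messenger"},
]
SIGNAL_WORDS = {"gun", "shoot"}


def build_mention_graph(tweets):
    """Step 1: add a directed edge A -> B when a tweet by A mentions B."""
    graph = nx.DiGraph()
    for tweet in tweets:
        graph.add_node(tweet["author"])
        for mention in tweet["mentions"]:
            graph.add_edge(tweet["author"], mention)
    return graph


def detect_communities(graph):
    """Step 2: take the first Girvan-Newman split as a node -> community id map."""
    # Communities are found on the undirected view of the mention graph.
    first_split = next(girvan_newman(graph.to_undirected()))
    return {node: cid for cid, members in enumerate(first_split) for node in members}


def count_signal_words(text):
    """Count how many whitespace-separated words are signal words."""
    return sum(word in SIGNAL_WORDS for word in text.lower().split())


def score_tweets(tweets, community_of):
    """Steps 3-5: tag tweets, count signal words, attach each community's share."""
    counts = Counter()
    tagged = []
    for tweet in tweets:
        cid = community_of[tweet["author"]]  # assumption: tag by the author
        counts[cid] += count_signal_words(tweet["text"])
        tagged.append({**tweet, "community": cid})
    total = sum(counts.values())
    for tweet in tagged:
        share = counts[tweet["community"]] / total if total else 0.0
        tweet["violence_probability"] = share
    return tagged


graph = build_mention_graph(TWEETS)
community_of = detect_communities(graph)
tagged_tweets = score_tweets(TWEETS, community_of)
for tweet in tagged_tweets:
    print(tweet["author"], tweet["community"], round(tweet["violence_probability"], 2))
```

On this toy data the two mention triangles end up in separate communities once the single bridging mention is removed. The community containing a1 holds two of the three signal-word hits, so its tweets receive a share of 2/3 and the other community's tweets receive 1/3.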
Also, the network analysis can give an insight into which members are more influential within the community. One can get a sense of this by looking at the PageRank values of the members of the community. The greater the PageRank, the more influential a member is.
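To make the PageRank idea concrete, here is a small sketch. NetworkX offers this as `networkx.pagerank`; the version below instead writes out the underlying power iteration in plain Python so the scoring is visible. The edge list is hypothetical, the function name `pagerank` is just for illustration, and the damping factor of 0.85 is the conventional default rather than a value taken from the project.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every node passes an equal part of its damped rank to each account it mentions.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical mention edges: (author, mentioned account).
EDGES = [("a1", "a3"), ("a2", "a3"), ("a4", "a3"), ("a3", "a1")]
ranks = pagerank(EDGES)
most_influential = max(ranks, key=ranks.get)
print(most_influential, round(ranks[most_influential], 3))
```

Here a3 is mentioned by three accounts and comes out on top. Restricting the edge list to the members of one community would give a ranking within that community.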
It was an amazing group effort with frequent feedback sessions about my work. The whole experience was really motivating. It is not only ML engineering throughout: you also get to learn a lot about data cleaning, data engineering, and data analysis. Lastly, the real motivation came from the fact that I was working on this project to solve an existing real-world problem. What can be more exciting than this?