When The Marketing Team Wants You To Analyze … “A Few Tweets”

Zlatka Staykova
Published in PowerToFly
Nov 9, 2016

In my role as data scientist here at PowerToFly, I was asked to provide data on what conference attendees were tweeting about during the HR Tech Conference. This post details the steps I took to data mine 6,400 tweets (click here for a quick visualization of the data).

First, I took a statistical approach: using text mining (also known as text data mining), I analyzed patterns and trends across keywords on Twitter during the conference and identified some interesting insights.

So how does text mining work? Let’s get into the process using a random quote from Antoine de Saint-Exupéry’s most famous novella, The Little Prince:

“One only understands the things that one tames,” said the fox. “Men have
no more time to understand anything. They buy things all ready made at the
shops. But there is no shop anywhere where one can buy friendship, and so
men have no friends any more. If you want a friend, tame me. . .”

The word “one” is encountered three times in this passage; the words “buy,” “friend,” “men,” “shop,” “tame,” “things,” and “understand” appear two times each (counting variants such as “shop” and “shops” together). Already from these little glimpses we can get a feel for what the entire text contains. For such a short passage we can count the words by hand, but what happens when you have larger sets of text data?
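For larger texts, the counting is easy to automate. The following is an illustrative Python sketch, not the project's actual code; the variable names are my own. It lowercases the quote, splits it into words, and tallies them:

```python
import re
from collections import Counter

quote = (
    "One only understands the things that one tames, said the fox. "
    "Men have no more time to understand anything. They buy things all "
    "ready made at the shops. But there is no shop anywhere where one can "
    "buy friendship, and so men have no friends any more. "
    "If you want a friend, tame me."
)

# Lowercase the text and keep only runs of letters as words.
words = re.findall(r"[a-z]+", quote.lower())
word_counts = Counter(words)

for word in ("one", "things", "buy", "men"):
    print(word, word_counts[word])
```

Note that this exact-match count treats “shop” and “shops” as different words; merging such variants requires stemming.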

For this project, I used the widely known text mining algorithm quoted in a number of blog posts around the web. To start, I prepared the text in the proper format, which means removing numbers and punctuation and transforming the letters to lowercase, so “If you want a friend, tame me” became “if you want a friend tame me.”
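As a rough illustration of this preparation step, here is a minimal Python sketch. The function name `basic_clean` is hypothetical, and the sketch only handles ASCII punctuation:

```python
import re
import string

def basic_clean(text):
    """Lowercase the text, then drop digits and ASCII punctuation."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)  # remove numbers
    # string.punctuation covers ASCII marks only; curly quotes would need extra handling.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())  # collapse leftover whitespace

print(basic_clean("If you want a friend, tame me."))
```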

In the case of text from the web, I also needed to remove all odd characters and URLs, such as @, %, #, “http://” and so on. The last and most important step when cleaning texts is to remove the so-called “stopwords.” These are frequently used words such as pronouns, articles, and prepositions. To go back to our first example, after these steps the original phrase would be transformed into:

one understands things one tames said fox men time understand
anything buy things all ready made shops shop anywhere one
can buy friendship men friends want friend tame
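The stopword step can be sketched in Python as follows. The stopword set below is an assumption: I hand-picked it so that it reproduces the example above, and it is not the list used in the project (real stopword lists are much longer).

```python
# Hypothetical stopword set, chosen only to reproduce the example above.
STOPWORDS = {
    "a", "and", "any", "at", "but", "have", "if", "is", "me", "more",
    "no", "only", "so", "that", "the", "there", "they", "to", "where", "you",
}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop every word that appears in the stopword set."""
    return " ".join(word for word in text.split() if word not in stopwords)

# The quote after lowercasing and punctuation removal.
cleaned_quote = (
    "one only understands the things that one tames said the fox men have "
    "no more time to understand anything they buy things all ready made at "
    "the shops but there is no shop anywhere where one can buy friendship "
    "and so men have no friends any more if you want a friend tame me"
)

print(remove_stopwords(cleaned_quote))
```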

In the case of tweets, we can add more words that can pollute our sample, such as the hashtag itself (i.e. #hrtechconf) as well as different combinations of the words it contains.
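For tweets, the cleaning might look like the Python sketch below. Everything in it is illustrative: the sample tweet is made up, the function name `clean_tweet` is hypothetical, and the set of extra stopwords is my guess at the hashtag variants one might exclude.

```python
import re

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Hypothetical hashtag-derived words to exclude from the sample.
EXTRA_STOPWORDS = {"hrtechconf", "hrtechconference", "hrtech"}

def clean_tweet(tweet):
    """Lowercase a tweet, strip URLs and odd characters, drop hashtag words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)         # remove whole URLs first
    text = NON_LETTER_RE.sub(" ", text)  # remove @, %, #, digits, punctuation
    words = [w for w in text.split() if w not in EXTRA_STOPWORDS]
    return " ".join(words)

sample = "Loving the #HRTechConf keynote on analytics! https://t.co/abc123 via @someuser"
print(clean_tweet(sample))
```

In this sketch only the “@” character is removed, so the handle itself stays in the text as a word. Dropping handles entirely would be an equally reasonable choice.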

With the text clean and ready, I started mining it.

From 6,400 tweets I counted 4,700 unique words, 3,000 of which I encountered more than once. If a tweet consisted on average of three words (besides those that were already cleaned out), the sample held roughly 19,200 words (6,400 × 3), so the 3,000 repeated terms amount to about every sixth word. Going back to the three-word assumption, it also means that the content of roughly every second tweet had already appeared elsewhere.
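The sketch below shows, in Python, how unique and repeated words can be counted on a toy set of cleaned tweets, and then redoes the back-of-envelope arithmetic with the figures above. The toy tweets are made up, and reading the two ratios as “repeated terms against total words” and “repeated terms against tweets” is my interpretation, so treat it as one possible reading.

```python
from collections import Counter

def vocabulary_stats(cleaned_tweets):
    """Return (number of unique words, number of words seen more than once)."""
    counts = Counter(word for tweet in cleaned_tweets for word in tweet.split())
    unique_terms = len(counts)
    repeated_terms = sum(1 for count in counts.values() if count > 1)
    return unique_terms, repeated_terms

# Made-up toy data, not tweets from the conference.
toy_tweets = [
    "data drives recruiting",
    "learn data analytics",
    "recruiting needs analytics",
]
print(vocabulary_stats(toy_tweets))

# Figures from the post; three words per tweet is the post's own assumption.
tweets, words_per_tweet, repeated = 6400, 3, 3000
total_words = tweets * words_per_tweet
print(total_words)                         # estimated words in the sample
print(round(total_words / repeated, 1))    # words per repeated term
print(round(tweets / repeated, 1))         # tweets per repeated term
```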

The number of encounters of most frequent words

For an easy visualization of the words and their frequencies, I created the above chart. You’ve probably noticed the strange word “analyt.” While the text is being prepared in the cleaning phase, words get cut down to their stems (see “the method stem completion” at the source). Afterward, I restored the original form of the words by mapping the stems back to the original text (the stemCompletion2 method from the source). In this step the algorithm sometimes fails to recognize the full form of a word, which is why we ended up with “analyt” instead of “analytics.”
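To show how a stem like “analyt” can survive, here is a simplified Python sketch of stemming followed by stem completion. It is not the stemCompletion2 method from the source: `crude_stem` is a deliberately naive suffix stripper, and `complete_stem` is a hypothetical helper that picks the most frequent known word starting with the stem.

```python
from collections import Counter

# Naive suffix list for illustration only; real stemmers are far more careful.
SUFFIXES = ("ics", "ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def complete_stem(stem, dictionary_counts):
    """Map a stem to the most frequent dictionary word that starts with it."""
    candidates = [word for word in dictionary_counts if word.startswith(stem)]
    if not candidates:
        return stem  # no known full form: the bare stem is left as-is
    return max(candidates, key=lambda word: dictionary_counts[word])

# Made-up dictionary that happens to lack any word starting with "analyt".
seen_words = Counter(["recruiting", "recruiting", "recruiter", "friends"])

for word in ("recruiting", "analytics"):
    stem = crude_stem(word)
    print(word, "->", stem, "->", complete_stem(stem, seen_words))
```

When the completion dictionary contains no word starting with the stem, the stem is returned unchanged, which is one way a truncated form can end up in a chart.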

Another commonly used representation of these frequencies is the so-called “Wordcloud.” Here, I set the frequency threshold much lower so I could populate the cloud: had I kept the lower boundary at 250 occurrences, the cloud would have contained only 17 terms, and a proper cloud needs more words.

A Wordcloud from the HRTechConference tweets
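The effect of the frequency threshold can be illustrated with a small Python sketch. The counts below are invented for the example, and `frequent_terms` is a hypothetical helper:

```python
from collections import Counter

def frequent_terms(term_counts, min_count):
    """Keep only the terms that occur at least min_count times."""
    return {term: count for term, count in term_counts.items() if count >= min_count}

# Invented counts, not the conference data.
term_counts = Counter({"data": 400, "learn": 310, "recruit": 260, "women": 120, "tool": 45})

print(sorted(frequent_terms(term_counts, 250)))  # high threshold: few terms
print(sorted(frequent_terms(term_counts, 40)))   # lower threshold: fuller cloud
```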

Next, I wanted to see how these terms correlated with one another.

How did they connect? This graph shows the most frequent words and their correlations: the thicker the line, the more frequently the two terms occur together. For example, the thickest line in the graph is between the words “learn” and “data.” “Learn” is also connected with the word “recruit,” and the word “hr” is connected with the word “technological.”

The correlation between frequent terms. A thicker line represents a stronger connection between the words
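One simple way to get line weights for such a graph is to count how often two terms appear in the same tweet. The Python sketch below does that on made-up tweets. It is an assumption about how the connection strength could be computed, not necessarily how the graph above was built.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for each pair of terms, the documents that contain both."""
    pair_counts = Counter()
    for document in documents:
        terms = sorted(set(document.split()))  # each term once per document
        pair_counts.update(combinations(terms, 2))
    return pair_counts

# Made-up cleaned tweets.
toy_tweets = [
    "learn data",
    "learn data recruit",
    "hr technological",
    "learn recruit",
]

pairs = cooccurrence_counts(toy_tweets)
for pair, count in pairs.most_common():
    print(pair, count)
```

The pairs with the highest counts would get the thickest lines.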

Finally, I want to mention a few things about “associations.” An association in text mining expresses how the occurrence of a word X relates to the occurrence of a word Y.

Say we have a set of birds in different combinations:
1) Hummingbird, owl
2) Owl, goldcrest, starling
3) Owl, goldcrest, starling, penguin
4) Penguin, owl, goldcrest, hawk
5) Starling, hummingbird, hawk

In this set, an association would be {owl, goldcrest} to {starling}. Two out of five rows contain all three of them {owl, goldcrest, starling}, so this association holds in 0.4 (or 40 percent) of the cases. In other words, an association describes the occurrence of one word based on the occurrence of another.
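The bird example can be checked with a few lines of Python. The 0.4 figure is what association-rule mining usually calls the support of the rule. The sketch also computes the confidence (how often starling appears when owl and goldcrest do), a standard companion measure that the example above does not use. The function names are my own.

```python
rows = [
    {"hummingbird", "owl"},
    {"owl", "goldcrest", "starling"},
    {"owl", "goldcrest", "starling", "penguin"},
    {"penguin", "owl", "goldcrest", "hawk"},
    {"starling", "hummingbird", "hawk"},
]

def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(1 for row in rows if itemset <= row) / len(rows)

def confidence(antecedent, consequent, rows):
    """Among rows containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, rows) / support(antecedent, rows)

rule_support = support({"owl", "goldcrest", "starling"}, rows)
rule_confidence = confidence({"owl", "goldcrest"}, {"starling"}, rows)

print(rule_support)     # 2 of 5 rows
print(rule_confidence)  # 2 of the 3 rows with owl and goldcrest
```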

I searched for associations of terms like “women,” “tech,” “recruit,” and “inclusion.” The results are shown in the table below. It is fascinating that the word “women” is associated with the word “fearless” with a correlation weight of 60 percent. The word “tech” stood next to “women” with a strength of 24 percent. It is no surprise that “inclusion” involved “flexible,” while words like “tool,” “machine,” and “guide” were among the associations for the word “recruit.”
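A correlation weight between two terms can be computed in several ways. The Python sketch below uses the Pearson correlation between term-presence vectors (1 if a tweet contains the term, 0 otherwise). That choice is an assumption on my part, and the tweets are made up, so the number it prints has nothing to do with the table.

```python
from math import sqrt

def term_vector(term, documents):
    """1 for each document that contains the term, 0 otherwise."""
    return [1 if term in document.split() else 0 for document in documents]

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        return 0.0  # a term present in all or no documents carries no signal
    return covariance / sqrt(variance_x * variance_y)

# Made-up cleaned tweets.
toy_tweets = [
    "fearless women lead",
    "women tech",
    "fearless women",
    "tech tools",
    "recruit tools",
]

weight = pearson(term_vector("women", toy_tweets), term_vector("fearless", toy_tweets))
print(round(weight, 2))
```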

Associations can be useful if you want to find other people who are interested in your topics: we found @lisamsterling and @brynnespeak as top posters surrounding the HR Tech Conference, for example. The associations also highlighted other important hashtags, like #bloggingjobs and #hireonlinkedin.

Associations on keywords

As we have seen, text data mining opens up many possibilities and gives us a new perspective on what people are actually talking about. It let me gain valuable insights from attendees while I was halfway across the world.

