A Former Marketer’s Struggle To Understand Twitter (Just a Little Bit Better) Using Natural Language Processing, Part 1

First, a note: this content will be expanded into a series of posts over the next few days. Why? My class and I are tackling a new project that involves both introspection and data analysis, and I want to document and share my findings.

Our crew of data science students has taken on a large project in which we analyze various texts and videos in order to give a local marketing firm the strategy it needs to boost user engagement across its social media channels. Sounds easy, right?

As a former marketer, I find this task incredibly challenging, and it brushes up against some of the things I investigated before moving into data science. With respect to Twitter, consider the following questions:

  • You’re a Twitter user. Why do you choose to like some tweets but not others?
  • When scrolling through your Twitter feed, are you more likely to favorite, retweet, comment or reply to the tweets you find enjoyable?
  • Some people treat “favorites” as a way to bookmark a tweet. Do you fall into the category of people who hit “favorite” to bookmark? How does one distinguish a bookmarker from a liker? Does that action even matter?
  • What “matters” more coming from you: a favorite, a retweet, a comment or a reply?
  • Does a tweet matter more or less if it echoes content also featured on another social media platform?
  • How does one account for time and seasonality when discussing which tweets matter and which don’t?

None of these questions are easy to answer. And knowing the answers to these questions requires an understanding of other elements, like whether a tweet had promotional dollars behind it, how many impressions it garnered, whether it utilized a call to action, what its half-life was, who the tweet was directed at and how it was intended to be used in the first place.

There are some modeling techniques that allow data scientists to overlook the nuances of Twitter conversation. One of my interests in data science is natural language processing (NLP), the interpretation and classification of natural language. NLP has made headway in the past decade but has hit a massive wall in the form of social media, which is NOT natural language. (You don’t talk in emojis. Do you?)
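To make the “not natural language” point a bit more concrete, here is a minimal sketch of what tokenizing a tweet might involve. It is my own illustration, not part of the class project: the pattern `TOKEN_RE` and the function `tokenize_tweet` are hypothetical names, and the rules (keep URLs, @mentions, #hashtags and emoji as their own tokens, lowercase everything except URLs) are assumptions about what a first pass could look like.

```python
import re

# Hypothetical token pattern (an assumption, not a standard):
# alternatives are tried left to right, so URLs win over plain words.
TOKEN_RE = re.compile(
    r"https?://\S+"        # URLs, kept whole
    r"|@\w+"               # @mentions
    r"|#\w+"               # #hashtags
    r"|\w+(?:['’]\w+)?"    # words, including simple contractions
    r"|[^\w\s]"            # any other single symbol (punctuation, emoji)
)


def tokenize_tweet(text):
    """Split a tweet into tokens, keeping Twitter-specific pieces intact."""
    tokens = TOKEN_RE.findall(text)
    # Lowercase everything except URLs, whose paths can be case-sensitive.
    return [t if t.startswith("http") else t.lower() for t in tokens]


if __name__ == "__main__":
    print(tokenize_tweet("Loving the new #DataScience course @Bob 😍 https://t.co/abc"))
```

Even this toy version shows the problem: a plain word splitter would throw away the hashtag, the mention and the emoji, which may be exactly the parts that drive engagement. It also has clear limits. For example, emoji built from several code points (skin tones, joined sequences) would be split into separate tokens here.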

Fortunately or unfortunately, the impact that social media has on our lives is too important to overlook, which is why thousands of people have dedicated their careers to developing a scientific and business-level understanding of how to ‘game’ Twitter (and Facebook. And YouTube. And Instagram. And blogs. And Snapchat. And Search. Etc.) for the sake of generating more followers and, ultimately, getting people to convert in some way (through likes, buys, or shares).

My goal here is to connect the nuances of Twitter and “unnatural” language to the nuances of natural language processing, so I can derive models that predict and infer what an optimal tweet looks like.

There is some data science to follow shortly, including exploratory data analysis (EDA) and tweet categorization through Latent Dirichlet Allocation.

Thanks for reading. More to come shortly.