
Mining Twitter Data for Product Launch Insights

Millions of people use Twitter every day to discover and share what is happening around them. When new features roll out, your product team might be curious how fans are raving about the launch on Twitter, or how to enlist powerful tweeters to spread the news, increase awareness, and ultimately drive adoption. Thankfully, Twitter offers a free, rich dataset of user demographics, comments, and behaviors that you can learn from, all accessible through public APIs. In this blog post, I introduce the kinds of questions you can answer with Twitter's open data, how to access the data through the APIs, caveats about Twitter's data quality, and visualization techniques.

Background: I used to work on Microsoft SharePoint, a file-storage and collaboration application. Microsoft Ignite is the major annual conference where SharePoint announces critical features and future roadmaps. I was curious how SharePoint fans responded to MS Ignite 2017, held in Orlando last September, as users witnessed SharePoint's transformative modernization in 2017.

Insights: I managed to answer the following questions by mining Twitter's data, and the insights primarily serve to improve go-to-market strategies.

  1. What kind of rich media do people (re)tweet the most? Do articles such as blog posts and official press releases, images, or videos resonate more with users? The intent behind this question is to encourage producing the most popular types of content (e.g., blog posts), making product launch news more discoverable, since tweets carrying certain kinds of rich media are more likely to be shared. A sketch of how to classify and tally tweets by media type appears after this list.
  2. Which SharePoint users contributed the most original content? Did Microsoft employees, SharePoint MVPs, or ordinary users in the SharePoint Twitter community write more about the new features? Microsoft MVPs are high-value contributors in the user community, nominated by the company, and SharePoint MVPs are expected to create more original content than regular users. I was curious whether the SharePoint org had missed any die-hard contributors who were creating valuable content on Twitter.
  3. Who were our influential users (measured by follower count), and did they tweet more frequently during and after the Ignite conference than users with fewer followers? I asked this because I wanted to know how the influential fans' tweeting frequency compared with that of regular users. If the influential SharePoint users didn't tweet much during this year's conference, could the marketing team work with them to maximize impact next year?
  4. Global distribution of tweets using #SharePoint and #MSIgnite (interactive map visualization): the map shows that users on the east coast of the US and in western Europe were most excited about SharePoint's presence at Ignite. However, Twitter's wide adoption in those two regions may have skewed the result.
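
To give a sense of how question 1 can be answered, here is a minimal sketch that classifies tweets by the rich media they carry and tallies retweets per category. It assumes a list of tweet dictionaries in the shape returned by Twitter's v1.1 Search API; the function names are my own.

    from collections import Counter, defaultdict

    def media_type(tweet):
        """Roughly classify a v1.1 tweet dict by the rich media it carries."""
        # Native photos, videos, and GIFs show up under extended_entities.media
        media = tweet.get("extended_entities", {}).get("media", [])
        if media:
            return media[0].get("type", "photo")   # "photo", "video", or "animated_gif"
        if tweet.get("entities", {}).get("urls"):
            return "link"                          # blog posts, press releases, etc.
        return "text_only"

    def engagement_by_media_type(tweets):
        """Count tweets and sum retweets for each rich-media category."""
        counts, retweets = Counter(), defaultdict(int)
        for tweet in tweets:
            kind = media_type(tweet)
            counts[kind] += 1
            retweets[kind] += tweet.get("retweet_count", 0)
        return counts, retweets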

Twitter API Access and Data Quality: Marco Bonzanini wrote a great blog about Mining Twitter Data with Python, and I was able to follow it, relying primarily on the Search Tweets API. Here are a few caveats:

  1. Data Freshness: If you use a standard developer account, you can only access the latest seven days of tweets.
  2. Rate limit: Standard (free) search caps you at 450 requests per 15-minute window, and you have to filter by keyword or hashtag. As Marco Bonzanini mentions in his blog, you can programmatically retry after the 15-minute window; see the download-loop sketch after this list. I was able to download ~3K tweets before the search results were exhausted.
  3. Geolocation Data: The geolocation that powers the map above does not represent the place where each tweet was sent. Instead, the location is self-reported by users, and sometimes users enter a meaningless value like "Internet". As explained in the Tweet Object Data Dictionary, there is a coordinates object that is supposed to hold the longitude and latitude of the place a tweet was sent from, but unfortunately that field was 98% null in my sample. I had to fall back on the "location" field in the user object and translate the string into coordinate tuples using the geopy library.
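
For reference, below is a minimal sketch of the download loop described above, hitting the v1.1 standard search endpoint with app-only bearer-token authentication. BEARER_TOKEN is a placeholder for your own credential, and the query and limits are just illustrative assumptions.

    import time
    import requests

    SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
    BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder: app-only token from your developer account

    def search_tweets(query, max_tweets=3000):
        """Page through standard search results, sleeping out rate-limit windows."""
        headers = {"Authorization": "Bearer " + BEARER_TOKEN}
        params = {"q": query, "count": 100, "result_type": "recent", "tweet_mode": "extended"}
        url, tweets = SEARCH_URL, []
        while len(tweets) < max_tweets:
            resp = requests.get(url, headers=headers, params=params)
            if resp.status_code == 429:         # rate limited: wait out the 15-minute window
                time.sleep(15 * 60)
                continue
            resp.raise_for_status()
            page = resp.json()
            tweets.extend(page["statuses"])
            next_results = page.get("search_metadata", {}).get("next_results")
            if not next_results:                # no older results left within the ~7-day window
                break
            # next_results is a ready-made query string such as "?max_id=...&q=..."
            url, params = SEARCH_URL + next_results, None
        return tweets

    tweets = search_tweets("#SharePoint OR #MSIgnite")
    print(len(tweets), "tweets collected")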

Map Visualization: I used geojson.io to produce the map above. The tool abstracts the visualization away from the user and only requires input data formatted as GeoJSON. Steven Metts created a helpful video explaining how to format a GeoJSON file.
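
Here is a rough sketch of how the map data can be prepared: geocoding the self-reported user locations with geopy's Nominatim geocoder and writing a GeoJSON FeatureCollection that geojson.io can render. The function and file names are illustrative, not the exact code behind the map above.

    import json
    import time
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="sharepoint-ignite-map")  # any descriptive app name works

    def tweets_to_geojson(tweets, out_path="tweets.geojson"):
        """Geocode self-reported user locations and write a FeatureCollection for geojson.io."""
        features = []
        for tweet in tweets:
            location = tweet.get("user", {}).get("location")
            if not location:
                continue
            point = geolocator.geocode(location)    # returns None for junk values like "Internet"
            time.sleep(1)                           # be polite to the free Nominatim service
            if point is None:
                continue
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [point.longitude, point.latitude],  # GeoJSON order is lon, lat
                },
                "properties": {"screen_name": tweet["user"]["screen_name"]},
            })
        with open(out_path, "w") as f:
            json.dump({"type": "FeatureCollection", "features": features}, f)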

As a global social media platform, Twitter is indeed a goldmine when it comes to collecting feedback on your product. Nearly all tweets are public and easily extractable, making quantitative analysis possible whenever product teams are curious about how users perceive a feature launch. Anyone with moderate exposure to Python can gather large datasets rather quickly thanks to Twitter's friendly API.