Time Segmentation: What Time Should You Tweet?

Another follow-up on my time-segmentation series, this time looking at how the approach might be applied in PR and marketing contexts. Whether selling a product, looking for a job, making new connections, or even just being on social media in general, the time that you attempt to connect with people can often make a big difference.

When you send out email blasts, tweets, status updates, or any other “public” forms of communication, you’re essentially making a cold call out into the world, hoping that someone, somewhere out there, will respond to what you have to offer. While it might not be possible to please everyone, generally speaking (especially if you’re trying to sell something) you want to maximize your chances of success by choosing a strategy that will reach the largest number of people at the best possible time.

As an example of how this might be done, I took some public data from Sentiment140, a machine learning project that originally started as a class project by three graduate students from Stanford University a few years ago. The Sentiment140 algorithm uses natural language processing and data from social media APIs in order to categorize tweets into negative (0), neutral (2), and positive (4) categories. The sample data below gives an indication of what this looks like:

Sample data from Sentiment140's public archive.

Although the screenshot above is just a small sample of the overall data, one pattern already makes itself clear: cable companies have very bad branding! This shouldn’t come as a surprise to anyone, though, if you’ve ever had to deal with them in the past.

Challenges of Analyzing Time Data

Now onto the goodies. The second data set consists of 1.6 million tweets, each categorized as positive or negative, from a random sample of users and topics. Since we’re primarily interested in understanding the responsiveness of the users, we’re going to be focusing only on the positive and negative sentiment scores for now. “Neutral” sentiments also include inconclusive results where the algorithm wasn’t able to make a distinction either way, so we’re probably doing ourselves a favor by omitting them anyway.

For the sake of simplicity, we’re going to map our time data to United States Central Time (CT, converted from UTC) as our main point of reference, since it works as a middle ground for the English-speaking demographic. We’re just here to look for broad, overarching trends:

# of tweets (out of 1.6 million) mapped to US Central Time
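For readers who want to try a similar hourly count themselves, here is a minimal sketch of the conversion step. It is not the exact process used for the chart above: it assumes the timestamps arrive as ISO 8601 strings in UTC, and the `hourly_counts` helper and the sample values are made up for illustration. It relies on Python 3.9+ and its standard `zoneinfo` module (which needs time zone data on the system).

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# US Central Time; this IANA zone handles daylight saving automatically.
CENTRAL = ZoneInfo("America/Chicago")


def hourly_counts(utc_timestamps):
    """Count tweets per hour of day (0-23) after mapping UTC to Central Time."""
    counts = Counter()
    for ts in utc_timestamps:
        # Assumption: each string is a naive ISO 8601 timestamp recorded in UTC.
        utc_dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        counts[utc_dt.astimezone(CENTRAL).hour] += 1
    return counts


# Made-up sample timestamps (UTC), not taken from the real data set.
sample = [
    "2009-06-01T15:05:00",
    "2009-06-01T15:40:00",
    "2009-06-02T02:10:00",
]

counts = hourly_counts(sample)
for hour in sorted(counts):
    print(f"{hour:02d}:00 CT -> {counts[hour]} tweet(s)")
```

Using a named zone rather than a fixed offset keeps the mapping right across daylight saving changes, which matters once the data spans more than a few months.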

This data seems to match the oft-cited claim that Twitter is busiest during late mornings and early afternoons, except, uhh, there’s also a peak at 4am? This is actually a good example of what makes analyzing time data difficult: because the data that we have isn’t adjusted to each user’s local time, it’s probably the case that the “peak hours” have slid over into the wee hours of the morning as the sun starts to rise in the east. When you send a message out to someone on the internet in a different timezone, they’re actually receiving it in a different part of the day entirely. Pretty weird, if you think about it.

Unfortunately, this weirdness tends to cause a lot of issues in time-based analyses, since the process is still very ill-defined at this point. When someone makes a chart about user statistics during certain hours of the day, what timezone are they talking about? How, where and when was their data scraped? Are the methods of analysis and production mapped to the correct timezone? Since time is relative to the observer, it’s very easy for different pieces of data to fall out of sync with one another, so it all needs to be handled with great care.

As a data practice, in order to avoid confusion and the possibility of contaminating your data, it’s important to have both the universal (UTC) and local (relative to the user/location) timestamps clearly mapped to every data point. From there, it becomes possible to build time-based products and features in a fairly reliable manner. This practice is fairly uncommon at this point, however, so it may become necessary to create a few input systems to make sure you’re getting the data that you need.
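One way to picture this practice is a record that stores the UTC instant alongside the user’s timezone, so the local time can always be derived later. The sketch below is only an illustration: the `TimedEvent` class and its field names are hypothetical, and it assumes you actually know each user’s IANA zone name, which is often the hard part.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class TimedEvent:
    """A data point that keeps its universal time and the user's zone together."""

    utc_time: datetime  # timezone-aware, always in UTC
    user_zone: str      # IANA zone name such as "Asia/Tokyo" (assumed to be known)

    @property
    def local_time(self) -> datetime:
        # Derive the wall-clock time the user actually experienced.
        return self.utc_time.astimezone(ZoneInfo(self.user_zone))


# The same instant, seen by three hypothetical users in different zones.
moment = datetime(2009, 6, 1, 15, 0, tzinfo=timezone.utc)
zones = ("America/Chicago", "Europe/London", "Asia/Tokyo")
events = [TimedEvent(moment, zone) for zone in zones]

for event in events:
    print(event.user_zone, event.local_time.strftime("%Y-%m-%d %H:%M"))
```

A single tweet sent at 15:00 UTC lands mid-morning in Chicago, mid-afternoon in London, and at midnight in Tokyo, which is exactly the kind of spread that blurs an hourly chart pinned to one timezone.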

Are You a Morning or Night Person?

Now onto the final analysis! Keeping in mind the “slidy-ness” of our data, the positive and negative sentiment scores actually create a pretty interesting pattern when mapped against each other:

1.6 Million Tweets Categorized and Charted into Positive/Negative Sentiments

Positive tweets seem to happen more often in the morning, with the negative ones slowly catching up as the day goes on. The two lines converge during “peak” hours, and then negative tweets overtake positive ones for the rest of the day. I did a few informal polls with people I knew, and the anecdotal consensus seems to be that people are generally in a better mood earlier in the day.
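To make the comparison concrete, here is a rough sketch of how the positive and negative counts could be tallied per hour. It assumes each row has already been reduced to an (hour, polarity) pair using the 0/2/4 labels described earlier; the `positive_share_by_hour` function and the sample rows are invented for illustration and are not the real data.

```python
from collections import defaultdict

# Sentiment140 polarity labels as described earlier in the post.
NEGATIVE, NEUTRAL, POSITIVE = 0, 2, 4


def positive_share_by_hour(rows):
    """Return {hour: share of positive tweets} from (hour, polarity) pairs."""
    tallies = defaultdict(lambda: {"pos": 0, "neg": 0})
    for hour, polarity in rows:
        if polarity == POSITIVE:
            tallies[hour]["pos"] += 1
        elif polarity == NEGATIVE:
            tallies[hour]["neg"] += 1
        # Neutral (2) rows are skipped, matching the choice to omit them.
    return {
        hour: t["pos"] / (t["pos"] + t["neg"])
        for hour, t in sorted(tallies.items())
    }


# Made-up rows: (hour of day, polarity label).
sample_rows = [(8, 4), (8, 4), (8, 0), (20, 0), (20, 0), (20, 4), (20, 2)]

shares = positive_share_by_hour(sample_rows)
for hour, share in shares.items():
    print(f"{hour:02d}:00 -> {share:.0%} positive")
```

Looking at the share rather than the raw counts helps separate mood from volume, since a busy hour will have more of both kinds of tweets.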

Not being a morning person myself, I can’t really relate to people who seem to have a lot of energy early in the day, but the data has spoken: if you’re looking for positive responses from Twitter, in general, it’s better to post in the morning rather than at night. Maybe the best Twitter strategy is to do all of your activity in the early morning before people start to get tired and grouchy. On the other hand, if you’re selling relaxation or comfort products you might take the opposite approach and try to catch people later on in the day.

Of course, this analysis is based on a random sample from a random point in time, so its actual applications are fairly limited. In practice, we’d need more data points to work with (localized time, geography, demographics, topics/keywords, day of the week, etc.) and an actual company/brand objective to map our findings against. But hopefully this post gives an idea of the insights that can be gathered from time-segmented data. Timing is everything, after all!