Rihanna Is(n’t) A Word
Understanding Hashtags When You’re A Robot
Hashtags have become a huge part of our online conversation. Every unique story happening on social media has one or two main hashtags and a multitude of secondary and tertiary ones.
ZeroBot’s goal is to understand each conversation online. Therefore, it needs to know what the hashtags are and how to read them. Although this sounds incredibly easy, it turns out to be pretty difficult for ZeroBot. Here are three problems we’ve come across:
1. Separating Words
Hashtags are usually multiple words joined together. Some people might capitalize each word to help readers, but for the most part, no one complains about having to figure it out because it’s simply not that hard. Through years of reading and understanding context, we’re able to subconsciously add the proper spaces where they need to go.
Here’s a simple hashtag from Rihanna’s current tour: #AntiWorldTour
You can find Anti, World, and Tour by taking a list of all English words in the dictionary and ranking them based on frequency of use. But this doesn’t take care of all issues.
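To make the idea concrete, here is a minimal sketch of dictionary-based splitting in Python. It is not ZeroBot’s actual code: the WORD_COUNTS table, its numbers, and the function names are made up for illustration, and a real system would load a full frequency-ranked word list instead.

```python
import math

# Hypothetical counts standing in for a frequency-ranked English dictionary.
WORD_COUNTS = {
    "anti": 50, "world": 400, "tour": 120,
    "an": 3000, "or": 2500, "to": 5000,
}
TOTAL = sum(WORD_COUNTS.values())


def word_cost(word):
    """Negative log probability of a word; unknown strings cost infinity."""
    count = WORD_COUNTS.get(word)
    if count is None:
        return math.inf
    return -math.log(count / TOTAL)


def segment(hashtag):
    """Split a hashtag into dictionary words, preferring frequent ones.

    Returns None when no split uses only dictionary words.
    """
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] = (lowest total cost of splitting text[:i], start of last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start][0] + word_cost(text[start:end])
            if cost < best[end][0]:
                best[end] = (cost, start)
    if math.isinf(best[n][0]):
        return None
    # Walk the stored start positions backwards to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("#AntiWorldTour"))
```

With this toy table the only split made entirely of known words is anti, world, tour. A hashtag such as #Rihanna returns None, because no combination of dictionary words covers it.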
2. There Aren’t Always Only Two Words
Within #AntiWorldTour we also have the words “an,” “or,” and “to”: an article, a conjunction, and a preposition, each much more common than any of the correct words. If you didn’t know the context of the letters, they could be separated into pieces that aren’t all actual words: “An tiw or ld to ur.” If we assume that a hashtag is just a stream of dictionary words and favor the most common ones, this will be the result, and it does not work.
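One way to reproduce this failure is a naive splitter that pulls the most frequent words out of the letters first. The sketch below is an assumption about how such a splitter might look, not a description of ZeroBot; the counts in WORD_COUNTS and the name greedy_split are hypothetical.

```python
# Hypothetical counts: short function words dwarf the intended words.
WORD_COUNTS = {
    "to": 5000, "an": 3000, "or": 2500,
    "world": 400, "tour": 120, "anti": 50,
}


def greedy_split(hashtag):
    """Carve out known words in order of frequency, most frequent first."""
    text = hashtag.lstrip("#").lower()
    # Each piece is (string, is_known_word); start with one unresolved piece.
    pieces = [(text, False)]
    for word in sorted(WORD_COUNTS, key=WORD_COUNTS.get, reverse=True):
        next_pieces = []
        for chunk, known in pieces:
            if known or word not in chunk:
                next_pieces.append((chunk, known))
                continue
            # Cut the chunk around the first occurrence of the word.
            before, _, after = chunk.partition(word)
            if before:
                next_pieces.append((before, False))
            next_pieces.append((word, True))
            if after:
                next_pieces.append((after, False))
        pieces = next_pieces
    return [chunk for chunk, _ in pieces]


print(" ".join(greedy_split("#AntiWorldTour")))  # an tiw or ld to ur
```

Because “to,” “an,” and “or” are claimed before the rarer words get a turn, “anti,” “world,” and “tour” never have a chance to match, and the leftovers are fragments.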
3. Some Stories Have Words that Aren’t in Dictionaries
#AntiWorldTour is often tagged with #Rihanna or #riri; neither of these words is found in the dictionary. Because of this, our method for understanding these hashtags is to allow each story to be its own dictionary. Even though “an” may be more common than “anti” in English as a whole, that isn’t the case in a story about Rihanna. “Rihanna” and “riri” may not be words according to Webster’s dictionary, but they ARE words in a story about Rihanna. The posts by people at the event dictate the words that are acceptable in that subset of the English language, and this allows us to bypass the inherent logic behind how AIs read the world.
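The story-as-dictionary idea can be sketched as follows. This is an illustration under assumptions, not ZeroBot’s implementation: the sample posts are invented, the function names are hypothetical, and skipping hashtags while counting is a design choice made here so that unsplit tags do not become “words” themselves.

```python
import math
import re
from collections import Counter


def build_story_dictionary(posts):
    """Count the ordinary words used in one story's posts."""
    counts = Counter()
    for post in posts:
        # Drop hashtags so only plain words feed the dictionary (assumption).
        plain = re.sub(r"#\w+", " ", post.lower())
        counts.update(re.findall(r"[a-z]+", plain))
    return counts


def segment(hashtag, counts):
    """Split a hashtag using only words seen in the story, or return None."""
    text = hashtag.lstrip("#").lower()
    total = sum(counts.values())
    n = len(text)
    # best[i] = (lowest total cost of splitting text[:i], start of last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(end):
            count = counts.get(text[start:end], 0)
            if count == 0:
                continue  # not a word in this story
            cost = best[start][0] - math.log(count / total)
            if cost < best[end][0]:
                best[end] = (cost, start)
    if math.isinf(best[n][0]):
        return None
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Invented sample posts, for illustration only.
POSTS = [
    "Rihanna was amazing on the Anti World Tour tonight",
    "riri opening the anti world tour with stay",
    "an unreal night, rihanna killed it #AntiWorldTour",
]
STORY_COUNTS = build_story_dictionary(POSTS)

print(segment("#AntiWorldTour", STORY_COUNTS))
print(segment("#Rihanna", STORY_COUNTS))
print(segment("#riri", STORY_COUNTS))
```

In these sample posts “anti” appears more often than “an,” and “rihanna” and “riri” count as words simply because people in the story used them.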
As always, feel free to reach out to us with any questions or comments at email@example.com.