GNIP: The Social Fire Hose

Wolox Engineering · Published in Wolox · Sep 3, 2015

The social media age, which redefined how people relate to one another, began around 2008. After the Facebook boom, many other social networks were born. People use them to share their life experiences, say how they feel and publish videos that show off their talents. This content generates an enormous volume of data that can be used strategically to create value. Can you imagine being able to access and process all of this information in real time? GNIP makes this possible!

GNIP is one of the largest providers of social media data in the world. It is fully integrated with several platforms, such as Twitter and Tumblr, and has an exclusive agreement with Foursquare. GNIP also handles a series of public APIs from many other social networks, such as Facebook and Instagram.

GNIP’s top service is “The Social Fire Hose”, which lets us connect to the public data flow of social networks. For Twitter, for instance, GNIP delivers each and every tweet in real time. Its advantage over most public APIs that offer similar access to their data streams is that GNIP imposes no limitations: all the data registered in the social network is delivered to whoever is connected.

Imagine the amount of processing power needed to analyze all of this information in real time: we are talking about at least 6,000 tweets per second. Not all of the data running through these streams is relevant, and anyone who connects will be looking for particular data. For that reason, GNIP developed a filtering language that lets users set rules and choose which information they do and don’t want to receive.
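To put that figure in perspective, here is a quick back-of-the-envelope calculation. It only uses the 6,000 tweets per second quoted above and assumes, for simplicity, that the rate stays constant throughout the day; the function name is ours.

```python
# Rough daily volume estimate based on the rate quoted in this post.
TWEETS_PER_SECOND = 6_000          # lower bound mentioned above
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def tweets_per_day(rate_per_second: int) -> int:
    """Return the number of tweets in one day at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


daily = tweets_per_day(TWEETS_PER_SECOND)
print(f"{daily:,} tweets per day")  # 518,400,000 tweets per day
```

More than half a billion objects per day is far more than most applications want to receive unfiltered.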

PowerTrack is the language GNIP developed to let users filter data and take advantage of the stream. Users create rules, which are then applied to the full stream to decide which objects match and which don’t.

In the rule syntax, the expression is specified first. Optionally, a tag can then be added to tell the rules apart, since more than one rule can be applied to the stream. Let’s assume we want to see all the tweets from Argentina that mention Wolox and whose author follows at least 500 people but no more than 10,000. The rule would be specified as follows:

{"value": "friends_count:500..10000 contains:wolox country_code:AR"}
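As a minimal sketch, this is how such a rule could be assembled in Python together with the optional tag. The "tag" key, the outer "rules" list and the tag name "wolox-argentina" are our assumptions for illustration; check the PowerTrack documentation for the exact payload your endpoint expects.

```python
import json
from typing import Optional


def build_rule(value: str, tag: Optional[str] = None) -> dict:
    """Build one rule object: the expression first, then an optional tag."""
    rule = {"value": value}
    if tag is not None:
        # Assumed key name; the tag only labels the rule, it does not filter.
        rule["tag"] = tag
    return rule


rule = build_rule(
    "friends_count:500..10000 contains:wolox country_code:AR",
    tag="wolox-argentina",  # hypothetical tag name
)

# Assumed wrapper: several rules sent together as a list under "rules".
payload = json.dumps({"rules": [rule]})
print(payload)
```

Building the rule from a helper like this keeps the expression and its tag together, which helps once several rules are applied to the same stream.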

The number of possible filters is endless. Many different subjects are discussed on social networks, and being aware of all this information is highly valuable. By using filters, we spend no resources processing irrelevant information and can focus on what is truly valuable.
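To make the effect of the example rule concrete, here is a toy local version of the same three conditions applied to a handful of made-up tweets. This is not how GNIP evaluates rules, and the flat dictionary fields ("text", "friends_count", "country_code") are a simplification we chose for illustration, not the real tweet format.

```python
from typing import Iterable, Iterator


def matches_rule(tweet: dict) -> bool:
    """Local stand-in for: friends_count:500..10000 contains:wolox country_code:AR"""
    return (
        500 <= tweet["friends_count"] <= 10_000   # follows 500 to 10,000 people
        and "wolox" in tweet["text"].lower()      # mentions Wolox, any casing
        and tweet["country_code"] == "AR"         # tweeted from Argentina
    )


def filter_stream(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield only the tweets that satisfy the rule, skipping everything else."""
    for tweet in tweets:
        if matches_rule(tweet):
            yield tweet


# Made-up sample data; only the first tweet satisfies all three conditions.
sample = [
    {"text": "Great talk by Wolox today", "friends_count": 800, "country_code": "AR"},
    {"text": "Great talk by Wolox today", "friends_count": 120, "country_code": "AR"},
    {"text": "Lunch time", "friends_count": 900, "country_code": "AR"},
    {"text": "Visiting wolox", "friends_count": 2000, "country_code": "US"},
]

matched = list(filter_stream(sample))
print(len(matched))  # 1
```

With PowerTrack this selection happens on GNIP’s side, so the tweets that fail the rule never reach our application at all.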

GNIP has a wide range of uses, giving users access to large volumes of data in real time. It had such an impact on the social media world that in 2014 it was acquired by Twitter, its largest data source.

Posted by Ignacio Rivera (ignacio.rivera@wolox.com.ar)

www.wolox.com.ar
