How our AI team geo-locates social media posts in non-creepy ways

Joei Chan
Linkfluence stories
7 min read · Nov 26, 2019

Good social media marketing requires good data. You need to know who your audience is, when they’re most active, and what they’re looking for.

In general, the more you know about your target audience, the more effective your social media efforts will be.

One vital aspect of this is location. Knowing where your fans and prospects live, play, and shop lets you create more tailored campaigns and better experiences, both in-store and on social.

In short, it pays to know where people are talking about your brand. But getting that information isn’t always a simple matter.

Geolocation inference on social media

Social media has made finding potential buyers easier. Users gladly tell the world about their favorite getaways, luxury products, and brunch spots. You can know where they like to go, and even when they prefer to go there.

Of course, when Twitter and Instagram users geolocalize their own social media posts, extracting the location is easy.

Unfortunately, the majority of social media posts aren’t geolocalized. Most users simply don’t tag their locations!

We ran a quick check with our social intelligence platform to look at the percentage of posts geolocalized:

From the sample of Instagram posts we looked at (green line), the yellow line represents the proportion of posts that have been geolocalized by their authors (avg. 37%).

From the sample of tweets we looked at (green line), the yellow line represents the proportion of tweets that have been geolocalized by their authors (avg. 0.3%).

As you can see, only around 37% of Instagram posts are geolocalized. For tweets, it’s less than 1%.

This means that if we relied solely on user-geolocalized posts, we wouldn’t have a big enough sample to identify global trends through geographical data analysis.

Geo-inference and Artificial Intelligence to the rescue

One of our main missions as a social media intelligence company is to help brands and businesses find their target audience — both digitally and physically. Our platform, Radarly, analyzes billions of posts every day to deliver actionable consumer insights.

We need a way to find the locations of the users who don’t provide them. Not through hacking or taking private user information; that would be creepy and almost certainly illegal.

Instead, the Artificial Intelligence Team at Linkfluence built a machine learning model. It uses internally developed AI and deep learning technologies to automatically infer geo-location from the users’ posts.

In other words, we can boost the rate of geolocalized users based on what we already know about social media use. Here’s how it works.

How social media geolocation typically works

Previously, we relied on the user to fill in the location field on Twitter. For example, you can see that Neymar JR sets his location as Paris, France:

This makes life easy since the text is clear. Unfortunately, social media is filled with noise, and most users don’t fill this field. Sometimes, they even give information that’s completely unrelated to their location.

If you look at Rihanna’s Twitter bio, you’ll see that she put her album name, “Anti”, in the location field:

Classic social listening tools would be left with no location information. Or worse, perhaps “Anti” is actually a city somewhere!

To go a step further in our geo-location inference technology, we relied on deep learning to develop an upgraded inference model.

How our new geo-inference model works

We developed a new geo-inference model using deep learning and artificial neural networks. As mentioned in a previous post, deep learning is a sub-type of machine learning. Artificial neural networks are systems inspired by the biological neural networks that make up the human brain: they are built from artificial neurons connected by synapses. When you stack many layers of neurons, the network becomes “deep”, and that’s what we call “deep learning”.

Deep learning has delivered state-of-the-art results in Natural Language Processing, because it can capture complex patterns by learning them from large numbers of examples.

Hence, we took advantage of the large number of tweets Linkfluence has collected over the years and developed our own deep learning architecture. However, before “training” a neural network, it’s important to decide which “features” or “factors” it should focus on.

The Wikipedia definition of a feature in machine learning is straightforward: it’s “an individual measurable property or characteristic of a phenomenon being observed”.

For example, imagine you want to predict the result of a Football World Cup game. One interesting “feature” would be how many times each team has beaten the other over the last five years.

For the geolocation inference on Twitter, we chose the following features (a rough sketch of how they might be pulled out of a tweet follows the list):

  • The text of the tweet
  • The text of the biography
  • The text of the location
  • The name and surname of the user
  • The language of the biography
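
To make this concrete, here is a rough Python sketch (not our production code) of how these textual features might be extracted from a raw tweet. The field names follow the classic Twitter API v1.1 payload, and the langdetect call is just one possible way to guess the biography language:

```python
# Illustrative only: field names assume a classic Twitter API v1.1 payload.
from langdetect import detect  # pip install langdetect

def textual_features(tweet: dict) -> dict:
    user = tweet.get("user", {})
    bio = user.get("description") or ""
    return {
        "tweet_text":    tweet.get("text", ""),
        "bio_text":      bio,
        "location_text": user.get("location") or "",  # free-form, often empty or noisy
        "user_name":     user.get("name") or "",
        "bio_language":  detect(bio) if bio.strip() else "und",  # "und" = undetermined
    }
```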

Here’s another example. Linkfluence’s beloved CTO, Hugo Zanghi, tweeted last April at the latest Facebook Developer Conference in San José, California:

You can see that Hugo’s Twitter location field is empty.

But we can use the tweet itself and the other text fields to add more information for the model. The time zone and UTC offset, combined with the creation time of the tweet, are also rich in information.

In this graph from Huang and Carley, you’ll see that users indeed have different Twitter posting habits depending on their location across the globe, which makes a lot of sense:

Distribution of the volume of tweets per country versus the UTC posting time.

We can train the neural network with this pattern to better locate a user (and thus a post).

Based on this graph, we added more “features” to the model (see the sketch after this list):

  • The creation time of the tweet
  • The language of the text
  • The time zone of the user
  • The UTC offset of the user
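
Here is a matching sketch for these extra features. The "created_at" format and the "utc_offset" and "time_zone" fields are assumptions based on the classic Twitter API v1.1 payload, not necessarily what we use internally:

```python
# Illustrative only: derives the temporal features from a v1.1-style tweet.
from datetime import datetime, timedelta

CREATED_AT_FMT = "%a %b %d %H:%M:%S +0000 %Y"  # e.g. "Tue Apr 30 11:05:00 +0000 2019"

def temporal_features(tweet: dict) -> dict:
    user = tweet.get("user", {})
    created_utc = datetime.strptime(tweet["created_at"], CREATED_AT_FMT)
    offset_s = user.get("utc_offset") or 0          # seconds east of UTC, may be missing
    local_time = created_utc + timedelta(seconds=offset_s)
    return {
        "posting_hour_utc":   created_utc.hour,
        "posting_hour_local": local_time.hour,       # ~4 am in the example below
        "tweet_language":     tweet.get("lang", "und"),
        "time_zone":          user.get("time_zone") or "unknown",
        "utc_offset_hours":   offset_s / 3600,
    }
```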

Even though Hugo’s account is set to a French time zone, the tweet was created at around 4 am French time. So the model can infer that this tweet doesn’t fit the posting-time distribution for France shown in the graph above.

In other words, Hugo was probably not in France when he posted it.

Our model will rely on other “features” to find clues for the location of the post. In this case, Hugo’s tweet mentioned San Jose, California in the text.

Using all of the information available, the model concludes that Hugo was more likely in the US when posting this tweet, and not in his usual location in France.

How do we know this information is accurate?

It’s natural to wonder how accurate this data may be. It’s based on “inferences,” after all. So, of course, we run tests.

An easy way to verify a machine learning model’s accuracy is to set aside a slice of the data before training. We then train the model and evaluate it on that held-out data, which the neural network has never seen.

From this, we can obtain the “test accuracy” of our model. Basically, does the model give us the results we expect to see?
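
As a hedged illustration of this hold-out evaluation (not our exact pipeline), assuming a Keras-style model and pre-built feature and label arrays:

```python
# Illustrative hold-out evaluation: train on one slice, test on unseen data.
from sklearn.model_selection import train_test_split

def holdout_accuracy(model, X, y_country, test_fraction=0.1):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_country, test_size=test_fraction, random_state=42
    )
    model.fit(X_train, y_train, epochs=5, verbose=0)    # the model never sees X_test here
    _, test_accuracy = model.evaluate(X_test, y_test, verbose=0)  # assumes one metric: accuracy
    return test_accuracy
```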

Our new state-of-the-art geo-inference technology not only improves on the accuracy of the previous algorithm, but also gives our social media intelligence suite a bigger volume of geolocalized data.

This means that we can help brands access even more comprehensive social insights on their presence, audience, and industry, to feed their marketing and sales strategies.

Specifics: the neural network architecture

Because some readers need to know the nuts and bolts, here’s what the model looks like:

Basically, each text field is “tokenized” (split into chunks called “tokens”, with one token corresponding approximately to one word), and each token is projected into a vector space of fixed dimension (see word vectors for more details on this subject).
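
For readers who want to see this step in code, here is a minimal sketch of the “tokenize then embed” idea using modern TensorFlow/Keras; the vocabulary size, sequence length, and embedding dimension are arbitrary, and this is not necessarily the tooling we used in production:

```python
# Illustrative "tokenize then embed" step for one text field (TensorFlow 2.x).
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 50_000, 40, 128

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=MAX_LEN
)
vectorizer.adapt(tf.constant(["i love brunch in san jose",
                              "direction paris pour le week-end"]))

embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)

tokens = vectorizer(tf.constant(["greetings from san jose california"]))  # ints, shape (1, 40)
vectors = embedding(tokens)   # shape (1, 40, 128): one fixed-size vector per token
```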

We apply three different convolutions to each textual field and concatenate (join together) the results into one vector. We then add the other “features”, like posting time or user language, to the vector and feed it into a classic fully connected neural network.

We add two softmax layers to output the probability of the tweet being geolocated in each country and each city.
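
Putting the pieces together, here is a rough Keras sketch of such an architecture. The layer sizes, the number of countries and cities, and the count of extra features are made-up placeholders, so treat this as an approximation of the diagram above rather than our exact model:

```python
# Illustrative architecture: per-field convolutions, concatenation with the
# non-textual features, a fully connected block, and two softmax heads.
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN, VOCAB_SIZE, EMBED_DIM = 40, 50_000, 128
N_COUNTRIES, N_CITIES, N_EXTRA = 200, 3_000, 8       # placeholder sizes
TEXT_FIELDS = ["tweet", "bio", "location", "name"]

def text_branch(field):
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32", name=f"{field}_tokens")
    emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inp)
    # three parallel convolutions with different window sizes, max-pooled over time
    convs = [layers.GlobalMaxPooling1D()(layers.Conv1D(64, k, activation="relu")(emb))
             for k in (2, 3, 4)]
    return inp, layers.concatenate(convs)

inputs, branches = zip(*(text_branch(f) for f in TEXT_FIELDS))
extra = layers.Input(shape=(N_EXTRA,), name="extra_features")   # time, language, offset

x = layers.concatenate(list(branches) + [extra])     # one big vector
x = layers.Dense(256, activation="relu")(x)          # classic fully connected part
x = layers.Dense(128, activation="relu")(x)

country = layers.Dense(N_COUNTRIES, activation="softmax", name="country")(x)
city = layers.Dense(N_CITIES, activation="softmax", name="city")(x)

model = tf.keras.Model(inputs=list(inputs) + [extra], outputs=[country, city])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```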

Once the deep learning architecture is built, you need data to train your model. Luckily at Linkfluence, we have billions of posts coming from the web, across every major social media platform.

To train this model, we took a sample of 30 million geolocalized tweets and fed it to the model. Once trained using our graphics processing units (GPUs), we reached 0.964 accuracy on countries. That means our neural network predicts the correct country 96.4% of the time.
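
As a hedged sketch of what that training step could look like with a Keras model and TensorFlow’s multi-GPU support (the model builder and dataset pipelines here are placeholders, not our production setup):

```python
# Illustrative multi-GPU training and evaluation for a Keras model.
import tensorflow as tf

def train_on_gpus(build_model_fn, train_ds, test_ds, epochs=3):
    strategy = tf.distribute.MirroredStrategy()      # use every available GPU
    with strategy.scope():                           # build the model under the strategy
        model = build_model_fn()
    model.fit(train_ds, epochs=epochs)
    return model.evaluate(test_ds, return_dict=True) # e.g. {"country_accuracy": 0.96, ...}
```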

More social media location data, with more accuracy

Thanks to this new artificial intelligence model, Linkfluence users now know where every social post comes from, not just the posts whose authors explicitly state their location.

And because people move around and post from new places, you see the location of the post itself, not just the location listed in a user’s social media bio.

Which means more accurate data for you, and better information to build your next marketing campaign.

We’re always eager to exchange ideas and answer questions. Feel free to reach out anytime.
