Switch on your GPS. Now!
This title does not perfectly represent the gist of this post; it’s more like a reminder for myself. I never add geolocation to my tweets, just because ‘what for?’, right? Do you? Do you even have a Twitter account? If not, well, no worries: as @bgoncalves said during the lecture I’m going to talk about below, ‘Twitter is a strange place, anyway, so we’d better switch to Wikipedia’. By the way, follow Bruno on Twitter! He needs the data :)
These are my impressions of the Coffee&Data Science talk that took place at the UCUniversity yesterday. Well, it’s 2 a.m. now and this is only the second paragraph, so you will probably read this piece in the morning, which means I have to correct myself and write ‘the day before yesterday’. But, anyway.

So, again, that question: do you use your geolocation on social media at all? On Facebook, most of us probably do, just because we like to add this ‘attending something hu-u-uge’ status to the Timeline, don’t we? If you’re a girl, then you probably also add geolocation to your Instagram photos, because it just gets you more likes, nothing extraordinary. But Twitter? Does it help in any way? Most likely it just bothers you, and sometimes you don’t even want other people to know that you exist on Twitter (like your employees, right?). I mean, look at these tweets on my Favourites panel :)

But I never thought of this from another perspective: everything around us is data. More or less useful data. Why not benefit from it?
When I was travelling to Belgium this May, I didn’t know much about that country. Three things, maybe a few more: their national football team is sometimes called The Red Devils, the movie In Bruges was filmed in Bruges, and lots of international institutions, like the European Parliament, are based in Brussels.

After the trip, I knew a few things more: about the beer, for instance. And about this ‘imaginary line’ that cuts Belgium into two parts by language, namely French vs. Flemish. Well, sure, this separation has a historical background, and I’m not going into the details here, but I met some people who spoke English to me, Flemish to their friends and French to their parents. And they all claimed that these linguistic borders are very exact. But how can you get the precise partition curve?
The traditional approach would be to interview a small number of people and draw the map from their answers. But you can’t survey more than, say, a thousand informants, because it’s not that easy to gather all this information manually. Moreover, we don’t want our data to be biased towards standard collocations and formal language, so we can’t really rely on regional newspapers or other mass media corpora. But, hey, we have tremendous amounts of data online! That’s where Twitter comes in, actually. Sure, that data is very biased as well. Only one country in the world seems to have 1% of inhabitants using Twitter, and it’s … Kuwait! Belgium is nowhere near that point. But, nevertheless, just have a look at the distribution:

That’s how your tweets about your cat/evening meal/monstrous boss can contribute to the science :)
For further details on this topic, check PLoS ONE 8, e61981, The Twitter of Babel: Mapping World Languages through Microblogging Platforms.
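If you wonder how a language map like that could be put together, here is a toy sketch in Python. To be clear, this is not the method from the paper, just my guess at the simplest possible version: take geotagged tweets, drop them into square grid cells, and let the majority language win in each cell. The function name, the `cell_size` parameter and the sample records are all made up for illustration, and I assume the language of each tweet has already been detected.

```python
from collections import Counter, defaultdict


def dominant_language_by_cell(tweets, cell_size=0.25):
    """Return the most common language for each grid cell.

    `tweets` is an iterable of (latitude, longitude, language_code) records.
    `cell_size` is the side of a square cell in degrees (an arbitrary choice).
    """
    counts = defaultdict(Counter)
    for lat, lon, lang in tweets:
        # Floor division assigns every point to the cell that contains it.
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell][lang] += 1
    # Keep only the majority language of each cell.
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


# Made-up sample records, NOT real data.
sample_tweets = [
    (51.22, 4.40, "nl"),  # roughly Antwerp
    (51.21, 4.42, "nl"),
    (51.20, 4.41, "fr"),
    (50.63, 5.57, "fr"),  # roughly Liège
    (50.64, 5.58, "fr"),
]

language_map = dominant_language_by_cell(sample_tweets)
for cell, lang in sorted(language_map.items()):
    print(cell, lang)
```

With enough tweets and small enough cells, colouring each cell by its majority language should make the ‘imaginary line’ show up on its own. A real study would also have to deal with bots, tourists and language detection errors, which this sketch happily ignores.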
Also, you can explore the distribution of languages in, say, London, and advise a client who wants to open an Asian restaurant to pick a district with a large Asian population. So many applications and opportunities, just think about it for a while!
In Ukraine, we have a rather poor Twitter community. And we don’t use geolocation that much :) So there’s still some room for tuning these experiments to our reality, but, who knows, maybe someday a group of smart guys will find out where the actual mental separation line lies on Ukrainian ground.
Another cool thread of the talk was dedicated to linguistic change. And the crucial question was, ‘Is English becoming American?’ To cut the story short: yes. Sorry, British accent lovers (read: Benedict Cumberbatch lovers), that’s the real world, and Nicki Minaj lyrics seem to influence society more than the Sherlock series.

There is no doubt that both countries, the United Kingdom and the United States, shape what’s going on in the world, and what gives them this power is, among other things, the English language. Sure, they differ in spelling and vocabulary, and that’s exactly what lets one analyze the data and figure out who’s dominant nowadays. (There was already a spoiler, sorry, no secrets left to keep.) Actually, Grammarly suggests I use ‘analyse’ instead of ‘analyze’ a few lines above :) That’s the American dominance in practice: I can’t help writing verbs with ‘z’ wherever possible.
So, how did the ‘empire on which the sun never sets’ end up losing its leadership? And what do I mean by ‘losing leadership’? Well, there is a lot of interesting stuff to discover if you read this, but I’ll show you just one diagram taken from that paper:
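To make the spelling idea a bit more tangible, here is a minimal Python sketch. It is only my own illustration and not the measure used in the research: I assume a tiny hand-picked list of spelling pairs and a made-up score, (american - british) / (american + british), which gives +1 for purely American spelling and -1 for purely British. The names `SPELLING_PAIRS` and `spelling_polarization` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative list of American -> British spelling pairs (an assumption,
# a real study would need a much larger vocabulary).
SPELLING_PAIRS = {
    "analyze": "analyse",
    "color": "colour",
    "center": "centre",
    "organize": "organise",
}
AMERICAN = set(SPELLING_PAIRS)
BRITISH = set(SPELLING_PAIRS.values())


def spelling_polarization(text):
    """Score a text from -1.0 (all British) to +1.0 (all American).

    Returns None when the text contains none of the listed words.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    american = sum(words[w] for w in AMERICAN)
    british = sum(words[w] for w in BRITISH)
    total = american + british
    if total == 0:
        return None
    return (american - british) / total


print(spelling_polarization("I analyze the color of the city centre"))
print(spelling_polarization("The colour of the centre"))
```

Run this over tweets grouped by country, or over books grouped by year, and you get a rough picture of which side a place or a period leans towards.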

Surprisingly, in Western Europe, where English is traditionally taught in its British variety, the American influence is impressive. British English, though, prevails in former colonies, which is hardly a surprise.
One can use prior knowledge of history to understand what actually happened, and when. The simple answer could be ‘wars’. Yes, but if we consider the wars to be the only factor, then America should have ended up a loser, as lots of immigrants moved to the USA and affected the language tremendously. So there must have been something else. I would never have thought of it myself (at least, not in the blink of an eye), but Bruno suggested that the biggest jump in the polarization ratio happened around 1828, when An American Dictionary of the English Language was published with the explicit goal of systematizing the way English was written in America.

You can find lots of links on this particular topic posted by Bruno on his Twitter, not just scientific papers but also news articles. Here’s a joke (on Twitter, sure) from his colleague, Anastasios Noulas (also a teacher at the Lviv Data Science Summer School, on Urban DS):

Yeah, lots of funny stuff there :) Thank me later.
Soooo… Lots of other things were discussed: can you predict a person’s age or sex from their tweets? Can we predict revolutions or other geopolitical events based on language issues? Can we watch how new words emerge on Twitter, and what can we do with that sort of information?
I don’t have an appropriate ending. It’s 4:30 a.m. and I have to write the English exam in a couple of hours. Let’s assume that this was my practice session. Wish me luck:)