Elezioni.io Is a News Aggregator and Analysis Tool About the Italian General Elections Held on March 4th, 2018
This article was updated in August 2018.
Italian general elections have always been messy, but in 2018 the chaos reached its peak. The disintegration of the traditional parties in the mid-90s (the start of the Second Republic) was followed by 20 years of center-right (read: Berlusconi) versus center-left (Partito Democratico) dualism. Recently, though, Italy has experienced, like most Western countries, a rise of populism that led to the beginning of the so-called Third Republic. The political scene is so confused that the entire campaign was built on fear and conflict instead of positive, proactive proposals. This is why Carlo Zapponi and I tried to build something to clarify our own ideas and, maybe, to help others vote in an informed way, bearing in mind that Italy ranked a dismal 52nd in RSF’s World Press Freedom Index.
The concept
A lot of other websites and services already analyse voters’ sentiment, social networks, and politicians’ popularity. What we wanted to check was how the press was reacting to events related to the election and, at the same time, to give voters an instrument to better understand the facts. This is why we started collecting news coverage of the elections and looked for a way to analyse it. How are different sources covering the same events? What is the sentiment of the press about politicians’ actions? How strong are the reactions from the readers?
What’s Elezioni.io?
We see Elezioni.io more as a platform than a website. We started our analysis with the Italian General Election of March 2018, but we are ready to start from scratch for the next local and European elections.
The website is built around two main areas: the archive (elezioni.io), where a reader can find almost all the articles published online about politicians or parties, and a real-time data visualisation (facciafaccia.elezioni.io) that gives access to the same news from a different perspective. Each news item is categorised by the coalition it talks about (multiple categories are allowed if an article covers, for instance, a street fight between communists and fascists), tags are added, and the sentiment is extracted using an NLP engine. The popularity of the article is then monitored for the five days following publication, to check how the voters’ perspective changes and how long each piece of news stays alive on social networks.
This huge collection of information can be browsed in two different ways: in the archive you can find all the articles, cross-referenced and tagged so that you can easily aggregate news about the same story; in the faccia a faccia section, by contrast, you see the same news with a focus on popularity and sentiment.
How does it work?
The data is collected and analysed by a set of web bots and scrapers. The collection and validation of the data, its analysis, and the social network monitoring are all asynchronous tasks: while one bot searches for new articles and validates them, another one analyses the queue of already-validated news, and a third one, for each valid article, monitors Facebook and Twitter at regular intervals to check its popularity. Other bots clean and normalise the data to allow some real-time analysis. The archive and the faccia a faccia visualisation are fed through a REST API.
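As an illustration only, the hand-off between the three bots could be sketched like this in Python. This is not the production code: every function name, field name, and the toy validation rule below are hypothetical, and the real monitoring bot polls repeatedly instead of recording a single sample.

```python
import asyncio

# Hypothetical stand-ins for the real bots; names and data are illustrative.

async def collector(raw_articles, validated):
    """Validate raw articles and queue the valid ones for analysis."""
    for article in raw_articles:
        if article.get("url") and article.get("title"):  # toy validation rule
            await validated.put(article)
    await validated.put(None)  # sentinel: no more articles


async def analyser(validated, analysed):
    """Mark each validated article as analysed and pass it on."""
    while (article := await validated.get()) is not None:
        article["analysed"] = True  # placeholder for tagging and sentiment
        await analysed.put(article)
    await analysed.put(None)


async def monitor(analysed, results):
    """Record one popularity sample per analysed article."""
    while (article := await analysed.get()) is not None:
        article["popularity_samples"] = [0]  # placeholder for a social lookup
        results.append(article)


async def run_pipeline(raw_articles):
    validated, analysed, results = asyncio.Queue(), asyncio.Queue(), []
    # The three stages run concurrently and communicate only through queues.
    await asyncio.gather(
        collector(raw_articles, validated),
        analyser(validated, analysed),
        monitor(analysed, results),
    )
    return results
```

Calling `asyncio.run(run_pipeline(articles))` with a list of dictionaries returns only the articles that passed validation, each marked as analysed. The queues keep the stages decoupled, so a slow stage does not block the others.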
Analysis
We also ran some analyses on top of the data we collected. To avoid the risk of influencing the data, we only ran quantitative analyses and avoided qualitative ones.
There are different areas. In the one that we informally call fantapolitica (or fantasy politics), we tried to predict the results of the election, basing our projection on two different indexes: the number of news stories about each coalition, and the number of shares of articles about each coalition. We stopped generating this data on March 4th, the day of the elections, to highlight the difference between the “perceived” weight of each coalition and its real electoral weight. The result is striking: the data we collected was closely aligned with the final result for the winners (Movimento 5 Stelle and Lega, the latter part of the center-right coalition), while it failed to predict the fall of Partito Democratico (center-left coalition). The two extremes also got results that were far from our projection: Liberi e Uguali, on the left, obtained 3.3% against a projection of around 20%; Casa Pound Italia and Forza Nuova, both extreme-right parties, obtained 0.95% against a projection of around 25%.
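To make the idea of a “perceived weight” concrete, here is a minimal Python sketch that turns raw counts (stories or shares per coalition) into percentages. It assumes the projection is a simple normalisation of the counts, which is an assumption for illustration; the function name and the sample numbers are invented.

```python
def projected_shares(counts):
    """Convert per-coalition counts into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        # No stories or shares at all: every coalition gets a 0% projection.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}


# Invented numbers, only to show the shape of the input and output.
example = projected_shares({"coalition_a": 30, "coalition_b": 10})
```

With the invented counts above, `coalition_a` gets 75% of the perceived weight and `coalition_b` 25%. The same function works for either index, story counts or share counts.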
Other analyses were run against single “milestones”: we wanted to check how a particular news story influenced both the news coverage and the social network activity related to the coalition involved. What we obtained highlights a few facts. First, impartiality while running this kind of analysis is important, since facts that we considered important didn’t receive the expected social network coverage (just one example: the corruption case in Campania that involved prominent local Partito Democratico politicians). Second, people get “bored” of or uninterested in similar news stories: within seven days, two different Forza Nuova and Casa Pound activists were beaten in two different street fights; the first one caused a peak in both news coverage and social network activity for the right coalition, while the second one was almost ignored by social networks.
We also built some collision matrices: one covering coalitions, the other covering politicians of different coalitions. The idea is to highlight whether there are news stories that mention different entities together, and which pairs are most frequent. Again, these indexes do not consider why these entities are mentioned together.
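A matrix of this kind boils down to counting, for each article, every pair of entities it mentions. The Python sketch below shows one way to do that; the function name and the input format (one list of entity names per article) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def co_mentions(articles):
    """Count how many articles mention each unordered pair of entities.

    `articles` is an iterable of entity-name lists, one list per article.
    """
    pairs = Counter()
    for entities in articles:
        # Deduplicate and sort so that (A, B) and (B, A) share one key.
        for first, second in combinations(sorted(set(entities)), 2):
            pairs[(first, second)] += 1
    return pairs
```

`co_mentions(...).most_common()` then lists the most frequent pairs first. Like the matrices described above, the count says nothing about why two entities appear in the same story.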
The last area of analysis is focused on media. We wanted to check whether there is a relation between activity (the number of news stories published) and popularity (the number of shares on social networks). The chart was updated once a day, but it did not show major changes during the entire campaign. The result is that there isn’t a direct correlation between the two indexes.
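One common way to test a relation like this is the Pearson correlation coefficient between the two series, one value per media outlet. The sketch below is an assumption about how such a check could be done, not a description of the chart on the website; the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)
```

Here `xs` would hold the number of stories published by each outlet and `ys` the number of shares those stories received. A value near 0 would match the finding that publishing more does not directly translate into more shares.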
After the elections
A few months after the elections, we decided to change the front page of the website and replace it with a final analysis of the media coverage for the period between February 1st and June 6th, the day after the new Italian Government took office.
We created a number of visualisations to highlight what happened during the four-month period we monitored, including media coverage for each coalition and the variation in coverage before and after the elections (this visualisation, in particular, was inspired by the fact that the election results of the main coalitions were very close to the coverage they received from the media).
We included some additional visualisations highlighting how often this election’s buzzwords were used by the press (including bill proposals like the so-called Flat Tax and Reddito di Cittadinanza, and other hot topics like racism or the rise of a new fascist threat). We also kept an eye on the leaders of the different parties (some of them aggregated into coalitions) to check whether the results of the elections changed the way the press covers political news. The results are not surprising: the leaders of the two parties that formed the new government (Matteo Salvini, Lega, and Luigi Di Maio, Movimento 5 Stelle) received more coverage after the elections. Pictured here, some examples of the featured visualisations.
Technology
From an architectural point of view, we split the application into two parts. The backend runs on an Ubuntu 16.04 VM with PHP, on top of my PHP Boilerplate. We use its CLI module to run the bots via cron.
The scraper leverages the capabilities of Guzzle, an HTTP client for PHP. The AI is powered by WIT.ai, which we query using Thomas Gallice’s PHP Client. Social network monitoring is performed using Abraham Williams’ TwitterOAuth class and the Facebook Graph SDK for PHP. All the data is saved in a MySQL database, while the JSON responses to API queries are cached on disk.
The frontend is static HTML built using ReactJS and D3. Each change to the frontend is automatically deployed to a custom Dokku instance (the archive) or to GitHub Pages (Faccia a Faccia) using GitLab’s CI integration: each push to the correct branch on the GitLab repo triggers a rebuild and deployment of the project.
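The disk cache for API responses can be illustrated with a short Python sketch (the real backend is PHP, so this only shows the idea). The function name, the hashed file naming scheme, and the `compute` callback are all hypothetical, and cache expiry is left out.

```python
import hashlib
import json
from pathlib import Path


def cached_json(cache_dir, query, compute):
    """Return the JSON body for `query`, computing it only on a cache miss."""
    # Hash the query so that any query string maps to a safe file name.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return path.read_text(encoding="utf-8")  # cache hit: skip the database
    body = json.dumps(compute(query))  # cache miss: build the response once
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return body
```

The first call for a given query runs `compute` (in practice, a database query) and stores the result; later calls for the same query are served straight from the file.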
Challenges
During the few weeks we spent designing and coding the platform, our major concern was the AI service. WIT.ai is a great service and works like a charm, but unfortunately it was built with the English language in mind. Training it to properly understand Italian, the names of Italian politicians and parties, and especially sentiment took a lot of time and still needs some improvement. Another challenge was finding the correct way to compute the popularity index: on Twitter we consider tweets, retweets, favourites, and the number of followers of the user who performs these actions; on Facebook we check shares, comments, and reactions. Giving the correct weight to each of these signals required a lot of testing. Our final decision is based on the fact that in Italy Facebook is more popular than Twitter; we also tried to give more weight to actions that are actual shares than to likes and favourites.
My name is Simone Lippolis. After spending almost ten years as a Design Technologist at frog, I am now at Cisco as a Data Visualization Expert. This article is part of my online portfolio, which you can access at simonelippolis.com.
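To show what such an index can look like, here is a minimal Python sketch of a weighted sum. The weights are invented for the example and are not the ones we used; they only respect the two rules described above (Facebook counts more than Twitter, and shares count more than likes and favourites). The follower count of the acting user is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Raw interaction counts for one article (field names are hypothetical)."""
    fb_shares: int = 0
    fb_comments: int = 0
    fb_reactions: int = 0
    tweets: int = 0
    retweets: int = 0
    favourites: int = 0


# Illustrative weights only: Facebook above Twitter, shares above likes.
WEIGHTS = {
    "fb_shares": 3.0,
    "fb_comments": 2.0,
    "fb_reactions": 1.0,
    "tweets": 2.0,
    "retweets": 2.0,
    "favourites": 0.5,
}


def popularity(engagement):
    """Weighted sum of the interaction counts."""
    return sum(weight * getattr(engagement, name) for name, weight in WEIGHTS.items())
```

For instance, `popularity(Engagement(fb_shares=1, favourites=2))` evaluates to 4.0 with these weights. Keeping the weights in one dictionary makes it easy to rerun the same data with different values while tuning the index.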