Elezioni.io Is a News Aggregator and Analysis Tool About the Italian General Elections Held on March 4th, 2018

Simone Lippolis


Published in Simone Lippolis

8 min read · May 28, 2018


This article has been updated in August 2018.

Italian general elections are a mess and always have been, but in 2018 the chaos reached its peak. The disintegration of the traditional parties in the mid-90s (the Second Republic) was followed by 20 years of center-right (read: Berlusconi) versus center-left (Partito Democratico) dualism. Recently, though, Italy has experienced, like most Western countries, a rise of populism that led to the beginning of the so-called Third Republic. The political scenario is so confused that the entire campaign was built on fear and conflict instead of positive, proactive proposals. This is why Carlo Zapponi and I tried to build something to clear up our own ideas and, maybe, to help others vote in an informed way, without forgetting that Italy obtained a horrific 52nd place in RSF’s World Press Freedom Index.

The concept

A lot of other websites and services already analyse voters’ sentiment, social networks, and politicians’ popularity. What we wanted to check was how the press was reacting to events related to the election, while giving voters an instrument to better understand the facts. That is why we started collecting news coverage about the elections and looked for a way to analyse it. How are different sources covering the same events? What is the sentiment of the press about politicians’ actions? How strong are the reactions from the readers?

What’s Elezioni.io?

We see Elezioni.io more as a platform than as a website. We started our analysis with the Italian General Election of March 2018, but we are ready to start from scratch for the next local and European elections.

The website is built on two main areas: the archive, where a reader can find almost all the articles published online about politicians or parties (elezioni.io), and a real-time data visualization that lets the reader access the same news from a different perspective (facciafaccia.elezioni.io). Each news item is categorised by the coalition it talks about (multiple categories are allowed if an article talks about, for instance, a street fight between communists and fascists), tags are added, and the sentiment is extracted using an NLP engine. The popularity of the article is then monitored for the five days following publication, to check how the voters’ perspective changes and how long each piece of news lives on social networks.

This huge collection of information is browsable in two different ways: in the archive you can find all the articles, cross-referenced and tagged so that you can easily aggregate news about the same story; in the faccia a faccia section, by contrast, you see the same news with a focus on popularity and sentiment.
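The five-day monitoring window mentioned above can be pictured as a simple schedule of check times. This is only a sketch in Python (the real backend is PHP): the function name `monitoring_schedule` and the six-hour interval are assumptions made for illustration, since the article only says that popularity is checked "at regular intervals" for five days.

```python
from datetime import datetime, timedelta


def monitoring_schedule(published_at, interval_hours=6, days=5):
    """Return the times at which an article's popularity would be sampled.

    The interval is a hypothetical value; only the five-day window
    comes from the article.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    end = published_at + timedelta(days=days)
    step = timedelta(hours=interval_hours)
    checks = []
    current = published_at + step
    while current <= end:
        checks.append(current)
        current += step
    return checks
```

A cron-driven bot could compare the current time with such a schedule and stop sampling an article once the last check has passed.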

How does it work?

The data is collected and analysed by a set of web bots and scrapers. The collection and validation of the data, its analysis, and social network monitoring are all asynchronous tasks: while one bot searches for new articles and validates them, another one analyses the queue of already-validated news, and a third one, for each valid article, monitors Facebook and Twitter at regular intervals to check its popularity. Other bots are used to clean and normalise the data to allow some real-time analysis. The archive and the faccia a faccia visualisation are fed through a REST API.
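The division of labour between the bots can be sketched as three independent stages that hand work to each other through a queue. The Python below is an illustration under assumptions, not the project's code: the `Article` fields, the validity check, and the `fetch_shares` callback are all hypothetical, and the sentiment value is a placeholder where a real bot would call an NLP service.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class Article:
    url: str
    title: str
    sentiment: Optional[float] = None  # filled in by the analysis bot
    popularity_samples: List[int] = field(default_factory=list)


def collector_bot(found: List[Article], validated: Deque[Article]) -> None:
    """Queue only the articles that pass a (hypothetical) validity check."""
    for article in found:
        if article.url.startswith("https://") and article.title.strip():
            validated.append(article)


def analysis_bot(validated: Deque[Article], analysed: List[Article]) -> None:
    """Drain the queue of validated articles and attach a sentiment score."""
    while validated:
        article = validated.popleft()
        article.sentiment = 0.0  # placeholder for a call to an NLP engine
        analysed.append(article)


def monitor_bot(analysed: List[Article], fetch_shares: Callable[[str], int]) -> None:
    """Record one popularity sample for every analysed article."""
    for article in analysed:
        article.popularity_samples.append(fetch_shares(article.url))
```

Because each stage only reads from and writes to shared storage, each one can run on its own schedule without waiting for the others.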

Analysis

We also ran some analyses on top of the data we collected. To avoid the risk of influencing the data, we only ran quantitative analyses and avoided qualitative ones.

… the data we collected was perfectly aligned with the final result when we talk about the winners, while it failed in predicting the fall of Partito Democratico…

Our projections before the elections, based on press coverage (activity, first) and social network activity (popularity, second). As you can see, there is a huge, scary black sector representing neo-fascist parties, which in the actual election obtained around 0.9% of the votes.

There are different areas. In the one we informally call fantapolitica (or fantasy politics), we tried to predict the results of the election by basing our projection on two indexes: the number of news stories about each coalition, and the number of shares of articles about each coalition. We stopped generating this data on March 4th, the day of the elections, to highlight the difference between the “perceived” weight of each coalition and its real electoral weight. The result is stunning: the data we collected was perfectly aligned with the final result when we talk about the winners (Movimento 5 Stelle and Lega, the latter being part of the center-right coalition), while it failed in predicting the fall of Partito Democratico (center-left coalition). The two extremes (left and right) also got results completely out of line with our projection: Liberi e Uguali (left) obtained 3.3% against a projection of around 20%, while Casa Pound Italia and Forza Nuova (both extreme-right parties) obtained 0.95% against a projection of around 25%.
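One plausible reading of this projection is that each coalition's "perceived" weight is simply its share of the total count, whether the count is articles (activity) or shares (popularity). The sketch below assumes that reading; the function name `projected_shares` and the coalition labels used in the usage note are hypothetical.

```python
from typing import Dict


def projected_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw per-coalition counts into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        # No data collected yet: every coalition gets a zero share.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * count / total for name, count in counts.items()}
```

Calling `projected_shares({"coalition_a": 30, "coalition_b": 10})` with these made-up counts yields 75% and 25%. Running it once on article counts and once on share counts gives the two projections that can be compared with the official results.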

Other analyses were run against single “milestones”: we wanted to check how a particular news story influenced both the news coverage and the social network activity related to the coalition involved. What we obtained highlights a few facts. First, impartiality while running this kind of analysis is important, since facts that we considered important didn’t receive the expected social network coverage (just one example: the corruption case in Campania that involved prominent local Partito Democratico politicians). Second, people get “bored” of, or lose interest in, similar news stories: within seven days, two different Forza Nuova and Casa Pound activists were beaten in two different street fights; the first caused a peak in both news coverage and social network activity for the right coalition, while the second was almost ignored by social networks.

… impartiality while running this kind of analysis is important, since facts that we considered important didn’t receive the expected social network coverage; people get “bored” of or uninterested in similar news stories…

The timeline of the coverage (number of articles, gray) and reactions (or “popularity”, coloured) for four different coalitions related to specific milestones. From top to bottom: a) 5 Stars Movement’s leader announces the composition of his government (before the elections); b) activist of the far-left party “Potere al Popolo” gets stabbed in Perugia; c) activist of the far-right party “Forza Nuova” assaulted in Palermo; d) the leader of the right-wing party “Fratelli d’Italia” organises a rally in Rome, but her allies don’t show up.

We also built some collision matrices: one covering coalitions, the other covering politicians of different coalitions. The idea is to highlight whether there are news stories that mention different entities together, and which pairs are most frequent. Again, these indexes do not consider why the entities are mentioned together.
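A matrix of this kind boils down to counting, for every pair of entities, the number of stories that mention both. Here is a minimal Python sketch under that assumption; the function name `co_occurrence` and the input shape (one list of entity names per article) are illustrative choices, not the project's actual data model.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_occurrence(articles: Iterable[List[str]]) -> Counter:
    """Count how many articles mention each unordered pair of entities."""
    pairs: Counter = Counter()
    for entities in articles:
        # Deduplicate and sort so ("A", "B") and ("B", "A") share one key.
        for pair in combinations(sorted(set(entities)), 2):
            pairs[pair] += 1
    return pairs


def most_frequent_pair(pairs: Counter) -> Tuple[Tuple[str, str], int]:
    """Return the pair mentioned together most often, with its count."""
    return pairs.most_common(1)[0]
```

The counter only records that two entities appear in the same story, which matches the caveat above: it says nothing about why they appear together.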

The last area of analysis is focused on media. We wanted to check if there is a relation between activity (the number of news stories published) and popularity (the number of shares on Social Networks). This chart is updated once a day, but it did not show huge changes during the entire campaign. The result is that there isn’t a direct correlation between the two indexes.

“ilGiornale.it” is the most active website but scores very low on popularity; “Sky.it” is the opposite.
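One common way to quantify the relationship between activity and popularity is the Pearson correlation coefficient, computed over one (activity, popularity) pair per media source. The article does not say which measure was used, so treat this Python sketch as one possible approach; the function name `pearson` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)
```

A value close to 0 would be consistent with the observation that the most active sources are not necessarily the most shared ones; values near 1 or -1 would indicate a strong linear relationship.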

After the elections

A few months after the elections, we decided to change the front page of the website and replace it with a final analysis of the media coverage for the period between February 1st and June 6th, the day after the new Italian Government took office.

Press coverage for each coalition, before and after the elections, in absolute values and in percentage

We created a number of visualisations to highlight what happened during the four-month period we monitored, including media coverage for each coalition and the variation in coverage before and after the elections (this visualisation, in particular, was inspired by the fact that the results of the elections for the main coalitions were very close to the coverage they received from the media).

We included some additional visualisations highlighting how much this election’s buzzwords were used by the press (including bill proposals like the so-called Flat Tax and Reddito di Cittadinanza, and other hot topics like racism or the rise of a new fascist threat). We also kept an eye on the leaders of the different parties (some of them aggregated into coalitions) to check whether the results of the elections changed the way the press covers politics. The results are not surprising: the leaders of the two parties that formed the new government (Matteo Salvini, Lega, and Luigi Di Maio, Movimento 5 Stelle) received more coverage after the elections. Pictured here are some examples of the featured visualisations.

Press coverage variation for each coalition, percentage value before and after the elections, compared to their electoral results.

Technology

From an architectural point of view, we split the application into two parts. The backend runs on an Ubuntu 16.04 VM with PHP, on top of my PHP Boilerplate. Its CLI module lets us run the bots from cron.

The scraper leverages Guzzle, an HTTP client for PHP. The AI is powered by WIT.ai, which we query using Thomas Gallice’s PHP client. Social network monitoring is performed using Abraham Williams’ TwitterOAuth class and the Facebook Graph SDK for PHP. All the data is saved in a MySQL database, while the JSON responses to API queries are cached on disk.
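Caching JSON responses on disk can be as simple as one file per query. The sketch below shows the idea in Python rather than PHP; the function name `cached_json`, the hashing of the query string into a file name, and the absence of any expiry policy are all assumptions made for illustration.

```python
import hashlib
import json
import os
from typing import Any, Callable


def cached_json(cache_dir: str, query: str, compute: Callable[[], Any]) -> Any:
    """Return the cached response for `query`, computing it on a miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the query so any URL or parameter string maps to a safe file name.
    file_name = hashlib.sha256(query.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, file_name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    data = compute()  # e.g. run the database query behind the API endpoint
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data
```

With a cache like this, repeated requests for the same query never reach the database; invalidation could be as simple as deleting the cache files whenever the bots write new data.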

The frontend is static HTML built using ReactJS and D3. Each change to the frontend is automatically deployed to a custom Dokku instance (the archive) or to GitHub Pages (Faccia a Faccia) using GitLab’s CI integration: each push to the correct branch on the GitLab repo triggers a rebuild and deploy of the project.

Challenges

During the few weeks we spent designing and coding the platform, our major concern was the AI service. WIT.ai is a great service and works like a charm, but unfortunately it was built with the English language in mind: training it to properly understand Italian, the names of Italian politicians and parties, and especially sentiment took a lot of time and still needs some improvement. Another challenge was finding the correct way to compute the popularity index. To create it we consider tweets, retweets, favourited tweets, and the number of followers of the users who perform these actions; on Facebook we check shares, comments, and reactions. Giving the correct weight to each of these data points required a lot of tests. Our final decision is based on the fact that in Italy Facebook is more popular than Twitter; we also tried to give greater weight to actions that are actual shares rather than likes and favourites.
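A popularity index of this kind can be sketched as a weighted sum of the individual signals. The weights below are invented purely for illustration (the article does not publish the real ones); they only respect the two stated principles, that Facebook counts more than Twitter and that shares count more than likes or favourites. Follower counts are left out to keep the sketch short.

```python
from typing import Dict

# Hypothetical weights, not the values used by Elezioni.io.
TWITTER_WEIGHTS = {"tweets": 1.0, "retweets": 1.0, "favourites": 0.25}
FACEBOOK_WEIGHTS = {"shares": 2.0, "comments": 1.0, "reactions": 0.5}


def popularity_index(twitter: Dict[str, int], facebook: Dict[str, int]) -> float:
    """Combine per-network interaction counts into a single score."""
    score = sum(
        weight * twitter.get(signal, 0)
        for signal, weight in TWITTER_WEIGHTS.items()
    )
    score += sum(
        weight * facebook.get(signal, 0)
        for signal, weight in FACEBOOK_WEIGHTS.items()
    )
    return score
```

Keeping the weights in plain dictionaries makes it easy to rerun the index with different values, which is the kind of trial and error described above.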

My name is Simone Lippolis. After spending almost ten years as a Design Technologist at frog, I am now with Cisco as a Data Visualization Expert. This article is part of my online portfolio, which you can access at simonelippolis.com.