Understanding Italian politics with Twitter and Python

Simone Ricci
10 min read · Nov 29, 2019

Last updated November 24th, 2019

In Italy, too, Twitter has lately been used to analyze the political landscape, for two main reasons: it is widely used by all political parties and politicians and, more than other social networks, it grants easy access to its API, making it simple to request and process large amounts of data.

In recent weeks, Italian television has shown many social media experts using advanced and expensive tools to perform various analyses.

As a developer, I think it is important to develop basic analysis tools and share them, so that (almost) everybody can perform their own analysis.

I began writing my basic tools and this article from this assumption, largely inspired by an interesting post by Marco Bonzanini (Mining Twitter Data with Python). Specifically, I analyzed the tweets of 9 of the most important Italian politicians over one month (from October 23rd to November 24th, 2019).

I start my article by sharing the results and a short personal analysis, so that everybody can read what I have inferred from the data ingested by my little script. At the bottom of the page I share the source code, a short description of its logic and instructions for reuse.

Followers

A better view of the data is available here.

Looking at the number of followers, we notice that one politician alone has almost half of all the followers collected by the whole set of 9 people analyzed by this prototype. These numbers are quite far from the actual composition of the Parliament, and even farther from the latest polls (Matteo Renzi’s new party Italia Viva reached 5%), but they make sense if we look at the last 5–6 years of Italian politics.

The two leaders of the Italian version of the alt-right have, respectively, a bit more and a bit less than 1 million followers. Both are steadily gaining in the latest polls, and both have been running a full-time social media campaign in recent years.

Prime Minister Giuseppe Conte, whose political biography is less than two years old (he was an almost unknown lawyer in his previous life), has fewer followers than Pippo Civati, a left-wing politician not affiliated with any party at the moment.

The politician with the fewest followers is two-time former Prime Minister Silvio Berlusconi, a living picture of “another world”, incapable of relating to new media.

Followers’ growth from October 23rd to November 24th, 2019.

A better view of the data is available here.

Looking at follower growth rates helps us better understand the trends of Italian politics over the last month (and, incidentally, the absolute numbers show strong similarities with the polls). Since October 23rd, the politicians who have gained the most followers in percentage terms are Silvio Berlusconi (+4.3%), Carlo Calenda (+2.8%) and Giuseppe Conte (+2.7%).

It must be said that campaigns to acquire new followers have costs that grow almost proportionally, so significant growth is more expensive and complicated for those who already have a large number of followers.

Note that in absolute terms, far-right leaders are growing the most: Matteo Salvini (+10,518) and especially Giorgia Meloni (+13,108), the target of a successful viral campaign in the first days of November. The minimal growth of Matteo Renzi (0.04%), who seems to have disinvested from Twitter, is also noteworthy.
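The difference between percentage growth and absolute growth can be made concrete with a few lines of Python. This is only a minimal sketch: the function name is illustrative and the follower counts are invented placeholders, not the figures collected for this article.

```python
def follower_growth(start_count, end_count):
    """Return (absolute growth, percentage growth) between two snapshots."""
    absolute = end_count - start_count
    percent = absolute / start_count * 100 if start_count else 0.0
    return absolute, percent


# Placeholder numbers (NOT real data): a small account and a large one.
small = follower_growth(10_000, 10_430)
large = follower_growth(1_000_000, 1_010_000)

print(f"small account: +{small[0]} followers ({small[1]:.1f}%)")
print(f"large account: +{large[0]} followers ({large[1]:.1f}%)")
```

In this made-up case the small account wins in percentage terms while the large one gains far more followers in absolute terms, which is the same pattern described above.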

Total number of tweets

A better view of the data is available here.

The first finding, which some might find surprising, is that someone tweets more than Salvini. The second finding is that the reason probably lies in the age of the Twitter account.

Age of the account (in days)

A better view of the data is available here.

Average tweets per day

A better view of the data is available here.

Examining the average tweets per day, the former Minister of the Interior rises to the top of the rankings with the remarkable figure of almost 12 tweets a day, methodically spaced throughout the day.

Apart from Civati, who seems to use Twitter with the casualness of an early adopter, partly freeing himself from the logic of the electoral campaign (as we will see later when analyzing what politicians tweet), most of the others post between 4 and 6 tweets per day. It would be useful, though, to analyze the distribution of Giorgia Meloni’s tweets more deeply, to understand whether her use of the medium has increased substantially in recent times.

The exceptions are Renzi, Di Maio and Conte: the last two in particular seem to use the medium only as strictly necessary, for a few institutional or party communications.
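The average used in this section is simply the total number of tweets divided by the age of the account in days. Here is a minimal sketch of that arithmetic; the function name and the sample values are placeholders, and the sketch assumes a timezone-aware creation date (Tweepy exposes the two inputs as `statuses_count` and `created_at` on the user object).

```python
from datetime import datetime, timezone


def tweets_per_day(total_tweets, created_at, now=None):
    """Return (account age in days, average tweets per day)."""
    now = now or datetime.now(timezone.utc)
    age_days = max((now - created_at).days, 1)  # avoid dividing by zero on day one
    return age_days, total_tweets / age_days


# Placeholder values (NOT real data): 1,200 tweets from a 100-day-old account.
created = datetime(2019, 1, 1, tzinfo=timezone.utc)
snapshot = datetime(2019, 4, 11, tzinfo=timezone.utc)
age, average = tweets_per_day(1200, created, now=snapshot)
print(f"{age} days old, {average:.1f} tweets per day")
```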

Friends

A better view of the data is available here.

A friend on Twitter is a user you decide to follow, whose updates you receive on your timeline. Those who have many friends on Twitter are therefore ordinary people, who use the platform to stay informed or to interact with other users, activists, and so on. Bots, VIPs and politicians tend to have few or none. In general, having many friends can be a sign that you use the platform as a place for discussion, while having few can mean that you use it as a megaphone, a one-way tool.

As a reference, Donald Trump has 66 million followers and 47 friends (a friends / followers ratio of about 0.00007%). Barack Obama has 111 million followers and 610 thousand friends (a ratio of about 0.5%).

The Italian politician who follows the most users on Twitter among those examined is Civati (1.7% friends / followers ratio), followed in percentage terms by Zingaretti (0.4%) and Calenda (0.4%). The others are closer to “Trumpian” percentages.
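The friends / followers ratio is a one-line calculation. The sketch below applies it to the rounded figures quoted above for the two American presidents; the function name is only illustrative.

```python
def friends_ratio_percent(friends, followers):
    """Friends as a percentage of followers (0.0 when there are no followers)."""
    return friends / followers * 100 if followers else 0.0


# Rounded figures quoted in the text: (friends, followers).
accounts = {
    "Donald Trump": (47, 66_000_000),
    "Barack Obama": (610_000, 111_000_000),
}

for name, (friends, followers) in accounts.items():
    print(f"{name}: {friends_ratio_percent(friends, followers):.5f}%")
```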

A more advanced version of the current prototype could check for politicians’ replies to other users’ tweets, to understand if and how they interact with their own followers, and to give us further information about how they use the platform.

Most mentioned users (last 365 days)

A better view of the data is available here.

Examining which users are the most cited by politicians in their tweets can help us understand how they relate to their social networks.

Conte: the most mentioned users are the Italian Government official account and himself. The only national politician he mentions (only 6 times) is Di Maio. He is among the few to mention international politicians in his top 10: von der Leyen and Trump.

Di Maio: although he is foreign minister, he has no foreign politician among his most mentioned users, and in general few mentions overall. The 5 Star Movement comes first, then himself, some party colleagues and institutional interlocutors.

Zingaretti: after the inevitable frequent mentions of his own party and himself, other mentions stand out for appearances in TV broadcasts (On Tuesday, Half an hour more) and for the Lazio region, which he still governs.

Renzi: in addition to himself and his own party, the top mentions go only to his new party colleagues, in proportion to their media exposure (Terranova and Marattin above all).

Meloni: the first two places go to her own party and herself. Note that in the other positions not occupied by party colleagues, there are European far-right politicians (Czech Republic, Spain).

Salvini: unlike other politicians with a party behind them, he does not mention himself or his party. The mentions all focus on the TV broadcasts (radio in one case) in which he takes part. This strategy is designed to create a promotional effect between the web and traditional media, which other politicians rely on much less.

Calenda: in addition to the usual self-referential mentions that he shares with his colleagues, he seems to be the one least afraid of mentioning other important Italian politicians (Di Maio, Zingaretti, Renzi).

Berlusconi: in addition to the constant mentions of his own party, at the moment he seems to use a mix of TV broadcasts and party colleagues, in very limited quantities.

Civati: his publishing house is the most mentioned, followed by the movement he founded. Among the mentions of like-minded colleagues, it is interesting to note that a satirical / promotional account linked to his name is mentioned very often.

Most used hashtags (last 365 days)

A better view of the data is available here.

Although we should analyze all the words used to get a complete picture of the contents of the tweets, we can build a rough idea of what Italian politicians tweet by monitoring only the set of hashtags used in the last 365 days.
I share my brief and debatable considerations below, for each politician.

Conte: very few hashtags are used repeatedly, almost all to report institutional events or issues related to government policies. There are, however, two more political tags, shared with the 5 Star Movement.

Di Maio: here too very few hashtags are reused, almost all linked to keywords associated with influence campaigns carried out by his own party.

Zingaretti: in the midst of party hashtags and TV broadcasts, his main rival Salvini appears, tagged 16 times in a year.

Renzi: the first hashtag was used systematically at the beginning of the year and then abandoned (quite revealing of the current state of his social network usage). Beyond that, as with his former colleague in the PD, the hashtags are used primarily to promote party activities, but in some cases they name political opponents (Salvini, Di Maio).

Meloni: among the most used hashtags we find all the influence campaigns of the Italian-style alt-right: Bibbiano, the Sea Watch, the Global Compact and the campaign to return to the polls. No tags for opponents or like-minded colleagues.

Salvini: the use of hashtags, unlike user mentions, is totally focused on the keywords and initiatives promoted by his party, with a couple of mentions of popular TV broadcasts. No tags for opponents or colleagues.

Calenda: as a former Minister of Economic Development, he uses tags related to industrial disputes such as Ilva, Alitalia and Whirlpool, accompanying them with mentions of politicians both close to and distant from him (Renzi and Salvini).

Berlusconi: no keywords, very basic use of names of regions involved in the vote, coalition colleagues and party names.

Civati: the two most used tags are the one dedicated to Silvia Romano, the aid worker kidnapped in Kenya, and one linked to climate change. The only politician mentioned by tag among the top 10 most frequent is, not surprisingly, Salvini.

Final thoughts

I don’t think I’m able to draw conclusions or make predictions about the evolution of Italian politics on Twitter, mainly because it’s not my job. But I believe that, if I had a little more time available, it would be useful to extend the current data collection code, for example by analyzing not only the hashtags but also the occurrences of words, and by not limiting the collection of data to the 10 most used terms.

An analysis of the followers of the various politicians could then lead to new findings, similar to those proposed by the “Report” TV show in an episode a few weeks ago.

Obviously it would be useful to extend the collection of data also to other political subjects.

This first approach, however, is meant more as a stimulus for other developers to build their own free and extensible analysis model, without relying only on “ready made” commercial tools.

The code

The data import script, written in Python 3, is available on GitHub at this address, along with a dump of the database structure, consisting of a single simple table.

As mentioned it is largely based on the code provided by Marco Bonzanini at this address, and uses Tweepy, a library that allows very simple access to the Twitter API.

The first thing to do is import the Tweepy library and the library used to connect to a MySQL database.

Now we can enter the authentication data needed to access the Twitter API. If you do not already have credentials, you can request them at this address.
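A minimal sketch of these first two steps might look like the following. It is not the original script: reading the keys from environment variables, the variable names and the two helper functions are all assumptions made for illustration, and the database driver is left out because the text does not say which one is used. The Tweepy calls are the OAuth 1 ones from the Tweepy 3.x era.

```python
import os

try:
    import tweepy  # third-party: pip install tweepy
except ImportError:  # lets the sketch load even where Tweepy is not installed
    tweepy = None

# Hypothetical environment variable names holding the four Twitter credentials.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four credentials, failing early if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


def build_api(env=os.environ):
    """Create an authenticated Tweepy API object."""
    if tweepy is None:
        raise RuntimeError("Tweepy is not installed")
    consumer_key, consumer_secret, access_token, access_secret = load_credentials(env)
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    # wait_on_rate_limit makes Tweepy sleep instead of failing at the rate limit.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys out of the source file makes it easier to publish the script without leaking credentials.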

At this point we can create a list, containing all the Twitter users we want to monitor.

For each user in the list, we query the Twitter API to return a set of user data.
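These two steps could be sketched as follows. The screen names are placeholders, the helper names and the dictionary keys are hypothetical, and `api` is assumed to be an authenticated `tweepy.API` object; the attributes read from the user object are the standard fields returned by the Twitter API.

```python
# Placeholder screen names: replace them with the accounts you want to monitor.
USERS = ["example_user_1", "example_user_2"]


def profile_snapshot(user):
    """Keep only the fields used in the charts of this article."""
    return {
        "screen_name": user.screen_name,
        "followers": user.followers_count,
        "friends": user.friends_count,
        "tweets": user.statuses_count,
        "created_at": user.created_at,
    }


def collect_profiles(api, screen_names):
    """Query the API once per account and return one snapshot per user."""
    return [profile_snapshot(api.get_user(screen_name=name)) for name in screen_names]


# With a real, authenticated API object this would be called as:
#   profiles = collect_profiles(api, USERS)
```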

To monitor the most mentioned users and the most used hashtags, let’s create two lists and use tweepy.Cursor to analyze all the tweets going back to end_date, defined in our case as 365 days ago. We store each result in a string variable and save the two strings in two text fields in the database. In each field, every line (separated by \n) holds one pair (mentioned user / number of mentions, or hashtag / number of uses), with the two values separated by a tab (\t).
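The counting and the tab / newline format described above can be sketched as follows. This is an illustration under assumptions, not the original code: it uses `collections.Counter` instead of plain lists, lowercases hashtags so that different spellings are counted together, keeps the top 10 entries, and all function names are hypothetical. It relies on timelines being returned newest first.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def as_utc(moment):
    """Treat naive datetimes as UTC so old and new Tweepy versions compare alike."""
    return moment if moment.tzinfo else moment.replace(tzinfo=timezone.utc)


def count_entities(statuses, end_date):
    """Count mentioned users and hashtags in tweets newer than end_date."""
    mentions, hashtags = Counter(), Counter()
    for status in statuses:
        if as_utc(status.created_at) < end_date:
            break  # timelines arrive newest first, so everything after is older
        for mention in status.entities.get("user_mentions", []):
            mentions[mention["screen_name"]] += 1
        for tag in status.entities.get("hashtags", []):
            hashtags[tag["text"].lower()] += 1
    return mentions, hashtags


def to_text(counter, top=10):
    """One 'name<TAB>count' pair per line, most frequent first."""
    return "\n".join(f"{name}\t{count}" for name, count in counter.most_common(top))


def from_text(text):
    """Inverse of to_text: parse a stored field back into (name, count) pairs."""
    pairs = (line.split("\t") for line in text.splitlines() if line)
    return [(name, int(count)) for name, count in pairs]


end_date = datetime.now(timezone.utc) - timedelta(days=365)
# With a live API object (assumed to be called `api`) the tweets would come from:
#   statuses = tweepy.Cursor(api.user_timeline, screen_name="example_user", count=200).items()
#   mentions, hashtags = count_entities(statuses, end_date)
```

The `from_text` helper is there for whatever reads the database later, for example the page that draws the bar charts.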

Data is collected and saved in a MySQL database. By launching the script once a day, it is possible to build a day-by-day monitoring history, using the creation_day field of the scrapes table to tell the days apart.
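The daily history can be sketched with a single table keyed by account and day. To keep the example runnable without a server, it uses Python's built-in sqlite3 module as a stand-in for MySQL. Only the table name scrapes and the creation_day field come from the article: the other columns, the function names and the sample values are assumptions.

```python
import sqlite3
from datetime import date

# Stand-in schema: only `scrapes` and `creation_day` are named in the article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scrapes (
    screen_name  TEXT NOT NULL,
    followers    INTEGER,
    mentions     TEXT,
    hashtags     TEXT,
    creation_day TEXT NOT NULL,
    PRIMARY KEY (screen_name, creation_day)
)
"""


def save_scrape(conn, screen_name, followers, mentions, hashtags, day=None):
    """Store one row per account per day; a rerun on the same day overwrites it."""
    day = (day or date.today()).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO scrapes VALUES (?, ?, ?, ?, ?)",
            (screen_name, followers, mentions, hashtags, day),
        )


def history(conn, screen_name):
    """Return (day, followers) pairs in chronological order."""
    query = ("SELECT creation_day, followers FROM scrapes "
             "WHERE screen_name = ? ORDER BY creation_day")
    return conn.execute(query, (screen_name,)).fetchall()


# Placeholder rows (NOT real data) written to an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_scrape(conn, "example_user", 100, "a\t2", "x\t1", date(2019, 10, 23))
save_scrape(conn, "example_user", 104, "a\t3", "x\t1", date(2019, 10, 24))
print(history(conn, "example_user"))
```

The composite primary key is a design choice of this sketch: it keeps the history at one row per account per day even if the cron job fires twice.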

To run the script I use a basic account on PythonAnywhere, which provides a free environment where you can run a script with a daily cron job and link it to a database.
To retrieve data from the database and display the bar charts I used PHP, HTML, CSS and JavaScript, but this can be done better and in any language you know, so I am not sharing this remaining portion of the project.

To contact me: lamorbidamacchina.com

Originally published at https://www.lamorbidamacchina.com.

Simone Ricci

I work as a full stack web developer in the company that I co-founded. I live in Turin, Italy. I love bikes, Sardinia, cameras, mountains, free software.