Data & Analytics Framework (DAF) and open data: where we’re at
The state of the art of open data in Italy, moving towards a “National Digital Data Platform”
This article is also available in Italian.
When it comes to open data, we’re not kidding: a nation’s information assets are a goldmine that, if managed with care and attention under a system of shared governance able to guarantee data quality, can generate enormous value for citizens and businesses.
What do we mean by “open data”? According to the Open Knowledge Foundation:
“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness). Open data and content can be freely used, modified, and shared by anyone for any purpose.”
Public Administrations generate much of a nation’s collective information assets; the law recognises the value in these assets and provides that they should be open and easily accessible to promote transparency, facilitate access to information, support policy makers in their decision processes and create business opportunities for enterprises and startups.
To this end, the Digital Administration Code (CAD) regulates “open data by default” (article 52, paragraph 2) and requires the sharing of data between public administrations (article 50). The Three Year Plan devotes an entire chapter to administrative data, in which a clear strategy for supporting the implementation of these CAD provisions is outlined in a series of precise actions.
In this article we want to share the state of the art of open data within the National Digital Data Platform (Piattaforma Digitale Nazionale Dati – PDND), previously known as the Data & Analytics Framework (DAF) and since renamed in article 50-ter of the CAD.
Since its inception, the PDND has given itself the following objectives:
- outline a clear national strategy, so as to avoid fragmentation of local initiatives that involve the management and publication of data;
- support the production and publication of open data.
Today, many initiatives either struggle to take off or don’t receive the right attention because they are difficult to access. In some cases, the project doesn’t meet the metadata standards that allow data to be more accessible to a large audience. In other cases, it’s because the datasets are not regularly updated.
The PDND platform is versatile and available in several different modes so as to better respond to a variety of needs. In particular, it is available as:
- an open data portal in SaaS (Software as a Service) mode;
- an open data portal in on-premise mode (CKAN via Docker).
PDND as a SaaS portal
There are an estimated 14,000 public administrations in our country, of which just under 8,000 are municipalities.
Although many central and local public administrations (PAs) are already publishing open data (there are about 400 organizations on the national open data portal dati.gov.it), many of them lack the skills and resources to use data to systematically promote transparency and the active participation of citizens.
The PDND open data portal in SaaS mode can facilitate this task: it is immediately available for use by all PAs that request it and requires no installation or management costs. After signing an agreement with the Digital Transformation Team, all a PA needs to do is define the roles within the platform.
Thanks to a streamlined configuration procedure, the PA can have at its disposal its very own data portal, with its own colors and logos, through which it can publish data sets and provide storytelling based on them. All data is guaranteed to conform to the national DCAT-AP IT profile, necessary for displaying data according to the specifications of the European Commission’s ISA program. The service also comes with a team of experts ready to offer support. Free from setup and infrastructure costs, participating PAs need only focus on what really matters: deciding which datasets to publish, performing the data loading procedures and keeping the data updated.
When PDND is used as a SaaS portal, all the other tools offered by the platform also become available. For example, citizens, journalists, companies and the PA itself can search for data through the APIs on offer and create Data Stories and Dashboards using tools already integrated into the platform.
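As a rough sketch of what such a programmatic search can look like: any CKAN-based catalog (like the one described later in this article) exposes the standard CKAN Action API, so a dataset search can be scripted in a few lines. The base URL below is a placeholder, not the platform’s actual endpoint:

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL: substitute the actual catalog's CKAN Action API root.
BASE_URL = "https://dati.example.gov.it/api/3/action"

def build_search_url(base_url, query, rows=5):
    """Build a CKAN package_search URL for a free-text dataset query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url}/package_search?{params}"

def search_datasets(query, rows=5):
    """Return (total matches, dataset titles) for datasets matching the query."""
    with urllib.request.urlopen(build_search_url(BASE_URL, query, rows)) as resp:
        result = json.load(resp)["result"]
    return result["count"], [pkg["title"] for pkg in result["results"]]
```

`package_search` is part of CKAN’s documented Action API, so the same call works unchanged against any CKAN instance, whether SaaS or on-premise.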
We have already received many expressions of interest regarding these services and are currently testing them with some public administrations.
PDND as a CKAN portal with Docker
A catalog of public data collected from PA sites is also included within PDND. The catalog is managed through the CKAN platform, an open source data management software commonly used in Europe to search for data published by various sources. In fact, CKAN is the cataloging platform used by the European Data Portal and by similar sites operated by many Member States.
Two years ago we decided to extend the basic cataloging functions that CKAN offers in order to accommodate the ever-increasing number of Italian PAs that are starting to make public data available. We wanted CKAN to be able to allow more PAs, both central and local, into the system. We also wanted to provide a tool capable of speeding up the transition to European and Italian data cataloging standards (we are specifically referring to DCAT-AP and its Italian extension, DCAT-AP_IT).
The autonomous provinces of Trento and Bolzano had already started work on a similar project. In the spirit of maximum reuse that characterizes the Team’s vision, we decided to use their work as a starting point from which to further extend CKAN along three lines:
- Create a harvester, that is, a software component capable of automatically collecting metadata from a growing number of heterogeneous sources; it had to be efficient, robust and fully compliant with the DCAT-AP_IT specifications published by AgID.
- Extend the CKAN web module, used for data uploads, to include all mandatory, recommended and optional metadata from the DCAT-AP_IT profile, while keeping in mind the controlled vocabularies required by European specifications and Italian licensing stipulations.
- Extend the filtering functions used for capturing certain key profile elements (theme, sub-theme, catalog of origin) to facilitate the search for data within the catalog.
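To make the harvester’s compliance check concrete, here is a minimal, illustrative sketch of the kind of validation such a component performs. The list of mandatory fields is an assumption drawn from the DCAT-AP_IT profile and the field names are simplified; the AgID specification remains the authoritative source:

```python
# Illustrative list of mandatory dataset fields in the DCAT-AP_IT profile
# (simplified names; consult the AgID specification for the real profile).
MANDATORY_FIELDS = ["identifier", "title", "description", "modified", "theme"]

def validate_dataset(record):
    """Return the mandatory fields missing or empty in a harvested record."""
    return [f for f in MANDATORY_FIELDS if not record.get(f)]

# A hypothetical harvested record, missing its modification date.
record = {
    "identifier": "comune-esempio:dataset-001",
    "title": "Anagrafe dei punti luce",
    "description": "Posizione e stato dei punti luce comunali",
    "theme": "ENVI",  # EU controlled vocabulary for data themes
}
missing = validate_dataset(record)  # -> ["modified"]
```

A real harvester would of course do much more (controlled-vocabulary lookups, license checks, RDF serialization), but the record-by-record validation step follows this shape.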
These initiatives gave rise to a significant revision of the DCAT-AP_IT CKAN extension already created by the autonomous provinces of Trento and Bolzano. We decided to make this new version available in open source, thus giving substance to some of the action points laid out in the Three-Year Plan.
To accomplish this, we packaged the CKAN platform, along with all its extensions, in a Docker container: an isolated software unit that bundles the code together with all its dependencies, making the platform easy to install across a variety of operating environments. The resulting CKAN Docker image was then published on the Developers Italia platform.
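As an illustration of what containerization makes possible, a PA can in principle bring up the platform with a couple of commands. The image name below is hypothetical; consult Developers Italia for the actual repository and its published instructions:

```shell
# Hypothetical image name: check Developers Italia for the real one.
docker pull italia/ckan-dcatapit:latest

# Run CKAN in the background, exposing the web interface on port 5000
# (CKAN's default). A real deployment would also need database and Solr
# containers, typically wired together with docker-compose.
docker run -d --name ckan -p 5000:5000 italia/ckan-dcatapit:latest
```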
The open data community and a few PAs showed immediate interest in this solution. To date, PAs have elected to use:
- just the DCAT-AP_IT extension (as in the case of the City of Rome);
- the entire CKAN docker, to migrate data from previous platforms to open solutions more capable of interoperating at the central level (as in the case of Bologna) or to start new initiatives dedicated to the publication of open data at significantly reduced costs (like several towns in Puglia).
Note: The Comprehensive Knowledge Archive Network (CKAN) is an open source system for storing, cataloging and distributing data (source: Wikipedia). It is used by many public administrations worldwide for the management of public data and is a fundamental component of PDND.
DAF and open data: Public Administrations share their experiences
City of Turin
In 2017, the City of Turin recognized the importance of data (and its management) as a primary resource in the information age by creating a specialized data team dedicated to rationalizing and identifying various platforms for use in the collection, presentation and analysis of data.
The collected data pertains mostly to PA services, but also includes citizen reports, image-derived data, and information from the analysis of social media and online articles. A number of initiatives have been launched, ranging from the analysis of heatmaps to define the route for metropolitan line 2, to analyzing the impact of events like “il Salone del Libro” on the city, to the CityMap project. The CityMap project supports citizens in making informed choices on where to live based on neighborhood characteristics and individual needs. Its purpose is twofold: the Administration can use it to support better policies and regulation, while citizens and businesses can use it as an “aggregable and viewable” open data tool. The goal is to create indicators through which a neighborhood and its various aspects (vitality, attractiveness, peculiarity, problems, etc.) may be evaluated.
Neighborhood indicators can also measure and report the impact of policies and actions on specific themes, so that the city government can assess and modify its policies and actions based on the changes reflected in the indicators.
This project was carried out in collaboration with the Digital Transformation Team through experiments conducted on the DAF platform.
The added value offered by the DAF platform is its ability to provide an advanced repository of open data that can publish datasets in the Linked Open Data format; identify models, data formats, metadata and classification according to the national guidelines (DCAT-AP_IT); and offer tools for analysis, data telling and data exposure via API.
DAF, in addition to providing dataset collection capabilities (specific to the web container of each dataset), also supports Data Telling tools for creating stories, widgets and dashboards from uploaded datasets. The Data Telling tools enhance the data itself while also highlighting the fundamental role that citizens have in bringing problems and incidents to light via data.
For example, one experiment involved the creation of widgets to augment a narrative on the San Salvario neighborhood based on datasets acquired from the Municipal Police Contact Center and local commercial activities. From an operational point of view, we were able to verify how to access and catalog datasets, create dashboards and data stories.
Paola Pisano, Anna Gillone - City of Turin
City of Florence
Our experience working with DAF and the Digital Transformation Team was certainly positive.
The local authority’s task on this front is, first and foremost, to make data public. This task may appear simple but it is actually very complex, and must be framed within an “information supply chain” context, where only the very last step involves external publication. And let’s not forget that many restrictions (including recent ones based on the GDPR) require that any data available internally for carrying out office procedures must be anonymized before becoming public.
It is our opinion that public bodies should use open data as a means for making government explicit. Some examples: if the Municipal Police institutes a local police station, distributing a map or a press release isn’t enough. These data must be available as open data. If the Department of Commerce promotes independent bookstores or shops that remain open in August, these must also be made available as databases, and therefore as open data. In general, any asset belonging to the city or neighborhood that has a digital counterpart must, wherever possible, be made available as open data.
Citizens, journalists, students, and businesses all draw from “data products” made available by public institutions for their own purposes: knowledge, study, business. The public sector, however, has so far been unable to provide a tool for making data available to all. This is precisely what the Digital Transformation Team has achieved: the presentation of open data (collected from repositories belonging to the various entities) as something that potential users can exploit, through the development of a system allowing data to be interpreted and used.
The DAF is the perfect solution for those who want to work on Italian open data and use them to develop products for analysis, for storytelling, apps etc. From the get-go, we found ourselves working very well with the Team, with both sides able to maintain their distinct roles.
On our side, we took the opportunity to use the DAF to execute two important activities that we are carrying forward as part of our promotion of data culture and open data. The first concerns our experience with high school students, who come to our offices during their transition from school to work: these students understand the importance of data, the so-called “oil of the new millennium.” They are learning about “lineage”, the variety of data that exists (including the importance of geographical data), and how to use them, in this case, by taking advantage of an important public platform like the DAF.
The second activity concerns the engagement of journalists, for whom we held a special event in Florence on September 24, 2018. Journalists should be aware that there is a lot of open data available to them and that there are tools (like the DAF) that would allow them to use such data in different storytelling contexts. With support from the Digital Transformation Team, we focused on a single case involving the use of online services among the citizens of Florence and turned it into a story using one of the apps.
What would we like to do? We want to publish our data, improve the supply chain and for the Team to turn the DAF into a system that can be customized to fit the needs of different potential users. Let’s try to treat the DAF like an ecommerce site, where the citizen, the journalist, the student, the programmer (in essence, the user) can find “data products” organized by subject or geographical area or according to the specifications in their profile, and begin to use them.
Emanuele Geri, Gianluca Vannuccini - City of Florence
Region of Umbria
We got to experience and test the DAF during an experiment on data warehousing aimed at furthering our understanding of the subject matter and the available technologies through a flexible, hands-on approach. To achieve this goal, it was important for us to have access to an already configured SaaS environment, with a set of technologies already selected and integrated by the Digital Transformation Team.
Thanks to our collaboration with the Team, we were able to expand our skillset and take on a series of decisions regarding future projects. Large organizations usually have more than one data warehouse, which are, unfortunately, also built in silos (what a paradox!). What’s missing is an overall data strategy. Once fully operational, the DAF will significantly contribute towards achieving methodologies and ontologies that can be shared among the institutions; furthermore, it will provide a common technological base upon which to develop the many applications specific to each sector, whether for ingestion or visualization. In Umbria, we used Metabase to create dashboards dedicated to the advancement of the Digital Agenda of Umbria (available here). Presently, we are working on developing other tools dedicated to internal users while continuing to experiment with the DAF.
We hope that the DAF will be able to provide more opportunities for creating an intranet environment to be used by institutions. Only good data management can make data really useful and, as a consequence, facilitate the automated publication of quality open data.
Giovanni Gentili, Head of Architecture for ICT in Umbria