Italgas: from gas pipelines to data pipelines — Fueling our reporting with the latest innovations.

serena delli
Jan 27, 2024

Italgas: an innovation leader

For over 180 years Italgas has been a leader in Italy in the distribution of natural gas. With over 80,000 kilometers of network, it is Italy’s top gas distributor and the third in Europe. In addition to gas, it also deals with water and operates in the energy efficiency sector.

Utilities is a data-intensive sector for three main reasons. First, the core process of a utility consists of metering and billing the usage of gas (or water or electricity) for millions of customers, an intrinsically data-rich activity even when it was done entirely by hand. Today the Italgas Nimbus smart meter remotely manages the metering of multiple types of gas for the same customer.

Second, gas distribution networks require continuous monitoring for security and maintenance. This is why, for the methanisation of the island of Sardinia, our group is planning to build over a thousand kilometers of “digital native” pipelines, with integrated sensors for remote control and ready for the installation of optical fiber.

Third, marketing is a major source of data volumes as utilities are stepping up the profiling of their customer database to boost their internet, mobile and call center channels. Italgas has built a strategic partnership with Salesforce which enables us to excel in digital marketing.

Italgas data and reporting architecture

Italgas is undergoing a digital transformation that involves not only infrastructure but also processes and people. This led to the foundation of Bludigit, the digital transformation arm, which has built the platform to handle this vast data estate. Italgas data is collected in a Lakehouse and made available to business users, data analysts and IT experts without silos, while adhering to the strictest security, privacy and regulatory rules. The architecture is based on Azure Databricks, which executes ETL (Extract-Transform-Load), analytics and machine learning workloads, and culminates in gold tables ready for business consumption. To give some numbers, our platform currently consists of 1738 pipelines, 23 models and 155 dashboards, processes 1 TB of data per day and serves more than 3000 active, independent users who analyze Italgas’ data.

But just like gas networks, data pipelines need to be maintained, or they quickly become inefficient as data volumes, users and analyses grow exponentially. The reporting architecture is a good example.

In the old reporting architecture, gold tables were written both to the Lakehouse and to an Azure Synapse Dedicated SQL Pool. This cloud data warehouse was mainly used to keep a copy of the data and to feed Azure Analysis Services, which backed the PowerBI reports, more rapidly.

This architecture satisfied the initial reporting requirements: a simple interface and fast queries for data users. However, as data volumes kept increasing, we first had to scale up Azure Analysis Services to maintain proper reporting performance, which led to a sharp cost increase. At the same time, internal customers required fresher data than we could provide while copying data and running maintenance activities such as index building. Although the data-copying step made a robust reporting system possible, it also made the whole process slower and more expensive.

Finally, we were shifting from a reporting-heavy platform to a more AI-focused one. A question arose: how could we leverage new technologies to solve these challenges?

The new reporting architecture

Exploring the latest advancements in data technologies, we experimented with two simplifications:

• Upgrade to PowerBI Premium Gen2, shifting the deployment target from Azure Analysis Services to PowerBI Premium.

• Removal of the Synapse caching layer, serving the reports directly from Databricks.

We assessed the new architecture through a proof of concept to set expectations and define targets for different report sizes (e.g. small, medium, big).

While PowerBI Premium has a higher upfront cost, it can be more cost-effective for larger organizations because it can serve a larger number of users without the need for individual licenses. It also offers more storage and advanced features, which can reduce the need for additional tools or resources (e.g. Azure Analysis Services).

Regarding the second simplification, the new native Databricks connectors for PowerBI allowed a seamless integration of the reports directly with the Lakehouse. Databricks SQL Warehouses were about 50% faster thanks to optimizations such as the Photon engine, optimized caching and query routing.

In contrast to the fixed capacity of the previous model, Databricks SQL was able to accommodate our workload requirements in a dynamic fashion, scaling down during low usage (even to zero) and scaling horizontally in case of high demand.
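To make the difference between fixed and elastic capacity concrete, here is a toy cost model in Python. It is only a sketch: the hourly query counts, the capacity per compute unit, the unit price and the function names are all invented for illustration and do not reflect Italgas' actual workload or Databricks pricing.

```python
from typing import Sequence


def units_needed(queries: int, queries_per_unit: int, max_units: int) -> int:
    """Smallest number of compute units covering the load, capped at max_units."""
    if queries <= 0:
        return 0  # scale to zero when nobody is querying
    return min(max_units, -(-queries // queries_per_unit))  # ceiling division


def fixed_cost(hourly_queries: Sequence[int], units: int, price: float) -> float:
    """Fixed capacity is billed for every hour, whether busy or idle."""
    return len(hourly_queries) * units * price


def elastic_cost(
    hourly_queries: Sequence[int],
    queries_per_unit: int,
    max_units: int,
    price: float,
) -> float:
    """Elastic capacity is billed only for the units each hour actually needs."""
    return sum(
        units_needed(q, queries_per_unit, max_units) * price
        for q in hourly_queries
    )


# Hypothetical day: idle night, morning ramp-up, busy working hours, quiet evening.
hourly_queries = [0] * 7 + [400] * 2 + [900] * 8 + [200] * 3 + [0] * 4

# The fixed deployment is sized for the peak (3 units); all values are assumptions.
fixed = fixed_cost(hourly_queries, units=3, price=1.0)
elastic = elastic_cost(hourly_queries, queries_per_unit=300, max_units=4, price=1.0)
print(f"fixed: {fixed}, elastic: {elastic}")
```

In this simplified model the gap comes entirely from idle and low-usage hours, which a fixed deployment sized for the peak still pays for in full.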

Databricks SQL’s better performance, combined with its scalability, resulted in a 73% decrease in workload cost compared to Azure Synapse.

Removing the Azure Synapse Dedicated SQL Pool also simplified the ETL processes, further reducing TCO and improving data freshness.

Databricks SQL’s data governance capabilities (Unity Catalog) enabled further data democratization: unified permission management, out-of-the-box lineage and easy query inspection let analysts and business users define and extract KPIs in a secure environment.
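As an illustration of what unified permission management can look like, the following Python sketch builds the Unity Catalog GRANT statements that would give a group read access to one gold table. The catalog, schema, table and group names are hypothetical, as is the helper itself; it only produces the SQL text and does not connect to a workspace.

```python
import re
from typing import List

# Accept only plain identifiers so the generated SQL cannot be tampered with.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def read_access_statements(
    catalog: str, schema: str, table: str, group: str
) -> List[str]:
    """Build the three grants a group needs to read one table in Unity Catalog."""
    catalog, schema, table, group = (
        _check(n) for n in (catalog, schema, table, group)
    )
    principal = f"`{group}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {principal}",
    ]


# Hypothetical names, for illustration only.
statements = read_access_statements("reporting", "gold", "meter_readings", "analysts")
for statement in statements:
    print(statement)
```

Because the same grants apply whichever tool is used to query the table, permissions are defined once rather than separately per reporting layer.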

Next steps

We are going to integrate other advanced Databricks SQL features (e.g. extensibility through Python User-Defined Functions, Lakehouse Federation, Materialized Views, etc.) to further strengthen the reporting capabilities and data usage of analysts.

In the longer run, we believe that Generative AI will give companies like Italgas the opportunity to further democratize data by making it easily accessible to a wide non-technical audience. In short, we will continue with the strategy that has served Italgas well for almost 200 years: innovation is a long-term game that has to be pursued consistently.
