How to empower data-driven culture on construction (1/4)

5 min read · Jun 10, 2020

Most of us have come across the McKinsey study stating that, while productivity in the total economy has increased by almost 25% since the ’90s, the AEC industry is struggling to reach double digits. Many engineers and architects believe that BIM as a methodology will bridge this gap, since it has successfully established data as an asset in the design phase and is on the path to doing the same across the whole building life cycle.

The path to a data-driven culture in construction is being paved by closed BIM software, just as it was when the design phase was under the spotlight of implementation. The key to standing out at this moment is to learn how to make better use of the tools to balance control and creativity, without handing this decision over to a third party: the software developer.

This is the first in a series of articles that aims to open a debate around the question: how can we empower a data-driven culture in construction to optimize cost estimation and schedules? To answer it, the topic is divided into four articles with the following objectives:

1. Investigate data storage architecture and the benefits of applying historical data in analyses;
2. Digitalize the Construction Progress Monitoring activity and handle interoperability between BIM, cost, schedule, and resources;
3. Optimize schedules through probability;
4. Bring data visualization everywhere, from the board meeting to the subcontractor manager’s cell phone.

The present article focuses on #1. There is no need to know how to code to enjoy this article, but if this is your first time reading about the subject, take it as an invitation to dig further. It will be helpful (and not only for this reading) to know at least enough to talk about it.

To talk about data architecture, three concepts must be clear: Relational Database Management System, Data Warehouse, and Data Lake. If you are not versed in these terms, bear with me for the next few paragraphs; it is never too late to overcome the emotional block to computer science.

Relational Database Management System (RDBMS): this type of database stores and provides access to data points that are related to one another by keys. An RDBMS must be managed in a secure, rule-based, consistent way. As complexity (the number of interconnected tables) increases, an RDBMS tends to become a poor solution for analyzing historical data: large queries on complex databases must join many tables, and the friction created by this structure slows the process down.
Data Warehouse: it stores structured, curated data pulled from separate intermediary databases. That makes it easy for analysts from different parts of an organization to access and analyze the data for their respective purposes. Data warehouses retrieve and store data so that all the relevant information sits in the same data frame, optimizing both storage and analysis time (click here for application examples; notice that construction is not listed there yet).
Data Lake: it also pulls in data from multiple sources, but generally contains a much wider array of data types, including unstructured data and data from sources outside the organization. Whereas data warehouses are designed to be a central data repository for known and specific purposes, data lakes are intended to contain data that might not be useful at the moment but could play into some potential future analysis: the more, the better. They are generally most useful for the more exploratory analyses undertaken by data scientists and researchers (e.g., the film “Moneyball” explores the revolutionary use of this kind of exploratory data analysis in assembling a Major League Baseball team).

The main difference between a data warehouse, a data lake, and an RDBMS is this: a data warehouse is built to hold structured data from multiple sources, while an RDBMS is used to store and organize structured data from a single source, such as a transactional system. Data lakes differ from both in that they store unstructured, semi-structured, and structured data. To go deeper into this matter, click here.

The figure below illustrates the proposed information flow.

The use of an RDBMS is suitable because the four dimensions (BIM, cost, schedule, and resources) are already digital in most construction companies, so harvesting them from their original formats can be automated and the dimensions can be linked to one another with foreign keys. The Data Warehouse is proposed as a solution because of its optimum performance for accessing data and because it provides structured historical data. Another advantage of this information flow is the division of information into a back room and a front room: engineers and interns need only basic training to reach the level of expertise required to analyze the front room.
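As a rough illustration of this flow, the sketch below links the four dimensions with foreign keys in a relational database and then flattens them into one wide table of the kind a data warehouse would expose to the front room. It uses Python's built-in `sqlite3` module only to stay self-contained; every table name, column name, and value is a hypothetical placeholder, not the schema behind the figure.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE bim_element (
    element_id TEXT PRIMARY KEY,
    category   TEXT NOT NULL,
    quantity   REAL NOT NULL,
    unit       TEXT NOT NULL
);
CREATE TABLE activity (              -- schedule dimension
    activity_id  TEXT PRIMARY KEY,
    element_id   TEXT NOT NULL REFERENCES bim_element(element_id),
    planned_days REAL NOT NULL,
    actual_days  REAL
);
CREATE TABLE cost_item (             -- cost dimension
    cost_id     INTEGER PRIMARY KEY,
    activity_id TEXT NOT NULL REFERENCES activity(activity_id),
    amount      REAL NOT NULL
);
CREATE TABLE resource_log (          -- resources dimension
    log_id      INTEGER PRIMARY KEY,
    activity_id TEXT NOT NULL REFERENCES activity(activity_id),
    crew        TEXT NOT NULL,
    man_hours   REAL NOT NULL
);
""")

# Made-up sample data for a single activity.
conn.execute("INSERT INTO bim_element VALUES ('W-01', 'Wall', 40.0, 'm2')")
conn.execute("INSERT INTO activity VALUES ('A-10', 'W-01', 5.0, 6.0)")
conn.executemany(
    "INSERT INTO cost_item (activity_id, amount) VALUES (?, ?)",
    [("A-10", 1200.0), ("A-10", 800.0)],
)
conn.executemany(
    "INSERT INTO resource_log (activity_id, crew, man_hours) VALUES (?, ?, ?)",
    [("A-10", "masonry-1", 64.0), ("A-10", "masonry-1", 16.0)],
)

# "Front room": one wide, analysis-ready row per activity.
# Costs and man-hours are aggregated in subqueries so the joins do not duplicate rows.
conn.execute("""
CREATE TABLE fact_activity AS
SELECT a.activity_id, e.category, e.quantity, e.unit,
       a.planned_days, a.actual_days,
       (SELECT SUM(c.amount) FROM cost_item c
         WHERE c.activity_id = a.activity_id) AS total_cost,
       (SELECT SUM(r.man_hours) FROM resource_log r
         WHERE r.activity_id = a.activity_id) AS total_man_hours
FROM activity a
JOIN bim_element e ON e.element_id = a.element_id
""")

row = conn.execute("SELECT * FROM fact_activity").fetchone()
print(row)
```

The point of the design is that an analyst querying `fact_activity` never has to know how the back-room tables are keyed; a real warehouse would refresh such a table on a schedule rather than build it once.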

There are many possible approaches to the use of historical data in construction. Presented below are the most advantageous features, in my experience.

Generation of key benchmark metrics:

One of the primary benefits of a construction historical database system is that it provides benchmarking data from similar buildings and even from the current construction. Having the actual construction ratios at hand, queried and filtered in different ways, makes it possible to better estimate the cost and duration of each phase of the project. This information can assist decision making in many ways; for instance, it can be used when selecting suppliers, subcontractors, and design solutions.
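A minimal sketch of such a benchmark query is shown below, assuming the historical data has already been flattened into one record per activity. The field names, the choice of median unit cost and mean productivity as metrics, and all the numbers are assumptions made for illustration.

```python
from statistics import mean, median

# Hypothetical historical records; every value is made up for illustration.
history = [
    {"project": "P1", "building_type": "residential", "activity": "masonry",
     "quantity_m2": 400.0, "cost": 20000.0, "man_hours": 800.0},
    {"project": "P2", "building_type": "residential", "activity": "masonry",
     "quantity_m2": 500.0, "cost": 30000.0, "man_hours": 1000.0},
    {"project": "P3", "building_type": "commercial", "activity": "masonry",
     "quantity_m2": 200.0, "cost": 14000.0, "man_hours": 500.0},
    {"project": "P3", "building_type": "commercial", "activity": "plaster",
     "quantity_m2": 300.0, "cost": 6000.0, "man_hours": 300.0},
]


def benchmark(records, activity, building_type=None):
    """Return unit-cost and productivity benchmarks for one activity,
    optionally restricted to one building type."""
    rows = [
        r for r in records
        if r["activity"] == activity
        and (building_type is None or r["building_type"] == building_type)
    ]
    if not rows:
        raise ValueError("no historical records match the filter")
    unit_costs = [r["cost"] / r["quantity_m2"] for r in rows]
    productivity = [r["quantity_m2"] / r["man_hours"] for r in rows]
    return {
        "samples": len(rows),
        "median_unit_cost": median(unit_costs),     # currency per m2
        "mean_productivity": mean(productivity),    # m2 per man-hour
    }


# Use the benchmark to estimate 600 m2 of masonry in a new residential building.
ref = benchmark(history, "masonry", building_type="residential")
estimated_cost = ref["median_unit_cost"] * 600.0
estimated_man_hours = 600.0 / ref["mean_productivity"]
print(ref, estimated_cost, estimated_man_hours)
```

Filtering by building type here stands in for the "queried and filtered in different ways" idea; the same function could just as well filter by supplier or subcontractor if those fields were recorded.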

Forecasting critical path problems:

A routine can be implemented to send early performance warnings to the construction manager. It is possible to predict delays by keeping track of each crew’s productivity rate and checking whether actual performance is meeting the predicted performance in activities on the critical path. This provides the data the construction manager requires to adapt on the fly, and the routine can later evolve toward machine learning and AI.
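One simple way to sketch such a routine is a linear extrapolation: assume each crew keeps its current productivity rate and compare the projected duration with the planned one. The class, function names, tolerance threshold, and figures below are all hypothetical, and the straight-line projection is a deliberately naive assumption rather than a recommended forecasting model.

```python
from dataclasses import dataclass


@dataclass
class CriticalActivity:
    name: str
    total_quantity: float   # total work in the activity, e.g. m2
    planned_rate: float     # planned quantity per working day
    done_quantity: float    # quantity completed so far
    days_elapsed: float     # working days spent so far


def forecast_delay(act):
    """Projected delay in working days if the crew keeps its current rate.

    Returns None when there is not enough progress data to estimate a rate.
    """
    if act.days_elapsed <= 0 or act.done_quantity <= 0:
        return None
    actual_rate = act.done_quantity / act.days_elapsed
    planned_duration = act.total_quantity / act.planned_rate
    remaining = act.total_quantity - act.done_quantity
    projected_duration = act.days_elapsed + remaining / actual_rate
    return projected_duration - planned_duration


def early_warnings(activities, tolerance_days=1.0):
    """List (activity name, projected delay) for activities beyond the tolerance."""
    warnings = []
    for act in activities:
        delay = forecast_delay(act)
        if delay is not None and delay > tolerance_days:
            warnings.append((act.name, round(delay, 1)))
    return warnings


# Made-up progress data for two activities on the critical path.
critical_path = [
    CriticalActivity("slab formwork", 1000.0, 100.0, 320.0, 4.0),
    CriticalActivity("masonry", 600.0, 50.0, 260.0, 5.0),
]
alerts = early_warnings(critical_path)
print(alerts)
```

In this example the formwork crew is running at 80 units per day against a plan of 100, so the activity is flagged, while the masonry crew is slightly ahead of plan and raises no warning.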

Performance and productivity management:

Managing and improving performance and productivity effectively promotes the HR department from a support function to a strategic partner of the company. It can generate outcomes such as rewards, promotions, succession planning, and contracting models. Although implementing such policies in the construction sector is challenging, having a single source of truth for these measures provides the means to do so. The culture of resistance to change is at the top of the list of reasons for failure in the age of digital construction, and one of the many uses of rewarding performance is behavior change.

This feature encourages digital implementation because it provides short-term outcomes: it measures its own results and creates awareness across business sectors, while providing strategic-level information for the executive board, such as the data needed to draft an Integrated Project Delivery contract based on performance.

The proposed data management routine is guided by making better use of the information most companies already have, and it aims to reach all stakeholders with data visualization tools. It makes the most of historical data while empowering the company's data-driven culture by allowing the manager to use cross-sector information, which increases the value of the digital asset simply by making it available.

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

Sir Arthur Conan Doyle

Written by João B Vieira