DATA STORIES | ARCHITECTURE & ENGINEERING | KNIME ANALYTICS PLATFORM

Cognitive Digital Twin: Advanced Building and Energy Management in a Multitenant Building

Create powerful low-code AI solutions with KNIME to support smart decision-making

Pierpaolo Vergati
Low Code for Data Science

--

Implementing and managing a “smart” building is becoming a complex activity that requires the integration of increasingly specialized disciplines, as the technological components involved grow ever more numerous and diverse.

Modern buildings include a wide range of equipment, from HVAC systems to plumbing, lighting, and security, that all contribute to the buildings’ operation. These elements are increasingly accompanied by some degree of automation so as to provide a modern, responsive experience for the occupants.

The integration of building management systems coupled with the analysis of the data they generate can allow building managers to gain visibility into complex networks of equipment, sensors, and devices.

Through a thorough process that consists of:

  1. Data collection and processing,
  2. Identification of key parameters and behaviors,
  3. Forecast generation,
  4. Alert automation,

a building manager can “see” what is happening inside a building at any given time and understand the relationships among multiple systems and variables, both environmental and driven by user behavior.

Such an evolved approach requires adopting a new concept of Operations, one that makes extensive use of methodologies such as Big Data, Data Analytics, and Machine Learning.

If skillfully set up, such an approach would give building managers access to a new set of information, allowing them to define more accurate alert parameters, simplify maintenance, and implement effective automation strategies that: (i) optimize energy efficiency, (ii) simultaneously reduce operating costs, (iii) extend equipment life, and (iv) improve the occupants’ experience. In practice, we could aspire to the realization of a Cognitive Building.

Figure 1. Representation of the evolution of buildings and their integration with management systems.

Cognitive Digital Twin

The first definition of the Digital Twin was coined in 2003 by Michael Grieves; over time it has been refined and detailed. The following definition from 2017, also by Michael Grieves, will be adopted in this work:

The digital twin is a set of virtual information constructs that completely describe a potential or actual physical product from the atomic level to the macroscopic level. At its optimal level, any information that could be obtained from inspection of a physical product can be obtained from its digital twin.

Figure 2. Representative image of the application potential of a Digital Twin.

The DT concept emphasizes the following key elements:

  • It represents an actual (or potential) physical object;
  • It is not just a digital model: it maintains relationships and interactions with the physical object;
  • It can be explored as if it were a real object;
  • It can be linked to relevant data and time series to ensure a closer fit with reality;
  • It simulates patterns and behaviors with varying levels of reliability.

The digital twin therefore aims to improve operational efficiency, resource optimization and management, cost savings, productivity, and safety.

The evaluation of a Digital Twin is based on 5 levels of complexity starting with a simple digital model (level 1). As the model evolves, feedback and the ability to extract reliable predictions increase in importance. At higher levels of complexity (levels 4 or 5), machine learning capability, generalization possibility, and scalability potential come into play.

The metrics on which the Digital Twin is evaluated are:

  • Autonomy: the ability of a system to act without human input.
  • Intelligence: the ability to replicate human cognitive processes and perform more or less complex tasks.
  • Learning: the ability of a twin to automatically learn from data and improve its performance without being explicitly programmed to do so.
  • Fidelity: the level of detail in a system, i.e., the degree to which measurements, calculations, or specifications approach the true value or desired standard.

Some technologies, more than others, have proven to be enablers for the full development of a Digital Twin:

  1. Computing capacity able to process significant amounts of data;
  2. Sensors that enable building automation and, in a broader sense, increasingly smart buildings;
  3. Data visualization, to make the best possible use of the resulting information.

Data Analytics and Data Visualization

The term data analytics is used to refer to the methods and techniques used to extract information from data, that is, it represents how raw data can be transformed into something useful (from data to information) and support decision making.

Figure 3. Types of Data Analytics.

There are three different types of data analytics, each with its own set of possible applications:

  • Descriptive analytics: the starting point of an analytical process. These methodologies focus on describing the historical data collected to make it readable, answering the generic question “what happened?”. Associated with descriptive analytics is a data visualization model that is either static (PDF, Excel, PowerPoint, etc.) or semi-interactive (e.g., a dashboard with a web interface);
  • Predictive analytics: focuses on answering questions such as “why did this happen?” and “what will happen?”. These methodologies rely on more sophisticated analytical and probabilistic techniques (based on artificial intelligence) that can recognize and describe patterns in the data, allowing us to predict possible future scenarios;
  • Prescriptive analytics: indicates what actions need to be taken to achieve a given goal, thus answering the question “what should we do?”. Typical tools of this type of analysis can simulate numerous alternative scenarios and screen them to return the best combination, the one that maximizes profits and reduces costs.

Closely related to the concept of data analytics outlined above is data visualization, which was applied in the implemented Digital Twin prototype.

Figure 4. Relationship between the level of detail of a piece of information and the size of the information’s target audience and effectiveness of data visualization.

The essential goal of data visualization is to transfer a message from a “sender” to a “receiver.” For this communication process to be effective, the receiver must understand the meaning of the message the sender intended to convey.

The effectiveness of the communication process requires carefully balancing the level of detail one wishes to give the information. As illustrated in Figure 4, there is an inverse relationship between the level of detail and the size of the audience.

Data Workflow

The pivotal technology for data processing was KNIME’s desktop application.

Figure 5. Dataflow of the Digital Twin prototype.

KNIME was chosen for the power and versatility it offers and, above all, for its no-code/low-code paradigm, which allows effective data processing to be carried out quickly and its output to be conveniently passed to external business intelligence tools (e.g., Microsoft Power BI). The use of KNIME greatly facilitated and enhanced my analysis, sparing me a lot of the frustration caused by the syntax of traditional programming languages (e.g., Python).

Besides KNIME Analytics Platform, an additional data visualization tool (the aforementioned Microsoft PowerBI) was used to effectively represent the information inferred from the data with appropriate dashboards.

Integrated alongside this tool is a particularly high-performance application for the representation of BIM models: VCad (from the company BLogic). Through this application, an information model (generated with BIM authoring software) can be explored even by users who are not accustomed to dedicated software.

The integration with the Power BI platform allows data, even from external sources, to be “hooked up” to individual three-dimensional model components for more effective use and exploration.

In addition, by publishing the report to Microsoft’s Power BI Service web app, automatic email alerting was set up to send appropriate communications based on certain KPIs. We can think of this functionality as an initial exploration of prescriptive analytics capabilities.

Prototype Digital Twin

Space Management

Data Exploration

The data I relied on relates to desk and meeting room bookings. From a first exploration of the data, I found that the booking data does not always match the detection data at the entrance turnstiles. This data reconciliation issue should be verified against the time-stamp data at the turnstiles, but it cannot be resolved at the moment for privacy reasons. However, the problem could be circumvented through an edge computing service that brings in the actual attendance data while stripping out the personal data.
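As a rough illustration of the kind of reconciliation such an edge service could perform, the sketch below aggregates turnstile events into anonymous daily headcounts and compares them with daily booking counts. File and column names are hypothetical; only the idea of dropping personal identifiers before the comparison reflects the approach described above.

```python
import pandas as pd

# Hypothetical inputs: file and column names are assumptions for illustration only.
bookings = pd.read_csv("desk_bookings.csv", parse_dates=["start_date"])
turnstile = pd.read_csv("turnstile_events.csv", parse_dates=["timestamp"])

# "Edge" step: drop personal identifiers, keep only the event date.
daily_presence = (
    turnstile.assign(date=turnstile["timestamp"].dt.date)
    .groupby("date")
    .size()
    .rename("people_detected")
)

# Daily booking counts from the reservation system.
daily_bookings = (
    bookings.assign(date=bookings["start_date"].dt.date)
    .groupby("date")
    .size()
    .rename("desks_booked")
)

# Days where bookings and actual detections diverge the most.
comparison = pd.concat([daily_bookings, daily_presence], axis=1).fillna(0)
comparison["gap"] = comparison["desks_booked"] - comparison["people_detected"]
print(comparison.sort_values("gap", ascending=False).head())
```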

Figure 6. Trend of desk booking for 2021 and 2022.

KNIME Workflow and main nodes

The external service for desk booking provides CSV files containing the following data (sensitive information has already been removed):

  • Start date of the reservation (string);
  • End date of the reservation (string);
  • Desk coding (string);
  • Desk type (string);
  • Name of the user (information removed).

Figure 7. Desk booking data after the upload in KNIME.

As can be seen, the available data required a transformation and cleaning phase (e.g., converting strings to dates).
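For readers who prefer code to nodes, a minimal pandas sketch of the same cleaning step could look as follows. The column names and the date format are assumptions; in the actual workflow this was done with KNIME's transformation nodes (e.g., String to Date&Time).

```python
import pandas as pd

# Hypothetical column names matching the booking fields listed above.
df = pd.read_csv("desk_bookings.csv")

# String-to-date conversion (the date format is an assumption).
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M", errors="coerce")

# Basic cleaning: drop rows whose dates could not be parsed, trim the desk codes.
df = df.dropna(subset=["start_date", "end_date"])
df["desk_code"] = df["desk_code"].str.strip()
```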

Figure 8. Typical KNIME workflow (KNIME Press source).
Figure 9. Workflow in KNIME for space management.

In addition, other data fields were identified and extracted in order to provide useful elements for the subsequent predictive analytics stages.

The transformations applied made the following additional fields available (a minimal code sketch of the season rule follows Figure 10):

  • Tenant ID;
  • Floor;
  • Season;
  • Day of the week;
  • Holiday, working day, long weekend.

Figure 10. Rule Engine node applied to identify season.
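The Rule Engine logic in Figure 10 is essentially a month-to-season mapping. A hedged Python equivalent, reusing the hypothetical frame from the previous sketch, might look like this (the exact season boundaries used in the workflow are an assumption):

```python
# Map the booking month to a season, mirroring the Rule Engine idea in Figure 10.
def season_of(month: int) -> str:
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"

df["season"] = df["start_date"].dt.month.map(season_of)
df["day_of_week"] = df["start_date"].dt.day_name()
```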

After all data processing phases, the final output was sent to Power BI through connector nodes.

Figure 11. Connection nodes between KNIME and PowerBI.

Dashboard Insights

By processing the booking data with KNIME Analytics Platform, combined with the visualization capabilities offered by the VCad tool (BLogic), it was possible to explore the information through dynamic dashboards that can be queried by various criteria (day of the week, floor, tenant), creating a comprehensive descriptive analytics framework not only for workstations but also for meeting rooms.

Figure 12. Space management data exploration dashboards — desk booking.
Figure 13. Space management data exploration dashboard — meeting room booking.

Given the available data, I was able to estimate the possible occupancy rate for the following period (30 days).

To set up the machine learning algorithm, I identified the dummy variables that seemed most appropriate (season, weekday or holiday, long weekends) and then calibrated a suitable algorithm. Unfortunately, the prediction accuracy was not high, due to the relatively short period for which data was available (two years). To increase reliability, I created a range around the point forecast by adding and subtracting the root mean square error (RMSE); according to the theory of the Gaussian distribution, about 70% of actual values can be expected to fall within this range.
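The interval idea can be sketched as follows (not the exact model used): fit a simple regression on the dummy variables and widen the point forecast by one RMSE in each direction, which under roughly Gaussian errors covers about 68-70% of outcomes. The aggregation level is an assumption, and the sketch reuses the hypothetical frame and season_of helper from the earlier snippets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical daily aggregation of the bookings frame used in the earlier sketches.
daily = (
    df.assign(date=df["start_date"].dt.normalize())
      .groupby("date").size().rename("bookings").reset_index()
)

# Dummy variables: season and day of week (holiday/long-weekend flags would be added similarly).
X = pd.get_dummies(pd.DataFrame({
    "season": daily["date"].dt.month.map(season_of),
    "weekday": daily["date"].dt.day_name(),
}))
y = daily["bookings"]

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# ±1 RMSE band: under Gaussian errors, roughly 68% of actual values fall inside it.
rmse = np.sqrt(mean_squared_error(y, pred))
lower, upper = pred - rmse, pred + rmse
```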

Figure 14. Dashboard space management — Predictive 30days.

Energy Management

Data Exploration

The available data allowed me to conduct a very deep descriptive analysis (about 2.2 million records). A critical element was identified in the way the data is collected.

The data resides on proprietary systems of the BMS manufacturer and can only be explored from hardware connected to a corporate VPN.

The main issues I had to deal with were:

  • A high number of data sources from the BMS (about 126 CSV files);
  • Different timestamp granularities across data sources (30 minutes or 1 hour for energy data, 5 minutes for weather data);
  • Electrical values recorded as progressive (cumulative) readings since the start of monitoring (see the sketch below).
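The third point means that the electrical meters report progressive (cumulative) readings, so per-interval consumption has to be derived by differencing. A minimal sketch, with hypothetical file and column names, follows:

```python
import pandas as pd

# Hypothetical energy CSV with a timestamp and a progressive (cumulative) kWh reading.
energy = pd.read_csv("floor1_lighting.csv", parse_dates=["timestamp"])
energy = energy.sort_values("timestamp")

# Difference the cumulative readings to obtain per-interval consumption; negative
# jumps (e.g., meter resets) are treated as missing rather than as real consumption.
energy["kwh"] = energy["cumulative_kwh"].diff()
energy.loc[energy["kwh"] < 0, "kwh"] = float("nan")
```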

KNIME Workflow and main nodes

In loading the data and building the workflow, I tried to implement a structure that recalled the shape of the building, so as to make it easier to read and check the connections between the nodes.

Figure 15. Workflow in KNIME for energy management.

In order to compare electrical and weather data, I synchronized the timestamps using the Aggregation Granularity verified component (see the figure below; a plain-Python equivalent is sketched after it).

Figure 16. Aggregation Granularity component for calibration.
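In plain Python, the equivalent of the Aggregation Granularity step could be a resample of the 5-minute weather data to hourly means, so that it lines up with the hourly energy data. File and column names are assumptions.

```python
import pandas as pd

# Hypothetical inputs: hourly energy data and 5-minute weather data.
energy = pd.read_csv("floor1_energy.csv", parse_dates=["timestamp"]).set_index("timestamp")
weather = pd.read_csv("weather.csv", parse_dates=["timestamp"]).set_index("timestamp")

# Bring the weather series to the same hourly granularity as the energy series.
weather_hourly = weather["outdoor_temp"].resample("1H").mean()
energy_hourly = energy["kwh"].resample("1H").sum()

# Join on the shared hourly timestamps for side-by-side analysis.
combined = pd.concat([energy_hourly, weather_hourly], axis=1).dropna()
```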

Energy Management Dashboard Insights

Although the number of files handled was considerably large, the quality of the available data was sufficient to allow the investigation of the consumption trends of each system component for each individual floor.

Figure 17. Energy management data exploration dashboard.

The analysis of the data allowed me to:

  • Identify the programming logic of the system. The operating logic appears to be decoupled from the number of people present on the floor: if the building is active on a working day, its energy consumption is not affected by the number of staff present.
  • Verify that the plant’s programming does not follow outdoor temperature trends. Much more realistically, the plant seems to follow an hourly/seasonal scheduling logic.

Given the availability of data, a “what-if” scenario was set up. In this scenario, we assumed a headquarters assigned to a single tenant instead of multiple tenants and imagined applying a business rule whereby, if a minimum percentage of the total reservation availability was not reached on a given floor, all the reservations were automatically transferred to a different floor. The minimum requirement was set at 20%.

Current reservation data tell us that during the past year, due to the pandemic, only 83 out of 250 working days met the minimum 20% reservation threshold. If this rule had been applied, about 2,500 kWh could have been saved per floor for the lighting system, and as much again for the other consumption components. The potential savings were also expressed in units of CO2 and tonnes of oil equivalent.
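A back-of-the-envelope version of this what-if rule is sketched below. Only the 20% threshold, the 83/250 working days, and the ~2,500 kWh order of magnitude come from the data above; the per-day lighting figure is a hypothetical value chosen so that the illustrative total lands near the quoted one.

```python
# What-if sketch: floors whose daily bookings fall below 20% of capacity are emptied
# and their reservations moved to another floor, so their lighting can stay off.
MIN_SHARE = 0.20                 # minimum reservation share from the scenario above
WORKING_DAYS = 250               # working days in the year considered
DAYS_ABOVE_THRESHOLD = 83        # days that met the 20% threshold (from the data)
LIGHTING_KWH_PER_FLOOR_DAY = 15  # hypothetical daily lighting consumption per floor

days_consolidated = WORKING_DAYS - DAYS_ABOVE_THRESHOLD
lighting_savings_kwh = days_consolidated * LIGHTING_KWH_PER_FLOOR_DAY
print(f"Days consolidated: {days_consolidated}")
print(f"Estimated lighting savings per floor: {lighting_savings_kwh} kWh")  # ~2,500 kWh
```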

Operational Maintenance

Data Exploration

As can be seen from the following image, the quality of the data available for maintenance activities is quite poor. The available data are incomplete, with numerous null values, and an initial analysis revealed the following shortcomings:

  • Missing date of the maintenance activity;
  • Missing identification of the maintained object;
  • Missing description of the anomaly found and of the activity performed to resolve it.

Given this lack of data, the initial goal of implementing a predictive maintenance algorithm was scaled back to monitoring operational maintenance.

Figure 18. Maintenance available data.

KNIME Workflow and main nodes

Figure 19. Workflow in KNIME for Operational maintenance.

One of the first issues I had to overcome in the maintenance data structure was the way new records were added by the facility company’s software: instead of being appended as new rows, they were inserted as new columns (with many null values), making it particularly difficult to explore the data.

To solve this problem, I cleaned up the data and implemented recursive loops that copied the values from the columns into rows, following the periodicity of the scheduled maintenance activity (a plain-Python equivalent is sketched after Figure 20).

Figure 20. Setting up of loop in KNIME.
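In plain Python, the same column-to-row restructuring could be expressed as an unpivot (melt) instead of a loop. The column naming convention below is an assumption about how the facility software appended the records.

```python
import pandas as pd

# Hypothetical export where each new intervention was appended as an extra column
# (intervention_1, intervention_2, ...) instead of as a new row.
wide = pd.read_csv("maintenance_export.csv")
value_cols = [c for c in wide.columns if c.startswith("intervention_")]

# Unpivot the intervention columns into rows and drop the empty placeholders.
long = (
    wide.melt(id_vars=["asset_id"], value_vars=value_cols,
              var_name="intervention_slot", value_name="intervention")
        .dropna(subset=["intervention"])
)
```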

Operational Maintenance Dashboard Insights

The maintenance dashboard was limited to a light analysis of operational maintenance. The graph shows a decomposition tree identifying the percentages of interventions broken down by category and subcategory.

Figure 21. Operational maintenance data exploration dashboard.

Workflow Evolution

To conclude this analysis of the implemented Digital Twin prototype, I would like to pause on some key aspects that, in my view, would enable a fully mature Digital Twin level to be reached.

The key aspects identified can be divided into two types:

  • Type 1: aspects related to incomplete (more or less partial) data; completing these data would allow more effective data analytics and, consequently, significantly higher-quality results. This category includes, alongside the limitations dictated by the quality and quantity of the data, the limitations related to the ineffectiveness or lack of automation of their collection.
  • Type 2: aspects related to the need to include new components and features aimed at improving the two-way connection between the digital and real worlds, as well as the storage of simulation data.

Figure 22. Possible dataflow for a mature Digital Twin.

As can be seen in the image, some additional elements have been included in the proposed evolution of the dataflow:

  • An AR-VR-MR data exploration service via the Unity platform;
  • A DB service for unstructured data that can receive both structured data from BMS systems and unstructured data for future implementations.

Conclusions

The prototype digital twin focused on the use of KNIME, Power BI, and VCad services. These were chosen with the goal of testing a digital twin that explores, across the board, all the capabilities offered by the currently available data.

The development of the Digital Twin prototype was long and laborious, but it produced a robust and very promising result, with potential for flexibility and expansion. The prototype can serve as a first step toward introducing Digital Twins at other sites of the same group of companies, so as to support and improve asset management and building managers’ needs and, in an even broader perspective, business decision-making processes as well.

--
