Research & Development

Synthetic Data x Machine Creation

Unlocking a major bottleneck for zero-emission building design

Sayjel Vijay Patel

Published in

Digital Blue Foam

5 min read · May 26, 2022

Data-Scarcity in the Building Industry

A new era of data-driven design will inform the next evolution of the Architecture, Engineering, and Construction (AEC) industry. Rapid urbanization brings a broad range of challenges in sectors such as housing, employment, and healthcare. It also creates new opportunities, through data generation, to support how we understand and steer the design process. An estimated 175 zettabytes of data in the global datasphere by 2025 [1] makes quantifying the relationships between design and use both more promising and more challenging. Several researchers [2], [3] have highlighted big data as a catalyst for urban planning, shifting the focus from “data driven design” to “design driven data” [4].

Data heterogeneity and insufficient datasets are two challenges in big data analytics. Generative design driven by synthetic data generation, also known as “Machine Creation,” is one way to overcome data scarcity or incompleteness and to move from “data-based innovation” (DBI) to “data-driven innovation” (DDI) [5].

Unlike digital businesses such as finance, which generate billions of gigabytes of data every day, the building sector faces a unique problem: the majority of building-related data is confined to the physical world. As a result, new approaches to acquiring building-related datasets are needed to make better use of ML’s potential to inform zero-emission building design.

What is Synthetic Data?

Machine Learning (ML) algorithms need access to large data sets to learn and predict design outcomes. Synthetic data is data that is artificially created rather than being generated by actual events. It is generated algorithmically and can be applied to a wide range of activities, including as test data for new products and tools, for model validation, and in AI model training.
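To make "generated algorithmically" concrete, here is a minimal sketch of a synthetic record generator. It is not DBF's pipeline: the `BuildingSample` fields, the parameter ranges, and the fixed floor height are all hypothetical, and a simple geometric formula stands in for the simulation that would normally produce the annotation.

```python
import random
from dataclasses import dataclass


@dataclass
class BuildingSample:
    """One synthetic record: design parameters plus a computed annotation."""
    width_m: float
    depth_m: float
    floors: int
    floor_height_m: float
    gross_floor_area_m2: float  # annotation derived from the parameters


def generate_samples(n: int, seed: int = 0) -> list[BuildingSample]:
    """Sample n box-shaped massings from hypothetical parameter ranges."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n):
        width = rng.uniform(10.0, 60.0)   # assumed range, in metres
        depth = rng.uniform(10.0, 60.0)   # assumed range, in metres
        floors = rng.randint(1, 40)       # assumed range
        samples.append(
            BuildingSample(
                width_m=width,
                depth_m=depth,
                floors=floors,
                floor_height_m=3.5,       # assumed constant
                # Stand-in for a simulation result: area of all floor plates.
                gross_floor_area_m2=width * depth * floors,
            )
        )
    return samples


if __name__ == "__main__":
    for sample in generate_samples(3):
        print(sample)
```

Because every record is produced by sampling parameters and computing a label, the dataset can be made as large as compute allows, and a fixed seed regenerates the same records on demand.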

Synthetic Data as Generative Design

At DBF we are leveraging our generative design technology to create a proprietary database of 100 million annotated building designs and 1 trillion simulations by 2025. This dataset will be used to create a variety of fast, accessible, and cost-effective tools to improve the environmental performance of building designs with respect to ESG goals. Using “off-the-shelf” Machine Learning (ML) algorithms, this dataset will facilitate the discovery and optimization of net-zero building designs, as well as the assessment and renovation of existing buildings.

In Q3 2022, DBF will launch an automated synthetic data pipeline that allows a single client computer to generate 1 million annotated building designs in a 24-hour period (Figure 1). This pipeline will be used to create a building dataset of previously unthinkable scale and detail, and it represents a step towards DBF filling the gap left by the incomplete, untimely, and unstructured datasets available to the building industry.
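The target of 1 million designs in 24 hours implies a per-design time budget. The short calculation below works it out, assuming designs are produced one after another on the single machine. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_throughput(designs: int, period_s: int = SECONDS_PER_DAY) -> float:
    """Designs per second needed to finish `designs` within `period_s`."""
    return designs / period_s


def time_budget_ms(designs: int, period_s: int = SECONDS_PER_DAY) -> float:
    """Average milliseconds available per design when run sequentially."""
    return period_s * 1000 / designs


if __name__ == "__main__":
    print(f"{required_throughput(1_000_000):.2f} designs/s")  # 11.57 designs/s
    print(f"{time_budget_ms(1_000_000):.1f} ms per design")   # 86.4 ms per design
```

In other words, the stated rate corresponds to roughly 11.6 designs per second, or about 86 ms per design on average. At that rate, one machine would need 100 days to produce 100 million designs.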

Figure 1 — Synthetic Data Generation Pipeline

Machine Creation: Case Studies

To demonstrate the pipeline, we have engineered an approach for generating site-scale datasets for daylighting [7] (Figure 2) and for wind modeling through Computational Fluid Dynamics (CFD) (Figure 3), giving users quick feedback on their designs.

**Case Study 1:** Daylight Autonomy with Artificial Neural Networks (ANNs). Fast daylighting models were developed using a novel synthetic data generation approach on a geographically diverse dataset spanning 15 major cities: New York, Los Angeles, Vancouver, Toronto, Bogota, Warsaw, Madrid, Budapest, London, Birmingham, Abu Dhabi, Osaka, Tokyo, Kuala Lumpur, and Singapore. The resulting model also performed consistently well at predicting Spatial Daylight Autonomy for cities not on this list. For stakeholders, the machine creation framework could be a valuable resource: the model for a major city could be used directly, with the same ease of use as an EnergyPlus Weather (EPW) data file.
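One common way to check how a model behaves on unseen cities is to hold out whole cities from training. The post does not describe how the evaluation was actually run, so the sketch below is only an illustration of that idea. The record layout (a dict with a `city` key) and the `split_by_city` helper are hypothetical.

```python
import random
from collections import defaultdict


def split_by_city(records, held_out_fraction=0.2, seed=0):
    """Split records so that every city lands entirely in train or in test."""
    by_city = defaultdict(list)
    for record in records:
        by_city[record["city"]].append(record)

    cities = sorted(by_city)                 # sort first for a stable shuffle
    random.Random(seed).shuffle(cities)
    n_test = max(1, round(len(cities) * held_out_fraction))
    test_cities = set(cities[:n_test])

    train = [r for c in cities if c not in test_cities for r in by_city[c]]
    test = [r for c in cities if c in test_cities for r in by_city[c]]
    return train, test, test_cities


if __name__ == "__main__":
    # Placeholder records: only the city name and a sample index, no real data.
    names = ["Tokyo", "Osaka", "London", "Madrid", "Warsaw"]
    records = [{"city": c, "sample_id": i} for c in names for i in range(3)]
    train, test, test_cities = split_by_city(records)
    print(f"held-out cities: {sorted(test_cities)}")
    print(f"{len(train)} training records, {len(test)} test records")
```

Splitting by city rather than by individual sample keeps every record from a held-out city out of training, so the test score reflects performance on locations the model has never seen.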

**Case Study 2:** CFD Analysis with a 3D Convolutional Neural Network (CNN). A synthetic machine learning (SML) framework was developed to predict three-dimensional wind flow patterns within minutes, faster than typical CFD simulations. The synthesized datasets offer an effective way to reduce pre-processing, processing, and post-processing time, as well as computational resource requirements. The next steps in improving the model are to cluster the cities by climatic, contextual, and other commonalities and to train a distinct model for each cluster.
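A 3D CNN operates on a regular three-dimensional grid, so building geometry has to be rasterized into voxels before it can be used as input. The post does not say how DBF encodes its inputs; the sketch below assumes a simple binary occupancy grid built from axis-aligned boxes, and the `voxelize` helper and its parameters are hypothetical.

```python
def voxelize(boxes, grid_size, cell_m):
    """Build an occupancy grid where grid[i][j][k] is 1 inside any box.

    boxes: iterable of (x0, y0, z0, x1, y1, z1) extents in metres.
    grid_size: (nx, ny, nz) number of cells along each axis.
    cell_m: edge length of one cubic cell in metres.
    """
    nx, ny, nz = grid_size
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for x0, y0, z0, x1, y1, z1 in boxes:
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    # Test the cell centre against the box extents.
                    cx = (i + 0.5) * cell_m
                    cy = (j + 0.5) * cell_m
                    cz = (k + 0.5) * cell_m
                    if x0 <= cx < x1 and y0 <= cy < y1 and z0 <= cz < z1:
                        grid[i][j][k] = 1
    return grid


def occupied_cells(grid):
    """Count the cells marked as occupied."""
    return sum(cell for plane in grid for row in plane for cell in row)


if __name__ == "__main__":
    # A single 10 m x 10 m x 20 m tower in a 20 m cube sampled at 5 m cells.
    tower = (0.0, 0.0, 0.0, 10.0, 10.0, 20.0)
    grid = voxelize([tower], grid_size=(4, 4, 4), cell_m=5.0)
    print(occupied_cells(grid), "of", 4 * 4 * 4, "cells occupied")
```

A grid like this would be the input to the network, with simulated wind velocities at the same grid cells as the training target. The triple loop is written for clarity, not speed.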

Conclusion & Next Steps

Synthetic data, as well as ML models trained on it, can be decoupled from the platform and made available directly through APIs. This will allow academic and industry collaborators to access data and models for a variety of applications. The data could be used for further research, while the models could be used to create new data and potentially to build composite models. ML models could also be monetized for commercial use without requiring a subscription to the platform. Making data available through APIs also opens the door to crowdsourcing, which could be used for validation, tagging, and evaluation, among other applications.

NOTES

1 Data Age 2025 — Import.io. Seagate UK, Apr. 2017, https://www.import.io/wp-content/uploads/2017/04/Seagate-WP-DataAge2025-March-2017.pdf.

2 Kitchin, R. (2014). ‘The real-time city? Big data and smart urbanism’. GeoJournal; 79(1): 1–4.

3 Albino, V., Berardi, U., and Dangelico, R.M. (2015). ‘Smart cities: definitions, dimensions, performance, and initiatives’. Journal of Urban Technology, 22(1): 3–21.

4 Bige Tunçer. 2020. Data driven design to design driven data. In Proceedings of the 11th Annual Symposium on Simulation for Architecture and Urban Design (SimAUD ‘20). Society for Computer Simulation International, San Diego, CA, USA, Article 4, 1–2. https://dl.acm.org/doi/pdf/10.5555/3465085.3465089.

4 L’Heureux, Alexandra & Grolinger, Katarina & El Yamany, Hany & Capretz, Miriam. (2017). Machine Learning With Big Data: Challenges and Approaches. IEEE Access. PP. 1–1. 10.1109/ACCESS.2017.2696365.

5 Luo, Jianxi, Data-Driven Innovation: What Is It (December 15, 2021). Accepted for Publication in IEEE Transactions on Engineering Management, Available at SSRN: https://ssrn.com/abstract=3951983 or http://dx.doi.org/10.2139/ssrn.3951983

6 Buthayna Eilouti, Concept evolution in architectural design: an octonary framework, Frontiers of Architectural Research, Volume 7, Issue 2, 2018, Pages 180–196, ISSN 2095–2635, https://doi.org/10.1016/j.foar.2018.01.003.

7 Deshpande R., Nisztuk M., Cheng C., Chavan T., Weijenberg C., Patel S. V., Synthetic Machine Learning for Real time Architectural Daylighting Prediction, Proceedings of the CAADRIA 2022 Conference, 2022.

8 Jaipuria, N., Zhang, X., Bhasin, R., Arafa, M., Chakravarty, P., Shrivastava, S., Manglani, S., & Murali, V. N. (2020). Deflating dataset bias using synthetic data augmentation. arXiv:2004.13866 [cs, eess]. http://arxiv.org/abs/2004.13866

Special thanks to Ram Subramanian, Maciej Nisztuk, and Rutvik Deshpande for their contribution to this post!