Development and Validation of Wind Farm Models within the Data Science Paradigm

We discussed the interests of Task 31 at "Wind Energy and Digitalization", the last IEA-Wind Topical Expert Meeting, held in Dublin: how data-driven models can complement physics-driven models in the development of engineering design tools with access to operational wind farm data.

Javier Sanz Rodrigo
The Wind Vane
6 min read · Oct 18, 2018


Tulip Fields near The Hague, Claude Monet (1886), The Van Gogh Museum, Amsterdam

On 4–5 October I attended TEM#92 in Dublin to present a perspective on the potential use of data science in the development of wind farm models. This has been the best experts meeting I've attended so far, with record participation. An exciting program was put together by Task 11 and excellently convened by Nadine Mounir, John Mc Cann and Jason Fields, combining presentations from industry and research with break-out discussions about the problems and opportunities offered by digitalization. A white paper on the topic of the meeting will be prepared to further elaborate a roadmap to wind energy digitalization and to identify priority areas that could be developed within a dedicated IEA-Wind Task.

The Data Science Paradigm

In the context of Task 31, the data science paradigm puts the focus on the data rather than on the computational, physics-based model.

This approach assumes that we have access to large databases of operational wind farms covering a wide range of wind farm configurations and wind climate conditions. A data science toolbox will establish relationships between wind conditions and wind farm performance to train engineering wake models that will systematically improve as more data is added. This is in contrast to the traditional approach, which seeks universal formulations of computational fluid dynamics (CFD) models that improve systematically as better physics are implemented at increased resolution.
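As a toy illustration of this training step, the sketch below fits a simple data-driven relationship between wind conditions and farm efficiency with a least-squares regression. The data, features and coefficients are all synthetic assumptions, not taken from any real wind farm:

```python
import numpy as np

# Synthetic stand-in for an operational dataset: hourly wind speed (m/s),
# wind direction (deg) and farm efficiency (farm power / ideal power).
rng = np.random.default_rng(0)
speed = rng.uniform(4, 14, 1000)
direction = rng.uniform(0, 360, 1000)
# Toy "truth": wake losses grow at low speed and in a waked sector near 270 deg.
efficiency = (0.95 - 0.5 / speed
              - 0.05 * np.exp(-((direction - 270) / 20) ** 2)
              + rng.normal(0, 0.01, 1000))

# Simple data-driven model: least-squares fit on hand-crafted features.
features = np.column_stack([
    np.ones_like(speed),                     # intercept
    1 / speed,                               # speed-dependent wake loss
    np.exp(-((direction - 270) / 20) ** 2),  # waked-sector indicator
])
coef, *_ = np.linalg.lstsq(features, efficiency, rcond=None)
predicted = features @ coef
rmse = np.sqrt(np.mean((predicted - efficiency) ** 2))
print(f"fitted coefficients: {coef.round(3)}, RMSE: {rmse:.4f}")
```

As more operational samples are appended to `speed`, `direction` and `efficiency`, the refitted coefficients improve systematically, which is the essence of the paradigm described above.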

This new paradigm requires that models are trained on smart, quality-controlled data that has undergone intensive screening to make sure it adds value rather than noise to the predictive models. Hence, most of the development work is devoted to cleaning and structuring the data such that it can be effectively used to parameterize fast engineering models for (in the context of Task 31) the assessment of annual energy production (AEP) and wind farm design optimization.
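Two classic screening tests in this cleaning step are a physical-range check and a frozen-sensor check. The sketch below applies both to a synthetic SCADA channel; the thresholds (`lo`, `hi`, `max_run`) are illustrative assumptions, not values from any standard:

```python
import numpy as np

# Toy 10-minute SCADA wind-speed channel with two injected faults.
rng = np.random.default_rng(1)
wind_speed = rng.weibull(2.0, 500) * 8.0
wind_speed[40:60] = wind_speed[40]   # frozen sensor: value stuck for 20 samples
wind_speed[100] = 120.0              # physically impossible spike

def qc_flags(x, lo=0.0, hi=50.0, max_run=6):
    """Return a boolean mask of samples that FAIL quality control."""
    bad = (x < lo) | (x > hi)                       # range test
    # Frozen-sensor test: flag runs of identical consecutive values.
    same = np.concatenate([[False], np.diff(x) == 0.0])
    run = np.zeros(len(x), dtype=int)
    for i in range(1, len(x)):
        run[i] = run[i - 1] + 1 if same[i] else 0
    for i in range(len(x)):
        if run[i] >= max_run:
            bad[i - run[i]:i + 1] = True            # flag the whole run
    return bad

bad = qc_flags(wind_speed)
print(f"flagged {bad.sum()} of {len(wind_speed)} samples")
```

Only the samples where `bad` is `False` would be passed on to train or calibrate the engineering models.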

The data workflow also implies that we can close the loop between planning and operational phases so that new data can be incorporated in the calibration of wind farm design tools and predictive models.

This digital wind farm uses a common data language to leverage subject-matter expertise that has traditionally been developed in knowledge and organizational silos.
This common data language is facilitated by the wide adoption of open-source toolboxes developed in popular programming languages like Python or R. The data science toolbox is readily available, and the focus is put on data analysis enabled by proper data management and computing infrastructures.

Model Evaluation

The wind energy research community is already developing open-source projects that can be effectively used together with general-purpose data science toolboxes to produce data analytics workflows. For example, NREL presented at TEM#92 the OpenOA initiative to provide a reference framework for performance analysis from wind farm operational data. Validation datasets can be generated and tracked more transparently using open-source scripts, resulting in more consistent evaluation as data from additional sites become available.
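One typical operational analysis of this kind is a long-term AEP estimate obtained by regressing metered monthly energy on a long-term wind reference. The sketch below is not the OpenOA API; it is a minimal illustration of the idea with entirely synthetic numbers:

```python
import numpy as np

# Hypothetical inputs: 24 months of metered farm energy (GWh) and the
# matching monthly mean wind speed from a long-term reference (e.g. reanalysis).
rng = np.random.default_rng(2)
ref_speed = rng.normal(7.5, 0.8, 24)                 # period of record
energy = 4.0 * ref_speed - 10.0 + rng.normal(0, 0.5, 24)

# Fit energy = a * speed + b over the operational period...
A = np.column_stack([ref_speed, np.ones_like(ref_speed)])
(a, b), *_ = np.linalg.lstsq(A, energy, rcond=None)

# ...then apply the fit to the long-term wind climate for a long-term AEP.
longterm_speed = rng.normal(7.3, 0.8, 240)           # 20-year reference record
aep = 12 * (a * longterm_speed.mean() + b)           # GWh / year
print(f"long-term AEP estimate: {aep:.1f} GWh")
```

Because the script is a few transparent lines, the exact dataset and fit behind a reported AEP figure can be versioned and reproduced by other groups.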

Task 31 is developing an ambidextrous verification and validation (V&V) framework whereby models are developed and evaluated systematically following a bottom-up approach of increasing flow complexity. This follows best practices from CFD models when full-system validation is not possible or prohibitively expensive. Here, the predictive capacity of flow models is inferred by building credibility through a validation process with sub-system and unitary-problem experiments targeting the physical phenomena that presumably have the largest impact on the system's performance. More information about the multi-scale building-blocks for wind farm flow models can be found here. Model developers formulate targeted validation benchmarks derived from high-fidelity experiments in controlled conditions to demonstrate the added value (credibility) of their models. The larger the validation envelope of physical phenomena, the more confidence in the application of these models in connection to design tools.

Closing the loop in the ambidextrous V&V process must include testing validated models with operational data across a range of relevant wind conditions. Only at this level can one effectively quantify the impact of new model implementations on application-specific quantities of interest. It is also at this stage where model calibration (or training) can be introduced to mitigate the systematic bias of the model due to lack of physics or due to a limited validation range.
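A minimal form of such calibration is a per-bin multiplicative correction learned from paired model predictions and observations. The power curve and bias below are toy assumptions used only to show the mechanics:

```python
import numpy as np

# Hypothetical paired samples: engineering-model farm power prediction vs
# SCADA observation (MW), with a speed-dependent model bias to correct.
rng = np.random.default_rng(3)
speed = rng.uniform(4, 14, 2000)
observed = 10 / (1 + np.exp(-(speed - 9)))       # toy farm power curve
predicted = observed * (1.06 - 0.008 * speed)    # toy biased model

# Calibration: per-wind-speed-bin correction learned from operational data.
bins = np.arange(4, 15)                          # 1 m/s bins from 4 to 14
idx = np.digitize(speed, bins) - 1
factor = np.array([np.mean(observed[idx == i] / predicted[idx == i])
                   for i in range(len(bins) - 1)])
calibrated = predicted * factor[np.clip(idx, 0, len(factor) - 1)]

bias_before = np.mean(predicted - observed)
bias_after = np.mean(calibrated - observed)
print(f"mean bias: {bias_before:.3f} -> {bias_after:.3f} MW")
```

Note that the correction only compensates the bias inside the wind-speed range covered by the operational data; outside that range the uncalibrated model bias remains, which is why the validation envelope matters.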

The final outcome of the evaluation process is a model validated over a limited range of controlled conditions and calibrated over a wider range of operational conditions. Parametric sensitivity testing and uncertainty quantification techniques can be used to identify knowledge gaps in the underlying model (wind conditions where uncertainties are larger), which should be prioritized in the next validation loop, either by exploring the available data pool or by designing a dedicated experiment.
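As an example of parametric uncertainty quantification, the sketch below propagates an uncertain wake-decay coefficient through the classic Jensen (top-hat) wake-deficit formula by Monte Carlo sampling. The distribution assumed for the coefficient is illustrative, not a recommended value:

```python
import numpy as np

def jensen_deficit(x, D=100.0, ct=0.8, k=0.05):
    """Jensen-model fractional velocity deficit at downstream distance x (m),
    for rotor diameter D, thrust coefficient ct and wake-decay coefficient k."""
    return (1 - np.sqrt(1 - ct)) / (1 + 2 * k * x / D) ** 2

rng = np.random.default_rng(4)
k_samples = rng.normal(0.05, 0.01, 5000)    # assumed uncertain decay coefficient
k_samples = np.clip(k_samples, 0.01, None)  # keep it physically plausible

x = 700.0                                   # 7 rotor diameters downstream
deficits = jensen_deficit(x, k=k_samples)
lo, hi = np.percentile(deficits, [5, 95])
print(f"deficit at 7D: mean {deficits.mean():.3f} (90% interval {lo:.3f}-{hi:.3f})")
```

Wind conditions (or downstream distances) where this interval is widest are exactly the knowledge gaps that the next validation loop should target.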

Challenges

Lack of high-fidelity experimental data has been traditionally the most important bottleneck in the development of flow models. Recent research programmes like NEWA or A2e have addressed this limitation by conducting large-scale field campaigns with extensive use of scanning lidar technology. These systems produce a wealth of information that greatly complements traditional mast-based measurements.

The challenge now is to make sure that all these data are accessible and interpreted consistently across different research groups.

Standards for data and metadata need to be defined and adopted in the wind energy sector as a whole to enable efficient exploitation of data resources. This is one of the most important bottlenecks for wind energy to tap into the full potential of digitalization, as discussed in many good examples at TEM#92. This is also a fundamental objective of open science policies which, in the European context, is addressed with the promotion of FAIR principles, requiring that all research data produced by public funding should be Findable, Accessible, Interoperable and Reusable. Initiatives towards FAIR wind energy public data repositories have been promoted by the IRP-Wind project in Europe (Sempreviva et al., 2018) and by the A2e Data Archive and Portal (DAP) in the United States. The lidar community, under the umbrella of IEA-Wind Task 32, has recently published e-Windlidar, a data convention implementing FAIR principles in the management of lidar data. Standardization of operational data is also present in several industry standards but adoption, as in the scientific community, is far from being realized.
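To make the FAIR principles concrete, the sketch below writes a minimal machine-readable metadata record for a lidar dataset. The field names and values are illustrative assumptions, not taken from e-Windlidar or any official convention:

```python
import json

# Minimal metadata record illustrating the four FAIR principles.
# All identifiers, URLs and field names here are hypothetical examples.
record = {
    "identifier": "doi:10.0000/example-lidar-dataset",  # Findable: persistent ID
    "access_url": "https://data.example.org/lidar/campaign-01",  # Accessible
    "format": "NetCDF-4 (CF-1.8 conventions)",          # Interoperable: open format
    "license": "CC-BY-4.0",                             # Reusable: clear usage terms
    "instrument": "scanning Doppler lidar",
    "variables": [{"name": "wind_speed", "units": "m s-1"}],
}
print(json.dumps(record, indent=2))
```

The value of such a record is that any research group's tooling can discover, fetch and interpret the dataset without a human reading a bespoke README.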

Promotion of standards without clear incentives addressing end-user needs does not result in effective implementation.

Managing data ownership is a fundamental challenge inherent to any data science discipline, and wind energy is no exception. In our context, while research data is increasingly publicly available through the implementation of open science policies, industry remains reluctant to share data.

Nevertheless, using appropriate bilateral non-disclosure agreements, it has been possible to access wind farm operational data in connection to research projects. Recent examples of data sharing by industry discussed at TEM#92 include the WP3/PRUF benchmarking project, in which NREL could analyze operational data from 10 wind farms, and Ørsted giving academic institutions access to data from the offshore wind farms Westermost Rough and, previously, Anholt.

The exploitation of a distributed database of public and private data also creates the challenge of tracking the data-awareness of models, i.e. which data has been used to validate and train each model.
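One lightweight way to track this data-awareness is a provenance log recording which dataset was used for which purpose by each model version. The sketch below is a hypothetical illustration; the model and dataset names are invented:

```python
# Provenance log: which datasets each model version has been exposed to.
provenance = []

def log_usage(model, version, dataset, role):
    """Record that a model version used a dataset for validation or calibration."""
    provenance.append({"model": model, "version": version,
                       "dataset": dataset, "role": role})

log_usage("engineering-wake-model", "1.2", "benchmark-A-met-mast", "validation")
log_usage("engineering-wake-model", "1.2", "farm-B-scada-2017", "calibration")

def data_awareness(model, version):
    """Return the datasets a given model version has already seen, by role."""
    seen = {}
    for rec in provenance:
        if rec["model"] == model and rec["version"] == version:
            seen.setdefault(rec["role"], []).append(rec["dataset"])
    return seen

print(data_awareness("engineering-wake-model", "1.2"))
```

With such a log, an evaluation exercise can check that a model is not being "validated" against data it was already calibrated on, keeping the distributed public/private database honest.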

Here it is important that open-source models can document their track record in validation activities such that they can become a reference for engineering and advanced models to compare against.

Interest of Task 31 in “Digitalization”

To summarize, the following items are worth mentioning about the interest of Task 31 in digitalization, in anticipation of overlap areas with a potential new IEA-Wind Task:

  • Promote data standardization and open-science practices on experimental data and simulations.
  • Multi-site analysis of modeling errors vs site and climate conditions (left side of ambidextrous V&V framework) to identify knowledge gaps and complement hierarchical systematic building-block validation (right side).
  • Develop model evaluation procedures for data-driven models.
  • Explore pathways to design data-based engineering models using observations and high-fidelity simulations.
  • Uncertainty quantification using inference methods.

Interested in contributing to this line of work in Task 31? You are welcome to share your thoughts in the discussion below or even join Task 31.


Senior Data Scientist at the Digital Ventures Lab of Siemens Gamesa Renewable Energy.