Data Engineering for Wind Power Optimization: The Promise and Challenges

15 Jan 2019

Artificial Intelligence. Deep Learning. Neural Networks. Machine Learning. Blockchain. Big Data. We are surrounded by these technology buzzwords and their promise of software panaceas. The unspoken assumption is that you can collect a lot of data, apply these new tools, and watch value magically appear. Many industrial businesses spend millions of dollars on broad-based data collection in the blind hope of a data solution.

Unfortunately, these new data tools have their limits. Like a hammer, drill, or screwdriver, a tool unlocks its true value only when a specialist applies it.

Without a clear vision of the value they hope to extract, many firms are unwittingly engaging in poor data engineering. This severely limits the value available to them from these new tools. Ultimately, where and when the latest data science should be used will be driven by the value proposition.

The wind industry faces at least six distinct data engineering challenges that must be addressed by any wind farm owner (or performance analyst) looking to take advantage of the latest and greatest data science techniques.

The precise inflow conditions are not known.

Uniquely among turbine-based power generation, wind turbine generators face an unknown input resource. For hydro or steam turbine generators (whether nuclear, molten-salt, gas, or coal), the enclosed intake allows inflow conditions to be controlled and well understood. From this, operators and analysts can generate clean performance signals to assess any turbine-specific challenges.

In fact, when it comes to assessing the turbine aspect of performance, a nuclear plant is far simpler than hundreds of wind turbines across a hillside, each with limited instrumentation and each facing different, unknown, and constantly varying inflow conditions.

Wind inflow conditions, driven by wakes, forestry, terrain, and atmospheric stability, can create performance variation of more than 30 percent within each wind speed bin below rated power. Against that spread, detecting an underperformance issue of 1 to 2 percent becomes a Herculean task, especially if you don’t get your data model right, regardless of the algorithm or machine intelligence applied.
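To make the signal-to-noise problem concrete, here is a minimal sketch estimating how many independent samples a single wind speed bin would need before a 1 or 2 percent drop in mean power becomes statistically detectable. It rests on loudly stated assumptions: the 30 percent spread is treated as a per-sample relative standard deviation, samples are assumed independent (real 10-minute SCADA averages are autocorrelated, so true requirements are higher), and the test is a textbook two-sided one-sample z-test against a known reference power curve. The function name `samples_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(rel_sigma: float, rel_shift: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Independent samples needed in one wind speed bin to detect a relative
    drop `rel_shift` in mean power, given a per-sample relative standard
    deviation `rel_sigma` (two-sided one-sample z-test, known reference)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for significance level alpha
    z_beta = z.inv_cdf(power)             # quantile for the desired statistical power
    n = ((z_alpha + z_beta) * rel_sigma / rel_shift) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Assumption: 30 % scatter taken as a standard deviation; 1 % and 2 % shifts to detect.
    for shift in (0.01, 0.02):
        n = samples_needed(0.30, shift)
        hours = n * 10 / 60  # assuming 10-minute SCADA averages
        print(f"{shift:.0%} shift: {n} samples (~{hours:.0f} h in this bin)")
```

Because the required sample count scales with the inverse square of the shift, halving the issue you want to detect roughly quadruples the data you need, and since any one bin captures only a fraction of operating hours, accumulating those samples takes far longer than the in-bin hours printed.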

The assumptions used in wind turbine design and data collection have not kept pace with advancements in turbine engineering or software engineering.

Turbines with larger blades, farms with more turbines, and farms built in more varied locations around the world challenge longstanding industry assumptions. SCADA and other software systems often lag generations behind modern analysis tools, even when the turbine runs the latest manufacturer software upgrades.

This has led to problems both with mechanical failures and with the data collection strategies available to owner-operators. Without significant and deliberate up-front data engineering, these problems severely limit the effectiveness of any modern data science technique.

Data collection structures reflect an industry obsession with availability as the only important metric.

The wind industry’s data collection structures have been built around answering the question, why has the wind turbine stopped spinning (availability)? A better question would be, is the turbine producing enough energy (performance)? Many wind turbines suffer chronic or acute technical performance issues that do not cause shutdowns. SCADA and data collection systems are not built to capture, communicate, and manage these issues effectively.
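The gap between the two questions can be shown with a minimal sketch. It assumes hypothetical 10-minute SCADA records carrying an operating flag, the measured energy, and an expected energy taken from a reference power curve at the measured wind speed; the `Interval` fields and both function names are illustrative, not any SCADA vendor's schema. A turbine that never stops but chronically produces 5 percent less than expected scores perfectly on availability and only shows up in the performance ratio.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Interval:
    """One hypothetical 10-minute SCADA record (illustrative fields)."""
    running: bool         # turbine online, not stopped or faulted
    actual_kwh: float     # energy measured at the turbine
    expected_kwh: float   # energy a reference power curve predicts at the measured wind speed


def time_availability(records: list[Interval]) -> float:
    """Fraction of intervals in which the turbine was running (the availability question)."""
    return sum(r.running for r in records) / len(records)


def performance_ratio(records: list[Interval]) -> float:
    """Actual over expected energy while running (the performance question)."""
    running = [r for r in records if r.running]
    expected = sum(r.expected_kwh for r in running)
    if expected == 0:
        return math.nan  # no running intervals: performance is undefined
    return sum(r.actual_kwh for r in running) / expected


if __name__ == "__main__":
    # One day of data: the turbine never stops but delivers 5 % less than expected.
    day = [Interval(running=True, actual_kwh=95.0, expected_kwh=100.0) for _ in range(144)]
    print(time_availability(day))            # 1.0
    print(round(performance_ratio(day), 3))  # 0.95
```

A data collection system that only tracks the first metric would report this turbine as healthy.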

Turbine suppliers and wind farm service providers (operators) have conflicting interests, which combine to harm wind farm owners.

The wind industry is plagued by poorly structured service agreements that do not align incentives to maximize farm output. Even the incentives for internal self-perform teams are awry: they don’t create a culture and contractual environment that aligns the project team (owner, asset manager, and operator) to maximize energy output over the life of the wind farm.

At the same time, not enough suppliers engage with wind farm owners to address potential performance issues in a lean, dynamic, and transparent way. Many prefer to maintain their expert position relative to their customer base.

Therefore, the only people with large amounts of cross-sectional turbine operating data (the suppliers) neither share with nor learn from their customers, while those on the front line (the operators) work with smaller datasets and poorly structured data collection systems.

The result is systemic underperformance across the wind industry.

Overly optimistic pre-construction energy yield estimates have left wind farm owners with limited resources to investigate and address performance issues.

Wind farms are not massively lucrative operations.

On the revenue side, projects often do not achieve their pre-construction energy yield forecasts. On the cost side, developers minimize capex to get through financing, and owners end up paying for it in opex.

Current projects are being financed in merchant markets with hedges below two cents per kWh.
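To give a sense of scale, the arithmetic below values a small underperformance issue for a hypothetical 100 MW farm with an assumed 35 percent capacity factor, using two cents per kWh (taken as US cents) as an upper bound on the hedge price. The farm size and capacity factor are illustrative assumptions, not figures from any project.

```latex
\begin{aligned}
E_{\text{annual}} &= P_{\text{rated}} \times 8760\,\text{h} \times CF
  = 100\,\text{MW} \times 8760\,\text{h} \times 0.35 = 306{,}600\,\text{MWh} \\
R_{\text{annual}} &\le E_{\text{annual}} \times p
  = 306{,}600{,}000\,\text{kWh} \times \$0.02/\text{kWh} = \$6{,}132{,}000 \\
\Delta R_{1\%} &= 0.01 \times R_{\text{annual}} \le \$61{,}320 \text{ per year}
\end{aligned}
```

Under these assumptions, a 1 percent underperformance issue is worth at most about $61,000 a year, which gives a sense of how little budget an owner can justify for finding and fixing it.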

Any big data strategy looking to have an industry-wide impact must come in with a clear and definable value proposition that can increase annual energy production quickly.

Knowledge sharing between turbine suppliers, industry consultants, and asset operators is very limited.

In the next five years, we are on target to pass one terawatt of wind energy installed globally. Who will hold the right ‘big labelled data’ to give future artificial intelligence tools the best chance of bringing meaningful performance increases?

Turbine suppliers have the scale, but their closed-box business model prevents them from implementing the agile improvement model needed to label that big data effectively.

The largest operators might be active on more than 20 gigawatts of assets, a mere 2 percent of the market, given the inherently local and hence fragmented industry structure. Will they have the tools to effectively capture, label, and analyze the big data from those assets?


Gareth has more than a decade of experience leading the identification, development, construction, financing, and operation of renewable energy assets for a renewable energy technical consultancy. He is an entrepreneur, a chartered engineer with the IMechE, and holds degrees in mathematics and mechanical engineering. He currently serves as CEO of the renewable energy software startup Clir Renewables.

Clir Renewables


Volume: 2019 January/February