Machine Learning + Process Simulators = 💘

Eivind Roson Eide
Kongsberg Digital
Apr 25, 2019

What could machine learning do for us if we were not limited by data? That is a question we have asked ourselves here at Kongsberg Digital. Despite the hype around machine learning and deep learning, we have yet to see it make a significant impact in the Oil and Gas industry. The enormous capital costs and high-risk environments of the Oil and Gas industry seem at odds with the uncertainties, “black box” approaches and poor extrapolation capabilities of machine learning. The industry has traditionally relied on explainable dynamic process simulators. Is there a way of combining the power of machine learning with the safety and precision of the simulators? We believe there is.

Why do we care?

This decade has seen impressive advances in the application of machine learning and deep learning to longstanding problems. Computer vision and image comprehension have surpassed human abilities in some areas, and natural language processing has taken huge strides forward over the last couple of years. With the recent milestone of Google DeepMind beating top professional players at StarCraft II, there is little doubt left that computers will be able to outperform humans at almost any game.

Yet, in the Oil and Gas (O&G) industry these technologies are virtually non-existent. What is it about this industry that has impeded the adoption of ML?

There is no lack of use cases. The capital cost of a new oil platform can be enormous, exemplified by the new Johan Sverdrup platform, estimated to cost $10 billion. Any anomaly that slows production therefore incurs large costs. A system that detects component fatigue and suggests when parts should be replaced to avoid failures would provide substantial benefit. Another possibility is a system that recognizes the signatures of potential hazards, allowing the crew to act in time. Increasing rig uptime and improving safety could save these companies millions a year.

Production optimization provides another set of interesting applications of machine learning in the Oil & Gas industry. One option is searching for an operational configuration of the many compressors that minimizes overall power consumption while keeping overall production constant. Another idea that has been floated is minimizing the gas-lift injection needed to get oil up to the platform without causing slugging events. Such models could provide decision support for the operator crew, so that they can better foresee the effects of changing operating conditions. Today, decisions on operating conditions rely largely on the experience of the crew; tools that combine their experience and intuition with fast predictions could therefore make operations easier. With the large capital and operating costs of these platforms, even modest improvements can have a significant impact on the bottom line.

However, there are good reasons to be cautious. The infamous Deepwater Horizon oil spill is a stark reminder of the damage mistakes can cause. An image classifier that thinks a dog is a cat has little negative impact on the world around it; the same cannot be said in the Oil & Gas industry. Imagine, for example, a production optimization algorithm telling the crew to increase throughput, completely oblivious to the maximum strength of a pressure vessel in the system. Needless to say, disaster would follow. With the risk of oil spills, equipment failure or injury to the crew, nothing can be left to chance.

Deepwater Horizon reminds us of the consequences of mistakes. Source.

Where are we today?

Things are changing. The big industry players are now looking for ways of cutting costs, often by digitalizing their many operations. These companies are storing more of their data and looking for ways of enabling that data to transform their businesses. This is a good start, but it might not be enough. Even in a world where every sensor reading is recorded, aliasing mitigated, and alarms logged, making sense of such complex systems poses some fundamental problems.

Machine learning can only be as good as the data you provide it. It learns only from the data it has seen and will only perform well when interpolating between the data points provided. It is, however, notoriously bad at extrapolating from that data to new cases. Even knowing whether it is extrapolating or interpolating can be very challenging. As seen in the figure below, within the training data range the proposed models fit the data well; however, as soon as we venture outside this range the models are no longer even close to realistic.

Machine learning models work well when interpolating data but often generalize poorly outside the data range of the training set. All the curves in the top figure fit the data reasonably well, but they make completely ridiculous predictions outside the training range. Source.
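
To make the point concrete, here is a minimal sketch (with purely illustrative data, not output from our simulators) of how models that agree closely inside the training range can diverge wildly outside it:

```python
# Polynomial models of different degrees all fit the training range [0, 1]
# reasonably well, yet disagree dramatically when evaluated at x = 2.0.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=x_train.size)

for degree in (3, 6, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    inside = np.polyval(coeffs, 0.5)    # interpolation: all degrees agree closely
    outside = np.polyval(coeffs, 2.0)   # extrapolation: predictions diverge wildly
    print(f"degree {degree}: f(0.5) = {inside:+.2f}, f(2.0) = {outside:+.2e}")
```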

This poses serious limitations for the use cases we presented, as we need to be able to use our models even when some of the conditions are outside the range seen in the available data. How do you recognize a failure that has never happened before from a purely data-driven model? What does a fire look like to your sensors? What are the signs of pressure vessel weaknesses caused by fatigue? What about the potential failures that didn’t happen thanks to quick responses from the crew? The list of problems goes on. All of this data would be required for a failure detection system to warn the crew in time, or for an optimization algorithm to stay well within the safety limits of the equipment. As you can imagine, data recorded during rare and dangerous events, such as fires, failures and other anomalies, are difficult or impossible to obtain. Consequently, purely data-driven modeling is extremely challenging in the O&G industry.

Dynamic process simulators

The dominant approach to modeling in the industry today is proprietary dynamic process simulator software. These simulators are built on a combination of physical understanding, numerical simulation and empirical correlations to deliver precise and explainable predictions. They allow engineers to iterate over designs, test out scenarios and train crews. Two of these dynamic process simulators, K-Spice and LedaFlow, are developed by Kongsberg Digital, allowing us to test our ideas on them.

A K-Spice process model.

“Why do we need ML when we have such tools?” you might ask. Dynamic simulators have a few limitations that make them hard to use for the suggested use cases. Large models are often complex systems that require significant computation time to run. In many cases they will only run a few times faster than real time. For every failure mode that should be studied, a separate simulator instance is required, which in practice requires an infeasible amount of computational resources. Similarly, it would require enormous computational resources to test every possible control input setting to optimize production. We would need a system that could help us narrow the search to only a few instances. This is where machine learning can provide the edge.

Merging machine learning with the simulators

We want a system where we can benefit from the advantages of machine learning without compromising on the reliability that dynamic process simulators provide. This would enable us to solve some of the many complex problems in the Oil & Gas industry that have not previously been solvable. We propose a two-step solution. Step one is generating large amounts of data for the specific process using the dynamic simulator; we then use this data to train machine learning models that can be verified against the simulated data. In step two, we feed suggestions from the machine learning models back into the simulator to verify the predictions in real time.
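
As a rough illustration of this two-step loop, the sketch below uses a generic scikit-learn surrogate; the run_simulation function and the candidate scoring are hypothetical placeholders standing in for the real simulator interface, not the actual K-Spice/LedaFlow API:

```python
from sklearn.ensemble import GradientBoostingRegressor

def build_surrogate(run_simulation, parameter_sets):
    """Step 1: generate data with the dynamic simulator and fit a surrogate."""
    X, y = [], []
    for params in parameter_sets:
        result = run_simulation(params)          # expensive, physics-based run
        X.append(list(params.values()))
        y.append(result["target"])               # e.g. power consumption
    return GradientBoostingRegressor().fit(X, y)

def propose_and_verify(model, candidates, run_simulation, top_k=3):
    """Step 2: let the fast surrogate narrow the search, then verify the best
    candidates with full simulator runs before anything reaches the crew."""
    scored = sorted(candidates, key=lambda c: model.predict([list(c.values())])[0])
    return [(c, run_simulation(c)) for c in scored[:top_k]]  # only a few full runs
```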

Using a simulator allows us to create data both for normal operating conditions and for cases that wouldn’t normally be seen in real data, such as abnormal operating conditions and equipment failures. To enable large-scale data generation, we implemented a system that sweeps a huge parameter space by queuing jobs in a dynamic queue. This allowed us to home in on the problematic cases, a capability not previously available in the simulators. Abstracting the data generation also made it easier to integrate with the rest of the model training. With this parameter study we can compute close approximations of the specific model for the area of interest, and the availability of data allows us to perform robust validation and testing. The process simulators also let us simulate anomalies and log the predicted alarms, which meant we could quickly create labeled datasets suitable for model training.
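
The sketch below illustrates the idea of such a sweep: combinations of operating conditions and injected failure modes are expanded into a job list, and the alarms logged by the simulator become the labels. The parameter names, ranges and the simulate() call are illustrative assumptions, not the real interface:

```python
import itertools

normal_ranges = {
    "inlet_pressure_bar": [40, 60, 80],
    "gas_rate_sm3_per_h": [1000, 2000, 3000],
}
failure_modes = [None, "valve_stuck", "compressor_trip"]  # None = normal operation

jobs = []
for values in itertools.product(*normal_ranges.values()):
    for failure in failure_modes:
        params = dict(zip(normal_ranges.keys(), values))
        params["failure_mode"] = failure
        jobs.append(params)

# Each job is run on a simulator instance; the predicted alarms become labels,
# giving a labeled dataset that includes events never seen in real operations:
# labeled = [(job, simulate(job)["alarms"]) for job in jobs]
print(f"{len(jobs)} simulation jobs queued")
```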

The first step in the process was creating a practical API to automatically modify states and inputs and extract data from two of our process simulators, K-Spice and LedaFlow. We then paired this with another system that enabled us to queue jobs to multiple simulators in parallel. Each model might be relatively slow, but by creating multiple instances we could speed up the generation of the data of interest many times over.

An overview of the LedaFlow worker queue architecture.
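
Conceptually, the worker queue looks something like the sketch below, where run_simulator_job() stands in for the real wrapper around a K-Spice or LedaFlow instance; the actual system uses a dynamic queue across servers rather than a local process pool:

```python
from multiprocessing import Pool

def run_simulator_job(params):
    """Placeholder for the real wrapper: start a simulator instance, apply the
    inputs in `params`, run the scenario and return the logged sensor values."""
    return {"params": params, "outputs": {}}   # illustrative stub

def run_sweep(jobs, n_workers=8):
    # Each worker drives its own simulator instance, so throughput scales
    # roughly with the number of instances even though each run is slow.
    with Pool(processes=n_workers) as pool:
        return pool.map(run_simulator_job, jobs)

# results = run_sweep(jobs)   # `jobs` as produced by the parameter sweep above
```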

The next step involved making a Python package to orchestrate these instances. We focused on user experience and simplicity, so that no data science or programming background is required to use it. We wanted a system that would only need a simple configuration file to start. A process engineer with domain knowledge of the system can then specify a set of parameter ranges to span and which sensors or states to log, and the system will queue the thousands of required jobs on the servers. The adoption of new technologies depends on the ability of the stakeholders, who in most cases do not have a machine learning or data science background, to understand and approve what is going on. Giving them the controls smooths this process (and offloads work from data scientists).
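
Such a configuration could look something like the sketch below; the YAML format and field names are illustrative assumptions rather than the package’s actual schema:

```python
import yaml  # PyYAML

config_text = """
model: riser_flowline
parameters:
  inlet_pressure_bar: {min: 40, max: 90, steps: 11}
  gas_rate_sm3_per_h: {min: 500, max: 3000, steps: 11}
log:
  - riser_outlet_pressure
  - liquid_holdup
  - flow_regime
"""

config = yaml.safe_load(config_text)
n_jobs = 1
for spec in config["parameters"].values():
    n_jobs *= spec["steps"]
print(f"Sweep for '{config['model']}' expands to {n_jobs} simulator jobs")
```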

The final step was converting the outputs from the simulators to standardized data formats. By standardizing this process, we found that models could essentially be plugged in and out for the various datasets created without any manual configuration. Another important advantage of using generated data is that it effectively eliminates defects in the data, drastically reducing the need for data cleaning. This means that once the data is processed it is ready for training preconfigured models, speeding up the development process. In many cases, no manual configuration of the training is required, and key insights about the data can be produced automatically.
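
A minimal sketch of what such standardization could look like, assuming a tabular format; the column names and the helper function are illustrative, not the package’s actual API:

```python
import pandas as pd

def to_standard_frame(run_id, params, timeseries):
    """timeseries: dict mapping sensor name -> list of values, one per timestep."""
    frame = pd.DataFrame(timeseries)
    frame["run_id"] = run_id
    for name, value in params.items():          # sweep parameters become columns
        frame[name] = value
    return frame

# Frames from all runs can then be concatenated and stored in one file:
# pd.concat(frames).to_parquet("sweep_results.parquet")
```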

Can we model the entire plant?

Lastly, we wanted to test our ability to model the systems we could now generate data for. As a toy example, we set out to predict pressure drops, the oil-to-gas ratio and the flow regime (slugging, turbulence, etc.) for a flowline riser from the seabed up to an oil platform. For the simple system in question we achieved accuracy well within the uncertainty of the simulator model itself, both for the discrete flow regime classification and for the prediction of the continuous target variables.

Determining the probability of slugging for a specific flowline riser at different pressures and flow rates.
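
As a rough illustration of how such a toy riser model could be trained on the generated data (the column names, file name and model choices below are assumptions, not our exact setup):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

data = pd.read_parquet("sweep_results.parquet")          # produced by the sweep
features = data[["inlet_pressure_bar", "gas_rate_sm3_per_h"]]

X_tr, X_te, y_reg_tr, y_reg_te, y_cls_tr, y_cls_te = train_test_split(
    features, data["pressure_drop"], data["flow_regime"],
    test_size=0.2, random_state=0,
)

regressor = RandomForestRegressor().fit(X_tr, y_reg_tr)    # continuous targets
classifier = RandomForestClassifier().fit(X_tr, y_cls_tr)  # discrete flow regime

print("pressure drop R^2:", regressor.score(X_te, y_reg_te))
print("flow regime accuracy:", classifier.score(X_te, y_cls_te))
# classifier.predict_proba over a pressure/flow-rate grid gives a
# slugging-probability map like the figure above.
```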

Before we get too carried away with the opportunities of machine learning trained on generated data, it is important to be aware of its limitations. ML models are not going to replace physics-based simulators anytime soon. The networks can only be trained to mimic specific cases; any attempt to learn general systems will not even get past the starting line. The second problem is that the parameter sweep required to characterize a system with data grows exponentially with the complexity of the system. For smaller subsystems it is feasible to find good approximations, but as the complexity grows this becomes impossible. In contrast, the process simulators see only a polynomial increase in computation time with system complexity. ML models therefore have their place when sub-components of a system require fast approximations to solve a problem.
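
A quick back-of-the-envelope calculation shows why: with a fixed number of levels per parameter, the size of a grid sweep grows exponentially with the number of parameters.

```python
# With k levels per parameter, a grid sweep needs k**d simulations for d
# parameters, which is only practical for small subsystems.
levels = 10
for n_params in (2, 4, 8, 16):
    print(f"{n_params:2d} parameters -> {levels ** n_params:,} grid points")
```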

Conclusion

The adoption of machine learning in the Oil & Gas industry has been slow, but recent developments have made the industry more eager to adopt such technologies. Traditional data-driven techniques using data collected from real operations have serious limitations for anomaly detection and production optimization in the Oil & Gas industry, and this has so far impeded the impactful change foreseen by many. Harnessing the power of dynamic process simulators is one route to developing machine learning models that meet the standards of the industry. We have shown that by simplifying the interface to two of our dynamic process simulators, K-Spice and LedaFlow, we have been able to create systems that can generate huge amounts of data. The automated data generation enabled us to create robust machine learning models, as we could simulate many types of failures and control configurations not normally seen in real operations. It also sped up the development of the machine learning models by drastically reducing the required data cleaning. The models we created with this technique were shown to mimic the simulator for individual components with accuracy comparable to the simulator model itself. We believe this approach can make machine learning an impactful tool for increased safety, reduced cost and a lower carbon footprint in the Oil and Gas industry.

Eivind Roson Eide

Engineer @ Kongsberg Digital. Applying machine learning to real engineering applications.