A Cloud Savvy Way to Store Oil & Gas Drilling Data
Set Up a Well Log Passive Store for Max Efficiency & Cost Effectiveness
by Chris Herrera and Anuj Kumar
In oil and gas drilling operations, sensor data from surface and downhole equipment is generated at a very high frequency. This data is generally packaged run by run, and as it ages it is rarely used to fetch point-in-time information for a specific date/time or to analyze for abnormalities. The access pattern for the data therefore changes in a way that allows for cost-optimized storage and retrieval.
Although the access pattern has changed, the data is still very important in planning new wells in the same field or nearby fields. In order to efficiently plan new operations it is critical to understand past operations to determine optimal bottom hole assembly (BHA) configuration, trajectories, and drilling programs. However, storing this data in a traditional database solution (data warehouse, operational data store, etc.) would lead to unnecessary cost and management overhead.
So What Are the Challenges with Storing Drilling Data?
- Because the volume of data is huge and continuously growing, it becomes costly to keep old, less relevant data in an enterprise or primary datastore, sometimes called an Active Store (typically an RDBMS or a NoSQL database such as Cassandra).
- Beyond cost, performance can also suffer from long indexing times and slower query execution for applications that use real-time data from the Active Store for use cases such as anomaly detection, ROP optimization, etc.
Consider a Two Part Solution
The problems above can be tackled by separating the older offset data from the current, active data. This leads to a two-part solution:
- Extract old data (for example, older than a year) from the Active Store, leaving only the active data in the datastore currently being used. This significantly reduces the size of the datastore, making it cost and performance efficient and reducing management overhead.
- Put the offset data into a new service, the passive store, which is designed based on the type of data and query patterns.
The graphic below depicts the proposed two-part solution:
The Active Store (Part 1 above) is a hot store facilitating ad hoc and streaming queries; this post, however, focuses on the Passive Store (Part 2 above).
Design Considerations for the Passive Store
- For ingesting data into the passive store, the solution uses a serverless computing architecture to help reduce the cost since compute resources will only be used when new data needs to be ingested.
- Avoid using any enterprise database for storing the passive data as it will lead to the same cost and performance overhead as the Active Store.
- Store the data in a format which helps reduce the size and allows for efficient searching.
- Additionally, the search engine does not need to be "always-on," given the lower query frequency and the looser SLAs on query response times.
AWS Specific Implementation of the Passive Store
For this post, AWS resources were used to build a service based on the above design considerations. We assumed that the data is extracted from the Active Store and is available to us in an AWS S3 bucket in LAS (Log ASCII Standard) format. A Lambda function was written to transform the data from LAS to ORC, and an S3 bucket was used to store the ORC data.
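The transform step can be sketched without any AWS dependencies. The helper below parses the ~Curve and ~ASCII sections of a LAS 2.0 file into named curves, and serializes them to CSV purely to keep the sketch dependency-free; a real Lambda deployment would write ORC with a library such as pyarrow and upload the result via boto3. The sample LAS content and curve names are illustrative, not from a real well.

```python
import csv
import io

def las_to_rows(las_text: str):
    """Parse the ~Curve and ~ASCII sections of a LAS 2.0 file into
    (curve names, data rows). Minimal sketch: ignores wrapped lines
    and null-value substitution."""
    curves, rows, section = [], [], None
    for line in las_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("~"):
            section = line[1].upper()   # 'C' = curve info, 'A' = data
            continue
        if section == "C":
            # Curve lines look like "DEPT.M : depth"; keep the mnemonic only
            curves.append(line.split(".")[0].strip())
        elif section == "A":
            rows.append([float(v) for v in line.split()])
    return curves, rows

def rows_to_csv(curves, rows) -> str:
    """Serialize parsed curves to CSV; a production Lambda would emit ORC."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(curves)
    writer.writerows(rows)
    return buf.getvalue()

sample = """~Curve Information
DEPT.M      : measured depth
GR.GAPI     : gamma ray
~ASCII
1670.0  45.2
1670.5  47.8
"""
curves, rows = las_to_rows(sample)
print(curves)    # ['DEPT', 'GR']
print(rows[0])   # [1670.0, 45.2]
```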
When a query is executed, the data in this S3 bucket is searched and the relevant data is loaded from those files, avoiding the need for a database. To create the metadata to search on, an AWS Glue Crawler scans this S3 bucket at a given frequency and creates an AWS Glue Table holding the metadata for all the files in the bucket.
Finally, AWS Athena was used to query the ingested data using the metadata in the AWS Glue Table. In case you are wondering, Athena supports ANSI SQL for querying.
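As a minimal sketch of the query path, the helper below builds the ANSI SQL string a client would submit to Athena. The table and column names (well_logs_orc, well_id, dept) are hypothetical, not a fixed schema, and the boto3 submission is shown only in comments since it requires live AWS credentials.

```python
def build_depth_query(table: str, well_id: str,
                      min_depth: float, max_depth: float) -> str:
    """Build an ANSI SQL query over the Glue table of ORC files.
    For real user-supplied input, prefer Athena execution parameters
    over string interpolation."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE well_id = '{well_id}' "
        f"AND dept BETWEEN {min_depth} AND {max_depth}"
    )

query = build_depth_query("well_logs_orc", "WELL-001", 1500.0, 2000.0)
print(query)

# The string would be submitted through the Athena API, e.g.:
#   boto3.client("athena").start_query_execution(
#       QueryString=query,
#       QueryExecutionContext={"Database": "well_logs"},
#       ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"})
```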
Below is a high level diagram connecting all the components together:
Let’s look at each component to see how the passive store is configured:
- Load all LAS files in an S3 bucket.
- Create an AWS Lambda function that takes this data as its source and converts it into the ORC file format. Other formats such as Parquet, Avro, and CSV could also be used; CSV is the most widely supported format for blob storage searches across cloud providers.
- The Lambda function will write converted files to the destination folder of an S3 bucket, for instance, well-logs-orc/data.
- Configure an AWS Glue Crawler to scan the destination folder and index new files at a certain frequency. The AWS Glue Crawler takes an S3 bucket and tries to partition the data based on nested folders. For example, say we have an S3 bucket named well-logs-orc that contains a folder named data. If we add the logs for every well under a different folder, the Glue Crawler will use each folder as a partition when inferring the file schema.
- Use the AWS Athena Console to query the data using ANSI SQL syntax.
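The steps above can be sketched concretely. The key-naming helper shows a folder-per-well layout (Hive-style key=value folder names let the crawler name the partition column), and the dict mirrors the parameters for the Glue `create_crawler` API call. The bucket, role ARN, database name, and schedule are placeholders for illustration.

```python
def orc_key(well_name: str, run_id: str) -> str:
    """Destination key for a converted log: one folder per well under data/,
    so the Glue Crawler infers 'well_name' as a partition column."""
    return f"data/well_name={well_name}/{run_id}.orc"

# Parameters for glue.create_crawler(**crawler_params); all names and the
# role ARN are hypothetical.
crawler_params = {
    "Name": "well-logs-orc-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "well_logs",
    "Targets": {"S3Targets": [{"Path": "s3://well-logs-orc/data/"}]},
    "Schedule": "cron(0 2 * * ? *)",   # scan nightly at 02:00 UTC
}

print(orc_key("WELL-001", "run-07"))
# → data/well_name=WELL-001/run-07.orc
```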
Get Started in the Cloud with Drilling Data
In this post, we described a way to build a solution for a serverless Passive Store for Well Log data using AWS cloud services to solve the issues of cost and performance. Although we used AWS in this example, the same cloud-native solution template can be implemented on the cloud provider of your choice (Azure, GCP) with each respective cloud vendor’s native services.
If you enjoyed this, here are some other recent Hashmap stories that you might like as well:
This is Why You Should Use Snowflake for Security Analytics
How to Ingest & Enrich IoT Data at Scale into Snowflake with Apache NiFi
Feel free to share on other channels and be sure and keep up with all new content from Hashmap.
Anuj Kumar is the Accelerator Software Development Lead (connect with him on LinkedIn) and Chris Herrera is the Chief Technology Officer (tweet Chris at @cherrera2001 or connect with him on LinkedIn) at Hashmap, working across industries with a group of innovative technologists and domain experts to accelerate high value business outcomes for our customers, partners, and the community.