Drilling Down: Analyzing Hydraulic Rig Data for Enhanced Manufacturing Monitoring

Em Ejiga
May 26, 2023

A hydraulic rig system, yellow and blue.

Introduction

A hydraulic rig system is a complex machine used in the oil and gas industry to extract oil from the ground. Its performance is crucial to the success of drilling operations, and any inefficiencies or malfunctions can result in significant downtime and financial losses.

Analyzing data collected from sensors installed on the rig, such as pressure, volumetric flow, temperature, and vibration measurements, can provide insight into the performance of the hydraulic rig system.

By analyzing this data, we can identify patterns and anomalies that indicate potential problems, allowing us to take corrective actions before they become critical issues. Additionally, data analysis can help optimize the performance of the hydraulic rig system by identifying opportunities to improve efficiency and reduce waste.

Business Task

I have been tasked with creating a manufacturing monitoring dashboard that allows management to understand the manufacturing process of a hydraulic rig system in more detail.

In order to accomplish this task, I’ll be using data analysis to gain insight into:

  1. The cooling system effectiveness of the rig.
  2. Its energy efficiency.
  3. The overall power consumption of the system.
  4. The flow of fluids through the system.
  5. The fluid pressure.
  6. The overall performance of the rig.
  7. The temperature of the fluids and components of the rig.
  8. The mechanical condition of the rig.

I will then use these insights to build a dashboard that presents complex information clearly and engagingly to the management team.

Data

Source: UCI Machine Learning Repository

Data Type: Multivariate, Time-Series
Task: Classification, Regression
Attribute Type: Categorical, Real
Area: CS/Engineering
Format Type: Matrix
Missing Values: No

Number of Instances: 2205

Number of Attributes: 43680 (8x60 (1 Hz) + 2x600 (10 Hz) + 7x6000 (100 Hz))

Relevant Information:
The data set was experimentally obtained with a hydraulic test rig. This test rig consists of a primary working and a secondary cooling-filtration circuit which are connected via the oil tank [1], [2].

The system cyclically repeats constant load cycles (duration 60 seconds) and measures process values such as pressures, volume flows and temperatures while the condition of four hydraulic components (cooler, valve, pump and accumulator) is quantitatively varied.

Attribute Information:
The data set contains raw process sensor data (i.e. without feature extraction) which are structured as matrices (tab-delimited) with the rows representing the cycles and the columns the data points within a cycle. The sensors involved are:

Sensor   Physical quantity              Unit    Sampling rate
PS1      Pressure                       bar     100 Hz
PS2      Pressure                       bar     100 Hz
PS3      Pressure                       bar     100 Hz
PS4      Pressure                       bar     100 Hz
PS5      Pressure                       bar     100 Hz
PS6      Pressure                       bar     100 Hz
EPS1     Motor power                    W       100 Hz
FS1      Volume flow                    l/min   10 Hz
FS2      Volume flow                    l/min   10 Hz
TS1      Temperature                    °C      1 Hz
TS2      Temperature                    °C      1 Hz
TS3      Temperature                    °C      1 Hz
TS4      Temperature                    °C      1 Hz
VS1      Vibration                      mm/s    1 Hz
CE       Cooling efficiency (virtual)   %       1 Hz
CP       Cooling power (virtual)        kW      1 Hz
SE       Efficiency factor              %       1 Hz

Python and Power BI are the tools I will use to analyze and visualize this dataset.

You can see my Python code here.

Exploratory Data Analysis

First, I imported the necessary packages into a Jupyter Notebook and read in the dataset.

# import packages

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

# read in the dataset
df = pd.read_excel('Manufacturing Rig Data.xlsx')

# Preview first 5 rows of data set
df.head()
Preview of dataset

Then I used df.shape (an attribute, not a method) and df.info() to get the shape and summary information of the dataset.

# Shape of data set
df.shape

(2205, 19)  # 2,205 rows, 19 columns

# Summarised information of data set
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Time 2205 non-null int64
1 Cooling efficiency 2205 non-null float64
2 Cooling power 2205 non-null float64
3 Motor power W 2205 non-null float64
4 Volume flow l/min 1 2205 non-null float64
5 Volume flow l/min 2 2205 non-null float64
6 Pressure bar 1 2205 non-null float64
7 Pressure bar 2 2205 non-null float64
8 Pressure bar 3 2205 non-null float64
9 Pressure bar 4 2205 non-null int64
10 Pressure bar 5 2205 non-null float64
11 Pressure bar 6 2205 non-null float64
12 Efficiency factor 2205 non-null float64
13 Temperature 1 2205 non-null float64
14 Temperature 2 2205 non-null float64
15 Temperature 3 2205 non-null float64
16 Temperature 4 2205 non-null float64
17 Vibration mm/s 2205 non-null float64
18 Date 2205 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(16), int64(2)
memory usage: 327.4 KB

There are 2,205 rows and 19 columns, and each column's data type matches the values it holds.

Data Cleaning

The data was already fairly clean, but to make sure it was ready for analysis, I checked the dataset for missing values.

# Find the number of null values for all columns
df.isnull().sum()

Time 0
Cooling efficiency 0
Cooling power 0
Motor power W 0
Volume flow l/min 1 0
Volume flow l/min 2 0
Pressure bar 1 0
Pressure bar 2 0
Pressure bar 3 0
Pressure bar 4 0
Pressure bar 5 0
Pressure bar 6 0
Efficiency factor 0
Temperature 1 0
Temperature 2 0
Temperature 3 0
Temperature 4 0
Vibration mm/s 0
Date 0
dtype: int64

There are no null values in the data.

I also searched for duplicate data.

# Find the number of duplicate data
df.duplicated().sum()

#Output
0

There were no duplicates either. The data is good enough for use.

Summary Statistics

I computed summary statistics for the dataset to get a concise overview of its key characteristics: central tendency, dispersion, and skewness. These statistics can also hint at outliers in the data.

Measure Of Central Tendency

The mean and median of most variables are similar. The exception is Pressure bar 4, where there's a significant difference between the mean and the median.
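A mean-versus-median comparison like this one can be sketched in a few lines of pandas. This is a minimal sketch, not the author's code: the helper name `mean_median_gap` and the small `toy` DataFrame are hypothetical, and in the notebook you would pass `df` instead. The helper divides by the median, so a median of zero (as happens with a mostly-zero column) makes the relative gap infinite and pushes that column to the top of the ranking.

```python
import pandas as pd


def mean_median_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Compare mean and median for every numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")  # skips datetime columns such as Date
    out = pd.DataFrame({"mean": numeric.mean(), "median": numeric.median()})
    # Gap relative to the median's size; a zero median yields inf (NaN if the mean is also 0)
    out["rel_gap"] = (out["mean"] - out["median"]).abs() / out["median"].abs()
    return out.sort_values("rel_gap", ascending=False)


# Hypothetical toy data: column "b" is mostly zeros with one large reading
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0, 0, 9]})
print(mean_median_gap(toy))
```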

Measure Of Dispersion

To measure dispersion, I calculated the standard deviation of each variable.

The cooling efficiency and motor power variables show very high variability. The efficiency factor, temperature, and pressure variables show moderate variability, while the volume flow and vibration variables are clustered closely around the mean (i.e., low variability).
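Standard deviation is expressed in each column's own unit. Motor power in watts will therefore tend to have a larger standard deviation than a percentage or a flow in l/min, regardless of how variable it really is. A unit-free cross-check is the coefficient of variation (CV = standard deviation / |mean|). The sketch below is a hedged illustration, not the author's method: `dispersion_table` and the `toy` data are hypothetical, and you would call it on `df` in the notebook.

```python
import pandas as pd


def dispersion_table(df: pd.DataFrame) -> pd.DataFrame:
    """Standard deviation plus coefficient of variation (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # pandas' std() is the sample standard deviation (ddof=1) by default
    out = pd.DataFrame({"mean": numeric.mean(), "std": numeric.std()})
    # CV = std / |mean| is unit-free, so columns in W, bar and % become comparable
    out["cv"] = out["std"] / out["mean"].abs()
    return out.sort_values("cv", ascending=False)


# Hypothetical toy data: a watt-scale column and a ratio-scale column
toy = pd.DataFrame({"watts": [1000.0, 1100.0, 900.0], "ratio": [0.5, 1.0, 1.5]})
print(dispersion_table(toy))
```

On this toy data, `watts` has the larger standard deviation (100 vs 0.5) but the smaller CV (0.1 vs 0.5). This is why the two measures can rank the variables differently.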

Measure Of Normality

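I used skewness as a rough measure of normality. Skewness quantifies how asymmetric a distribution is. A value near zero is consistent with a symmetric (though not necessarily normal) distribution, while positive or negative values indicate a longer right or left tail.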

The data shows that almost all the variables are positively (right) skewed. Only the volume flow, Pressure bar 3, and efficiency factor variables show negative (left) skewness.
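A per-column skewness report with the direction spelled out could look like the sketch below. It assumes pandas' `skew()`, which returns a bias-adjusted sample skewness. The helper `skew_report` and the `toy` columns are hypothetical, so substitute the notebook's `df`.

```python
import numpy as np
import pandas as pd


def skew_report(df: pd.DataFrame) -> pd.DataFrame:
    """Sample skewness per numeric column with its direction (hypothetical helper)."""
    skew = df.select_dtypes(include="number").skew()
    # Positive skew means a longer right tail, negative skew a longer left tail
    direction = np.where(skew > 0, "right (positive)",
                         np.where(skew < 0, "left (negative)", "symmetric"))
    return pd.DataFrame({"skew": skew, "direction": direction}).sort_values("skew")


# Hypothetical toy data: one long right tail, one long left tail
toy = pd.DataFrame({"right_tail": [1.0, 1.0, 1.0, 10.0],
                    "left_tail": [10.0, 10.0, 10.0, 1.0]})
print(skew_report(toy))
```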

Outliers

Checking for outliers turned up extreme values that deviate significantly from the majority of the data, which may indicate unusual or exceptional observations.

The Motor power W, Efficiency factor, and Pressure bar 1 & 2 variables showed significant outliers, which I visualized with boxplots.
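The article doesn't say which outlier rule was applied. A common choice, and the one boxplots draw by default, is the 1.5 × IQR rule: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged. Matplotlib's default whisker length is `whis=1.5`, so these counts should correspond to the points plotted beyond the whiskers. This is a sketch under that assumption. `iqr_outlier_counts` and the `toy` data are hypothetical.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparing a DataFrame with a Series aligns the Series on column names
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum().sort_values(ascending=False)


# Hypothetical toy data: "p" has one extreme reading, "q" has none
toy = pd.DataFrame({"p": [10, 11, 12, 11, 10, 50], "q": [1, 2, 3, 4, 5, 6]})
print(iqr_outlier_counts(toy))

# In the notebook, matching boxplots could be drawn with, e.g.:
# df[["Motor power W", "Efficiency factor", "Pressure bar 1", "Pressure bar 2"]].plot(
#     kind="box", subplots=True, figsize=(12, 4))
```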

Outliers in the Motor power W variable suggest abnormal power consumption or irregularities in motor performance. They may indicate problems with the motor, or variations in operating conditions that require further investigation or maintenance.

Outliers in the Efficiency factor variable indicate deviations from the expected performance efficiency of the hydraulic rig. The lower efficiency factor outliers suggest energy wastage or suboptimal operation.

Outliers in the Pressure bar 1 & 2 variables indicate irregularities in the hydraulic pressure system. They may signal abnormal pressure fluctuations, potential leaks, or anomalies in the pressure measurement.

Correlation Analysis

In the correlation analysis, I examined the relationships between the variables to assess their interdependence and potential impact on the rig's overall performance.

First, I used a heatmap to give an overview of the correlation between all the variables.
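A correlation heatmap can be produced along these lines. This is a minimal sketch, not the author's code: `correlation_heatmap` and the `toy` data are hypothetical, and only matplotlib is used. Since the notebook imports seaborn as `sb`, `sb.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)` would be an equally reasonable way to draw the same matrix. Pearson correlation (the pandas default) measures linear association. Passing `method="spearman"` measures monotonic association instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def correlation_heatmap(df: pd.DataFrame, method: str = "pearson"):
    """Correlation matrix of numeric columns drawn as a heatmap (hypothetical helper)."""
    corr = df.select_dtypes(include="number").corr(method=method)
    fig, ax = plt.subplots(figsize=(10, 8))
    # Fix the colour scale to [-1, 1] so 0 sits in the middle of the colormap
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ticks = list(range(len(corr.columns)))
    ax.set_xticks(ticks)
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label=f"{method} correlation")
    fig.tight_layout()
    return corr, fig


# Hypothetical toy data: b rises with a, c falls as a rises
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
corr, fig = correlation_heatmap(toy)
print(corr.round(2))
```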

To gain further insight, I selected several variable pairs with strong relationships and used scatterplots to examine them more closely.
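A small grid of scatterplots, each titled with its Pearson correlation coefficient, keeps the visual and numeric views of the same relationship together. This is a sketch under assumptions: `scatter_pairs` and the `toy` data are hypothetical. In the notebook, the pairs would be column names from `df.info()`, such as `("Cooling efficiency", "Cooling power")`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_pairs(df: pd.DataFrame, pairs):
    """One scatterplot per (x, y) column pair, titled with Pearson's r (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(pairs), figsize=(5 * len(pairs), 4), squeeze=False)
    for ax, (x, y) in zip(axes[0], pairs):
        r = df[x].corr(df[y])  # Series.corr uses Pearson by default
        ax.scatter(df[x], df[y], s=8, alpha=0.5)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        ax.set_title(f"r = {r:.2f}")
    fig.tight_layout()
    return fig


# Hypothetical toy data standing in for pairs such as
# ("Cooling efficiency", "Cooling power") or ("Cooling efficiency", "Temperature 1")
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
fig = scatter_pairs(toy, [("a", "b"), ("a", "c")])
```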

The scatterplots show that there’s a positive correlation between cooling efficiency and cooling power and a negative correlation between cooling efficiency and temperature.

For the hydraulic rig's cooling system, the positive correlation between cooling efficiency and cooling power suggests that higher cooling power goes hand in hand with better cooling efficiency, which helps maintain optimal operating temperatures.

However, the negative correlation between cooling efficiency and temperature suggests that higher temperatures in the hydraulic rig system hinder cooling efficiency and reduce the system's ability to dissipate heat effectively.

The scatterplots show that there’s a positive correlation between volume flow and pressure and a positive correlation between cooling efficiency and volume flow.

The significance of these relationships lies in their impact on the hydraulic rig’s performance. The positive correlation between volume flow and pressure implies that controlling and optimizing the volume flow rate is crucial to maintaining the desired pressure levels for effective operation of the rig.

Similarly, the positive correlation between cooling efficiency and volume flow highlights the importance of maintaining an adequate flow of coolant to achieve optimal cooling performance and prevent overheating.

The scatterplots show that there’s a positive correlation between motor power and cooling power and a negative correlation between motor power and temperature.

The positive correlation between motor power and cooling power implies that there is a relationship between the power consumption of the motor and the cooling capacity of the system. It is expected that as the motor power increases, more energy is consumed, leading to higher cooling power to dissipate the generated heat effectively.

The negative correlation between motor power and temperature means that higher motor power tended to coincide with lower temperatures in the hydraulic rig. If increasing motor power does help reduce temperature, that could be advantageous in maintaining the desired temperature levels for efficient and reliable operation.

Time Series Analysis

In order to obtain a comprehensive overview of the hydraulic rig’s performance over the past three months, I generated line plots depicting all the variables against time.
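The article doesn't show its plotting code, so the following is a minimal sketch. It assumes the `Date` column listed by `df.info()` holds the timestamps. `plot_over_time` and the toy `sensor_a`/`sensor_b` values are hypothetical. In the notebook, you might call `plot_over_time(df, ["Cooling efficiency", "Cooling power"])`. Stacking the subplots on a shared x-axis makes it easier to see whether spikes in different variables line up in time.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_over_time(df: pd.DataFrame, columns, date_col: str = "Date"):
    """Stack one line plot per column against the date column (hypothetical helper)."""
    data = df.sort_values(date_col).set_index(date_col)
    # subplots=True gives one Axes per column; sharex aligns them on the same time axis
    axes = data[columns].plot(subplots=True, sharex=True, legend=False,
                              figsize=(12, 2.5 * len(columns)))
    for ax, col in zip(axes, columns):
        ax.set_ylabel(col)
    return axes


# Hypothetical toy data with the same "Date" column name as the rig data set
toy = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "sensor_a": [40.0, 35.0, 20.0, 25.0, 30.0, 32.0],
    "sensor_b": [2.0, 1.8, 1.0, 1.2, 1.5, 1.6],
})
axes = plot_over_time(toy, ["sensor_a", "sensor_b"])
```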

Cooling efficiency showed a sharp decline during the first month, followed by a gradual increase in the subsequent months. The initial decline suggests an issue or inefficiency in the cooling system during that period, such as a malfunction, a maintenance need, or a required adjustment. The subsequent rise in cooling efficiency suggests that corrective actions may have been taken, improving the system's performance.

The cooling power line plot shows a similar trend to the cooling efficiency.
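To check a "first month versus later months" reading numerically rather than by eye, one option is to aggregate each variable per calendar month. This sketch assumes the `Date` column is a datetime64 column, as `df.info()` reports. `monthly_summary` and the `toy` values are hypothetical.

```python
import pandas as pd


def monthly_summary(df: pd.DataFrame, columns, date_col: str = "Date") -> pd.DataFrame:
    """Mean, standard deviation and maximum of each column per calendar month (hypothetical helper)."""
    month = df[date_col].dt.to_period("M")  # e.g. 2023-01, 2023-02, ...
    # A list of aggregations produces (column, statistic) MultiIndex columns
    return df.groupby(month)[columns].agg(["mean", "std", "max"])


# Hypothetical toy data spanning two months
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "sensor_a": [1.0, 3.0, 10.0, 10.0],
})
summary = monthly_summary(toy, ["sensor_a"])
print(summary)
```

In the notebook, something like `monthly_summary(df, ["Cooling efficiency", "Cooling power"])` would show whether the first month's mean and spread really differ from the second and third months.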

The motor power graph showed sharp upward spikes during the first month, exceeding the normal range, indicating irregular and abnormal behavior of the motor. This suggests potential issues such as motor malfunctions, power surges, or improper control settings.
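The "normal range" isn't defined in the article. One way to make it concrete is a rolling modified z-score: 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation. It is computed over a centred window, so the baseline can drift over time. A plain mean/standard-deviation z-score is less suitable here because a large spike inflates the standard deviation it is measured against. This is a sketch under assumptions. The window of 11 readings and the cutoff of 5.0 are arbitrary choices for illustration (3.5 is the commonly cited cutoff for the modified z-score). `flag_spikes` and the toy signal are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_spikes(s: pd.Series, window: int = 11, k: float = 5.0) -> pd.Series:
    """Flag spikes with a rolling modified z-score (hypothetical helper)."""
    med = s.rolling(window, center=True, min_periods=1).median()
    mad = (s - med).abs().rolling(window, center=True, min_periods=1).median()
    # A MAD of 0 gives inf (flagged) or NaN (never exceeds k)
    robust_z = 0.6745 * (s - med) / mad
    return robust_z.abs() > k


# Hypothetical toy signal: alternating 10/12 readings with one spike at position 15
values = np.where(np.arange(30) % 2 == 0, 10.0, 12.0)
values[15] = 40.0
flags = flag_spikes(pd.Series(values))
print(flags[flags].index.tolist())  # [15]
```

In the notebook, `flag_spikes(df["Motor power W"])` would return a boolean mask that could be used to highlight the flagged readings on the line plot.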

The line plot of volume flow 1 over the three months displays fluctuating, erratic patterns. In the first month in particular, the volume flow shows numerous spikes, some upward and others downward.

The sharp decline in volume flow 2 during the first month followed by a sudden upward spike and a steady level in the second and third months suggests a significant change in the flow dynamics of the hydraulic system.

This initial decline points to a drop in the overall flow rate, which could be attributed to factors such as system inefficiencies, blockages, or changes in operating conditions.

The sharp upward spikes in pressure during the first month, exceeding the expected range, indicate abnormal pressure fluctuations in the hydraulic system. These may point to faulty valves, pumps, or other system components. They could also point to blockages or restrictions in the hydraulic lines that cause pressure to build up and fluctuate irregularly.

Pressure bar 1 & 2 show the same pattern.

The sharp downward spikes in pressure during the first month, falling below the expected range, indicate abnormal pressure drops in the hydraulic system. These may indicate leaks or component failure, with hydraulic fluid leaking or draining from the system and causing the pressure to fall.

Pressure bar 3 also shows closely packed squiggly lines, which indicates that the pressure readings are experiencing high variability or fluctuation over time.
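"Closely packed squiggly lines" can be turned into a number by computing a rolling standard deviation over a time-based window. The same approach applies to the `Vibration mm/s` column. This is a sketch assuming the `Date` column is a datetime. `rolling_volatility`, the `"7D"` default window and the `toy` values are hypothetical choices. A time-based window requires a sorted DatetimeIndex, and each result covers the interval (t − window, t].

```python
import pandas as pd


def rolling_volatility(df: pd.DataFrame, column: str, window: str = "7D",
                       date_col: str = "Date") -> pd.Series:
    """Rolling standard deviation over a time-based window (hypothetical helper)."""
    series = df.sort_values(date_col).set_index(date_col)[column]
    # With a time-based window, pandas defaults to min_periods=1; a lone value's std is NaN
    return series.rolling(window).std()


# Hypothetical toy data: flat for three days, then alternating
toy = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "sensor_a": [1.0, 1.0, 1.0, 5.0, 1.0, 5.0],
})
vol = rolling_volatility(toy, "sensor_a", window="3D")
print(vol)
```

In the notebook, `rolling_volatility(df, "Pressure bar 3")` plotted against time would show whether the fluctuation is constant or concentrated in particular weeks.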

The line plot for Pressure bar 4 shows zero most of the time, with occasional sharp upward spikes in the first and third months, followed by a sharp downward spike and a subsequent increase. This suggests a potential issue or anomaly in the pressure readings.

The persistent zero readings suggest that Pressure bar 4 values were not being measured or recorded during those periods. This could be due to a sensor malfunction, data acquisition errors, or an intentional shutdown or maintenance of the system.
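A quick way to quantify "zero most of the time" is to measure the share of exact-zero readings, overall and per month. A column that is mostly zero has a median of 0, which would also be consistent with the large mean-median gap reported for Pressure bar 4. This is a hedged sketch: `zero_share` and the `toy` values are hypothetical. In the notebook, the call would be `zero_share(df, "Pressure bar 4")`.

```python
import pandas as pd


def zero_share(df: pd.DataFrame, column: str, date_col: str = "Date"):
    """Fraction of exact-zero readings overall and per calendar month (hypothetical helper)."""
    is_zero = df[column].eq(0)
    # The mean of a boolean Series is the fraction of True values
    by_month = is_zero.groupby(df[date_col].dt.to_period("M")).mean()
    return is_zero.mean(), by_month


# Hypothetical toy data: all zeros in January, half zeros in February
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "sensor_a": [0, 0, 0, 5],
})
overall, by_month = zero_share(toy, "sensor_a")
print(overall)   # 0.75
print(by_month)
```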

Pressure bar 5 & 6 show trends similar to the volume flow, suggesting that they were affected by the same system malfunction.

The line plot for the efficiency factor shows sharp downward spikes in the first month, well below its normal range, which suggests a potential issue or anomaly in the efficiency of the hydraulic rig system.

The sharp downward spikes in the first month indicate sudden drops in the efficiency of the hydraulic rig. Possible causes include equipment malfunctions, suboptimal operating conditions, inadequate maintenance, or variations in the input parameters (such as motor power or pressure).

The line plot for temperature shows a sharp upsurge in the first month, a sharp downward spike in the second month, and a steady trend in the third month, reflecting fluctuations in the temperature readings of the hydraulic rig system.

The sharp upsurge in the first month indicates a significant increase in temperature, which could be attributed to factors such as increased workload, environmental conditions, or changes in the rig's operating parameters. The same pattern holds for Temperature 1–4.

The line plot for vibration shows closely packed squiggly lines, which suggests that vibration levels in the hydraulic rig system fluctuated frequently over the three-month period.

The sharp upward spikes observed in the vibration readings during the first month indicate instances of high vibration intensity. These spikes in vibration indicate potential problems or irregularities in the rig’s components or operation.

You can view the dashboard here.

Conclusion

In summary, the line graphs of the hydraulic rig variables reveal an initial period of malfunction in the first month, followed by stabilization in the second and third months. This points to the need to investigate and address the issues observed during that initial period of operation.

Recommendation

A thorough inspection of the hydraulic rig components should be conducted to identify and resolve any underlying problems. Regular maintenance procedures should also be implemented to prevent future malfunctions and ensure consistent performance.
