VSB Power Line Fault Detection

Kaggle Problem case study

Aug 13, 2020

Overview of the case study:

Step 1: Explanation of the problem which includes details about the source, problem statement, explanation of relevant terms, and how the problem can be viewed as a business problem.

Step 2: Conversion of the business problem into a machine learning problem. Details of the existing performance metrics, previous solutions, approaches, and improvements.

Step 3: Complete exploratory data analysis.

Step 4: Feature engineering involving various techniques.

Step 5: Trying out different machine learning models, selecting the best model, and predicting on the test dataset.

Step 6: Future work that can be done to improve the performance metric.

Step 7: References

1.Business/Real-world Problem:

1.1 Source:

This was posted as a Kaggle challenge by the ENET Centre, which researches and develops renewable energy resources with the goal of reducing or eliminating harmful environmental impacts.

Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/overview

Data: Enet Centre, VSB — T.U. of Ostrava

1.2 What is Partial Discharge?

Here we deal with medium-voltage overhead power lines, which are spread over hundreds of miles, making manual fault detection almost impossible.

These lines are occasionally damaged, for example by contact with a tree branch or by a flaw in the insulator. Such damage does not cause an immediate outage; instead it causes a phenomenon called partial discharge, which degrades the line gradually and, left unrepaired, eventually leads to a power outage.

Its textbook definition is an electrical discharge that does not completely bridge the electrodes of an insulation system.

1.3 Problem Statement

The main objective of this case study is to detect these partial discharge patterns in signals acquired from lines with a new meter. Effective classifiers using this data will make it possible to continuously monitor power lines for faults.

1.4. Real-world/Business objectives and constraints.

1. Minimize the binary classification error.
2. Produce probability estimates.
3. There is no strict time limit: partial discharge faults do damage over time rather than immediately, so predictions can take hours.
4. Detecting partial discharge early can be helpful financially.

2. Machine Learning Problem

2.1. Data Overview

Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/data

In total 4 files are given, of which 2 correspond to train data and the other 2 correspond to test data. Each pair consists of:
1. A file containing signal data
2. A file containing metadata

Each signal contains 800,000 points. In total, 8712 signals were given for training and 20337 signals for testing, both in parquet format.

The metadata consists of the phase of the signal and the target label:
0 - if partial discharge is not present
1 - if partial discharge is present

2.2. Mapping the real-world problem to an ML problem

2.2.1. Type of Machine Learning Problem

Each signal belongs to one of 2 classes (partial discharge present or not present), and we need to classify a given data point into one of them => Binary classification problem

2.2.2. Performance Metric

Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/overview/evaluation

Metrics:
* Matthews correlation coefficient (MCC)
* Confusion matrix
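To make the two metrics concrete, here is a minimal sketch of how MCC follows from the four cells of a binary confusion matrix, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), checked against scikit-learn's `matthews_corrcoef`. The labels are made up for illustration and are not taken from the competition data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Toy labels, made up for illustration: 1 = partial discharge, 0 = none.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 0])

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

numerator = float(tp * tn - fp * fn)
denominator = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
# By convention MCC is taken as 0 when the denominator vanishes.
mcc_manual = numerator / denominator if denominator > 0 else 0.0

mcc_sklearn = matthews_corrcoef(y_true, y_pred)
print(f"manual={mcc_manual:.4f} sklearn={mcc_sklearn:.4f}")
```

Because MCC uses all four cells, a classifier that always predicts the majority class scores 0, which is why it suits imbalanced data like this.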

2.2.3. Machine Learning Objectives and Constraints

Objective: Predict the probability of each data point belonging to each of the 2 classes.

Constraints:

* Class probabilities are needed.
* The metric is the Matthews correlation coefficient, which is computed on hard 0/1 predictions, so the probabilities have to be thresholded before scoring.
* Some latency constraints.

2.3.1. Existing approaches

Most of the notebooks present in https://www.kaggle.com/c/vsb-power-line-fault-detection/notebooks used deep learning techniques.

In most of the approaches, each signal is divided into equal chunks of size 1000, giving 800 chunks in total. Statistical features are then extracted from each chunk, which results in a 3-dimensional array (signals, chunks, features). Most of the solutions used LSTMs because the data is sequential. Some solutions used attention layers and some notebooks used transformers.
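The chunking step described above can be sketched as follows. This is an illustration on random data, not code from any of the notebooks; the function name `chunk_features` and the choice of four statistics per chunk are assumptions.

```python
import numpy as np

def chunk_features(signal, chunk_size=1000):
    """Split a 1-D signal into equal chunks and summarise each chunk."""
    n_chunks = len(signal) // chunk_size
    chunks = signal[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    # Hypothetical feature set: mean, std, min and max of every chunk.
    return np.stack(
        [chunks.mean(axis=1), chunks.std(axis=1), chunks.min(axis=1), chunks.max(axis=1)],
        axis=1,
    )

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 3 signals of 800,000 points each.
signals = rng.normal(size=(3, 800_000))

# Shape (n_signals, n_chunks, n_features): the sequence layout an LSTM consumes.
X = np.stack([chunk_features(s) for s in signals])
print(X.shape)  # (3, 800, 4)
```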

Some approaches used signal denoising techniques like DWT (discrete wavelet transform), while other solutions relied on finding peaks in the signal data; in both cases the models were then built using deep learning techniques.

2.3.2. Improvements

As most of the solutions used deep learning techniques, I've used classical machine learning models such as boosting models. In terms of feature engineering, I've used techniques like power spectral density and the Fourier transform, and selected the top features from them using peak detection. I've also used peak detection in the spectra of the signal, which was already described in a notebook. In the modeling part, I've tried four different machine learning models along with techniques like RandomizedSearchCV and StratifiedKFold cross-validation.

3. Exploratory Data Analysis

3.1.1. Examining Metadata

8712 rows
Consists of 4 columns:
signal_id — can be used as a key to join the target and signal data
id_measurement — different phases belonging to the same signal have the same id
phase — Each signal consists of 3 phases
target — partial discharge present or not

3.1.2. Examining phase and target

It can be observed that the data is highly imbalanced.

Let’s try looking at each phase

count plot of target values grouped by phase

Even if we look at each phase separately, the target values are still imbalanced within every phase. The positive targets are spread roughly evenly across the three phases.
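A count like the one behind the plot can be reproduced with a cross-tabulation. The sketch below uses a tiny made-up metadata table with the same four columns as the real file; the values are purely illustrative.

```python
import pandas as pd

# Made-up miniature metadata with the same four columns as the real file.
meta = pd.DataFrame({
    "signal_id": range(12),
    "id_measurement": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "phase": [0, 1, 2] * 4,
    "target": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
})

# Rows: phase, columns: target value, cells: number of signals.
counts = pd.crosstab(meta["phase"], meta["target"])

# Fraction of positive signals within each phase.
positive_rate = meta.groupby("phase")["target"].mean()

print(counts)
print(positive_rate)
```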

3.2. Analysis of signal data

Let’s look at the data given.

For analysis purposes, I've imported the data of only 9 signals.

Each signal consists of 800,000 points, and each column number corresponds to a signal_id in the metadata.

Let's observe one phase of a three-phase signal.

statistical details of one signal of phase 0

The signal oscillates around a mean of approximately 0, and its values lie in the range [-39, 33], which are the minimum and maximum values.

Let's plot the above signal.

50% of the points are less than -1 and the mean is -0.9.
Some values are abnormally high and some values are abnormally low.

Plotting all the 3 phases of a signal

Phase 0 starts at about 20, phase 1 at about 0, and phase 2 at about -20. Read off the plot, phase 0 and phase 1 appear to differ by roughly 90 degrees, phase 0 and phase 2 by roughly 180 degrees, and phase 2 and phase 1 by roughly 90 degrees. These are visual estimates; in a standard three-phase system the nominal separation between phases is 120 degrees.

Plotting the means of signals with positive target values and signals with negative target values

I separated the signals with positive target values and took the mean of the 3 phases separately which is plotted below. Similarly for the signals with negative target values too.

means of signal with positive target values

means of signal with negative target values

Observations from the above 2 plots

Clear differences now appear between the signals with partial discharge (target=1) and the signals without (target=0). The following can be observed:
1. For target value 0 (no partial discharge), the values lie within an interval of 0.2 or 0.3, but for target value 1 the values lie in an interval of approximately 1.5.
2. In the first plot, the signals of the different phases are separated by a certain distance, but in the second plot the signals of the different phases almost overlap.
The above 2 differences can be added as features for our model.

Looking at the KDE plots of statistical features of our signal data with respect to the target value.

red: target 0, blue: target 1

Mean:

If the mean of the signal is less than -2 or more than 0.5, it is more likely that there is no partial discharge.

Standard deviation:

It is more probable that the std is greater than 15 if there is no partial discharge. From both the mean and the std it can be said that the signal is spread more if there is no partial discharge.

Minimum of signal:

If the minimum of the signal is greater than -50, there is a higher probability of no partial discharge.

Maximum of signal:

If the maximum of the signal is less than 50, there is a higher probability of no partial discharge.
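As a rough sketch of how such per-class density comparisons can be produced, the snippet below computes the four statistics for a set of synthetic signals and estimates one kernel density per target class with `scipy.stats.gaussian_kde`. The data, the spreads of the two classes, and the helper name `summary_features` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def summary_features(signal):
    """Basic statistics of one signal."""
    return {
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal)),
        "min": float(np.min(signal)),
        "max": float(np.max(signal)),
    }

# Synthetic stand-in data: 10 signals per class with arbitrarily chosen spreads.
rows = []
for target, scale in [(0, 8.0), (1, 5.0)]:
    for _ in range(10):
        signal = rng.normal(loc=-1.0, scale=scale, size=5_000)
        rows.append({**summary_features(signal), "target": target})
features = pd.DataFrame(rows)

# One kernel density estimate per class, evaluated on a shared grid of std values.
grid = np.linspace(features["std"].min(), features["std"].max(), 50)
density = {
    target: gaussian_kde(group["std"].to_numpy())(grid)
    for target, group in features.groupby("target")
}
```

Plotting `density[0]` and `density[1]` against `grid` gives the kind of two-curve picture described above.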

Bandwidth of a signal(mean-standard deviation):

3.3. Feature Engineering

Reference: https://www.kaggle.com/junkoda/handmade-features

3.3.1.Spectra of a signal:

For each signal, we calculate the mean and percentiles of every chunk of 1000 values, which gives 800 chunks per signal.

Peak Interval:

Then, within the 800 chunks of the spectra, we find the peak interval of width 150 that contains the maximum deviation in the max minus mean spectrum.

code snippet taken from: https://www.kaggle.com/junkoda/handmade-features

From the peak interval calculated above, we extract features like the mean and max.
Instead of considering each phase independently, we combine all 3 phases of a signal.
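The following is a minimal sketch of the idea, not the code from the referenced notebook. It assumes the peak interval is the window of 150 consecutive chunks whose summed (max minus mean) deviation is largest; the function names, the percentile choice, and the synthetic signal are hypothetical.

```python
import numpy as np

def chunk_spectrum(signal, chunk_size=1000, percentiles=(0, 1, 50, 99, 100)):
    """Per-chunk mean and percentiles; len(signal) must be a multiple of chunk_size."""
    chunks = np.asarray(signal, dtype=np.float64).reshape(-1, chunk_size)
    return chunks.mean(axis=1), np.percentile(chunks, percentiles, axis=1)

def peak_interval(mean, maximum, width=150):
    """Start and end chunk of the window with the largest summed max-minus-mean."""
    deviation = maximum - mean
    window_sums = np.convolve(deviation, np.ones(width), mode="valid")
    start = int(np.argmax(window_sums))
    return start, start + width

rng = np.random.default_rng(0)
signal = rng.normal(size=800_000)
signal[300_000:400_000:50] += 30.0  # synthetic spikes in chunks 300..399

mean, perc = chunk_spectrum(signal)
maximum = perc[-1]  # the 100th percentile is the per-chunk maximum
start, end = peak_interval(mean, maximum)

# Features taken from inside the peak interval.
interval_dev = (maximum - mean)[start:end]
features = {"peak_mean": float(interval_dev.mean()), "peak_max": float(interval_dev.max())}
print(start, end, features)
```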

3.3.2. Fast Fourier Transform:

Signal processing reference: http://ataspinar.com/2018/04/04/machine-learning-with-signal-processing-techniques/

The FFT is used to convert a signal from the time domain to the frequency domain.

The graph has to be zoomed in because some amplitudes are very large compared to the others and there are 400,000 frequencies.
In the above graph, we can observe the amplitude values for different frequency values.
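A small sketch of an amplitude spectrum computed with NumPy's real FFT is shown below. It assumes each signal spans 20 ms (one 50 Hz grid cycle), which gives a 40 MHz sampling rate for 800,000 points; the test signal is synthetic and the variable names are illustrative.

```python
import numpy as np

n_samples = 800_000
duration_s = 0.02                     # assumption: one 50 Hz grid cycle per signal
sample_rate = n_samples / duration_s  # 40 MHz under that assumption

t = np.arange(n_samples) / sample_rate
# Synthetic signal: a 50 Hz carrier plus a weak 1 MHz component.
signal = 20.0 * np.sin(2 * np.pi * 50.0 * t) + 2.0 * np.sin(2 * np.pi * 1e6 * t)

# Real-input FFT: n_samples // 2 + 1 frequency bins from 0 Hz up to Nyquist.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)

# Scale so that a pure sinusoid of amplitude A shows up as roughly A.
amplitude = 2.0 * np.abs(spectrum) / n_samples

# Strongest non-DC component.
strongest = freqs[np.argmax(amplitude[1:]) + 1]
print(len(freqs), strongest)
```

With 800,000 real samples the transform has 400,001 bins, which matches the roughly 400,000 frequencies mentioned above.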

3.3.3. Power spectral density:

The power spectral density (PSD) is similar to the FFT, but it describes how the power of the signal is distributed over frequency.

power spectral density frequency spectrum

In the above graph, we can observe the power spectral density values for different frequency values.
There are fewer peaks than in the FFT graph, but these peaks can also be used as features for our model.
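One common way to estimate a PSD is Welch's method, available as `scipy.signal.welch`. The sketch below assumes that method, a 40 MHz sampling rate, and a synthetic noisy signal; the article does not state which estimator or segment length was actually used.

```python
import numpy as np
from scipy.signal import welch

sample_rate = 40e6  # assumption: 800,000 samples over 20 ms
rng = np.random.default_rng(0)
t = np.arange(800_000) / sample_rate

# Synthetic signal: a 2 MHz tone buried in white noise.
signal = np.sin(2 * np.pi * 2e6 * t) + 0.5 * rng.normal(size=t.size)

# Welch's method averages periodograms of overlapping segments,
# trading frequency resolution for a smoother estimate.
freqs, psd = welch(signal, fs=sample_rate, nperseg=4096)

dominant = freqs[np.argmax(psd)]
print(f"dominant frequency: {dominant / 1e6:.2f} MHz")
```

The averaging over segments is one reason a PSD estimate tends to show fewer spurious peaks than a raw FFT of the whole signal.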

We extract the above features from our train data set along with some statistical features.

We combine the phases by taking the average of the features that we calculate over each phase.

From the Fourier features we consider only the top 10 peaks, and similarly, from the PSD features we consider only the top 10 peaks.
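Selecting the top peaks and averaging over the phases could look roughly like this. It is a sketch on random spectra using `scipy.signal.find_peaks`; the helper `top_k_peaks`, the zero-padding, and the choice to keep both peak frequencies and peak heights are assumptions, not the article's exact feature layout.

```python
import numpy as np
from scipy.signal import find_peaks

def top_k_peaks(freqs, values, k=10):
    """Frequencies and heights of the k tallest local maxima, zero-padded to length k."""
    peak_idx, _ = find_peaks(values)
    # Sort the detected peaks by height, tallest first, and keep at most k.
    strongest = peak_idx[np.argsort(values[peak_idx])[::-1][:k]]
    peak_freqs = np.zeros(k)
    peak_heights = np.zeros(k)
    peak_freqs[: strongest.size] = freqs[strongest]
    peak_heights[: strongest.size] = values[strongest]
    return np.concatenate([peak_freqs, peak_heights])

rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 1.0, 501)
# Synthetic spectra standing in for the FFT or PSD of the 3 phases of one measurement.
phase_spectra = rng.random((3, freqs.size))

per_phase = np.stack([top_k_peaks(freqs, s) for s in phase_spectra])  # shape (3, 20)
combined = per_phase.mean(axis=0)  # one feature vector per measurement
print(combined.shape)  # (20,)
```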

After extracting all the features from the train data we get a feature array of shape (2904, 76), and from the test data an array of shape (6779, 76). The row counts are the 8712 training and 20337 test signals divided by 3, since the 3 phases of each measurement are combined.

Now we can train and test models on the above features.

4. Modeling

4.1. Logistic Regression:

We can first use stratified k fold cross-validation to find the best hyperparameter and then train our model using the best hyperparameter. Then we can calculate the score on test data. The metric used here is Matthews correlation coefficient as explained above
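The procedure can be sketched as below on synthetic imbalanced data from `make_classification`. The grid of `C` values, the scaling step, and `class_weight="balanced"` are assumptions for illustration; the article does not list the settings actually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the extracted feature array.
X, y = make_classification(n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for C in [0.01, 0.1, 1.0, 10.0]:  # hypothetical grid of regularisation strengths
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(C=C, class_weight="balanced", max_iter=1000),
        )
        model.fit(X[train_idx], y[train_idx])
        scores.append(matthews_corrcoef(y[val_idx], model.predict(X[val_idx])))
    # Mean and spread of MCC over the 5 folds for this value of C.
    results[C] = (float(np.mean(scores)), float(np.std(scores)))

best_C = max(results, key=lambda c: results[c][0])
print(best_C, results[best_C])
```

Stratified folds keep the share of positive samples about the same in every split, which matters when positives are rare.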

Averaged over the 5 fold models, the MCC score is 0.668 ± 0.031.

4.2. Random Forest Classifier:

In this case, as there are multiple hyperparameters, we can use RandomizedSearchCV to find the best hyperparameters.
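A randomized search scored by MCC might be set up as in the following sketch. The search space, the number of iterations, and the synthetic data are assumptions; only the use of randomized search with a random forest comes from the text.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic imbalanced data standing in for the extracted feature array.
X, y = make_classification(n_samples=400, n_features=20, weights=[0.9, 0.1], random_state=0)

# Hypothetical search space; each candidate is sampled from these distributions.
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(3, 12),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring=make_scorer(matthews_corrcoef),  # select candidates by MCC
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the model refit on all the data with the best parameters found.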

Now we can use the best model found above to train and test on our data.

Using the best model with StratifiedKFold, we get an average MCC score of 0.701 ± 0.028.

We can next use boosting models like the catboost classifier and LightGBM classifier to improve the MCC score.

4.3. LightGBM Classifier:

We repeat the same process as for the random forest classifier. First, we run RandomizedSearchCV to find the best hyperparameters.

We find the best model which is

If we use the above model we get an average MCC score of 0.714 ± 0.070

4.4. Catboost Classifier:

We apply RandomizedSearchCV to find the best hyperparameters.

The best parameters found are

{'depth': 9, 'iterations': 400, 'l2_leaf_reg': 4, 'learning_rate': 0.01}

Using these parameters we can train our model. Upon training, we get an average MCC score of 0.724 ± 0.056 on the cross-validation set.

This is the highest cross-validation MCC score we got so far.

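4.4. Comparison of all 4 models:

Average cross-validation MCC scores reported above:
Logistic Regression: 0.668 ± 0.031
Random Forest: 0.701 ± 0.028
LightGBM: 0.714 ± 0.070
Catboost: 0.724 ± 0.056

So we can use the Catboost classifier to classify the test dataset and submit the results.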

4.5 Prediction on the test dataset

We replicate the target value 3 times for each signal as there are 3 phases. We can then create a dataframe and submit the results.
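The replication step can be sketched with `np.repeat`. The predictions are made up, and the snippet assumes that test signal ids continue right after the 8712 training signals and that every three consecutive ids belong to one measurement; the output file name in the comment is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions, one per three-phase measurement.
measurement_pred = np.array([0, 1, 0, 0])

# Assumption: test signal ids continue after the 8712 training signals,
# and every 3 consecutive ids are the 3 phases of one measurement.
first_test_signal_id = 8712

# Repeat each measurement-level prediction once per phase.
phase_pred = np.repeat(measurement_pred, 3)

submission = pd.DataFrame({
    "signal_id": np.arange(first_test_signal_id, first_test_signal_id + phase_pred.size),
    "target": phase_pred,
})
print(submission.head(6))
# submission.to_csv("submission.csv", index=False)  # uncomment to write the file
```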

4.6 Result:

I got a private leaderboard score of 0.641 and a public leaderboard score of 0.63813.

5. Future works

As the data is sequential we can improve the score using deep learning techniques like LSTM layers and then improve on it using attention layers and transformers.

In terms of feature engineering, we can try other techniques like the wavelet transform and signal denoising. We can also consider more statistical features of the signal.

As this is similar to audio data we can also try out techniques like MFCC (mel frequency cepstrum coefficients).

Different kinds of features from the peaks array, such as mean, max, percentiles, peak_count, sawtooth_rmse_mean, etc. (see https://www.kaggle.com/junkoda/handmade-features), can also be calculated to improve the score.

6. References

http://ataspinar.com/2018/04/04/machine-learning-with-signal-processing-techniques/

https://www.kaggle.com/junkoda/handmade-features

https://www.kaggle.com/braquino/5-fold-lstm-attention-fully-commented-0-694