U.S. electric disturbance events analysis (part 2): exploratory data analysis

Using Python to find trends and patterns in the data.

Diogo Reis
Dec 29, 2023
Photo by Matthew Henry on Unsplash

In the previous post, I showed how I cleaned and prepared the data about the electric disturbance events in the United States. Here, I will proceed with my analysis describing some basic statistics and answering questions based on the information of our dataset.

Initial analysis

I started my analysis using the .describe() function on the columns related to the loss of demand, the number of customers affected and the duration of the event. The result is shown in the figure below.

The figure shows metrics such as mean, median (50%), maximum, and minimum values in the selected columns.
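A minimal sketch of this step on toy data is shown here. The column names (demand_loss_mw, customers_affected, duration_hours) and the values are assumptions made for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "demand_loss_mw": [0, 0, 0, 120, 900],
        "customers_affected": [0, 0, 1500, 40000, 250000],
        "duration_hours": [0.5, 2.0, 4.0, 15.0, 72.0],
    }
)

# describe() returns count, mean, std, min, the quartiles and max.
# The "50%" row is the median of each column.
summary = events.describe()
print(summary)
```

Comparing the mean row with the 50% row is a quick way to spot skewed columns: a few very large events pull the mean far above the median.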

I also used the .plot() function to generate histograms of each of these columns to visually understand how the values are distributed. The corresponding histograms are shown below.

  • Demand Loss
  • Customers affected
  • Event duration

You can see that most of the values are concentrated around zero, and there are also outliers in each distribution. I did not remove them because in this context, large values are related to major events (e.g. blackouts) and I am interested in them.

Analyzing the histograms and summary statistics, I found the following:

  • Half of the electric disturbances did not affect customers or cause demand loss.
  • Over the past four years, each electric disturbance affected approximately 56,000 customers on average.
  • Half of the electric disturbances lasted less than 4 hours.
  • 25% of electric disturbances in the last four years have lasted more than 15 hours.

Another piece of information that can help us understand the data is the correlation between variables. I used the .corr() method on the analyzed columns and plotted the correlations as a heatmap.

The correlations between the variables show that:

  • The number of customers affected and the loss of demand are only moderately correlated (0.55). This suggests that an increase in the number of customers affected does not necessarily mean an increase in lost demand.
  • The duration of the event is only weakly correlated with the number of affected customers and the loss of demand.
  • Note: This analysis is only valid for linear correlations.
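To illustrate the note about linear correlations, here is a small sketch on toy data. The column names and values are hypothetical. It compares the default Pearson correlation with the rank-based Spearman correlation, which is an addition for comparison and not part of the original analysis.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "demand_loss_mw": [0, 10, 50, 200, 900],
        "customers_affected": [0, 2000, 1500, 40000, 250000],
        "duration_hours": [3.0, 1.0, 30.0, 5.0, 12.0],
    }
)

# DataFrame.corr() defaults to Pearson, which measures linear association only.
pearson = events.corr()

# Spearman works on ranks, so it also captures monotonic, non-linear relationships.
spearman = events.corr(method="spearman")

print(pearson.round(2))
print(spearman.round(2))
```

If the two matrices differ noticeably for a pair of columns, the relationship between them is probably not linear, and the Pearson value alone understates or overstates it.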

Answering questions with the data

Now that we have a general overview of the data, we can answer some questions with our dataset. The code is available in my GitHub repository.

1. What is the number of events per year?

First, it is worth checking whether the number of electric disturbance events is increasing. I did this by plotting a bar chart of the number of events per year, as shown in the figure.

About the results:

  • In the last three years, the number of electric disturbances has been quite similar.
  • It is important to remember that severe blackouts occurred in 2020 and 2021 but not in 2022. Since the 2022 total is nevertheless close to those of the two previous years, this could indicate that the number of events, excluding major ones, has increased.
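The yearly count can be sketched as follows. The column name event_start and the dates are assumptions for illustration only.

```python
import pandas as pd

# Toy data; the column name "event_start" and the dates are hypothetical.
events = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            [
                "2019-03-01",
                "2020-08-15",
                "2020-08-16",
                "2021-02-14",
                "2022-06-30",
                "2022-07-02",
            ]
        )
    }
)

# Count events per calendar year and order the result chronologically.
events_per_year = events["event_start"].dt.year.value_counts().sort_index()
print(events_per_year)

# events_per_year.plot(kind="bar") would draw the bar chart (requires matplotlib).
```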

2. What is the number of customers affected each year?

Another metric that can be seen from year to year is the number of customers affected. The numbers in the chart below are in millions.

About the results:

  • The number of customers affected appears to be decreasing, but in 2020 and 2021 there were major outages, which are events that affect a large number of customers.
  • If we ignore the years with major events and compare the year 2022 with the year 2019, the year 2022 shows an increase in the number of customers affected.
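A sketch of the yearly aggregation, expressed in millions, could look like this. Column names and values are hypothetical.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            ["2019-05-01", "2020-08-15", "2020-08-16", "2022-06-30"]
        ),
        "customers_affected": [500_000, 1_200_000, 800_000, 750_000],
    }
)

# Group by the calendar year of each event and sum the affected customers.
year = events["event_start"].dt.year.rename("year")
customers_per_year_millions = (
    events.groupby(year)["customers_affected"].sum() / 1e6
)
print(customers_per_year_millions)
```

Because a sum is dominated by the largest events, a single major outage can account for most of a year's total, which is why the years with blackouts stand out in this metric.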

3. What is the number of electric disturbances by month each year?

We can also check how the number of events changes each month in the analyzed period. I used the Plotly library to create an interactive visualization of each month.

About the results:

  • The two largest numbers of events (August 2020 and February 2021) were caused by blackouts in California and Texas, respectively.
  • In 2021 and 2022, the number of disturbances increases as summer approaches and tends to decrease in the fall.
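The year-by-month table behind such a chart can be built with a cross-tabulation. This is a sketch on toy data with a hypothetical event_start column; the plotting step is left out.

```python
import pandas as pd

# Toy data; the column name "event_start" and the dates are hypothetical.
events = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            [
                "2020-08-14",
                "2020-08-15",
                "2021-02-15",
                "2021-02-16",
                "2021-02-17",
                "2022-06-10",
            ]
        )
    }
)

# Rows are years, columns are months; each cell is the number of events.
monthly_counts = pd.crosstab(
    events["event_start"].dt.year.rename("year"),
    events["event_start"].dt.month.rename("month"),
)

# Make sure all twelve months appear, even those without any event.
monthly_counts = monthly_counts.reindex(columns=range(1, 13), fill_value=0)
print(monthly_counts)
```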

4. What is the number of disturbances by season each year?

Season is a factor that directly affects the power system, as weather is one of the main causes of power problems. For this question, I created a stacked bar chart using the Plotly library to see how the electrical disturbance events are distributed by season in the four years analyzed.

About the results:

  • Summer had the most disturbance events in three of the last four years. This shows that summer is a problematic season, which makes sense because high temperatures increase load demand (e.g. air conditioning use).
  • From 2019 to 2022, the number of electric disturbances in the fall is similar from year to year.
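One way to derive the season from the event date is a month-to-season lookup. The sketch below assumes meteorological seasons for the Northern Hemisphere (December to February is winter, and so on); the source does not say which definition was used, and the column names are hypothetical.

```python
import pandas as pd

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

# Toy data; the column name "event_start" and the dates are hypothetical.
events = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            ["2021-02-15", "2021-07-04", "2021-08-01", "2022-10-10", "2022-12-25"]
        )
    }
)

events["year"] = events["event_start"].dt.year
events["season"] = events["event_start"].dt.month.map(SEASON_BY_MONTH)

# Rows are years, columns are seasons; this is the table a stacked bar chart needs.
season_counts = pd.crosstab(events["year"], events["season"])
print(season_counts)
```

Note that with this mapping a December event is counted as winter of its own calendar year, so one winter is split across two rows.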

5. What is the number of events by U.S. state?

Counting the events by state helps show where these disturbances are occurring. For this question, I used a choropleth map from the Plotly library to create a visualization where each state is colored according to the number of events.

I also made a list of the top 10 states by number of events.

About the results:

  • Texas and California are the states that registered the most electric disturbance events, with 211 and 195 events, respectively.
  • Wyoming and Vermont had only 3 events during the period.
  • Of the 10 states shown, 8 are among the most populous according to the U.S. Census Bureau.
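The ranking by state reduces to a frequency count. A sketch on toy data, assuming a hypothetical state column with one state per event:

```python
import pandas as pd

# Toy data; the column name "state" and the values are hypothetical.
events = pd.DataFrame(
    {"state": ["Texas", "California", "Texas", "Vermont", "California", "Texas"]}
)

# value_counts() sorts from most to least frequent, so head(10) is the top 10.
events_per_state = events["state"].value_counts()
top_states = events_per_state.head(10)
print(top_states)
```

The full events_per_state series is what a choropleth map would consume, while top_states gives the ranked list.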

6. Which period of the day experiences the most electric disturbances?

During the research, I was curious about the number of events in each period of the day. So I grouped the events by hour and considered two conditions: i) counting all events, and ii) counting only the events that affected customers. The figure below compares these two conditions as two separate curves.

About the results:

  • Considering all events, the period of the day with the most electric disturbances is between 11 am and 12 pm.
  • However, when considering only the events that affected customers, the peak occurs between 6 pm and 7 pm.
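The two curves can be sketched by counting events per hour twice, once for all events and once after filtering. Column names and values are hypothetical, and "affected customers" is assumed here to mean customers_affected > 0.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            [
                "2021-02-15 11:20",
                "2021-02-15 11:45",
                "2021-07-01 18:05",
                "2021-07-02 18:30",
                "2022-01-10 03:00",
            ]
        ),
        "customers_affected": [0, 0, 12000, 300, 0],
    }
)

hour = events["event_start"].dt.hour.rename("hour")
hours = range(24)

# Condition i): all events, with empty hours filled with zero.
all_events_by_hour = hour.value_counts().reindex(hours, fill_value=0)

# Condition ii): only events that affected at least one customer (assumption).
affecting = events["customers_affected"] > 0
customer_events_by_hour = hour[affecting].value_counts().reindex(hours, fill_value=0)

comparison = pd.DataFrame(
    {"all_events": all_events_by_hour, "customer_events": customer_events_by_hour}
)
print(comparison[comparison["all_events"] > 0])
```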

7. What are the most common types of events?

In the dataset, some event types are characterized by two causes, for example, severe weather and transmission interruption. To answer this question, I decided to separate the events and count them individually to see the total number of occurrences of a particular event type.

About the results:

  • Severe weather is the most common event type.
  • System operation, the second in the ranking, means a complete loss of the ability to monitor or control the electrical system at its control center.
  • Vandalism, which is a difficult event to control, completes the top three most common event types.
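Separating combined event types and counting each one individually can be sketched with split and explode. The column name and the "; " separator are assumptions; the real dataset may use a different delimiter.

```python
import pandas as pd

# Toy data; the column name and the ";" separator are hypothetical.
events = pd.DataFrame(
    {
        "event_type": [
            "Severe Weather; Transmission Interruption",
            "Vandalism",
            "Severe Weather",
            "System Operations",
            "Vandalism; Suspicious Activity",
        ]
    }
)

event_type_counts = (
    events["event_type"]
    .str.split(";")   # one list of causes per event
    .explode()        # one row per individual cause
    .str.strip()      # drop the whitespace left around each cause
    .value_counts()
)
print(event_type_counts)
```

With this approach an event with two causes contributes to two categories, so the counts add up to more than the number of events.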

8. Which event types caused the most demand loss?

As we saw in the initial analysis of this post, some electric disturbances did not cause demand loss. Therefore, in this question, I only consider the events that caused demand loss and calculate the percentage that each event type represents among them.

About the results:

  • Severe weather (64.6%), system operations (16.8%) and transmission interruptions (7.2%) were responsible for the largest share of lost demand.
  • Although vandalism, suspicious activity, and actual physical attacks are more frequent events, they have little impact on demand loss.
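The percentage per event type can be read in two ways: as a share of the events that caused demand loss, or as a share of the total demand lost. The sketch below computes both on toy data, since the text does not settle which one was used; column names and values are hypothetical.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "event_type": [
            "Severe Weather",
            "Severe Weather",
            "Vandalism",
            "System Operations",
            "Severe Weather",
        ],
        "demand_loss_mw": [300, 150, 0, 50, 0],
    }
)

# Keep only the events that actually caused demand loss.
with_loss = events[events["demand_loss_mw"] > 0]

# Reading 1: share of events, by type, among events with demand loss.
share_of_events = with_loss["event_type"].value_counts(normalize=True) * 100

# Reading 2: share of the total lost demand attributed to each type.
share_of_demand = (
    with_loss.groupby("event_type")["demand_loss_mw"].sum()
    / with_loss["demand_loss_mw"].sum()
    * 100
)

print(share_of_events.round(1))
print(share_of_demand.round(1))
```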

9. What is the mean/median restoration time for each event type?

Another characteristic we can examine is the restoration time required for each event type. Since our dataset contains extreme events (i.e. outliers), I compared the mean and median restoration times.

  • Considering the mean values
  • Considering the median values

About the results:

  • In both metrics (mean and median), fuel supply deficiency was the problem with the longest restoration time. Half of the incidents took up to 27 hours to recover. This indicates the complexity of resolving this type of incident.
  • Problems involving cyber events are in the top five in both analyses.
  • Events related to severe weather also have a high time to restoration.
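Computing both statistics side by side makes the effect of outliers visible. A sketch on toy data, with hypothetical column names and values:

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
events = pd.DataFrame(
    {
        "event_type": [
            "Fuel Supply Deficiency",
            "Fuel Supply Deficiency",
            "Severe Weather",
            "Severe Weather",
            "Severe Weather",
        ],
        "restoration_hours": [20.0, 40.0, 2.0, 4.0, 90.0],
    }
)

# Mean and median restoration time per event type, ranked by the median.
restoration = (
    events.groupby("event_type")["restoration_hours"]
    .agg(["mean", "median"])
    .sort_values("median", ascending=False)
)
print(restoration)
```

In this toy example a single 90-hour event pulls the severe weather mean up to 32 hours while its median stays at 4 hours, which is why the two rankings can differ.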

10. What is the percent difference for each event type between 2019 and 2022?

In this question, we are interested in comparing the years 2019 and 2022 to see if certain types of events have increased or decreased. This was done by calculating the percentage difference between them.

About the results:

  • Suspicious activity (223.53%), generation inadequacy (100%) and actual physical attack (87.5%) were the types of events that increased the most.
  • Events involving severe weather and transmission/distribution interruptions decreased significantly in the period.
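The sketch below assumes the percent difference is the change in event count relative to 2019, i.e. (count in 2022 - count in 2019) / count in 2019 × 100. The column names and the counts are toy values, not the real ones.

```python
import pandas as pd

# Toy data; column names and counts are hypothetical.
events = pd.DataFrame(
    {
        "year": [2019] * 6 + [2022] * 8,
        "event_type": (
            ["Suspicious Activity"] * 2
            + ["Severe Weather"] * 4
            + ["Suspicious Activity"] * 5
            + ["Severe Weather"] * 3
        ),
    }
)

# Rows are event types, columns are years; each cell is an event count.
counts = pd.crosstab(events["event_type"], events["year"])

# Percent change relative to 2019. A type with zero events in 2019
# would produce inf or NaN here and needs separate handling.
percent_change = (counts[2022] - counts[2019]) / counts[2019] * 100
print(percent_change.sort_values(ascending=False))
```

Percentages computed on small counts are volatile: going from 2 to 5 events is a 150% increase, so it helps to read them together with the absolute numbers.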

Conclusions

The following aspects can be highlighted through the data analysis performed:

  • Severe weather is the main cause of electric disturbance events, and extreme changes in the climate can make this problem even worse;
  • Periods of the day such as 11 am–12 pm and 6–7 pm should be given special attention;
  • Events caused by a deficiency in fuel supply take longer to be restored;
  • The number of electric disturbances caused by cyber-attacks and suspicious activities is on the rise.

Thanks for reading! Please feel free to leave your suggestions.

I hope you find this post useful!

  • LinkedIn: connect with me on LinkedIn
  • GitHub: visit my GitHub to see my other projects
