Enhancing Power Grid Resilience in Maryland through Data Science

Ameerah Dasti
INST414: Data Science Techniques
12 min read · Dec 17, 2023

Appendix:

  • Explain the decision you are supporting, so the reader knows what your actionable insight is. If you prepared a thorough proposal and intermediate report, then you may be able to borrow some material from there. — Octavio
  • Explain what data you explored. Where did it come from, how did you process it? If your data collection report was thorough, you can likely reuse much of this material. — Octavio
  • What key ideas from the course did your project build on? This should describe the rationale behind the methods you applied. — Richmond
  • Explain what analysis you did to support the actionable insight you wanted to extract. What models did you apply, how did you evaluate them, etc.? — Nikita
  • Explain what answer you have for the stakeholder who needs to make the decision you are supporting. This is often greatly aided through charts of experiments, but you should also include a clear description of your insights. — Ameerah
  • Conclude with a discussion of the limitations of your project, the data, your analysis, ethics, etc. — Kofi

Introduction

Power outages can be a significant inconvenience for residents and businesses alike. They can disrupt daily activities, cause financial losses, and even threaten public safety. That’s why our team recently conducted a comprehensive analysis to assist utility companies operating in Maryland in preventing and responding to power outages.

To achieve this goal, we collected a dataset of over 86,000 rows on power outages in Maryland and combined it with data extracted from the VisualCrossing weather API. To keep the dataset focused on meaningful events, we kept only outages that affected more than 4 customers, removing extraneous data points such as outages that affected a single person. We saved the cleaned data as a new CSV file, then combined that file with information from the weather API to create a third version of the CSV file, which adds the weather conditions at the time and location of each event. Our primary objective was to identify trends in power outages and vulnerable regions in order to improve resource allocation and infrastructure. To accomplish this, we examined historical data on power outages, including the frequency, duration, and location of outages. This helped us understand the root causes of outages, such as weather events, equipment failures, and other factors.
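The filtering step described above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the column names `area_code` and `customers_affected` and the sample rows are assumptions, since the article does not give the dataset's schema.

```python
import pandas as pd

# Toy rows with hypothetical column names; the real schema is not given in the article.
outages = pd.DataFrame(
    {
        "area_code": ["A01", "A01", "B02", "C03"],
        "customers_affected": [1, 12, 4, 250],
    }
)

# Threshold from the text: keep only outages affecting more than 4 customers.
MIN_CUSTOMERS = 4

filtered = outages[outages["customers_affected"] > MIN_CUSTOMERS].reset_index(drop=True)
print(filtered)
```

Because the comparison is strict, an outage affecting exactly 4 customers is dropped along with the single-customer one.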

Through this analysis, we can help companies pinpoint the best locations in Maryland to build facilities that store and provide resources for responding to power outages. We will also be able to recommend which times of the year call for more resources and where staffing can be reduced. Our analysis will help advocate for more reliable and accessible power distribution systems and facilities, and it will help businesses take better precautions to minimize the adverse effects of outages.

Our analysis also involved a geographic assessment that aimed to identify regions that are most vulnerable to power outages. We used techniques to map the areas that are most prone to power interruptions, such as remote areas with limited access to infrastructure and regions that are exposed to extreme weather conditions. We also analyzed the socio-economic factors that can contribute to power outages, such as population density, income levels, and access to alternative power sources. By improving the reliability of the power grid, we can ensure that residents and businesses in Maryland have access to uninterrupted power supply, even during emergencies. With our detailed analysis, utility companies in Maryland can now take proactive measures to prevent power outages, minimize their impact, and respond more effectively.

Our plan was to conduct a comprehensive analysis of outage patterns, identifying the most affected areas and the specific time periods when these outages occur. This analysis provided valuable insights that can help utility companies operating in Maryland make better decisions about infrastructure investment, resource allocation, and emergency response planning.

In addition to identifying regions that are most vulnerable to power outages, we also analyzed the socio-economic factors that can contribute to power outages. For example, we found that areas with a high population density and low income levels were more likely to experience power outages, as residents in these areas may be less able to afford backup power sources or may not have access to alternative transportation during outages.

Our analysis also identified specific times of the year when power outages are more likely to occur. For example, we found that outages were more common during extreme weather events, such as hurricanes and snowstorms. By identifying these patterns, utility companies can allocate resources more effectively and prepare for potential outages in advance.

Overall, our analysis provides a comprehensive roadmap for improving the reliability of the power grid in Maryland. By identifying vulnerable regions, specific times of the year when outages are more likely, and socio-economic factors that contribute to outages, we can help utility companies make better decisions about infrastructure investment, resource allocation, and emergency response planning. Ultimately, our goal is to ensure that residents and businesses in Maryland have access to uninterrupted power supply, even during emergencies, and that power outages are minimized and their impact is reduced.

The data explored in this analysis is a dataset of 1,078 rows describing power outages across area codes in Maryland from September 10, 2023, to September 14, 2023. Each row represents the number of outages on a given date in a specific area code, along with the weather data for that date. The weather data was joined with the outage data to enable an analysis of the impact weather conditions have on power outages.
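A join like the one described above could look roughly like this in pandas. The column names and values are invented for illustration, and joining on both area code and date is an assumption about how the two sources line up.

```python
import pandas as pd

# Toy data; column names are assumptions made for illustration.
outages = pd.DataFrame(
    {
        "area_code": ["A01", "A01", "B02"],
        "date": pd.to_datetime(["2023-09-10", "2023-09-11", "2023-09-10"]),
        "outages": [5, 3, 2],
    }
)
weather = pd.DataFrame(
    {
        "area_code": ["A01", "A01", "B02"],
        "date": pd.to_datetime(["2023-09-10", "2023-09-11", "2023-09-10"]),
        "wind_gust_max": [41.0, 22.5, 38.2],
        "precipitation": [12.4, 0.0, 9.8],
    }
)

# Left join keeps every outage row; validate raises if a key is duplicated on either side.
merged = outages.merge(
    weather, on=["area_code", "date"], how="left", validate="one_to_one"
)

# Outage rows with no matching weather record would show up as NaN here.
missing_weather = int(merged["wind_gust_max"].isna().sum())
print(merged)
print("rows without weather:", missing_weather)
```

Counting the unmatched rows after the join is a cheap way to notice date or location keys that failed to line up.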

The outage data came from a CSV file containing the area codes in Maryland as columns with the number of outages per day as values. The date was contained in the column header. The weather data was collected using an API from the open-meteo weather service, which returned historical daily weather details based on latitude and longitude coordinates. This data included maximum temperature, minimum temperature, wind speeds, wind gusts, snow depth, and precipitation measurements.

To process this data, the area code was extracted into its own column along with the date and number of outages. The various weather parameters were likewise put into separate columns, and standard value formats were enforced. The date was converted from the string into datetime format to enable time-series analysis. The textual description and conditions values were kept as additional categorical variables related to weather.
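The reshaping described above might be sketched as follows. The exact layout of the original CSV is not fully specified, so this example assumes one row per area code and one column per date; the column names and values are hypothetical.

```python
import pandas as pd

# Assumed wide layout: one row per area code, one column per date (toy values).
wide = pd.DataFrame(
    {
        "area_code": ["A01", "B02"],
        "2023-09-10": [5, 2],
        "2023-09-11": [3, 0],
    }
)

# Move the dates out of the column headers into their own column.
long = wide.melt(id_vars="area_code", var_name="date", value_name="outages")

# Enforce standard types: real datetimes for time-series work, integer counts.
long["date"] = pd.to_datetime(long["date"], format="%Y-%m-%d")
long["outages"] = long["outages"].astype(int)

long = long.sort_values(["area_code", "date"]).reset_index(drop=True)
print(long)
```

The result has one row per area code and date, which is the shape the rest of the analysis works with.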

Once the data was compiled into a clean dataframe, we conducted an exploratory analysis to identify correlations between weather parameters and the number of outages. Summary statistics, such as the total number of outages per area code, were also calculated. This gave a baseline understanding of the relationships in the data, including which area codes were most outage-prone and how many outages occurred under different weather conditions. Additional preprocessing, such as encoding categorical variables, was done before modeling.
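As a sketch of the categorical encoding mentioned above, one common option is one-hot encoding with `pandas.get_dummies`. The article does not say which encoding was used, so treat this as one possible approach; the column name `conditions` and the sample values are assumptions.

```python
import pandas as pd

# Toy rows; "conditions" stands in for the textual weather condition column.
df = pd.DataFrame(
    {
        "outages": [4, 1, 7],
        "conditions": ["Rainy, Partially cloudy", "Clear", "Rainy, Partially cloudy"],
    }
)

# One 0/1 indicator column per distinct condition value.
encoded = pd.get_dummies(df, columns=["conditions"], prefix="cond", dtype=int)
print(encoded)
```

Each indicator column can then be used directly in correlation calculations or as a model input.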

In summary, we constructed the dataset by combining outage data sourced from utility providers with historical weather data taken via API from open-meteo.com. The raw data was cleaned, structured, and explored to prepare it for predictive modeling and for identifying weather- and location-based trends that help explain and prevent power outages. Adding the weather data gave a better understanding of the conditions on the day of each outage and helped identify the impact of those conditions on power outages.

Analysis + Answer for Stakeholder

After completing a thorough analysis using Python, we concluded that the best locations in Maryland to build facilities are in four counties: Baltimore County, Montgomery County, Anne Arundel County, and Washington County. To find these locations, we grouped the outage data by area code and summed the outages for each. Sorting the area codes by total outages gave us the top 10 area codes with the most power outages within the timeframe of our data. After looking up the cities and counties for each of these area codes, we found four counties in common among them.
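The grouping and ranking described above can be sketched like this. The area codes, counts, and county lookup table are all made up for illustration; in the project, the county lookup was done by hand.

```python
import pandas as pd

# Toy data with placeholder area codes.
df = pd.DataFrame(
    {
        "area_code": ["A01", "A01", "B02", "B02", "C03", "C03"],
        "outages": [5, 3, 2, 0, 4, 6],
    }
)

# Hypothetical lookup table standing in for the manual city/county search.
county_by_area_code = {"A01": "County X", "B02": "County Y", "C03": "County X"}

# Total outages per area code, then the N largest (the article uses the top 10).
totals = df.groupby("area_code")["outages"].sum()
top = totals.nlargest(2)

# Count how often each county appears among the top area codes.
top_counties = top.index.map(county_by_area_code)
county_counts = pd.Series(top_counties).value_counts()

print(top)
print(county_counts)
```

Counties that appear repeatedly among the top area codes are the candidates for new facilities.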

To gain further insight into which areas are best for building power facilities, we used statistical methods to measure the correlation between certain weather conditions and power outages. We began by grouping power outages by weather condition and found that “Rainy, Partially cloudy” was the condition associated with the most power outages. The correlation coefficient between power outages and the condition “Rainy, Partially cloudy” is 0.06. This is a weak correlation, but it is positive. We then grouped and sorted the top 5 area codes to find the count of “Rainy, Partially cloudy” days for each. Among the top 5 area codes, the counties with the highest “Rainy, Partially cloudy” count were Baltimore County and Anne Arundel County. Because the correlation between power outages and rainy, cloudy weather is positive, even if weak, it would be strategic for power companies to focus on building power facilities in areas where it is commonly cloudy with precipitation. Constructing facilities in such areas would be effective for gaining business and customers.
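One way to compute a correlation between outage counts and a single weather condition is to turn the condition into a 0/1 indicator and take the Pearson correlation. The article does not state exactly how the 0.06 figure was calculated, so this is a sketch of one plausible method. The data is invented and much more strongly correlated than the real dataset.

```python
import pandas as pd

# Toy data; the real dataset has 1,078 rows and a much weaker relationship.
df = pd.DataFrame(
    {
        "outages": [6, 1, 5, 2, 7, 0],
        "conditions": [
            "Rainy, Partially cloudy",
            "Clear",
            "Rainy, Partially cloudy",
            "Clear",
            "Rainy, Partially cloudy",
            "Partially cloudy",
        ],
    }
)

# Total outages per condition, largest first.
by_condition = df.groupby("conditions")["outages"].sum().sort_values(ascending=False)

# 0/1 indicator for the condition of interest, then Pearson correlation with outages.
is_rainy = (df["conditions"] == "Rainy, Partially cloudy").astype(int)
r = df["outages"].corr(is_rainy)

print(by_condition)
print("correlation with 'Rainy, Partially cloudy':", round(r, 3))
```

A condition can top the grouped totals simply because it occurs on many days, which is why the correlation coefficient is worth checking alongside the raw counts.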

In addition to finding the correlation between power outages and “Rainy, Partially cloudy,” we used the same statistical method to find the correlation between power outages and maximum temperature, wind speed, wind gust, and precipitation. The highest correlation was between outages and wind gusts, with a coefficient of 0.08. Precipitation was second, with a coefficient of 0.06, followed by maximum temperature at 0.03. The lowest correlation was between outages and wind speed, with a coefficient of 0.004. All four coefficients are small, so none of these features is a strong linear predictor of outages on its own. Still, because wind gusts had the highest coefficient, power companies should consider targeting areas in Maryland that tend to have strong wind gusts. The other three coefficients are also positive but smaller, so it remains important to target areas with higher counts of power outages.
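A ranking of features by their correlation with outages can be produced in one step with `DataFrame.corrwith`. The sketch below uses made-up numbers and hypothetical column names, and the toy wind gust column is deliberately constructed to be perfectly linear in the outage count so the ordering is easy to verify.

```python
import pandas as pd

# Toy data; wind_gust_max is built as exactly 10 * outages for illustration.
df = pd.DataFrame(
    {
        "outages": [1, 2, 3, 4, 5],
        "temp_max": [30.0, 28.0, 31.0, 27.0, 29.0],
        "wind_speed_max": [5.0, 5.0, 6.0, 4.0, 7.0],
        "wind_gust_max": [10.0, 20.0, 30.0, 40.0, 50.0],
        "precipitation": [0.0, 1.0, 0.0, 2.0, 1.0],
    }
)

features = ["temp_max", "wind_speed_max", "wind_gust_max", "precipitation"]

# Pearson correlation of each feature column with the outage counts, highest first.
corr = df[features].corrwith(df["outages"]).sort_values(ascending=False)
print(corr)
```

For a sense of scale, a coefficient of 0.08 squares to 0.0064, so a straight-line fit on that feature alone would account for less than 1% of the variation in outages.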

Although the time frame that data was collected from is limited, we found that power outages peaked on September 9th, 2023. Following this peak, the number of outages declined but remained higher than the number recorded before September 5th, 2023. In the future, it would be important for power companies to be prepared around early to mid-September for an increase in demand for supplies, such as backup power generators.

To better visualize the relationship between outages and wind speed, wind gusts, and precipitation, we constructed scatter plots. None of the three plots displayed a strong relationship between power outages and the selected feature. The points did not concentrate around a clear center, the data did not appear to be skewed in any direction, and there were no apparent extreme outliers. Overall, the scatter plots showed no strong linear relationship between power outages and wind speed, wind gusts, or precipitation, which aligns with the correlation coefficients calculated earlier. Even so, it is still important for power companies to focus on the features that have the highest correlation with power outages and to prioritize investment in areas where those features are prevalent.
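The daily peak noted above can be located by summing outages per date and taking the date with the largest total. The rows below are invented solely to show the mechanics, and the column names are assumptions.

```python
import pandas as pd

# Toy rows: two placeholder area codes over three days (values are invented).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-09-08", "2023-09-08", "2023-09-09", "2023-09-09", "2023-09-10", "2023-09-10"]
        ),
        "area_code": ["A01", "B02", "A01", "B02", "A01", "B02"],
        "outages": [2, 1, 9, 7, 5, 3],
    }
)

# Statewide total per day, then the date on which that total is largest.
daily = df.groupby("date")["outages"].sum()
peak_date = daily.idxmax()

print(daily)
print("peak:", peak_date.date(), "with", int(daily.max()), "outages")
```

With only a few days of data, a peak like this describes a single event and does not establish a seasonal pattern.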

Overall, the four areas in Maryland where power companies should focus on building power facilities are Baltimore County, Montgomery County, Anne Arundel County, and Washington County. These four counties were found to have the highest number of power outages. They were also found to have the highest count for the condition “Rainy, Partially cloudy,” the weather condition most associated with power outages. In addition to focusing on these areas for business and construction of facilities, power companies should consider operating in areas with strong wind gusts, since wind gusts had the highest positive correlation with power outages among the features we examined. Prioritizing business and investment in these areas will increase power companies’ chances of success and profitability.

The cornerstone of our project rested on utilizing K-means clustering to discern patterns and trends within power outages in Maryland. To improve utility companies’ preparedness and response to outages, we used K-means clustering to categorize regions based on shared characteristics, allowing them to allocate resources and improve infrastructure accordingly.

Our primary objective was to unveil trends in power outages, focusing on frequency, duration, and geographical hotspots. K-means clustering enabled us to identify distinct clusters of outages sharing similar characteristics, aiding utility companies in strategic decision-making. By employing geographic assessments, we pinpointed vulnerable regions susceptible to outages, particularly those with exposure to extreme weather conditions. Leveraging the dataset, we provided utility companies with actionable insights, enabling them to make informed decisions on infrastructure investment, resource allocation, and emergency response planning. Our methodology, which blends historical outage data with historical weather data, equips Maryland’s utility sector with a dynamic approach to mitigating and responding to power outages effectively.
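A minimal K-means sketch using scikit-learn is shown below. The article does not specify which features were clustered or how many clusters were used, so the two features here (total outages and average maximum wind gust per area code), the toy values, and the choice of two clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per hypothetical area code: [total outages, mean max wind gust].
X = np.array(
    [
        [2, 10.0],
        [3, 12.0],
        [1, 11.0],
        [40, 55.0],
        [38, 60.0],
        [42, 58.0],
    ]
)

# Standardize so that neither feature dominates the distance calculation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fixed random_state makes the run repeatable; n_init restarts guard against poor initial centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Cluster centers converted back to the original units for interpretation.
centers = scaler.inverse_transform(kmeans.cluster_centers_)

print("labels:", labels)
print("centers:", centers)
```

In practice the number of clusters would be chosen with a diagnostic such as an elbow plot or silhouette scores. The cluster label numbers are arbitrary, so only the grouping itself carries meaning.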

Despite the insightful outcomes, our project does acknowledge certain limitations. The confined date range restricts the ability to capture long-term trends and seasonal variations comprehensively. The analysis also overlooks specific weather parameters that might contribute to outages, and the use of historical data may not fully encapsulate evolving weather patterns and technological advancements in the power grid. Ethically, our findings underscore the need for equitable solutions, recognizing the diverse needs of different communities. As we lay the groundwork for a resilient power infrastructure, our project serves as a catalyst for future research. Expanding temporal and contextual scope, integrating real-time data, and adopting predictive analytics will be crucial in navigating the dynamic landscapes of weather and technology.

Conclusion

Given the findings of our analysis, it is also crucial to reflect on its limitations and ethical considerations. Our study set out to provide actionable insights for utility companies in Maryland, aiming to improve the resilience of the power grid against outages. However, our analysis encountered certain limitations that are essential to a complete understanding of our findings.

One of the primary limitations of our project is the limited date range of our data. Our dataset encompassed power outages in Maryland from September 10, 2023, to September 14, 2023. While this timeframe provided us with valuable snapshots of power outage incidents, it inherently constrained the scope of our analysis. The short duration of the dataset limited our ability to observe long-term trends and seasonal variations in power outages, which are crucial for comprehensive planning and resource allocation by utility companies. For instance, the data may not fully capture the variability of weather conditions across different seasons, which can significantly impact the frequency and severity of power outages.

Another limitation is the restricted details regarding weather conditions. While our dataset included essential weather parameters such as temperature, wind speeds, and precipitation, it lacked depth in capturing the full spectrum of weather-related factors that can influence power outages. Factors like lightning strikes, sea level pressure, wind direction, humidity levels, and localized weather phenomena were not accounted for in our dataset. This limitation means that our analysis might not fully encapsulate the complexity of weather-related impacts on power infrastructure.

Furthermore, our project’s reliance on historical data, while beneficial for understanding past trends, does not account for the evolving nature of weather patterns and power grid technologies. This is a common limitation when analyzing weather trends. Climate change, for instance, is leading to more frequent, severe, and unpredictable weather events, which our historical dataset may not adequately represent. Similarly, advancements in power grid technology and changes in infrastructure over time could alter the dynamics of power outages, making our current findings less applicable in the future.

Ethically, our analysis must be contextualized within the broader societal and environmental implications. While our goal is to aid utility companies in improving power grid reliability, it is important to recognize the diverse needs of different communities, especially those in vulnerable socio-economic regions. Our findings should be employed in a manner that promotes equitable access to reliable power, avoiding any unintentional exacerbation of existing inequalities.

In conclusion, our analysis, despite its limitations, provides a foundational understanding of the interplay between weather conditions and power outages in Maryland. It highlights the need for utility companies to adopt a dynamic and multifaceted approach in their planning and response strategies. Future research should aim to expand the temporal and contextual scope of data, incorporating longer timeframes and more detailed weather parameters. Additionally, there should be an emphasis on integrating real-time data and predictive analytics to adapt to the rapidly changing environmental and technological landscapes. Our project serves as a stepping stone towards a more resilient and equitable power infrastructure in Maryland, ensuring that residents and businesses can weather future storms with minimal disruption.

You can find the code for this analysis here: https://github.com/RichmondYeboah/INST414_Final/blob/master/INST414_Final_Project.ipynb
