A Case Study Solution — Data Analyst Interview Elimination Round -1.

Data Analysis Project — Power Plants Globally

Kamireddy Mahendra
ILLUMINATION

--

“Energy can neither be created nor destroyed, It can only transformed from one form of energy to another form of energy”
— Law of Conservation of Energy Principle

“I wish we could understand this principle and ensure that strictly following the above principle in our life, will lead us to succeed in our life”.

Here I am doing the project of power Plants global data analysis to do ad-hoc analysis and find solutions for different Problems.

These are the steps I followed to do the data analysis.

  1. Gathered data from one of the open-source data Platforms i.e. Kaggle.
    Click Here to get the data set.

2. Analysed the data and Started Cleaning the data by removing the unnecessary columns and changing the column names as those will
gives more meaningful conventions for data analysis using Excel.

3. Assume you were a stakeholder and ask yourself as many questions to improve the plant outcomes and power utilization and efficiencies and many more questions that should be aiding the total business.

4. Import the data into any analytical tools like SQL DBs like MySQL, MSSQL Server, Postgres, or any visualization tools like Power BI Tableau, QlikView, or Looker. In this project, I am using SQL Server and Power BI.

These are the questions that I asked myself.

Mean Life Studies by author Kamireddy Mahendra.

1. Identify the top 10 countries with the highest total estimated generation from 2013 to 2017. Include the total estimated generation for each country.

select top 10
country_code,
country_name,
round(sum(estimated_generation_gwh_2013 +
estimated_generation_gwh_2014 +
estimated_generation_gwh_2015 +
estimated_generation_gwh_2016 +
estimated_generation_gwh_2017),2) as total_generation
from ['Powerplants_Globally']
group by country_code, country_name
order by 3 desc
Image designed by author Kamireddy Mahendra
Mean-Life-Studies by author Kamireddy Mahendra.

2. Find the plant with the highest capacity in each country. Include the plant name, capacity, and the corresponding country.

with cte as (
select
country_code,country_name,plant_name, capacity_mw,
row_number() over(partition by country_code order by capacity_mw desc) as plant_rank
from ['Powerplants_Globally'])
select country_code, country_name, plant_name, capacity_mw
from cte
where plant_rank = 1 order by 4 desc
Image designed by author Kamireddy Mahendra

https://youtu.be/-om6WXXqaUE

3. Calculate the average capacity of each type of plant and rank them in descending order.

select
type_of_plant,
round(avg(capacity_mw),3) as avg_capacity_mw , rank() over( order by avg(capacity_mw) desc) as rank
from ['Powerplants_Globally']
group by Type_of_Plant
Image designed by author Kamireddy Mahendra

4. Identify the owners who have plants in more than one country. Include the owner’s name and the count of unique countries they operate in.

select 
owner,
count(distinct country_code) as num_countries
from ['Powerplants_Globally']
group by owner
having count(distinct country_code) > 1 and owner is not null order by 2 desc
Image designed by author Kamireddy Mahendra

In addition to this, we can also find those countries that have the same owner using the STRING_AGG function.

SELECT
owner,
COUNT(DISTINCT country_code) AS no_of_countries,
STRING_AGG (DISTINCT country_name) AS corresponding_countries
FROM ['Powerplants_Globally']
WHERE owner IS NOT NULL
GROUP BY owner
HAVING COUNT(DISTINCT country_code) > 1
ORDER BY no_of_countries DESC;
Image designed by author Kamireddy Mahendra

5. find the year with the highest total estimated generation across all plants. The plant should be commissioned and return the commissioning year and the corresponding total estimated generation.

Select top 1
commissioning_year,
sum (estimated_generation_gwh_2013 + estimated_generation_gwh_2014 + estimated_generation_gwh_2015 +
estimated_generation_gwh_2016 + estimated_generation_gwh_2017) as total_generation
from ['Powerplants_Globally'] where commissioning_year is not null
group by commissioning_year
order by total_generation desc
Image designed by author Kamireddy Mahendra

6. Identify the top three plant owners with the highest total estimated generation from 2013 to 2017. Include the total estimated generation for each owner.

select top 3
owner,
sum(estimated_generation_gwh_2013 + estimated_generation_gwh_2014 + estimated_generation_gwh_2015 +
estimated_generation_gwh_2016 + estimated_generation_gwh_2017) as total_generation
FROM ['Powerplants_Globally'] where owner is not null
group by owner
order by total_generation desc
Image designed by author Kamireddy Mahendra

7. Calculate the year-over-year growth rate of total estimated generation for each country. return country code, country name, commissioning_year, and the average growth rate.

with cte as (
select
country_code,country_name,commissioning_year,
estimated_generation_gwh_2013,
lag(estimated_generation_gwh_2013) over (partition by country_code order by commissioning_year)
as prev_year_generation
from ['Powerplants_Globally']
)
select
country_code,country_name,commissioning_year,
round(avg((estimated_generation_gwh_2013 - prev_year_generation) / prev_year_generation),2) as avg_growth_rate
from cte where commissioning_year is not null
group by country_code, country_name,
commissioning_year order by avg_growth_rate desc;
Image designed by author Kamireddy Mahendra

8. Find the plant with the highest capacity in each type of plant. Include the plant name, capacity, and the corresponding plant type.

WITH cte AS (
SELECT
Type_of_Plant,
Plant_name,
capacity_mw,
ROW_NUMBER() OVER (PARTITION BY Type_of_Plant ORDER BY capacity_mw DESC) AS plant_rank
FROM ['Powerplants_Globally']
)
SELECT
Type_of_Plant,
Plant_name,
capacity_mw
FROM cte
WHERE plant_rank = 1 order by 3 desc
Image designed by author Kamireddy Mahendra

9. Identify the top 5 types of plant (e.g., solar, wind) with the highest total estimated generation. Include the source, type of plant, and the total estimated generation.

solution:

select top 5
source, Type_of_Plant,
sum(estimated_generation_gwh_2013 + estimated_generation_gwh_2014 + estimated_generation_gwh_2015 +
estimated_generation_gwh_2016 + estimated_generation_gwh_2017) as total_generation
from ['Powerplants_Globally']
group by source, Type_of_Plant
order by total_generation desc
Image designed by author Kamireddy Mahendra

5. Finally find out the solutions to all problems as mentioned above.

Image designed by author Kamireddy Mahendra

I found many insights regarding power generation along with different types of sources and types of plants in multiple aspects.

You can see my Github profile to access more projects. Don’t forget to follow my Github to access all projects and to be in touch with upcoming projects as well.

SQL Use Case — Data Engineer Interview.

The Most Commonly Asking Big Data(Apache Spark) Concepts in Data Engineer Interviews.

Python Coding Questions for Data Engineer Interview Part-I (Easy Level)

I hope you will Bring your hands together to create a resounding clap, to show your support and encouragement for me to share even more valuable content in the future.

Follow me and subscribe to catch any updates from me instantly.

Thank you :)

--

--

Kamireddy Mahendra
ILLUMINATION

Data Engineer - Analyst (Upcoming Data Scientist), Content Writer & Freelancer (Projects) + Freelance Tutor / Trainer.