Sea Level Predictor Using Python
Question:)
You will analyze a dataset of the global average sea level change since 1880. You will use the data to predict the sea level change through the year 2050.
Use the data to complete the following tasks:
- Use Pandas to import the data from epa-sea-level.csv.
- Use matplotlib to create a scatter plot using the Year column as the x-axis and the CSIRO Adjusted Sea Level column as the y-axis.
- Use the linregress function from scipy.stats to get the slope and y-intercept of the line of best fit. Plot the line of best fit over the top of the scatter plot. Make the line go through the year 2050 to predict the sea level rise in 2050.
- Plot a new line of best fit just using the data from year 2000 through the most recent year in the dataset. Make the line also go through the year 2050 to predict the sea level rise in 2050 if the rate of rise continues as it has since the year 2000.
- The x label should be Year, the y label should be Sea Level (inches), and the title should be Rise in Sea Level.
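To make the regression step concrete, here is a minimal sketch of how linregress yields a slope and y-intercept that can be extended to 2050. The data is made up for illustration (a perfectly linear rise of 0.1 inches per year), and the helper name predict_level is hypothetical; neither comes from the real dataset or the project boilerplate.

```python
import numpy as np
from scipy.stats import linregress

# Made-up, perfectly linear data: 0.1 inches of rise per year since 1880.
# This is NOT the EPA dataset; it only illustrates the mechanics.
years = np.arange(1880, 2014)
levels = 0.1 * (years - 1880)

# linregress returns a result object exposing slope and intercept.
fit = linregress(years, levels)


def predict_level(year, fit_result=fit):
    """Evaluate the fitted line y = slope * x + intercept at a given year."""
    return fit_result.slope * year + fit_result.intercept


# Extending the line past the last data point is how the 2050 value is read off.
prediction_2050 = predict_level(2050)
print(f"slope={fit.slope:.3f}, predicted 2050 level={prediction_2050:.2f}")
```

With real measurements the points will scatter around the line, so the 2050 value is an extrapolation of the fitted trend, not a guaranteed outcome.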
Unit tests are written for you under test_module.py.
The boilerplate also includes commands to save and return the image.
Development
For development, you can use main.py to test your functions. Click the "run" button and main.py will run.
Testing
We imported the tests from test_module.py to main.py for your convenience. The tests will run automatically whenever you hit the "run" button.
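The contents of test_module.py are not shown here, so the following is only a sketch of the kind of check such tests might perform, based on the label and title requirements listed in the task. The functions make_example_axes and labels_match are hypothetical names, and the three data points are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is opened
import matplotlib.pyplot as plt


def make_example_axes():
    """Build a tiny plot with the labels and title the task asks for."""
    fig, ax = plt.subplots()
    ax.scatter([1880, 1881, 1882], [0.0, 0.2, -0.4])  # invented points
    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    ax.set_title("Rise in Sea Level")
    return ax


def labels_match(ax):
    """Return True if the axes carry the required x label, y label, and title."""
    return (
        ax.get_xlabel() == "Year"
        and ax.get_ylabel() == "Sea Level (inches)"
        and ax.get_title() == "Rise in Sea Level"
    )


print(labels_match(make_example_axes()))
```

Checks like this only work if the plotting function hands back an object the test can inspect, which is presumably why the boilerplate returns the image.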
Answer:)
Below is an implementation of the tasks above using Pandas, Matplotlib, and SciPy. You can use this as a starting point and adapt it to your specific needs:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress


def draw_plot():
    # Import the data
    df = pd.read_csv('epa-sea-level.csv')

    # Scatter plot
    plt.figure(figsize=(10, 6))
    plt.scatter(df['Year'], df['CSIRO Adjusted Sea Level'], label='Original Data')

    # Linear regression for all data, extended through 2050
    slope, intercept, r_value, p_value, std_err = linregress(df['Year'], df['CSIRO Adjusted Sea Level'])
    x_pred = list(range(1880, 2051))
    y_pred = [slope * x + intercept for x in x_pred]
    plt.plot(x_pred, y_pred, label=f'Original Fit (R-squared: {round(r_value**2, 2)})')

    # Linear regression for data from year 2000 onwards
    df_recent = df[df['Year'] >= 2000]
    slope_recent, intercept_recent, _, _, _ = linregress(df_recent['Year'], df_recent['CSIRO Adjusted Sea Level'])
    # This fit only describes the data from 2000 on, so draw it from 2000 through 2050
    x_pred_recent = list(range(2000, 2051))
    y_pred_recent = [slope_recent * x + intercept_recent for x in x_pred_recent]
    plt.plot(x_pred_recent, y_pred_recent, label='Recent Fit')

    # Set labels and title
    plt.xlabel('Year')
    plt.ylabel('Sea Level (inches)')
    plt.title('Rise in Sea Level')
    plt.legend()

    # Save and show the plot
    plt.savefig('sea_level_plot.png')
    plt.show()


# Uncomment the following line when running in a local environment
# draw_plot()
Make sure to replace 'epa-sea-level.csv' with the actual path to your dataset. This code defines a function draw_plot() that reads the dataset, creates a scatter plot, and overlays two lines of best fit using linear regression. You can run this function in your main.py file or test it individually to check if it produces the expected results.
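To see why the two fitted lines can give different 2050 predictions, here is a small sketch on made-up data in which the rate of rise increases after 2000. The numbers are invented for illustration and the helper predict_2050 is a hypothetical name; only the column names and the Year >= 2000 filter mirror the code above.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Invented data: 0.05 in/yr before 2000, then 0.15 in/yr from 2000 onwards.
years = np.arange(1880, 2014)
levels = np.where(
    years < 2000,
    0.05 * (years - 1880),
    6.0 + 0.15 * (years - 2000),
)
df = pd.DataFrame({"Year": years, "CSIRO Adjusted Sea Level": levels})


def predict_2050(frame):
    """Fit a line to the given rows and evaluate it at the year 2050."""
    fit = linregress(frame["Year"], frame["CSIRO Adjusted Sea Level"])
    return fit.slope * 2050 + fit.intercept


overall_2050 = predict_2050(df)                      # fit on every row
recent_2050 = predict_2050(df[df["Year"] >= 2000])   # fit on rows from 2000 on

print(f"all data: {overall_2050:.2f} in, since 2000: {recent_2050:.2f} in")
```

In this invented example the recent fit predicts a higher 2050 level because it is not averaged with the slower rise before 2000. Whether the real dataset behaves the same way is something the plot itself will show.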