Data Science Project: Solar Panel, Chapter I: Electricity Generation Prediction With ARIMA Model

Alparslan Mesri
May 22, 2022

This article is written by Alparslan Mesri and Madhujith Arumugam

The data is available on Kaggle [4].

In this article, we first examine solar panel energy data. Then we make predictions with an ARIMA model and forecast power generation for the next few days.

First, we import the required libraries.

Next, we load the CSV files into variables. In this dataset, the time column is not parsed as a datetime type, so we convert it. We also drop the Plant_ID column.
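A minimal sketch of this loading step with pandas is below. It assumes the column names `DATE_TIME` and `PLANT_ID` and a day-first timestamp format such as "15-05-2020 00:15"; check both against the actual file from [4] (for example `Plant_1_Generation_Data.csv`). The helper name `load_generation_data` and the two sample rows are hypothetical.

```python
import io

import pandas as pd


def load_generation_data(source, date_format="%d-%m-%Y %H:%M"):
    """Read a generation CSV, parse its timestamps and drop the plant ID.

    Assumptions: the columns are named DATE_TIME and PLANT_ID, and the
    timestamps are day-first strings such as "15-05-2020 00:15".
    """
    df = pd.read_csv(source)
    # read_csv leaves DATE_TIME as plain strings; convert to datetime64.
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format=date_format)
    # Assumed constant within one plant's file, so it adds no information.
    return df.drop(columns=["PLANT_ID"])


# Two hypothetical rows standing in for the real file; with the real data
# you would pass a path, e.g. "Plant_1_Generation_Data.csv".
sample_csv = io.StringIO(
    "DATE_TIME,PLANT_ID,SOURCE_KEY,DAILY_YIELD\n"
    "15-05-2020 00:00,1,inverter_a,0.0\n"
    "15-05-2020 00:15,1,inverter_a,0.0\n"
)
generation = load_generation_data(sample_csv)
print(generation.dtypes)
```

Passing an explicit `format` makes parsing fail loudly if the file uses a different layout, instead of silently guessing day and month order.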

Next, we made a copy of the data, grouped the daily_yield column by the date_time feature, and assigned a new index.
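One way to express this grouping in pandas is sketched below. The article does not say which aggregation was used, so summing `DAILY_YIELD` across inverters at each timestamp is an assumption, as are the upper-case column names and the hypothetical helper `group_daily_yield`. `reset_index()` supplies the new 0..n-1 index and turns `DATE_TIME` back into a column.

```python
import pandas as pd


def group_daily_yield(df):
    """Total DAILY_YIELD over all inverters at each timestamp.

    Summing is an assumption; the article does not name the aggregation.
    """
    grouped = (
        df.copy()
        .groupby("DATE_TIME")["DAILY_YIELD"]
        .sum()
        .reset_index()  # DATE_TIME becomes a column again; index is 0..n-1
    )
    return grouped


# Hypothetical readings from two inverters at two timestamps.
readings = pd.DataFrame(
    {
        "DATE_TIME": pd.to_datetime(
            ["2020-05-15 06:00", "2020-05-15 06:00",
             "2020-05-15 06:15", "2020-05-15 06:15"]
        ),
        "SOURCE_KEY": ["inverter_a", "inverter_b"] * 2,
        "DAILY_YIELD": [1.0, 2.0, 3.0, 4.0],
    }
)
daily = group_daily_yield(readings)
print(daily)
```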

Our grouped data frame now looks like this:

Before running the ARIMA model, we run the Augmented Dickey-Fuller (ADF) test to check whether the series is stationary. The null hypothesis is that the series has a unit root (it is non-stationary); the alternative hypothesis is that it has no unit root (it is stationary).

If the p-value is below 5% (0.05), there is strong evidence against the null hypothesis, so we reject it and treat the series as stationary. If it is above 5%, the evidence against the null hypothesis is weak, so we fail to reject it and treat the series as non-stationary.
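The decision rule can be written as a small helper. With statsmodels, the p-value is the second element of the tuple returned by `statsmodels.tsa.stattools.adfuller`; the function below only interprets that number and is a hypothetical sketch, not part of any library.

```python
def interpret_adf(p_value, alpha=0.05):
    """Translate an ADF p-value into a stationarity verdict.

    H0: the series has a unit root (non-stationary).
    A p-value below alpha is strong evidence against H0.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0: evidence the series is stationary"
    return "fail to reject H0: treat the series as non-stationary"


print(interpret_adf(0.01))
print(interpret_adf(0.40))
```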

From the above, we can conclude that the data is non-stationary. Hence, we need differencing, the "integrated" part of ARIMA denoted by the parameter ‘d’, to make the data stationary when building the model with auto_arima.
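To see why differencing helps, here is a plain-Python sketch of the lag difference y[t] - y[t - lag]. With lag 1 it is the first difference that `d = 1` applies; pandas offers the same operation as `Series.diff()`. The helper name `difference` is hypothetical.

```python
def difference(values, lag=1):
    """Return the lag-`lag` difference y[t] - y[t - lag].

    lag=1 is the ordinary first difference behind ARIMA's d=1.
    The output is `lag` elements shorter than the input.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# A series with a steady upward trend is non-stationary in its mean...
trend = [10, 12, 14, 16, 18]
# ...but its first difference is constant.
print(difference(trend))  # [2, 2, 2, 2]
```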

Next, we split the data into training and test sets, and plot both to see them on a graph.
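For time series the split should keep chronological order rather than shuffle rows, so the test set lies strictly after the training set. A minimal pandas sketch follows; the test-set size the authors used is not stated, so `test_size=4` on a toy series is purely illustrative, and `chronological_split` is a hypothetical helper.

```python
import pandas as pd


def chronological_split(series, test_size):
    """Split a time-ordered series into train and test without shuffling.

    The last `test_size` observations become the test set, so the model
    is always evaluated on data that comes after what it was trained on.
    """
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series.iloc[:-test_size], series.iloc[-test_size:]


# Hypothetical 15-minute series of 10 points; hold out the last 4.
index = pd.date_range("2020-05-15", periods=10, freq="15min")
series = pd.Series(range(10), index=index, dtype=float)
train, test = chronological_split(series, test_size=4)
print(len(train), len(test))  # 6 4
```

Both pieces can then be drawn on one chart, for example with pandas' `Series.plot` (which requires matplotlib).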

After that, we tune the auto_arima parameters. We set `m` to 96 because observations are recorded every 15 minutes: there are 4 observations per hour, so 4 × 24 = 96 observations per day, and the seasonal period is one day.
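The arithmetic behind `m` can be checked with the standard library. The hypothetical helper below divides the seasonal period by the sampling interval and refuses periods that are not a whole number of steps.

```python
from datetime import timedelta


def observations_per_period(interval, period=timedelta(days=1)):
    """Number of evenly spaced observations in one seasonal period."""
    count, remainder = divmod(period, interval)
    if remainder:
        raise ValueError("period must be a whole multiple of interval")
    return count


m = observations_per_period(timedelta(minutes=15))
print(m)  # 96
print(observations_per_period(timedelta(minutes=15), timedelta(hours=1)))  # 4
```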

We set the `d` parameter to 1, which applies one round of first-order differencing to make the data stationary.
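The settings described above (m = 96, d = 1) could be gathered as keyword arguments for `pmdarima.auto_arima` [3]. This is a sketch, not the authors' exact call: `D = 1` is an assumption chosen to match the seasonal order of the model reported below, and the remaining options are common choices rather than values taken from the article.

```python
# Keyword arguments for pmdarima.auto_arima. Only m and d come from the
# text; everything else is an assumption.
AUTO_ARIMA_KWARGS = {
    "m": 96,              # seasonal period: 96 fifteen-minute steps per day
    "seasonal": True,     # also search the seasonal (P, D, Q) orders
    "d": 1,               # first-order differencing, as chosen in the text
    "D": 1,               # assumed: one seasonal difference at lag m
    "information_criterion": "aic",  # rank candidate models by AIC
    "stepwise": True,     # stepwise search instead of an exhaustive grid
    "trace": True,        # print each candidate model and its AIC
    "error_action": "ignore",  # skip candidate orders that fail to fit
    "suppress_warnings": True,
}

# With pmdarima installed, the search would be run roughly as:
#   import pmdarima as pm
#   model = pm.auto_arima(train, **AUTO_ARIMA_KWARGS)
print(sorted(AUTO_ARIMA_KWARGS))
```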

`P` is the order of the seasonal autoregressive (AR) component.

`D` is the order of seasonal differencing (the integration order of the seasonal process).

`Q` is the order of the seasonal moving-average (MA) component.
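For reference, this is how the seasonal orders enter a seasonal ARIMA(p,d,q)(P,D,Q)_m model. It is the standard textbook form in backshift notation, where $B y_t = y_{t-1}$, written without a constant term; pmdarima's internal parameterization may differ in details such as the intercept.

$$
\phi(B)\,\Phi(B^{m})\,(1-B)^{d}\,(1-B^{m})^{D}\,y_t = \theta(B)\,\Theta(B^{m})\,\varepsilon_t
$$

$$
\phi(B) = 1-\sum_{i=1}^{p}\phi_i B^{i},\quad
\Phi(B^{m}) = 1-\sum_{i=1}^{P}\Phi_i B^{im},\quad
\theta(B) = 1+\sum_{j=1}^{q}\theta_j B^{j},\quad
\Theta(B^{m}) = 1+\sum_{j=1}^{Q}\Theta_j B^{jm}
$$

The seasonal terms act only on lags that are multiples of $m$: `P` and `Q` count the seasonal AR and MA coefficients, and `D` counts how many seasonal differences $(1-B^{m})$ are applied.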

auto_arima searched over candidate parameters and selected ARIMA(4,1,0)(0,1,1)[96] as the best model. In the search results, this model has the lowest AIC (1527).
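The selection criterion itself is simple. The Akaike information criterion is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood; lower values are better, and the search keeps the candidate with the smallest AIC. The helpers below are a hypothetical sketch of that comparison, not pmdarima internals.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the (name, aic) pair with the smallest AIC from a dict."""
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


print(aic(log_likelihood=-10.0, n_params=3))  # 26.0
```

The 2k term penalizes extra parameters, so a larger model is chosen only when it improves the likelihood enough to pay for its added complexity.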

For prediction, we create future_dates, the timestamps of the days we want to forecast.
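A sketch of building such timestamps with `pd.date_range`, continuing at the data's 15-minute spacing. The last timestamp and the two-day horizon (2 × 96 steps) are hypothetical, and `make_future_dates` is an illustrative helper. With a fitted pmdarima model, the values from `model.predict(n_periods=len(future_dates))` could then be paired with these dates.

```python
import pandas as pd


def make_future_dates(last_timestamp, periods, freq="15min"):
    """Timestamps that follow `last_timestamp` at a fixed frequency."""
    step = pd.Timedelta(freq)
    return pd.date_range(start=pd.Timestamp(last_timestamp) + step,
                         periods=periods, freq=freq)


# Hypothetical last observation; forecast two days of 96 steps each.
future_dates = make_future_dates("2020-06-17 23:45", periods=2 * 96)
print(future_dates[0], future_dates[-1])
```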

Now let's look at the results. Visualizing them makes the comparison easier to understand.

It seems ARIMA predicted the test period quite well: in the plot, the predicted and test values are very similar. The model also produced a forecast for the next few days.
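The visual impression can be backed by error metrics computed on the test period, where the true values are known (the future forecast has no ground truth yet). The sketch below computes mean absolute error and root mean squared error in plain Python; the sample numbers are hypothetical, not the article's results.

```python
import math


def forecast_errors(actual, predicted):
    """Mean absolute error and root mean squared error of a forecast."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return mae, rmse


# Hypothetical values, not the article's results.
mae, rmse = forecast_errors([10.0, 12.0, 14.0], [11.0, 12.0, 12.0])
print(mae, rmse)
```

RMSE weights large misses more heavily than MAE, so reporting both shows whether the error is spread evenly or driven by a few bad intervals.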

References:

[1]: https://www.kaggle.com/code/virosky/how-to-manage-a-solar-power-plant#Task-2:-Forecast

[2]: https://ademos.people.uic.edu/Chapter23.html#:~:text=ARIMA%20models%20are%20typically%20expressed,growth%2Fdecline%20in%20our%20data

[3]: https://pypi.org/project/pmdarima/

[4]: https://www.kaggle.com/datasets/anikannal/solar-power-generation-data?resource=download&select=Plant_1_Generation_Data.csv
