Creating a Model for Weather Forecasting Using Linear Regression

Ashan Lakmal
Oct 9, 2020

Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target value from one or more independent variables. It is mostly used to find relationships between variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and independent variables, and in the number of independent variables they use.

In this article, we create a model that forecasts weather temperature from a few features, mainly humidity, Ppm, and the air quality index (AQI, measured as PM2.5). We searched for a data set that contains all of these features but found none. What we found were two different data sets with the following features:


Data set 1 contains features such as weather temperature and humidity, with AQI (PM2.5) as the target variable. It is very small: just 150 entries.


Data set 2 contains almost 24 features, including Ppm and humidity, with weather temperature as the target variable.


We could not record real-time data because of the pandemic, and no single data set contained all the required features. One possible solution was to combine both data sets into a final data set that contains all the required features. To do so, we needed two separate models. The first is trained on data set 1 to predict the AQI (PM2.5) value; its predictions are embedded into data set 2 to produce the final data set. The second model is then trained on that final data set to forecast temperature.

Program Structure

We created a linear regression model and trained it on data set 1 to predict PM2.5 values. Before training, we plotted a heat map of the correlations between the features and the target variable and found that only temperature and humidity showed some correlation with PM2.5 (the target variable), so the model was trained on these two features. The following figure shows the heat map.

Heat map
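A heat map like this is a picture of the correlation matrix. Here is a minimal sketch of the same check done numerically with pandas on randomly generated stand-in data. It computes the Pearson correlation matrix and keeps only the features whose absolute correlation with the target reaches a threshold. The column names, the helper select_correlated_features, and the 0.3 threshold are hypothetical choices for illustration, not values from our data sets.

```python
import numpy as np
import pandas as pd


def select_correlated_features(df, target, threshold=0.3):
    """Return feature columns whose absolute Pearson correlation with
    `target` is at least `threshold` (hypothetical helper)."""
    corr = df.corr()[target].drop(target)  # correlations with the target only
    return corr[corr.abs() >= threshold].index.tolist()


# Stand-in data: PM2.5 depends on temperature and humidity, not on "noise"
rng = np.random.default_rng(1)
n = 200
temperature = rng.uniform(15, 35, n)
humidity = rng.uniform(30, 90, n)
noise_col = rng.normal(size=n)  # hypothetical unrelated column
pm25 = 2.0 * temperature - 1.0 * humidity + 100 + rng.normal(0, 2, n)

df = pd.DataFrame({"temperature": temperature, "humidity": humidity,
                   "noise": noise_col, "pm25": pm25})

print(df.corr().round(2))  # the matrix a heat map visualises
selected = select_correlated_features(df, "pm25")
print(selected)
```

If seaborn is installed, seaborn.heatmap(df.corr(), annot=True) draws this matrix as a heat map similar to the one in this article.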

We saved this model as a pickle file to use later; pickle is a Python module for serializing objects. Because each run uses a different random train/test split, the accuracy varied a little every time we ran the program, so we saved the model from the run with the highest accuracy to reuse it.
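A minimal sketch of saving a fitted scikit-learn model with pickle and loading it back. The tiny training set and the file name pm25_model.pickle are hypothetical stand-ins, not the project's actual data or file. Note that the Python documentation warns that unpickling untrusted data can execute arbitrary code, so only load pickle files from a trusted source.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up (temperature, humidity) -> PM2.5 samples standing in for data set 1
X = np.array([[20.0, 60.0], [25.0, 55.0], [30.0, 40.0], [35.0, 30.0]])
y = np.array([50.0, 60.0, 75.0, 90.0])
model = LinearRegression().fit(X, y)

# Hypothetical file name, written to the system temp directory
path = os.path.join(tempfile.gettempdir(), "pm25_model.pickle")

# Save the fitted model to disk ...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ... and load it back later, e.g. in another script
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(np.allclose(model.predict(X), loaded.predict(X)))  # True
```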

Gather data

Next, PM2.5 values had to be embedded into data set 2. We took the temperature and humidity columns from data set 2 and passed them to our trained linear regression model to obtain PM2.5 values. In this way, we created a final data set that has all the required features, including Ppm, humidity, and PM2.5. We then trained a second linear regression model on this final data set with temperature as the target variable. As before, we plotted a heat map of the correlations between the features and the target variable to discard unnecessary features.
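The two-stage approach can be sketched as follows with scikit-learn. This is a minimal sketch on randomly generated stand-in data: the column names (temperature, humidity, ppm, pm25), the sizes, and all coefficients are hypothetical and do not come from the real data sets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in for data set 1: temperature and humidity with a measured PM2.5 target
ds1 = pd.DataFrame({"temperature": rng.uniform(15, 35, 150),
                    "humidity": rng.uniform(30, 90, 150)})
ds1["pm25"] = (3.0 * ds1["temperature"] - 0.5 * ds1["humidity"] + 40
               + rng.normal(0, 1, 150))

# Stand-in for data set 2: same two columns plus Ppm, but no PM2.5
ds2 = pd.DataFrame({"temperature": rng.uniform(15, 35, 500),
                    "humidity": rng.uniform(30, 90, 500),
                    "ppm": rng.uniform(350, 600, 500)})

# Stage 1: learn PM2.5 from temperature and humidity on data set 1
stage1 = LinearRegression().fit(ds1[["temperature", "humidity"]], ds1["pm25"])

# Embed predicted PM2.5 values into data set 2 to build the final data set
final = ds2.copy()
final["pm25"] = stage1.predict(final[["temperature", "humidity"]])

# Stage 2: predict temperature from the remaining features
features = ["humidity", "ppm", "pm25"]
stage2 = LinearRegression().fit(final[features], final["temperature"])
print(stage2.score(final[features], final["temperature"]))
```

In this sketch, stage 2 scores an R² of essentially 1.0. Because the embedded PM2.5 column is an exact linear function of temperature and humidity, a linear model can reconstruct temperature from it. If the temperature column fed to the first model is the same one used as the stage-2 target, the same effect can inflate the score on real data, so it is worth checking.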

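When trained, the model gave 93% accuracy, which is quite good. (Strictly, this figure is the R² score, the coefficient of determination, which scikit-learn's score method returns for regression models.) However, the model was trained on sample data rather than real measured data, so it might not predict real-time data very accurately. To overcome this, we must retrain the model on real-time data before relying on it.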
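A minimal sketch of how that regression score is computed: the coefficient of determination, 1 - SS_res / SS_tot, implemented by hand and checked against scikit-learn's LinearRegression.score and sklearn.metrics.r2_score. The data are made up for illustration, and the helper r_squared is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hand-checkable case: SS_res = 1, SS_tot = 2
print(r_squared([1, 2, 3], [1, 2, 4]))  # 0.5

# Made-up regression data
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(0, 1.0, 100)

model = LinearRegression().fit(X, y)
manual = r_squared(y, model.predict(X))
print(manual, model.score(X, y), r2_score(y, model.predict(X)))
```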

Here is the code for training the temperature model.

```python
import numpy as np
import sklearn.model_selection
from sklearn import linear_model

# `data` is the final combined data set as a pandas DataFrame (loaded earlier).
# Columns not used as features; the target column is dropped as well.
# ('Day_Of_Week\n' is kept exactly as in the original column list.)
lis_drop = ['Date2', 'Time3', 'Weather_Temperature6', 'Exterior_Entalpic_120',
            'Exterior_Entalpic_221', 'Exterior_Entalpic_turbo22', 'Day_Of_Week\n',
            'Lighting_Comedor_Sensor11', 'Lighting_Habitacion_Sensor12',
            'Precipitacion13', 'Meteo_Exterior_Crepusculo14']

# Every remaining column is used as a feature
features = [col for col in data.columns if col not in lis_drop]
print(features)

x = np.array(data[features])
y = np.array(data['Weather_Temperature6'])  # target: weather temperature
print(x.shape, y.shape)

# Re-split and re-train until the test score exceeds the threshold
# (the loop never ends if the score never gets above it)
while True:
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.2)
    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)  # R^2 on the test split
    if int(acc * 100) > 94:
        break

# Compare predictions with the actual values on the test split
predictions = linear.predict(x_test)
for i in range(len(predictions)):
    print('PREDICTED WEATHER : ' + str(predictions[i]), '\t', 'ACTUAL WEATHER : ' + str(y_test[i]))
```
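The loop above re-splits until one particular test split scores above the threshold, so the reported score depends partly on which split happened to be drawn. A common alternative is k-fold cross-validation, which reports the score over several splits. Below is a minimal sketch with scikit-learn's KFold and cross_val_score on made-up data; the five folds and the fixed random_state are illustrative choices, not settings from this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Made-up features and target standing in for the final data set
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0, 1.0, 300)

# Five shuffled folds; a fixed random_state makes the split reproducible
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print(scores.round(3))        # R^2 on each held-out fold
print(round(scores.mean(), 3))  # average R^2 across folds
```

Reporting the mean (and spread) of the fold scores gives a steadier estimate than the best score from repeated random splits.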


With this code, we can train a weather-prediction model with approximately 93% accuracy (R²). For higher accuracy, the algorithm could be improved with a neural network, such as an LSTM model built with Keras, which we expect to work better than linear regression.

The Startup

Get smarter at building your thing. Join The Startup’s +793K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +793K followers.
