Improving meteorological and ocean models with Machine Learning

Part 2: Applying deep learning to enhance the visibility variable predicted by the meteorological model

Jorge Robinat
Analytics Vidhya
5 min read · Sep 21, 2019


As we saw in the previous article (part 1), the meteorological model predicts the visibility variable poorly. Here we evaluate that performance and try to improve it with deep learning.

We start by loading a data frame from my GitHub account:

The variables of the data frame are explained in part 0 of this series. Our dependent variable will be “visibility_o,” and the independent variables will be the variables predicted by the model, which carry the suffix “_p.” Therefore:

master_f contains only the variables that we are interested in. We can plot the correlation matrix using the Seaborn library:
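As a minimal sketch of this step, the snippet below builds a small synthetic data frame, keeps “visibility_o” plus every column ending in “_p,” and computes the correlation matrix. The data are random and every column name other than “visibility_o” is hypothetical; only the “_p” suffix convention comes from the article.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's data frame. Column names other than
# "visibility_o" are hypothetical; only the "_p" suffix comes from the text.
rng = np.random.default_rng(0)
n = 200
master = pd.DataFrame({
    "visibility_o": rng.uniform(0, 10000, n),  # observed visibility (m)
    "visibility_p": rng.uniform(0, 10000, n),  # predicted visibility (m)
    "temp_p": rng.normal(288, 5, n),           # hypothetical predicted variable
    "rh_p": rng.uniform(30, 100, n),           # hypothetical predicted variable
    "temp_o": rng.normal(288, 5, n),           # observed variable we do not keep
})

# Keep the observed target plus every model-predicted ("_p") column.
predicted_cols = [c for c in master.columns if c.endswith("_p")]
master_f = master[["visibility_o"] + predicted_cols]

# Pearson correlation matrix; the "visibility_o" column relates each
# predictor to the target.
corr = master_f.corr()
print(corr["visibility_o"].round(2))
```

With Seaborn installed, `seaborn.heatmap(corr, annot=True)` would draw this matrix as an annotated heat map.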

And the results are:

The correlation between observed and predicted visibility is weak. We can assess our meteorological model with the same metrics we would use to assess a machine learning algorithm. It is vital to forecast when visibility falls below a threshold, so we can treat this as a binary classification problem: our goal is to find when visibility is below a given threshold. First, we define the following thresholds: 50, 500, 1000, and 5000 meters, and evaluate the model's capacity to predict them.
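One way to sketch this evaluation is to turn both the observed and the predicted visibility into a yes/no label for each threshold and count the four confusion-matrix cells. The arrays below are synthetic (a noisy copy of the observation stands in for the forecast), and the use of scikit-learn's `confusion_matrix` is an assumption about tooling, not necessarily what the original code did.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic data: the "forecast" is the observation plus noise (an assumption
# made only so the example runs; it is not the article's data).
rng = np.random.default_rng(1)
n = 1000
visibility_o = rng.uniform(0, 10000, n)
visibility_p = np.clip(visibility_o + rng.normal(0, 3000, n), 0, None)

results = {}
for threshold in (50, 500, 1000, 5000):
    # 1 means "visibility below the threshold", 0 means "at or above it".
    y_true = (visibility_o < threshold).astype(int)
    y_pred = (visibility_p < threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    results[threshold] = {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)}

for threshold, cells in results.items():
    print(threshold, cells)
```

`sklearn.metrics.classification_report(y_true, y_pred)` prints precision, recall, F1 and support per class for the same labels.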

The results are:

From the data above we can read several metrics of the meteorological model's performance. For instance, at the 500-meter threshold (visibility less than 500 meters), the support column shows 3034 hours with visibility below 500 meters and 58222 hours with visibility above it. Looking at the confusion matrix, the meteorological model correctly forecast visibility below 500 meters 548 times. It wrongly predicted visibility below 500 meters 2189 times. Also, there are 2486 hours in which the actual visibility was below 500 meters but the forecast missed it.
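The counts quoted above are enough to recompute the usual metrics by hand. The sketch below assumes the reading given in the text (548 hits, 2189 false alarms, 2486 misses) and derives the true negatives from the support of the "above 500 meters" class.

```python
# Counts for the 500 m threshold, as quoted in the text.
tp = 548              # forecast < 500 m and observed < 500 m
fp = 2189             # forecast < 500 m but observed >= 500 m
fn = 2486             # forecast >= 500 m but observed < 500 m
support_pos = 3034    # hours observed below 500 m
support_neg = 58222   # hours observed at or above 500 m
tn = support_neg - fp # derived, not quoted in the text

precision = tp / (tp + fp)   # share of low-visibility forecasts that verified
recall = tp / (tp + fn)      # share of low-visibility hours that were forecast
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (support_pos + support_neg)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

The hits and misses add up to the 3034-hour support, so the figures are internally consistent. Precision comes out near 0.20 and recall near 0.18, while accuracy is about 0.92 only because hours above 500 meters dominate the sample.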

I plot the results as ROC curves and compute the AUC to assess the model for several thresholds (visibility ranges).
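A ROC curve needs a score that grows with the likelihood of the positive class. For the meteorological model, one plausible choice (an assumption here, not confirmed by the article) is the negated predicted visibility, since a lower forecast means "more likely below the threshold." The data are synthetic, and the 50-meter threshold is left out only because the toy sample has too few such cases.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic observed and forecast visibility (not the article's data).
rng = np.random.default_rng(2)
n = 2000
visibility_o = rng.uniform(0, 10000, n)
visibility_p = np.clip(visibility_o + rng.normal(0, 3000, n), 0, None)

# Lower predicted visibility should rank higher, so negate the forecast.
score = -visibility_p

auc_by_threshold = {}
curves = {}
for threshold in (500, 1000, 5000):
    y_true = (visibility_o < threshold).astype(int)
    fpr, tpr, _ = roc_curve(y_true, score)
    curves[threshold] = (fpr, tpr)
    auc_by_threshold[threshold] = roc_auc_score(y_true, score)

for threshold, auc in auc_by_threshold.items():
    print(f"visibility < {threshold} m: AUC = {auc:.3f}")
```

Plotting each `(fpr, tpr)` pair on the same axes gives one ROC curve per visibility range.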

We can build a neural network to enhance the results. First we define the dependent and independent variables: the dependent variable is the observed visibility, and the independent variables are the variables forecast by the model. Let's do it:
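A sketch of this step, under several assumptions: the target is binarized at 500 meters (the threshold used later in the article), the data are synthetic, the column names other than “visibility_o” are hypothetical, and the 80/20 stratified split and standard scaling are common choices rather than the article's documented settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for master_f (hypothetical "_p" column names).
rng = np.random.default_rng(3)
n = 500
master_f = pd.DataFrame({
    "visibility_o": rng.uniform(0, 10000, n),
    "visibility_p": rng.uniform(0, 10000, n),
    "temp_p": rng.normal(288, 5, n),
    "rh_p": rng.uniform(30, 100, n),
})

threshold = 500  # meters
# Dependent variable: 1 when observed visibility is below the threshold.
y = (master_f["visibility_o"] < threshold).astype(int)
# Independent variables: everything the meteorological model predicted.
X = master_f.drop(columns="visibility_o")

# Stratify so the rare low-visibility class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```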

We build a neural network and display the results:
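The original network code is not reproduced here. As a stand-in, the sketch below trains a small feed-forward network with scikit-learn's `MLPClassifier` on synthetic data. This swaps in a different library from whatever produced the training log discussed next, and the layer sizes, iteration limit and data are all illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic features and a rare positive class loosely tied to feature 0.
rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n) < -1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)

# Two small hidden layers; sizes are illustrative, not the article's.
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
net.fit(scaler.transform(X_train), y_train)

# Probability of the positive class ("visibility below threshold").
y_prob = net.predict_proba(scaler.transform(X_test))[:, 1]
test_recall = recall_score(y_test, (y_prob >= 0.5).astype(int))
print(f"test recall at 0.5 cutoff: {test_recall:.2f}")
```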

It seems the neural network performs well: the validation loss is going down, and the validation recall is going up. Now we are going to draw two box plots. One depicts the values output by the neural network when y test is one, meaning visibility below the threshold (500 meters in our case). The other shows the values output when y test is zero (visibility above 500 meters). Ideally, the neural network would predict values near one in the first case and near zero in the second. The code is:
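The numbers behind such a pair of box plots are the quartiles of the network output, grouped by the true label. The sketch below computes them with pandas on made-up scores; `y_test` and `y_prob` are hypothetical names for the test labels and the predicted probabilities.

```python
import numpy as np
import pandas as pd

# Made-up test labels and scores: class 1 tends to score higher.
rng = np.random.default_rng(5)
y_test = rng.integers(0, 2, 400)
y_prob = np.clip(
    np.where(y_test == 1,
             rng.normal(0.7, 0.15, 400),
             rng.normal(0.3, 0.15, 400)),
    0, 1,
)

scores = pd.DataFrame({"y_test": y_test, "y_prob": y_prob})

# Quartiles per class: these are the box edges and the median line.
summary = scores.groupby("y_test")["y_prob"].describe()[["25%", "50%", "75%"]]
print(summary.round(2))
```

With matplotlib installed, `scores.boxplot(column="y_prob", by="y_test")` draws the two boxes side by side. The less the boxes overlap, the easier it is to pick a cutoff between the classes.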

We are looking for a cutoff that discriminates between 1 and 0. I chose 0.55 based on the figure above. If the neural network predicts a value below 0.55, it means visibility above 500 meters, and vice versa. The code:
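The decision rule fits in one line. The sketch below applies the 0.55 value to a handful of made-up network outputs and counts the confusion-matrix cells against made-up labels; only the 0.55 value comes from the article.

```python
import numpy as np

cutoff = 0.55  # value chosen in the article from the box plots

# Made-up network outputs and true labels, for illustration only.
y_prob = np.array([0.10, 0.40, 0.54, 0.55, 0.80, 0.97])
y_test = np.array([0, 0, 1, 0, 1, 1])

# Outputs at or above the cutoff mean "visibility below 500 m" (class 1).
y_pred = (y_prob >= cutoff).astype(int)
print(y_pred)  # [0 0 0 1 1 1]

tp = int(np.sum((y_pred == 1) & (y_test == 1)))
fp = int(np.sum((y_pred == 1) & (y_test == 0)))
fn = int(np.sum((y_pred == 0) & (y_test == 1)))
tn = int(np.sum((y_pred == 0) & (y_test == 0)))
print(tp, fp, fn, tn)  # 2 1 1 2
```

Raising the cutoff trades missed low-visibility hours for fewer false alarms, and lowering it does the opposite.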

We get:

The metrics appear slightly better than those of the meteorological model. Last, we plot the ROC curve and calculate the area under the curve (AUC):

And get:

Conclusion

Perhaps a step to improve a meteorological model with deep learning algorithms…

Thank you!
