Weather Prediction using storm images (Part 3: Hyperparameter tuning and kernel testing)

Shuvam Das
Published in deepkapha notes · Apr 19, 2023

Gaussian Process Regression: Fitting and Predicting

gp1.fit(X_train, y_train)
# Fitted model (printed):
# GaussianProcessRegressor(alpha=0.01,
#     kernel=WhiteKernel(noise_level=0.09) + 1.41**2 * ExpSineSquared(length_scale=1, periodicity=40),
#     n_restarts_optimizer=10, normalize_y=True)

# Generate predictions over the full dataset.
y_pred, y_std = gp1.predict(X, return_std=True)
data_df['y_pred'] = y_pred
data_df['y_std'] = y_std
# 95% credible interval: mean prediction ± 2 standard deviations.
data_df['y_pred_lwr'] = data_df['y_pred'] - 2*data_df['y_std']
data_df['y_pred_upr'] = data_df['y_pred'] + 2*data_df['y_std']

After generating the prior samples, the code fits the Gaussian process by calling gp1.fit() on the training data, which optimizes the kernel hyperparameters against the observations. Printing the fitted GaussianProcessRegressor object shows the resulting hyperparameters and kernel.

Once the Gaussian process is fit, the code generates predictions on the entire dataset using the predict() function of gp1, which returns the predicted mean and standard deviation at each point. The predicted means are stored in the y_pred column of data_df and the standard deviations in the y_std column. The code then computes lower and upper bounds on the predictions by subtracting and adding twice the standard deviation from the predicted mean; these are stored in the y_pred_lwr and y_pred_upr columns of data_df.

Plotting the Predictions of a Gaussian Process Regression Model

This code block generates a plot of the predictions made by the Gaussian Process Regression model. The plot contains three elements: the actual data, represented by the variable y1; the predicted values generated by the GP model, labeled y_pred; and a shaded band between the upper and lower bounds, representing the 95% credible interval for the predictions.

To draw the credible interval, the code uses Matplotlib’s fill_between method to fill the space between the upper and lower bounds, which were computed from the predicted means and standard deviations returned by the GP model’s predict method.

The code also includes a vertical line at the index corresponding to the end of the training data. This line separates the training and testing data, which are split according to the prop_train parameter, set here to 0.7. Overall, this code block gives a visual summary of the GP model’s performance on the dataset: it shows how well the predictions match the actual data and provides an estimate of the uncertainty in those predictions.
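A minimal sketch of this plotting step, assuming y1, y_pred, y_pred_lwr, and y_pred_upr are columns of data_df as in the snippet above (the figure size, colors, and exact styling are assumptions, not the article’s original code):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 6))
# Shaded 95% credible interval (mean ± 2 standard deviations).
ax.fill_between(data_df.index, data_df['y_pred_lwr'], data_df['y_pred_upr'],
                color='C1', alpha=0.3, label='95% credible interval')
# Actual series and GP mean prediction.
ax.plot(data_df.index, data_df['y1'], color='C0', label='y1')
ax.plot(data_df.index, data_df['y_pred'], color='C1', label='y_pred')
# Vertical line marking the end of the training data (prop_train = 0.7).
n_train = int(len(data_df) * 0.7)
ax.axvline(x=data_df.index[n_train], color='k', linestyle='--', label='train/test split')
ax.legend()
plt.show()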

Hyperparameters

In this code, we did hyperparameter tuning for a Gaussian Process (GP) regression model. We defined the GP kernel using the WhiteKernel, ConstantKernel, and ExpSineSquared classes from the sklearn.gaussian_process.kernels module. We specified the hyperparameters for each kernel and combined them to create a composite kernel. The hyperparameters that were tuned were noise_level, constant_value, length_scale, and periodicity.

We also used the GaussianProcessRegressor class from the sklearn.gaussian_process module to fit the GP model to the data. We specified the number of optimizer restarts, n_restarts_optimizer, and whether the target values are normalized before fitting, normalize_y. Finally, we generated predictions and calculated the standard deviation of the predicted values to obtain a credible interval.

The main hyperparameters of the GPR model are the kernel parameters, together with the regularization term alpha discussed below.

The kernel is a function that measures the similarity between two input points in the feature space. In this code, the kernel is defined as the sum of two components: a White Kernel and an Exponential Sine Squared Kernel (scaled by a Constant Kernel). The White Kernel is responsible for modeling the noise in the data. It has a single hyperparameter, noise_level, which controls the amount of noise; it is initialized to 0.3² and bounded between 0.1² and 0.5².

The Exponential Sine Squared Kernel is responsible for modeling the periodicity in the data. It has two hyperparameters, length_scale and periodicity. The length_scale parameter controls the smoothness of the kernel, while the periodicity parameter controls the frequency of the periodic component. In this code, the length_scale parameter is initialized to 1.0 and the periodicity parameter is initialized to 40, with bounds of 35 and 45.

In addition to the kernel hyperparameters, the GPR model also has a regularization hyperparameter alpha, which adds jitter to the diagonal of the covariance matrix and accounts for additional noise in the target variable. In this code, alpha is set to 0.01 (as shown in the fitted model printed above), so only a small amount of regularization is applied.
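Putting the pieces together, here is a minimal sketch of the kernel and model definition described above, using the initial values and bounds quoted in this section (the intermediate variable names are assumptions):

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel, ConstantKernel, ExpSineSquared

# Noise component: initialized to 0.3**2, bounded between 0.1**2 and 0.5**2.
k_noise = WhiteKernel(noise_level=0.3**2, noise_level_bounds=(0.1**2, 0.5**2))
# Periodic component: amplitude via ConstantKernel; period initialized to 40 with bounds (35, 45).
k_periodic = ConstantKernel(constant_value=1.0) * ExpSineSquared(
    length_scale=1.0, periodicity=40, periodicity_bounds=(35, 45))
kernel = k_noise + k_periodic

gp1 = GaussianProcessRegressor(kernel=kernel, alpha=0.01,
                               n_restarts_optimizer=10, normalize_y=True)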

The train/test split is controlled by the prop_train parameter of 0.7 used earlier. The full dataset contains 70,257 data points; in the run shown here, the first 2,500 points form the training set and the following 100 points the test set. The GPR model is fit to the training data using the fit method, and the predict method is used to generate predictions for the test data.
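As a rough sketch of the split and evaluation described above (the slicing below and the names X and y are assumptions consistent with the earlier snippets, not the article’s exact code):

# First 2,500 points for training, the following 100 for testing.
n_train, n_test = 2500, 100
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:n_train + n_test], y[n_train:n_train + n_test]

gp1.fit(X_train, y_train)
y_pred_test, y_std_test = gp1.predict(X_test, return_std=True)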

Finally, the model predictions are plotted along with the credible interval, defined as two standard deviations above and below the mean prediction, which quantifies the uncertainty in the predictions.

Failed kernels

Table: evaluation scores for the kernels and hyperparameters tested

For the White kernel and Constant kernel combination, the MAE is very low and the R² score is very high. In general, however, the RBF kernel may not work well for datasets with discontinuities or sharp changes, since it tends to smooth over these features. Similarly, the dot product kernel may not work well for datasets with nonlinear relationships between the input features and the output, since it assumes a linear relationship.

Regarding the parameters used in the kernel functions, it is important to choose appropriate values based on the characteristics of the data. For example, the length scale parameter in the RBF kernel controls the smoothness of the function, so if the data is very noisy, a larger length scale may be appropriate. The noise level parameter in the white noise kernel controls the amount of noise in the data and should be set based on the variance of the measurement error. The periodicity parameter in the ExpSineSquared kernel controls the period of the repeating component and should be set based on the periodicity of the data. The specific values chosen for these parameters in this code were based on experimentation and may not be optimal for all datasets.
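As an illustrative sketch of how such kernel alternatives can be compared on held-out data (the candidate kernels, their parameter values, and the metric code below are assumptions for illustration, not the exact configurations reported in the table above):

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel
from sklearn.metrics import mean_absolute_error, r2_score

candidate_kernels = {
    'RBF + noise': 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
    'DotProduct + noise': DotProduct(sigma_0=1.0) + WhiteKernel(noise_level=0.1),
}

for name, k in candidate_kernels.items():
    gp = GaussianProcessRegressor(kernel=k, n_restarts_optimizer=5, normalize_y=True)
    gp.fit(X_train, y_train)
    y_hat = gp.predict(X_test)
    # Report the same scores shown in the table: MAE and R².
    print(name, 'MAE:', mean_absolute_error(y_test, y_hat),
          'R2:', r2_score(y_test, y_hat))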

Conclusion

In conclusion, satellite images of storm winds and wind speed graphs provide valuable information for monitoring and forecasting severe weather. The dataset on storm winds is a collection of satellite images that capture different stages of the storm, including its formation, intensification, and dissipation, and provide insights into the storm’s structure, movement, and intensity. The dataset is primarily used by meteorologists and climatologists to study the dynamics of storm systems, improve weather forecasting, and develop better models for predicting storm behavior. The Python script uses various libraries and performs operations related to data analysis, machine learning, and visualization to build a Gaussian process regression model. By closely monitoring weather patterns and using the dataset, we can better prepare for severe weather events and mitigate their potential impact on our communities. The availability of this dataset has contributed to significant advancements in the field of meteorology and is crucial for advancing our knowledge of the Earth’s climate system.
