Rain probability in the Northeast region of Brazil.

Carlos Barbosa
Published in Analytics Vidhya
Feb 23, 2020

This article looks at the probability of rainy events in the states of Brazil's Northeast region. Using statistical techniques and data analysis, I tried to identify which states are more likely to experience rainy weather events.

1. Introduction

1.1 Background

Weather conditions are fundamental to agriculture. Besides being one of the main factors that determine crop productivity, they also set the calendar of activities on the farm. It is therefore extremely relevant for rural producers to have a way of predicting rainy weather events.

1.2 Problem

A large share of agricultural production, which brings food and many other products to the consumer, depends on regular rain cycles. When rainfall departs from this cycle, a drought sets in, and the consequences are felt in restocking these foods, which makes the products more expensive for the final consumer.

1.3 Solution

To address this problem, I analyzed the Kaggle dataset “Precipitation in Brazil”, using statistics, data science techniques, and machine learning to build a model that processes this information to predict the probability of a rainy weather event in a given region.
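The article does not say which model was used. As a hedged illustration only, the sketch below fits a logistic regression by plain gradient descent to turn weather features into a probability of rain. Logistic regression, the two features, the definition of a "rainy day" (precipitation > 0 mm), and the toy numbers are all assumptions for illustration, not the author's model or data.

```python
import math
import statistics

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Probability of the positive class (rain) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def standardize(rows, means, stds):
    """Scale each feature to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical toy data, NOT from the article:
# features = [average relative humidity (%), average cloudiness]
# label 1 = rainy day (assumed definition: total precipitation > 0 mm)
X_raw = [[55, 2.0], [60, 3.5], [65, 4.0], [78, 7.0], [85, 8.0], [90, 8.5]]
y = [0, 0, 0, 1, 1, 1]

columns = list(zip(*X_raw))
means = [statistics.mean(c) for c in columns]
stds = [statistics.pstdev(c) for c in columns]
w, b = fit_logistic(standardize(X_raw, means, stds), y)

p_wet = predict_proba(w, b, standardize([[82, 7.5]], means, stds)[0])
p_dry = predict_proba(w, b, standardize([[58, 2.5]], means, stds)[0])
print(f"P(rain | humid, cloudy day) = {p_wet:.2f}")
print(f"P(rain | dry, clear day)    = {p_dry:.2f}")
```

Standardizing the features first lets a single learning rate work for variables measured on very different scales, such as humidity in percent and cloudiness in tenths.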

2. Data capture and cleaning

2.1 Data Sources

In the context of this problem, the variables that will inform our decisions are:

  • Average Wind Speed
  • Maximum Wind Speed (Average)
  • Average Cloudiness
  • Total Precipitation
  • Average Compensated Temperature
  • Average Relative Humidity

The data sources used to extract and generate the necessary information were:

BDMEP database — Meteorological Database for Teaching and Research

Located in:

and the Kaggle dataset “Precipitation in Brazil”

Located in:

2.2 Data Cleaning

The BDMEP database held a large amount of historical data in a format that was inconvenient for analysis (plain .txt files). I converted it to CSV, changed the data types of the fields, and then replaced the null values with an overall average.
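The cleaning steps above (text to CSV, type conversion, mean imputation) can be sketched with the standard library alone. This is a minimal sketch assuming the export is `;`-separated with one header row and empty fields for missing readings; the column names, station code, and values are hypothetical, and "overall average" is read here as the mean of each column.

```python
import csv
import io
import statistics

# Hypothetical excerpt of a BDMEP .txt export. The layout (';' separator,
# one header row, empty field = missing reading) and the column names
# are assumptions for illustration.
RAW_TXT = """station;date;total_precipitation;avg_relative_humidity
82900;01/01/2019;12.4;81.5
82900;02/01/2019;;79.0
82900;03/01/2019;0.0;
82900;04/01/2019;3.6;84.25
"""

NUMERIC_FIELDS = ["total_precipitation", "avg_relative_humidity"]

def load_and_clean(text):
    """Parse the text export, cast numeric fields, and mean-impute nulls."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
    # Cast numeric columns from str to float; empty strings become None
    for row in rows:
        for field in NUMERIC_FIELDS:
            raw = row[field].strip()
            row[field] = float(raw) if raw else None
    # Replace each missing value with the mean of its own column
    for field in NUMERIC_FIELDS:
        col_mean = statistics.mean(r[field] for r in rows if r[field] is not None)
        for row in rows:
            if row[field] is None:
                row[field] = col_mean
    return rows

def to_csv_text(rows):
    """Serialize the cleaned rows as comma-separated CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

clean_rows = load_and_clean(RAW_TXT)
print(to_csv_text(clean_rows))
```

Mean imputation keeps every row but shrinks the variance of each column, which is worth keeping in mind before any later normality checks.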

3. Exploratory Data Analysis

3.1 Descriptive Data Analysis

Identifying outliers: there is clearly a large number of values above 45% of precipitation.
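The article does not state which outlier rule was applied. A common choice, and the convention most boxplots draw, is Tukey's fences: any value beyond 1.5 × IQR from the first or third quartile is flagged. The sketch below assumes that rule; the precipitation values are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical daily precipitation totals (mm), not the article's data
precip = [0.0, 0.0, 0.2, 1.5, 3.0, 4.1, 5.0, 6.2, 8.0, 52.0, 61.3]
lower, upper, outliers = iqr_outliers(precip)
print(f"fences: [{lower:.2f}, {upper:.2f}], outliers: {outliers}")
```

On rainfall data the lower fence is usually negative, so in practice only unusually wet days get flagged.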

Comparing rainfall levels by state.

Fisher distribution: many values close to zero, with frequencies falling off as precipitation increases.
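One way to put a number on a shape like this (a pile of values near zero and a long right tail) is the Fisher-Pearson coefficient of skewness, g1 = m3 / m2^1.5, computed from central moments, together with the share of observations below a small threshold. The 1 mm threshold and the sample values below are assumptions for illustration.

```python
import statistics

def fisher_pearson_skewness(values):
    """Biased sample skewness g1 = m3 / m2**1.5 (moments about the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def share_near_zero(values, threshold=1.0):
    """Fraction of observations strictly below `threshold` (assumed 1 mm)."""
    return sum(1 for v in values if v < threshold) / len(values)

# Hypothetical daily precipitation (mm), not the article's data
precip = [0.0, 0.0, 0.0, 0.1, 0.3, 0.8, 2.5, 6.0, 14.2, 40.0]
g1 = fisher_pearson_skewness(precip)
near_zero = share_near_zero(precip)
print(f"skewness g1 = {g1:.2f}, share below 1 mm = {near_zero:.0%}")
```

A clearly positive g1 confirms the right skew seen in the plot; values near 0 would indicate a roughly symmetric distribution.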

Scatterplot of the Average Cloudiness metric: under a Gaussian distribution, we observed that the ideal range is between 6.5 and 7.5.

Scatterplot of the Maximum Wind Speed (Average) metric: under a Gaussian distribution, we observed that the ideal range is between 6 and 8.
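The article does not define how the "ideal range" was read off the Gaussian. One plausible interpretation, assumed here, is the central interval of a normal distribution fitted to the metric (about mean ± one standard deviation for 68% coverage). The sketch uses `statistics.NormalDist.from_samples`; the cloudiness values are invented.

```python
from statistics import NormalDist

def gaussian_central_interval(values, coverage=0.68):
    """Central interval holding `coverage` of a normal fitted to `values`."""
    dist = NormalDist.from_samples(values)  # sample mean and sample std dev
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

def share_inside(values, low, high):
    """Fraction of observations that fall inside [low, high]."""
    return sum(1 for v in values if low <= v <= high) / len(values)

# Hypothetical average cloudiness readings, not the article's data
cloudiness = [6.2, 6.6, 6.8, 6.9, 7.0, 7.0, 7.1, 7.2, 7.4, 7.8]
low, high = gaussian_central_interval(cloudiness)
print(f"68% interval: {low:.2f} to {high:.2f}")
print(f"observed share inside: {share_inside(cloudiness, low, high):.0%}")
```

Comparing the fitted coverage with the observed share inside the interval is a quick sanity check on whether the Gaussian assumption is reasonable for that metric.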

3.2 Verification of data normality
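The article does not name the normality test it used. As a hedged example, the Jarque-Bera test needs only the standard library: JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis; under normality JB follows a χ² distribution with 2 degrees of freedom, whose survival function is exp(−x/2). It is an asymptotic test, so small samples call for caution, and Shapiro-Wilk is a common alternative. Both samples below are synthetic.

```python
import math
import statistics
from statistics import NormalDist

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for H0: data are normal."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: survival function exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Synthetic, precipitation-like sample: mostly dry days plus a long tail
skewed = [0.0] * 40 + [1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
# Synthetic, normal-like sample: evenly spaced quantiles of N(0, 1)
normal_like = [NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]

_, p_skewed = jarque_bera(skewed)
_, p_normal = jarque_bera(normal_like)
print(f"skewed sample p = {p_skewed:.3g}, normal-like sample p = {p_normal:.3f}")
```

A p-value below the chosen significance level (commonly 0.05) rejects normality, which is what one would expect for daily precipitation.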

3.3 Comparison of rain likelihood by state

Comparing the states, we can clearly see that the likelihood of rain in the state of Amapá is much higher than in the state of Pernambuco.

Pernambuco (PE) is one of the states that most needs to stay alert to rainy weather events, since its likelihood of rain is very small compared with the rainiest states. Based on this, farmers could be better prepared for this type of event at certain times of the year.
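The article does not show how the per-state likelihood was computed. A simple, hedged way to express it is the empirical frequency of rainy days per state: days with precipitation above a threshold divided by all observed days. The threshold (0 mm), the `(state, precipitation)` record layout, and the numbers are assumptions; they are not the dataset's values.

```python
from collections import defaultdict

def rain_probability_by_state(records, threshold_mm=0.0):
    """Share of daily records per state with precipitation above threshold_mm."""
    rainy = defaultdict(int)
    total = defaultdict(int)
    for state, precip_mm in records:
        total[state] += 1
        if precip_mm > threshold_mm:
            rainy[state] += 1
    return {state: rainy[state] / total[state] for state in total}

# Hypothetical (state, daily precipitation in mm) records
records = [
    ("AP", 12.0), ("AP", 0.0), ("AP", 30.5), ("AP", 8.2),
    ("PE", 0.0), ("PE", 0.0), ("PE", 1.4), ("PE", 0.0),
]
probs = rain_probability_by_state(records)
ranking = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
for state, p in ranking:
    print(f"{state}: {p:.0%} of days with rain")
```

Raising `threshold_mm` (for example to 1 mm) filters out trace readings, which changes the ranking only when states differ mainly in light drizzle.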


The purpose of this study was to help farmers prepare for one of the events that matters most to them, so they can keep food in stock to sell, boosting consumption and the country’s economy.

6. References

  1. Kaggle
  2. BDMEP
  3. Superbac

