Analysing forest fires data in the world's largest wetlands

Published in CodeX, Aug 15, 2022

It is increasingly common to find news about forest fires worldwide, particularly in regions such as Europe, the USA and Brazil. Brazil's biomes are among the most significant examples, as they constantly suffer from forest fires. Global warming, lack of rain and deforestation are some of the factors behind these fires.
Brazil's wetlands, the Pantanal, are among the areas most affected by fires. As reported by WWF, new fire records are broken every year in this region.
Data analysis can be an essential tool for understanding, sharing and generating information about these fires. In this article, we will perform a data exploration using Python to better understand the forest fires that occur annually in the Pantanal.

Data

The data used in these analyses come from the Programa Queimadas Open Data portal. Programa Queimadas is a project of the Brazilian National Institute for Space Research (INPE) that provides openly available data for monitoring and modelling the occurrence, propagation and classification of active fires in vegetation, as well as their risk, extent and severity, using Remote Sensing, Geoprocessing and Numerical Modeling techniques.

In these analyses, we will use two distinct datasets:

1. The historical series of fires by year in the Pantanal biome, including:

Number of fires by month;
Maximum number of fires by month;
Average number of fires by month;
Minimum number of fires by month.

2. The heat spots detected in Mato Grosso state in the last 48 hours, including:

Heat spot ID;
Date and time;
County;
Brazilian state;
Biome.

Preparing the environment

So, let's get started by importing the necessary Python libraries:

import pandas as pd
import matplotlib.pyplot as plt
import urllib.request
import seaborn as sns

The next step is to retrieve the forest-fires-by-year dataset from the Programa Queimadas portal and convert it to a pandas DataFrame:

csv_url_hist_pantanal = 'https://queimadas.dgi.inpe.br/queimadas/portal-static//bioma/csv_estatisticas/historico_bioma_pantanal.csv'
urllib.request.urlretrieve(csv_url_hist_pantanal, 'pantanal.csv')  # saving the CSV locally
df = pd.read_csv('pantanal.csv', encoding='utf-8')
df = df.replace(to_replace='-', value=0)  # treating '-' (uncollected data) as zero

When analysing the extracted dataset, we notice that some cells contain the '-' character, which possibly represents uncollected data. To work around this issue, we treat these values as zero. The dataset also uses Brazilian Portuguese (pt-BR) labels for the months (Janeiro to Dezembro) and for the maximum (Máximo), average (Média) and minimum (Mínimo) rows.
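If you prefer fully numeric month columns with English names, the cleaning step could look like the minimal sketch below. It runs on a small hypothetical table that only mimics the layout described above (the values are invented, and only two month columns are included); the month mapping is a plain translation, not something the portal provides.

```python
import pandas as pd

# Hypothetical sample mimicking the layout described above: a label column,
# Portuguese month columns stored as text, and '-' marking missing values.
raw = pd.DataFrame({
    'Unnamed: 0': ['2019', '2020'],
    'Janeiro': ['-', '12'],
    'Fevereiro': ['5', '-'],
    'Total': [5, 12],
})

# Portuguese (pt-BR) month labels and their English equivalents.
PT_TO_EN = {
    'Janeiro': 'January', 'Fevereiro': 'February', 'Março': 'March',
    'Abril': 'April', 'Maio': 'May', 'Junho': 'June',
    'Julho': 'July', 'Agosto': 'August', 'Setembro': 'September',
    'Outubro': 'October', 'Novembro': 'November', 'Dezembro': 'December',
}

clean = raw.copy()
month_cols = [c for c in clean.columns if c in PT_TO_EN]
# Treat '-' as zero, then convert each text column to numbers.
clean[month_cols] = clean[month_cols].replace('-', '0').apply(pd.to_numeric)
# Rename the month columns to English.
clean = clean.rename(columns=PT_TO_EN)
print(clean)
```

Replacing '-' with the string '0' before calling pd.to_numeric keeps the conversion explicit, so every month column ends up as an integer column instead of a mix of strings and numbers.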

Exploratory data analysis

Pantanal forest fires by year

To understand the forest fire scenario, let's first create a graph showing the historical evolution of fires in the Pantanal. However, running the df.info() method shows that only columns 6, 7, 8 and 13 have the int64 type, while the other columns have the object type.

The tolist() method converts the selected values into plain Python lists, which we can pass directly to Matplotlib to generate the graph. We define the years and forest_fires variables from the Unnamed: 0 and Total columns, keeping only the first 24 rows, which hold the yearly records.

years = df['Unnamed: 0'][0:24].tolist()
forest_fires = df['Total'][0:24].tolist()
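The slice [0:24] relies on the yearly rows always being the first 24. As a more robust alternative, the minimal sketch below keeps only rows whose label parses as a number. It assumes the summary rows carry non-numeric labels such as 'Máximo'; the table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical table: yearly rows followed by summary rows.
df = pd.DataFrame({
    'Unnamed: 0': ['2019', '2020', '2021', 'Máximo', 'Média', 'Mínimo'],
    'Total': [100, 300, 200, 300, 200, 100],
})

# Labels that cannot be parsed as numbers become NaN and are dropped.
year_numbers = pd.to_numeric(df['Unnamed: 0'], errors='coerce')
is_year = year_numbers.notna()

years = year_numbers[is_year].astype(int).tolist()
forest_fires = df.loc[is_year, 'Total'].tolist()
print(years, forest_fires)
```

With this approach, new years added to the portal's CSV would be picked up without editing the slice bounds.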

Now we can generate the graph of the Pantanal's historical forest fires. The plt.bar() function creates the chart relating the years and forest_fires variables, and a series of styling functions enhances the visualisation, as follows:

sns.set(rc={'figure.figsize': (20, 10)})  # setting the graph size (before creating the figure)
plt.figure(facecolor='#383838')  # changing the image background colour
plt.bar(years, forest_fires, color='#e14854')  # defining x and y
plt.title('Pantanal forest fires by year', color='white', fontweight='bold', fontsize=20)  # graph title
plt.xlabel('Years', color='white', fontweight='bold', fontsize=20)
plt.ylabel('Forest fires', color='white', fontweight='bold', fontsize=20)
plt.xticks(years, rotation=90)
ax = plt.gca()  # getting the current axes (plt.axes() would add a new, empty one)
ax.set_facecolor('#383838')  # changing the graph background colour
ax.tick_params(axis='x', colors='white')  # x tick colour
ax.tick_params(axis='y', colors='white')  # y tick colour
ax.xaxis.set_tick_params(labelsize=20)  # x tick text size
ax.yaxis.set_tick_params(labelsize=20)  # y tick text size
ax.xaxis.grid(False)  # hiding the x grid
plt.show()
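The same chart can also be written with Matplotlib's object-oriented API, which creates the figure and its Axes together and so avoids any ambiguity about which Axes the styling calls affect. This is an alternative sketch, not the author's code, and the years and totals below are hypothetical placeholders.

```python
import matplotlib.pyplot as plt

# Hypothetical values standing in for the years and forest_fires lists.
years = ['2019', '2020', '2021']
forest_fires = [100, 300, 200]

# Create the figure and its single Axes at once, with size and colours set up front.
fig, ax = plt.subplots(figsize=(20, 10), facecolor='#383838')
ax.set_facecolor('#383838')
ax.bar(years, forest_fires, color='#e14854')

ax.set_title('Pantanal forest fires by year', color='white', fontweight='bold', fontsize=20)
ax.set_xlabel('Years', color='white', fontweight='bold', fontsize=20)
ax.set_ylabel('Forest fires', color='white', fontweight='bold', fontsize=20)
ax.tick_params(axis='x', colors='white', labelsize=20, labelrotation=90)
ax.tick_params(axis='y', colors='white', labelsize=20)
ax.xaxis.grid(False)  # hide the vertical grid lines
plt.show()
```

Because every call goes through the explicit ax object, the order of the styling lines no longer matters, which makes the snippet easier to reuse for other biomes.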

The chart shows that 2020 holds the record for fires in the entire historical series. A search for the 2020 Pantanal fires returns numerous news reports on this tragedy.
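Instead of reading the record year off the chart, it can be found directly from the two lists. The minimal sketch below uses hypothetical values in place of the real years and forest_fires lists.

```python
# Hypothetical values; in the article these lists come from the DataFrame slices.
years = ['2018', '2019', '2020', '2021']
forest_fires = [100, 250, 400, 180]

# Pair each total with its year; max() then returns the pair with the largest total.
record_fires, record_year = max(zip(forest_fires, years))
print(f'Record year: {record_year} ({record_fires} fires)')
```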

News headlines citing the 2020 Pantanal fires in New York Times, BBC, France 24 and ABC News

Maximum, average and minimum Pantanal forest fires by month

Using the same dataset, we can also identify when during the year the risk of fire is greatest. A line graph of the Pantanal's fires by month can support this analysis. For that, we collect the maximum, average and minimum rows from the dataset:

months = df.columns[1:13]  # 'Janeiro' ... 'Dezembro'
months_max = df.loc[25, 'Janeiro':'Dezembro'].tolist()  # maximum (Máximo) row
months_max = list(map(int, months_max))
months_avg = df.loc[26, 'Janeiro':'Dezembro'].tolist()  # average (Média) row
months_avg = list(map(int, months_avg))
months_min = df.loc[27, 'Janeiro':'Dezembro'].tolist()  # minimum (Mínimo) row
months_min = list(map(int, months_min))
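Rows 25 to 27 are selected by position. Assuming (not confirmed by the portal) that the Máximo, Média and Mínimo rows are the per-month maximum, mean and minimum over the yearly rows, you could recompute them as a sanity check. The sketch uses hypothetical yearly data with only two month columns, and rounding the mean is also an assumption.

```python
import pandas as pd

# Hypothetical yearly rows with two of the twelve month columns.
yearly = pd.DataFrame(
    {'Janeiro': [10, 30, 20], 'Fevereiro': [5, 15, 40]},
    index=[2019, 2020, 2021],
)

# Per-month statistics across the years.
summary = pd.DataFrame({
    'max': yearly.max(),
    'avg': yearly.mean().round().astype(int),
    'min': yearly.min(),
})
print(summary)
```

If the recomputed values differ from the rows picked by df.loc[25:27], the positional indices (or the assumption above) need revisiting.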

The plt.plot() function is called three times to draw the months_max, months_avg and months_min variables. In addition, plt.fill_between() fills the area under each line to enrich the visualisation. The chart's colours and labels follow the format used previously.

plt.figure(facecolor='#383838')  # image background colour
# months x months_max line graph
plt.plot(months, months_max, color='white', marker='o', label='Max')
plt.fill_between(months, months_max, color='#522564')
# months x months_avg line graph
plt.plot(months, months_avg, color='white', marker='o', label='Avg')
plt.fill_between(months, months_avg, color='#a22a69')
# months x months_min line graph
plt.plot(months, months_min, color='white', marker='o', label='Min')
plt.fill_between(months, months_min, color='#e14854')
plt.xticks(months, rotation=90)  # rotating the x labels
plt.title('AVG-MAX-MIN forest fires in Pantanal by month', color='white', fontweight='bold', fontsize=20)  # title
plt.xlabel('Months', color='white', fontweight='bold', fontsize=20)
plt.ylabel('Forest fires', color='white', fontweight='bold', fontsize=20)
ax = plt.gca()  # getting the current axes instead of creating a new one
ax.set_facecolor('#383838')
ax.tick_params(axis='x', colors='white')
ax.tick_params(axis='y', colors='white')
ax.xaxis.set_tick_params(labelsize=20)
ax.yaxis.set_tick_params(labelsize=20)
plt.show()

Avg-Max-Min forest fires in Pantanal by month

The graph shows a considerable increase in Pantanal forest fires in the second half of the year. August, September and October are the three months most favourable to fires, and all the analysed variables (maximum, average and minimum) show this increase. This pattern is related to the Pantanal's dry season, which runs from July to October each year.
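Both observations can be checked numerically: the top three months and the share of fires falling in the July to October dry season. The minimal sketch below uses hypothetical monthly averages in place of months_avg, so the printed share is illustrative only.

```python
import pandas as pd

# Hypothetical monthly averages; in the article these are months and months_avg.
months = ['Janeiro', 'Fevereiro', 'Março', 'Abril', 'Maio', 'Junho',
          'Julho', 'Agosto', 'Setembro', 'Outubro', 'Novembro', 'Dezembro']
months_avg = [50, 40, 30, 30, 60, 120, 300, 900, 1100, 700, 200, 80]

avg_by_month = pd.Series(months_avg, index=months)

# The three months with the highest average number of fires.
top3 = avg_by_month.nlargest(3)
print(top3)

# Share of the yearly average falling in the July-October dry season.
dry_season = ['Julho', 'Agosto', 'Setembro', 'Outubro']
share = avg_by_month[dry_season].sum() / avg_by_month.sum()
print(f'Dry-season share: {share:.0%}')
```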

Share of heat spots by county

The Programa Queimadas portal also provides data from various satellites that detect possible fire outbreaks. Each outbreak, identified by a temperature increase in a given area, is called a heat spot. The portal offers these datasets for South America, Brazil and Brazilian counties.
In Brazil, the Pantanal biome is present in the states of Mato Grosso and Mato Grosso do Sul. Here, we chose Mato Grosso for the data analysis.

csv_mt = 'https://zenodo.org/record/5669098/files/focos48h_estados_MT.csv?download=1'
urllib.request.urlretrieve(csv_mt, 'heatspots_MT_48h.csv')

In the heatspots_MT_48h.csv dataset, each record is a heat spot identified by a satellite. By grouping the records, we can see which Mato Grosso counties have the most heat spots and, from this, locate the counties more likely to have forest fires. The counties and heatspot_county_id variables store the county names and their number of heat spots, which we use to generate a doughnut chart.

df_mt = pd.read_csv('heatspots_MT_48h.csv')
# counting unique heat spot IDs per county, sorted from most to fewest heat spots
df_area = df_mt.groupby(['municipio']).nunique().sort_values(by=['FID'], ascending=False)
counties = df_area.index.tolist()
heatspot_county_id = df_area['FID'].tolist()
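Since the Mato Grosso file covers the whole state, you might want to keep only heat spots inside the Pantanal before counting. The minimal sketch below does that and counts only the FID column per county. The 'bioma' column name and its values are assumptions (check the actual CSV header for the biome field), and the rows and county names are hypothetical.

```python
import pandas as pd

# Hypothetical rows mimicking heatspots_MT_48h.csv; 'municipio' and 'FID' appear in
# the article, while the 'bioma' column name and its values are assumptions.
df_mt = pd.DataFrame({
    'FID': [1, 2, 3, 4, 5, 6],
    'municipio': ['County A', 'County B', 'County A', 'County C', 'County A', 'County B'],
    'bioma': ['Pantanal', 'Pantanal', 'Pantanal', 'Amazônia', 'Pantanal', 'Cerrado'],
})

# Optional: keep only heat spots located in the Pantanal biome.
pantanal = df_mt[df_mt['bioma'] == 'Pantanal']

# Count distinct heat spot IDs per county, largest first.
spots_per_county = (
    pantanal.groupby('municipio')['FID'].nunique().sort_values(ascending=False)
)
counties = spots_per_county.index.tolist()
heatspot_county_id = spots_per_county.tolist()
print(spots_per_county)
```

Selecting ['FID'] before nunique() avoids computing unique counts for every other column, which the chart never uses.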

The plt.pie() function visualises the heat spots by county. We also draw a circle in the background colour at the centre to turn the pie into a doughnut chart, along with some additional styling.

plt.figure(facecolor='#383838')  # image background colour
plt.pie(heatspot_county_id, labels=counties, autopct='%1.1f%%', labeldistance=1.05,
        textprops={'color': 'w', 'fontsize': 20})
plt.title('Heat spots by county in Mato Grosso', color='white', fontweight='bold', fontsize=20)  # title
# drawing a circle in the middle to create the doughnut's empty centre
my_circle = plt.Circle((0, 0), 0.7, color='#383838')
plt.gca().add_artist(my_circle)
plt.show()
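Matplotlib can also produce the doughnut directly: passing a width smaller than 1 in wedgeprops cuts out the centre of each wedge, so no extra Circle is needed. This alternative sketch uses hypothetical counts and county names; pctdistance=0.85 is a suggested tweak that keeps the percentages on the ring rather than in the hole.

```python
import matplotlib.pyplot as plt

# Hypothetical counts; in the article these are heatspot_county_id and counties.
counties = ['County A', 'County B', 'County C']
heatspot_county_id = [30, 20, 10]

fig, ax = plt.subplots(facecolor='#383838')
# A wedge width below 1 leaves the centre empty, producing a doughnut chart.
wedges, texts, autotexts = ax.pie(
    heatspot_county_id,
    labels=counties,
    autopct='%1.1f%%',
    pctdistance=0.85,  # place the percentages on the ring
    wedgeprops={'width': 0.3},
    textprops={'color': 'w', 'fontsize': 20},
)
ax.set_title('Heat spots by county', color='white', fontweight='bold', fontsize=20)
plt.show()
```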

This visualisation identifies the counties with the most heat spots in Mato Grosso state. News reports about Pantanal fires in these counties support this result.

News citing the fires in the Poconé and Cáceres counties from the Globo website

Summary

This exploratory analysis of Pantanal data shows how data can be a powerful tool in critical situations. Governments, counties, nonprofit organisations and other stakeholders should be able to perform data analysis tasks to help prevent forest fires. We could also compare our results with related news coverage, which helped confirm the reliability of both the news reports and our results.

The scripts used in this article were first introduced in our data literacy workshop for teachers and students in Brazil, organised by the InformAção project and the SESC educational centre. The workshop can be accessed here.

Please feel free to ask your valuable questions in the comments section. Cheers, thanks for reading! 😊