How to Scrape Climate Data from 4 Different Sources

Imam Muhajir
Published in Analytics Vidhya
6 min read · Oct 13, 2021
Photo by Martin Sanchez on Unsplash

INTRODUCTION

Climate is the average weather, where weather is the state of the atmosphere at a given point in time. More precisely, climate is a measure of the average and the variability of relevant quantities (such as temperature, precipitation, or wind) over a period of time, ranging from months to years or even millions of years. Climate changes continuously because of interactions between its components and external factors such as volcanic eruptions, variations in sunlight, and human activities such as changes in land use and the burning of fossil fuels.

In this article, we will discuss some reliable sources of climate data and how to scrape data from each of them.

SCRAPING DATA

For this walkthrough, we will use the R programming language together with the climate package and the chirps package. The climate package automatically downloads meteorological and hydrological data from public repositories. Before we start scraping, we first install the required packages from the R console.

install.packages("climate")
install.packages("chirps")

or

Screenshot by Author
Screenshot by Author
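If the script is run more than once, it can be convenient to install only the packages that are still missing. The following is a minimal sketch of that idea; `missing_packages` is a hypothetical helper written for this example and is not part of either package.

```r
# Return the subset of `pkgs` that is not yet installed.
# `missing_packages` is a hypothetical helper name used only in this sketch.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("climate", "chirps"))

# Only install in an interactive session, so batch runs never trigger downloads.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The `interactive()` guard is a design choice for this sketch; remove it if the script should also install packages when run non-interactively.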

After installing the packages, load them:

library(climate)

We will look at 4 sources of climate data:

1. The climate Package

This package covers several kinds of data, namely meteorological data, hydrological data, and station information.

a. Meteorological

Download meteorological data from the SYNOP stations available in the ogimet.com collection. The data can be hourly or daily.
SYNOP stations are the meteorological stations that report under the World Meteorological Organization. The scraped result has 18 columns.

data_meteo = meteo_ogimet(date = c(Sys.Date() - 5, Sys.Date() - 1), 
interval = "daily",
coords = FALSE,
station = 12330)
head(data_meteo)
Screenshot by Author
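Every call in this section depends on a remote server, so a request can fail when the site is down or a station has no data. The sketch below shows one way to keep a script running in that case; `safe_fetch` is a hypothetical helper, not a function of the climate package.

```r
# Run a download call and return NULL (with a message) instead of stopping
# the script when the request fails. `safe_fetch` is a hypothetical helper.
safe_fetch <- function(fetch_fun) {
  tryCatch(
    fetch_fun(),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with the call shown above (needs the climate package and internet):
# data_meteo <- safe_fetch(function() {
#   climate::meteo_ogimet(date = c(Sys.Date() - 5, Sys.Date() - 1),
#                         interval = "daily", coords = FALSE, station = 12330)
# })
# if (!is.null(data_meteo)) head(data_meteo)
```

The same wrapper can be reused for the other download functions in this article.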

Next, we will fetch meteorological data from the SYNOP/CLIMATE/PRECIP stations available at danepubliczne.imgw.pl. The data from this source contains a large number of columns.

m = meteo_imgw(interval = "daily", rank = "synop", year = 2000,  coords = TRUE)
head(m)
Screenshot by Author

Next, we will scrape monthly CO2 data from the Mauna Loa Observatory.

c = meteo_noaa_co2( )
head(c)
Screenshot by Author
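Once the monthly values are in a data frame, a common next step is to average them per year. The sketch below uses a small made-up data frame in place of the real download; the column names `yy`, `mm`, and `co2_avg` are assumptions about the returned table, so check them with `names(c)` before reusing the code.

```r
# Hypothetical stand-in for the result of meteo_noaa_co2().
# The column names yy, mm, and co2_avg are assumptions, and the values are made up.
co2_example <- data.frame(
  yy = c(2019, 2019, 2020, 2020),
  mm = c(1, 2, 1, 2),
  co2_avg = c(410, 412, 413, 415)
)

# Average the monthly values within each year.
annual_co2 <- aggregate(co2_avg ~ yy, data = co2_example, FUN = mean)
print(annual_co2)
```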

b. Hydrological

Scrape hydrological data (daily, monthly, or semiannual and annual) from the stations available at danepubliczne.imgw.pl. The example below requests the semiannual and annual interval.

h = hydro_imgw(interval = "semiannual_and_annual", year = 2010:2011)
head(h)
Screenshot by Author

c. Station Information

We can scrape station information from several different sources. First, we will scrape station data from the ogimet collection. If you run this in RStudio, the output can also show a map of the stations automatically.

nso <- nearest_stations_ogimet(country = "United+Kingdom",
date = Sys.Date(),
add_map = TRUE,
point = c(-1, 53),
no_of_stations = 100)
head(nso)
Screenshot by Author
image by author

Besides that, we can also scrape station data from the NOAA ISH meteorological repository.

nsn <-  nearest_stations_nooa(country = "UNITED KINGDOM",
date = Sys.Date(),
add_map = TRUE,
point = c(-1, 53),
no_of_stations = 100)
head(nsn)
Screenshot by Author
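To make the idea of "nearest stations" concrete, the sketch below ranks a few made-up stations by great-circle distance from a reference point using the haversine formula. This only illustrates the concept and is not necessarily how the climate package computes distances; the station table is hypothetical, and the point is assumed to be given as longitude, latitude.

```r
# Great-circle distance in kilometres between two lon/lat points (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(pmin(1, a)))
}

# Hypothetical stations; the ids and coordinates are made up for illustration.
stations <- data.frame(
  id = c("A", "B", "C"),
  lon = c(-1.5, 0.2, -3.0),
  lat = c(53.2, 51.5, 55.9),
  stringsAsFactors = FALSE
)

# Distance from the reference point c(-1, 53), then sort nearest first.
stations$distance_km <- haversine_km(-1, 53, stations$lon, stations$lat)
print(stations[order(stations$distance_km), ])
```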

Station data is also available from several other sources, such as the IMGW-PIB stations in Poland.

2. The chirps Package

Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a terrestrial rainfall dataset that combines three kinds of rainfall information: global climatology, satellite-based rainfall estimates, and in-situ observed rainfall. CHIRPS data can be a good fit for studying extreme rainfall over long time series.

library(chirps)
lonlat <- data.frame(lon = c(-55.0281,-54.9857),
lat = c(-2.8094, -2.8756))
dates <- c("2017-12-15", "2017-12-31")
dt <- get_chirps(lonlat, dates)
head(dt)
Screenshot by Author
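A typical follow-up is to total the daily rainfall for each requested point. The sketch below does this on a made-up data frame standing in for the download; the column names `id`, `lon`, `lat`, `date`, and `chirps` are assumptions about the shape of the result, and the rainfall values are invented.

```r
# Hypothetical stand-in for a get_chirps() result.
# Column names are assumed and the rainfall values (mm) are made up.
rain <- data.frame(
  id = c(1, 1, 2, 2),
  lon = c(-55.0281, -55.0281, -54.9857, -54.9857),
  lat = c(-2.8094, -2.8094, -2.8756, -2.8756),
  date = as.Date(c("2017-12-15", "2017-12-16", "2017-12-15", "2017-12-16")),
  chirps = c(0, 12.5, 3.0, 7.5)
)

# Sum the daily rainfall for each point id.
total_rain <- aggregate(chirps ~ id, data = rain, FUN = sum)
print(total_rain)
```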

3. The Physical Sciences Laboratory Website

Screenshot by Author

The NOAA Physical Sciences Laboratory (PSL) website is one of the best places to find stored climate data. It provides several categories of data, such as surface, temperature, land, ocean, and others. Data on this site can be downloaded manually or automatically using code. To download with code, first click one of the datasets in the table, for example “CMAP Precipitation”.

https://psl.noaa.gov/data/gridded/

Screenshot by Author

Then find the download link and copy its address. The general form of the code is:

download.file(url = <link.download>, destfile = <namefile>, mode = "wb")

example:

download.file(url = 'https://downloads.psl.noaa.gov/Datasets/cmap/std/precip.mon.mean.nc', destfile = 'precipmeanmonthly.nc',mode = "wb")
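Large files are worth downloading only once. As a minimal sketch, the hypothetical helper `download_if_missing` below wraps `download.file` and skips the request when the destination file already exists.

```r
# Download `url` to `destfile` unless the file is already on disk.
# `download_if_missing` is a hypothetical helper written for this sketch.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the CMAP file from the example above (needs internet):
# download_if_missing(
#   "https://downloads.psl.noaa.gov/Datasets/cmap/std/precip.mon.mean.nc",
#   "precipmeanmonthly.nc"
# )
```

Note that this check only looks at whether the file exists, so a partially downloaded file would also be skipped.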

After running the example, the file is saved in your working directory. Downloading with code looks more complicated than clicking a link, so what is the benefit? It pays off when there are many files to download. Let's look at the following example. Near the end of the dataset list, find the dataset named “CPC Global Temperature” and click it. Then find “Download File” and choose “Daily Minimum Temperature” (click ‘tmin.yyyy.nc’).

Screenshot by Author
Screenshot by Author
Screenshot by Author

There are many files here. Downloading them manually would be slow and tiring, so we automate it with code. The results are saved in your working directory.

# download CPC daily minimum temperature data
for (i in 2018:2019) {
  string_date <- as.character(i)
  myfile <- paste0("tmin_", string_date, ".nc")
  myurl <- paste0("https://downloads.psl.noaa.gov/Datasets/cpc_global_temp/", "tmin.", string_date, ".nc")  # adjustable: tmax or tmin
  download.file(url = myurl, destfile = myfile, mode = "wb")
}
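Because `paste0` is vectorized, the URLs and file names for all years can also be built in one step and inspected before anything is downloaded. The sketch below assumes the base URL listed in the references; `cpc_files` is a hypothetical helper name.

```r
# Build the download URL and local file name for each year.
# `cpc_files` is a hypothetical helper; the base URL is taken from the reference list.
cpc_files <- function(years, variable = c("tmin", "tmax"),
                      base_url = "https://downloads.psl.noaa.gov/Datasets/cpc_global_temp/") {
  variable <- match.arg(variable)
  data.frame(
    url = paste0(base_url, variable, ".", years, ".nc"),
    destfile = paste0(variable, "_", years, ".nc"),
    stringsAsFactors = FALSE
  )
}

files <- cpc_files(2018:2019, "tmax")
print(files)

# To download (needs internet), loop over the rows:
# for (k in seq_len(nrow(files))) {
#   download.file(url = files$url[k], destfile = files$destfile[k], mode = "wb")
# }
```

Separating "build the list" from "download" makes it easy to check the URLs first and to switch between `tmin` and `tmax`.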

4. The Himawari-8 Satellite

Himawari-8 is a satellite launched by the Japan Meteorological Agency (JMA) that began operating in 2015 as the successor to the MTSAT satellites. It has 16 channels and produces data every 10 minutes. The satellite carries the AHI (Advanced Himawari Imager) sensor and sits in a geostationary orbit at an altitude of 35,791 km. The spatial resolution of Himawari-8 data is 0.5 km (band 3), 1 km, or 2 km. Himawari-8 data is used to monitor rainfall, cloud-top temperatures, and sea surface temperatures.

#### Download Himawari data from LAPAN
# data source: http://modis-catalog.lapan.go.id/monitoring/#
# only the last few days are available

rm(list = ls())  # start from a clean workspace
#ext= brick("./compile/Himawari_IND_start2020-07-01.nc")
#date.ex=seq(as.Date("2021-09-01"),length.out = nlayers(ext),by='day')
from <- as.Date("2021-09-05")
dates <- seq(from, Sys.Date(), by = "day")
hr <- sprintf("%02d", 0:23)

# size of the remote file in bytes, read from the Content-Length header (needs the httr package)
download_size <- function(url) as.numeric(httr::HEAD(url)$headers$`content-length`)

for (tgl in seq_along(dates)) {
  string_date <- dates[tgl]

  for (i in seq_along(hr)) {
    myfile <- paste0("C:/", "himawari_", string_date, "-", hr[i], ".tif")  # save in your directory
    myurl <- paste0("http://modis-catalog.lapan.go.id/himawari-8/GeoTIFF/", string_date, "/", hr[i], "-00/produkRFR2.tif")

    if (!file.exists(myfile)) {
      tryCatch(download.file(url = myurl, destfile = myfile, mode = "wb", quiet = FALSE),
               error = function(e) print(paste0(basename(myfile), " = did not exist")))

      if (file.exists(myfile)) {
        # download again while the local file is smaller than the remote one
        while (file.info(myfile)$size < download_size(myurl)) {
          download.file(url = myurl, destfile = myfile, mode = "wb", quiet = FALSE)
        }
        print(paste0(basename(myfile), " = OK"))
      }
    }
  }
}
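The nested loop above combines every date with every hour. As a small sketch of the same idea, the hypothetical helper `himawari_files` builds that table of URLs and file names up front with `expand.grid`, so it can be checked before any download starts. The URL pattern is copied from the script above; the `out_dir` argument is an assumption added for illustration.

```r
# Build one row per date/hour combination with its URL and local file name.
# `himawari_files` and `out_dir` are hypothetical names used only in this sketch.
himawari_files <- function(dates, hours = 0:23, out_dir = ".") {
  grid <- expand.grid(
    hour = sprintf("%02d", hours),
    date = format(as.Date(dates), "%Y-%m-%d"),
    stringsAsFactors = FALSE
  )
  data.frame(
    url = paste0("http://modis-catalog.lapan.go.id/himawari-8/GeoTIFF/",
                 grid$date, "/", grid$hour, "-00/produkRFR2.tif"),
    destfile = file.path(out_dir,
                         paste0("himawari_", grid$date, "-", grid$hour, ".tif")),
    stringsAsFactors = FALSE
  )
}

jobs <- himawari_files(as.Date("2021-09-05"), hours = 0:1)
print(jobs)
```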

Those are 4 sources of climate data. Hopefully this article adds to your knowledge and proves useful in your own work with data. Thank you for reading.

References:

https://psl.noaa.gov/data/gridded/data.cpc.globaltemp.html
https://downloads.psl.noaa.gov/Datasets/cpc_global_temp/
http://modis-catalog.lapan.go.id/monitoring/
https://www.rdocumentation.org/packages/climate/versions/1.0.1
https://www.rdocumentation.org/packages/chirps/versions/0.1.2/topics/chirps
PowerPoint slides for practice session 4, climate data access, from the computational methodology course at IPB.
