Geo Exploration with Elasticsearch and Kibana

Li-Ting Liao
Published in Dev Diaries
Aug 9, 2020 · 7 min read

Taiwan River Pollution Status Data Visualization with Elasticsearch and Kibana

Goal

I’m always fascinated by data analytics tools like the Elastic Stack, Splunk, etc. This time I will use two components of the well-known Elastic Stack, Elasticsearch and Kibana, to showcase how we can use Elasticsearch’s fast storage and search features and then use Kibana to give shape to our data.

I will create a Kibana dashboard to present river pollution status in Taiwan. The dashboard will cover:

  1. Overall data: 117,883 rows, covering 68 rivers and over 300 monitoring sites, from Jan 2018 to Jun 2020.
  2. Selected indices for each river. The most important index here is the RPI (River Pollution Index, 河川污染指數), which is composed of several sub-indices: DO (dissolved oxygen, 水中溶氧量), BOD5 (biochemical oxygen demand, 生化需氧量), SS (suspended solids, 懸浮固體), and NH3-N (ammonia nitrogen, 氨氮). We can see how a river’s RPI changes over time, and can also look at each related sub-index in the line chart.
  3. Monitoring sites along a river, shown on a map.
demo

The dataset I’m going to use comes from the open-data website of Taiwan’s EPA. I’m using the river pollution monitoring datasets for this demo. They can be found here:

https://data.epa.gov.tw/en/dataset/wqx_p_01

Column names of the dataset:

  • SiteId 測站代碼
  • SiteName 測站名稱
  • SiteEngName 測站英文名稱
  • County 縣市
  • Township 鄉鎮市區
  • Basin 流域名稱
  • River 河川名稱
  • TWD97Lon TWD97經度
  • TWD97Lat TWD97緯度
  • TWD97TM2X TWD97二度分帶X
  • TWD97TM2Y TWD97二度分帶Y
  • SampleDate 採樣日期
  • ItemName 測項名稱:酸鹼值、水溫、河川污染分類指標、大腸桿菌群、溶氧飽和度、氯鹽、氨氮、化學需氧量、生化需氧量、溶氧(電極法)、懸浮固體、氣溫、導電度
  • ItemEngName 測項英文名稱:pH、Water Temperature、River Pollution Index、Coliform Group、Dissolved Oxygen Saturation、Chloride、NH3-N、Chemical Oxygen Demand、Biochemical Oxygen Demand、Dissolved Oxygen、Suspended Solid、Temperature、Conductivity
  • ItemEngAbbreviation 測項英文簡稱:pH、WT、RPI、Coliform、DOS、Cl、NH3-N、COD、BOD5、DO、SS、TMP、EC
  • ItemValue 監測值
  • ItemUnit 測項單位

So, let’s get started.

Process

  1. Keep the required columns in the dataset, add new columns, and transform the data types of several existing columns with Python 3.8

Read the CSV data into Jupyter Notebook as data frame:
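The notebook cell here was a screenshot, so as a minimal sketch of this step (the real file name isn’t shown in the article, and a small inline sample stands in for the actual CSV):

```python
import io
import pandas as pd

# In the real notebook this would be e.g. pd.read_csv("<downloaded file>.csv").
# An inline sample with a few of the dataset's columns stands in here.
csv_sample = io.StringIO(
    "SiteName,River,SampleDate,ItemName,ItemValue\n"
    "大直橋,基隆河,2019-03-10,河川污染分類指標,4.5\n"
    "中正橋,淡水河,20180115,河川污染分類指標,2.25\n"
)
# Read dates and readings as strings; both get cleaned up in later steps.
data = pd.read_csv(csv_sample, dtype=str)
print(data.shape)  # (2, 5)
```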

In the column “SampleDate”, some of the date formats are inconsistent, so I used the code below to take care of that:

data['SampleDate_New'] = [row.replace('-', '') for row in data['SampleDate']]

Now there’s a new column called “SampleDate_New”; next, sort the dataset by it:
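The sorting cell was also a screenshot; a minimal sketch of it, using a toy frame with the same column names:

```python
import pandas as pd

data = pd.DataFrame({"SampleDate": ["2019-03-10", "2018-01-15", "2020-06-01"]})
# Normalize as in the previous step: strip dashes so every date reads "yyyyMMdd".
data["SampleDate_New"] = [row.replace("-", "") for row in data["SampleDate"]]
# "yyyyMMdd" strings sort chronologically, so a plain string sort is enough.
data = data.sort_values(by="SampleDate_New").reset_index(drop=True)
print(data["SampleDate_New"].tolist())  # ['20180115', '20190310', '20200601']
```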

Since this dataset originally contained data going back to 2012 and totaled around 300K rows, I decided to only include data from Jan 2018 onwards, which is around 100K rows in total:

data_20182020 = data.loc[(data['SampleDate_New'] >= "20180000") & (data['SampleDate_New'] <= "20210000")]

I also dropped several unused columns:

data_20182020 = data_20182020.drop(columns=['TWD97TM2X', 'TWD97TM2Y', 'SampleDate'])

Next, I noticed that the column “ItemValue”, which contains the actual numeric readings of the chemical indices (e.g. RPI), is not in numeric format, which would cause problems when calculating in Elasticsearch. So I decided to transform the data type here: any “-” in “ItemValue” becomes “NaN”.

Then I added another column, “RPI_level”, which tags each RPI (河川污染分類指標) reading with a category level according to its “ItemValue”, following the Taiwan EPA’s website:

  • Not(Slightly) polluted 未(稍)受污染: RPI ≦2.0
  • Low level polluted 輕度污染:2.0<RPI≦3.0
  • Medium level polluted 中度污染:3.0<RPI≦6.0
  • High level polluted 嚴重污染:RPI>6.0

Code of the above two steps:
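The original code was a screenshot, so here is a sketch of what these two steps could look like (the helper name `rpi_level` is mine; the column names follow the article):

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset.
data_20182020 = pd.DataFrame({
    "ItemEngAbbreviation": ["RPI", "RPI", "RPI", "pH"],
    "ItemValue": ["1.5", "4.5", "-", "7.8"],
})

# Step 1: coerce "ItemValue" to numbers; "-" (non-numeric) becomes NaN.
data_20182020["ItemValue_float"] = pd.to_numeric(
    data_20182020["ItemValue"], errors="coerce")

# Step 2: tag each RPI reading with the EPA category level.
def rpi_level(row):
    # Only RPI rows get a level; everything else stays NaN.
    if row["ItemEngAbbreviation"] != "RPI" or pd.isna(row["ItemValue_float"]):
        return np.nan
    rpi = row["ItemValue_float"]
    if rpi <= 2.0:
        return "Not(Slightly) polluted"
    if rpi <= 3.0:
        return "Low level polluted"
    if rpi <= 6.0:
        return "Medium level polluted"
    return "High level polluted"

data_20182020["RPI_level"] = data_20182020.apply(rpi_level, axis=1)
print(data_20182020["RPI_level"].tolist())
```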

Lastly, export this updated dataset into a new CSV file “data_20182020_v3” for later use:

data_20182020.to_csv('data_20182020_v3.csv', index=False)

2. Upload the Jan 2018 to Jun 2020 data with specific mappings on the index in Elasticsearch, also adding geolocation (latitude and longitude) as a new field.

First, connect to my Elasticsearch engine:

http://[myip]:9200/

Then, start the Kibana and Metricbeat agents. Use the URL below to open the home page of the Elastic Stack:

http://[myip]:5601/

Upload my data_20182020_v3.csv here:

Right before uploading the dataset, Elasticsearch shows the first 1,000 rows of our data by default, so we can check whether all columns are parsed correctly so far. Since my time field is now the column “SampleDate_New” with the format “yyyyMMdd”, I made this change in “Override settings”:

Click “import” and come to the index mapping settings page. Several index settings need to be changed:

  • Create Index Pattern once I import the data
  • Type of column “SampleDate_New”: date
  • Type of column “ItemValue_float”: float

Here comes a tricky part: Kibana lets us plot geo points of our data on a map, but we need a few settings so the engine can recognize those points.

First, in Mappings, change the type of the columns “TWD97Lat” and “TWD97Lon” to “float”. Then, in both Mappings and the Ingest pipeline, add one more field called “GeoLocation” (this name can be changed):

# Mappings
"GeoLocation": {
  "type": "geo_point"
}

# Ingest pipeline
{
  "append": {
    "field": "GeoLocation",
    "value": ["{{TWD97Lat}},{{TWD97Lon}}"]
  }
}
“TWD97Lat” and “TWD97Lon” are mapped as float, and “GeoLocation” is added.
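The article applies these settings through Kibana’s import UI, but the same configuration can be written up front as plain request bodies. This is a sketch on my part (the index and pipeline names are made up); the commented lines show how the elasticsearch-py client could apply them:

```python
# Request bodies mirroring the import settings above (names are hypothetical).
mappings = {
    "properties": {
        "SampleDate_New": {"type": "date", "format": "yyyyMMdd"},
        "ItemValue_float": {"type": "float"},
        "TWD97Lat": {"type": "float"},
        "TWD97Lon": {"type": "float"},
        "GeoLocation": {"type": "geo_point"},
    }
}
pipeline = {
    "processors": [
        # Append "lat,lon" so Elasticsearch parses the value as a geo_point.
        {"append": {"field": "GeoLocation",
                    "value": ["{{TWD97Lat}},{{TWD97Lon}}"]}}
    ]
}

# With the elasticsearch-py client this could be applied roughly as:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://<myip>:9200")
#   es.ingest.put_pipeline(id="geolocation-pipeline", body=pipeline)
#   es.indices.create(index="river-data", body={"mappings": mappings})
print(mappings["properties"]["GeoLocation"])  # {'type': 'geo_point'}
```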

Once done with all the settings, click “Import”. If everything is configured correctly, the following success message appears:

Next, click “View index in Discover”. Now all raw data looks as we expected:

Looks fine so far!

3. Create visualizations on the dashboard with Kibana.

Here comes the fun part — creating a customized dashboard. It will include:

  • Selectors: to choose which chemical index and which river the user wants
  • Metrics: to show how many data points are in the dataset, from the overall count down to the number of rivers, counties crossed, monitoring sites, and chemical indices
  • Pie chart: to see the percentage of each RPI status across all rivers (or a specific river) over time
  • Line chart: to show how each chemical index changes over time, especially those related to the RPI 河川污染指標, e.g. NH3-N 氨氮, BOD5 生化需氧量, DO 溶氧(電極法), SS 懸浮固體
  • Map: to show the location of each monitoring site. Currently, the tooltips only show the latest sample date’s data
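Kibana builds these visuals through the UI, but as a rough query-level equivalent, the pie chart’s RPI breakdown could also be fetched with a plain terms aggregation. This body is my own sketch (the “.keyword” suffix assumes the default dynamic string mapping):

```python
# Count documents per RPI_level, restricted to RPI readings -- roughly what
# the pie chart visualizes. Field names follow the dataset prepared earlier.
query = {
    "size": 0,  # we only want the aggregation buckets, not the hits
    "query": {"match": {"ItemEngAbbreviation": "RPI"}},
    "aggs": {
        "rpi_breakdown": {"terms": {"field": "RPI_level.keyword"}}
    },
}
# e.g. es.search(index="river-data", body=query) with elasticsearch-py
print(list(query["aggs"]))  # ['rpi_breakdown']
```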

Let’s start from the top left.

Settings for the Selectors visual:

Both selectors are “option list”.

Settings for the Metrics visual:

Settings for the pie chart: here I used “Lens”, a new Kibana feature beyond the standard visuals. It suggests several recommended charts for your chosen data:

Settings for the line chart visual: here I used the timestamp, which is the same as my SampleDate_New, as the x-axis, and made sure there is one line for each of the chemical indices:

Finally, here to set up the map. First, choose “cluster map”:

I chose the “clusters” map here

Then click on the newly added map layer and start to edit the details:

There are 117,883 data points hovering over “Taiwan”!
I gave the map a name and specified tooltip fields.
The data here on the map will show RPI only.
Then I gave each dot’s color based on their RPI_level values.

Conclusion

My river pollution dashboard is finished! If I want to look at a specific river, I can run the following checkup. For example, I live in Taipei, and one of its well-known rivers is the Keelung River 基隆河. Let’s take a look.

There are 390 RPI data points for the Keelung River over our time range, Jan 2018 to Jun 2020. It has 13 monitoring sites and runs across 3 counties: New Taipei, Taipei, and Keelung:

Its RPI level is mostly split between Not(Slightly) polluted and Medium level polluted, which seems a bit odd:

Then we can see on the map that its upstream RPI is quite good, but the downstream RPI is in bad shape:

That’s all! Thanks for following along and hope you find this helpful :)

demo for Keelung river


Li-Ting Liao
Dev Diaries

Software developer by day, amateur writer by night. Passionate about both code and creativity, and always seeking new ways to learn and grow.