Labelling Areas of Coordinates with GeoPandas

Eka Aditya Pramudita
Published in Nerd For Tech
May 31, 2021

Working with spatial data can be cumbersome. Spatial data is usually stored as points, lines, or polygons whose coordinates can be drawn on a map. Thankfully, there is a Python package called GeoPandas that makes working with geospatial data in Python easier.

I recently worked on a spatial data analysis project in which GeoPandas helped me label coordinates with their corresponding areas. Labelling areas is important for identifying whether there is a spatial pattern in the data. In this article, I will discuss how to put an area label on given coordinates.

Photo by Timo Wielink on Unsplash

What is GeoPandas?

GeoPandas is an open-source project designed to make working with geospatial data in Python easier. It combines the capabilities of pandas and shapely, providing geospatial operations in pandas and a high-level interface to multiple geometries in shapely. The package can read almost any vector-based spatial data format, including GeoJSON files, and returns a GeoDataFrame object. The official GeoPandas documentation is worth checking while working on your project, as it explains comprehensively how to use the package.
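As a small illustration of the kind of object GeoPandas returns, the sketch below builds a GeoDataFrame from an in-memory GeoJSON feature collection using GeoDataFrame.from_features. The two features and their names are made up for this example and are not part of the Bandung data.

```python
import geopandas

# A hypothetical GeoJSON FeatureCollection with one point and one line
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "a point"},
            "geometry": {"type": "Point", "coordinates": [107.6, -6.9]},
        },
        {
            "type": "Feature",
            "properties": {"name": "a line"},
            "geometry": {
                "type": "LineString",
                "coordinates": [[107.6, -6.9], [107.7, -6.95]],
            },
        },
    ],
}

# Each feature becomes a row; its properties become ordinary columns
gdf_demo = geopandas.GeoDataFrame.from_features(
    feature_collection["features"], crs="EPSG:4326"
)
geometry_types = list(gdf_demo.geom_type)

print(gdf_demo)
print(geometry_types)
```

The result behaves like a regular pandas DataFrame with an extra geometry column holding shapely objects.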

Identifying Areas

For this article, I will use my latest spatial data analysis project as an example. The dataset provides coordinates of streets in Bandung, a big city in Indonesia. Here is a preview of the dataset:

df.head(5)
The dataset. Image by Author

Transform to Proper Geometry Data

The coordinates are in the geometry field. The values in that column are supposed to be shapely objects such as Point / MultiPoint, LineString / MultiLineString, and Polygon / MultiPolygon. We need shapely objects in order to build a GeoDataFrame, which implements nearly all of the attributes and methods of shapely objects, including checking whether a geometry lies within a certain area.

This dataset contains only LineString / MultiLineString geometries, and the values are still stored as GeoJSON dictionaries serialized as strings, so we need to convert them. Use this piece of code to transform them.

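import json
from shapely.geometry import LineString, MultiLineString

def to_shapely(geojson_str):
    # Parse the GeoJSON geometry string once, then build the matching shapely object
    geom = json.loads(geojson_str)
    if geom['type'] == 'LineString':
        return LineString(geom['coordinates'])
    if geom['type'] == 'MultiLineString':
        return MultiLineString(geom['coordinates'])
    return None  # any other geometry type is left empty

df['geometry'] = df['geometry'].apply(to_shapely)
df.head(5)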
Transformed geometry. Image by Author
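As a possible alternative to branching on the geometry type, shapely provides a shape() helper that builds the matching object from any GeoJSON-like dictionary. The sketch below uses two made-up geometry strings that only imitate the form of the raw column; the real values may differ.

```python
import json
from shapely.geometry import shape

# Hypothetical GeoJSON geometry strings, similar in form to the raw geometry column
raw_line = '{"type": "LineString", "coordinates": [[107.60, -6.90], [107.61, -6.91]]}'
raw_multi = (
    '{"type": "MultiLineString", "coordinates": '
    '[[[107.60, -6.90], [107.61, -6.91]], [[107.62, -6.92], [107.63, -6.93]]]}'
)

# shape() reads the "type" key and returns the corresponding shapely object
line = shape(json.loads(raw_line))
multi = shape(json.loads(raw_multi))

print(line.geom_type, multi.geom_type)
```

With this helper, the whole column could be converted with something like df['geometry'].apply(lambda x: shape(json.loads(x))), assuming every value is a valid GeoJSON geometry string.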

Create a GeoDataFrame Object

Now our geometry values are shapely objects. The next step is creating a GeoDataFrame object, which is a pandas.DataFrame with a geometry column. We can specify which column to use as the geometry and the Coordinate Reference System (CRS) of the geometry objects, given as an authority string such as "EPSG:4326" or as a WKT string.

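import geopandas

d = {'street': list(df['street']), 'geometry': list(df['geometry'])}
# crs only labels the coordinates as stored; it does not reproject them
# (longitude/latitude in degrees would be "EPSG:4326")
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3395")
gdf.head(5)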
GeoDataFrame object. Image by Author
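One point worth illustrating is that the crs argument only declares which system the stored coordinates are in, while to_crs actually converts them. The sketch below uses a single made-up point near Bandung, assumed to be given as longitude and latitude; it is not taken from the dataset.

```python
import geopandas
from shapely.geometry import Point

# One hypothetical point given as (longitude, latitude) in degrees
gdf_deg = geopandas.GeoDataFrame(
    {"street": ["example"]},
    geometry=[Point(107.6, -6.9)],
    crs="EPSG:4326",
)

# to_crs converts the coordinates themselves, here to World Mercator (metres)
gdf_m = gdf_deg.to_crs("EPSG:3395")

print(gdf_deg.crs.to_epsg(), gdf_deg.geometry.iloc[0])
print(gdf_m.crs.to_epsg(), gdf_m.geometry.iloc[0])
```

Whichever CRS is used, the streets and the area polygons should be in the same one before they are compared.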

Although an ordinary DataFrame and a GeoDataFrame look alike, the difference lies in the methods available on a GeoDataFrame, which implements nearly all of the attributes and methods of shapely objects.
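To make that difference concrete, here is a minimal sketch with two made-up lines showing geometry attributes that a plain DataFrame does not have. The street names and coordinates are placeholders.

```python
import geopandas
from shapely.geometry import LineString

gdf_demo = geopandas.GeoDataFrame(
    {"street": ["first", "second"]},
    geometry=[
        LineString([(0, 0), (3, 4)]),
        LineString([(0, 0), (0, 2), (2, 2)]),
    ],
)

# Length of each line, computed row by row from the geometry column
lengths = list(gdf_demo.length)

# Bounding box of all geometries as [minx, miny, maxx, maxy]
total_bounds = list(gdf_demo.total_bounds)

print(lengths)
print(total_bounds)
```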

Get Sub-Area of Bandung City Map

Now that we have transformed our DataFrame into a GeoDataFrame, we can set it aside for a while and get the area label data. Since the data lies within a single city, we can label the observations by sub-area, district, sub-district, or whatever division is relevant within the city. In my last project, I used Sub-Wilayah Kota (equivalent to sub-area, abbreviated as SWK) to label the observations.

The Bandung City government provides the data on its page. The data is stored in a JSON file, but it is usually referred to as GeoJSON because it contains geographical attributes. The GeoJSON data can be read directly from its URL:

url = "http://data.bandung.go.id/dataset/fbec5a0e-2efe-4d37-99ee-7a55d32beb20/resource/0a9aefc2-c849-4fb5-b393-b1687dbe8d70/download/3273-kota-bandung-level-kewilayahan.json"
gdf_swk = geopandas.read_file(url)
gdf_swk
SWK data. Image by Author

We can see that the geometry of the SWK data has a Polygon type, meaning the values represent areas. We can visualize them on a map:

import matplotlib.pyplot as plt

# Use a point guaranteed to lie inside each polygon as the label position
gdf_swk['coords'] = gdf_swk['geometry'].apply(lambda x: x.representative_point().coords[0])

fig, ax = plt.subplots(figsize=(10, 10))
gdf_swk.plot(ax=ax, color='yellow', edgecolor='black')
for idx, row in gdf_swk.iterrows():
    # The label is passed positionally because the `s` keyword was renamed to `text` in newer matplotlib
    ax.annotate(row['nama_wilayah'], xy=row['coords'], horizontalalignment='center', color='blue')
Bandung City Map divided by Sub-Wilayah Kota (SWK). Image by Author
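The plotting code places each label at representative_point() rather than at the centroid. A short sketch of why that can matter: for a concave area, the centroid may fall outside the polygon, while the representative point is guaranteed to lie within it. The U-shaped polygon below is made up for illustration.

```python
from shapely.geometry import Polygon

# A hypothetical U-shaped area: a 3x3 square with a notch cut into the top
u_shape = Polygon(
    [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
)

centroid = u_shape.centroid
label_point = u_shape.representative_point()

# Check whether each candidate label position is actually inside the area
centroid_inside = u_shape.contains(centroid)
label_point_inside = u_shape.contains(label_point)

print(centroid_inside, label_point_inside)
```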

Labelling Areas

Now for the main course: labelling areas! Since we have a map of area labels, we can use gdf.within.

The method checks whether a Point / MultiPoint or LineString / MultiLineString lies entirely within the selected area. The check takes quite a long time since every line is tested against every available area. It returns a boolean for every row, which we store as a binary (0/1) column for each area checked.

# Pair each area name with its polygon, then flag the streets that lie within it
for name, polygon in zip(gdf_swk['nama_wilayah'], gdf_swk['geometry']):
    gdf["wilayah_" + name] = gdf["geometry"].within(polygon).astype('int')
gdf.head(5)
Binary response for every area checked. Image by Author
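It may help to see how within treats geometries that only partly overlap an area. In the sketch below, which uses a made-up square and three made-up lines, a line that crosses the boundary is not counted as within, although intersects does report it.

```python
import geopandas
from shapely.geometry import LineString, Polygon

# A hypothetical square area and three hypothetical streets
area = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
streets = geopandas.GeoSeries(
    [
        LineString([(1, 1), (3, 3)]),  # entirely inside the area
        LineString([(3, 3), (6, 3)]),  # crosses the boundary
        LineString([(5, 5), (6, 6)]),  # entirely outside the area
    ]
)

inside = list(streets.within(area))
touching = list(streets.intersects(area))

print(inside)
print(touching)
```

A street that spans two areas would therefore get a 0 in every binary column, which is worth keeping in mind when reading the labels.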

Then we can put all the information into a single column that represents the area.

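def get_wilayah(row):
    # Return the area name of the first binary column set to 1 (None if nothing matches)
    for c in row.index:
        if row[c] == 1:
            return c[8:]  # strip the "wilayah_" prefix

# Select the binary columns by name rather than by position
wilayah_cols = [c for c in gdf.columns if c.startswith('wilayah_')]
gdf["wilayah"] = gdf[wilayah_cols].apply(get_wilayah, axis=1)
gdf.head(5)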
Image by Author
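The same collapse can also be sketched with plain pandas operations instead of a row-wise function. This is an alternative, not the approach used above; the area names A and B and the street labels are placeholders.

```python
import pandas as pd

# Hypothetical binary columns for two areas; the third street matches neither
flags = pd.DataFrame(
    {"wilayah_A": [1, 0, 0], "wilayah_B": [0, 1, 0]},
    index=["street 1", "street 2", "street 3"],
)

prefix = "wilayah_"

# idxmax gives the name of the first column holding the row maximum
label = flags.idxmax(axis=1).str[len(prefix):]

# A row of all zeros would otherwise get the first column, so blank it out
label = label.where(flags.any(axis=1))

print(label)
```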

We can drop the binary response columns to get a cleaner result.

gdf = gdf[['street', 'geometry', 'wilayah']]
gdf.head(5)
Final Result. Image by Author
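For comparison, GeoPandas also offers a spatial join that can attach area names in a single step. The sketch below is a different technique from the loop used in this article, and it assumes GeoPandas 0.10 or later, where the keyword is called predicate. The two square areas and three streets are made up; only the column names street and nama_wilayah follow the article.

```python
import geopandas
from shapely.geometry import LineString, Polygon

# Two hypothetical neighbouring areas
areas = geopandas.GeoDataFrame(
    {"nama_wilayah": ["West", "East"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
    ],
    crs="EPSG:4326",
)

# Three hypothetical streets; the last one crosses the shared boundary
streets = geopandas.GeoDataFrame(
    {"street": ["s1", "s2", "s3"]},
    geometry=[
        LineString([(0.5, 0.5), (1.5, 1.5)]),
        LineString([(2.5, 0.5), (3.5, 1.5)]),
        LineString([(1.5, 1.0), (2.5, 1.0)]),
    ],
    crs="EPSG:4326",
)

# Keep every street and attach the area it lies within, if any
joined = geopandas.sjoin(streets, areas, how="left", predicate="within")
labelled = joined[["street", "geometry", "nama_wilayah"]]

print(labelled)
```

Streets that are not fully inside any area keep a missing value in nama_wilayah, just as they would get no label from the binary columns.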

Conclusion

GeoPandas is a handy tool for spatial data analysis. It enables us to label coordinates according to specific areas. I hope this article helps those who are or will be working on spatial data analysis. Don't hesitate to ask me in the comment section if you find anything confusing.
