TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

A Practical Guide to Working with Geospatial data using QGIS: Part 1

Photo by Brett Zeck on Unsplash

Introduction

Recently, I worked on a research project that involved analyzing geospatial data. Visualization is an extremely important step in analyzing such data, because many patterns are easy to spot on a map. If the data contains 100 features, it helps to plot those features on a map and check whether the data intuitively makes sense. QGIS is an application that supports analysis, editing, and visualization of geographical data. In Part 1 of this tutorial, we will learn how to use QGIS to display maps and analyze geospatial data, and how to perform some basic manipulations on the data. In Part 2, I will talk about some more complex operations and how to use geopandas to manipulate geospatial data.

Introduction to Geographical data and QGIS

Geographical data contains a geometry component that gives the data a position on the map. Geometries can be of different types, such as Point/Multi-Point, Line/Multi-Line, and Polygon/Multi-Polygon.
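To make the geometry types concrete, here is a minimal sketch that writes one geometry of each basic type as a GeoJSON-style Python dictionary. The coordinates are made-up values for illustration only and are not taken from the dataset used later. Note that GeoJSON calls a line a "LineString".

```python
# Illustrative geometries as GeoJSON-style dictionaries.
# All coordinates are made-up (longitude, latitude) pairs.
point = {"type": "Point", "coordinates": [80.95, 26.85]}

line = {
    "type": "LineString",
    "coordinates": [[80.90, 26.80], [80.95, 26.85], [81.00, 26.83]],
}

# A polygon is a list of rings. Each ring is closed, so its first and
# last positions are identical.
polygon = {
    "type": "Polygon",
    "coordinates": [
        [[80.9, 26.8], [81.0, 26.8], [81.0, 26.9], [80.9, 26.9], [80.9, 26.8]]
    ],
}

# The "Multi" variants hold several geometries of the same kind.
multi_point = {
    "type": "MultiPoint",
    "coordinates": [[80.95, 26.85], [81.00, 26.83]],
}

for geom in (point, line, polygon, multi_point):
    print(geom["type"])
```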

QGIS is a free geographical information system application that supports viewing, editing, and analysis of geospatial data.

You can download QGIS from here.

Geospatial vector data is stored in different file formats, such as .shp (shapefile), .kml, .kmz, and .geojson. You can check all formats here: https://gisgeography.com/gis-formats/

For this tutorial, I will use a .shp format shapefile from the SEDAC website.

SEDAC is the Socioeconomic Data and Applications Center. It is one of the Distributed Active Archive Centers (DAACs) in the Earth Observing System Data and Information System (EOSDIS) of the U.S. National Aeronautics and Space Administration (NASA). SEDAC focuses on human interactions in the environment.

I will use a village-level shapefile of the Indian State of Uttar Pradesh. It contains socio-economic features at the village level. You can download it from here.

Load/Display Data in QGIS

First, open the QGIS application and click on “New Empty Project”. You can save the project by selecting Project -> Save As.

A. Load Shapefile Data

To load a shapefile, follow these steps:

  1. Go to Layer -> Add Layer -> Add Vector Layer.
  2. Select the .shp file, or select the directory that contains the shapefile, and click on Add.

Now, you can view the shapefile.

Step 1
Step 2
A view of Shapefile

To view the information about the shapefile:

  1. Right-click on the shapefile name from the left pane.
  2. Click on Properties -> Information. Here you can see all the details about the shapefile.

Shapefile information
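Some of what the Information tab reports, such as the geometry type and the extent, is stored in the fixed 100-byte header at the start of a .shp file. The sketch below parses that header following the published ESRI shapefile layout. The helper name `read_shp_header` is hypothetical, and the bytes it parses here are a synthetic header built in memory with made-up values, not the SEDAC file.

```python
import struct

# Shape type codes from the ESRI shapefile specification (2D types only).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte header at the start of a .shp file."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])  # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", data[24:28])  # length in 16-bit words
    version, shape_code = struct.unpack("<2i", data[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "version": version,
        "geometry": SHAPE_TYPES.get(shape_code, f"code {shape_code}"),
        "extent": (xmin, ymin, xmax, ymax),
        "file_bytes": length_words * 2,
    }


# Synthetic header standing in for a real file (all values are made up).
sample = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)
sample += struct.pack("<2i8d", 1000, 5, 77.0, 23.8, 84.7, 30.5, 0.0, 0.0, 0.0, 0.0)

info = read_shp_header(sample)
print(info)
```

For a real file, you could pass the first 100 bytes read from the .shp file opened in binary mode.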

B. Fix Geometry (Optional)

Sometimes, a shapefile contains invalid geometries. When we try to perform certain operations on such a shapefile, QGIS reports an “Invalid geometry” error. To fix the geometries, follow these steps:

  1. From the menu bar, go to Processing -> Toolbox, search for “Fix Geometries”, and open it. Select your shapefile as the input layer.
  2. Click on Run.
  3. A new layer is created. Its geometries are repaired wherever the original geometries were invalid.

Shapefile with geometries fixed

We will continue our work on this new shapefile. You can work on the original shapefile as well if it has no invalid geometries. However, I already know that this shapefile has some invalid geometries, and they would cause problems at a later point.
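One common cause of an invalid polygon is a ring that crosses itself (a "bowtie" shape). The sketch below detects that case with a plain segment-crossing test. It is only an illustration of what "invalid" can mean: it is not how QGIS checks or repairs geometries, it ignores rings that merely touch themselves, and the function names and coordinates are made up.

```python
def _orient(a, b, c):
    """Twice the signed area of triangle abc (>0 means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def ring_self_intersects(ring):
    """Check a closed ring (first vertex == last vertex) for crossing edges."""
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i in range(n):
        # Start at i + 2 so that neighbouring edges are never compared.
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # the first and last edges share a vertex
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]

print("square self-intersects:", ring_self_intersects(square))
print("bowtie self-intersects:", ring_self_intersects(bowtie))
```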

C. View Data Attributes

Now, the shapefile is loaded. But how do we know which variables/attributes are in the shapefile? To view the attributes:

  1. Right-click on the shapefile name in the left pane.
  2. Click on “Open Attribute table”.

Now, you can see all the attributes present in the data.

Step 1
Attribute Table

D. Display data attributes on the map

Let’s display a particular attribute from the shapefile on the map. An attribute can be one of two types: a discrete (categorical) variable or a continuous value.

Here are the steps to do that:

  1. Right-click the shapefile name in the left pane.
  2. Click on Properties.
  3. Click on Symbology.
  4. At the top, click on the drop-down menu. You will see options like Single Symbol, Categorized, Graduated, etc.
  5. To display a categorical attribute, click on Categorized.
  6. Below that, there is an option called “Value” for selecting the attribute.
  7. Select “DID”, which is the unique district ID.
  8. At the bottom-left corner, click on “Classify”, then press “OK”.

You will see something like the image below.

Displaying categorized attributes on the map
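Conceptually, the Categorized option assigns one symbol to each distinct value of the chosen attribute. Here is a minimal sketch of that idea. The district IDs are made up, the function name `categorized_colors` is hypothetical, and the evenly spaced hues are my own choice rather than the colors QGIS would pick.

```python
import colorsys


def categorized_colors(values):
    """Map each distinct value to an evenly spaced hue, as a hex color string."""
    categories = sorted(set(values))
    colors = {}
    for i, category in enumerate(categories):
        r, g, b = colorsys.hsv_to_rgb(i / len(categories), 0.6, 0.9)
        colors[category] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


# Made-up district IDs, one per village.
did_values = [101, 102, 101, 103, 102, 101]
legend = categorized_colors(did_values)

for district, color in legend.items():
    print(district, color)
```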

  9. To display a continuous value, follow the same steps as above, but click on “Graduated” instead of “Categorized”.
  10. In “Value”, select the attribute “TOT_P”, which is the total population.
  11. In the bottom-left corner, there is a drop-down menu called “Mode”. It divides the continuous variable into intervals using a particular mode, such as Equal Count, Equal Interval, Logarithmic Scale, Natural Breaks, etc.
  12. We will use “Natural Breaks”. Natural Breaks looks for natural groupings in the data: it places the class boundaries so that values within a class are similar to each other and the classes differ from one another.
  13. At the bottom-right corner, there is a parameter called “Classes”. It sets the number of intervals the variable is divided into. We will create 15 classes.
  14. Click on “Classify”.

You will see something like this.

Displaying a graduated/continuous attribute on the map
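To see how the choice of mode changes the class boundaries, here is a small sketch of two of the simpler modes, Equal Interval and Equal Count. It is an approximation under my own assumptions: the population values are made up, the function names are hypothetical, and QGIS may compute its breaks slightly differently. Natural Breaks is more involved and is not implemented here.

```python
import statistics


def equal_interval_breaks(values, classes):
    """Upper bounds of `classes` bins of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    return [lo + width * k for k in range(1, classes + 1)]


def equal_count_breaks(values, classes):
    """Upper bounds chosen so each bin holds roughly the same number of values."""
    cuts = statistics.quantiles(values, n=classes, method="inclusive")
    return cuts + [max(values)]


# Made-up village populations with a long right tail.
populations = [120, 150, 180, 200, 260, 300, 450, 900, 2500, 8000]

print("Equal Interval:", equal_interval_breaks(populations, 4))
print("Equal Count:   ", equal_count_breaks(populations, 4))
```

With skewed data like this, Equal Interval puts most villages into the first class, while Equal Count spreads them evenly across the classes.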

Manipulating Data using QGIS

We can perform a lot of operations using QGIS. For example: converting polygons to centroids, dissolving boundaries based on a particular attribute, etc. We will see a few operations here.

A. Dissolve geometries by a particular attribute

All geometries in this shapefile are polygons, and they include the village boundaries. But what if we want to merge all the villages inside each district and keep only the district boundaries? The “Dissolve” operation does exactly that: it merges all the polygons that share the same value of a chosen attribute into one single polygon. Here are the steps to dissolve boundaries:

  1. Go to Vector -> Geoprocessing Tools -> Dissolve.
  2. Find the “Dissolve field(s)” option. We want to dissolve everything inside a district, and the “DID” variable is the unique district ID, so we will dissolve by the attribute “DID”.
  3. After selecting “DID” as the dissolve field, click on “Run”.

You will see something like this:

Dissolved geometries shapefile
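The idea behind Dissolve can be sketched with a toy model. Assume the villages tile each district without overlaps and that neighbouring villages share exactly matching edges. Then an edge used by two villages of the same district is an internal boundary and can be dropped, while an edge used only once lies on the district outline. This is a simplification for illustration, not the general geometric union that QGIS performs, and the data and function name are made up.

```python
from collections import Counter, defaultdict


def dissolve_boundaries(features, key):
    """Group polygon rings by `key` and keep only the outer edges of each group.

    An edge shared by two polygons of the same group is counted twice and
    dropped. An edge counted once lies on the dissolved boundary.
    """
    edge_counts = defaultdict(Counter)
    for feature in features:
        ring = feature["ring"]  # closed ring: first vertex == last vertex
        for a, b in zip(ring[:-1], ring[1:]):
            edge = tuple(sorted((a, b)))  # ignore edge direction
            edge_counts[feature[key]][edge] += 1
    return {
        group: sorted(edge for edge, count in counts.items() if count == 1)
        for group, counts in edge_counts.items()
    }


# Three made-up unit-square "villages" in two districts.
villages = [
    {"DID": 1, "ring": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]},
    {"DID": 1, "ring": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]},
    {"DID": 2, "ring": [(2, 0), (3, 0), (3, 1), (2, 1), (2, 0)]},
]

districts = dissolve_boundaries(villages, "DID")
for did, edges in districts.items():
    print(did, len(edges), "boundary edges")
```

The edge between the two villages of district 1 disappears, while the edge between district 1 and district 2 is kept in both outlines.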

B. Convert polygons to centroids

You can also convert polygons to centroids. We will convert this “Dissolved” shapefile to centroids.

  1. Go to Vector -> Geometry Tools -> Centroids
  2. Select the file “Dissolved” and click on “Run”.

You will see something like this:

Generate centroids
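For reference, the centroid of a simple polygon can be computed from its vertices with the standard area-weighted (shoelace) formula. The sketch below assumes a single closed ring without holes. The rectangle is made up, and I am not claiming this is the exact routine QGIS runs internally.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a closed ring."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # The centroid formula divides by 6 * area, which equals 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# A made-up 4 x 2 rectangle; the first vertex is repeated to close the ring.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2), (0, 0)]
print(polygon_centroid(rectangle))
```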

Display a Street Map under the shapefile

You can also display a street map beneath the “Dissolved” shapefile.

  1. First, in the left pane, at the bottom, click on “Browser”.
  2. Among the many options, double-click on “OpenStreetMap” (listed under “XYZ Tiles” in recent QGIS versions).
  3. In the left pane, at the bottom, click on “Layers” again. You will see a new layer called “OpenStreetMap”.

Open Street Map

  4. To overlay the dissolved map on top of the street map, tick both layers and drag the “Dissolved” layer above the “OpenStreetMap” layer.
  5. At this point, the street map is hidden behind the shapefile, because the opacity of a layer is 100% by default. To change this, right-click on the “Dissolved” layer.
  6. Go to Properties and find the Opacity setting (in recent QGIS versions, it is on the Symbology tab under Layer Rendering).
  7. Set the opacity to 50% and click on “OK”.

Now, we can see both layers. You can also zoom in to get a better view. The “Magnifier” option is at the bottom center of the window.

Viewing open street map along with other shapefile
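Why does 50% opacity let the street map show through? With ordinary alpha blending, each displayed pixel is a weighted average of the top layer and the layer beneath it. The sketch below shows that arithmetic for a single pixel. The colors are made up and the function name `blend` is hypothetical. QGIS also offers other blending modes, which are not covered here.

```python
def blend(top_rgb, bottom_rgb, opacity):
    """Composite a semi-transparent top color over an opaque bottom color."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * top + (1 - opacity) * bottom)
        for top, bottom in zip(top_rgb, bottom_rgb)
    )


district_fill = (200, 40, 40)       # made-up fill color of the top layer
street_map_pixel = (240, 240, 220)  # made-up color of the map underneath

print(blend(district_fill, street_map_pixel, 1.0))  # street map fully hidden
print(blend(district_fill, street_map_pixel, 0.5))  # both layers visible
```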

Saving a shapefile

Now, let’s save the centroid file. To save a shapefile, follow these steps:

  1. In the left pane, right-click on the layer you want to save (here, “Centroids”).
  2. Click on Export -> Save Features As.
  3. Enter the desired output filename. You can also choose which attributes to save by checking or unchecking the boxes beside them.
  4. Finally, click on OK, and your file is saved.
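The same idea, exporting features while keeping only selected attributes, can be sketched in plain Python. Note the swap: this example writes GeoJSON rather than a shapefile, because GeoJSON can be written with the standard library alone. The centroid coordinates, the attribute values, the “NOTE” field, and the function name `export_features` are all made up for illustration.

```python
import json
import os
import tempfile


def export_features(features, path, keep_fields):
    """Write point features to a GeoJSON file, keeping only `keep_fields`."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": list(f["xy"])},
                "properties": {name: f["attrs"][name] for name in keep_fields},
            }
            for f in features
        ],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(collection, fh, indent=2)


# Made-up district centroids with three attributes each.
centroids = [
    {"xy": (80.9, 26.8), "attrs": {"DID": 1, "TOT_P": 4500, "NOTE": "a"}},
    {"xy": (82.1, 27.3), "attrs": {"DID": 2, "TOT_P": 7200, "NOTE": "b"}},
]

out_path = os.path.join(tempfile.mkdtemp(), "centroids.geojson")
export_features(centroids, out_path, keep_fields=["DID", "TOT_P"])

# Read the file back to confirm what was saved.
with open(out_path, encoding="utf-8") as fh:
    saved = json.load(fh)
print(len(saved["features"]), "features saved to", out_path)
```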

Conclusion

This was a basic tutorial for beginners on how to work with geospatial data using QGIS. Now, you know how to load data into QGIS and analyze it. We also did some basic manipulations on data. I hope it was helpful to you.

Next Steps

In Part 2 of this article, I will try to cover some more complex operations using QGIS. I will also talk about how to use geopandas with Python to read and manipulate shapefiles.

I hope you found this article useful.

Let me know if you have some other issues regarding QGIS and need help with it. I will try to cover them in future articles.

Thank you so much for reading! 🙂

Published in TDS Archive

Written by Aditya Dutt

Machine Learning PhD Student at University of Florida (he/him) https://adityadutt.github.io/
