Getting Started with QGIS

Geospatial Data Types, Selection Methods & Symbology

Grga Bašić
Beyond the Anthropocene
14 min read · Mar 29, 2022


Premise and Objectives

This module covers working with raster and vector spatial data within QGIS. After completing this exercise, you will have:

  • Become familiar with the QGIS user interface
  • Learned the basics of adding, interpreting, and visualizing raster and vector data in QGIS
  • Learned data selection methods using tabular and spatial queries
  • Explored basic techniques of labeling and symbology

You will then post your work-in-progress to Are.na. The requirements for {L01} are listed at the end of this tutorial; you can jump to the assignment here.

Prep

Download the data repository for this class using this link. Save the downloaded folder to your preferred working location. (We highly recommend establishing a single folder or directory for all of the work and tutorials associated with the class.) The “DATA” folder will have most of the datasets needed for tutorials.

Setting Up QGIS

Launch QGIS. Your new blank map project will look something like this:

Begin to familiarize yourself with the interface. Yours may not look exactly the same as the layout shown above. For instance, most toolbars and panels are movable, allowing you to set up your preferred workspace. You can always add or remove these workspace elements by clicking View > Panels or Toolbars (from the main menu bar). For more information, refer to this brief description of the elements of the interface.

Important Note: Like all GIS software, QGIS provides an environment in which we work with spatial data (whether vector- or raster-based). Datasets are not stored within QGIS. Instead, layers stored elsewhere are assembled within a map project where we can visualize, compare, manipulate, and analyze them together. Within the map project, we can also compose and export a map. Storing your data outside of the project allows you to include a particular data layer in several map projects without having to produce multiple copies of it, but it carries two important implications:

  1. Maintaining clear file organization is essential. The data layers associated with your map projects are linked to each project from their stored location. If those files are moved or reorganized, their links will break, and you will need to re-establish their linkage.
  2. GIS software continuously reads the path to each data layer to access the information stored within your datasets. We highly recommend avoiding spaces in folder and file names so that these paths are read reliably.
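
If you would like to check a path before saving data there, a small Python sketch like the one below can flag folder or file names that contain spaces. The function name and the example paths are hypothetical and only illustrate the idea; they are not part of QGIS.

```python
from pathlib import PurePosixPath


def parts_with_spaces(path):
    """Return the folder or file names in `path` that contain a space."""
    return [part for part in PurePosixPath(path).parts if " " in part]


# Hypothetical paths, for illustration only.
print(parts_with_spaces("GIS_class/DATA/Vector/Cities/World_Cities.shp"))
print(parts_with_spaces("GIS class/DATA/My Maps/lab_01.qgz"))
```

An empty list means every part of the path is free of spaces; otherwise the list names the parts you may want to rename.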

Save your (currently empty) map project in your working folder by clicking Project > Save As... on the main menu.

Your project now appears in the Browser Panel under the Project Home folder.

Adding Vector Data

There are several ways to add data to a project. We will begin by using the Open Data Source Manager button located on the Data Source Manager Toolbar. If this toolbar is not present, remember you can access it by clicking through View > Toolbars > Data Source Manager from the main menu.

  • Click the Add Vector Layer button to reveal the Vector options
  • Navigate to the Vector/Cities/World_Cities folder, which contains the “World_Cities” shapefile we will use for this project. (Alternatively, you can always use the Browser Panel to navigate through your files and drag them onto your map project.)
Data Source Manager > Vector > Browse > ../World_Cities.shp > Add

There are several different file extensions here that may be unfamiliar. Shapefiles are collections of separate files that contain different information or perform specific roles. The primary component files are as follows:

  • .shp — The main file that stores the feature geometry (required)
  • .shx — The index file that stores the index of the feature geometry (required)
  • .dbf — The dBASE table that stores the attribute information of features (required)
  • .sbn and .sbx — The files that store the spatial index of features
  • .prj — The file that stores the coordinate system information

For more information on these extensions and others, see this explanation by ESRI.

There are several different components of a shapefile. Always keep them together in the same folder.

Important Note: The files associated with a shapefile must stay together in the same folder; otherwise, QGIS will not be able to load the layer or may not read it correctly.
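
As a quick sanity check, the sketch below looks for the three required component files next to a given .shp file. It is a minimal illustration in plain Python, not a QGIS feature; the function name and the temporary "demo" files are made up for the example.

```python
import tempfile
from pathlib import Path

# The three components listed above as required.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the required extensions that are not found next to `shp_path`."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# Demo with empty placeholder files in a temporary folder (no real data needed).
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf", ".prj"):
        (Path(folder) / f"demo{ext}").touch()
    print(missing_components(Path(folder) / "demo.shp"))  # ['.shx']
```

Note that the check is case-sensitive on some systems, so a file named DEMO.SHX would not count as demo.shx there.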

Add the World_Cities.shp and admin_0_countries.shp layers to your scene.

The cities layer is represented by points, and the countries layer by polygons. The order of the layers can be controlled with the Layers panel (click and drag to reorder). You can toggle layers on and off by clicking the check-mark next to their name, allowing you to choose which are visible when multiple layers are added to a project.

You can use the Browser panel to add layers. Use the Layers panel to change the order of appearance.

Note that the colors associated with each point and polygon are the same for all features in the layer, and the color assigned is arbitrary. Save your map project.

Attribute Table and Interactive Selection

Each point in the Cities layer and each polygon in the Countries layer corresponds to a row in a data table called an Attribute Table. To access the Attribute Table of a layer (and thus inspect the attributes of each feature), right-click on the layer name in the Layers panel and choose Open Attribute Table.

In the case of this basic administrative boundary layer, the Attribute Table contains information commonly used to identify countries.

To quickly identify which polygon corresponds to a given feature within the Attribute Table, you can interactively select a feature by clicking on the row number. This will highlight the feature in the attribute table and the data frame.

To dock the Attribute Table to the main window while interactively selecting features, click the `Dock Attribute Table` button on the table’s menu bar. To zoom into the selected feature, right-click on the feature in the Table and select `Zoom to Feature`.

Of course, the relationship between the geometry and attributes of a feature also works in the opposite direction. Using the Selection Tool from the Selection Toolbar (chosen in the GIF below), you can interactively select polygons and highlight them in the Attribute Table (make sure the layer you want to select features from is highlighted in the Layers Panel). Further, you can isolate selected features from the table: choose Show Selected Features from the Attribute Table’s filter drop-down menu.

Select Features > Show Selected Features
Click the `Deselect Features` button on the Selection Toolbar (or on the Attribute Table’s Toolbar) to deselect any selected features. Click the `Pan Map` button on the Map Navigation Toolbar to exit the `Select Features` tool.

On your own: interactively select and inspect features and attributes of the World Cities layer.

Data Querying

Select by Attributes

Another selection technique is to directly query the Attribute Table based on the fields and values within the dataset — a method usually referred to as Select by Attributes. Open the layer’s attribute table and observe the information stored in various fields. Note that some fields contain nominal information, and others contain quantitative information — e.g. population (POP) or population rank (POP_RANK), which ranks cities from 1 (most populous) to 7 (least populous). By querying the World Cities dataset, we can answer questions such as:

  • How many cities in the dataset have the status of “National and provincial capital”? Or,
  • What cities in the dataset have populations greater than two million?

To answer (and visualize) these questions, we will first select a subset of features from the World Cities layer with the status of “National and provincial capital.” Then we will export them as a separate layer. We will do the same for cities with populations greater than two million.

There are multiple routes to select features within a dataset: we can either open its Attribute Table and click the Select features using an expression button, or select the dataset in the Layers panel and navigate to the Edit > Select > Select Features by Expression... option. (The same icon should show up as a new shortcut button in your toolbar after you have performed this action once.) Either route will open the Select by Expression dialogue box:

The header of the dialogue box tells us what layer we’re selecting the features from. We will combine the field name with other operators to build an expression on the left side of the box.

We want to select just those cities with the status of “National and provincial capital.” To do this, follow these steps:

  • Expand Fields and Values
  • Double-click on the Status field and it will appear in the expression box on the left (the Values sub-window will appear on the right)
  • Click All Unique (this will load all six unique field values)
  • Double-click the = operator (you can access all operators, e.g. +, -, *, <, >, by expanding Operators)
  • Double-click National and provincial capital
  • Your query should look like this: "STATUS" = 'National and provincial capital'
  • Click Select features
The Attribute Table header will tell you how many features were selected: we have selected 158 cities.
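
Conceptually, a Select by Attributes query tests every row of the Attribute Table against a condition and keeps the rows that pass. The sketch below mimics that logic in plain Python. The three sample rows, the city names, and the "Provincial capital" status value are hypothetical stand-ins, not values taken from the World Cities dataset.

```python
# Hypothetical sample rows; real values come from the layer's Attribute Table.
cities = [
    {"CITY_NAME": "City A", "STATUS": "National and provincial capital", "POP": 2500000},
    {"CITY_NAME": "City B", "STATUS": "Provincial capital", "POP": 800000},
    {"CITY_NAME": "City C", "STATUS": "National and provincial capital", "POP": 1200000},
]


def select_by_attribute(rows, field, predicate):
    """Keep the rows whose value in `field` satisfies `predicate`."""
    return [row for row in rows if predicate(row[field])]


# Equivalent in spirit to: "STATUS" = 'National and provincial capital'
capitals = select_by_attribute(
    cities, "STATUS", lambda value: value == "National and provincial capital"
)

# Equivalent in spirit to: "POP" > 2000000
large = select_by_attribute(cities, "POP", lambda value: value > 2000000)

print(len(capitals))                          # 2
print([row["CITY_NAME"] for row in large])    # ['City A']
```

Text values are compared as quoted strings and numeric values as bare numbers, which mirrors how the two expressions in this tutorial are written.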

We will now save those 158 cities as a separate shapefile:

  • Right-click on World_Cities in the Layers Panel and choose Export > Save Features As...
  • The Save Vector Layer as… dialogue box will open
  • Set “Format” to ESRI Shapefile
  • Navigate to your working folder and save the shapefile as National_provincial_capitals.shp

On your own: create a separate shapefile containing cities with populations greater than two million and save it as World_Cities_2mil.shp (your expression should read "POP" > 2000000).

To query fields containing quantitative information, such as `POP` (Integer), type values directly in the expression box.

Select by Location

The Select By Location tool allows you to select features based on their location relative to features in another layer. For instance, you may want to know how many cities with populations greater than two million are in India:

  • Interactively select India (or use the expression "NAME" = 'India')
  • On the main menu, navigate to Vector > Research Tools > Select By Location

The Select by location dialogue box will open. Set the following parameters:

  • “Select features from” → World_Cities_2mil
  • “Where the features…” → are within
  • “By comparing the features from” → admin_0_countries
  • Make sure the Selected features only option is checked
  • Click Run
Running these parameters (with “India” from `admin_0_countries` selected) will create a new selection of thirteen Indian cities with populations larger than two million.
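
To get a feel for what an “are within” test does for point features, here is a minimal sketch of the classic ray-casting point-in-polygon check. It is not how QGIS implements Select By Location; the square "country" and the two "cities" are made-up coordinates used only for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices; the last vertex connects back
    to the first. Points lying exactly on an edge get no special handling.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square polygon and two hypothetical points.
country = [(0, 0), (10, 0), (10, 10), (0, 10)]
cities = {"inside_city": (5, 5), "outside_city": (15, 5)}

selected = [name for name, (x, y) in cities.items()
            if point_in_polygon(x, y, country)]
print(selected)  # ['inside_city']
```

The idea is to count how many polygon edges a ray drawn from the point crosses: an odd count means the point is inside, an even count means it is outside.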

Adding Raster Data

Navigate to ../Raster/Nightlights in the Browser panel and drag BlackMarble_2016_3km_gray_geo.tif to the scene. Toggle off the visibility of all other layers.

Nighttime lights imagery has been widely embraced as a proxy for urbanization, population distribution, and economic activity. For more information on the Nighttime Lights dataset, read this article.

Raster data consists of a matrix of pixels (or cells) organized into a grid. Each pixel contains a value representing information: in this case, the values represent the average amount of light emitted at nighttime within a year as captured by the NASA/NOAA Suomi NPP satellite. Unlike a vector layer, which has an Attribute Table in which each point, line, or polygon can have multiple values associated with it, a raster grid cell can hold only one value. The raster we’re using here is a single-layer 8-bit grayscale image, meaning that its values range from 0 to 255.

Single-layer grayscale image, showing the digital numbers on the left, and their visualization on the right. By Anders Knudby, CC BY 4.0.
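
The same idea can be written out as a tiny Python sketch: a raster is a grid of rows and columns with exactly one number per cell, and 8 bits allow 2^8 = 256 distinct values (0 to 255). The 3 x 4 grid below is made up for illustration and is not taken from the Nighttime Lights file.

```python
BITS = 8
MAX_VALUE = 2 ** BITS - 1  # 255, the largest value an 8-bit cell can hold

# A made-up 3 x 4 grid of digital numbers (rows x columns).
grid = [
    [0, 12, 40, 0],
    [5, 255, 180, 20],
    [0, 30, 60, 0],
]

rows, cols = len(grid), len(grid[0])
values = [value for row in grid for value in row]

# Every cell holds a single value within the 8-bit range.
assert all(0 <= value <= MAX_VALUE for value in values)

print(rows, cols)                # 3 4
print(min(values), max(values))  # 0 255
print(grid[1][1])                # 255 (row 1, column 1, counting from 0)
```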

Not all raster data are satellite images; examples of other types of raster datasets include elevation, population (cell values corresponding to the number of people living in each grid cell), land use/land cover class, etc.

Intro to Symbology

We are now entering the design phase of our mapping project. To start, we will change the appearance of the World Cities layer. (Toggle off the Nightlights and bring back the Cities.)

Click through Project > Properties..., and in the General tab, change the “Background Color” to black (the data frame’s background color serves as the color of any area not covered by a layer).

Proportional Symbols & Colors

We will symbolize the Cities layer with circles that are sized proportionally to their total populations — a city with a larger population will have a larger circle and vice versa:

  • Right-click on the World_Cities layer and choose Properties...
  • In the Symbology tab (third from the top), choose Graduated.
  • Set the “Value” to POP, “Method” to Size, and “Mode” to Natural Breaks (Jenks).
  • Click Classify and then click Apply. The populated places will now be sized according to their population.

Let’s switch the “Method” from size to color and observe the results:

To render more populous cities brighter on the map, click anywhere on the color ramp, set `Color 1` to `Transparent` and `Color 2` to yellow (or choose a different color combination altogether).

Bonus: what if you want to control both color and size of the symbols?

Once you’ve symbolized the layer based on color, click on Symbol (dot) to open the Symbol Settings dialogue box and then click on “Simple Marker.” Next to the Size setting, there is a small icon:

  • Click on it and select Assistant
  • In the pop-up box, set “Source” to POP
  • Click the Fetch value range from layer (blue curly arrows) button
  • Set the desired size range and click OK
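
One simple way to think about mapping a value range to a size range is linear interpolation, sketched below. This is an assumption made for illustration: QGIS has its own scaling options in the Assistant, and they may not match this formula. The function name and the numbers are hypothetical.

```python
def scale_size(value, value_min, value_max, size_min, size_max):
    """Linearly map `value` from [value_min, value_max] to [size_min, size_max]."""
    if value_max == value_min:
        return size_min  # avoid dividing by zero when all values are equal
    t = (value - value_min) / (value_max - value_min)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside the range
    return size_min + t * (size_max - size_min)


# Hypothetical population range (0 to 10 million) and symbol sizes (1 to 10).
print(scale_size(5_000_000, 0, 10_000_000, 1.0, 10.0))   # 5.5
print(scale_size(20_000_000, 0, 10_000_000, 1.0, 10.0))  # 10.0 (clamped)
```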

On your own: experiment with symbol colors and size ranges based on different field values.

Labels

Labels are textual information displayed on maps. They add details you could not necessarily represent using symbols or geometry. In a GIS software environment, a label refers specifically to a piece of text on the map that is dynamically placed and whose text string is derived from one or more feature attributes. Technically, any information stored in the Attribute Table of a vector layer can be textually displayed as a label on a map.

To label cities on our map, follow these steps:

  • Right-click on the World_Cities layer and choose Properties...
  • In the Labels tab (fourth from the top), choose Single Labels
  • Set “Value” to CITY_NAME
  • Choose desired label font, style, size, and color (refer to the “Text Sample” appearance)
  • Click OK

This will display the name of every single city in our World Cities feature class:

To label only a subset of the cities, and make the map more legible, we can create a filter querying the World Cities feature class within the Labels tab using expressions (the same method we used to Select by Attributes):

  • In the Labels tab, choose Rule-based Labeling
  • Click Add rule (green “+” button) and a new window will pop up
  • Open Expression String Builder by clicking the ε button by the “Filter” field
  • Build a query: "POP_RANK" = 1 (this will filter cities with populations of 5,000,000 and greater)
  • Scroll down and set desired label font, style, size, and color
  • Click OK and save your map project
Formatting the label text

Raster Symbology

Next, we will change the symbology of our Nighttime Lights layer. Make it visible and drag it to the top of the Layers panel.

Open the Properties menu for the Nighttime Lights layer and navigate to the Symbology tab. You’ll notice that this looks different from the style menu we have been working with for our vector layers. Instead of a symbol type, we have a “Render type” field and many options for how to color the bands in our dataset. Given that we are working with a single-layer grayscale image, our symbology options are fairly limited. Still, we can experiment with color gradients. (Raster datasets can be made up of multiple bands, as in multispectral satellite imagery. The full power of raster symbology would be better showcased if we were working with such images, but that’s outside the scope of this tutorial.)

If you expand the section labeled Min / Max Value Settings, you will notice that the min and max values for the color gradient do not need to correspond to the full value range of the raster (0–255). They can also be calculated using a Cumulative count cut or Mean +/- standard deviation × X, both of which reduce the influence of outliers. With its default settings, the Cumulative count cut takes into account only the values between the 2% and 98% marks of the cumulative distribution. Select this option and click Apply to see how the image changes.
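
As a rough illustration of what a cumulative count cut does, the sketch below sorts a list of cell values, picks the values found at the 2% and 98% positions, and then stretches the display brightness between those two limits. The nearest-rank method and the sample values are assumptions made for the example; QGIS may compute its cut differently.

```python
def cumulative_count_cut(values, lower=0.02, upper=0.98):
    """Return the (low, high) values at the given cumulative fractions.

    Uses a simple nearest-rank lookup on the sorted values.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


def stretch(value, low, high):
    """Map `value` to a 0..255 brightness, clipping outside [low, high]."""
    if high == low:
        return 0
    t = (value - low) / (high - low)
    return round(255 * min(max(t, 0.0), 1.0))


# Made-up cell values: 0..99 plus a single very bright outlier.
values = list(range(100)) + [255]
low, high = cumulative_count_cut(values)

print(low, high)                 # 2 98
print(stretch(2, low, high))     # 0
print(stretch(255, low, high))   # 255 (the outlier is clipped to full brightness)
```

Because the outlier no longer sets the top of the range, the remaining values spread across more of the available brightness levels.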

Now change the “Render type” to Singleband pseudocolor to get something more similar to a symbology we would do for a vector file:

  • Set the Mode from Continuous to Equal Interval
  • Change the “Color gradient” to a ramp with at least three distinct colors
  • Click Classify to load the values and then hit Apply to see it on the map
  • Save your map project
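
Equal Interval classification divides the value range into classes of identical width. The sketch below shows the arithmetic for an 8-bit raster; the choice of five classes is an assumption for the example, and the function names are hypothetical.

```python
def equal_interval_breaks(vmin, vmax, classes):
    """Return the upper bound of each of `classes` equally wide intervals."""
    width = (vmax - vmin) / classes
    return [vmin + width * i for i in range(1, classes + 1)]


def classify(value, breaks):
    """Return the index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # values above the last break go in the top class


breaks = equal_interval_breaks(0, 255, 5)
print(breaks)                 # [51.0, 102.0, 153.0, 204.0, 255.0]
print(classify(52, breaks))   # 1
print(classify(255, breaks))  # 4
```

Each class index would then be drawn with one color from the ramp, which is why a ramp with several distinct colors makes the classes easier to tell apart.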

Experiment with your map by changing symbology, adding labels and combining multiple raster and vector layers added to the scene.

Parts of this tutorial were adapted from Leah Meisterlin and Dare Brawley, originally written for the Mapping for the Urban Humanities workshop, hosted by the Center for Spatial Research at Columbia University. More resources and tools for critical mapping and data visualization are available here.

{L01}

ASSIGNMENT OVERVIEW

For {L01}, explore the workflows from the tutorial using datasets from the class DATA repo(sitory) on Dropbox. You may choose to work with any combination of raster and vector data, including the datasets we’ve used in the tutorial. You must use at least one other dataset for your original lab work. You may also choose to work with any other similarly-scaled datasets with which you’re familiar that aren’t included in the repo. However, this lab is not about finding, managing and manipulating datasets, so make sure you’re comfortable working with any data outside the sets we’ve prepared.

Compose a map using your selected datasets that responds in some way to the questions we posed in lecture:

  • What counts as “human activity”?
  • How is it measured?

In addition to those prompts, consider some of the follow-ups to which we alluded in discussion:

  • How does coupling particular datasets — whether two, three, or more — make a specific spatial and/or historical argument about what constitutes measurable human activity?
  • Given the different coupling of datasets, who or what are the protagonists of that regime of activity?
  • How do the datasets you choose represent the nature of that activity? As “damage”? As “impact”? As some other kind of transformation?
  • Who or what is left out by foregrounding particular datasets and the kinds of activities they represent?
  • What additional datasets or narratives would make a more robust argument for the regime of activity you’ve represented? What kinds of data or narratives might challenge that argument?

Your composition may be critical, creative, or simply an exercise with the basics of working in QGIS. Stay loose, experiment, and have fun!

DELIVERABLES

For your {L01} post to the Lab WIP channel in Are.na:

  1. After developing an original map with the questions above in mind, take a screenshot of your work-in-progress showing a combination of raster and vector data and representational techniques from the tutorial. This doesn’t need to be a beautiful, finished product; it just needs to show a moment of the compositional process that i) demonstrates to us you’ve got a handle on the workflow and ii) struck you as interesting or important.
  2. Give your block a simple but descriptive title that gives some insight into what you’re working with and what it shows, e.g., “{L01} World Cities and Proportional Symbology” (the title of the example post).
  3. In your description, include the following:
  • A list of datasets used; if they’re from outside the class repo, make sure to link to the original source.
  • A short description of what’s represented; think of this as a “narrative legend”.
  • Tell us briefly why you chose the datasets you’ve used and how your composition responds to the “human activity” and “measurement” prompts above.
  • Reflect on some aspect of the workflow with a view to the strengths, limitations, and/or biases of particular techniques or datasets in constructing a visual argument about how “human activity” is represented. This could be as simple as commenting on how color choices change whether we perceive “human activity” as impact, achievement, or damage; it could be a set of follow-up questions you want to ask about a workflow or more advanced technique; it could be about how the metrics used for a given dataset already constitute a problematic theoretical framework (recall the definition of theory as an “act of seeing” or “staging”). Use your reflection as a public “bookmark” for tracking your own compositional process and as a foothold for future class discussion.

Remember to keep it concise; you don’t need to write more than 300 words (excluding, if you wish, the listing of datasets, which can have lengthy titles).
