Maps in Stata III: geoplot

Asjad Naqvi
The Stata Guide
Published in
22 min readJul 2, 2023

--

(Revised: 15 July 2024)

In this guide, we will learn to create the maps shown in the banner, and many more, using the geoplot package. Geoplot is the latest package by star Stata programmer Ben Jann, whose palettes package is extensively covered in previous Stata guides (for example see customizing color schemes) and I personally rely on it for all my own Stata packages.

The geoplot command supercedes spmap (discussed extensively in previous map guides) as it allow users to plot as many layers as they want through the use of data frames. This is a major upgrade over the classic spmap which focuses on one data layer, and with additional layers requiring some creative workaround solutions. On the down side, the use of frames restricts geoplot to newer Stata versions (v18 recommended but most features work with v16 and v17 as well). Given that version 18 has been out for a while now, most likely users already have access to this version.

In this guide we will cover the core customization options of different layer types: polygons, lines, and points. Additionally, we will also learn how to add and customize legends, create insets, work with projections, and arrows and scalebars.

The topics covered in this guide are sufficient for most usecase scenarios and provide entry points for further customizations using the full set of geoplot features.

A bit of a background story to geoplot: Ben initially discussed the idea of writing this program in November 2022 when I was visiting Bern for the Stata Switzerland conference. In May 2023, Ben messaged saying that the first version is out and I should check it. Over the next weeks, I did extensive testing of the command and provided as much feedback as I could. Ben, being a veteran programmer, went far beyond any expections to put together a long list of very solid and flexible set of features. We continue to exchange ideas and discuss the package and many new exciting features are planned.

All in all, geoplot provides a massive quality-of-life improvement for making maps in Stata. So let’s get started!

Preamble

Like previous guides, a basic knowledge of Stata is assumed. For this guide, you should ideally have Stata 18 or higher to make sure you can use frames. This because certain features like clegend() require Stata 18. In any case, you need at lest Stata 16 to make sure you can use frames.

Install geoplot either from:

  • SSC:
ssc install geoplot, replace
  • or GitHub (might be prefereable as the package is going through frequent updates):
net install geoplot, replace from("https://raw.githubusercontent.com/benjann/geoplot/main/")

Regardless, where you install it from, just make sure that you are using the latest version. This guide is written using v1.2.3 released on 13th July 2024:

Since the command is still evolving, parts of the code discussed in this guide might not work. Please report these to ensure that the guide remains as up-to-date as possible.

  • Make sure that Ben Jann’s palettes package (more on colors in Guide 2 and the Color guide) are installed and/or updated (extremely important for the new version of geoplot):
ssc install palettes, replace
ssc install colrspace, replace
ssc install moremata, replace
  • Optional step: Typically, I set the default graph font to Arial Narrow (or other fonts) to better display large labels (see the Font guide on customizing fonts):
graph set window fontface "Arial Narrow"

Data

In Stata shapefiles need to be converted into Stata format first:

spshape2dta xx, replace saving(yy) 

where xx is the name of the shapefile, and yy is the name of the shapefile. So for example, if we are working with the raw GIS data, for example, Eurostat data we have discussed in previous guides, we can type something like:

spshape2dta NUTS_RG_03M_2021_3035, replace saving(nuts_all) 

which will give us nuts_all.dta and nuts_all_shp.dta or the attributes and outline files respectively. Note, that geoplot can now also convert files directly and auto create frames from them, but we will leave this for now. But since we have covered how to read and fix shapefiles in previous guides, we will start directly from processed data. Links to raw data are still provided in case you want to download these yourself. In this case, going through old guides is also highly recommended.

For now, all the data used in this guide is available on the Stata Guide’s GitHub repository in the GIS folder:

Here download the following files (also highlighted above):

  • nuts0, nuts0_shp, nuts1, nuts1_shp, nuts2, nuts2_shp, nuts3, nuts3_shp
  • demo_r_pjanind3_2022
  • cities, cities_shp, urb_cpop1_clean
  • countries, countries_shp, rivers, rivers_shp

The first set of files represent official administrative boundaries ranging from NUTS 0 (countries) to NUTS 3 (smallest homogenized regions) for Europe. These files have already been cleaned and processed and are ready to use in Stata. The second file contains cleaned demographic indicators from Eurostat. The third set contains city locations and urban indicators. The fourth set contains, global boundaries of countries and rivers.

Once you set your working directory:

clear
cd "<directory here>"

you can either:

  1. Directly download the files using the following script, e.g.
local filelist nuts0 nuts0_shp nuts1 nuts1_shp nuts2 nuts2_shp nuts3 nuts3_shp demo_r_pjanind3_2022 cities cities_shp urb_cpop1_clean countries countries_shp rivers rivers_shp

foreach x of local filelist {

copy "https://github.com/asjadnaqvi/The-Stata-Guide/raw/master/GIS/`x'.dta" "`x'.dta", replace

}

(Once you run the above script and get all the files, then please remember to mark out this script since you don’t need to pull the files every time you run the dofile.)

or

2. Manually download them from GitHub and copy them in a local folder on your computer. Make sure that you set the path to the directory where you have the files.

Preparing the data in geoplot

As discussed in previous map guides, each map layer requires at least two inputs:

  1. An attributes files
  2. A boundary shape file

Since we want to plot the data at the lowest NUTS 3 tier, let’s prepare the NUTS 3 attributes file first:

use nuts3, clear
merge 1:1 NUTS_ID using demo_r_pjanind3_2022
drop if _m==2
drop _m
save nuts3_data.dta, replace

Next step: let’s define a frame for each layer using the geoframe command:

geoframe create nuts0 nuts0.dta, replace shp(nuts0_shp)

After geoframe create, we specify the the name of the frame, and if required, the name of the attributes file if it is different from the frame name. After the comma, we define the corresponding shapefile. But since geoframe is consistent with shapefile names, we can also just say:

geoframe create nuts0, replace 

and this will assume that the input file is called nuts0.dta, and the shapefile is called nuts0_shp.dta. Each geoframe create also provides a set of useful summary statistics:

Note that the map files need to contain standard variables such as _ID, _CX, _CY, _Y, _X created using Stata’s spshape2dta command. If you are using other shapefiles to Stata converters, such as the old shp2dta command, then you may need to define additional options to select the appropriate variables. See the geoplot page or helpfile for examples.

Following the above logic, we can define the next set of geoframes:

geoframe create nuts1, replace 
geoframe create nuts2, replace

But for NUTS 3, since we merged our data and saved it as a new filename, we use the full version of the command:

geoframe create nuts3 nuts3_data.dta, replace shpfile(nuts3_shp)

Working with frames

Since the setup creates a bunch of frames, my recommendation is to switch to the core frame which contains the data we want to manipulate. At no point should you read data and replace frames data unless you are fully aware of what you are doing. If you are new to frames then the Frames guide is highly recommended.

Also note that it does not matter which frame you are in (active frame) for plotting! The geoplot command know which frame to use for which layer.

In our case, since the data we want to visualize is at the NUTS 3 level, so we switch to the nuts3 frame as follows:

frame change nuts3

format y_MEDAGEPOP %6.1f

and here maybe we can add more data or generate new variables. For now, we simply format our variable of interest y_MEDAGEPOP, that contains the median population of each NUTS 3 region. Also feel free to play around with other indicators in the file.

Let’s make maps!

At this point everything is set up so we are ready to make our first map. First let’s check if all the layers are working properly:

geoplot ///
(area nuts0) ///
(area nuts1) ///
(area nuts2) ///
(area nuts3)

which gives us our first map:

Since it is hard to differentiate the different NUTS layers, we can fine tune our map as follows:

geoplot ///
(area nuts3, lc(red) lw(0.02)) ///
(area nuts2, lc(blue) lw(0.05)) ///
(area nuts1, lc(gs6) lw(0.1)) ///
(area nuts0, lc(black) lw(0.2))

And we get a slightly better looking version:

We can also plot a variable as a choropleth map:

geoplot ///
(area nuts3 y_MEDAGEPOP)

Here we simply specify the layer type which is area in our case, the frame is nuts3 and the variable is y_MEDAGEPOP. From the above syntax, we get this map:

Here “viridis” is baked into the geoplot command as the default color scheme for ranges, which is already an improvement over the original spmap, where the default was no color scheme.

We can also define the number of categories we want to display using the levels() option:

geoplot ///
(area nuts3 y_MEDAGEPOP, levels(10))

which gives us more details:

We can also change the color schemes and reverse the colors using the color option:

geoplot ///
(area nuts3 y_MEDAGEPOP, levels(10) color(viridis, reverse))

Here one could have also just used color(, reverse) since Viridis is already the default.

The above command shows regions with a higher median age in darker colors:

Now, let’s add boundaries as additional layers:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10) color(viridis, reverse) ) ///
(line nuts1, lc(white) lw(0.04)) ///
(line nuts0, lc(black) lw(0.2))

Here we specify NUTS 1 first and NUTS 0 second to make sure that the country boundaries stay on the very top. It is good to remember that in Stata layers that are drawn last, show up on top:

Legends

Now that we have our base map in place, let’s fine tune the legend. The legend position can be changed simply by using the standard legend(pos()) option:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10) color(viridis, reverse) ) ///
(line nuts1, lc(white) lw(0.04) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, legend(pos(3))

Another huge advantage of this command is that if the map is fairly tight, legends can be pushed outside of the base map’s bounding box using the outside option:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10) color(viridis, reverse)) ///
(line nuts1, lc(white) lw(0.04) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, legend(pos(3) outside)

Legend style can also be modified using the the label() option:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10) color(viridis, reverse) label("@lb - @ub") ) ///
(line nuts1, lc(white) lw(0.04) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, legend(pos(2) outside)

Unlike spmap which provide presets for legend styles, geoplot uses macros to completely customize legends:

And here is a slightly more advanced example, where we add the number of NUTS 3 regions in each category, and increase the legend size:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10) color(viridis, reverse) label("@lb - @ub (N=@n)") ) ///
(line nuts1, lc(white) lw(0.04) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, legend(size(2.3))

Another difference between spmap and geoplot is that in the former, the legend categories using k-mean clusters, and in the latter, equal intervals. Which one works best is a matter of choice, but geoplot also allows users to use this option as follows:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(10, kmean) color(viridis, reverse) label("@lb - @ub (N=@n)") ) ///
(line nuts1, lc(white) lw(0.04) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, legend(size(2.3))

Which provides a slightly better spread of colors:

Other types of legend cuts are also available in the command.

Continuous legends

Ben had already introduced continous legends, in his heatplot command that I covered in an early guide. Continuous legends clegend are a new Stata 18 feature, that are also available with geoplot and they look great with a bit of fine tuning. Let’s start with our usual basic legend:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(20) color(CET L20, reverse)) ///
(line nuts1, lc(white) lw(0.03) ) ///
(line nuts0, lc(black) lw(0.2))

which looks nice on the map but gives us a long list of legend items:

Note here that we can change the height of the legend by specifying, for example, clegend(height(40) pos(2)) above. The default height is set at 40% of minumum of graph height or width.

geoplot ///
(area nuts3 y_MEDAGEPOP, level(20) color(CET L20, reverse) ) ///
(line nuts1, lc(white) lw(0.03) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, ///
clegend(height(40) pos(2))

The legend looks a bit nicer but we still have too many cut-offs:

We can now go a step further and make the labels compact using the zlabel() option, while also increasing the number of levels to 40 for additional detail:

geoplot ///
(area nuts3 y_MEDAGEPOP, level(40) color(CET L20, reverse) ) ///
(line nuts1, lc(white) lw(0.03) ) ///
(line nuts0, lc(black) lw(0.2)) ///
, ///
clegend(height(40) pos(2)) zlabel(#10)

clegend() allows us to display a large number of cuts in our continuous data without forcing us to use a large set of bins or define ranges. See the help file for further options available for clegend().

Points and Custom symbols

Now we come to another interesting and innovative feature of geoplot and that is point data. Note that the applications shown here, can also be easily extended to line layers except that point layers have a lot of customization options available.

We merge the base cities data with a clean version of cities attributes file that contains population information:

use cities, clear
merge 1:1 FID using urb_cpop1_clean
drop if _m==2
drop _m

gen popcat = .
replace popcat = 1 if pop < 100000
replace popcat = 2 if inrange(pop, 100000, 500000)
replace popcat = 3 if inrange(pop, 500000, 1000000)
replace popcat = 4 if pop > 1000000 & !missing(pop)
lab de popcat 1 "< 100,000" 2 "100,000–500,000" 3 "500,000–1,000,000" 4 "> 1,000,000"
lab val popcat popcat
tab popcat, m
save cities_data.dta, replace

Let’s also get rid of the outliers in the cities shapefile (Europe extends way beyond the mainland plus the UK) so we drop the islands:

use cities_shp, clear 
keep if _X > 2100000 & _Y > 1000000
sort _ID
save cities_shp, replace

and last step, we create the geoframe of the cities dataset:

geoframe create cities   cities_data.dta  , replace shp(cities_shp)

Let’s plot the cities:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities , mc(orange%50) msize(0.8))

And we get a nice scatter:

Since we have population information in the pop variable, we can also scale the points using weights:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities [w = pop], mc(orange%50) msize(0.8))

Which gives us a sense of where the large cities are.

We can also use categorical variables where we can also customize the symbols through a list:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities i.popcat , mlwidth(none) msize(0.6) msymbol(T S O D))

This plots the data using Stata 18’s new st color palette that has been made the default for categorical data:

We can also combine the two, which would make sense, e.g. if we want to show two additional dimensions. Here we just use pop for both:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities i.popcat [w = pop], color(Oranges%80) mlwidth(none) msize(0.5) msymbol(T S O D))

Legends

Legends are now highly flexible in geoplot. In addition to using the standard legend() option in geoplot, users now have the option to also use glegend() for custom implementations.

For example, in the code below we tell the command to draw the legend in two columns using the pipe | separator. The - demarcates a title option, followed by the layers to show below this title. For example:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities i.popcat [w = pop], color(tableau%80) mlwidth(none) msize(*0.4) msymbol(T S O D)) ///
, ///
glegend(layout( - "{bf:Boundaries}" 1 2 | - "{bf:City size}" 3))

Says the first column should have a title “Boundaries” under which layers 1 and 2 are show, whereas the cities layer is shown in the second column.

Now we come to an exciting feature of geoplot and that is custom markers. Here we split the cities points into two layers. The first layer is the standard marker and color scaling. The second option allows us to use symbols by calling the symbol() option. Within this, we have also have a shape() option that provides a decent selection of pre-defined markers.

In the example below, we use the pin2 shape:

geoplot ///
(area nuts1 , lc(gs8) lw(0.05)) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities i.popcat if CITY_CPTL!="Y" [w = pop], color(Reds%70) mlwidth(none) msize(*0.4)) ///
(symbol cities if CITY_CPTL=="Y", shape(pin2) color(red) size(*0.5) label("Capital cities")) ///
, ///
glegend(layout( - "{bf:Boundaries}" 1 2 | - "{bf:City size}" 3 4))

to draw the following map. Note how the symbol layer is labeled and drawn:

One can also add as many different types of layers and shapes as one wants including even defining own shapes. Check out the helpfile for additional information.

We can now also combine various different layers to give us a fairly detailed map:

  geoplot ///
(area nuts3 y_MEDAGEPOP, level(8, kmean) color(viridis, reverse) label("@lb - @ub") ) ///
(area nuts0 , lc(black) lw(0.1)) ///
(point cities i.popcat if CITY_CPTL!="Y" [w = pop], color(Reds%70) mlwidth(0.02) mlcolor(white) msize(*0.3)) ///
(symbol cities if CITY_CPTL=="Y", shape(pin2) color(red) size(*0.5) label("Capital cities")) ///
, ///
glegend(layout( - "{bf:Median age}" 1 - "{bf:Boundaries}" 2 | - "{bf:City size}" 3 4))

Of course this map can be customized further to make the information more meaningful. But here the aim is to showcase the possibilities that exist with all these new features.

We can also show scaled markers as overlapping symbols, for which we need to call in the slegend() option.

geoplot ///
(area nuts3 y_MEDAGEPOP, level(8, kmean) color(viridis, reverse) label("@lb - @ub") ) ///
(area nuts0 , lc(white) lw(0.1) label("NUTS 0")) ///
(symbol cities if CITY_CPTL!="Y" & pop > 5e5 [w=pop], color(red%50) lcolor(white) lw(0.2) size(*0.5)) ///
(symbol cities if CITY_CPTL=="Y", shape(pin2) color(red) size(*0.5) label("Capital cities")) ///
, ///
glegend(layout( - "{bf:Median age}" 1 - "{bf:Boundaries}" 2 | - 4)) ///
slegend(1e6 "1m" 5e6 "5m", pos(3) lc(red) overlay heading("{bf:City size}") hsize(2) )

This gives us a neater set of categories. A note of caution here is that the smaller the original symbols, the more smaller the overlapping circles will be displayed. One option is to incrase the marker sizes or turn the overlapping feature off to just display a list of symbols.

Zooms and insets

We can also zoom into specific regions in the map using the zoom option. In the map below, we want to highlight Berlin and Vienna. For this we defined two additional layers for each of the city. These layers 5 and 6 are then passed onto the zoom() option which has the following structure: zoom("layer number": "magnification" "displacement in percentage" "angle of displacement", <options>). So in ths code below, we zoom layer 5, magnify is 10 times, displacement is 500 percentage of the minimum of shape width or height, and at an angle of 110 degees from the horizontal axis. We also connect it to the original plot with dotted lines and label it:

geoplot ///
(area nuts3 y_MEDAGEPOP, color(viridis, reverse) cuts(20(5)60) label("@lb - @ub") ) ///
(line nuts0 , lc(white) lw(0.1) label("NUTS 0")) ///
(symbol cities if CITY_CPTL!="Y" & pop > 5e5 [w=pop], color(red%50) lcolor(white) lw(0.2) size(*0.5)) ///
(symbol cities if CITY_CPTL=="Y", shape(pin2) color(red) size(*0.5) label("Capital cities")) ///
(area nuts3 y_MEDAGEPOP if NUTS_ID=="DE300", color(viridis, reverse) cuts(20(5)60) ) ///
(area nuts3 y_MEDAGEPOP if NUTS_ID=="AT130", color(viridis, reverse) cuts(20(5)60) ) ///
, ///
glegend(layout( - "{bf:Median age}" 1 - "{bf:Boundaries}" 2 | - 4)) ///
slegend(5e5 "500k" 1e6 "1m", pos(3) lc(red) overlay heading("{bf:City size}") hsize(2)) ///
zoom(5: 10 500 110, circle connect(lp(dash)) lcolor(black) title("Berlin", size(1.5)) ) ///
zoom(6: 12 500 20, circle connect(lp(dash)) lcolor(black) title("Vienna", size(1.5)) )

which gives us this very nice figure with cutouts:

Here we can also use the select() feature for our insets:

area nuts3 y_MEDAGEPOP, select(NUTS_ID=="DE300")

which applies colors after the original data has been processed. This removes the need for applying custom cuts in the inset layers.

We can also go a step further and clip the areas we want to show using a bounding box. Here first we clip layers using geoframe query where we also “pad” 60 percent of the minimum of bounding box’s height/width to also how surrounding regions and the number of points define how many points to take for a circle. Each clip is saved into it’s own frame as follows:

*** lets try the bounding boxes  

frame change nuts3


geoframe query bbox if NUTS_ID=="DE300", pad(60) n(50) circle
mat zoom1 = r(bbox)

geoframe query bbox if NUTS_ID=="AT130", pad(30) n(50) circle
mat zoom2 = r(bbox)

foreach frame in nuts3 nuts0 {
frame `frame': geoframe clip zoom1, into(`frame'_zoom1) replace
frame `frame': geoframe clip zoom2, into(`frame'_zoom2) replace
}

We can now use a code similar to above with the simple zoom. But instead of one layer, we now also draw two layers to show outlines as well. So in this case, we need to zoom into two layers where the code is the following:

geoplot ///
(area nuts3 y_MEDAGEPOP, color(viridis, reverse) cuts(20(5)60) label("@lb - @ub") ) ///
(line nuts0 , lc(white) lw(0.1) label("NUTS 0")) ///
(symbol cities if CITY_CPTL!="Y" & pop > 5e5 [w=pop], color(red%50) lcolor(white) lw(0.2) size(*0.5)) ///
(symbol cities if CITY_CPTL=="Y", shape(pin2) color(red) size(*0.5) label("Capital cities")) ///
(area nuts3_zoom1 y_MEDAGEPOP, color(viridis, reverse) cuts(20(5)60) ) ///
(line nuts0_zoom1, lc(white) lw(0.1)) ///
(area nuts3_zoom2 y_MEDAGEPOP, color(viridis, reverse) cuts(20(5)60) ) ///
(line nuts0_zoom2, lc(white) lw(0.1)) ///
, ///
glegend(layout( - "{bf:Median age}" 1 - "{bf:Boundaries}" 2 | - 4)) ///
slegend(5e5 "500k" 2e6 "2m" 5e6 "5m", pos(3) lc(red) overlay heading("{bf:City size}") hsize(2)) ///
zoom(5/6: 8 500 130, circle connect(lp(dash)) lcolor(black) title("Berlin", size(1.5)) ) ///
zoom(7/8: 10 500 20, circle connect(lp(dash)) lcolor(black) title("Vienna", size(1.5)) )

This gives us an additional step but the final result allows us to highlight regions not visible from the map itself.

One can also create as many zooms as one wants. Note that the colors are also controled using the cuts() option in the clipped layers. We can also ensure the consistency of the original colors scales by using the nodrop option when processing the data using geoframe clip so the original information is not lost.

This is essential because the drawing with generate auto cuts based on the data available in the zoomed layers.

Note if that you want more extensive clipping options, I have my own standalone clipgeo package with a lot of custom clipping options.

Here is a bit of a different example below:

geoplot ///
(area nuts3 y_MEDAGEPOP if CNTR_CODE=="DE", level(8, kmean) color(viridis, reverse) label("@lb - @ub") ) ///
(line nuts1 if CNTR_CODE=="DE", lc(white) lw(0.1) label("NUTS 0")) ///
, legend( outside pos(8)) ///
inset( (area nuts0, lc(black) lw(0.05)) ///
(area nuts0 if CNTR_CODE=="DE", lc(black) lw(0.05) fc(sand) ) ///
, pos(2) title("Germany") nobox size(40)) ///
margin(r=40) tight

Here we focus on Germany but highlight on Europe’s map in a small inset on the right. Note that the inset() option takes on the full range of layer options together with additional box, size, and position options for full customizations. The size is defined as a percentage of the minimum of total height or width of the map. Note that the margin option is necessary here to extend the right boundary by 40 points. This allows us to tuck in the Europe map without having it overlaid over Germany.

Mapping the world and projections

Let’s now move to the global level to explore various other options including map projections. First, let’s convert the data in Stata shapefiles and create a geoframe from these.

spshape2dta ne_50m_admin_0_countries, replace saving(countries)
spshape2dta ne_50m_rivers_lake_centerlines, replace saving(rivers)

and create geoframes:

geoframe create countries, replace
geoframe create rivers, replace feature(water)

Note the use of the feature(water). This automatically gives the layer a bluish water color. This can be done manually as well but it is a nice quality-of-life feature. See the help file for other feature() options.

Let’s switch to the “countries” frame that we just created and look at World Bank’s region definitions:

frame change countries

tab REGION_WB

We can plot these regions as well using the basic area option in geoplot and we keep the legend outside as well:

  
geoplot ///
(area countries REGION_WB, color(tableau)) ///
, ///
tight legend(position(6) rows(2) outside)

Here we do another variation of the legends:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
, ///
tight legend(position(7) size(2))

We can also add river lines:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2))

And we can even define a background either using a pre-cooked definition, water in this case, or a custom color scheme:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2)) ///
background(water)

Since we are dealing with maps, we can also add grids, or “graticules” that show latitute and longitude options:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2) rowgap(1) region(color(gs15))) ///
background(water) grid(label)

Now that our base map is in place, we can start playing with different projections.

By adding a simple option project:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2) rowgap(1) region(color(gs15))) ///
background(water) grid(label) project

The data is converted into web mercator projection and plotted:

Web Mercator, as we know, stretches the areas around the north and south poles creating large distortions.

We can also use a more classic Robinson projection:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2) rowgap(1) region(color(gs15))) ///
background(water) grid(label) project(robinson)

Or Albers:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2) rowgap(1) region(color(gs15))) ///
background(water) grid(label) project(albers)

Or even an Orthographic projection. The second option in orthographic rotates the globe so it allows users to zoom into specific features:

geoplot ///
(area countries REGION_WB, color(tableau) lc(white)) ///
(line rivers, lw(0.1)) ///
, ///
tight legend(position(7) size(2) rowgap(1) region(color(gs15))) ///
background(water) grid(label) project(orthographic 1 30)

Check the helpfile help geoframe##project for the full set of options on projections:

Putting different layers together

Now we can even draw the Germany map shown above but rather than using the EU level map, we can use the global map layers as follows:

geoplot ///
(area nuts3 y_MEDAGEPOP if CNTR_CODE=="DE", level(8, kmean) color(viridis, reverse) label("@lb - @ub") ) ///
(line nuts1 if CNTR_CODE=="DE", lc(white) lw(0.1) label("NUTS 0")) ///
, legend( outside pos(8)) ///
inset((area countries, lw(0.01) col(white) lc(black)) ///
(area countries if ADMIN=="Germany", col(red) lw(0.01)) ///
, nobox size(40) pos(2) title("Germany is here") ///
project(orthographic 1 20) ///
backgr(water lc(gray) limits(-180 180 -90 90))) ///
margin(r=40) tight

which gives us a neat way of showing a complete picture. While people are likely to know where Germany is, a lot of people cannot identify the location of most countries on a map. So please use this option to create nice insets!

In the last step, we add all the bells and whistles such as titles, arrow, scale bar:

geoplot ///
(area nuts3 y_MEDAGEPOP if CNTR_CODE=="DE", level(8, kmean) color(viridis, reverse) label("@lb - @ub") ) ///
(line nuts1 if CNTR_CODE=="DE", lc(white) lw(0.1) label("NUTS 0")) ///
, legend( pos(3)) ///
inset((area countries, lw(0.01) col(white) lc(black)) ///
(area countries if ADMIN=="Germany", col(red) lw(0.01)) ///
, nobox size(35) pos(2) title("Germany is here") ///
project(orthographic 1 20) ///
backgr(water lc(gray) limits(-180 180 -90 90))) ///
margin(r=40 b=10) tight ///
title("Median age of Germany", margin(medium)) ///
compass(size(8) type(2) pos(11)) sbar(length(200) units(km))

to get a very comprehensive map!

Given all these options, the possibilities for drawing high quality maps has come a long way in Stata. Please try these options out and create your own maps.

And that is it for this guide! Hope you liked it. This guide might still be updated to reflect new and upcoming features.

About the author

I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria. You can see my profile, research, and projects on GitHub or on my website. You can connect with me via Medium, Twitter/X, LinkedIn, or simply via email: asjadnaqvi@gmail.com. If you have questions regarding the Guide or Stata in general post them on The Code Block Discord server.

The Stata Guide releases awesome new content regularly. Subscribe, Clap, and/or Follow the guide if you like the content.

--

--

Asjad Naqvi
The Stata Guide

Here you will find stuff on Stata, data visualizations, data wrangling, workflows, and programming.