Stata Graphs: More than just a heatmap

Published in The Stata Gallery, Nov 13, 2022

This post is about a data visualization I recently published. It guides you through some ideas and twists for how to create this or a similar graph in Stata. But first, let’s go through the figure and its elements.

(at the end of the post is a higher-resolution version of this figure)

The figure plots the feelings (affect) that supporters of all major parties in Germany hold towards their own and all other parties, and how these feelings have changed over time. If you want to read more about the substantive insights (is Germany a politically divided nation?), check out the publication.

We will start with the bigger picture of the visualization and then zoom in. The bigger picture shows a heatmap that plots the feelings supporters of parties (y-axis / rows) hold for their own and other parties (x-axis / columns). The parties are sorted according to their political left-right positions. Parties that are plotted next to each other are ideologically more similar than parties that are plotted far away from each other. When talking about partisan affect, it is common to talk about cold (negative) or warm (positive) feelings. The colors of the tiles represent the average warmth across time and, as is typical for temperatures, cold feelings are displayed in blue and warm feelings in red. The medium, neutral, or lukewarm temperatures are shown in light gray. We can see that people tend to like their own parties, shown by the red diagonal line. The further we move away from that diagonal, the bluer the tiles become. This shows that, as people rate more ideologically distant parties, their feelings become colder and colder. However, not all these changes are gradual. The rightmost column is almost completely dark blue, meaning that supporters of all other parties agree in disliking the radical right-wing AfD.

Once we zoom into the figure, we see that the visualization is more than a regular heatmap because each tile contains a curve, which represents time trends. For example, look at the second row, first column. This shows us the affect that supporters of the party “the Greens” hold towards the party “the Left/PDS”. The background color, the printed number, and the white horizontal line all represent the average feelings across time. The black curve shows a smoothed time trend. The white reference line helps to make changes across time more visible. For these feelings from the Greens to the Left/PDS, we see moderately cold feelings overall and an upward trend over roughly the last decade.

How to create this figure

Now, let’s get technical. Which steps are needed to create this figure?

Broadly, the figure is created in three steps:

1. Create each of the 7x7 tiles as an individual figure
2. Combine the 7x7 figures into one
3. Make several adjustments for a nicer and more readable figure

I will skip most of the data preparation, but you can find the complete replication files here. Before we get into these steps, some words on data and method, and on the general setup in Stata.

Data and method: Here is an overview of the four needed variables:

“vote” is the “respondent’s party” (shown in the y-axis / rows). For easier handling, this variable is numeric and logically ordered: the leftmost party has the value 1, the rightmost party the value 7.

           1 The Left/PDS
           2 The Greens
           3 SPD (Social dem.)
           4 FDP (Liberals)
           5 CDU (Christian dem.)
           6 CSU (Christian dem.)
           7 AfD (Right-wing pop.)

“thermo_1” to “thermo_7” are the feelings towards each party: thermo_1 holds the feelings that people have towards party 1 (the Left/PDS), thermo_2 the feelings towards the Greens, and so forth. These feelings range from 0 (very cold / negative) to 10 (very warm / positive).
“year” is the survey period. The time information is month-specific and the variable is coded so that, e.g., January 2000 has the value 2000.00, February 2000 the value 2000.08 ( = 2000 + 1/12), and so forth.
“weight” is the survey weight.
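To make the variable layout concrete, here is a minimal sketch that simulates a data set with the same four kinds of variables. Everything in it is an assumption for illustration only: the number of observations, the date range (January 1977 to December 2021, inferred from the axis labels used later), the helper variable mdate, and the way the feelings are generated. It is not the survey data behind the figure.

```stata
* Toy data only: simulated values, not the survey data used in the post
clear
set seed 2022
set obs 5000

* respondent's party, 1 (leftmost) to 7 (rightmost)
generate vote = runiformint(1, 7)

* monthly survey date (hypothetical helper variable), Jan 1977 to Dec 2021
generate mdate = ym(1977, 1) + runiformint(0, 539)
format mdate %tm

* decimal coding: January 2000 -> 2000.00, February 2000 -> 2000 + 1/12
generate year = year(dofm(mdate)) + (month(dofm(mdate)) - 1) / 12

* feelings on the 0-10 scale, made warmer for ideologically closer parties
forvalues p = 1/7 {
    generate thermo_`p' = 8 - 1.2 * abs(vote - `p') + rnormal(0, 1.5)
    replace thermo_`p' = min(max(round(thermo_`p'), 0), 10)
}

* survey weight (arbitrary values here)
generate weight = runiform(0.5, 1.5)
```

With data shaped like this, the commands shown in the rest of the post can be tried out without the replication files, although the resulting curves will of course look different.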

To show the time trends, I smooth the association to reduce noise in the data and to identify the relationships without making strong prior assumptions about the shape of the association. For details, look at the “supplementary material” from the publication and the literature referenced there.

General setup in Stata: Creating good visualizations in Stata usually starts with changing the default scheme. I usually take a very minimalistic scheme, s1mono, adapt some minor aspects, and add colors manually. The minor edits require grstyle by Ben Jann, an ado that I would recommend to any Stata user. Here, I define the general look of all figures I produce thereafter, e.g., the font sizes (7pt) and margins (zero).

* INSTALL SOME ADOS:
ssc install grstyle, replace
ssc install resize, replace
ssc install palettes, replace
ssc install colrspace, replace  // needed by colorpalette (palettes)

* CHANGING THE DEFAULT GRAPH STYLE
set scheme s1mono
grstyle init
grstyle anglestyle vertical_tick horizontal
grstyle set size 7pt: subheading heading axis_title tick_label key_label  ///
  legend text_option body
grstyle set margin 0: graph bygraph combinegraph heading subheading body
grstyle set margin 0: axis_title
grstyle gsize axis_title_gap zero

1. Create each of the 7x7 tiles as an individual figure

Each tile of the final visualization is an individual graph that plots a time trend. In the simplest version, e.g. for the feelings that supporters of the Greens hold towards the Left/PDS, the code would be:

twoway lpoly thermo_1 year [aweight = weight] ///
 if vote == 2  ///
 , degree(1) bwidth(2)

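This plots the feelings towards the Left/PDS (thermo_1) for the subset of respondents who voted for the Greens (if vote == 2). The options degree and bwidth determine the smoothing: degree(1) fits local linear regressions, and bwidth(2) sets the kernel bandwidth to two units of the x-variable, i.e. two years.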
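To get a feeling for what the bandwidth does, the following sketch overlays three local linear smooths of the same simulated series. The data, the variable name thermo, the graph name bw_compare, and the bandwidths 0.5 and 8 are assumptions chosen for illustration; only bwidth(2) comes from the code above.

```stata
* simulated series with a slow wave plus noise (illustration only)
clear
set seed 1
set obs 2000
generate year = 1977 + runiformint(0, 539) / 12
generate thermo = 4 + sin((year - 1977) / 7) + rnormal(0, 2)

* same degree, three bandwidths: smaller values follow the noise more closely
twoway (lpoly thermo year, degree(1) bwidth(0.5)) ///
       (lpoly thermo year, degree(1) bwidth(2))   ///
       (lpoly thermo year, degree(1) bwidth(8)),  ///
       legend(order(1 "bwidth(0.5)" 2 "bwidth(2)" 3 "bwidth(8)")) ///
       name(bw_compare, replace)
```

A narrow bandwidth produces a wiggly curve, a wide one flattens real changes, so the choice is a trade-off that depends on how dense the data are over time.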

This simple version is missing several graph elements: the specification of the background color, the horizontal reference line, the bold text (“4.25”). Further, it is missing a couple of design adjustments.

The background color is central. To specify this, we can use the following option:

plotregion(fcolor( ---enter color here--- ))
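As a quick illustration of this option, the sketch below colors the plot region of a scatter plot built from Stata's shipped auto data set. The data set, the graph name bg_demo, and the RGB value are stand-ins picked for the example and are unrelated to the figure in this post.

```stata
sysuse auto, clear

* an RGB triple in double quotes is accepted wherever a color is expected
twoway scatter mpg weight, ///
    plotregion(fcolor("62 139 240")) ///
    name(bg_demo, replace)
```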

Now, we could define the colors of all 7x7 tiles by hand, but that would be quite tedious. It is better to automate this so that we can loop over it later on.

We first define a color palette using the colorpalette command from the palettes package by Ben Jann. We create 20 color shades, which means that there is one color for the values 0.0–0.5, one for 0.5–1.0, and so forth. I choose a scheme that takes the color “midblue” for the cold extreme, “gs14” for the middle value, and “red” for the hot extreme.

colorpalette midblue gs14 red, ipolate(20) nograph

This command then stores these 20 color values. For instance, this is how we can see the color code of the second lowest value:

. display r(p2)
62 139 240
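To inspect the whole palette rather than a single entry, a short loop over the stored results can help. This is a minimal sketch that assumes the palettes and colrspace packages are installed; it only reads the r(p1) to r(p20) results described above.

```stata
* requires the community-contributed packages palettes and colrspace
colorpalette midblue gs14 red, ipolate(20) nograph

* list every interpolated color together with its index
forvalues i = 1/20 {
    display "p`i' = `r(p`i')'"
}
```

The low indices should come out bluish, the middle ones grayish, and the high ones reddish, which is a convenient sanity check before the colors are used as tile backgrounds.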

Next, we need to match the average values with their respective colors. For instance, for the value 4.25 (from the Greens to the Left/PDS), we want a medium-low color, here number 9:

* average value of feelings from the Greens to the Left / PDS:
sum thermo_1  [weight = weight] if vote == 2
local mean = r(mean)

* the color codes are stored with values from 1 to 20 --> double the average and then round it:
local color1 = round(`mean' * 2)
local color ="r(p`color1')"
display "`color'"  // brings this output: r(p9)
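One detail worth keeping in mind: round(`mean' * 2) returns 0 for averages below 0.25, and the palette has no color number 0. None of the averages in this figure seems to be that low, so the original code does not need a guard. For other data, one possible safeguard (a sketch, not part of the original code; the local name idx is hypothetical) is to clamp the index to the range 1 to 20:

```stata
* map a few example averages to a palette index that always lies in 1..20
foreach m in 0.1 4.25 9.9 {
    local idx = min(max(round(`m' * 2), 1), 20)
    display "mean = `m' -> color index `idx'"
}
```

For the value 4.25 this gives the same index 9 as the unguarded version.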

If we put this together, the background of the figure will be colored:

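* summarize has overwritten r(), so restore the palette before r(p9) is expanded:
colorpalette midblue gs14 red, ipolate(20) nograph

twoway lpoly thermo_1 year [aweight = weight] ///
 if vote == 2  ///
 , degree(1) bwidth(2) plotregion(fcolor(   "``color''"  ))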
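The doubled macro quotes are easy to overlook, so here is a small sketch of what they do, using the shipped auto data and summarize instead of colorpalette so that it runs without extra packages. The local name which is hypothetical. The macro stores the name of a stored result, a single expansion returns that name, and a double expansion returns whatever the result currently holds.

```stata
sysuse auto, clear

quietly summarize price
local which "r(mean)"          // stores the NAME of a result, not its value

display "`which'"              // single expansion: the text r(mean)
display "``which''"            // double expansion: the mean of price

quietly summarize mpg          // another r-class command overwrites r()
display "``which''"            // same macro, now the mean of mpg
```

The last two lines also show why the order of commands matters: the palette has to be the most recent r-class command when the graph command expands the color.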

Here is the same panel with several smaller adjustments to the axes etc., which produces a graph that looks almost like the final panel (some labels are still missing):

sum thermo_1  [weight = weight] if vote == 2
local mean = r(mean)
local lab_mean : di %9.2f `mean'

local color1 = round(`mean' * 2)
local color ="r(p`color1')"

colorpalette midblue gs14 red, ipolate(20) nograph

* note: $minyear is a global macro that is not defined in this post
twoway lpoly thermo_1 year [aweight = weight] ///
 if vote == 2  ///
 , degree(1) bwidth(2) ///
 yscale(range(0 10)) ylabel(0 10, labgap(*.3) tlength(*.3)   labcolor(none) tlcolor(none)) ///
 xlabel($minyear "   1977" 2021.917 "2021     ", labgap(*.3) tlength(*.4)   labcolor(none) tlcolor(none) ) ///
 xtitle("") ytitle("")  ///
 lwidth(medthick) lcolor(black) lpattern(solid) ///
 name(thermo_21, replace) ///
 legend(off) plotregion(fcolor(   "``color''"  ) margin(small) ) ///
 yline(`mean', lcolor(white) lwidth(thick) noextend) ///
 text(9.8 1999 "{bf:`lab_mean'}  ",  placement(center) justification(left))

Now we have 1 panel, but we want 49 (7x7). To avoid doing everything by hand, we work with two nested loops. The outer loop defines the “respondent’s party”, which is the y-axis / rows in the figure. The inner loop defines the “evaluated party”, the x-axis / columns. In other words, the outer loop is about who the evaluation is from, and the inner loop about who the evaluation refers to.

forvalues from = 1/7 {  // respondent's party
....
* define some labels
...
forvalues to = 1/7 {  // evaluated party 
...
* --> produce the graph <--
...
}
}
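The “define some labels” step in this skeleton can be written in different ways. As one possible alternative to a chain of if blocks, the sketch below looks the party names up by position in a quoted list. The list name names is hypothetical, and the entries simply follow the value labels of vote given earlier.

```stata
* hypothetical lookup list; entries follow the value labels of "vote"
local names `""The Left/PDS" "The Greens" "SPD" "FDP" "CDU" "CSU" "AfD""'

forvalues from = 1/7 {  // respondent's party
    local lab_from1 : word `from' of `names'
    forvalues to = 1/7 {  // evaluated party
        local lab_to1 : word `to' of `names'
        * print one pair as a check: second row, first column
        if `from' == 2 & `to' == 1 {
            display "from `lab_from1' to `lab_to1'"
        }
    }
}
```

Whether this is preferable is a matter of taste: the if blocks are longer but make it easy to give individual parties a second label line.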

Here is the final code for the 7x7 subgraphs. Compared with the code shown above, I add several more labels and text elements.

I am working with two different font sizes, 7pt and 8pt. In the preamble, I defined all added text to be in 7pt. To change some text to 8pt, I write “size(*1.143)” (7 * 1.143 ≈ 8).

forvalues from = 1/7 {  // respondent's party
...
 if `from' == 3 {
  local lab_from1 = "SPD"
  local lab_from2 = "(Social dem.)"
 }
...
 forvalues to = 1/7 {  // evaluated party
...
  if `to' == 4 {
   local lab_to1 = "FDP"
   local lab_to2 = "(Liberals)"
  }
...
 }
}

forvalues from = 1/7 {

 if `from' == 1 {
  local lab_from1 = "The Left/PDS"
  local lab_from2 = ""
 }
 ...
 if `from' == 7 {
  local lab_from1 = "AfD"
  local lab_from2 = "(Radical right-wing)"
 }

******
 forvalues to = 1/7 {
******
  if `to' == 1 {
   local lab_to1 = "The Left/PDS"
   local lab_to2 = ""
  }
  ...
  if `to' == 7 {
   local lab_to1 = "AfD"
   local lab_to2 = "(Radical right)"
  }

  * average feelings of supporters of party `from' towards party `to'
  sum thermo_`to'  [weight = weight] if vote == `from'
  local mean = r(mean)
  local lab_mean : di %9.2f `mean'

  * double the average, then round, to get the color number
  local color1 = round(`mean' * 2)
  local color ="r(p`color1')"

  * re-run the palette so that r(p#) is available for the graph command
  colorpalette midblue gs14 red, ipolate(20) nograph

  twoway lpoly thermo_`to' year [aweight = weight] ///
   if vote == `from'  ///
   , degree(1) bwidth(2) ///
   yscale(range(0 10)) ylabel(0 10, labgap(*.3) tlength(*.3)   labcolor(none) tlcolor(none)) ///
   xlabel($minyear "   1977" 2021.917 "2021     ", labgap(*.3) tlength(*.4)   labcolor(none) tlcolor(none) ) ///
   xtitle("") ytitle("")  ///
   lwidth(medthick) lcolor(black) lpattern(solid) ///
   name(thermo_`from'`to', replace) ///
   legend(off) plotregion(fcolor(   "``color''"  ) margin(small) ) ///
   yline(`mean', lcolor(white) lwidth(thick) noextend) ///
   text( 5 1996  "{bf:`lab_from1'}" "{bf:`lab_from2'}", color(none) size(*1.143) placement(west) justification(right) ) ///
   text( 5 1999  "{bf:`lab_to1'}" "{bf:`lab_to2'}", color(none) size(*1.143) ) ///
   text(9.8 1999 "{bf:`lab_mean'}  ",  placement(center) justification(left))  ///
   text( 1 1999 "Respondent's party", size(*1.143) color(none))  ///
   text( 1 1999 "Evaluated party", size(*1.143) color(none))
 }
}

Here, you may notice that I include several text elements that are invisible because I defined “color(none)”. These are text elements I only want to show in a few of the 7x7 subgraphs. Showing, for instance, the axis labels on all tiles would be unnecessary and would overload the figure.

I first define them as invisible and make them visible in the last step. There are two reasons why I proceed like this:

Overall, adapting Stata figures after they have been drawn (via gr_edit) is surprisingly slow in terms of computation. If we want to show y-axis labels on 7 of the 49 subfigures, it is faster to plot them invisibly in all 49 subfigures and then make them visible in 7 than vice versa.
Including all text before combining figures and adapting graph sizes helps with keeping font sizes coherent.
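The principle can be tried on a single small graph first. The following sketch uses the shipped auto data; the label text, its position, and the graph name hidden_demo are placeholders. The object name plotregion1.textbox1 is an assumption inferred from the paths used for the combined figure later in this post, so it is worth checking in the Graph Editor if the edit has no effect.

```stata
sysuse auto, clear

* draw the label invisibly first ...
twoway scatter mpg weight, ///
    text(35 4000 "Hidden label", color(none)) ///
    name(hidden_demo, replace)

* ... then switch it on afterwards by editing the finished graph
gr_edit plotregion1.textbox1.style.editstyle color(black) editcopy
```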

2. Combine the 7x7 figure into one

Now that we have the 7x7 subgraphs, we need to put them all together. For this, I use the self-written command “resizecombine” (ssc install resize). Here is what this command is for:

resize, resizecombine, and resizec1leg keep the absolute font size (usually defined in pt) constant when changing the height and/or width of figures. These commands help you control font sizes and keep them coherent across all your figures.
If you change the size of a figure using graph display or similar commands, Stata automatically rescales the font size in proportion to the minimum out of the figure’s height and width. However, this is not desired in many situations. When we want to increase the size of a figure, we often do so to fit more content into the figure. Stata’s default rescaling counters that. The new commands keep the absolute font size constant, meaning that we can increase (decrease) the figure’s size if we want to show more (less) information in the figure.

Well-chosen and coherent font sizes improve the readability of the figure, and they also just make it nicer to look at. If you copy the figure produced here into a program like Word or PowerPoint, it will be 18 cm wide and 16 cm high, and the font sizes will be 7 and 8 pt.

resizecombine ///
thermo_11 ///
thermo_12 ///
...
thermo_21 ///
thermo_22 ///
...
thermo_77 ///
, ycommon  xsize(18cm) ysize(16cm) name(combined, replace) ///
imargin(zero)  graphregion(margin(t=7 l=16 r=1 b=1))
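Typing out all 49 graph names works, but the list can also be built in a loop. This is a minimal sketch; the macro name graphs is hypothetical, and the order (rows first, then columns) mirrors the hand-written list above.

```stata
* collect the graph names row by row: thermo_11 thermo_12 ... thermo_77
local graphs
forvalues from = 1/7 {
    forvalues to = 1/7 {
        local graphs `graphs' thermo_`from'`to'
    }
}
display "`graphs'"
```

The macro could then stand in for the hand-written list, i.e. `graphs' followed by the same options as above.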

3. Make several adjustments for a nicer and more readable figure

We are almost finished; we just need to polish the figure a bit more. Most of all, we need to make several graph elements visible and move some of them to their proper position.

First are the overall titles (“Respondent’s party” and “Evaluated party”). Currently, these text bits are included in every subfigure, but they are invisible there. With the following code, we make the text visible, rotate it, and move it to the desired location:

gr_edit plotregion1.graph1.plotregion1.textbox4.style.editstyle color(black) editcopy
gr_edit .plotregion1.graph1.plotregion1.textbox4._set_orientation vertical
gr_edit .plotregion1.graph1.plotregion1.textbox4.DragBy -38 -104

With the further steps, we can make the y-axis labels visible in the left-hand column and the x-axis labels visible in the bottom row:

* MAKE OVERALL TITLES VISIBLE:
gr_edit plotregion1.graph1.plotregion1.textbox5.style.editstyle color(black) editcopy
gr_edit .plotregion1.graph1.plotregion1.textbox5.DragBy 16 193

* MAKE Y-AXIS VISIBLE ON THE LEFT-HAND PLOTS:
foreach n in 1 8 15 22 29 36 43 {
gr_edit plotregion1.graph`n'.yaxis1.style.editstyle majorstyle(tickstyle(textstyle(color(black)))) editcopy
gr_edit plotregion1.graph`n'.yaxis1.style.editstyle majorstyle(tickstyle(linestyle(color(black)))) editcopy
}

* MAKE X-AXIS VISIBLE ON THE BOTTOM PLOTS:
forvalues n = 43 / 49 {
gr_edit plotregion1.graph`n'.xaxis1.style.editstyle majorstyle(tickstyle(textstyle(color(black)))) editcopy
gr_edit plotregion1.graph`n'.xaxis1.style.editstyle majorstyle(tickstyle(linestyle(color(black)))) editcopy
}

* MAKE PARTY-LABELS VISIBLE ON THE LEFT-HAND PLOTS:
foreach n in 1 8 15 22 29 36 43 {
gr_edit plotregion1.graph`n'.plotregion1.textbox1.style.editstyle color(black) editcopy
gr_edit plotregion1.graph`n'.plotregion1.textbox1.DragBy 0 -31
}

* MAKE PARTY-LABELS VISIBLE ON THE TOP PLOTS:
forvalues n = 1 / 7 {
gr_edit plotregion1.graph`n'.plotregion1.textbox2.style.editstyle color(black) editcopy
gr_edit plotregion1.graph`n'.plotregion1.textbox2.DragBy 8.5 0
}
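The graph numbers in these loops follow from the order in which the subgraphs were combined: row by row, seven per row, so the tile in row r and column c is graph number (r - 1) * 7 + c. The sketch below derives the left-column and bottom-row numbers from that rule instead of typing them; the local names are hypothetical, and the row-wise numbering is an assumption that matches the numbers used above.

```stata
local nrow = 7
local ncol = 7

* first graph of every row = left-hand column
local left
forvalues r = 1/`nrow' {
    local left `left' `= (`r' - 1) * `ncol' + 1'
}

* first and last graph of the last row = bottom row
local first_bottom = (`nrow' - 1) * `ncol' + 1
local last_bottom  = `nrow' * `ncol'

display "left column: `left'"
display "bottom row:  `first_bottom' to `last_bottom'"
```

This becomes handy when the grid size changes, for example when a party is added or dropped.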

For such fine-tuning with gr_edit, I usually work backwards: I open the figure in Stata’s Graph Editor, click “start recording”, make the changes via drop-down menus or click-and-drag, and save the recording file. Then I open the recording file in the do-file editor (or any other text editor), extract the relevant parts, and move them to my do-file. Often I adapt them further there or loop over them. Unfortunately, this procedure can be a bit tedious and error-prone because these commands are only partly documented in Stata.

…and we are finished! Here is what the end result looks like:

Do you have any comments or questions? Maybe you have suggestions where I could improve my Stata workflow? Let me know!

About the author

I’m a sociologist at the University of Cologne, working on (1) family, gender, and the life course, (2) political attitudes, (3) sustainability and mobility behavior, and (4) quantitative research methods. When I have something exciting to share about my research, you can read about it on Twitter. You can also reach me via mail: hudde@wiso.uni-koeln.de