Visualising Hidden Insights: Enhancing Data Interpretation with allcatplot in Stata

Ketki Samel
Published in The Stata Gallery
7 min read · Jul 11, 2024

When analysing survey data with Likert scales or predefined response categories, it is crucial that every possible response is represented in your visualisations. Traditional plotting methods often omit categories that received no responses, which can skew interpretation. For instance, if a bar plot of a satisfaction survey omits a ‘Very Dissatisfied’ category with zero responses, it may misleadingly suggest that the scale started at ‘Dissatisfied’. The allcatplot program in Stata, written by Kabira Namit, Zaeen de Souza and Prabhmeet Kaur Matta, addresses this issue by including all predefined response categories in graphs, even those not present in the dataset.

Importance of allcatplot in data visualisation

Imagine you’re conducting a survey in a classroom where students are asked to rate their understanding of a topic on a scale of 1 to 5, where:

  • 1 means “Poor”
  • 2 means “Below Average”
  • 3 means “Adequate”
  • 4 means “Good”
  • 5 means “Excellent”

After collecting the responses, you find that no student rated their understanding as “Poor” (1) or “Below Average” (2). If you create a regular bar graph of the results, it might only show bars for the ratings “Adequate” (3), “Good” (4), and “Excellent” (5), completely omitting the categories for “Poor” (1) and “Below Average” (2).

This omission can give a misleading representation, suggesting that the lower ratings were never an option or were not considered by the students.

This issue is addressed by allcatplot, which ensures that all predefined rating categories (1 to 5) are included in the graph. Even though no students selected “Poor” (1) or “Below Average” (2), these categories will still appear as bars with zero height in the graph. This way, viewers can see that these categories were available choices, and the graph gives a complete and accurate representation of the survey results.
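
To see the omission for yourself, here is a minimal sketch using a small made-up dataset. The six ratings and the label name understanding_lbl are illustrative assumptions, not survey data. The built-in commands report only the observed ratings. The final line assumes allcatplot and its prerequisites are already installed and, if the program behaves as described in this article, it should draw all five categories.

```stata
clear
input byte rating
3
4
4
5
5
3
end

* Define all five response options, although only 3, 4 and 5 were chosen
label define understanding_lbl 1 "Poor" 2 "Below Average" 3 "Adequate" 4 "Good" 5 "Excellent"
label values rating understanding_lbl

* tabulate lists only the three observed ratings
tabulate rating

* A regular bar graph likewise has no bars for "Poor" or "Below Average"
generate byte one = 1
graph bar (count) one, over(rating)

* Requires allcatplot, elabel and splitvallabels
allcatplot rating
```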

Beyond this primary function, allcatplot offers other features: customizing the order and labels of categories, disaggregating data by groups, plotting actual frequencies instead of percentages, and even selectively displaying specific categories (contrary to its name!). Additionally, the graph title can be modified, information about missing observations can be included in the subtitle, and the type of graph can be changed.

Prerequisites for using allcatplot

This program runs on Stata 11 and later versions. It requires the elabel and splitvallabels packages for full functionality. Install them with the following commands:

ssc install elabel, replace
ssc install splitvallabels

Usage of allcatplot through examples

To install allcatplot, execute the following command:

ssc install allcatplot, replace 
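
If you would rather install only what is missing, for example at the top of a shared do-file, a sketch like the following may help. It relies on Stata's built-in which command, which gives a non-zero return code when it cannot find a command, and it assumes that each package's main command has the same name as the package.

```stata
* Install each package from SSC only if its command cannot be found
foreach pkg in elabel splitvallabels allcatplot {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}
```

Unlike the commands above with the replace option, this loop does not update packages that are already installed.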

Syntax for allcatplot:

allcatplot varlist [IF] [IN], [Over(varname)] [List(string)]
[RElabel(string)] [Freq] [Sort] [Title(string)] [Missing]
[Recast(string)] [Graphopts(string)]

Note: For the graphs, I have used the scheme ‘white_tableau’ from schemepack by Asjad Naqvi. To use this scheme, execute the following commands:

ssc install schemepack, replace
set scheme white_tableau

(1) Let us first look at the basic functionality of allcatplot through an example:

sysuse nlsw88.dta, clear 
allcatplot race

* Recoding observations in the ‘other’ category to missing
replace race = . if race == 3
allcatplot race
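
A quick way to confirm what has happened to the data is to compare the observed values with the value label attached to the variable. This sketch uses only built-in commands and looks up the label name with a macro function rather than assuming it.

```stata
sysuse nlsw88.dta, clear
replace race = . if race == 3

* Only 'white' and 'black' are observed now
tabulate race

* The value label still defines 'other' as category 3
local lbl : value label race
label list `lbl'
```

Run these lines together, since a local macro does not persist between separately executed selections. The label still lists category 3, which is presumably the information allcatplot draws on when it plots the empty category.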

Having seen the primary function of allcatplot, we can now look at the additional features that can be used to customize the graph as required.

(2) At times, original labels might be coded, making it difficult for the audience to understand the data. When presenting data to non-experts, using clear, everyday language instead of technical jargon or codes can make the information more accessible. We can relabel and customize the names of the categories using the ‘relabel’ option. Using the same data, let us modify the category names from ‘white’, ‘black’, ‘other’ to ‘WHITE’, ‘BLACK’, ‘OTHER’:

allcatplot race, relabel(WHITE BLACK OTHER) 

(3) In datasets with many categories, plotting all of them can result in a cluttered and confusing graph. Selecting only the most relevant categories can make the graph more readable and easier to interpret. The ‘list’ option of allcatplot allows us to do so, and it also lets us customize the order in which the categories are plotted. Let us try this out using the same dataset:

* Plotting category 3, 6 and 8 of the variable occupation
allcatplot occupation, list(3 6 8)

* Changing the order of the categories to 8, 6 and 3
allcatplot occupation, list(8 6 3)

Apart from this, an extra category that does not occur in the dataset can also be included:

allcatplot occupation, list(8 6 3 14) relabel(Laborers Operatives Sales Military)
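
The numbers passed to ‘list’ are the numeric codes behind the value labels. One way to look them up, sketched here with the built-in numlabel command, is to temporarily prefix each label with its code.

```stata
sysuse nlsw88.dta, clear

* Prefix every value label with its numeric code, e.g. "3. Sales"
numlabel, add
tabulate occupation

* Restore the original labels before plotting
numlabel, remove
```

Removing the prefixes afterwards keeps the codes out of any graph that uses these labels.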

(4) At times, the number of times a variable takes a certain value is more relevant than the proportion of that value. While percentages are useful in many scenarios, actual frequencies offer a straightforward, unambiguous view of the data. The ‘freq’ option plots actual frequencies instead of percentages:

allcatplot race, freq

(5) Sorting categories in a logical order helps the audience follow the data narrative more intuitively. The ‘sort’ option in allcatplot can be used to sort the categories in descending order. This time, we shall use a new dataset:

sysuse auto.dta, clear
allcatplot rep78
allcatplot rep78, sort

(6) For thorough reporting, especially in academic or professional contexts, it is important to disclose all aspects of the data, including missing values. Including information about missing data ensures completeness and transparency. The ‘missing’ option adds the number of missing observations as a subtitle. Adding a bar for missing observations requires a couple of additional steps.

* Adding information about the number of missing observations as a subtitle
allcatplot rep78, missing
* Adding a missing bar
sysuse auto.dta, clear
tostring rep78, gen(rep78_string)
replace rep78_string = "Missing" if rep78_string == "."
allcatplot rep78_string, sort
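
To cross-check the number reported in the subtitle, you can count the missing values directly with built-in commands. In auto.dta, rep78 is missing for 5 of the 74 cars.

```stata
sysuse auto.dta, clear

* Number of observations with rep78 missing
count if missing(rep78)

* Frequency table with missing values shown as their own row
tabulate rep78, missing
```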

(7) The title of the graph can also be customized using the ‘title’ option:

allcatplot rep78, title(1978: Repair Records)

(8) In order to improve data visualization, enhance interpretability, or tailor the presentation to the audience or specific analytical needs, changing the graph type might be required, which can be done using the ‘recast’ option.

allcatplot rep78, recast(hbar)
allcatplot rep78, freq recast(dot) graphopts(ylabel(0(10)50))

(9) Disaggregating data provides a clearer picture of performance and outcomes across different areas. To disaggregate data by groups, the ‘over’ option can be used.

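sysuse auto.dta, clear
* Frequencies by car origin, with custom category labels
allcatplot rep78, over(foreign) relabel(A B C D E) freq
* Dot plot of frequencies by origin, with the missing count in the subtitle
allcatplot rep78, over(foreign) missing freq recast(dot) graphopts(linegap(60))
* Sorted categories by origin
allcatplot rep78, over(foreign) sort
* Using a string copy of the grouping variable in over()
decode foreign, gen(foreign_decode)
allcatplot rep78, over(foreign_decode) missing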
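
The grouped frequencies can be verified against a two-way table. This uses only the built-in tabulate command; the missing option adds a row for cars with no repair record.

```stata
sysuse auto.dta, clear

* Repair record by car origin, including missing repair records
tabulate rep78 foreign, missing
```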

I hope that this guide has been useful for understanding the allcatplot program. You can always use the help command for explanations and examples:

help allcatplot

About the Program Creators

Kabira Namit is a Consultant at the World Bank and a Doctoral Teaching Fellow at the University of Oxford.

He can be reached at: knamit@worldbank.org

Zaeen de Souza currently works as an Economist (ODI Fellow) at National Social Protection Secretariat — The Gambia.

Prabhmeet Kaur Matta currently works as a Research Assistant at Centre for the Study of African Economies, University of Oxford.

About the author

Ketki Samel works as a Research Associate at Frontline Impact. She has completed her Master’s in Economics from St. Xavier’s College, Mumbai.
