Process documentation: Democracy and education expenditure visualised

Bishal Goswami
May 8, 2022

(This work was done as part of PhD coursework for the course Mini Project-1 at IDC, IIT Bombay, towards fulfilling the requirements of the IITB-Monash Research Academy.)

Education is one of the key factors that ensure democratic values are upheld in a nation. Although the causal relationship between democracy and education is unclear, the connection between the two is worth exploring. The visualisations in my previous blog post suggest a clear relation between the type of regime in power and a country's expenditure on education. This post documents the process and finer details behind those visualisations.

I began with two questions before visualising the data sets:

1. Do liberal democracies spend more on education (in terms of percentage of GDP) as compared to other forms of government?

2. In the case of India, how does a particular state's expenditure on education compare with the median expenditure on education across states?

The process was divided into four parts: data collection, data processing, exploratory visualisation and final visualisation.

1. Data collection

World education expenditure:

To establish a relation between democracy and education, I needed two data sets: the first with governments' expenditure on education as a percentage of GDP across the world, and the second with the types of government in the world. The first data set was taken from the 'Cross-country spending patterns' section of the Our World in Data 'Global Education' page. It covers five decades, from 1970 to 2019, for 167 countries. However, the data were inconsistent across countries, and data for certain years were missing. The second data set required a list of the types of government that ruled these 167 countries over the same period (1970–2019). Most available data sets were unconsolidated. However, a data set developed by V-Dem (Varieties of Democracy), which is used in the Our World in Data 'Democracy' section, categorises the countries of the world under four political regimes: closed autocracy, electoral autocracy, electoral democracy and liberal democracy.

India education expenditure:

For India, I needed a single data set with the expenditure on education as a percentage of GSDP (gross state domestic product) for different states over time. While reviewing secondary data and the related literature, I came across a research article by Mr Gunwant B Gadbade and Dr Chandrakant N Kokate on an interstate analysis of public expenditure on education in India. Its data set covers 18 states for the period 1990–2019, with the 2019 figures being projections. I also tried to relate a state's expenditure on education to its literacy rate, using census literacy data extracted from the Reserve Bank of India's publications webpage.

2. Data processing

For the world education expenditure visualisation, the two data sets from different sources were merged in R. The code used for this can be found here. While processing the data, it became apparent that the data set on political regimes was richer in terms of both the number of countries and the number of years. The data set on expenditure on education, by contrast, covered only 167 countries for the period 1970–2019, with many gaps where data were missing for different countries in different years. Nonetheless, it was good enough to look for the intended relation. For India, the data set was already in good shape and ready for visualisation.
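The merge rule stated later in the disclaimers (keep only the countries and years that have education expenditure data) can be sketched as follows. This is a minimal Python sketch, not the author's R code: the function name `merge_expenditure_with_regimes`, the field names and the toy rows are hypothetical, and dropping country-years that lack a regime label is an assumption.

```python
# Sketch: merge expenditure and regime data on (country, year).
# Function name, field names and rows are hypothetical; the actual merge was done in R.
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (country, year)


def merge_expenditure_with_regimes(
    expenditure: Dict[Key, Optional[float]],
    regimes: Dict[Key, str],
) -> List[dict]:
    """Keep country-years that have an expenditure value (and, by assumption, a regime)."""
    merged = []
    for (country, year), value in sorted(expenditure.items()):
        if value is None:
            continue  # no expenditure data: drop this country-year
        regime = regimes.get((country, year))
        if regime is None:
            continue  # assumption: also drop country-years without a regime label
        merged.append({"country": country, "year": year,
                       "expenditure_pct_gdp": value, "regime": regime})
    return merged


# Toy data: the regime set covers more country-years than the expenditure set.
expenditure = {("Country A", 2000): 4.1, ("Country A", 2001): None, ("Country B", 2000): 5.3}
regimes = {
    ("Country A", 2000): "electoral democracy",
    ("Country A", 2001): "electoral democracy",
    ("Country B", 2000): "liberal democracy",
    ("Country C", 2000): "closed autocracy",
}
rows = merge_expenditure_with_regimes(expenditure, regimes)
print(len(rows))  # 2
```

Iterating over the expenditure data rather than the larger regime data is what discards the extra countries and years from the regime set.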

3. Exploratory visualisation

Initial exploration

For the world education expenditure visualisation, initial exploration of the data sets was carried out in Google Sheets and R. The preliminary visualisations were also made in R, to get a sense of the data before creating interactive visualisations. I wanted to see how education expenditure across the world changed over the timeline. Scatter plots seemed the best option, as I was trying to understand the relation between two variables, expenditure and regime. Fig. 1 shows the scatter plot made with the merged data set. A different shape was assigned to each type of regime, and colours were assigned to countries automatically. Since the palette had a limited number of colours, the colours were repeated and multiple countries appeared in the same colour, which made the chart confusing.
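The colour repetition described above follows from simple arithmetic when a fixed palette is reused in order. The sketch below assumes a hypothetical 10-colour palette cycled across the 167 countries (the palette actually used in R is not stated in the post); under that assumption some colours end up shared by 17 countries each.

```python
# Sketch: why a fixed, cycled palette makes countries share colours.
# Assumption: a hypothetical 10-colour palette reused in order; not the actual R palette.
from collections import Counter
from itertools import cycle


def assign_colours(countries, palette):
    """Pair each country with the next palette colour, wrapping around."""
    return dict(zip(countries, cycle(palette)))


palette = [f"colour_{i}" for i in range(10)]
countries = [f"country_{i}" for i in range(167)]  # 167 countries, as in the data
colours = assign_colours(countries, palette)

shared = Counter(colours.values())  # how many countries use each colour
print(max(shared.values()))  # 17
```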

Fig. 1: Scatter plot showing the world education expenditure data (% GDP) across time in relation to the corresponding regime

To reduce the visual clutter, I next plotted only the BRICS countries (Brazil, Russia, India, China and South Africa) (Fig. 2). Expenditure on education (% of GDP) was plotted against the timeline 1970–2019, with a different colour for each country. This graph was more legible, and the patterns were clearly visible.

Fig. 2: Education expenditure data (% GDP) across time in relation to the corresponding regime for BRICS countries

However, the aim was to plot the data for the whole world, not just the BRICS countries, and to see the relationship between the type of regime and expenditure on education. The earlier graphs did not make that clear: although they showed how each country's expenditure on education changed over time along with its trajectory of regimes, the relation was hard to read. So I decided to plot the median expenditure for each type of regime. This was done using box plots (Fig. 3), one for each regime, pooling the data for all 167 countries across the entire timeline. This graph was the most insightful so far, as it made the relation between expenditure on education and the type of regime visible: the median expenditure of liberal democracies was higher than that of the other regimes.
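The summary behind each box in Fig. 3 can be sketched as a five-number summary per regime. This Python sketch uses hypothetical field names and made-up values, not the real data. It uses `statistics.quantiles` with the inclusive method; plotting tools may use other quartile conventions, and many box plots draw whiskers at 1.5 × IQR rather than at the minimum and maximum shown here, so exact box edges can differ.

```python
# Sketch: per-regime box-plot summary (min, Q1, median, Q3, max).
# Field names and values are hypothetical, for illustration only.
from collections import defaultdict
from statistics import median, quantiles


def box_stats_by_regime(rows):
    """Group expenditure values by regime and summarise each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["regime"]].append(row["expenditure_pct_gdp"])
    stats = {}
    for regime, values in groups.items():
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]  # a single value is its own quartile
        stats[regime] = {"min": min(values), "q1": q1, "median": median(values),
                         "q3": q3, "max": max(values)}
    return stats


# Made-up values, not the real data.
rows = [{"regime": "liberal democracy", "expenditure_pct_gdp": v} for v in (2.0, 3.0, 4.0, 5.0, 6.0)]
rows += [{"regime": "closed autocracy", "expenditure_pct_gdp": v} for v in (1.0, 2.0, 3.0, 4.0)]
stats = box_stats_by_regime(rows)
print(stats["liberal democracy"]["median"], stats["closed autocracy"]["median"])  # 4.0 2.5
```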

Fig. 3: Expenditure on education (% of GDP) by countries of the world grouped by the type of government (regime) for data from 1970 to 2019

Although informative, the graph was not interactive, and I did not have the skills in R to make it so. My supervisors introduced me to a range of online tools, such as Datawrapper, Flourish Studio and Raw Graphs. I explored two of these and found that Flourish Studio offered a wider range of visualisation options and better controls, so I went ahead with it.

For India, I explored the data directly in Flourish Studio, as the data set was already in good shape. I created a Hans Rosling scatter plot (Fig. 4), with time on the X-axis and the expenditure of different states on the Y-axis. In this chart, the median was treated as a 'State' category, so the median value for each year appeared as a dot in that year, just like the values of the states. Because the filter control operated on 'State', it was not possible to see the 'Median' (treated as one state) and a particular state at the same time. I therefore dropped the Hans Rosling chart for this data set, having realised that a line chart with appropriate filters would be more useful.
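Treating the median as a 'State' amounts to appending one extra row per year to long-format data (one row per state per year). The Python sketch below illustrates that step under stated assumptions: the function `add_median_series`, the state names and the values are hypothetical, and the median is taken only over the states with data in that year.

```python
# Sketch: append a per-year "Median" series to long-format state data,
# so the median behaves like one more 'State' category.
# Function name, state names and values are hypothetical.
from collections import defaultdict
from statistics import median


def add_median_series(rows, label="Median"):
    """rows: (state, year, pct_gsdp) tuples; value may be None if missing."""
    by_year = defaultdict(list)
    for state, year, value in rows:
        if value is not None:  # the median uses only the available values
            by_year[year].append(value)
    median_rows = [(label, year, median(values)) for year, values in sorted(by_year.items())]
    return rows + median_rows


rows = [
    ("State A", 1990, 3.0), ("State B", 1990, 2.0), ("State C", 1990, 4.5),
    ("State A", 1991, 3.0), ("State B", 1991, None), ("State C", 1991, 4.0),
]
result = add_median_series(rows)
for row in result[-2:]:
    print(row)
# ('Median', 1990, 3.0)
# ('Median', 1991, 3.5)
```

Because the median rows share the 'State' column, a filter on that column that shows one category at a time treats 'Median' as just another state.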

Fig. 4: Hans Rosling scatter plot showing the trends in expenditure on education (% of GSDP) for Indian states along with the median of expenditure (black) plotted as a category just like a ‘State’

Detailed exploration on Flourish Studio

In my experience, Flourish Studio offers a wide range of visualisation options, including line charts, bar charts, pie charts, maps, scatter plots, hierarchy charts, heat maps, radar charts and Sankey diagrams. For the world education expenditure visualisation, I stuck with scatter plots, since I had already explored them in R and they were informative and showed the relation I intended to observe; I was less sure how appropriate the other options would be. Among the scatter plot templates that Flourish offers, I selected the Hans Rosling chart because I wanted to show change over time. For India, since I could not show a particular state's expenditure alongside the median on the Hans Rosling chart, I chose the Line chart (searchable) template instead.

4. Final visualisation on Flourish Studio

Disclaimers for interpreting the visualisations

  1. For the world education expenditure visualisation, the charts depend solely on the data available from Our World in Data. Several kinds of data are missing: some countries entirely, the education expenditure of some countries for some years, and the regimes of some countries for some years. The current data set has full or partial information on 167 countries over the chosen timeline, 1970–2019.
  2. The data set on the regimes of different countries covered more countries and more years than the data set on education expenditure. While merging the two data sets, only those countries and years for which education expenditure data were available were kept, so that the relation between education expenditure and regime could be studied.
  3. Because of the missing data, the medians shown in the box plots for the 167 countries might not truly represent the medians for a particular year or regime. Although the graphs show some indicative trends, please interpret them cautiously.
  4. For India, the chart is based solely on a data set covering only 18 states, so the median is computed from the available data for those states. Andhra Pradesh was an undivided state for most of the period covered by the data set (Telangana was carved out of it in 2014).
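One way to quantify the gaps described in disclaimers 1 and 3 before reading the medians is to count, for each country, the years that have data. The post does not say this check was performed; the sketch below is a hypothetical Python helper (`coverage_report`) run on toy rows with hypothetical field names. Countries missing entirely will not appear in its output at all.

```python
# Sketch: per-country coverage over the 1970-2019 timeline.
# Hypothetical helper and field names; toy rows only.
from collections import defaultdict


def coverage_report(rows, first_year=1970, last_year=2019):
    """Return {country: (years_with_data, missing_years)} over the timeline."""
    timeline = set(range(first_year, last_year + 1))
    seen = defaultdict(set)
    for row in rows:
        if row["year"] in timeline:
            seen[row["country"]].add(row["year"])
    return {country: (len(years), sorted(timeline - years)) for country, years in seen.items()}


rows = [{"country": "Country A", "year": y} for y in range(1970, 2020) if y != 1980]
rows.append({"country": "Country B", "year": 2000})
report = coverage_report(rows)
print(report["Country A"])  # (49, [1980])
print(report["Country B"][0], len(report["Country B"][1]))  # 1 49
```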

Interactive visualisations

Note: The world visualisations below (1 to 4) use the same shape legend:
Circle = Closed autocracy; Cross = Electoral autocracy; Diamond = Electoral democracy; Wye = Liberal democracy

1. Box plots showing the relation between expenditure on education (% of GDP) and regime without time-slider

Click the link for interactive visualisation

Data controls: X-axis: Regime; Y-axis: Expenditure on education (% of GDP); Pop-up panel name: Country; Colour: Country; Shape: Regime; Filter control: Regime; Box-plot: Yes; Beeswarm plot: Yes; Clickable legend: Show

2. Box plots showing the relation between expenditure on education (% of GDP) and regime with time-slider

Click the link for interactive visualisation

Data controls: X-axis: Regime; Y-axis: Expenditure on education (% of GDP); Pop-up panel name: Country; Colour: Country; Shape: Regime; Time slider: Year; Filter control: Regime; Box-plot: Yes; Beeswarm plot: Yes; Clickable legend: Hide
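With the time slider, each frame shows the box plots for a single year, so each box summarises only the countries with data in that year. The Python sketch below groups by year and regime and keeps the group size alongside the median; Flourish computes this internally, and the field names and values here are hypothetical. Keeping the count is useful because a median drawn from one or two countries deserves the caution noted in the disclaimers.

```python
# Sketch: median expenditure per (year, regime), plus how many countries it rests on.
# Field names and values are hypothetical.
from collections import defaultdict
from statistics import median


def yearly_medians_by_regime(rows):
    """Return {(year, regime): (median, number_of_countries)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["regime"])].append(row["expenditure_pct_gdp"])
    return {key: (median(values), len(values)) for key, values in sorted(groups.items())}


rows = [
    {"year": 2000, "regime": "liberal democracy", "expenditure_pct_gdp": 5.0},
    {"year": 2000, "regime": "liberal democracy", "expenditure_pct_gdp": 6.0},
    {"year": 2000, "regime": "closed autocracy", "expenditure_pct_gdp": 2.0},
    {"year": 2001, "regime": "liberal democracy", "expenditure_pct_gdp": 5.5},
]
frames = yearly_medians_by_regime(rows)
for key, (med, n) in frames.items():
    print(key, med, n)
# (2000, 'closed autocracy') 2.0 1
# (2000, 'liberal democracy') 5.5 2
# (2001, 'liberal democracy') 5.5 1
```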

3. Expenditure on education (% of GDP) of different regimes in the world

Closed autocracy: Click the link for interactive visualisation

Electoral autocracy: Click the link for interactive visualisation

Electoral democracy: Click the link for interactive visualisation

Liberal democracy: Click the link for interactive visualisation

Data controls: X-axis: Year; Y-axis: Expenditure on education (% of GDP); Pop-up panel name: Country; Colour: Country; Shape: Regime; Filter control: Continent/Region; Box-plot: Yes; Beeswarm plot: Yes; Clickable legend: Show

4. Expenditure on education (% of GDP) of different continents/regions of the world

Africa: Click the link for interactive visualisation

Asia: Click the link for interactive visualisation

Europe: Click the link for interactive visualisation

The Middle East: Click the link for interactive visualisation

North America: Click the link for interactive visualisation

Central America: Click the link for interactive visualisation

South America: Click the link for interactive visualisation

Oceania: Click the link for interactive visualisation

Data controls: X-axis: Year; Y-axis: Expenditure on education (% of GDP); Pop-up panel name: Country; Colour: Country; Shape: Regime; Size: Expenditure on education (% of GDP); Filter control: Continent/Region; Trend lines: Yes; Clickable legend: Show
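The region charts have trend lines switched on. The post does not say which fit Flourish uses; assuming a straight line fitted by ordinary least squares of expenditure on year, the calculation for one country reduces to the sketch below. The function name `linear_trend` and the values are hypothetical.

```python
# Sketch: ordinary least-squares trend line for one country's (year, % of GDP) points.
# Assumption: Flourish's trend line is a straight least-squares fit; values are made up.


def linear_trend(points):
    """Fit value = slope * year + intercept by ordinary least squares."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all years are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_trend([(2000, 3.0), (2001, 3.5), (2002, 4.0)])
print(slope, intercept)  # 0.5 -997.0
```

A positive slope means expenditure (% of GDP) rose over the period; the intercept only anchors the line and has no meaning on its own.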

5. India: Trends in state-wise expenditure on education (% of GSDP)

Click the link for interactive visualisation

Data controls: X-axis: Year; Y-axis: Expenditure on education (% of GSDP); Pop-up panel: Yes; Colour: State; Series filter control: Individual states and median; Clickable legend: No

6. India: Trends in state-wise expenditure on education (% of GSDP) along with literacy rates

Click the link for interactive visualisation

Data controls: X-axis: Year; Y-axis: Expenditure on education (% of GSDP) and literacy rates (%); Pop-up panel: Yes; Colour: State, individual state literacy rate; Series filter control: Individual states, median, individual state literacy rate; Clickable legend: No

Conclusion

Although causal relationships could not be established, the trends in education expenditure do suggest that certain regimes spend more on education than others. This data visualisation exercise was very helpful in understanding the challenges of visualising large data sets. I also learnt about different forms of visualisation and when each is appropriate. Seemingly obvious assumptions may not always be substantiated by data and may need external validation. In this exercise, the assumption that liberal democracies spend more on education was borne out, but their margin over other regimes was very narrow. Therefore, along with data visualisation, studying trends and causal factors becomes imperative for telling a compelling data story.

Useful links

Visualisation resources: Datawrapper, Flourish Studio, Raw Graphs

Data portals: Our World in Data ‘Global education’, Our World in Data ‘Democracy’, Research article by Mr Gunwant B Gadbade and Dr Chandrakant N Kokate

Data sets used for visualisations: Google Sheet

Coding on R: GitHub

Acknowledgements

I would sincerely like to thank Prof. Venkatesh Rajamanickam (IDC, IIT Bombay), Dr Arnab Jana (CUSE, IIT Bombay), Dr Ilya Fridman (MADA, Monash University) and Dr Xavier Ho (MADA, Monash University) for providing a great deal of resources, giving me extremely valuable feedback and encouraging me over the course of this work.

I would also like to extend my thanks to Dr Shreekant Deodhar, who helped me with the early explorations of the data sets in R and with formulating my thoughts for these visualisations.
