2024 Du Bois Challenge using R Programming.

Simisani Ndaba
Published in ILLUMINATION
7 min read · May 15, 2024

2024 Du Bois challenge photomontage of original plates and recreated charts developed by the author.

Reflecting on the #2024DuBoisChallenge

This year, I wanted to take part in the 2024 Du Bois Challenge, which ran from February 5 to April 8. I did not expect to contribute consistently over the ten weeks. Fortunately, the challenge required only one post per week, which gave me time to work on one plot at a time. My repository for the challenge can be found here. I managed to complete 7 of the 10 charts, and I realized I was getting better as the weeks went by. I did not think I could accomplish the challenge with R alone, but every plot was built with the tidyverse and ggplot2 packages.

2024 Du Bois Challenge poster.

The theme for the challenge was organized around the colors of the Pan-African flag:
Challenges 1–3: red,
Challenges 4–6: black,
Challenges 7–9: green,
Challenge 10: a combination.
According to an article by Anthony Starks, the idea for the Du Bois Challenge came from Allen Hillery and Sekou Tyler. In early February 2021, Allen and Sekou approached Anthony Starks with the idea of creating an online challenge where people would recreate selected Du Bois visualizations and share their work online. The goal of the challenge is to celebrate the data visualisation legacy of W. E. B. Du Bois, a Black American civil rights activist, sociologist and writer who also helped form the NAACP, by recreating the visualizations from the 1900 Paris Exposition using modern tools.

All of the visualizations (the originals and his recreations) and their underlying data were already collected in Anthony Starks’s GitHub repository, so I made a special directory for the challenge.

This year the challenge offered a prize for consistent contributors who posted their creations in the Data Visualization Society Slack channel. Contributors who submitted consistently over the ten weeks won a one-year DVS membership (valued at $99) or a copy of Nightingale Magazine (valued at $40).

All re-created images in this article were created by the author.

Challenge 01_week 01, February 5 — Negro Population of Georgia by Counties, 1870, 1880 (plate 06)

For challenge 01, I began by creating the two maps of Georgia from the 1870 and 1880 datasets. I used the maps package to get the Georgia counties. To merge the county names in the 1870 and 1880 datasets with the Georgia region names from the maps package, I had to change them to lower case. The population values also had to be standardized and colour coded according to population size.
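
The name matching can be sketched roughly as follows. This is a minimal illustration and not the code from my repository: the data frame pop_1870, its column names and its bracket values are made up, and it assumes the ggplot2 and maps packages are installed.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

# Hypothetical stand-in for the challenge data: upper-case county names
# and a population bracket per county (brackets invented for illustration).
pop_1870 <- data.frame(
  county  = c("FULTON", "CHATHAM", "BIBB"),
  bracket = c("bracket 1", "bracket 2", "bracket 3")
)

# County polygons for Georgia; the county name is in `subregion`, in lower case.
ga <- map_data("county", region = "georgia")

# Lower-case the dataset names so they match the map names, then join.
pop_1870$subregion <- tolower(pop_1870$county)
ga_pop <- merge(ga, pop_1870, by = "subregion", all.x = TRUE)

# merge() does not keep the polygon drawing order, so restore it.
ga_pop <- ga_pop[order(ga_pop$group, ga_pop$order), ]

map_1870 <- ggplot(ga_pop, aes(long, lat, group = group, fill = bracket)) +
  geom_polygon(colour = "black") +
  coord_quickmap() +
  theme_void()
```

Counties without a match keep an NA bracket and are drawn in the default grey, which makes naming mismatches easy to spot.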

The two legends were created using a 10 x 12 plot for text annotation. I then used the cowplot package to combine the plots together using the plot_grid() function. The background colour was set to “papayawhip”.
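
A rough sketch of that layout, under a few assumptions: the two maps are replaced by empty placeholder panels, the "10 x 12 plot" is read as an empty canvas with x from 0 to 10 and y from 0 to 12, and the legend label is a placeholder.

```r
library(ggplot2)
library(cowplot)

# Placeholder panels standing in for the two county maps.
map_1870 <- ggplot() + theme_void() + ggtitle("1870")
map_1880 <- ggplot() + theme_void() + ggtitle("1880")

# Text-only legend panel: an empty 10 x 12 canvas with annotations on it.
legend_panel <- ggplot() +
  xlim(0, 10) + ylim(0, 12) +
  annotate("point", x = 2, y = 8, size = 6, colour = "darkred") +
  annotate("text", x = 3, y = 8, label = "LEGEND LABEL", hjust = 0, size = 3) +
  theme_void()

# Arrange the four panels in a 2 x 2 grid and colour the shared background.
combined <- plot_grid(map_1870, legend_panel, legend_panel, map_1880, ncol = 2) +
  theme(plot.background = element_rect(fill = "papayawhip", colour = NA))
```

Because plot_grid() returns a ggplot object, the background can be set with an ordinary theme() call on the combined plot.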

Challenge 01 original and recreation.

Challenge 02_week 02, February 12 — Slave and Free Negroes (plate 12)

The following week of the challenge, I realised I needed to change the text font to “mono” which looked a lot more like the original plates. I used this font for the rest of the plots.

Using the Challenge 02 dataset, I used the geom_line() function to draw the line through the plot. To shade the black and red areas, I used the geom_ribbon() function, which fills the band between a lower and an upper bound. As seen in the code block below, within geom_ribbon(), ymin represents the lower bound of the ribbon and ymax the upper bound. Free is a column in the dataset, while Inf and -Inf are R's infinity constants, which stretch each ribbon to the top or bottom edge of the panel. The data argument was set to NULL because the data had already been set to the Challenge 02 dataset in the ggplot() call. The fill colours are set to black and red, and alpha specifies the transparency of the fill. Unfortunately, I did not show the white lines in the plot, and the outside text is shown in the plot.

# geom_ribbon() shades the black area above the Free line and the red area below it
geom_ribbon(data = NULL, aes(ymin = Free, ymax = Inf), fill = "black", alpha = 0.5) +
geom_ribbon(data = NULL, aes(ymin = -Inf, ymax = Free), fill = "red", alpha = 0.5)
Challenge 02 original and recreation.
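
For readers who want to try the ribbon idea on its own, here is a self-contained sketch. The data frame toy and its values are invented for illustration, and Year is an assumed column name. Only Free comes from the snippet above.

```r
library(ggplot2)

# Illustrative values only, not the challenge data.
toy <- data.frame(Year = c(1790, 1830, 1870), Free = c(5, 20, 100))

p <- ggplot(toy, aes(x = Year, y = Free)) +
  # Area above the line: from the Free value up to the top of the panel.
  geom_ribbon(aes(ymin = Free, ymax = Inf), fill = "black") +
  # Area below the line: from the bottom of the panel up to the Free value.
  geom_ribbon(aes(ymin = -Inf, ymax = Free), fill = "red") +
  # The line itself, drawn last so it sits on top of both ribbons.
  geom_line()
```

Leaving out data = NULL has the same effect here, since each layer inherits the data from ggplot() by default.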

Challenge 05_week 05, March 4 — Race Amalgamation in Georgia (plate 13)

Challenge 05 needed the fewest lines of code of all the weeks. The Challenge 05 dataset had only three values, making the plot straightforward.

I set the position to “stack” in the geom_bar() function and filled the bar by category. The three text annotations were added to the right side of the bar in their respective categories.
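
A minimal sketch of a single stacked bar with labels to its right. The categories A, B and C and their percentages are placeholders, not the challenge data, and placing the labels with position_stack() is one possible approach rather than necessarily what my script does.

```r
library(ggplot2)

# Three placeholder categories (illustrative values only).
shares <- data.frame(
  category = factor(c("A", "B", "C"), levels = c("A", "B", "C")),
  percent  = c(50, 30, 20)
)

p <- ggplot(shares, aes(y = percent, group = category)) +
  # One bar at x = 1, stacked and filled by category.
  geom_bar(aes(x = 1, fill = category), stat = "identity",
           position = "stack", width = 0.5) +
  # Labels to the right of the bar, centred on each stacked segment.
  geom_text(aes(x = 1.5, label = paste0(category, " ", percent, "%")),
            position = position_stack(vjust = 0.5), hjust = 0, family = "mono") +
  xlim(0.5, 3) +
  theme_void() +
  theme(legend.position = "none")
```

Stacking the labels with the same grouping as the bar keeps each label level with its segment without computing the midpoints by hand.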

Challenge 05 original and recreation.

Challenge 07_week 07, March 18 — Illiteracy of the American Negro compared with other nations (plate 47)

The dataset used for challenge 07 had population values for different nations, making the bar chart straightforward to create. The coord_flip() function then turned it into the horizontal bar chart of the original plate.
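
As a small sketch of the flip, with placeholder nation names and values that are not taken from the challenge dataset:

```r
library(ggplot2)

# Placeholder nations and values (illustrative only).
nations <- data.frame(
  nation = c("Nation A", "Nation B", "Nation C"),
  value  = c(70, 45, 20)
)

p <- ggplot(nations, aes(x = reorder(nation, value), y = value)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +                      # swap the axes so the bars run horizontally
  labs(x = NULL, y = NULL) +
  theme(text = element_text(family = "mono"))
```

The bars are still defined with the nation on x and the value on y. coord_flip() only changes how they are drawn.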

Challenge 07 original and recreation.

Challenge 08_week 08, March 25 — The Rise of Negroes from Slavery to Freedom in One Generation (plate 50)

To create the Challenge 08 chart, I started with the two stacked bar plots, which required splitting the dataset into two new data frames, one for 1860 and one for 1890. Unlike challenge 05, these stacked bar charts did not need the coord_flip() function. The bold year labels on top of the bar charts were added as annotations. The text between the plots was created using a 4 x 4 plot that holds only text.

All three plots were combined using the plot_grid() function. Making the 1890 plot taller than the 1860 plot completed the recreation. Only after finalising the plot did I notice that I had left out the dashed lines between the 1860 and 1890 plots. Honestly, I could not be bothered to add them. LoL.
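
The three-panel idea might look something like the sketch below. The data frame rise, its column names, the categories A and B, and the percentages are all hypothetical, and the height difference between the two bars is not reproduced here.

```r
library(ggplot2)
library(cowplot)

# Hypothetical data with one row per year and status (values invented).
rise <- data.frame(
  year    = c(1860, 1860, 1890, 1890),
  status  = c("A", "B", "A", "B"),
  percent = c(80, 20, 10, 90)
)

# One data frame per year: a list with elements "1860" and "1890".
by_year <- split(rise, rise$year)

# Stacked bar with the year printed in bold above it.
bar <- function(d, year) {
  ggplot(d, aes(x = 1, y = percent, fill = status)) +
    geom_col(position = "stack") +
    annotate("text", x = 1, y = 105, label = year,
             fontface = "bold", family = "mono") +
    theme_void() +
    theme(legend.position = "none")
}

# Text-only 4 x 4 panel that sits between the two bars.
middle_text <- ggplot() +
  xlim(0, 4) + ylim(0, 4) +
  annotate("text", x = 2, y = 2, label = "TEXT BETWEEN\nTHE BARS", family = "mono") +
  theme_void()

combined <- plot_grid(bar(by_year[["1860"]], "1860"), middle_text,
                      bar(by_year[["1890"]], "1890"),
                      nrow = 1, rel_widths = c(1, 1.5, 1))
```

The rel_widths argument controls how much horizontal space each panel gets, which is where most of the manual adjusting happens.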

Challenge 08 original plate and recreation.

Challenge 09_week 09, April 1 — Proportion of Freemen and Slaves (plate 51)

My challenge 09 recreation used the geom_area() function to fill the two status categories, Free and Slave. The Challenge 09 dataset needed to be converted from a wide to a long format. I find data in a long format easier to work with for certain analyses and visualizations, especially when dealing with categorical variables or plotting multiple variables in a single graph. I added numerical annotations at each point for the Free and Slave categories.
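
A minimal sketch of the wide-to-long step followed by the area plot. The values are invented, the column names Year, Status and Percent are assumptions, and pivot_longer() comes from the tidyr package in the tidyverse.

```r
library(ggplot2)
library(tidyr)

# Wide layout: one column per status (values invented for illustration).
wide <- data.frame(
  Year  = c(1790, 1830, 1870),
  Slave = c(90, 85, 10),
  Free  = c(10, 15, 90)
)

# Long layout: one row per year and status, which is what fill = Status needs.
long <- pivot_longer(wide, cols = c(Slave, Free),
                     names_to = "Status", values_to = "Percent")

p <- ggplot(long, aes(x = Year, y = Percent, fill = Status)) +
  geom_area() +
  # A numeric label in the middle of each stacked band at every year.
  geom_text(aes(label = Percent), position = position_stack(vjust = 0.5),
            family = "mono")
```

In the long format a single fill mapping covers both categories, so there is no need for one geom_area() call per column.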

Challenge 09 original plates and recreation.

Challenge 10_week 10, April 8 — A Series Of Statistical Charts Illustrating The Conditions Of Descendants Of Former African Slaves Now Resident In The United States (plate 37)

In the final week of the challenge, I did not think I would be able to create the final chart because of how complicated it looked. My steps for creating the plots are as follows.

Step 1 — As with the previous plots, I built challenge 10 from multiple plots. I used the maps package to colour code all the states and created four text annotations around the map.

Step 2 — The pie chart dataset needed ordering to get the desired layout, but somehow, the ordering did not exactly match the original plate. I used two 10 x 12 plots to create both coloured circle legends on both sides of the pie chart. The bottom text annotation used a 5 x 12 plot to add text.
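
One common way to control slice order in a ggplot2 pie chart is through factor levels. The sketch below uses placeholder groups and shares and is not the code from my repository.

```r
library(ggplot2)

# Placeholder groups and shares, deliberately out of order (illustrative only).
slices <- data.frame(group = c("C", "A", "B"), share = c(20, 50, 30))

# Slice order follows the factor levels, so set them explicitly.
slices$group <- factor(slices$group, levels = c("A", "B", "C"))

# A pie chart is a single stacked bar drawn in polar coordinates.
pie <- ggplot(slices, aes(x = "", y = share, fill = group)) +
  geom_col(width = 1, colour = "black") +
  coord_polar(theta = "y", start = 0) +
  theme_void()
```

If the slices still come out in an unexpected order, the start and direction arguments of coord_polar() are the next things to try.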

Step 3 — In combining the plots, I started with the title and the subtitle as one plot. I then combined the map plot with the surrounding text annotations, making another plot. I added the pie chart with the two legends and the bottom text annotation, making it my third plot. Putting all three plots together, I used the papaya whip color as the background color so that all the plots could be unified and added the caption at the bottom of the chart.
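
The nesting described in this step can be sketched with placeholder panels. Every panel below is a stand-in that only shows a label, the relative sizes are guesses, and only two of the four map annotations are included to keep it short.

```r
library(ggplot2)
library(cowplot)

# Placeholder panel that only shows a label.
blank <- function(label) {
  ggplot() +
    annotate("text", x = 0, y = 0, label = label, family = "mono") +
    theme_void()
}

# Plot 1: title and subtitle.
title_block <- blank("TITLE\nSUBTITLE")

# Plot 2: the map with text annotations on either side.
map_block <- plot_grid(blank("text"), blank("map"), blank("text"),
                       nrow = 1, rel_widths = c(1, 2, 1))

# Plot 3: the pie chart between its two legends, with text underneath.
pie_row   <- plot_grid(blank("legend"), blank("pie"), blank("legend"), nrow = 1)
pie_block <- plot_grid(pie_row, blank("bottom text"), ncol = 1, rel_heights = c(3, 1))

# Stack the three plots and the caption, then unify the background colour.
final <- plot_grid(title_block, map_block, pie_block, blank("caption"),
                   ncol = 1, rel_heights = c(0.6, 2, 2, 0.3)) +
  theme(plot.background = element_rect(fill = "papayawhip", colour = NA))
```

Each call to plot_grid() returns a plot that can itself be passed to another plot_grid(), which is what makes this kind of layered layout possible.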

Challenge 10 original plate and recreation.

Participating in the challenge helped me learn what is possible to create using R. Even though I could not recreate every chart due to work commitments, I’m glad I shared what I had done on the Slack channel and received great feedback from other contributors. I also learned that, in my opinion, the cowplot package is a more powerful combining tool than patchwork, although it needs a lot of width and height adjusting to fit plots together. Hopefully, next year, I can complete the whole challenge, and my recreations can look more like the original plates.

Simisani Ndaba
Teaching Assistant in the Department of Computer Science at the University of Botswana. Interests are in Machine Learning and Data Science.