Some notes on my collaboration with The Washington Post on the gender pay gap

Xaquín González Veira
Xocas
Oct 31, 2017
Yay! I’m on the WaPo homepage.

Candidate A is a terrific developer: has a Master’s degree from a well-regarded American university, R&D experience, and two years of experience in one of the best newsrooms in the country. Some of Candidate A’s projects are quite recognizable.

Candidate B is a terrific developer: has a Master’s degree from a well-regarded European university, R&D experience, and no experience working in newsrooms, but works for one of the largest social networks, so we know Candidate B is quite good.

They both get an offer from a headhunter, for the exact same job opening, for a big London-based news organization. Candidate A’s offer is 15K lower than Candidate B’s.

The headhunter estimated 👩 to be worth 20% less than 👨.

What the headhunter didn’t know is that they’re a couple. And they found out about each other’s offers. And confronted the headhunter. Shit! Sorry, it was a terrible mistake, I may have sent the wrong email to her. (Ha! Nice save! Not really.)

Three things:

  • Yes, this is a true story —in case you were wondering.
  • If this is shocking to you, you are either very naïve, or are willfully ignoring a dreadful reality: in 2017, women still get paid a lot less than men.
  • If this doesn’t infuriate you, or it does but for the wrong moral reasons, don’t worry: this is a making-of post. If you want to get mad, here’s the actual story.

I had been researching the gender pay gap for a few months and I thought it’d be interesting to frame the story as a structured rebuttal of the main claims made by pay gap deniers; so when Kat Downs and Chiqui Esteban told me they wanted to collaborate, I had the pitch ready :)

Let me first get out of the way the recent rekindling of my relationship with R. I learned how to use it at the Times, where I also learned how to seriously think about data journalism from many colleagues there.

I’m not used to datasets this large, and I didn’t want to go the SQL route, so after reading a few great tutorials on R for large datasets, I found that combining data.table and dplyr worked best for me. (I’d love to know in the comments what your library or combo of choice is for this; I’ve heard some people use survey.)

I wasn’t all that familiar with IPUMS, so I couldn’t have done it without Dan Keating, a data reporter at WaPo, who walked me through some of the nuances of the data; and Dr. Asaf Levanon, one of the economists whose study I quote in the article, who was brilliantly thorough (and patient) in explaining the methodology of their study and who helped me round out mine.

Ok, ok … On to the visualization stuff, which is what you came here for. There are many ways of showing the gender gap.

I’m quite fond of Hannah’s scatterplot from 2009

… and last year’s WSJ interactive is great as well.

And even though what I chose is also a scatterplot, I was trying to make a slightly different point.

Two things weighed on the choice of visualization. I wanted to reveal:

  • How the gap changed depending on the share of women on the job —so we arranged them horizontally from smallest to biggest percentage of women …
  • … and where the jobs with the biggest number of workers landed —so we sized the circles accordingly.
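
A rough sketch of that encoding, in Python, for anyone who wants to try it. Everything here is an assumption for illustration: the three jobs and their numbers are made up, the 30-pixel maximum is arbitrary, and the square-root scaling (so that circle area, not radius, tracks the number of workers) is a common convention rather than something confirmed about the published chart.

```python
import math

# Hypothetical occupations: (name, share of women, number of workers).
# These figures are invented for illustration; they are not from the story.
jobs = [
    ("Job X", 0.72, 400_000),
    ("Job Y", 0.18, 100_000),
    ("Job Z", 0.45, 900_000),
]

MAX_RADIUS = 30.0  # pixels; an arbitrary choice


def radius(workers, max_workers, max_radius=MAX_RADIUS):
    """Radius such that the circle's area is proportional to the worker count."""
    return max_radius * math.sqrt(workers / max_workers)


max_workers = max(workers for _, _, workers in jobs)

# Left to right: smallest to biggest share of women.
layout = [
    (name, share, radius(workers, max_workers))
    for name, share, workers in sorted(jobs, key=lambda job: job[1])
]

for name, share, r in layout:
    print(f"{name}: x={share:.0%}, r={r:.1f}px")
```

Scaling the radius directly by the number of workers would exaggerate the big occupations, since the eye reads area; that is the reason for the square root here.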

Because the outliers (physicians, lawyers, dentists and pharmacists) made the trend less apparent, we zoomed in after the first step, so the rest of the data could fill most of the chart space. Thinking about it again, we probably should have included a trend line, but I always worry it may introduce an additional layer of complexity for the reader.

The scrolly allowed us to pace some of the annotations, and make a somewhat complicated view easier to digest. And with Kat and Chiqui’s many invaluable edits it turned out quite nicely.

The most revealing bit of the scrolly was Kat’s idea:

But perhaps the most interesting bit of all is the first few lines of the story. I’ve already explained my obsession with crafting those initial moments.

Framing the pay gap in terms of time was modeled after the inspiring massive demonstrations held in Iceland since 1975, and the concept has been used by feminist groups in other parts of Europe. I agree with the organizers of the strikes: translating the gap into time-working-for-free is a very raw way of denouncing inequality.
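
The gap-to-time translation is simple arithmetic, and a minimal Python sketch makes it concrete. This is not necessarily how the story computed its dates: the function name `first_unpaid_day` is hypothetical, and it assumes pay is spread evenly over calendar days. The 20% figure is the one from the headhunter anecdote above.

```python
from datetime import date, timedelta


def first_unpaid_day(gap, year):
    """First day of `year` that is 'worked for free' under a pay gap of `gap` (0 to 1).

    Assumption: pay is spread evenly over every calendar day of the year.
    """
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    paid_days = round(days_in_year * (1 - gap))
    return start + timedelta(days=paid_days)


day = first_unpaid_day(0.20, 2017)
print(day.isoformat())
```

With a 20% gap in 2017, 292 of the 365 days are paid, so this prints 2017-10-20: everything from October 20 onward is, in this framing, unpaid work.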

The earliest drafts of the big calendar had a visual hint to the days in a calendar. But I thought the small calendars were very cute, and wanted the connection between the big calendar and the small multiples to be immediately visible.

It did cross my mind to use a calendar view layout, but since I was going to show just one year it would have looked just like an awkward Tetris-like bar chart. The 12 squares more closely resembled the physical object of a calendar.

For me, one of the most important and nuanced points of the story was explaining devaluation: how, as women joined certain occupations, the perceived worth of those occupations dropped. It’s actually the reason why I decided to use IPUMS, so I could have consistent criteria for current and past data.

One of their earlier incarnations was these ‘noodle’ charts. They show the percentage change in salary at each census and 5-year ACS, with 1960 as the base.
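
Percentage change against a base year is a one-liner, sketched here in Python. The salary figures and the function name `pct_change_from_base` are made up for illustration; they are not IPUMS values, and the real analysis was done in R.

```python
def pct_change_from_base(series, base_year=1960):
    """Percentage change of each year's value relative to the base year's value."""
    base = series[base_year]
    return {
        year: (value - base) * 100 / base
        for year, value in sorted(series.items())
    }


# Hypothetical median salaries for one occupation (invented numbers).
salaries = {1960: 40_000, 1980: 44_000, 2000: 38_000}

changes = pct_change_from_base(salaries)
for year, change in changes.items():
    print(f"{year}: {change:+.1f}%")
```

The base year always comes out at 0%, so every ‘noodle’ starts from the same point and only the divergence afterwards is visible.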

Too complicated:

Sketch of the noodles without annotations

Sometimes less is more:

Sketch of the slopes without annotations

The day after the story was published, Carlo Zapponi messaged me ‘complaining’: “I thought you didn’t like slopegraphs. I’m still thinking of all the slopes I could have done at the Guardian!! 😃”. “I call them line charts with only two dates” I said. Tomayto. Tomahto.

The ‘Since you opened this page …’ ending was a last-minute addition. The Guardian is/was quite fond of this device (and its apparently shocking effect), but I’m always reluctant to use it because it rarely gives you a tangible comparison.

You may need to load the Guardian article a few times to get this, since it gives you a different version of the widget at random.

In this case I believe it does, and it closes the circle of the story by visually referencing the header, and conceptually referring to the pay gap again in terms of time.

Quite happy that it was well-liked.

For those wondering about the little people … I grew up with my dad’s architectural models (he worked at an architecture firm before he became a graphics editor), and I’ve been obsessed for years with these models from Japanese architect Naoki Terada. Finally, I had the chance to pay homage to them.

Oh, and I have a feminist bot @77centimos that tweets data on the pay gap in Spain at random. In case you wanted to get infuriated in Spanish.
