Combining Australian Census data with the Same Sex Marriage Postal Survey in R

Last week I put out a post that showed you how to tidy the Same Sex Marriage Postal Survey Data in R. In this post we’ll visualise that data in combination with the 2016 Australian Census. Note to people just here for the R — the main challenge here is actually just navigating the ABS’s Census DataPack, but I’ve tried to include a few pearls of wisdom on joining datasets to keep things interesting for you.

Load Same Sex Marriage Survey data

We’ll start with the survey response dataset from last week. Grab it from my repo if you need to. Note the little fix in the mutate for some untidiness that slipped through. Both of those refer to Tasmania.

Get Census Data

The ABS has a few different census offerings. I looked at them all. You can trust me when I say to ignore all the reports and go straight for the ‘DataPacks’ at https://datapacks.censusdata.abs.gov.au/datapacks/.

Navigating the DataPack

The datapack consists of 59 encoded csv files and 3 metadata excel files that will help us decode their meaning. What? You didn’t think this was going to be straight forward did you?

Examining Religious Affiliation

Suppose we are interested in reproducing this Guardian article's analysis of the correlation between percentage of religious people and percentage of ‘No’ responses. The first step is to identify the DataPack table that contains religious affiliation data. If we look at sheet two of `Metadata_2016_GCP_DataPack.xlsx` we can see a likely choice is table ‘G14’:

# A tibble: 168 x 103
CED_CODE_2016 Buddhism_Males Buddhism_Females Buddhism_Persons
<chr> <int> <int> <int>
1 CED101 2766 3921 6687
2 CED102 3764 5078 8849
3 CED103 3136 4200 7337
4 CED104 1718 2240 3953
5 CED105 6508 7815 14323
6 CED106 2043 3012 5061
7 CED107 399 573 971
8 CED108 1092 1222 2312
9 CED109 789 1224 2017
10 CED110 611 878 1491
# … with 158 more rows, and 99 more variables:
# Christianity_Anglican_Males <int>,
# Christianity_Anglican_Females <int>,
# Christianity_Anglican_Persons <int>,
# Christianity_Assyrian_Apostolic_Males <int>,
# Christianity_Assyrian_Apostolic_Females <int>,

# Christianity_Total_Persons <int>, Hinduism_Males <int>,
# Hinduism_Females <int>, Hinduism_Persons <int>, Islam_Males <int>,
# Islam_Females <int>, Islam_Persons <int>, Judaism_Males <int>,
names(religious_affiliation_by_person)[-1] <- column_codes$Long[match(names(religious_affiliation_by_person), column_codes$Short)[-1]] 

An Aside on Joining Data

I’d like to spend a minute discussing joins, feel free to skip to the plots if this doesn’t sound exciting.

left_join(response_data, census_data)

A Plot

Getting back to reproducing the investigation into religious affiliation and No responses: Using the ggplot2 code below, we observe the expected positive correlation between religious affiliation and ‘No’ response on same sex marriage:

My state of QLD cops a lot of flack for our rural conservatives, but it’s NSW that is looking red neck here.

Won’t Somebody Please Think of the Children

I wasn’t content to simply reproduce someone else’s analysis, so I thought I’d make some original contribution to the discourse by looking for a trend in responses from parents.

With this topic, I’m playing my Dad card for the first time. Am I doing it right?

Discussion

Once again we made short work of a fairly hairy data problem with the R tidyverse. For those new to joining datasets with dplyr, I appreciate it can be a little dizzying at first. Running the code to view the results of each successive step will help. Doing many joins will help you get a feel for the different types of outcomes. Why not try to mix some different Census data with the participation data we tidied last week? Tweet your plots at me if you do!

Now posting on: milesmcbain.xyz

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store