Tidying the Australian Same Sex Marriage Postal Survey Data with R

It could be worse. At least there’s no date-time bullshittery to sob quietly into a pillow over.

The Participation Data

Divide and conquer
Tidying A) into a column of areas
Tidying B) into long format and including A)
> age_counts
# A tibble: 7,632 x 5
   measure                area      gender age          count
   <chr>                  <chr>     <chr>  <chr>        <dbl>
 1 Total participants     Banks     Male   18-19 years 1102.0
 2 Eligible participants  Banks     Male   18-19 years 1431.0
 3 Participation rate (%) Banks     Male   18-19 years   77.0
 4 Total participants     Barton    Male   18-19 years  977.0
 5 Eligible participants  Barton    Male   18-19 years 1278.0
 6 Participation rate (%) Barton    Male   18-19 years   76.4
 7 Total participants     Bennelong Male   18-19 years 1177.0
 8 Eligible participants  Bennelong Male   18-19 years 1488.0
 9 Participation rate (%) Bennelong Male   18-19 years   79.1
10 Total participants     Berowra   Male   18-19 years 1523.0
# ... with 7,622 more rows
‘extract_participation_counts’ is a function that parameterises the code I showed you for the male table.
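Here is a minimal sketch of what a parameterised function along those lines might look like. The input layout (one row per measure and area, one column per age band), the helper name extract_counts_sketch and its arguments are all assumptions made for illustration, not the original code. The 18-19 figures are taken from the output above; the 20-24 figures are placeholders.

```r
library(dplyr)
library(tidyr)

# Assumed input layout: one row per measure and area, one column per age band.
# The 18-19 figures come from the output above; the 20-24 figures are
# placeholders, not survey data.
male_wide <- tibble(
  measure = rep(c("Total participants", "Eligible participants"), times = 2),
  area = rep(c("Banks", "Barton"), each = 2),
  `18-19 years` = c(1102, 1431, 977, 1278),
  `20-24 years` = c(1, 2, 3, 4)
)

# Hypothetical helper: reshape one gender's table to long format and label it.
extract_counts_sketch <- function(wide_tbl, gender_label) {
  wide_tbl %>%
    # Collapse every age-band column into an age/count pair of columns
    gather(key = "age", value = "count", -measure, -area) %>%
    # Record which gender's table the rows came from
    mutate(gender = gender_label) %>%
    # Match the column order of age_counts
    select(measure, area, gender, age, count)
}

male_counts <- extract_counts_sketch(male_wide, "Male")
```

Calling a helper like this once per gender and stacking the results with bind_rows() would give a table shaped like age_counts.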
Appending physical Area and State from an online HTML table, and spreading the participation measures.
> ssm_participation_state %>%
+   spread(measure, count)
# A tibble: 4,800 x 8
   area     gender age         State `Area (sq km)`
 * <chr>    <chr>  <chr>       <chr> <chr>
 1 Adelaide Female 18-19 years SA    76
 2 Adelaide Female 20-24 years SA    76
 3 Adelaide Female 25-29 years SA    76
 4 Adelaide Female 30-34 years SA    76
 5 Adelaide Female 35-39 years SA    76
 6 Adelaide Female 40-44 years SA    76
 7 Adelaide Female 45-49 years SA    76
 8 Adelaide Female 50-54 years SA    76
 9 Adelaide Female 55-59 years SA    76
10 Adelaide Female 60-64 years SA    76
# ... with 4,790 more rows, and 3 more variables: `Eligible
# participants` <dbl>, `Participation rate (%)` <dbl>, `Total
# participants` <dbl>
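The join-then-spread step can be sketched roughly as follows. This is an illustration under assumptions: area_lookup is a hypothetical stand-in for the table scraped from the web (its Adelaide values are copied from the output above), and the counts are placeholders rather than survey figures.

```r
library(dplyr)
library(tidyr)

# Long-format counts for two age bands. Counts are placeholders, not survey data.
age_counts_toy <- tibble(
  measure = rep(c("Total participants", "Eligible participants",
                  "Participation rate (%)"), times = 2),
  area = "Adelaide",
  gender = "Female",
  age = rep(c("18-19 years", "20-24 years"), each = 3),
  count = c(10, 20, 50, 30, 40, 75)
)

# Hypothetical lookup table; in the post this came from an online HTML table.
area_lookup <- tibble(
  area = "Adelaide",
  State = "SA",
  `Area (sq km)` = "76"
)

# Append State and area size to every row by matching on the area name
ssm_participation_toy <- age_counts_toy %>%
  left_join(area_lookup, by = "area")

# Turn each participation measure into its own column
participation_wide <- ssm_participation_toy %>%
  spread(measure, count)
```

After spreading, each area, gender and age combination occupies a single row, with one column per participation measure.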

The Response Data

I used the same strategies as for tidying the participation data.
> response_data
# A tibble: 150 x 17
   area        Yes `Yes pct`    No `No pct` `Response Total`
   <chr>     <dbl>     <dbl> <dbl>    <dbl>            <dbl>
 1 Banks     37736      44.9 46343     55.1            84079
 2 Barton    37153      43.6 47984     56.4            85137
 3 Bennelong 42943      49.8 43215     50.2            86158
 4 Berowra   48471      54.6 40369     45.4            88840
 5 Blaxland  20406      26.1 57926     73.9            78332
 6 Bradfield 53681      60.6 34927     39.4            88608
 7 Calare    54091      60.2 35779     39.8            89870
 8 Chifley   32871      41.3 46702     58.7            79573
 9 Cook      47505      55.0 38804     45.0            86309
10 Cowper    57493      60.0 38317     40.0            95810
# ... with 140 more rows, and 11 more variables: `Response Total
# pct` <dbl>, `Response clear` <dbl>, `Response clear pct` <dbl>,
# `Response not clear(b)` <dbl>, `Response not clear(b) pct` <dbl>,
# `Non-responding` <dbl>, `Non-responding pct` <dbl>, `Eligible
# Total` <dbl>, `Eligible Total pct` <dbl>, State <chr>, `Area (sq
# km)` <chr>
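A quick sanity check on a tidied table like this is to confirm that the counts and percentages agree with each other. The sketch below is not part of the original workflow; it rebuilds the first three rows shown above and checks them. The column names total_ok and yes_pct_ok are made up for the illustration.

```r
library(dplyr)

# First three rows of response_data, copied from the output above
response_check <- tibble(
  area = c("Banks", "Barton", "Bennelong"),
  Yes = c(37736, 37153, 42943),
  `Yes pct` = c(44.9, 43.6, 49.8),
  No = c(46343, 47984, 43215),
  `Response Total` = c(84079, 85137, 86158)
) %>%
  mutate(
    # In these rows the total is the sum of the Yes and No counts
    total_ok = Yes + No == `Response Total`,
    # The Yes percentage is Yes over that total, rounded to one decimal place
    yes_pct_ok = near(round(100 * Yes / `Response Total`, 1), `Yes pct`)
  )
```

If either check column contains FALSE, something probably went wrong when the columns were renamed or realigned during tidying.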

Discussion
