Using Open Data to Understand Communities Around Seattle Public Libraries
This summer, as part of my Open Data Literacy internship, I worked with the Seattle Public Library (SPL) to transform external open data into actionable intelligence for frontline staff. SPL is a large public library system serving the city of Seattle. It has 27 branch locations divided into six regions and offers a variety of services and outreach programs. Library staff already learn about their communities by talking to patrons and local organizations, but open data could supplement these efforts and support staff in planning outreach and services. With my mentor at SPL, David Christensen, I created a plan to assess the needs of SPL staff, search for relevant open datasets, and create deliverables for staff to use.
Assessing Data Needs of SPL Staff
Before I could start transforming open data, I needed to find out what types of information would be most useful for SPL staff. I initially planned to assess the needs of all SPL staff. However, because the project had a limited timeframe, we decided to limit the assessment to two of SPL’s regions. To get a general idea of the data needs of staff at multiple locations, I interviewed Francesca Wainwright, the regional manager for the northeast region, and Wei Cai, the regional manager for the southeast region. Each interview lasted about 20 minutes and was conducted over the phone. For the interviews, I prepared a script with background information on the project, questions about the goals and needs of staff at each branch, and suggestions of possible datasets in case the regional managers did not have ideas of their own.
The regional managers provided insight into the current programs and future projects at their regions’ branch libraries. Both managers had a clear idea of what kind of information would be useful for their staff and did not need the prepared dataset suggestions. Both wanted demographic and socio-economic information about the communities around each branch: specifically age, income, languages spoken, education levels, and employment rates. With this information, staff could identify areas of greater need in the community and plan outreach services more effectively. One region was particularly interested in how many students attended private schools versus public schools, reasoning that an area with more public school students might need more library services because public schools tend to have less funding. Both managers also wanted more granular data than they usually received. They mentioned that they typically received information about the region or the city as a whole rather than about the immediate community around a branch, which made it hard to pinpoint specific areas of focus.
From the interviews with the regional managers, I compiled a long list of community characteristics to guide my search for open datasets. To keep the project’s scale manageable, I limited the list to four main topics: age, household income, language spoken at home, and school enrollment type. These topics cover basic traits of the community and could be used by any branch in its planning.
Searching for Suitable Open Data
With the four key topics from the interviews in hand, I started looking for open datasets that would best serve SPL staff needs. Since the focus was on the communities around the libraries, I began by looking for local open datasets about Seattle in the City of Seattle Open Data Portal and the Washington State Open Data Portal. Both portals offer many interesting datasets published by city and state departments. For this project, I found a few datasets about low-income housing and public school locations, but not many that provided the basic community information about Seattle that the regional managers were looking for.
For Seattle community demographics, I turned to the United States Census Bureau. Since the Census Bureau’s mission is “to serve as the nation’s leading provider of quality data about its people and economy,” it is an ideal source of data about age, income, and many other characteristics of the American people. When people think of the “Census,” they usually think of the Decennial Census, which is conducted every 10 years to record population and housing counts, but the Bureau also conducts many other censuses and surveys periodically. The Census Bureau is committed to open government, so it releases public data from its censuses and surveys, and this data is openly available to everyone. To search the Census Bureau’s large collection of datasets, I used the advanced search feature of its American FactFinder tool to look for the topics I was interested in. I was pleasantly surprised to find datasets from the American Community Survey (ACS) that aligned with all four of my topics, with relatively recent estimates from 2017. American FactFinder allows users to download each dataset as a spreadsheet or presentable document, both with associated metadata. Census data can also be downloaded via the Census API, which can be a better option when the data will be used in a programming language like R or Python. For this project, I chose to use the API because I wanted to work in R.
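To give a rough idea of what a Census API request involves, here is a minimal Python sketch (the project itself used R) that builds a request for tract-level estimates and reshapes the JSON reply. It assumes the 2017 ACS 5-year endpoint, the FIPS codes for Washington (53) and King County (033), and two variables, B01002_001E (median age) and B19013_001E (median household income); check these IDs against the Census Bureau’s published variable list before relying on them. The helper names build_tract_query and parse_response are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the 2017 ACS 5-year endpoint (tract-level estimates come from the
# 5-year product). Verify variable IDs against the Census Bureau's variable list.
BASE_URL = "https://api.census.gov/data/2017/acs/acs5"
VARIABLES = {
    "B01002_001E": "median_age",
    "B19013_001E": "median_household_income",
}


def build_tract_query(variables, state="53", county="033", api_key=None):
    """Build a Census API URL requesting every tract in one county.

    Defaults are the FIPS codes for Washington (53) and King County (033).
    """
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",
        "in": f"state:{state} county:{county}",
    }
    if api_key:
        params["key"] = api_key  # optional; a key is recommended for repeated requests
    return f"{BASE_URL}?{urlencode(params)}"


def parse_response(rows, variables):
    """Convert the API's JSON (a header row followed by data rows) into dicts."""
    header, *records = rows
    parsed = []
    for record in records:
        row = dict(zip(header, record))
        tract = {
            "geoid": row["state"] + row["county"] + row["tract"],
            "name": row["NAME"],
        }
        for var_id, label in variables.items():
            raw = row.get(var_id)
            value = float(raw) if raw not in (None, "") else None
            # Assumption: negative values are ACS "no estimate" codes, so treat them as missing.
            tract[label] = value if value is not None and value >= 0 else None
        parsed.append(tract)
    return parsed
```

Fetching the URL (for example with urllib.request.urlopen) and passing the decoded JSON from json.load to parse_response yields one dictionary per tract. The reply puts a header row first, which is why the parser zips each record against that header instead of relying on column positions.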
Creating an Interactive Map Dashboard Using R
Many different types of visualizations could be created from the 2017 ACS estimates. Based on what I had learned in the interviews and in conversations with my mentor, I decided that maps would be the best visual representation of the census data. Maps would allow staff to pinpoint areas of interest and explore the neighborhoods around the libraries. I chose R instead of Python to create the visualizations because it has well-developed visualization packages and was the preferred programming language of my mentor at SPL. I used the tidycensus package to read in data from the Census API. This package made it easy to access the data I wanted and provided the polygon coordinates needed to map the census tracts’ shapes. I used census tracts because they were the smallest geographic units I could download with tidycensus, which gave the more granular view of the community that the regional managers wanted.
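Each tract that tidycensus returns is identified by a GEOID. For tracts this is an 11-digit code: the 2-digit state FIPS code, the 3-digit county code, and a 6-digit tract code whose last two digits act as a decimal suffix (tract code 005302 is Census Tract 53.02). The hypothetical Python helpers below split a GEOID and format the tract number, which can be handy when labeling popups or joining other tables by tract.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into state (2), county (3), and tract (6) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def tract_label(tract_code):
    """Format a 6-digit tract code as a tract number, e.g. '005302' -> '53.02'."""
    base, suffix = int(tract_code[:4]), tract_code[4:]
    # A "00" suffix is dropped, so '000100' prints as tract '1'.
    return str(base) if suffix == "00" else f"{base}.{suffix}"
```

For example, split_tract_geoid("53033000100") gives ("53", "033", "000100"), and tract_label on the last part gives "1".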
Since there was so much information I wanted to include in the maps, I looked for an R package that could create interactive maps and limit the amount of information on screen at any one time. I found the Leaflet package, which creates colored, interactive maps with popups and layers. Leaflet could read the data I downloaded from the API and color the map based on selected values. I was also able to add a layer with location markers for each SPL branch, different base map layers, zoom buttons for each region, legends, and popups for each census tract and branch marker. In total, I created four maps: age, median household income, language spoken at home, and school enrollment type (private versus public school). For more information about how I created these maps, feel free to look at the documented R code I uploaded to this project’s GitHub repository: https://github.com/OpenDataLiteracy/SPL-KO.
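Coloring each tract based on a selected value produces a choropleth: each tract’s value is assigned to a bin, and each bin gets a color. R’s leaflet package offers palette helpers such as colorBin for this; the Python sketch below only illustrates the underlying idea. The five-color palette, the example breaks, and the helper names make_binner and tract_popup are assumptions for illustration, not the colors or cut points used in the dashboard.

```python
from bisect import bisect_right
from html import escape

# Hypothetical five-class sequential palette (light to dark), not the dashboard's colors.
PALETTE = ["#f1eef6", "#bdc9e1", "#74a9cf", "#2b8cbe", "#045a8d"]


def make_binner(breaks, palette=PALETTE, missing="#cccccc"):
    """Return a function that maps a numeric value to a palette color.

    breaks are ascending bin edges with len(breaks) == len(palette) + 1.
    Bins are [low, high); the top edge is included in the last bin.
    Missing values and values outside the breaks get the missing color.
    """
    if len(breaks) != len(palette) + 1:
        raise ValueError("need exactly one more break than colors")

    def color_for(value):
        if value is None or value < breaks[0] or value > breaks[-1]:
            return missing
        # bisect_right counts edges <= value; clamp so the top edge lands in the last bin.
        index = min(bisect_right(breaks, value) - 1, len(palette) - 1)
        return palette[index]

    return color_for


def tract_popup(name, label, value):
    """Build the small HTML snippet shown when a tract is clicked."""
    shown = "No estimate" if value is None else f"{value:,.0f}"
    return f"<strong>{escape(name)}</strong><br>{escape(label)}: {shown}"
```

With breaks of [0, 25000, 50000, 75000, 100000, 250000], a median household income of 55,000 falls in the third bin. Fixed breaks keep colors comparable across maps; quantile-based breaks would instead spread tracts evenly across the colors.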
Though the maps looked great within the R Markdown notebook I had created, that format was not very shareable. To share the maps easily with others, I used an R package called Shiny to create a map dashboard. Shiny turns my R code into a browser-friendly application that works exactly as it did in my R notebook, without the clutter of code. To make the dashboard truly shareable, I uploaded my Shiny app to shinyapps.io. Shinyapps.io is hosted by RStudio and gives a Shiny app a stable URL that can be easily shared. All the maps remain interactive, and anyone with the URL can access them. My map dashboard is available at: https://kostler.shinyapps.io/SPL-Seattle-Census-Data/. With shinyapps.io, I can update my code on my computer and re-upload the changes to the same URL.
Comparison with Internal Dataset
While the maps provide some insight into the communities around SPL branches, I also wanted to incorporate some of the internal data that SPL collects. To compare the census data to the population actually using the libraries, I used a dataset of library account use counts broken down by patron age and branch location. For the most accurate comparison, I limited the data to counts from 2017 and grouped the ages into the same ranges I used in the census data analysis. Using R, it was relatively easy to group the data by branch and calculate a median age and the same population percentages I had calculated for the census data.
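The grouping step can be sketched as follows. This is a Python illustration rather than the R code given to SPL, and it assumes the internal data arrives as (year, branch, age, count) rows; the age ranges and the function name summarize_by_branch are placeholders. Because the internal data records individual ages, the median here is a count-weighted median rather than one estimated from ranges.

```python
from collections import defaultdict

# Placeholder age ranges (label, youngest, oldest); swap in the ranges used for the census maps.
AGE_RANGES = [("Under 18", 0, 17), ("18-34", 18, 34), ("35-64", 35, 64), ("65+", 65, 200)]


def summarize_by_branch(rows, year=2017, age_ranges=AGE_RANGES):
    """Summarize (year, branch, age, count) rows into per-branch age statistics.

    Returns {branch: {"median_age": age, "percent": {range_label: percent}}}.
    """
    # Keep only the requested year and total the counts per branch and age.
    counts = defaultdict(lambda: defaultdict(int))
    for row_year, branch, age, count in rows:
        if row_year == year:
            counts[branch][age] += count

    summary = {}
    for branch, by_age in counts.items():
        total = sum(by_age.values())
        if total == 0:
            continue  # nothing to summarize for this branch
        # Count-weighted median: the first age at which the running count reaches half the total.
        running, median_age = 0, None
        for age in sorted(by_age):
            running += by_age[age]
            if running >= total / 2:
                median_age = age
                break
        # Share of account use in each age range, as a percentage of the branch total.
        percent = {
            label: round(100 * sum(n for a, n in by_age.items() if low <= a <= high) / total, 1)
            for label, low, high in age_ranges
        }
        summary[branch] = {"median_age": median_age, "percent": percent}
    return summary
```

Filtering by year before aggregating mirrors the restriction to 2017 counts, so the percentages line up with the 2017 ACS estimates.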
Overall, the statistics from the internal dataset were close to the census statistics for most branches. There were a few differences between the two datasets. For example, at one branch, the internal data showed that the people using their library cards were younger on average than the general population living around that branch. This branch’s user base could be younger because it offers many young adult or children’s programs, or because there are many schools nearby and students visit after class. I did not investigate the reasons for the differences between the two datasets in depth, but I provided reports with both the census data and the internal data to SPL staff in the northeast and southeast regions so that they could investigate possible factors. I also gave the R code for the internal dataset to my SPL mentor for future use and published my code for the census data in this project’s GitHub repository. Hopefully, library staff will be able to use these materials, and the differences in the data, to identify factors to consider in their planning process.
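One simple way to surface differences like this is to compare each branch’s patron median age with the census median for the area around it and flag large gaps. The five-year threshold and the function name flag_differences are hypothetical choices for illustration, not part of the reports given to SPL; a negative gap means patrons are younger than the surrounding population.

```python
def flag_differences(patron_medians, census_medians, threshold=5.0):
    """Return {branch: gap} where |patron median - census median| exceeds threshold years.

    Both inputs map branch name -> median age. Only branches present in both
    inputs are compared. The default threshold is an illustrative cutoff.
    """
    flagged = {}
    for branch in sorted(patron_medians.keys() & census_medians.keys()):
        gap = patron_medians[branch] - census_medians[branch]
        if abs(gap) > threshold:
            flagged[branch] = gap
    return flagged
```

A flag is only a starting point for staff; as noted above, programming or nearby schools might explain a gap.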
Next Steps
Now that this part of the project is finished, the next step for SPL will be distributing the map dashboard and reports to staff at each branch. Though this project focused on the needs of the northeast and southeast regions, the maps provide statistics for all branches and can be used by anyone interested in Seattle. The map dashboard and project repository both have public URLs that can be shared with anyone, and they could be used in annual reports about the demographics SPL serves, in policy advocacy, or even by the SPL Foundation. In addition to sharing the project’s deliverables, SPL should collect feedback from staff about the maps and reports to inform future improvements to the maps and to other data projects.
As for next steps for ODL, I plan to continue parts of this project as an independent study next quarter as part of my Master of Library and Information Science program. I would like to publish a journal article about the use of census and other open data in public libraries so that other researchers and library workers can build on the work I did this summer. I also plan to set up a Binder instance, or something similar, to provide an interactive code environment where people can run and experiment with the code without downloading any software. This project was a small step toward using open data in public libraries, and I hope other libraries will be able to use what I have done in their own workflows. All the software I used in this project is open source, and anyone can download R, RStudio, and the census data I used, or modify the code I created. Publishing open data is a great practice for libraries, but using external open datasets can also help libraries create the best possible services for their communities.
Seattle Census Data Map Dashboard: https://kostler.shinyapps.io/SPL-Seattle-Census-Data/
Project GitHub Repository: https://github.com/OpenDataLiteracy/SPL-KO