Interviewing the data source: Pew Research Center Local News Survey

Illustration by Anna Golde from Icons8

When working with data as journalists, we don’t always have a hypothesis about the data in mind. In those cases when we are searching for a story in the data, it can be useful to ‘interview’ the data source. As described by NPR, interviewing the data source is a way to explore the data for the first time. We may ask a series of preliminary questions, describe the context of the dataset, and use exploratory statistics to find data insights.

To have the class practice diving into a dataset, I found some open data and had everyone vote on what they would like to work with. The class picked the Pew Research Center Local News Survey, a nationwide survey on local news reliability, access, and community involvement. This data can be freely downloaded in SPSS file format (.sav) by anyone with a Pew Research Center account. For the purposes of this class exercise, I converted the data into a .csv file.

The result was the following guide, which the VIC-4001 class put together collaboratively over the course of three lectures.

Questions

The first step to take when exploring an unfamiliar dataset is to ask a series of questions about the data collection methodology. Students got into breakout groups on Zoom and brainstormed together. The following questions were written by the students as a starting point for anyone who wishes to start working with the Pew Research Survey data.

Breakout Group 1:

  • Who surveyed the people and what kind of survey did they make?
  • Why was this data collected? How did the Google News Initiative factor into the results? Was there a financial incentive involved?
  • Because statistical models are necessary for the analysis of smaller geographical CBSAs, how does this precise estimate differ from obtaining national estimates? In other words, how do the researchers on a national scale gather the data in comparison to smaller population estimates?
  • Could the change in methodology/distribution channel for the ATP have influenced results or bias?

Breakout Group 2:

  • What made the classifications for the subsections?
  • Who were the three researchers?
  • Is logistics a valid tool when conducting research and gathering data?

Breakout Group 3:

  • How did they get people to participate in the survey? Out of 62,757 randomly sampled candidates, only 34,897 people participated. How can they be sure that the data accurately represents an area if they did not get the same number of people to respond to the survey in each city?
  • How did they organize the demographics into the respective lower, medium and higher groups?

Context Phrases

Next, the students put together phrases that disclose more context about the limitations of the dataset. These were phrases that the students imagined could be included in the body of a piece about this data, to let the reader know that the data is not infallible.

Group 3 came up with a great example of a context phrase in the previous lecture: “Out of 62,757 randomly sampled candidates, only 34,897 people participated.”
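The participation figures in that phrase can also be expressed as a rate, which is often easier for readers to grasp. Here is a minimal sketch in Python that uses only the two numbers quoted above. The variable names are ours, and this is simple arithmetic, not necessarily the way Pew calculates its official response rate:

```python
# Figures quoted in the context phrase above.
sampled = 62_757       # randomly sampled candidates
participated = 34_897  # people who took part

# Share of sampled candidates who participated, as a percentage.
participation_rate = participated / sampled * 100

print(f"{participation_rate:.1f}% of sampled candidates participated")
```

Run as written, this works out to roughly 55.6%, so a context phrase could also say that a little over half of the sampled candidates took part.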

Breakout Group 1:

  • The rate of local news engagement or consumers’ willingness to engage with paid news outlets may be substantially impacted by 6% of the population (1,317 counties not belonging to any CBSA, as per Pew Methodology) potentially being excluded.
  • Local news engagement was only measured in urban areas, thereby excluding rural zones.

Breakout Group 2:

  • If the news source was incomplete or misspelled, the intended response may not have been entered accurately.

Breakout Group 3:

  • The CBSAs cover approximately 66% of the total population of the United States.

Data Insights

The final step was to look for data insights. For the purposes of this guide, data insights are slices of information derived from exploratory statistics and spreadsheet methods like filtering, sorting, and counting with pivot tables.
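For readers who prefer code to a spreadsheet, the counting step behind a pivot table can be sketched in Python. This is only an illustration under stated assumptions: the column name `LOCAL_HISTORY` and the five rows are made up for the example and do not come from the Pew file, and the counts are unweighted.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the converted .csv; the column name and
# responses are hypothetical, not taken from the Pew data.
sample_csv = io.StringIO(
    "RESPONDENT,LOCAL_HISTORY\n"
    "1,Very important\n"
    "2,Somewhat important\n"
    "3,Very important\n"
    "4,Not important\n"
    "5,Very important\n"
)

def answer_shares(rows, column):
    """Count each answer in `column` and return its share in percent."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    # most_common() orders answers from most to least frequent.
    return {answer: round(n / total * 100, 2) for answer, n in counts.most_common()}

shares = answer_shares(csv.DictReader(sample_csv), "LOCAL_HISTORY")
for answer, pct in shares.items():
    print(f"{answer}: {pct}%")
```

To try this on the real data, one would replace the in-memory sample with an opened .csv file and pass the name of an actual survey column. The function mirrors what the students did by hand: count each answer, then divide by the total number of responses.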

Breakout Group 1:

  • More than half of participants (over 50%) noted that local journalists should be aware of local history.
  • Contrary to our expectations, 56.52% of participants indicated strong involvement in the community, compared with 39.13% who indicated an interest in heavy involvement on social media.

Breakout Group 2:

  • From these results, we know that respondents considered social media (37.52%) and the internet (37.05%) to be significant, but not the most important, outlets for gathering news, indicating that respondents rely on another form of news gathering.

Breakout Group 3:

  • More people selected that they never get their news from local government officials/agencies (over 19%) than selected that they always do (a bit over 13%).

Conclusion

In just a few days, students were able to put together a complete guide to this new dataset, even though not all of them had prior data analysis experience. My hope is that this will be helpful to other journalists looking to work with this data, as well as other students and educators who are learning about data!

Student authors:

Aimee Reyes, Giuliana La Mendola, Amanda McKenna, Monica Correa, Brea Jones, Cristina Gonzalez, Elio Diaz, Liana Alexis Martell, Alejandro Andarcio, Gabriella Genao, Ana Soler, Christopher Gomez, Nicole Forero, Andres Arias, Dailen Vera, Jordan Coll, Indira Morales
