Laura Silver
May 14
(Andronov; photo illustration by Pew Research Center)

Survey practitioners know that obtaining a representative sample of the general public requires designing fieldwork hours to ensure that everyone has a chance of being contacted. When doing a face-to-face survey, for example, it’s important that some interviews take place after people typically get home from work. Otherwise, people with a job outside their home will have little chance of being included in the survey, potentially introducing nonresponse bias.

A recent Pew Research Center survey in Jordan encountered exactly this problem. Cultural norms in Jordan generally dictate that interviewer teams consist only of women, and safety concerns about conducting interviews at night limited the Center's interview team to working until around 4:30 p.m. each day. As a result, the sample risked overrepresenting people who are home in the middle of the day, such as homemakers, the unemployed and retirees. (This was the case even though interviewers followed our established protocol of visiting each household up to three times, a common practice in the Center's international in-person surveys that is designed to help achieve a randomly selected sample.)

We discovered the skewed sample while closely monitoring our fieldwork for data quality. But because fieldwork progressed quite rapidly, by the time we were able to review the contact and partial data, the sample was already disproportionately female, older and more educated than the Jordanian population as a whole. For example, 65% of the interviews completed at that point were with adult women, whereas only 46% of Jordanians ages 18 and older are women, according to the country's 2015 statistical yearbook.

Addressing the problem

At this point, we considered two options. First, we could continue with the same fieldwork protocol and apply post-stratification weights to the data to correct some of the imbalance. This approach is one that Pew Research Center takes with many of its international surveys — whether or not we see imbalances of this degree — and involves adjusting the demographics in our sample to more accurately represent the distribution of key demographic characteristics within the population. Typically, these demographic variables include age, sex, education, region and urbanity. In a case like this, it would mean placing more weight on men and less on women to ensure that the overall sample looks like the adult population in Jordan, among other adjustments.
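The core arithmetic of post-stratification can be sketched for a single variable. This is a minimal illustration, not the Center's weighting procedure. It weights on sex alone, whereas real weights combine age, sex, education, region and urbanity, often through iterative raking, which this sketch omits. The shares are the 65% and 46% figures cited above. The function name `poststrat_weights` is hypothetical.

```python
# Minimal single-variable post-stratification sketch (sex only).
# Assumption: one weighting cell per sex; real survey weights adjust several
# variables jointly, which this example does not attempt.

def poststrat_weights(sample_shares: dict[str, float],
                      population_shares: dict[str, float]) -> dict[str, float]:
    """Weight for each cell = population share / sample share."""
    return {cell: population_shares[cell] / sample_shares[cell]
            for cell in population_shares}

sample = {"female": 0.65, "male": 0.35}      # share of completed interviews
population = {"female": 0.46, "male": 0.54}  # adult population (2015 yearbook)

weights = poststrat_weights(sample, population)
for cell, w in weights.items():
    print(f"{cell}: weight {w:.3f}")
# female: weight 0.708
# male: weight 1.543

# After weighting, women's weighted share equals the population target.
weighted_female = sample["female"] * weights["female"]
print(f"weighted female share: {weighted_female:.2f}")  # 0.46
```

Women are down-weighted and men up-weighted, but every weighted "man" is still someone who happened to be home during the day. That limitation is exactly what the next paragraph describes.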

Post-stratification weights, however, cannot correct for an underlying, fundamental problem: Different sorts of people may be home in the daytime than at night. For example, we might expect that the people home during the day have different views of how the economy is doing in their country than those who are out of the home during those hours. So even if we applied weights to correct for the overall gender or age composition of the survey, we would simply be adding more or less weight to the people who were already included in the sample — and who may see the world in fundamentally different terms than those who were not at home.
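A small worked example with entirely invented numbers shows why weighting cannot repair this. Suppose, hypothetically, that 70% of women and 30% of men are home during the day. Suppose also that people at home rate the economy as good 30% of the time, versus 50% for people who are out, regardless of sex. A daytime-only sample reaches only the at-home group. Weighting by sex restores the 46/54 split, but it cannot recover the views of people who were out.

```python
# Hypothetical illustration; every number here except 46/54 is invented.
pop_share = {"female": 0.46, "male": 0.54}   # adult population by sex
home_share = {"female": 0.70, "male": 0.30}  # assumed share at home by day
GOOD_IF_HOME = 0.30  # assumed share rating the economy good, at-home people
GOOD_IF_OUT = 0.50   # assumed share rating the economy good, people who are out

# True population value: within each sex, mix at-home and away people.
true_value = sum(
    pop_share[s] * (home_share[s] * GOOD_IF_HOME + (1 - home_share[s]) * GOOD_IF_OUT)
    for s in pop_share
)

# Daytime-only sample weighted by sex: each cell holds only at-home people,
# so each cell's estimate is GOOD_IF_HOME however the cells are weighted.
weighted_daytime = sum(pop_share[s] * GOOD_IF_HOME for s in pop_share)

print(f"true value:            {true_value:.3f}")        # 0.403
print(f"weighted daytime-only: {weighted_daytime:.3f}")  # 0.300
```

In this toy case, the weighted sample matches the population on sex exactly, yet the estimate is about 10 points too low.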

The second option was to re-field some or all of the interviews. Re-fielding cases during evening hours would presumably help balance our overall sample and provide evidence of whether our original sample suffered from nonresponse bias. But although it might help correct for potential biases, re-fielding has drawbacks of its own: additional costs, more time in the field and the possibility that an external event (such as a global summit or an election) could affect survey responses, making data collected before and after the re-field less comparable.

Given our concerns about potential bias, we re-fielded all neighborhoods (technically speaking, our primary sampling unit was a census block) where fieldwork concluded before 5 p.m. and in which at least 80% of respondents were female. In total, 209 neighborhoods, comprising 647 interviews, were replaced with new interviews conducted throughout the full day, up until 9 p.m. Apart from the lengthened field hours, all protocols and interviewers were the same for these replacement interviews. The one difference was that the female interviewers were now accompanied door-to-door by their sons or husbands to help address the safety concerns.
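The selection rule for re-fielding can be expressed as a simple filter over census blocks. This is a hypothetical sketch. The `CensusBlock` record, its field names and the three example blocks are invented for illustration. Only the two criteria come from the text: fieldwork ending before 5 p.m., and a female share of at least 80%.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class CensusBlock:
    # Hypothetical record for one primary sampling unit (census block).
    block_id: str
    fieldwork_end: time  # when fieldwork in the block concluded
    n_female: int
    n_interviews: int

def needs_refield(block: CensusBlock,
                  cutoff: time = time(17, 0),
                  min_female_share: float = 0.80) -> bool:
    """True if fieldwork ended before the cutoff AND the female share
    of respondents is at least min_female_share."""
    if block.n_interviews == 0:
        return False
    female_share = block.n_female / block.n_interviews
    return block.fieldwork_end < cutoff and female_share >= min_female_share

blocks = [
    CensusBlock("A", time(16, 20), 5, 6),  # 83% female, ended 4:20 p.m.
    CensusBlock("B", time(16, 45), 3, 6),  # 50% female
    CensusBlock("C", time(17, 30), 6, 6),  # all female, but ended after 5 p.m.
]
refield = [b.block_id for b in blocks if needs_refield(b)]
n_replaced = sum(b.n_interviews for b in blocks if needs_refield(b))
print(refield, n_replaced)  # ['A'] 6
```

Requiring both conditions targets blocks where the skew plausibly stems from the short field hours. A block that was heavily female despite evening visits would not be flagged.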

Did re-fielding matter?

Overall, expanding fieldwork hours improved the composition of our sample. For example, the original respondents in the re-fielded neighborhoods were 84% female, whereas the replacement respondents in those same neighborhoods were 51% female. The replacement interviews were also more in line with official Jordanian population statistics on age and education. (We do not have demographic information at the neighborhood level in Jordan, only at the national level.)

Redoing the fieldwork in these census blocks also improved the overall composition of the survey. Comparing the unweighted sample before and after the re-field shows that replacing these interviews helped correct some of the overall gender, age and education skews. For example, the original sample was 62% female, while the full sample after re-fielding was 54% female. That is much closer to the actual 46% share of Jordanian adults who are women.

Although re-fielding these cases improved our overall gender balance, a question remained: Would we have reached the same substantive conclusions if we had skipped replacing these interviews and simply applied post-stratification weights? Weighting forces a sample to mirror national demographic distributions by construction, so comparing weighted demographics could not answer that question. Instead, we compared the weighted results for our attitudinal and behavioral questions in the original and re-fielded samples.

Results indicate that there is little difference between the original and re-fielded samples once post-stratification weights have been applied. We looked at a variety of measures, from views of the economy to the likelihood of owning a mobile phone to technology use. On each, the weighted point estimates from the two samples were within a couple of percentage points of one another, and the differences were not statistically significant.

Weighting the original sample to adjust for the overrepresentation of women, however, comes with its own cost. Because the skews were larger before re-fielding, the weights had to “do more work” to better represent the Jordanian adult population. In survey parlance, this means that the original sample has a larger design effect (1.57) than the re-fielded sample (1.46). The practical implication is that it is more difficult to get precise estimates — and the margins of error around any estimate are larger — when there is a larger design effect. While the difference in precision was not problematic for our reporting, for other purposes it could matter.
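The link between weighting and precision can be made concrete with Kish's approximate design effect due to unequal weights, deff = n·Σw²/(Σw)². A margin of error is then inflated by √deff. The post does not say how its 1.57 and 1.46 figures were computed, so treat this as an illustrative approximation. Kish's formula captures only the weighting component, and the sample size of 1,000 used below is hypothetical. The example weights are the sex-only weights implied by the 65% and 46% figures. They yield a smaller design effect than the reported ones, since real weights adjust several variables at once.

```python
import math

def kish_deff(weights: list[float]) -> float:
    """Kish's approximate design effect from unequal weighting:
    n * sum(w^2) / (sum(w))^2, equivalent to 1 + CV^2 of the weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def margin_of_error(p: float, n: int, deff: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, inflated by deff."""
    return z * math.sqrt(deff * p * (1 - p) / n)

# Sex-only weights for a hypothetical 100-person sample that is 65% female,
# weighted to a population that is 46% female.
weights = [0.46 / 0.65] * 65 + [0.54 / 0.35] * 35
sex_only_deff = kish_deff(weights)
print(f"design effect from sex-only weights: {sex_only_deff:.3f}")  # 1.159

# Reported design effects, with a hypothetical sample size of 1,000.
n = 1000
for label, deff in [("original", 1.57), ("re-fielded", 1.46)]:
    moe = margin_of_error(0.5, n, deff)
    print(f"{label}: +/- {100 * moe:.1f} points")
# original: +/- 3.9 points
# re-fielded: +/- 3.7 points
```

At any fixed sample size, the ratio of the two margins is √(1.57/1.46) ≈ 1.04. The original sample's margins of error are therefore roughly 4% wider.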

Additionally, the differences between the two samples might have been more consequential in a study where gender is strongly related to key outcomes, such as a study about gender equity, women's rights or sexual harassment. As it stands, Jordanian men and women have relatively similar levels of technology use and differ little in their evaluations of the impact of technology on their society, one of the main research questions of this study.

The lesson from our experience in Jordan is that re-fielding can help improve the representativeness of a sample. Whether an improved sample also increases the accuracy of behavioral or attitudinal measures depends on the degree to which these measures are correlated with key population attributes such as age, gender and education.

This post was written by Laura Silver, Martha McRoy, Kat Devlin and Patrick Moynihan.

Laura Silver is a senior researcher focusing on global research at Pew Research Center.

Martha McRoy is a research methodologist focusing on international survey research methods at Pew Research Center.

Kat Devlin is a research associate focusing on global attitudes at Pew Research Center.

Patrick Moynihan is associate director of international research methods at Pew Research Center.

Pew Research Center: Decoded

The "how" behind the numbers, facts and trends shaping your world.
