Survey sampling methods: how to choose and analyse a representative subset of the population

people.io
4 min read, Jul 23, 2018

The idea of gathering data on a whole population has a long history that predates recent big data trends. In the 11th century, William the Conqueror sent surveyors across England to draw up an inventory of the property, land and animals belonging to people living in towns and villages: the well-documented Domesday Book. More recently, survey sampling theory and the probabilistic methods behind it have brought 20th-century ideas to this age-old practice.

Survey sampling circa 1086

At people.io, we use these techniques to analyse the data our users provide, both to draw insights for brands and advertisers and to give our users more accurate matches. This is a critical part of our platform and, as such, benefits from the methodical approach outlined below.

Aims of sampling

In an ideal world, a survey would collect data on every individual in a population. However, in most cases this is far from feasible, meaning we need to use a subset of the population to draw a conclusion.

At its core, sampling is a fairly clear concept: it is the process of inferring patterns in a population from an often minuscule fraction of it. The main advantage of sampling is that, with the help of some statistics and a sufficient sample size, we can estimate with a high level of confidence what the response would be from a population-wide study. There are many different sampling methods, and which one we choose depends on the context, our research purpose and a variety of other factors discussed below.
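To make the link between sample size and confidence concrete, here is a minimal sketch of a 95% confidence interval for a proportion. It assumes a simple random sample and uses the textbook normal approximation; the function name and the example numbers are illustrative and not part of our platform.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a population proportion.

    z=1.96 corresponds to roughly 95% confidence. This assumes a simple
    random sample that is not tiny; it is a sketch, not a full treatment.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Same observed share (60%), very different sample sizes.
small = proportion_confidence_interval(60, 100)
large = proportion_confidence_interval(6000, 10000)
print("n=100:  ", small)
print("n=10000:", large)
```

The observed share is the same in both cases, but the interval from the larger sample is much narrower, which is what "a sufficient sample size" buys us.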

Sampling techniques

Sampling methods fall into two categories: probability sampling and non-probability sampling. In probability sampling, every individual has a known, non-zero probability of being selected in a sample. By contrast, in non-probability sampling the chance of an individual being included is unknown, and some individuals may have no chance of being selected at all. Here, we’ll focus on probability sampling and outline three main techniques: random, systematic and stratified.

Random sampling

This is arguably the most straightforward sampling approach. In simple random sampling, every member of the population has the same chance of being chosen. It needs no prior knowledge of how the population is structured, which makes it relatively simple to conduct on very large populations where that structure is hard to know. However, a small sample can easily be unrepresentative purely by chance. For example, suppose a basket has 100 apples and 50 oranges and we want a representative sample of its contents when these proportions are unknown. By randomly sampling 5 fruits we might get 2 apples and 3 oranges, a misleading result that doesn’t reflect the actual 2:1 ratio of apples to oranges.

What’s the ratio of apples to oranges in this sample?
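The basket example can be sketched in a few lines of Python using the standard library. The helper names below are made up for illustration, and the seed is only there to make a run repeatable.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement, each item equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def apple_share(fruits):
    """Fraction of the given fruits that are apples."""
    return fruits.count("apple") / len(fruits)


# The basket from the example: 100 apples and 50 oranges.
basket = ["apple"] * 100 + ["orange"] * 50

tiny = simple_random_sample(basket, 5, seed=1)
bigger = simple_random_sample(basket, 60, seed=1)

print("true share of apples:", apple_share(basket))
print("share in sample of 5:", apple_share(tiny))
print("share in sample of 60:", apple_share(bigger))
```

With only 5 fruits the estimated share can take just six possible values (0, 0.2, ..., 1.0), so it can never hit the true two-thirds exactly. Larger samples tend to land closer to it.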

Systematic sampling

If we decide to use systematic sampling, then we should be able to treat our population as logically homogeneous, that is, all individuals in the population share the attributes that are of interest to the survey. For example, if the population of interest is customers who visit a coffee shop on weekday mornings, then the characteristics defining this homogeneity would be (a) being a coffee shop customer and (b) visiting on weekday mornings.

This technique involves ordering all individuals in a sequence and selecting individuals at regular intervals, e.g. every 5th element in the set. Note that, because the selection is periodic, the sample can be biased if the ordering itself contains a repeating pattern that lines up with the interval.
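As a rough sketch, systematic sampling amounts to picking a random starting point and then taking every k-th item. The function name and the toy visitor list below are assumptions made for illustration.

```python
import random


def systematic_sample(items, step, seed=None):
    """Take every `step`-th item, starting from a random offset below `step`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(step)
    return items[start::step]


# Every 5th customer out of 20, in arrival order.
customers = list(range(20))
print(systematic_sample(customers, 5, seed=0))

# A hypothetical ordering with a period of 2: the same step picks one type only.
visits = ["morning", "evening"] * 10
print(set(systematic_sample(visits, 2, seed=0)))
```

The second call shows the periodicity problem: because the list alternates and the step is 2, the sample contains only one kind of visit, whichever the random start happens to land on.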

Stratified sampling

As the name suggests, stratified sampling involves splitting the population into layers (strata) according to one or several characteristics. The intention here is to represent the distribution of the whole population on a smaller scale. This method tries to overcome the disadvantages of the previous two types of sampling, although it does require prior knowledge of the proportions of the different subgroups in the population.

As an example: if we know from census data that a given country’s population is split 50:50 between men and women, but we happen to have collected data for 200 men and 100 women, then we can use stratified sampling to adjust for this difference and restore the 50:50 ratio, for instance by drawing the same number of respondents at random from each group.
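Here is a minimal sketch of that adjustment, assuming the known population shares are supplied by hand. The function, the stratum names and the respondent IDs are all hypothetical.

```python
import random


def stratified_sample(strata, target_shares, total, seed=None):
    """Draw `total` respondents so each stratum matches its target share.

    strata:        dict of stratum name -> list of respondents collected
    target_shares: dict of stratum name -> known population share (sums to 1)
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        quota = round(total * target_shares[name])
        if quota > len(members):
            raise ValueError(f"not enough respondents in stratum {name!r}")
        # Simple random sample within the stratum.
        sample[name] = rng.sample(members, quota)
    return sample


# Collected data: 200 men and 100 women; census says 50:50.
collected = {
    "men": [f"m{i}" for i in range(200)],
    "women": [f"w{i}" for i in range(100)],
}
balanced = stratified_sample(collected, {"men": 0.5, "women": 0.5}, total=200, seed=42)
print({name: len(rows) for name, rows in balanced.items()})
```

The smallest stratum limits the sample size: with only 100 women available, a 50:50 sample cannot exceed 200 respondents in total. An alternative that keeps every response is to weight each group instead of discarding data.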

How we use sampling

Utilising sampling is essential in many aspects of what we do, as it helps to ensure we have a more accurate representation of users’ preferences and differences. Furthermore, it helps us to customise the user journey within the app.

For example, with a largely millennial audience we naturally get uneven proportions of demographics such as income and region, and we need to account for these via sampling approaches when showing summaries of demographics on our Insights platform.

As part of our Insights platform, we are also able to show two variants of an advert to two non-overlapping sets of our users to test its effectiveness. If the brand wants to compare performance across, say, Gen-Xers and Gen-Yers, then we need to do more than split users into the two groups as they are. In this instance, we use sampling to randomly select half of each group, and perform statistical analysis to see the differences.
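One way to picture this is the sketch below, which randomly halves each generation and assumes each half is then shown a different variant. It is an illustration only, not our production code, and the names and user IDs are invented.

```python
import random


def split_within_groups(users_by_group, seed=None):
    """Randomly split each group in half, one half per advert variant."""
    rng = random.Random(seed)
    assignment = {"A": [], "B": []}
    for group, users in users_by_group.items():
        shuffled = list(users)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment["A"].extend((group, user) for user in shuffled[:half])
        assignment["B"].extend((group, user) for user in shuffled[half:])
    return assignment


# Hypothetical user IDs for two generations of different sizes.
users = {
    "gen_x": list(range(10)),
    "gen_y": list(range(100, 130)),
}
variants = split_within_groups(users, seed=7)
for name, members in variants.items():
    counts = {g: sum(1 for grp, _ in members if grp == g) for g in users}
    print(name, counts)
```

Because the split happens inside each generation, both variants see the same mix of Gen-Xers and Gen-Yers, so a difference in performance is less likely to be an artefact of who was shown which advert.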

If you’re interested in utilising the power of opt-in, first-party data to gain valuable insights about your company, then get in touch with us at hello@people.io.

And if this blog post has sparked your interest in working for a data-driven company that puts people first, then check out our current openings.
