How to improve SMS survey response quality and completion rates in emerging markets

By Aimee Leidich, Samuel Kamande, Khadija Hassanali, Nicholas Owsley, David Clarance, Chaning Jang

Summary

Mobile penetration has reached 95% in Kenya, making mobile a powerful medium for accessing individuals who were previously hard to reach. SMS is becoming a popular method for data collection in the region, but there is little research investigating which factors influence the effectiveness of SMS as a data collection tool in this market. This paper explores how different SMS survey designs impact completion rates and response quality among Kenyans.

We sent a one-time SMS survey to 3,489 Kenyans familiar with SMS surveys and to 6,279 who were not. Each sample was randomized into one of 54 cross-cutting treatment combinations, varying incentive amount, pre-survey communication, survey length, and question content.

We generated five recommendations for practitioners designing SMS surveys in emerging markets:

1) Recruit a sample that is aware they will be receiving SMS surveys

2) Keep mobile surveys to at most 5 questions, or provide a higher incentive for longer surveys

3) Randomize question and response option order to improve response quality

4) Use open-ended questions where qualitative data supporting the question is lacking

5) Know that males and those under 30 years old are the most likely to complete an SMS survey

Read on for more details about our methods and discussion.


1. Introduction

Mobile phones are now a staple in most contexts, rendering them an invaluable tool for communicating with and collecting data from individuals around the world. In Kenya, mobile phone penetration has reached 95.1%, with subscribers sending over 14.9 billion short message service (SMS) messages in the first quarter of 2018 alone [1]. A wide range of organizations, including corporations, government entities, hospitals, and non-profits, are employing SMS to better understand and reach traditionally inaccessible individuals. In health research, for example, SMS has been used as a powerful intervention tool to improve adherence to life-saving medication [2], enhance the efficacy of medical administrators [3], and ensure adequate stocks of crucial medical supplies [4].

This past research demonstrates the efficacy of SMS as an access point and data collection tool [6], but fewer studies have explored the specific steps researchers and practitioners should take when designing an SMS survey. For example, research shows that offering phone airtime incentives for survey completion has a positive effect on response rates [4], but it does not indicate how those rates differ across incentive amounts and other design factors.

To use SMS effectively as a data collection tool, we need to better understand the effects of different survey designs and how to optimize those designs to collect the highest quality data at the lowest cost. In this paper, we explore how incentive amount, pre-survey communication, and survey length impact completion rates, and how content variation (question order, categorical response order, and open-text responses) impacts response quality. Findings from this study will be used to inform best practices for researchers and practitioners looking to use SMS as a direct data collection method in the future.

2. Methods

2.1. Study population & sample

This study was a collaboration between mSurvey, a customer feedback and mobile data collection company, and Busara Center, a behavioural science research centre, both based in Nairobi. The sample was derived from two sources: the mSurvey audience, a convenience sample of over 37,000 individuals recruited continuously across Kenya since 2012, and the Busara study sample, a targeted sample of nearly 15,000 individuals recruited through in-person contact and baseline surveying.

The mSurvey audience network is made up of Kenyan residents 18 years and older who a) own a mobile phone that can send and receive SMS; b) are literate in English; and c) consented to receive both research and market surveys. They are well engaged, with an average completion rate of 65% across previous surveys. mSurvey audience members are considered the “known” population in this study, having received at least one SMS survey from mSurvey in the past five years.

The Busara sample is made up of low-income individuals in Nairobi who have a cell phone and consented to receive communications from Busara and participate in future studies. This sample was considered to be the “unknown” population since they did not explicitly opt-in to receive SMS surveys to their mobile device from mSurvey.

The final study sample was made up of a random selection of 6,279 Busara participants and 3,489 mSurvey participants, for a total sample size of 9,768. We found that 58 of the Busara participants had coincidentally opted in to the mSurvey audience in the past. Since they were already familiar with mobile surveys, we re-classified them as part of the “known” population, making the final study sample sizes 6,221 for the unknown population and 3,547 for the known population. In these final sub-samples, 45.1% of the unknown population and 74.8% of the known population were under the age of 29, and 54.2% of the unknown population and 41.4% of the known population identified as female (Table 1).

Table 1. Study sample distribution and completion rate overall and by subsample

2.2. Design

This study included four sets of treatment conditions commonly applied to SMS surveys: incentive amount, pre-survey communication, survey length, and question content. Each set consisted of two or three distinct conditions assigned to participants with equal probability, with the exception of the incentive conditions, which were weighted toward no incentive due to resource constraints. See Table 2 for a description of the survey treatments.

Table 2. Study treatments, levels and hypotheses

These conditions were then randomly assigned in a cross-cutting design to form a total of 54 distinct groups as described in Appendix 1.
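To make the cross-cutting design concrete, the sketch below enumerates the 54 cells (3 incentive levels × 2 pre-survey communication conditions × 3 survey lengths × 3 content variations) and draws a random assignment. It is an illustration in Python with labels of our own choosing, not the study's actual assignment code.

    import itertools
    import random

    incentives = [0, 25, 100]                     # KES, paid on completion
    pre_survey_comms = [False, True]              # consent + length disclosure sent beforehand
    survey_lengths = [5, 10, 25]                  # number of questions
    content_variants = ["standard", "reverse_order", "open_ended"]

    cells = list(itertools.product(incentives, pre_survey_comms,
                                   survey_lengths, content_variants))
    assert len(cells) == 54                       # 3 x 2 x 3 x 3 cross-cutting design

    def assign(participant_ids, seed=2017):
        """Randomly assign each participant to one of the 54 treatment cells."""
        rng = random.Random(seed)
        # Note: in the study the no-incentive arm was over-weighted due to resource
        # constraints; the uniform draw over cells here is purely illustrative.
        return {pid: rng.choice(cells) for pid in participant_ids}

    assignments = assign(["P001", "P002", "P003"])
    print(assignments["P001"])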

Compensation in the form of phone airtime varied between 0KES, 25KES, and 100KES, contingent on survey completion, allowing us to test whether the completion rate significantly increased when some incentive (25KES) or a higher incentive (100KES) was offered.

Survey length varied between 5, 10, and 25 questions with equal probability, allowing us to test whether the completion rate significantly decreased as survey length increased. Each pre-survey communication included a consent question asking the participant to opt in to the survey, along with an ex ante disclosure of the survey length, to test whether the completion rate significantly increased when participants knew the survey length ahead of time.

Finally, the content variation included three conditions: standard content (the list of questions in standard order with Likert-style responses), question order reversal (the list of questions in reverse order with Likert-style responses), and response option variation (select questions asked as open-ended questions instead of with Likert-style responses). These conditions allowed us to test whether there is a significant decrease in completion rate and response quality when a question is asked later in the survey or calls for an open-ended response in lieu of discrete options. Because only the question order reversal condition changed the order, respondents completed the questions in the original order with probability 2/3 and in reverse order with probability 1/3, allowing us to test whether responses differ depending on when in the survey participants answer a question.

The mSurvey system is designed to push each participant an SMS message with a single survey question. Depending on the question format, participants respond with numbers, words, or phrases using the mobile phone keyboard. After responding, mSurvey automatically sends the participant the next survey question for response. See Figure 1 for an example SMS survey sent by mSurvey.
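The question-by-question flow can be sketched as a simple loop. The sketch below is purely illustrative: mSurvey's platform is proprietary, and the send_sms and wait_for_reply helpers are hypothetical placeholders for an SMS gateway.

    from typing import Callable, List, Optional

    def run_sms_survey(phone: str,
                       questions: List[str],
                       send_sms: Callable[[str, str], None],
                       wait_for_reply: Callable[[str], Optional[str]]) -> List[str]:
        """Push one question at a time; send the next only after a reply arrives."""
        answers: List[str] = []
        for question in questions:
            send_sms(phone, question)        # push a single survey question by SMS
            reply = wait_for_reply(phone)    # respondent types a number, word, or phrase
            if reply is None:                # no reply received: survey left incomplete
                break
            answers.append(reply)
        return answers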

Figure 1. Example SMS survey sent by mSurvey

Survey questions and responses are at no cost to participants. mSurvey audience members receive an airtime incentive, sent directly to their mobile device upon completion, to compensate them for their time and thoughts. All participants were sent their assigned survey from the mSurvey system at the same time on October 9, 2017.

2.3. Analysis

This study is primarily concerned with two sets of outcomes of practical importance to researchers designing SMS surveys: survey completion (i.e., the number of surveys completed divided by the number of surveys sent) and differences in survey responses (i.e., differences in discrete response distributions and relative quality of open-text responses compared with discrete responses).

Analysis of variance (ANOVA) tests were used to analyse the effect of treatment conditions on the survey completion rate. An overall model using the complete sample, as well as separate models for each sub-sample, were built and tested for significance of treatment effects.
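As a minimal sketch, such models can be fit in Python with statsmodels, assuming a hypothetical per-participant data frame with a binary completed column and one column per treatment; the file and variable names below are ours, not the study's.

    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    # One row per participant: completion outcome plus assigned treatment conditions
    df = pd.read_csv("sms_survey_outcomes.csv")   # hypothetical file and column names

    overall = ols("completed ~ C(incentive) + C(pre_survey_comm) + "
                  "C(survey_length) + C(content_variant)", data=df).fit()
    print(anova_lm(overall, typ=2))               # F-tests for each treatment

    # Separate models for the 'known' and 'unknown' sub-samples
    for group, sub in df.groupby("population"):
        model = ols("completed ~ C(incentive) + C(pre_survey_comm) + "
                    "C(survey_length) + C(content_variant)", data=sub).fit()
        print(group)
        print(anova_lm(model, typ=2))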

Likert-type questions, as used in this paper, generate ordinal data that do not follow a standard distribution. The Mann-Whitney U test was therefore the primary statistical test used to measure differences in survey question responses, since this non-parametric test's limited distributional assumptions make it suitable for the categorical and ordinal nature of the data.
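As an illustration, the Mann-Whitney U test can be run on Likert responses coded 1–5 with scipy; the data frame and column names below are assumptions made for the example.

    import pandas as pd
    from scipy.stats import mannwhitneyu

    responses = pd.read_csv("sms_survey_responses.csv")   # hypothetical file and columns
    standard = responses.loc[responses["order"] == "standard", "q4"].dropna()
    reverse = responses.loc[responses["order"] == "reverse", "q4"].dropna()

    u_stat, p_value = mannwhitneyu(standard, reverse, alternative="two-sided")
    print(f"U = {u_stat:.1f}, p = {p_value:.4g}")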

To analyse differences in responses according to when in the survey a question was asked, we applied a Holm correction for multiple comparisons to account for the number of tests conducted.
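A minimal sketch of the Holm correction with statsmodels, using placeholder p-values in place of the study's actual per-question results:

    from statsmodels.stats.multitest import multipletests

    # Placeholder p-values, one per question compared across order conditions
    raw_p = [0.001, 0.004, 0.012, 0.030, 0.220]
    reject, adjusted, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

    for p, adj, sig in zip(raw_p, adjusted, reject):
        print(f"raw p = {p:.3f}  Holm-adjusted p = {adj:.3f}  significant: {sig}")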

3. Results

3.1. How to improve completion rate

Overall, 36% of respondents completed their mobile survey. The completion rate was more than three times as high among those familiar with mobile surveys, the “known” population (55%), as among those not familiar with mobile surveys, the “unknown” population (17.6%). Males and younger participants aged 25–29 were most likely to complete the survey, with the completion rate dropping as age increased in both groups (Table 1).

With regard to treatment effects, there was a significant difference in the completion rate as incentive amount and survey length varied. Looking at the incentive treatment, an incentive increase from 0KES to 25KES increased the completion rate by 3.2 percentage points, and an increase from 0KES to 100KES increased it by 6 percentage points (p-value: 5.449e-06). Broken down by sub-sample, an increase in incentive amount significantly increased the completion rate among the unknowns but not the knowns. An increase from no incentive to 25KES among the unknowns increased the completion rate by 5.8 percentage points, from 13.7% to 19.5%, and an increase from no incentive to 100KES increased it by 8.2 percentage points, to 21.9% (p-value: 1.334e-08). In contrast, there was no significant difference in the completion rate when knowns received a higher or lower incentive.

For the survey length treatment, a mobile survey with five questions produced the highest completion rate, especially for the known population. Overall, increasing the survey length from five to 10 questions significantly reduced the completion rate by 3 percentage points, and increasing it to 25 questions significantly reduced it by 4 percentage points (p-value: 0.001144). Looking more specifically by group, increasing the survey length from five to 10 questions among knowns significantly reduced the completion rate by 3.8 percentage points, from 58.2% to 54.4%, and increasing it to 25 questions significantly reduced it by 5.6 percentage points, to 52.6% (p-value: 0.01872). For unknowns the reduction was smaller but still significant: increasing the length from five to 10 questions reduced the completion rate by 3.3 percentage points, from 19.9% to 16.6%, and increasing it to 25 questions reduced it by 3.7 percentage points, to 16.2% (p-value: 0.01311).

There was no statistically significant difference in completion rate in either group when participants received a pre-survey communication letting them know the length of the survey.

Looking at all treatments together, the optimal treatment combination for knowns is one with five questions and any incentive amount or pre-survey communication. For unknowns the optimal treatment combination is one with 100KES and any survey length or pre-survey communication (Table 3).

Table 3. Completion rate by subsample and treatment combination of incentive, survey length, and pre-survey communication.

3.2. How to improve response quality

Question order reversal

Responses differ significantly according to where in a survey a question is asked, particularly when comparing questions asked near the beginning with questions asked near the end of the survey. For five ordinal survey questions (questions 25, 8, 7, 3, and 1, where question 25 appeared as the 25th question in the standard-order condition and as the 1st in the reverse-order condition), responses were significantly more ‘positive’ on the Likert scale (that is, more agreeable or more likely than the converse) when the question came later in the survey. After transforming the dataset so that each observation is a unique participant-question combination — meaning each participant has as many observations as questions completed — and controlling for respondent fixed effects, question order is strongly predictive of responses across all questions (p<0.01).
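A sketch of this reshaping and a fixed-effects regression is shown below, assuming a hypothetical wide-format file with one row per respondent and columns q1 through q25; the column names and model specification are ours for illustration, not necessarily the authors' exact analysis.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical wide file: one row per respondent, columns q1..q25 plus order condition
    wide = pd.read_csv("responses_wide.csv")
    long = wide.melt(id_vars=["respondent_id", "order_condition"],
                     var_name="question", value_name="response").dropna()

    # Position at which each question was actually asked, given the order condition
    n_questions = long["question"].nunique()
    number = long["question"].str.lstrip("q").astype(int)
    long["position"] = number.where(long["order_condition"] == "standard",
                                    n_questions + 1 - number)

    # Respondent fixed effects absorb person-level differences in response style
    fe = smf.ols("response ~ position + C(respondent_id)", data=long).fit()
    print(fe.params["position"], fe.pvalues["position"])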

Response option variation

For question 4, the order of the strongly disagree to strongly agree Likert scale was reversed, so that some respondents saw ‘strongly disagree’ first in the response options and others saw ‘strongly agree’ first. The proportion selecting ‘strongly disagree’ or ‘disagree’ dropped significantly, from 16.2% when these options appeared at the top of the list to 8.0% when they appeared at the bottom (p<0.001). This difference was not found for question 9 when reversing the non-Likert response options ‘Yes, Maybe, I don’t know, Not really, No.’ No participant selected only the first option for all questions, and fewer than 1% of participants selected the first option more than half of the time, suggesting the difference for question 4 reflects substantive responses rather than participants consistently selecting the first option by default to advance the survey.
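A difference in proportions like the drop from 16.2% to 8.0% can be checked with a simple two-proportion z-test; the counts below are placeholders, and this is not necessarily the exact test applied in the study.

    from statsmodels.stats.proportion import proportions_ztest

    # Respondents choosing 'disagree'/'strongly disagree' when those options sat at
    # the top of the list vs at the bottom (placeholder counts and group sizes)
    counts = [162, 80]
    nobs = [1000, 1000]

    z_stat, p_value = proportions_ztest(counts, nobs)
    print(f"z = {z_stat:.2f}, p = {p_value:.4g}")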

There was no significant difference in the completion rate when a participant was offered an open-ended response to a question instead of multiple choice options. Overall, 94.1% of those who responded to the open-text versions of questions 2 and 6 provided intelligible data. In comparing open-text responses to discrete options for these questions, the most common responses from the discrete response set also featured prominently in the open-text responses. For example, in question 6, “What factors are important to you while boarding a matatu?”, the most common discrete response categories, ‘number of people in a matatu’ and ‘condition of matatu’, were both among the top three themes in a thematic analysis of open-text responses (Vehicle, Driving, Space/Capacity). A similar pattern was identified for question 2: the most common discrete response categories for the question “What do you think is the greatest problem of matatu conductors?” were ‘Overloading passengers’ and ‘Rude language’, and the top three themes for this same question were Rudeness and misconduct, Fare, and Overloading.

However, the thematic analysis of open-text responses also yielded consistent themes that were not part of the discrete options, and some discrete options did not feature prominently in the open-text responses. For question 6, the third most common discrete response, ‘dishonesty’, did not feature at all in the open-text responses, while the themes ‘abusiveness’, ‘misconduct’, and ‘arrogance’ featured highly in open text. Interestingly, for question 2, certain themes we had not anticipated, such as matatus not having seat belts and the drunkenness of conductors, featured prominently.
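The thematic analysis referenced here was carried out by the research team. As a rough illustration only, a keyword-based tagging pass like the sketch below can surface candidate themes in open-text SMS responses before manual coding; the theme dictionary is invented for the example, not the study's codebook.

    from collections import Counter
    from typing import Set

    # Illustrative theme dictionary, not the study's codebook
    THEMES = {
        "space/capacity": ["overload", "full", "squeez", "space"],
        "vehicle condition": ["condition", "old", "clean", "seat belt"],
        "conduct": ["rude", "abus", "arrogan", "drunk"],
    }

    def tag_themes(text: str) -> Set[str]:
        """Return the themes whose keywords appear in an open-text response."""
        lower = text.lower()
        return {theme for theme, keywords in THEMES.items()
                if any(k in lower for k in keywords)}

    open_text = ["Matatus are always overloaded", "The conductor was very rude"]
    theme_counts = Counter(t for response in open_text for t in tag_themes(response))
    print(theme_counts.most_common())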

4. Discussion

In this study, we found that those who were aware they would be receiving SMS surveys were significantly more likely to complete their mobile surveys. Given the high volume of spam sent to mobile phones in the region, it is not surprising that participants feel more comfortable responding when they are familiar with the source.

Looking specifically at different treatments by group, those who are not familiar with mobile surveys are more likely to complete a mobile survey when provided an incentive of at least 25KES, with the highest completion rates occurring at a 100KES incentive. This difference was not seen among the known population, suggesting incentives are better motivators for unknown populations, whereas known populations may be intrinsically motivated. This may also be linked to the fact that members of the mSurvey audience are accustomed to receiving 20KES for completing surveys outside of this study and as such have a longer-run interest in participating.

Those who are familiar with mobile surveys are more likely to complete a mobile survey when the survey is short (five questions or fewer). The nature of SMS is to send short and concise messages, as the name suggests: short message service. The majority of mSurvey surveys are 10 questions or fewer, so those familiar with the service expect their mobile surveys to be short.

With that said, the difference in completion rate across incentive amounts is larger than the difference across survey lengths, suggesting that longer surveys may be feasible in the region if accompanied by an appropriate incentive. Moreover, the decline in completion rate with survey length, while significant, was small: a maximum decline of roughly five percentage points for adding 20 survey questions. Researchers could therefore choose to maximize content (i.e., include additional questions) without increasing the incentive if they are comfortable with fewer respondents completing the survey.

We also found that those most likely to respond to SMS surveys across both populations are male and aged 25–29 years. This is in line with mSurvey's experience recruiting a convenience sample across Kenya: men and those under 30 more commonly agree to take SMS surveys on their mobile device.

Across both groups, there was no difference in completion rate when participants knew the survey length, and thus the amount of time required, ahead of time. This suggests that knowing the survey length in advance is not necessarily a motivator in SMS surveys.

As for quality of responses, where a question appears within a survey significantly affects responses in several cases. The fact that this was only statistically significant for questions towards the beginning of the survey (or the end of the survey for the treatment group) is consistent with theoretical explanations: cognitive or physical fatigue from answering many questions affects how subjects respond. Either surveys should be concise, or, if many questions are necessary, the most pertinent ones should appear at the beginning of the survey to ensure data capture in the case of fatigue.

Respondents selected options on a strongly agree to strongly disagree Likert scale more often when those options appeared near the top of the list. We can speculate that these options are more salient, and thus more likely to be chosen, irrespective of their content. In the question where this finding did not hold, the response options were less clearly linear (e.g., Yes, Maybe, No), suggesting responses do not vary by order for all sets of response options. Nonetheless, response order should be considered carefully, or randomly varied, to mitigate this source of bias.

In contrast, we found no difference in completion rate when a participant was offered open-ended questions compared to multiple choice, suggesting participants are comfortable answering open-ended questions over SMS.

For open-text responses, the richness of the themes derived from the data was underpinned by an overall high rate of intelligible responses, addressing a potential major limitation of open-text SMS survey questions. Given this, and the comparison of the thematic analysis with the categorical responses, open-text questions are a viable means of soliciting data in SMS surveys and can be powerful when existing evidence is too limited to reliably form a set of discrete options.

In summary, to collect the best data, someone designing an SMS survey should:

a.) Recruit a sample that is aware they will be receiving SMS surveys

b.) Keep mobile surveys to at most 5 questions, or provide a higher incentive for longer surveys

c.) Randomize question and response option order to improve response quality

d.) Use open-ended questions where qualitative data supporting the question is lacking

e.) Know that males and those under 30 years old are the most likely to complete an SMS survey


Data availability statement

All data and related materials for this study can be found on Open Science Framework here:

https://osf.io/frqm2/?view_only=692c84fa3a50428082acd703fdee0c58


References

  1. Communications Authority of Kenya (2018), “Third Quarter Sector Statistics Report for the Financial Year 2017/2018 (1st January – 31st March 2018).” Available online: https://ca.go.ke/wp-content/uploads/2018/07/Sector-Statistics-Report-Q3-2017-18-2.pdf
  2. R. Lester, P. Ritvo, E. Mills, Kariri, S. Karanja, M. Chung, W. Jack, J. Habyarimana, M. Sadatsafavi, M. Najafzadeh, C. Marra, B. Estambale, E. Ngugi, T.B. Ball, L. Thabane, L. Gelmon, J. Kimani, M. Ackers, F. Plummer (2010), “Effects of a mobile phone short message service on antiretroviral treatment adherence in Kenya (WelTel Kenya1): a randomised trial,” Lancet, 376, 1838–1845.
  3. D. Zurovac, R. Sudoi, W. Akhwale, M. Ndiritu, D. Hamer, A. Rowe, R. Snow (2011), “The effect of mobile phone text-message reminders on Kenyan health workers’ adherence to malaria treatment guidelines: a cluster randomised trial,” Lancet, 378, 795–803.
  4. J. Barrington, O. Wereko-Brobby, P. Ward, W. Mwafongo, S. Kungulwe (2010), “SMS for Life: a pilot project to improve anti-malarial drug supply management in rural Tanzania using standard technology,” Malaria Journal, 9, 1–9.
  5. S. Mourão and K. Okada (2010), “Mobile phone as a tool for data collection in field research,” World Academy of Science, Engineering and Technology, 70(43), 222–226.
  6. M. J. Nanna and S. S. Sawilowsky (1998), “Analysis of Likert scale data in disability and medical rehabilitation research,” Psychological Methods, 3(1), 55–67.
  7. J. C. F. de Winter and D. Dodou (2012), “Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon,” Practical Assessment, Research & Evaluation, 15(11). Available online: http://pareonline.net/getvn.asp?v=15&n=11.