Lookit (Part 1): A New Online Platform for Developmental Research
Richard N. Aslin, Editor of Open Mind (MIT Press), writes: “The availability of online data-collection methods (e.g., Amazon’s Mechanical Turk) has revolutionized accessibility to large sample sizes in research on human adults. Scott and Schulz (2017) extend this method to infant populations by developing a parent-monitored custom platform for data collection in the home using standard webcam access to the infant’s face and voice. This new platform, called Lookit, was validated on sample sizes an order of magnitude larger than those in typical published studies of infants. Although care must be taken to render the at-home and in-lab methods comparable in terms of screen-size, experiment timing, and presence of distractions, this new platform has the potential to overcome a limitation inherent to small sample-sizes of in-lab studies and to enable access to more diverse populations of infants.”
Volume 1 | Issue 1 | Winter 2017
Kimberly Scott and Laura Schulz
Many important questions about children’s early abilities and learning mechanisms remain unanswered not because of their inherent scientific difficulty but because of practical challenges: recruiting an adequate number of children, reaching special populations, or scheduling repeated sessions. Additionally, small participant pools create barriers to replication, while differing laboratory environments make it difficult to share protocols with precision, limiting the reproducibility of developmental research. Here we introduce a new platform, “Lookit,” that addresses these constraints by allowing families to participate in behavioral studies online via webcam. We show that this platform can be used to test infants (11–18 months), toddlers (24–36 months), and preschoolers (36–60 months) and reliably code looking time, preferential looking, and verbal responses, respectively; empirical results of these studies are presented in Scott, Chu, and Schulz (2017). In contrast to most laboratory-based studies, participants were roughly representative of the American population with regard to income, race, and parental education. We discuss broad technical and methodological aspects of the platform, its strengths and limitations, recommendations for researchers interested in conducting developmental studies online, and issues that remain before online testing can fulfill its promise.
Behavioral research with infants and children stands to illuminate the roots of human cognition. However, many important questions about cognitive development remain unasked and unanswered due to the practical demands of recruiting participants and bringing them into the lab. Such demands limit participation by families from diverse cultural, linguistic, and economic backgrounds; deter scientists from studies involving large sample sizes, narrow age ranges, and repeated measures; and restrict the kinds of questions researchers can answer. It is hard to know, for instance, whether an ability is present in all or only most children, or whether an ability is absent or weakly present. Such small distinctions can have large theoretical and practical implications. Fulfilling the promise of the field depends on scientists’ ability to measure the size and stability of effects in diverse populations.
In adult psychology, online testing through Amazon Mechanical Turk (AMT) has begun to lower barriers to research, enabling scientists to quickly collect large datasets from diverse participants (Buhrmester, Kwang, & Gosling, 2011; Paolacci, Chandler, & Ipeirotis, 2010; Rand, 2012; Shapiro, Chandler, & Mueller, 2013). As the technical hurdles involved in online testing dwindle, we are poised to expand the scope of questions developmental science can address as well. Online testing can allow access to more representative populations, children from particular language groups or affected by specific developmental disorders, and information about children’s behavior in the home. Access to larger sample sizes will also allow researchers to estimate effect sizes with greater precision, detect small or graded effects, and generate sufficient data to test computational models. The motivation to bring studies online is bolstered by growing awareness of the importance of direct replication and reproducible results (Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012).
However, before the potential of online developmental research can be fully realized, we need a secure, robust platform that can translate developmental methods to a computer-based home testing environment. Here we present a new online developmental research platform, Lookit. Parents access Lookit through their web browsers, participate at their convenience in self-administered studies with their child, and transmit the data collected by their webcam for analysis. To follow, we address broad ethical, technological, and methodological issues related to online testing, describe the demographics of our online participant population, and offer recommendations for researchers seeking to adapt studies to an online platform. For an empirical report of our case studies, including raw data and analysis code, please see Scott, Chu, and Schulz (2017). For technical and methodological details regarding the platform itself and video coding procedures, please see the Supplemental Materials (Scott & Schulz, 2017).
Testing children online raises a number of ethical concerns specific to the online environment, including providing fair and noncoercive reimbursement, ensuring the validity of informed consent, and protecting parents’ privacy. We addressed these issues through the recruitment and registration procedures described below.
Recruitment, Reimbursement, and Registration
Participants were recruited via AMT and linked to the Lookit site (https://lookit.mit.edu). Participants were paid three to five dollars (depending on the study) for participation to ensure payment of at least the minimum wage nationally, even in cases where parents encountered technical difficulties and contacted the lab. This policy is in accordance with guidelines for researchers regarding fair payment (http://guidelines.wearedynamo.org/). To ensure that parents did not feel any pressure to complete a study, especially if their child was unwilling to continue, parents were paid if they initiated a study, regardless of completion and of any issues in compliance or implementation. Participation required creating a user account and registering at least one child. As in the lab, parents provided their child’s date of birth to determine study eligibility. A demographic survey was available for parents to fill out at any point.
Consent and Privacy
At the start of each study, a consent form and the webcam video stream were displayed. The parent was instructed to record a brief verbal statement of consent (see Supplemental Materials for examples), ensuring that parents understood they were being videotaped. Parents were free to end the study at any point. After completing a study, the parent selected a privacy level for the video collected. Across our three test studies, 31% of sessions were marked “private” (video can be viewed only by our research team), 41% “scientific” (video can be shared with other researchers for scientific purposes), and 28% “free” (video can be shared for publicity or educational purposes). Parents also had the option to withdraw their data from the study at this point; this option was chosen by less than one percent of participants and was treated as invalid consent. A coder checked each consent video before looking at any other video associated with the session. Valid consent was absent for 16% of participants overall (N = 961) due to technical failures in the video or audio transmission, parents not reading the statement, or subsequent withdrawal from the study.
Multiple short clips were recorded in each study during periods of interest. Video quality varied due to participants’ upload speed. Our primary concern for coding looking measures was the effective framerate of the video. Because the putative framerate of the video was unreliable due to details of the streaming procedure, we estimated an effective framerate based on the number of “changed frames” in each clip. A “changed frame” differed from the previous changed frame in at least 20% of pixels; note that this underestimates higher framerates, since frames close in time may differ in fewer than 20% of pixels when there is no major movement. Videos with an effective framerate under 2 frames per second (fps) were excluded as unusable for looking studies (see SI for examples of video at various effective framerates). The median effective framerate across sessions with any video was 5.6 fps (interquartile range = 2.9–8.6 fps).
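As an illustration, the changed-frame heuristic can be sketched as follows. This is a minimal sketch, not the platform’s actual code; the function name and the representation of frames as flat lists of pixel values are our own assumptions.

```python
def effective_framerate(frames, duration_s, change_threshold=0.20):
    """Estimate a clip's effective framerate (illustrative sketch).

    frames: one flat list of pixel values per frame.
    A frame counts as "changed" if it differs from the previous
    *changed* frame in at least `change_threshold` of its pixels.
    """
    if not frames or duration_s <= 0:
        return 0.0
    reference = frames[0]
    changed = 1  # the first frame always counts as changed
    for frame in frames[1:]:
        n_diff = sum(1 for a, b in zip(reference, frame) if a != b)
        if n_diff / len(frame) >= change_threshold:
            changed += 1
            reference = frame
    return changed / duration_s
```

Under the exclusion criterion above, a clip would be dropped from looking-time analyses if this estimate fell below 2 fps.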
Starting webcam video introduced a small (generally less than 1 s) variable delay in the start of the recorded clips. In the preferential looking study, where spoken questions accompanied test videos, onset of audio playback was used to more precisely determine when test periods began. Future versions of this platform will avoid this delay and improve framerates.
Before completing study-specific coding, one coder checked that video from the study was potentially usable. Video was unusable 35% of the time (282 of 805 unique participants with valid consent records). The most common reasons for unusable video were absence of any study videos (44% of records with unusable video), an incomplete set of study videos (20%), and insufficient framerate (15%). Rarely, videos were unusable because a child was present but generally outside the frame (3%) or there was no child present (1%).
To test the feasibility of online developmental research across a variety of methods and age groups, we conducted three studies: a looking-time study with infants (11–18 months) based on Téglás, Girotto, Gonzalez, and Bonatti (2007), a preferential looking study with toddlers (24–36 months) based on Yuan and Fisher (2009), and a forced-choice study with preschoolers (ages 3 and 4) based on Pasquini, Corriveau, Koenig, and Harris (2007). These allowed us to assess how online testing affected coding and reliability, children’s attentiveness, and parental interference. For details on the specific studies, see Scott et al. (2017).
Coding and Reliability of Looking Time Measures
Each session of the looking time study was coded using VCode (Hagedorn, Hailpern, & Karahalios, 2008) by two coders blind to condition. Looking time for each of eight trials per session was computed based on the time from the first look to the screen until the start of the first continuous one-second lookaway, or until the end of the trial if no valid lookaway occurred. Differences of 1 s or greater, and differences in whether a valid lookaway was detected, were flagged and those trials recoded. Agreement between coders was excellent; coders agreed on whether children were looking at the screen on average 94.6% of the time (N = 63 children; SD = 5.6%). The mean absolute difference in looking time computed by two coders was 0.77 s (SD = 0.94 s).
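The measurement rule above can be made concrete in code. The sketch below assumes a simplified event-based coding (timestamped look/lookaway state changes) rather than the frame-by-frame VCode annotations actually used; the function name and event format are illustrative.

```python
def looking_time(events, trial_end, min_lookaway=1.0):
    """Looking time per the rule above (illustrative sketch).

    events: sorted (timestamp_s, is_looking) state changes.
    Returns the time from the first look to the screen until the
    start of the first lookaway lasting >= min_lookaway seconds,
    or until trial_end if no such lookaway occurs.
    """
    first_look = None  # time of first look to the screen
    away_since = None  # start of the current lookaway, if any
    for t, is_looking in events:
        if is_looking:
            if first_look is None:
                first_look = t
            if away_since is not None and t - away_since >= min_lookaway:
                return away_since - first_look
            away_since = None
        elif first_look is not None and away_since is None:
            away_since = t
    if first_look is None:
        return 0.0  # child never looked at the screen
    if away_since is not None and trial_end - away_since >= min_lookaway:
        return away_since - first_look
    return trial_end - first_look
```

For example, a child who starts looking at 0 s, glances away briefly from 2.0 to 2.3 s, and then looks away for good at 5.0 s would accumulate 5.0 s of looking time: the 0.3 s glance is shorter than the one-second criterion, so it does not end the measurement.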
Measuring until the first continuous lookaway of a given duration introduces a thresholding effect in addition to the small amount of noise induced by a reduced framerate. The magnitude of this effect depends on the dynamics of children’s looks to and away from the screen. We examined a sample of 1,796 looking times, measured until the first one-second lookaway, from 252 children (M = 13.9 months, SD = 2.6 months) tested in our lab with video recorded at 30 Hz. Reassuringly, in 68% of measurements, the lookaway that ended the measurement was over 1.5 s. We also simulated coding of these videos at framerates ranging from 0.5 to 30 Hz; the median absolute difference between looking times calculated at our minimum required framerate of 2 Hz and from the original video was only 0.16 s (interquartile range = 0.07–0.29 s; see Figure S1 (Scott, Chu, and Schulz, 2017)).
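One simple way to run such a simulation is to subsample the 30 Hz coding record before recomputing looking times. The sketch below is hypothetical (the function name and data format are our own); it keeps every (30/target)-th frame of a boolean per-frame looking record.

```python
def subsample_coding(looking_frames, target_hz, source_hz=30):
    """Simulate coding at a reduced framerate (illustrative sketch).

    looking_frames: per-frame booleans coded at source_hz.
    Returns (timestamp_s, is_looking) pairs at roughly target_hz,
    from which looking time can be recomputed and compared with
    the full-framerate measurement.
    """
    step = source_hz / target_hz  # keep every step-th source frame
    pairs = []
    i = 0.0
    while i < len(looking_frames):
        idx = int(i)
        pairs.append((idx / source_hz, looking_frames[idx]))
        i += step
    return pairs
```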
Coding and Reliability of Preferential Looking Measures
Each session of the preferential looking study was coded using VCode (Hagedorn et al., 2008) by two coders blind to the placement of test videos. Looks to the left and right were generally clear; for examples, see Figure 1. Three calibration trials were included in which an animated attention getter was shown on one side and then the other. During calibration videos, all 138 participants coded looked on average more to the side with the attention getter. For each of nine preferential looking trials, we computed fractional right/left looking times (the fraction of total looking time spent looking to the right/left). Substantial differences (fractional looking time difference greater than 0.15, when that difference constituted at least 500 ms) were flagged and those clips recoded. A disagreement score was defined as the average of the coders’ absolute disagreement in left looking time and right looking time, as a fraction of trial length. The mean disagreement score across the 138 coded participants was 4.44% (SD = 2.00%, range 1.75–13.44%).
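A hedged sketch of the two measures described above, assuming each coder’s left and right looking times for a trial are available in seconds (the function names and tuple format are our own, and we read “as a fraction of trial length” as normalizing the averaged time disagreement by trial length):

```python
def fractional_looks(left_s, right_s):
    """Fraction of total looking time spent looking left and right."""
    total = left_s + right_s
    if total == 0:
        return 0.0, 0.0
    return left_s / total, right_s / total

def disagreement_score(coder_a, coder_b, trial_len_s):
    """Average of two coders' absolute disagreement in left and
    right looking time, as a fraction of trial length.

    coder_a, coder_b: (left_s, right_s) per-trial looking times.
    """
    diff_left = abs(coder_a[0] - coder_b[0])
    diff_right = abs(coder_a[1] - coder_b[1])
    return ((diff_left + diff_right) / 2) / trial_len_s
```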