Preparing for Automation and Evaluating Expert Curation in a Baby Milestone Tracking App

Kayla Jacobs
Published in ACM CHI · May 7, 2019

Dr. Ayelet Ben-Sasson*, Dr. Eli Ben-Sasson**, Kayla Jacobs**, Elisheva Rotman Argaman**, Eden Saig** (*University of Haifa, **Technion)

This article summarizes a paper that will be presented at CHI 2019 on Tuesday 7th May 2019 at 11:00 in the session Kids And Health.

Machine learning techniques for automation are plentiful and powerful, but it is important to assess whether and how they can be used well. Here we describe how we assessed the automation readiness of a baby milestone tracking app that currently relies on human experts, with lessons for any project considering integrating automation where classification accuracy requirements are high.

Background: Why Child Development Tracking Matters and the babyTRACKS Solution

One in six children has a developmental delay that impairs attainment of critical life skills in motor, language, cognitive, and/or social-emotional abilities. Early childhood developmental screening is critical for timely detection and intervention, which leads to better outcomes for kids and their families. The problem is especially acute in socioeconomically disadvantaged communities where regular access to healthcare is limited, but even in countries with adequate medical resources, 70% of children with a developmental delay are diagnosed late.

To empower parents to better partner with healthcare professionals in monitoring their children’s development, we created babyTRACKS, a free, live, interactive developmental tracking app (available on the Apple App Store and Google Play) that has tracked over 3,000 children since 2015. Parents write or select short milestone texts, like “began taking first steps,” to record their babies’ developmental achievements, and receive crowd-based percentiles to evaluate development and catch potential delays.

Screenshots from the babyTRACKS app
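The paper does not spell out the percentile computation, but the underlying idea is simple: compare the age at which a child achieved a milestone against the ages recorded by the crowd for the same milestone. A minimal sketch of that idea, with a hypothetical function name and made-up crowd data:

```python
from bisect import bisect_right

def crowd_percentile(child_age_months, crowd_ages_months):
    """Percentile of a child's milestone age relative to the crowd.

    Lower percentiles mean the milestone was achieved earlier than
    most children in the crowd; unusually high percentiles may flag
    a potential delay worth discussing with a professional.
    """
    ages = sorted(crowd_ages_months)
    rank = bisect_right(ages, child_age_months)  # peers who achieved it no later
    return 100.0 * rank / len(ages)

# Hypothetical crowd data: ages (in months) at which other children
# recorded "began taking first steps".
crowd = [10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 14.0, 15.0]
print(crowd_percentile(12.0, crowd))  # -> 50.0
```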

Scaling Curated Crowd Intelligence (CCI)

Behind the scenes in babyTRACKS, an expert-based Curated Crowd Intelligence (CCI) process manually either groups each incoming novel parent-authored milestone text with similar existing milestones in the database (for example, “starting to walk”) or determines that it represents a new developmental concept not seen before in any other child’s diary.
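To make the matching task concrete, here is a toy sketch of one building block automation might use: ranking existing concepts by text similarity. The technique shown (TF-IDF over character n-grams with cosine similarity) and all data are our illustrative assumptions, not the paper’s method:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical milestone concepts already curated in the database.
existing = ["starting to walk", "says first word", "rolls from back to tummy"]

def best_match(new_text, threshold=0.35):
    """Return (concept, similarity) for the closest existing milestone,
    or None if the text likely represents a brand-new concept.

    Character n-grams tolerate inflections like "starting" vs. "started".
    """
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
    matrix = vec.fit_transform(existing + [new_text])
    sims = cosine_similarity(matrix[-1], matrix[:-1])[0]
    i = sims.argmax()
    return (existing[i], float(sims[i])) if sims[i] >= threshold else None

print(best_match("started to walk today"))    # likely matches "starting to walk"
print(best_match("built a tower of blocks"))  # likely None: new concept
```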

To help CCI scale, we want to use machine learning to automate part or all of the current manual process. We stepped back to assess our automation readiness through three studies investigating:

(1) the scalability limitations of our CCI process, by analyzing the human cost of CCI, how the work is currently broken down, and which areas are (and are not) ripe for automation;

(2) the consistency of our dataset, by testing the inter-rater reliability of curators and hence the validity of our milestone data for algorithmic training and evaluation purposes (see the sketch after this list); and

(3) the value of the dataset, by appraising the “real world” clinical value of milestones when assessing child development.
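Inter-rater reliability is commonly quantified with chance-corrected agreement statistics such as Cohen’s kappa. The snippet below is a generic illustration of that measure on made-up curator labels, not the paper’s actual analysis:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical: two curators independently assign the same eight new
# milestone texts to existing concept groups, or mark them "new".
curator_a = ["walk", "walk", "talk", "new", "roll", "talk", "walk", "new"]
curator_b = ["walk", "walk", "talk", "new", "roll", "new",  "walk", "new"]

# Kappa corrects raw agreement for agreement expected by chance:
# 1.0 is perfect agreement, 0.0 is chance level.
print(cohen_kappa_score(curator_a, curator_b))
```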

We conclude that automation can indeed be appropriate and helpful for a large percentage, though not all, of CCI work. We further establish realistic upper bounds for algorithm performance; confirm that the babyTRACKS milestones dataset is valid for training and testing purposes; and verify that it represents clinically meaningful developmental information.

Pre-Automation Lessons Learned

Our work illustrates several important benchmarks to check before plunging into developing a machine learning algorithm to automate an existing manual process (adapted, of course, to the technical specifics of the task):

  • Assess if and how much automation can actually help.
  • Establish a target for algorithm performance, based on the best human agreement, to know what to aim for (it may be less than 100%).
  • Ensure your gold-standard training/evaluation dataset is indeed valid and meaningful.
  • Remember that automation need not be all or nothing: algorithms can aid humans (for example, by narrowing down options), substantially speeding up manual tasks even when they cannot fully replace people (see the sketch below).
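As a hypothetical illustration of that last point (not the paper’s system), a matcher could auto-assign only high-confidence matches and otherwise hand the curator a short list of candidates:

```python
def triage(new_text, score_fn, concepts, auto_threshold=0.9, k=5):
    """Hypothetical human-in-the-loop triage.

    score_fn(new_text, concept) should return a similarity in [0, 1],
    e.g. the cosine similarity used in the matching sketch above.
    """
    scored = sorted(((score_fn(new_text, c), c) for c in concepts), reverse=True)
    top_score, top_concept = scored[0]
    if top_score >= auto_threshold:
        return ("auto", top_concept)               # confident: no human needed
    return ("review", [c for _, c in scored[:k]])  # curator picks from top k
```

Even a modest score function can shrink the curator’s search from the full concept database to a handful of candidates.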

Learn More

Citation:

Ayelet Ben-Sasson, Eli Ben-Sasson, Kayla Jacobs, Elisheva Rotman Argaman, Eden Saig. 2019. Evaluating Expert Curation in a Baby Milestone Tracking App. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 553, 12 pages. DOI: 10.1145/3290605.3300783
