Boss Baby

Published in The Scope · May 10, 2024

By: Hien Tran

Can we teach a machine to learn the same way a baby does? Researcher Wai Keen Vong of New York University pushed the limits of machine learning by exploring this question. The study, conducted by Vong and his colleagues, drew on sixty-one hours of recordings of a baby boy named Sam, capturing one percent of his waking hours from when he was six months old to around two years old. The study focused on teaching language learning and object recognition to the machine, using Sam's real experiences as the template.

A neural network is a machine learning algorithm that takes in raw inputs, such as video frames and natural language, and learns to update its internal representations according to a pre-defined objective. Vong and his team trained a neural network on frames of Sam's videos paired with the words spoken at those moments. The scientists were curious about how humans learn their first words and wanted to see how far this ability could carry over into machine learning. "Past word learning models have particular assumptions that wouldn't translate to working with real natural language," Vong said. The network was trained through contrastive learning, in which the machine learns to identify which images and text belong together and which do not, a kind of learning that the videos of Sam's daily life showcased constantly. "The [neural network] was exposed to a generic model of one kid's experience so it can learn a whole bunch of concepts using real data," Vong said.
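The contrastive idea can be sketched in a few lines. The snippet below is a generic illustration of contrastive (CLIP-style) training, not the team's actual model or code: it assumes images and words have already been turned into embedding vectors, and it scores matched image–text pairs (the diagonal of a similarity matrix) against mismatched ones. The function names and the temperature value are illustrative choices, not from the study.

```python
import numpy as np

def contrastive_scores(image_embs, text_embs):
    """Cosine-similarity matrix between image and text embeddings.

    Entry [i, j] is high when image i and text j look related.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return img @ txt.T

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """InfoNCE-style loss: the matched pair in each row (the diagonal)
    should out-score every mismatched pair in that row."""
    logits = contrastive_scores(image_embs, text_embs) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # row-wise log-softmax; the "correct class" for row i is column i
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Training drives this loss down, which pulls the embeddings of co-occurring frames and words together while pushing unrelated ones apart. When the pairs are correctly aligned, the loss is low; shuffling the captions so each image sits next to the wrong words makes it rise.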

The success of this study suggests that machine learning could even be used to explore the foundations of learning through a nature-versus-nurture lens. As data collection continues, Vong and his team are searching for a universal set of grammar rules that could explain the common speech patterns and language-learning techniques of children across the globe. Their next steps focus on mapping problems in order to expand concept learning. Their current data cover only a small slice of language, so many other aspects, such as pronouns and names, remain difficult for AI to learn. Vong and his team are working on ways to capture other distinctive concepts in neural networks and to increase the amount of data they can collect.

Although their results were promising, Vong highlighted several challenges the technology has yet to overcome. The model's shortcomings stem from the fact that the recorded experiences represent only one percent of the baby's waking activities, so the data do not capture the full learning experience. Crawling, walking, and interacting with the environment, activities all omitted from this study, can heavily shape the way a child learns. "[Given] the fact that the training model was from passive videos, there are boundaries to determining what can be learned with movement and actions," Vong said. Another shortcoming was the model's inability to recognize the hand as part of the body in the video recordings, something babies do easily. The model also has yet to recognize sounds of speech that are not words; babies can learn the sound "uh oh," for example, but the machine cannot.

Ultimately, the study raises the question: How human can machine learning get? The implications of the study are endless. Language learning and concept recognition could be combined with current technology: imagine Alexa with the power of ChatGPT — except this time, ChatGPT could learn from our input and behaviors. The product would be a personalized technology that consumers can teach and customize, which might one day allow us to tailor the perfect piece of technology that suits an individual’s specific needs.
