Do Infants hold the key to developing superior Artificial Intelligence?

Niranjan Rajesh · Published in Bits and Neurons · Aug 7, 2022

Zaadnoordijk et al.’s article argues that, in order to construct unsupervised algorithms that model ‘true intelligence’, we must look at how we ourselves learn in the earliest stages of life. The next generation of Machine Learning algorithms needs to take more appropriate inspiration from human cognition to reach its full potential.

[Image: an infant reading a book with a glowing key in his hand, looking away from the camera (via DALL-E)]

Context

We have had Machine Learning (ML) for a while now (the term dates back to 1959!) and we have been reaping its benefits in simulating intelligent behaviour for almost as long, which has sparked a constant interest in improving it. We have had tremendous breakthroughs in Supervised Learning, the branch of ML algorithms that learn with the help of very large datasets associated with the problem the algorithm is trying to solve. For example, if you want an algorithm to predict whether an image shows a cat or a dog, you need to provide it with a large labelled dataset of images of cats and dogs. During the training phase, this dataset lets the supervised algorithm check whether its predictions are correct and readjust itself accordingly so that it is more likely to be correct the next time.
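To make this loop concrete, here is a minimal sketch using scikit-learn; the two-number ‘features’ are toy stand-ins for real image data, not anything from the article:

```python
# A minimal sketch of supervised learning: the classifier sees labelled
# examples, fits itself to match the labels, and can then predict new cases.
from sklearn.linear_model import LogisticRegression

# Pretend each image is summarised by two numbers (e.g. ear shape, snout length)
features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = ["dog", "dog", "cat", "cat"]  # human-provided labels

model = LogisticRegression()
model.fit(features, labels)            # training: adjust to match the labels

print(model.predict([[0.85, 0.75]]))   # -> ['dog']
```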

Supervised learning is the most prominent way of solving ML problems and it has been quite impressive so far. One problem with supervised learning, as you may have already noticed, is the requirement of large labelled datasets. These datasets require a tremendous number of person-hours to label, which can be quite expensive, especially if the data comes from a niche field like pathology or genomic sequencing.

This problem is exactly what the branch of Unsupervised Learning algorithms solves. These algorithms do not need any pre-labelled data to start learning. Their secret is that they analyse the data, find patterns between datapoints and cluster them into classes based on intrinsic similarities. For example, an unsupervised algorithm fed a dataset of unlabelled images of cats and dogs will find pixel and feature similarities between the images and use them to form two clusters, one for dogs and one for cats, without knowing which is which. These algorithms are far less expensive to train as no labelling is required. Although unsupervised learning avoids the cost of labelling data, it often suffers from poorer accuracy, as it is neither transparent nor consistent in how the clusters in the data form.
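For contrast with the supervised sketch above, here is a minimal clustering sketch on the same toy data; k-means is just one standard choice of unsupervised algorithm:

```python
# A minimal sketch of unsupervised learning: k-means receives no labels,
# only the data, and groups points by intrinsic similarity.
from sklearn.cluster import KMeans

# The same toy image summaries as before, but with no labels attached
features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)  # e.g. [1, 1, 0, 0]: two clusters, but no names for them
```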

[Image: the difference between supervised and unsupervised learning (via ResearchGate)]

Now let’s get to the meat of this article and talk about babies, as promised in the title. Owing to the prolonged reliance on supervised learning and its high costs, researchers started looking at unsupervised learning with a lot more interest. Cognitive neuroscientists and AI researchers chasing ‘true intelligence’, the kind that does not require large datasets, also turned their eyes to unsupervised learning. After all, we do not come into the world requiring a large, carefully curated dataset before we can start exhibiting intelligent behaviour. In fact, it turns out that studying how infants learn may just be the key to getting one step closer to recreating ‘true intelligence’ through unsupervised algorithms.

Why Infants?

This is where Zaadnoordijk et al.’s Perspective article in Nature Machine Intelligence comes in. Machine Learning has taken significant inspiration from cognitive neuroscience through its dependence on artificial neural networks (which are loosely modelled after the brain). However, much of this inspiration from neuroscience is based on adults, who already possess vast amounts of data and labels from their experiences in life. This sounds a lot like supervised learning, doesn’t it? This is why the authors urge the field to look at the cognitive development of infants to elevate the paradigm of unsupervised learning. Since infants do not possess such labelled data in the form of memories, unlike their adult counterparts, they are a much more suitable source of inspiration for unsupervised learning algorithms. The authors hope that developmental research on infants will open the doors to the next generation of unsupervised learning algorithms.

How do Infants learn?

Humans are required to learn useful representations from unlabelled data in our initial years on this planet. As infants, we learn to visually (and otherwise) perceive a plethora of stimuli and respond in appropriate ways: moving around, ingesting things, making associations and so many other complex tasks. The interesting part is that infants learn many of these complex tasks very rapidly, using very few examples (training data). This is quite a distinction from the state-of-the-art ML algorithms regularly employed today. In the article, Zaadnoordijk et al. outline three crucial factors that enable infants to learn so effectively, in the hope that they can be transferred to ML algorithms.


In-built traits and Guided information processing

The infant brain comes into the world with a lot of its structure already present, despite the popular belief that the brain starts off immature and underdeveloped and grows along with the infant. Even brain regions thought to be involved in complex cognitive functions, like the hippocampus (memory, learning and associating the two) and the frontal lobes (executive functions like planning and organising), are active and almost fully formed by infancy. Despite being almost whole, the average infant brain is a lot more plastic than the adult brain, which allows it to change dramatically based on the type of inputs it observes. This great extent of plasticity could be at least partly responsible for the rapid nature of infant learning. Pursuing self-rewiring and self-correcting neural networks that mimic the plastic nature of the infant brain is therefore a promising pathway.

Additionally, developmental scientists emphasise (although the specifics are greatly debated) that many of the building blocks of cognition may already be present at birth. There is general consensus that babies are born with inductive biases which shape and clearly impact their learning processes. Examples include a preference for focusing on human faces and biological motion. Another important prior is the ability to find strong continuous lines and contiguous surfaces (which helps us stand up and find our way around early in life), attributed to visual cortical micro-circuitry in our early brains. Interestingly, when this same micro-circuitry was translated to neural networks, they solved contour and edge detection tasks with significantly better efficiency than the state-of-the-art. This is just one example of how studying early neural architecture holds a lot of promise for advancing Artificial Intelligence.
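As a rough illustration of what building such a prior into a network could look like, here is a sketch that hard-codes oriented Gabor filters (a classic model of edge-detecting cells in early visual cortex) into a network’s first convolutional layer. The architecture and parameter values are illustrative assumptions, not the circuitry from the study the authors cite:

```python
# A sketch of an inductive bias: instead of learning first-layer filters from
# data, we fix them to oriented Gabor filters, loosely analogous to the
# edge/contour priors in the early visual system.
import numpy as np
import torch
import torch.nn as nn

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lam=4.0):
    """Oriented Gabor filter: a Gaussian envelope times a sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

# First conv layer: 8 orientations, frozen so training cannot overwrite the prior
conv1 = nn.Conv2d(1, 8, kernel_size=7, padding=3, bias=False)
filters = np.stack([gabor_kernel(theta=t)
                    for t in np.linspace(0, np.pi, 8, endpoint=False)])
conv1.weight.data = torch.tensor(filters, dtype=torch.float32).unsqueeze(1)
conv1.weight.requires_grad = False  # the 'innate' circuitry stays fixed
```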

An additional factor to focus on is the effect of parental instruction on infant learning. In our analogy of the infant brain as an ML algorithm, the way parents help us associate an apple with edible food, or teach us how to stand up and walk, can be viewed as the hyperparameter tuning of a neural network. However, the level of tuning conducted in today’s neural networks is a minute fraction of what occurs in the infant brain. This is directly related to the significantly more complex and densely connected infant brain, which AI researchers have simply not been able to replicate. It would therefore be fruitful to draw from a far richer and larger hyperparameter space to streamline learning.
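For readers unfamiliar with the machine side of this analogy, here is a minimal sketch of hyperparameter tuning: an outer loop (the ‘caregiver’, if you like) tries different settings and keeps the configuration under which the learner does best. The search space and the train_and_evaluate routine are hypothetical placeholders:

```python
# A sketch of random hyperparameter search: the outer loop guides the
# learner by choosing its settings, much as caregivers shape infant learning.
import random

def train_and_evaluate(config):
    """Hypothetical stand-in: train a model with these settings and return a
    validation score. Here it is just a dummy placeholder."""
    return random.random()

search_space = {"learning_rate": [1e-4, 1e-3, 1e-2], "hidden_units": [32, 64, 128]}

best_score, best_config = float("-inf"), None
for _ in range(10):  # try ten random configurations
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```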

Diverse and multimodal inputs

A main reason why infants are hypothesised to learn more effectively than ML models is their ability to process multimodal inputs. From a very early age, we are able to combine auditory and visual (and sometimes tactile) information to follow instructions and handle objects. Infants also show stellar performance in tasks where they associate voices with the faces of people they are familiar with. This ability to constantly employ multiple sensory modalities in tandem is hypothesised to be crucial to effective learning. This is because diverse inputs lead to richer representations of information in the infant’s mind, which in turn lead to improved task performance. These richer representations arise from the simple fact that multimodal information can confirm or resolve any ambiguity in a single sensory stream. And of course, this ability gives us the edge over machines on tasks that require input from multiple sensory streams.

Since multimodal perception enables richer and more conceptual representations of information during processing, it follows that AI needs to address this ability. Traditionally, most models are trained on unimodal data; image classification models only learn the hidden semantics of pixels and features in the image (the visual modality). Ideally, a model should be able to process information from any modality given to it and produce a favourable output. There has already been quite a bit of research on bi- and multimodal networks. For example, OpenAI’s CLIP (in fact, the thumbnail of this article was generated by DALL-E, which is partially powered by CLIP!) uses contrastive learning to relate the semantic features of an image to the textual features of its caption, improving the efficacy of image classification. But such efforts remain rare; multimodal models need to be treated as the standard for solving problems that require intelligence.
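For a flavour of how CLIP-style training works, here is a sketch of the symmetric contrastive loss at its core; the image and text encoders are left abstract, and the toy usage at the end just feeds random embeddings:

```python
# A sketch of CLIP-style contrastive learning: matched image/caption embedding
# pairs are pulled together and mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalise embeddings so the dot product becomes a cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarities: entry (i, j) compares image i with caption j
    logits = image_emb @ text_emb.T / temperature

    # The correct caption for image i is caption i (the diagonal)
    targets = torch.arange(len(image_emb))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage: a batch of 4 image and caption embeddings of dimension 128
loss = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128))
```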

Curricula and active learning

The final component of infant learning outlined by the authors is the content of what infants learn. Inputs are presented to infants in an almost phased manner. For example, new-born babies only see what is presented to their visual field by other humans; a crawling infant’s visual stimuli are mostly limited to what is at floor level; a walking infant is exposed to far more visual stimuli. Furthermore, learning to walk and move around enables infants to perceive height and space. Sensory capabilities similarly increase in a phased manner: new-borns have very low visual acuity (the ability to distinguish shapes and objects from a distance), which increases as they get older. These incremental additions to stimuli and inputs act almost like a natural curriculum that facilitates the learning process for infants, just as school curricula are designed to streamline our own educational learning.

On a parallel note, parents and caregivers also impose a certain curriculum on infants, in several ways. To highlight the importance of an object, adults may point or gesture at it repeatedly, indicating to the infant where to pay attention. Parents may also speak in “baby-talk”, using extremely simple words and enunciated pronunciations to converse with infants, making it easier for them to understand. As the infant grows older, they slowly start using longer and faster speech, which is like removing the training wheels for auditory perception. There are many other means of parental support to the infant learning process, like guided play and knowledge transfer, that further ease learning.

These curriculum-based approaches have been attempted in ML algorithms. In analogy to the natural curriculum, there have been attempts at training neural networks first with blurred images and then moving on to higher resolutions in phases. Such a network displayed improved accuracy and generalisability compared to traditional networks. On the other hand, the humans in human-in-the-loop algorithms mimic the role of caregivers to some extent, guiding the learning process of the model without being given the same level of control as parents have over their infants. However, not enough work has been done to enforce a curriculum (natural or otherwise) for ML models.
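A minimal sketch of such a blur-based curriculum, loosely mirroring how infant visual acuity sharpens over time; the schedule is an illustrative assumption, and the model, dataset and training routine are hypothetical placeholders:

```python
# A sketch of curriculum learning via blur: early phases see heavily blurred
# images, later phases see progressively sharper ones.
from torchvision import transforms

blur_schedule = [9, 5, 3, 0]  # Gaussian kernel sizes per phase (0 = no blur)

for phase, kernel in enumerate(blur_schedule):
    tfms = [transforms.ToTensor()]
    if kernel > 0:
        tfms.insert(0, transforms.GaussianBlur(kernel_size=kernel))
    transform = transforms.Compose(tfms)
    print(f"phase {phase}: blur kernel = {kernel}")
    # train_for_some_epochs(model, dataset, transform)  # hypothetical step
```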

Finally, infants do not learn passively from whatever inputs are available or given to them (unlike ML models); rather, they actively direct their attention to stimuli that they desire. This is curiosity-driven active learning. Infants are drawn to stimuli that excite them, as long as they are somewhat familiar or comfortable with them, and they tend to actively explore and consequently learn the workings and properties of these stimuli. This intrinsic curiosity is wondrous and paramount to our learning process (not just as infants) and arguably difficult to replicate artificially. Algorithms that somehow implement a willingness to explore novel or familiar data in order to learn more about it would be a step in the right direction for the next unsupervised learning algorithms.
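One common way to approximate this artificially is an intrinsic ‘curiosity’ reward equal to the agent’s own prediction error, loosely in the spirit of Pathak et al.’s Intrinsic Curiosity Module (2017). Here is a schematic sketch; the network sizes and the 4-dimensional state are arbitrary assumptions:

```python
# A sketch of curiosity-driven exploration: the agent is rewarded for visiting
# states it cannot yet predict, drawing it toward the unfamiliar.
import torch
import torch.nn as nn

# Forward model: predict the next state from the current state and action
forward_model = nn.Sequential(nn.Linear(4 + 1, 64), nn.ReLU(), nn.Linear(64, 4))

def intrinsic_reward(state, action, next_state):
    """Curiosity bonus: how surprised was the agent by what happened?"""
    pred_next = forward_model(torch.cat([state, action]))
    return ((pred_next - next_state) ** 2).mean().item()

# Toy usage: a 4-dimensional state and a 1-dimensional action
r = intrinsic_reward(torch.randn(4), torch.randn(1), torch.randn(4))
```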

The next generation of unsupervised learning algorithms

The article raises several great factors that directly affect the superior learning efficacy of human infants, and they point to ways of improving ML algorithms. Algorithms that do not require large labelled datasets and that feature plasticity, useful biases and priors, multimodal information processing, curriculum-based approaches and a tendency to explore what is ‘interesting’ (a definition that could be made malleable to whatever the algorithm deems important in the data) are the future of unsupervised learning. These algorithms, in theory, will take us a step closer to true intelligence, as we will be drawing more (and, this time, more appropriate) inspiration from true intelligence.
