A must-read! “The Worlds I See,” by Dr. Fei-Fei Li

Delbourg-Delphis
Jan 3, 2024

The book “The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI” by Fei-Fei Li begins with the author’s apprehension about testifying on artificial intelligence before the House Committee on Science, Space, and Technology. It concludes with her sense of relief after successfully doing so. Spanning approximately 300 pages, the book provides a compelling dual narrative. It details her personal experiences as a Chinese immigrant who arrived in New Jersey at the age of 15 with her impoverished parents, and chronicles her professional journey from an early interest in physics to creating ImageNet in 2006, a large visual database that became instrumental for research in visual object recognition and for training AI algorithms.

The book is exceptionally well written, a storytelling masterpiece. It displays a profound emotional connection with the outside world and an extraordinary literacy across various disciplines. It also showcases her relentless pursuit of knowledge and her commitment to the societal role of scientists. She co-founded AI4ALL in 2017 and the Stanford Human-Centered AI Institute (HAI) in 2018. Dr. Li holds the title of Sequoia Capital Professor of Computer Science at Stanford University, and she spent two years at Google during a sabbatical.

Originally trained as a physicist at Princeton, Fei-Fei Li grew “interested in understanding how the mind works and what constitutes intelligence” after a summer at UC Berkeley. This led her to pursue both neuroscience and computation, culminating in her dissertation on “Visual Recognition: Computational Models and Human Psychophysics” as a PhD student at Caltech. In doing so, she stepped into the long and complicated history of “artificial intelligence,” a term coined for the 1956 Dartmouth Summer Research Project on Artificial Intelligence. The project aimed to realize Alan Turing’s vision of machines capable of human-like reasoning and perception. This vision had previously motivated Warren McCulloch and Walter Pitts in 1943 to create the first computational model of neuronal activity. It was further advanced by Frank Rosenblatt’s introduction of the three-layer perceptron network on the Mark I Perceptron machine in 1957, which learned iteratively from sample images.

The concept of multi-layered processing was validated by neurophysiologists David Hubel and Torsten Wiesel in 1959, who showed that sensory processing occurs across many layers of neurons arranged hierarchically. This discovery influenced Kunihiko Fukushima’s development of the neocognitron in 1979, a multi-layered neural network model for visual pattern recognition that paved the way for convolutional neural networks (CNNs). That said, despite significant advances, such as the demonstration by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986 that artificial neural networks could learn by back-propagating errors, and Yann LeCun’s successful recognition of handwritten zip codes for the U.S. Postal Service in 1989, the “AI winter” persisted. So, almost a decade later, Fei-Fei Li focused on the importance of datasets for training algorithms, understanding that an algorithm’s “vision” of the world hinges on the data it is initially given.

The book emphasizes Li’s “ambient interest in visual minutiae” and in sensory information processing, a fascination deeply rooted in her personal experiences. As a child, she would hike the mountains around Chengdu, China, with her father, searching for butterflies, and as an adolescent, she would watch him scour garage sales in the United States, “as if he wanted to catalog the world.” This translated into Li’s ambition to create “a true ontology of the world, as conceptual as it was visual, curated from the ground up by humans for the sole purpose of teaching machines.”

Li’s project began with Caltech 101, a 2003 dataset comprising 9,146 images across 101 categories, then the most extensive image collection for machine learning. Her vision expanded after meeting Christiane Fellbaum, a computational linguistics researcher who, along with cognitive psychologist George Armitage Miller, had developed WordNet, a vast lexical database that also maps the semantic relationships between words. Inspired, Li and her team started ImageNet, an ambitious project akin to WordNet that categorizes images not just by labels but also within a structured framework of related concepts. ImageNet became the largest manually curated dataset in AI history, setting the stage for the ImageNet Challenge, which significantly advanced the training of new algorithms. Notably, AlexNet, a convolutional neural network, triumphed in the 2012 Challenge, as did Microsoft’s Deep Residual Network (ResNet) in 2015. These successes were part of the resurgence of neural networks, now more sophisticated and potent than before. That resurgence was capturing widespread media attention, particularly with Google’s acquisition of AI startups, including DeepMind in 2014. In practice, “the study of vision was an outgrowth of artificial intelligence itself.”

As her adventure expanded with the integration of Google Street View, the American Community Survey, and multiple websites, Fei-Fei Li explored the full scope of what “vision” truly entails. It’s more than just identifying objects; it’s about transforming images into stories, gestures into scenes, all infused with societal, cultural, and even political colorations. Like many researchers, she revisited millennia of philosophical inquiry into how we acquire knowledge and learn, from Aristotle’s categories to Ludwig Wittgenstein’s assessment of the contextual meaning of “meaning.” While noting how AI “revealed our world from perspectives we’d never imagined,” she came to realize that such perspectives could also embed long-standing biases. This led her to co-found AI4ALL and to adopt as her new North Star the “reimagining of AI from the ground up as a human-centered practice.”

I highly recommend this book for numerous reasons, among them:

· An exceptionally sensitive portrayal of how immigrants can transform their destinies by re-envisioning the world, eloquently expressed through her mother’s words: “Learning a new language is like opening a door to a new world.”

· A deep commitment to aesthetic authenticity, alongside her skill in creative improvisation and intellectual grit in a sphere typically dominated by men.

· A comprehensive analysis of the maturation of what is often labeled as “disruptive innovation,” which is, in fact, the culmination of many decades of efforts, marked by both breakthroughs and setbacks.

Delbourg-Delphis

Serial Technology CEO. Board Member. Strategy Consultant. Author: Everybody Wants to Love Their Job (2018); Beyond Eureka! The Rocky Roads to Innovating (2024)