Stories by Alessandro Lamberti on Medium

How the Gemini Robotics family translates foundational intelligence into physical action

Alessandro Lamberti — Sun, 28 Sep 2025 19:49:56 GMT

This story was originally published here: https://newsletter.caffeinatedengineer.dev/p/how-the-gemini-robotics-family-translates

The modern trajectory of artificial intelligence has been a story of rapid ascent, but one largely confined to the digital sphere. We have witnessed immense computational power unlock complex reasoning across text and imagery. However, the path to creating truly general-purpose autonomous AI — systems capable of operating robustly and reliably in the physical world — demands a fundamental transformation. This transition requires overcoming the crucial challenge of embodied reasoning (ER): the complex set of world knowledge encompassing spatial understanding, intuitive physics, and inter-object relationships that are foundational for physically grounded agency.

src. https://arxiv.org/pdf/2503.20020

The latest iteration of this effort, the Gemini Robotics 1.5 family of models, represents a cohesive architectural step toward addressing this challenge head-on, significantly extending the capabilities of prior systems. This family, comprising the Gemini Robotics-ER 1.5 (VLM) and Gemini Robotics 1.5 (VLA), takes a definitive step toward enabling robots to perceive, reason, and act to solve highly complex, multi-step tasks in unstructured environments.

src. https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/

This essay explores the core technical innovations — the dual agentic architecture, the thinking VLA framework, and the multi-embodiment motion transfer mechanism — that underpin this push toward generalist physical agents.

I. The dual architecture for intelligence and action

The physical world demands adaptability and long-horizon planning, requirements that strain monolithic robotic architectures. The Gemini Robotics approach solves this by implementing a Dual Agentic System Architecture, separating the roles of high-level intellect (orchestration) and low-level execution. This framework is critical for handling complex, multi-step tasks that require contextual information and sequential completion.

The orchestrator: Gemini Robotics-ER 1.5 (The VLM brain)

The Gemini Robotics-ER 1.5 model (GR-ER 1.5) functions as the high-level brain, or orchestrator, controlling the overall flow of the task. This Vision-Language Model (VLM) is optimized for complex embodied reasoning problems such as task planning, spatial reasoning, and task progress estimation.

High-level planning and tool use: GR-ER 1.5 excels at planning and making logical decisions within physical environments. To tackle tasks that require external information — such as determining local recycling guidelines based on location — the orchestrator can natively call tools like Google Search or any third-party user-defined functions.
Adaptive orchestration: the orchestrator processes user input and environmental feedback. It breaks down complex tasks into simpler steps that the VLA can execute. For example, asked to “Pack the suitcase for a trip to London,” the orchestrator might access a travel itinerary or weather forecast to decide which clothes are appropriate to pack, then produce a high-level instruction like “pack the rain jacket into the luggage”.
Advanced sensing: GR-ER 1.5 achieves state-of-the-art performance on spatial understanding and is the first thinking model optimized for embodied reasoning. It evaluates task progress and detects success to determine when to advance to the next step.

src. https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf

The Action model: Gemini Robotics 1.5 (The VLA hand)

The Gemini Robotics 1.5 model (GR 1.5) is the Vision-Language-Action (VLA) model responsible for execution. It translates instructions issued by the orchestrator into direct, low-level robot actions. GR 1.5 is a derivative of Gemini fine-tuned to predict robot actions directly, and it enables general-purpose robot manipulation across different tasks, scenes, and robots.
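
The division of labor described above can be sketched as a simple control loop. This is only an illustration under stated assumptions, not the actual Gemini Robotics interface: the class names, the `plan`, `step_succeeded`, and `execute` methods, and the lookup-table "planner" are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the high-level reasoning model (the VLM role)."""
    plans: Dict[str, List[str]]

    def plan(self, task: str) -> List[str]:
        # A real orchestrator would reason over images, text, and tool results;
        # this sketch just looks the plan up in a fixed table.
        return list(self.plans.get(task, []))

    def step_succeeded(self, step: str, observation: str) -> bool:
        # Placeholder success detector: the observation must mention the step.
        return step in observation


@dataclass
class ActionModel:
    """Hypothetical stand-in for the low-level executor (the VLA role)."""
    log: List[str] = field(default_factory=list)

    def execute(self, instruction: str) -> str:
        # A real action model would emit motor commands; here we only record the step.
        self.log.append(instruction)
        return f"done: {instruction}"


def run_task(task: str, orchestrator: Orchestrator,
             action_model: ActionModel, max_attempts: int = 2) -> bool:
    """Plan once, then execute each step, retrying until success is detected."""
    steps = orchestrator.plan(task)
    if not steps:
        return False  # nothing to execute for an unknown task
    for step in steps:
        for _ in range(max_attempts):
            observation = action_model.execute(step)
            if orchestrator.step_succeeded(step, observation):
                break  # advance to the next step
        else:
            return False  # step never succeeded within the attempt budget
    return True


if __name__ == "__main__":
    orchestrator = Orchestrator(plans={
        "pack the suitcase": ["pack the rain jacket into the luggage", "close the luggage"],
    })
    executor = ActionModel()
    print(run_task("pack the suitcase", orchestrator, executor))
    print(executor.log)
```

The point of the sketch is the separation of concerns: the planner decides what to do and when a step is finished, while the executor only ever sees one short instruction at a time.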

II. Thinking before acting

A critical architectural breakthrough in Gemini Robotics 1.5 is the implementation of Embodied Thinking — the ability for the model to explicitly reason or “think” before taking physical action. Traditionally, VLA models translated instructions or linguistic plans directly into movement. The Thinking VLA (GR 1.5, with thinking mode ON) now interleaves actions with a multi-level internal monologue of reasoning and analysis articulated in natural language.

Mechanism and performance gains:

Task decomposition: this process simplifies the challenging cross-modal translation (mapping complex language goals to low-level actions) into two easier stages. The model converts complex tasks into sequences of specific, short-horizon, language-based steps. For instance, when asked to “Sort my laundry by color,” the Thinking VLA first understands the semantic goal (“putting the white clothes in the white bin”), and then plans the detailed motion (“moving a sweater closer to pick it up more easily”).
Robustness to complexity: this decomposition dramatically improves the model’s capacity to handle multi-step tasks, resulting in a sizable improvement in the progress score for multi-step benchmarks compared to the model without thinking enabled.
Situational awareness and recovery: the Thinking VLA gains an implicit awareness of its progress, eliminating the need for a separate success detector. This enables sophisticated recovery behaviors; if an object slips from the gripper (e.g., a water bottle lands near the left hand), the next thinking trace instantly generates a self-correction (e.g., “pick up the water bottle with the left hand”).
Transparency: by generating its internal analysis in natural language, the Thinking VLA makes the robot’s decisions and plan execution transparent and more interpretable to human users.

src. https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf
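
To make the interleaving of thoughts and actions concrete, here is a toy sketch. It is not how the model works internally: a real Thinking VLA generates its reasoning in natural language from images, whereas this example uses hand-written rules. The function names and observation strings are hypothetical.

```python
from typing import List, Tuple


def next_thought(goal: str, observation: str) -> str:
    """Toy 'thinking' step: turn the latest observation into one short instruction."""
    if "dropped" in observation:
        # Recovery: the observation itself reveals the slip, so no separate
        # success detector is consulted.
        return "pick up the water bottle with the left hand"
    if "holding bottle" in observation:
        return "place the water bottle in the bag"
    return "grasp the water bottle"


def run_episode(goal: str, observations: List[str]) -> List[Tuple[str, str]]:
    """Pair each natural-language thought with the action it leads to."""
    trace = []
    for observation in observations:
        thought = next_thought(goal, observation)
        action = f"motor command for: {thought}"  # placeholder for low-level control
        trace.append((thought, action))
    return trace


if __name__ == "__main__":
    episode = run_episode(
        "pack the water bottle",
        ["bottle on table", "bottle dropped near left hand", "holding bottle"],
    )
    for thought, action in episode:
        print(thought, "->", action)
```

Because every action is preceded by a readable thought, the trace doubles as a log a human can inspect, which is the transparency benefit described above.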

III. Scaling the physical world: generalization and motion transfer

General-purpose robotics has long been hampered by the data scarcity problem and the sheer difficulty of transferring skills between robots of different forms and sizes. Gemini Robotics 1.5 addresses this by integrating a Motion Transfer (MT) mechanism and novel architecture within its pre-training process.

Multi-Embodiment Learning

GR 1.5 is designed as a multi-embodiment VLA model, trained on heterogeneous data from various robot platforms. This foundational approach allows the model to learn a unified understanding of motion and physics.

Universal control: the same model checkpoint can successfully control dramatically different form factors, including the ALOHA robot, the Bi-arm Franka robot, and the Apollo humanoid robot, without requiring robot-specific post-training.
Zero-shot transfer: the MT mechanism is crucial for enabling the model to learn from diverse robot data sources and facilitating zero-shot skill transfer from one robot to another. For instance, skills identified in ALOHA data, such as closing a precise pear-shaped organizer, can be transferred and executed successfully by the Bi-arm Franka robot. The MT training recipe is specifically noted for amplifying the positive effect of multi-embodiment data.
Rapid adaptation: this learned foundational knowledge enables rapid task adaptation for new, short-horizon tasks, requiring as few as 50 to 100 demonstrations for fine-tuning to reach high success rates.

Robust generalization capabilities

The high-capacity VLM backbone combined with diverse training data yields strong generalization performance across multiple axes:

Visual generalization: the system is robust to changes in the visual scene that do not affect the task, such as adding novel distractor objects, replacing the background (e.g., with a blue-white cloth), or changing lighting conditions.
Instruction generalization: the model understands the intent behind language even when instructions contain typos (“Put the top lft gren grapes…”), are rephrased, or are expressed in a new language (e.g., Spanish/Castilian, such as “Coloque las uvas verdes…”).
Action generalization: it can adapt learned motions to handle variations in object instances (e.g., folding different dress sizes) or unusual initial conditions.
Task generalization: this is the most comprehensive form of generalization, demonstrating the ability to successfully execute entirely new tasks in new environments, requiring robustness across all other axes simultaneously.

IV. Specialized dexterity and real-world agency

The synthesis of advanced reasoning and generalized action capability enables Gemini Robotics to achieve mastery over tasks demanding extreme dexterity and long-horizon execution.

Long-horizon dexterity: Gemini Robotics can tackle notoriously challenging, multi-step tasks requiring precise manipulation, such as origami folding or packing a snack into a Ziploc bag. When specialized through fine-tuning, the model demonstrates exceptional performance, including achieving a 100% success rate on the full long-horizon lunch-box packing task, which typically takes over two minutes to complete.
Advanced semantic reasoning: the system demonstrates sophisticated contextual understanding necessary for agency. For example, GR-ER 1.5 can successfully execute instructions involving novel semantic concepts like identifying the “Japanese fish delicacy” (sushi) among distractors, or understanding relative spatial size concepts like packing the “smallest coke soda”.
Physical constraint reasoning: GR-ER 1.5 can follow complex pointing prompts that require reasoning about physical constraints, such as identifying objects a robot is physically able to pick up based on a given payload (e.g., 10 lbs). It can also generate trajectories that actively avoid collisions.

V. Embodied reasoning and safety

The power of a generalist physical agent necessitates a robust and comprehensive safety framework. The development of the Gemini Robotics models strictly adheres to the Google AI Principles.

State-of-the-Art Embodied Reasoning (GR-ER 1.5)

GR-ER 1.5 significantly advances the state-of-the-art for reasoning capabilities critical for robotics. It was evaluated on 15 academic benchmarks, including Embodied Reasoning Question Answering (ERQA) and Point-Bench. ERQA specifically measures abilities like spatial reasoning, trajectory reasoning, and action reasoning.

Reasoning-enhanced performance: GR-ER 1.5 establishes a new state-of-the-art on these benchmarks. Crucially, its performance is enhanced when incorporating Chain-of-Thought (CoT) prompting, which encourages the model to output step-by-step reasoning traces before committing to an answer. This thinking process scales better with inference-time compute for embodied reasoning tasks compared to generic models like Gemini 2.5 Flash.
Success and progress estimation: GR-ER 1.5 excels at capabilities like task planning, progress estimation, and success detection — essential functions for robot autonomy.

src. https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf

The holistic safety framework

The hybrid digital-physical nature of these models requires a specialized, multi-layered safety perspective. The holistic approach includes:

Safe human-robot dialogue: by building on the base Gemini checkpoints, the models inherit safety training ensuring alignment with Gemini Safety Policies, preventing the generation of harmful conversational content (e.g., hate speech, inappropriate advice).
Physical action safety: this addresses traditional robotics concerns, ensuring the VLA models are interfaced with classical, low-level safety-critical controllers for hazard mitigation, collision avoidance, and force modulation.
Semantic action safety: this addresses the “long-tail” of common-sense rules essential for operating in human-centric environments. Examples include preventing the robot from placing a soft toy on a hot stove or serving peanuts to an allergic person. This is improved through explicit safety reasoning (Thinking about Safety).
Evaluation: semantic safety is evaluated using the specialized and upgraded ASIMOV-2.0 benchmark. This benchmark incorporates data based on real-world injury scenarios sourced from National Electronic Injury Surveillance System (NEISS) records. GR-ER 1.5 demonstrates improved performance in recognizing risks and understanding the safety consequences of actions compared to earlier models.

src. https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf

VI. Conclusion: the critical path to physical AGI

The Gemini Robotics 1.5 family represents a significant push toward unlocking general-purpose robotics. The integration of the highly specialized Gemini Robotics-ER 1.5 orchestrator and the multi-embodiment Gemini Robotics 1.5 executor validates an important design philosophy: reliable physical agents require the combination of high-level, generalized embodied reasoning with robust low-level control.

By pioneering the Thinking VLA for superior task decomposition and error recovery, and implementing the Motion Transfer mechanism to accelerate learning across platforms like ALOHA, Franka, and Apollo, this work systematically addresses the fundamental challenges of generalization and data scarcity that have historically plagued the field.

The capabilities demonstrated — state-of-the-art embodied reasoning performance, rapid task adaptation with few demonstrations, and robust, multi-layered safety mechanisms grounded in semantic understanding — define the critical path toward deploying truly general and capable AI agents in the physical world.

References

Gemini-Robotics-Team et al. (2025). Gemini Robotics: Bringing AI into the Physical World. arXiv preprint arXiv:2503.20020

Gemini-Robotics-Team et al. (2025). Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer. Technical Report, Google DeepMind

Sermanet, P., et al. (2025). Generating Robot Constitutions & Benchmarks for Semantic Safety. Conference on Robot Learning (CoRL) 2025

Gemini-Team et al. (2023). Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805

You can follow me on:

The Caffeinated Engineer — Personal notes on engineering, machine learning, and software design — drawn from the field, not the whiteboard.
LinkedIn

How the Gemini Robotics family translates foundational intelligence into physical action was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

The work you do in the dark

Alessandro Lamberti — Fri, 12 Sep 2025 14:53:18 GMT

The original newsletter issue can be found at https://newsletter.caffeinatedengineer.dev/p/the-inevitable-chaos-embracing-failure

The worst advice we give young people is, “Do what you love, and you’ll never work a day in your life.” It’s a beautiful, seductive lie. It suggests that the right path is a frictionless glide, and that if you feel resistance, you must be on the wrong one.

This is perhaps the most damaging myth of modern work. It’s the source of the anxiety that plagues people in their twenties, the feeling that they are perpetually off-track because their job, even one they chose, sometimes feels like… well, work. The myth suggests that passion is a magical substance you either find or you don’t, and when you find it, it provides a perpetual-motion machine of motivation.

The reality, as anyone who has ever built anything of value knows, is that everything worth having lives on the other side of effort. A good relationship isn’t a discovery; it’s a construction. It requires tending. Artistry isn’t a gift; it’s the result of a thousand frustrating practice sessions. Even deep friendships demand maintenance and the occasional uncomfortable conversation.

We’ve mistaken motivation for discipline. Motivation is weather: changeable, unpredictable, often absent when you need it most. You can’t build a life on it. Discipline is climate: the steady, reliable conditions you create for yourself regardless of how you feel on any given day. The most prolific writers don’t write when they’re inspired; they write until they’re inspired. The most successful engineers don’t solve problems when they feel brilliant; they sit with the problem, patiently, methodically, until a solution reveals itself. They show up.

This isn’t to say work should be a joyless slog. That’s the other side of the same bad coin. The goal isn’t to find work that is effortless, but to find a struggle you can fall in love with. The right kind of work isn’t suffering; it’s building. It’s the kind of difficulty that, when you push against it, pushes back and makes you stronger.

Lately, a new piece of advice has joined the pantheon of well-meaning but dangerous ideas: “Protect your peace.” On its surface, it’s sensible. But in practice, it has made a generation allergic to necessary friction. True peace isn’t the absence of problems; it’s the presence of a purpose that makes problems worth solving. The happiest, most engaged people aren’t those who have eliminated all difficulty from their lives. They are the ones who have found difficulty worth enduring.

In the 1980s, scientists built a self-contained ecosystem called Biosphere 2. Inside, they grew trees. But they noticed something strange: the trees grew quickly, but they would collapse under their own weight before reaching maturity. They had forgotten to include wind. Without the stress of wind, the trees never developed the “stress wood” that gives them strength and resilience. They were weak because they had never been challenged. We are becoming Biosphere 2 trees.

So if the goal isn’t to find an effortless passion, what is it? It’s to find enjoyment. And enjoyment is not the same as pleasure.

Pleasure is the feeling you get from a good meal, a warm bath, or watching a movie. It’s restorative. It brings the self back to a state of equilibrium. But it doesn’t create growth. Enjoyment, on the other hand, is what happens when you push yourself beyond your limits. As Mihaly Csikszentmihalyi described it, enjoyment is characterized by “forward movement: by a sense of novelty, of accomplishment.” It’s the feeling of stretching your capabilities, of achieving something unexpected.

This is the state Csikszentmihalyi called “flow.” You’ve almost certainly felt it. It’s that state of total absorption where you are so involved in an activity that nothing else seems to matter. Your sense of self dissolves. Time warps, hours feeling like minutes. The experience is so enjoyable that you do it for its own sake, not for some external reward.

Flow has specific preconditions. It happens at the boundary of your abilities, where a high challenge meets an adequate skill level. There have to be clear goals and immediate feedback, so you can adjust your performance in real time. A surgeon performing a complex operation experiences flow. A rock climber navigating a difficult face experiences flow. But so does a welder finding the perfect seam, or a farmer learning the rhythms of her land and animals.

The examples don’t have to be glamorous. Csikszentmihalyi studied an assembly-line worker named Joe who transformed his monotonous job into a complex mental game of trying to beat his own records. He found flow. He studied Serafina, an elderly peasant in the Italian Alps who found flow in tending to her cows and making cheese, a craft that required a deep, almost mystical understanding of her environment.

The strange paradox is that people report experiencing flow far more often at work than during leisure. At work, goals are usually clear and challenges are abundant. In our free time, we often resort to passive, low-skill, low-challenge activities like scrolling social media or watching TV. We are more likely to be in a state of apathetic boredom on the couch on a Sunday afternoon than at our desks on a Tuesday morning. Yet, we culturally frame work as the burden and leisure as the prize. We wish we were on the couch. This reveals a profound disconnect between what we think makes us happy and what actually does.

The real reward of flow isn’t just the feeling itself. It’s what happens after. Following a flow experience, Csikszentmihalyi writes, “the organization of the self is more complex than it had been before.” You grow. You become more capable, more differentiated. You integrate new skills and ideas into your identity. This is how you build an “autotelic personality”: the ability to create enjoyment and find intrinsic rewards regardless of the external conditions. It’s the psychological equivalent of stress wood.

This kind of deep, immersive engagement used to be the default mode for serious work and learning. It required focus, patience, and the ability to tolerate the initial discomfort of not knowing. It required a quiet mind.

That is a state that is becoming increasingly alien to us.

The philosopher Marshall McLuhan famously said, “The medium is the message.” The content of what we consume matters, but the medium through which we consume it matters more, because it fundamentally shapes how we think. And the medium of our age, the internet, is actively reshaping our brains.

Nicholas Carr, in his book The Shallows, described an uncomfortable feeling that many of us recognize: “someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory.” He found, as many of us have, that deep reading (the kind of sustained, linear concentration a book demands) had become a struggle. His brain wanted to jump around, to click, to skim. He had gone from being a “scuba diver in the sea of words” to a “Jet Skier along the surface.”

This isn’t just a feeling; it’s a cognitive reality. Our working memory, the scratchpad of consciousness, is notoriously small. It can only hold two to four pieces of new information at a time. To move that information into long-term memory and build the rich, interconnected schemas that constitute true knowledge, we need to focus. We need to rehearse the information, turn it over, connect it to what we already know.

The internet, by design, overwhelms this process. It presents a “swiftly moving stream of particles,” a relentless barrage of notifications, hyperlinks, and competing stimuli. This creates an enormous “cognitive load.” We become so busy managing the firehose of information that we have no mental resources left for the deep processing required for retention and comprehension. We become mindless consumers of data, not thoughtful synthesizers of knowledge.

Psychologist Daniel Kahneman explains this through the lens of our two modes of thinking: System 1, which is fast, automatic, and intuitive; and System 2, which is slow, deliberate, and effortful. System 2 is powerful, but it’s also lazy. It will gladly let the impulsive System 1 run the show to conserve energy. The internet is a playground for System 1. It thrives on cognitive ease, rewarding quick, superficial judgments and punishing sustained, difficult thought.

This leads to a dangerous cognitive bias Kahneman calls “What You See Is All There Is” (WYSIATI). Our minds construct the most coherent story possible from the limited information available, without stopping to consider what information might be missing. We see a headline, a 280-character hot take, a 30-second video, and our System 1 confidently forms a complete narrative. We develop an illusion of understanding based on a dangerously incomplete picture. This doesn’t just make us more prone to error; it makes us less interesting. An interesting person has depth. They have a mind populated with rich, nuanced, and interconnected models of the world. WYSIATI creates minds that are wide but shallow, full of disconnected facts and unexamined opinions.

We compound this problem by actively outsourcing our memory. The argument goes that by offloading data to the cloud, we free our minds for more creative tasks. But memory isn’t just a filing cabinet for facts. It is the very fabric of our intelligence. The knowledge stored in our own long-term memory is what allows for inductive analysis, critical thinking, and imagination. You can’t have a new idea if your mind is empty. When we rely on Google as an external hard drive, we aren’t just storing information; we’re preventing our brains from building the very structures of thought. We risk, as Carr puts it, “emptying our minds of their riches.”

The brain’s neuroplasticity is a double-edged sword. The same adaptability that allows us to learn new skills also means that our brains are being physically rewired to favor the shallow mode. We are becoming better at skimming and multitasking, and worse at concentrating and contemplating. The playwright Richard Foreman described the unsettling result: we are turning into “pancake people, spread wide and thin as we connect with that vast network of information.”

These two forces, the cultural fantasy of effortless work and the technological reality of shallow thinking, are locked in a vicious feedback loop.

A mind conditioned for the constant, low-grade dopamine hits of the digital stream becomes less tolerant of the patient, often frustrating work required for flow. If we can get a facsimile of accomplishment by clearing our inbox or scrolling through a feed, why would we endure the hours of struggle it takes to truly master a skill or understand a complex problem? The culture of shallow consumption reinforces the myth of effortless passion.

Conversely, a belief that work should feel easy makes us prime targets for the internet’s distractions. The moment we hit a difficult patch in our work (a bug in the code, a tricky paragraph to write), our brain, trained by the passion myth, interprets this friction as a sign we’re on the wrong path. It seeks an escape. And the escape is always one click away, offering the cognitive ease and superficial stimulation our rewired brains now crave.

The result is a hollowing out. We become less competent because we avoid the deep practice that builds real skill. And we become less interesting because our inner world, built on a foundation of disconnected snippets and outsourced memories, lacks complexity and depth. This may even affect our capacity for emotion. The subtlest and most distinctively human forms of empathy and compassion require sustained attention and deep reflection, the very mental habits we are losing.

So what is the truth we should tell young people? What is the antidote to this cycle?

It begins with redefining our relationship with work and effort.

The solution is not to smash our devices and retreat to the woods. The solution is intentionality. The final frontier of personal freedom is the command over your own attention. We have to consciously choose to do hard things. We have to carve out and fiercely protect blocks of time for deep, uninterrupted focus. We have to choose the book over the browser, the complex problem over the easy distraction.

This means embracing the initial phase of discomfort. It means recognizing that the feeling of struggle isn’t a sign to stop; it’s the sign that you are on the verge of growth. It is the wind shaking the tree.

The reward for this deliberate effort isn’t just better work. The reward is a better self. It’s the quiet satisfaction of mastery, the joy of a mind that can make its own connections, and the richness of a life lived with purpose. It’s the difference between being a passive consumer of the world and an active builder within it. True fulfillment doesn’t come from avoiding the struggle, but from choosing a struggle that serves something larger than yourself. That is the work that makes you not just more competent, but more human. And in the end, that is far more interesting.

The work you do in the dark was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Inevitable Chaos: Embracing Failure for Resilient Distributed Systems

Alessandro Lamberti — Sun, 31 Aug 2025 09:33:46 GMT

Order vs Interconnected Chaos — AI generated

The original newsletter issue can be found at https://newsletter.caffeinatedengineer.dev/p/the-inevitable-chaos-embracing-failure

Engineers, by their very nature, are optimists. They are trained to build, to solve, to perfect. From the first bridge to the latest microchip, the implicit goal has always been to eliminate failure. In civil engineering, this makes sense: a bridge that fails is a catastrophe, a lesson etched in concrete and lives lost. The discipline evolves by making structures stronger, margins wider, tolerances tighter. Perfection, or at least its relentless pursuit, is a necessary creed.

But what if this very optimism, this drive for flawlessness, becomes a critical vulnerability? In the interconnected world of distributed systems, this is precisely the case. Here, perfection is not merely elusive; it’s a dangerous fantasy. These systems are not monolithic structures of steel and stone. They are intricate webs built from fallible networks, unreliable processes, and constantly shifting, unpredictable dependencies. In this environment, failure isn’t an anomaly to be stamped out. To pretend otherwise isn’t just naive; it’s a direct path to fragility.

The foundational assumptions that once underpinned system design (“the network is reliable,” “latency is zero,” “bandwidth is infinite,” “topology doesn’t change,” “machines never fail”) have by now been disproven so often they’ve become industry punchlines. Yet a ghost of this optimistic worldview lingers, leading engineers to design as if these fictions were facts. The result? Brittle systems, meticulously crafted but destined to shatter. The fundamental question we must confront is no longer “How do we prevent failure?” but “How do we live with it?”

David D. Woods, a luminary in resilience engineering, provides a crucial framework, articulating resilience through four distinct qualities: robustness, rebound, graceful extensibility, and sustained adaptability. Traditional engineering, fixated on preventing failure, has historically obsessed over robustness — the ability to withstand shocks. But distributed systems, by their very nature, demand an equal, if not greater, emphasis on the other three. Resilience isn’t just about enduring; it’s about the rapid recovery (rebound), the capacity to stretch and adapt under unanticipated stress without snapping (graceful extensibility), and the continuous evolution in response to new surprises (sustained adaptability).

This profound shift in mindset is the crucible from which powerful techniques like Chaos Monkey emerge. Netflix’s infamous chaos engineering tool, which deliberately terminates production servers, appears, on the surface, to be an act of corporate self-sabotage. But this perspective only holds if you cling to the illusion of perpetual uptime. Once you accept the undeniable truth — that those servers will die eventually, whether by your hand or by fate — the logic becomes clear. The only remaining question is whether you will be ready. Chaos engineering isn’t a juvenile exercise in breaking things for the sake of it; it’s a training regimen for both systems and the human teams that manage them, preparing them to expect, confront, and overcome the unexpected.

How Systems Learn to Live With Failure: A Technical Breakdown

To truly “live with failure,” we must re-architect our systems with a pessimistic, fault-tolerant mindset. This involves weaving specific patterns and practices into the very fabric of our distributed designs, transforming potential points of collapse into mechanisms of resilience.

Fault Tolerance Basics: Understanding the Enemy

Before we can build resilient systems, we must precisely define what we are resisting. It’s crucial to distinguish between faults and failures. A fault is an imperfection or defect within a system (e.g., a network cable gets unplugged, a server runs out of memory). A failure is the observable manifestation of that fault, where the system deviates from its expected behavior (e.g., a service becomes unavailable, data is corrupted). Our goal isn’t necessarily to eliminate every fault — an impossible task in a large distributed system — but to design fault-tolerance mechanisms that prevent faults from escalating into full-blown failures.

Consider the five classic classes of failures in Remote Procedure Call (RPC) systems, which are foundational to distributed communication:

Client unable to locate server: the service discovery mechanism fails, or the server simply isn’t there.
Lost messages: network congestion, hardware errors, or routing issues prevent request or response packets from reaching their destination.
Server crashes: the process or machine hosting the service unexpectedly terminates.
Lost replies: the server processes the request but its response is lost on the way back to the client.
Client crashes: the client itself fails before it can process the server’s response or retry.

Each of these scenarios, seemingly simple, can cascade into wider system collapse without careful design.

Stability Patterns

Building resilience requires a deliberate application of battle-tested patterns:

Timeouts: in a distributed system, a slow service can often be worse than a completely broken one. A service that hangs indefinitely consumes valuable resources (threads, memory, network connections) on the calling client, potentially leading to resource exhaustion and cascading failures. Timeouts ensure that clients don’t wait forever, freeing up resources and allowing them to fail fast. They draw a line in the sand: if a response isn’t received within X milliseconds, assume failure and move on. This prevents a single, ailing dependency from dragging down an entire application.
Retries and Exponential Backoff: when a transient fault occurs (e.g., a momentary network glitch, a database deadlock), simply trying the operation again often succeeds. However, naive retries can be disastrous. Rapid-fire retries against an overloaded or failing service can create a “thundering herd” problem, adding load exactly when the service can least absorb it and preventing recovery. This is where exponential backoff becomes critical: the delay between retry attempts grows with each attempt, typically with some random jitter so that clients do not all retry at the same instant. This gives the struggling service time to recover and keeps the retrying clients from overwhelming it further. Crucially, operations designed for retries must be idempotent, meaning that performing them multiple times has the same effect as performing them once. Sending an email is not idempotent, because a retry delivers a second copy; saving a user’s profile with the same values usually is.
Circuit Breakers: imagine a fuse box in your home. When a fault occurs, the circuit breaker “trips,” cutting off power to prevent further damage. Circuit breakers in software operate on a similar principle. They monitor calls to a dependency. If a certain number or percentage of calls fail within a configured timeframe, the circuit “trips” open. While it is open, all subsequent calls to that dependency are rejected immediately, without even attempting to reach the downstream service. This prevents further load on an already struggling service, allowing it to recover, and protects the calling service from wasting resources on doomed requests. After a set timeout, the circuit enters a “half-open” state and allows a small number of test requests through. If these succeed, the circuit closes; if they fail, it re-opens.
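
To make the retry guidance above concrete, here is a minimal Python sketch of exponential backoff with jitter. Everything in it is illustrative rather than taken from a specific library: TransientError, backoff_delays, and call_with_retries are hypothetical names, and the default delays are arbitrary. It assumes the wrapped operation is idempotent and that the caller maps timeouts and similar transient faults to TransientError.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=5):
    """Yield the un-jittered delay before each retry, growing by `factor` up to `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_retries(operation, attempts=5, base=0.1, factor=2.0, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Run an idempotent operation, retrying transient faults with backoff.

    Makes one initial call plus up to `attempts` retries. Each wait is scaled
    by a random factor in [0, 1) ("full jitter") so that many clients do not
    retry in lockstep.
    """
    delays = backoff_delays(base, factor, cap, attempts)
    while True:
        try:
            return operation()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the failure
            sleep(delay * rng())
```

The sleep and rng parameters exist only so the behavior can be checked without real waiting or randomness; in ordinary use the defaults would be left alone.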

src. https://martinfowler.com/bliki/CircuitBreaker.html
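
The closed, open, and half-open states fit in a small state machine. The sketch below is one possible shape, not the implementation from the article linked above or from any library: the class name, the consecutive-failure threshold, and the reset_timeout parameter are assumptions, and the class is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before probing
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency marked unhealthy")
            self.state = "half_open"  # timeout elapsed: let a trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call re-opens immediately; otherwise wait for the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = "closed"
```

Counting consecutive failures keeps the example short; a production breaker would more likely track a failure rate over a sliding window, as the description above suggests.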

Bulkheads: inspired by ship construction, where watertight compartments prevent a breach in one section from sinking the entire vessel. In software, bulkheads isolate failures by partitioning resources. For example, using separate connection pools for different downstream services ensures that a flood of requests or a hung connection to one service doesn’t exhaust the pool and starve other, healthy services. This can also apply to thread pools, queues, or even separate instances of a microservice, ensuring that the failure of one component doesn’t bring down the entire application.
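
As a rough illustration of partitioning, the sketch below gives each downstream dependency its own fixed number of concurrency slots using a semaphore from Python's standard library. The Bulkhead class, its error type, and the "payments" and "search" names in the usage note are hypothetical; real systems often get the same effect from separate connection or thread pools.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots left."""


class Bulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args, **kwargs):
        # Fail fast instead of queueing: a saturated compartment rejects new
        # work rather than tying up the caller's threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: no free slots")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

With one Bulkhead("payments", 10) and a separate Bulkhead("search", 10), hung payment calls can exhaust only the payments slots, and search traffic keeps flowing.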

Load Shedding: there comes a point when a system is simply overwhelmed. Rather than struggling to process every request poorly, or crashing outright, load shedding allows a system to deliberately reject excess work. It is closely related to rate limiting and throttling, which cap how much traffic a client may send in the first place. Shedding might involve returning specific error codes (such as HTTP 503 or 429), queueing lower-priority requests, or simply dropping them. The goal is to protect the core functionality and prevent catastrophic collapse, even if it means some users experience degraded service or temporary unavailability. It’s a pragmatic acceptance that survival sometimes means triage.
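
One way to picture the triage is an admission check that keeps a slice of capacity reserved for critical requests. The sketch below is a simplified, single-threaded illustration; LoadShedder, handle, the critical flag, and the capacity numbers are all assumptions made for the example.

```python
class LoadShedder:
    """Admit requests until capacity is reached, shedding non-critical work first."""

    def __init__(self, capacity, critical_reserve):
        self.capacity = capacity                  # max requests in flight
        self.critical_reserve = critical_reserve  # slots only critical work may use
        self.in_flight = 0

    def try_admit(self, critical=False):
        limit = self.capacity if critical else self.capacity - self.critical_reserve
        if self.in_flight >= limit:
            return False
        self.in_flight += 1
        return True

    def done(self):
        self.in_flight -= 1


def handle(shedder, work, critical=False):
    """Return an (HTTP-style status, body) pair, rejecting early when overloaded."""
    if not shedder.try_admit(critical):
        return 503, "overloaded, please retry later"
    try:
        return 200, work()
    finally:
        shedder.done()
```

With capacity=3 and critical_reserve=1, ordinary requests are rejected once two are in flight, while a checkout request marked critical can still take the third slot.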

These patterns are not patches; they are architectural choices rooted in a pessimistic realism. They operate on the assumption that every remote call might fail, every network might glitch, every resource might vanish, and every client might misbehave. And by assuming the worst, they equip our systems to be profoundly resilient when the worst inevitably materializes.

Practicing Failure: The Art of Chaos Engineering

Theoretical resilience is an oxymoron. Resilience, like any muscle, must be exercised. This is where Chaos Engineering enters the scene, evolving from the initial concept of Netflix’s Chaos Monkey into a mature discipline. Its premise is simple: if you don’t deliberately break your system, it will break on its own terms, likely at the most inconvenient time.

Chaos Engineering is about systematically injecting faults into production environments to validate resilience mechanisms and, crucially, to train teams. A typical experiment follows four steps:

Hypothesize: define a steady state for your system (e.g., “users should be able to add items to their cart”) and predict that it will hold when a fault is injected.
Experiment: introduce a controlled fault (e.g., “take down a specific instance of the inventory service”).
Observe: monitor the system’s behavior. Did the system remain in a steady state? Did the resilience patterns (circuit breakers, fallbacks) kick in as expected?
Learn: if the system deviated from the steady state, understand why and implement fixes.
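
The four steps above can be sketched as a tiny experiment runner. This is a toy under stated assumptions, not how any particular chaos tool works: ToyInventory stands in for a replicated service, and run_experiment is a hypothetical helper that takes a steady-state probe, a fault injector, and a rollback.

```python
class ToyInventory:
    """Stand-in for a replicated service; real experiments target real systems."""

    def __init__(self, replicas):
        self.healthy = set(replicas)

    def can_add_to_cart(self):
        # Steady state: at least one replica is able to answer.
        return bool(self.healthy)


def run_experiment(steady_state, inject_fault, rollback):
    """Check the steady state, inject a fault, observe, then always roll back."""
    if not steady_state():
        # Never inject faults into a system that is already unhealthy.
        return {"ran": False, "steady_state_held": None}
    inject_fault()
    try:
        held = steady_state()
    finally:
        rollback()  # remove the fault even if the probe itself errors
    return {"ran": True, "steady_state_held": held}
```

A result with steady_state_held set to False is the interesting one: it is the input to the Learn step, not a failed test to be hidden.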

These experiments are often conducted during planned Game Days — dedicated events where teams simulate outages and practice their incident response. Injecting faults could involve:

Killing servers/processes: directly terminating instances of services.
Causing traffic spikes: overloading services with synthetic load.
Slowing responses: introducing artificial latency into network calls.
Resource exhaustion: depleting CPU, memory, or disk space.
Network partitioning: isolating parts of the network to simulate outages.
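
Two of the faults listed above, slow responses and outright failures, can be simulated in-process by wrapping a client call. The wrapper below is a hypothetical sketch (with_chaos and InjectedFault are made-up names); infrastructure-level faults such as killed instances or network partitions need tooling outside the application.

```python
import random
import time


class InjectedFault(Exception):
    """Marks a failure that was injected on purpose."""


def with_chaos(fn, error_rate=0.0, extra_latency=0.0,
               rng=random.random, sleep=time.sleep):
    """Wrap fn so calls are delayed by extra_latency seconds and fail with
    probability error_rate."""
    def wrapper(*args, **kwargs):
        if extra_latency > 0:
            sleep(extra_latency)  # simulate a slow dependency
        if rng() < error_rate:
            raise InjectedFault("chaos: simulated dependency failure")
        return fn(*args, **kwargs)
    return wrapper
```

Because injected failures carry their own exception type, dashboards and post-incident notes can tell deliberate faults apart from organic ones.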

The objective of Chaos Engineering is not to achieve “uptime at any cost” but to build confidence. Confidence that when failures inevitably occur, both the automated systems and the human operators behind them possess the knowledge, tools, and muscle memory to respond effectively.

Graceful Degradation: The Art of the Less-Than-Perfect

True resilience also demands a commitment to graceful degradation. A system cannot always be at 100% functionality. When critical dependencies are unavailable, the intelligent system doesn’t simply crash; it offers alternative, reduced functionality. This is about prioritizing core user journeys and acknowledging that a partially functioning system is infinitely superior to a completely dead one.

Fallback strategies include:

Serving cached or static content: if a real-time data source is down, display the last known good data or generic content rather than an error page.
Switching to reduced functionality: an e-commerce site might allow browsing products but disable adding to cart if the inventory service is unavailable, or switch to a read-only mode if the primary database is experiencing issues.
Communicating transparently: rather than ambiguous “server error” messages, inform users what’s happening and what functionality might be affected.
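
These fallbacks can be combined in a single request handler. The sketch below assumes a hypothetical product_view function, a fetch_live callable that may raise when the inventory service is down, and a plain dictionary standing in for a cache; the notice strings are placeholders.

```python
def product_view(product_id, fetch_live, cache):
    """Serve live data when possible, otherwise degrade to cached, read-only data."""
    try:
        product = fetch_live(product_id)
    except Exception:
        product = cache.get(product_id)  # last known good data, if any
        if product is not None:
            notice = "Live inventory is temporarily unavailable; showing saved details."
        else:
            notice = "Product details are temporarily unavailable."
        # Reduced functionality: browsing still works, purchasing is disabled.
        return {"product": product, "can_add_to_cart": False, "notice": notice}
    cache[product_id] = product  # refresh the fallback copy on every success
    return {"product": product, "can_add_to_cart": True, "notice": None}
```

The response always says whether the page is degraded, so the front end can tell users what is affected instead of showing a generic error.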

Observability’s Role: Seeing in the Dark

None of these resilience mechanisms function effectively in a black box. Observability is a non-negotiable prerequisite for building, validating, and operating resilient distributed systems. When chaos inevitably strikes, detailed insights into system behavior are the only way to diagnose, understand, and rectify issues.

The pillars of observability — logs, metrics, and distributed traces:

Logs: provide discrete, timestamped events. They tell you what happened at a specific point in time (e.g., “Circuit breaker tripped for payment service,” “Retry attempt #3 initiated”).
Metrics: aggregate numerical data over time. They tell you how much or how often something is happening (e.g., “Error rate for service X,” “Latency of database queries,” “Number of open circuit breakers”). Metrics are crucial for identifying trends and detecting anomalies.
Distributed Traces: visualize the flow of a single request across multiple services. They tell you where a request spent its time, which services it called, and where it failed. This is invaluable for pinpointing bottlenecks and cascading failures in complex microservice architectures.
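
To show how the three signals differ for the same call, here is a toy sketch built only on Python's standard library. The in-memory metrics counter and spans list are stand-ins for a real metrics backend and tracing system, and observed_call and new_trace_id are hypothetical helpers.

```python
import logging
import time
import uuid
from collections import Counter

log = logging.getLogger("checkout")  # logs: discrete, timestamped events
metrics = Counter()                  # metrics: aggregated counts over time
spans = []                           # traces: per-request timing records


def new_trace_id():
    """Create an id that is passed along to every service a request touches."""
    return uuid.uuid4().hex


def observed_call(service, trace_id, fn, clock=time.monotonic):
    """Run fn() and emit a log line on failure, a metric, and a span."""
    start = clock()
    outcome = "error"
    try:
        result = fn()
        outcome = "ok"
        return result
    except Exception:
        log.warning("call failed service=%s trace_id=%s", service, trace_id)
        raise
    finally:
        metrics[f"{service}.{outcome}"] += 1
        spans.append({"trace_id": trace_id, "service": service,
                      "outcome": outcome, "duration_s": clock() - start})
```

Grouping spans by trace_id reconstructs the path of one request, while the counters answer how often each service succeeds or fails overall.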

Without robust observability, resilience patterns are just theoretical constructs. You won’t know if your timeouts are firing, if your retries are creating a thundering herd, or if your circuit breakers are effectively protecting downstream services. Observability provides the feedback loop essential for continuous improvement and the hard data needed for post-incident analysis.

The Cultural Layer: Beyond the Code

Ultimately, resilience is profoundly cultural. The most robust technical patterns will crumble under a dysfunctional team dynamic. Teams that resort to individual blame after outages learn nothing. Instead, they foster fear and inhibit the sharing of critical information.

The hallmark of a resilient culture is the blameless post-mortem. This practice shifts the focus from “who caused the failure?” to “what were the systemic factors that allowed this failure to occur, and how can we prevent similar incidents in the future?” By documenting assumptions, challenging existing mental models, and treating every failure as a rich source of data, teams create powerful feedback loops. This is where Woods’s fourth quality, sustained adaptability, truly lives: not in lines of code, but in the collective learning and evolving practices of a high-performing engineering organization.

Conclusion

The old engineering dream of eliminating failure, while noble in some domains, is not only inapplicable but actively harmful in distributed systems. Here, failure is not the enemy; fragility is. By embracing the inevitability of chaos, through the deliberate application of defensive patterns, the rigorous practice of chaos engineering, the thoughtful design for graceful degradation, the presence of observability, and the cultivation of a resilient culture, we transform chaos from a threat into a teacher.

True resilience is not about constructing systems that never fail. It is about building systems (and, more importantly, the teams that operate them) that emerge stronger, wiser, and more capable every single time they do.

The Inevitable Chaos: Embracing Failure for Resilient Distributed Systems was originally published in Data Science Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.

Stop Copying — Start Building Reusable Knowledge

Alessandro Lamberti — Sat, 12 Jul 2025 10:14:05 GMT

This article was born as a letter from https://newsletter.caffeinatedengineer.dev.

Hey, A.

I know the loop too well. Half a dozen Colab notebooks open, each walking you through yet another “state-of-the-art” model. A new framework tutorial bookmarked. Today it’s LangChain, last week it was BentoML. Another architecture diagram promising to “simplify production ML.” It feels like learning, but more often it’s inertia disguised as progress.

Each new tab gives you a quick dopamine hit. A clean example, a pretrained model, an end-to-end pipeline that runs — until it doesn’t. Until you need to adapt it. Until the shape of your real-world problem no longer fits the neat assumptions of the demo.

That’s when it becomes clear: you didn’t learn the system. You borrowed it. You replicated someone else’s solution without internalizing the tradeoffs, the context, or the constraints it was designed for.

The truth is: copying from tutorials is not the same as learning. At best, you’re borrowing someone else’s context. At worst, you’re training yourself to assemble systems you don’t understand.

Real knowledge — the kind that stays with you and grows over time — doesn’t come from passively following along. It comes from building, breaking, debugging, and asking hard questions. From getting stuck and learning why. That’s what I mean by reusable knowledge: insight you’ve earned through friction, not just absorbed through repetition.

A while ago, I was working on a vision pipeline for an edge device with limited memory. I followed the usual path: search the docs, skim some blog posts, run a few example scripts. The pipeline worked — barely — but I couldn’t explain how. It felt fragile. Every time it failed, I had to start over, guessing at causes.

So I paused. Stripped things down to basics. Measured memory usage, traced latency, re-implemented small pieces. Slowly, the whole system became clear. I stopped depending on tutorials because I had built a mental model that made sense — one I could use again in other projects. That’s the point. Knowledge that sticks isn’t tied to a single problem. It becomes part of how you think.

Reusable knowledge has three key traits:

It helps you understand systems, not just tools.
It applies across projects and domains.
It gives you confidence in the face of complexity.

Copying teaches you none of that. It’s like learning to cook by watching someone else stir the pot. You might know the steps, but not the reasons behind them.

What’s worse: tutorials can give a false sense of competence. You feel like you’ve learned something, but when it’s time to build from scratch, the gaps show up fast. And that’s the real cost: time spent re-learning instead of building forward.

So what should you do instead?

Go deeper, not wider. Pick one problem and work through it until you really understand what’s going on. Write things down. Break them apart and rebuild. Explain what you learned, even just to yourself. If you still need a tutorial, use it as a reference, not a guide.

Learning isn’t about covering more ground. It’s about building stronger foundations.
You’ll get there. You already did.

-Yours, a few years down the road.
The version of you that finally stopped copying, and started building for real.

Originally published at https://newsletter.caffeinatedengineer.dev.

Mental Models of Great Engineers — Focus, Friction, Feedback

Alessandro Lamberti — Sat, 05 Jul 2025 08:44:54 GMT

This article was born as an essay from https://newsletter.caffeinatedengineer.dev/

There’s a kind of engineering mind you encounter rarely. Not necessarily the loudest, nor always the fastest to answer. But when they speak, everything slows down. You feel less fog, more structure. Their words feel inevitable — like they’ve seen around a corner you didn’t know existed.

What distinguishes these engineers — the senior ones in spirit, not just in title — isn’t a fixed set of knowledge, tools, or even experience in years. It’s how they see. The lens they use to model the complexity of systems, tradeoffs, and people. If you could look inside their head, you’d find three dominant forces shaping their mental architecture: focus, friction and feedback.

These are not vague virtues. They are constructs. Lenses. Each enables a kind of clarity that accumulates and compounds over time. Together, they form the cognitive foundation of engineers who can both build robust systems and reason clearly under pressure.

Let’s dissect each.

I. Focus: The Physics of Attention

“The skill of deep work is becoming rare at exactly the same time it is becoming more valuable.” - Cal Newport

The Scarcity of Depth

We begin with focus, because it governs everything that follows. Without focus, there is no attention. Without attention, there is no modeling. Without modeling, there is no clarity.

Cal Newport calls this deep work — the ability to work deeply on hard problems, while resisting distraction. But in real engineering environments, this isn’t just a productivity technique. It’s survival logic. Systems thinking demands stack-depth. You must trace behaviors across abstraction layers — from process scheduling to API guarantees to team incentives. You can’t do this between meetings or in 12-minute pomodoros.

Senior engineers protect cognitive continuity. They architect their days, communication habits, and toolchains to enable extended states of reasoning. This isn’t hustle culture or monk-mode extremism — it’s a systemic reaction to the complexity gradient. The deeper you go into a problem, the more expensive context-switching becomes.

They also have an internal radar for signal. Ask a junior developer to describe a bug, and you get a wall of logs. Ask a senior, and you get a model: “This seems like a distributed lock starvation issue — I suspect contention is spiking in the leader election code.” Focus reveals itself as selectivity — the ability to suppress noise and home in on what matters.

Paul Graham wrote that great hackers are able to “tune out everything outside their own heads”. But I think it’s more precise to say they have an appetite for epistemic solitude — a state where ambiguity is metabolized in peace, without the clutter of cheap opinions. Focus gives them the bandwidth to build models, not just solutions.

Their bandwidth is finite — and they treat it as capital, not charity.

Working Memory, Mental Caching, and State

Cognitively, focus is bounded by working memory. You cannot hold more than a few layers of abstraction in your head without degrading your judgment. Great engineers know this, and so they architect both code and team environments to preserve mental state. They favor:

Stateless tooling: tools that don’t leak state between runs.
Defensive architecture: systems that fail loudly and early instead of rotting silently.
Interrupt-resilient workflows: think commit discipline, modular branches, codified deployment paths.
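
As a small illustration of failing loudly and early, the sketch below validates configuration at startup instead of letting a missing value surface hours later as a confusing runtime error. The setting names, ConfigError, and load_config are hypothetical and chosen only for the example.

```python
REQUIRED_KEYS = ("database_url", "request_timeout_s")  # hypothetical settings


class ConfigError(ValueError):
    """Raised at startup when configuration is unusable."""


def load_config(raw):
    """Validate raw settings up front; never fall back to silent defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ConfigError(f"missing required settings: {', '.join(missing)}")
    timeout = float(raw["request_timeout_s"])
    if timeout <= 0:
        raise ConfigError("request_timeout_s must be positive")
    return {"database_url": raw["database_url"], "request_timeout_s": timeout}
```

The error names exactly what is wrong, so whoever picks the work up after an interruption does not have to rebuild the context from logs.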

In a world where “10x engineering” is largely a myth, clarity retention across sessions becomes the real multiplier.

II. Friction: The Feel for Resistance

Friction is not the enemy. It’s where the system reveals its structure.

Most Engineers Fight Friction; Great Ones Listen to It

Most engineering organizations think about velocity. Great engineers think about friction.

Friction is the felt resistance between intent and outcome. It’s the drag coefficient in the system — both in code and in process. You try to build X, but spend 70% of your time wrestling with Y. You attempt to ship a fix, but the CI pipeline silently fails for 15 minutes. You try to coordinate with two teams and realize they both use different definitions of “done.”

Where junior engineers feel frustration, great engineers detect texture. They learn to sense structural resistance. They know when an abstraction leaks too often. When a codebase punishes exploration. When an interface is semantically brittle, even if the tests pass. This friction is not a bug — it’s a signal.

A standout trait among senior engineers is how quickly they stop blaming themselves when things “feel wrong.” Instead, they probe: Why does this workflow create cognitive dead-ends? Why is this bug so hard to isolate? Often, the answer lies not in one line of code, but in a design misfit — a place where assumptions silently diverged from reality.

There’s a passage in Eliezer Yudkowsky’s writing on rationality where he describes “noticing confusion.” Most people experience confusion as discomfort and move on. A rationalist treats it like a fire alarm. Senior engineers operate the same way: friction is not something to tolerate — it’s something to model.

One example: in distributed systems, retry logic often hides failure modes — the system appears “resilient,” but in reality, it’s just noisy-silent. Great engineers develop a taste for invisible friction: systems that “mostly work” until they don’t. They know that debuggability is not an afterthought — it’s a first-class design constraint.

Imagine a payments microservice that’s become the bottleneck for a multi-product company. Every new product line wants to hook into it. Suddenly, latency balloons, on-call burns out, and cross-team PRs become a negotiation minefield.

An average engineer might start optimizing queries.
A good one might suggest sharding by tenant or product.
A great engineer also asks: Why did this boundary absorb so many responsibilities in the first place?

They go upstream:

This engineer isn’t just fixing the bottleneck.

III. Feedback: Epistemic Humility in Action

If you can’t tell when you’re wrong, you’ll keep getting better at being wrong.

Software is a Belief System Under Test

No model is perfect. But some are calibrated. That’s where feedback comes in.

Engineering is applied epistemology. You’re making bets on how a system will behave under real-world constraints — load, failure, misuse, entropy. And like any map, your internal model must be regularly updated with reality checks. Great engineers have a tight “feedback loop hygiene”. They seek out deltas between belief and behavior.

Perell talks about the concept of idea sex — the combinatorial creativity that comes from crossing domains. But feedback is how ideas meet resistance, and thus, reality. A tight feedback loop is what turns intuition into informed intuition.

Great engineers don’t just ship and forget. They instrument, observe, and revisit. Not because they don’t trust their work — but because they do trust their curiosity. Feedback enables something subtle: regret minimization. When a decision proves wrong, they want to understand why — so the next model has fewer blind spots.

They also build systems with explainability in mind. Not AI explainability in the fashionable sense, but causal explainability — being able to answer: Why did this behave this way? Feedback isn’t just external (metrics, bugs, failures), but also internal: the system gives off affordances that make it intelligible to future readers.

This reflects a deep shift in mindset: from output to iteration. From “Did it work?” to “How does it evolve?” Feedback makes the system legible to itself.

This shows up as:

Writing postmortems that critique thinking patterns, not just root causes.
Building feedback-rich tools: tests that cover failure modes, dashboards that narrate system health.
Favoring instrumentation over guesswork — not just metrics, but diagnostic observability.

IV. Organizational Inheritance: Scaling These Models

While individual engineers can internalize these mental models, the real leverage comes when teams and orgs absorb them. That means:

Creating onboarding that teaches reasoning patterns, not just stack knowledge.
Promoting engineers who model clarity under ambiguity, not just throughput.
Codifying systems design reviews that reward epistemic humility, not architectural ego.

A team’s culture is downstream of what it optimizes attention for, what it treats as normal friction, and how it processes failure. Teams that model focus, friction, and feedback at the system level don’t just scale better — they decay slower.

Closing Thought: The Compass, Not the Map

When these three mental models are stacked — Focus → Friction → Feedback — something larger emerges: a self-improving system. A kind of internal DevOps loop for cognition.

Focus lets you perceive deeply. Friction lets you perceive honestly. Feedback lets you perceive accurately.

The best engineers I know aren’t infallible. They just recover faster.
They don’t guess better. They observe sooner.
They don’t over-architect. They zoom out just long enough to see what’s really going on — before it hurts.

And then they build from that place — grounded, systemic, and clear-eyed.

As you grow in your own practice, don’t just chase knowledge. Develop taste. Taste for what focus feels like when it clicks. Taste for friction that’s not accidental. Taste for feedback that sharpens, not flatters.

Because in the end, software engineering is not just about building things. It’s about building systems that hold up under pressure, uncertainty, and time. And that requires mental models that do the same.

Originally published at https://newsletter.caffeinatedengineer.dev.

Hey everyone — I’m writing again

Alessandro Lamberti — Mon, 30 Jun 2025 19:55:48 GMT

After a long hiatus, I’m back to publishing essays, breakdowns, and notes around the themes I live and breathe: systems, software, and the edges of machine learning.

The project’s called The Caffeinated Engineer — no fluff, no buzzwords. Just thoughtful writing for people building real things.

Essays & Breakdowns: deep dives, patterns, and hard-earned insights
Letters & Notes: short, reflective pieces — often conversations with fictional counterparts, like a senior engineer or curious peer

I’ve seen too many shallow takes dominate the space. This is my attempt to offer something grounded, useful, and independent. I’ll share what I’ve learned building in the real world — the kind of material I always wished I had early in my career.

Thanks to everyone who stuck around. If you’re into this kind of writing, give it a read. If you know someone who might appreciate it, pass it on.

See you soon.

Originally published at https://newsletter.caffeinatedengineer.dev.

Sound Bytes Part 1: The ABCs of Sound and Digitization

Alessandro Lamberti — Tue, 31 Oct 2023 08:28:05 GMT

Diving into the world of sound and how Deep Learning enhances our audio experience

The Future is Self-Supervised: An Introduction to DINOv2

Alessandro Lamberti — Tue, 27 Jun 2023 09:18:58 GMT

Self-supervised learning is a type of machine learning where systems are trained to predict or solve tasks using only a raw dataset…

Unveiling U-Net++: A Hands-On Guide on Image Segmentation

Alessandro Lamberti — Fri, 28 Apr 2023 12:13:16 GMT

Imagine looking at an image and being able to decipher distinct regions, each representing a unique object or area of interest.

Going Deep: An Introduction to Depth Estimation with Fully Convolutional Residual Networks

Alessandro Lamberti — Mon, 27 Feb 2023 08:46:50 GMT

Have you ever looked at a two-dimensional image and wished you could know the depth of the objects in the scene?
