<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Kunal Sawarkar on Medium]]></title>
        <description><![CDATA[Stories by Kunal Sawarkar on Medium]]></description>
        <link>https://medium.com/@KunalSavvy?source=rss-bee09cdc3c5b------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*4e7Xc3C5xMDjs-6jndmpDg@2x.jpeg</url>
            <title>Stories by Kunal Sawarkar on Medium</title>
            <link>https://medium.com/@KunalSavvy?source=rss-bee09cdc3c5b------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 19:15:19 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@KunalSavvy/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[It’s the rise of the other AI that all should worry about… Artificial Intimacy!]]></title>
            <link>https://medium.com/towards-generative-ai/its-the-rise-of-other-ai-that-all-should-worry-about-artificial-intimacy-42600aef7fcd?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/42600aef7fcd</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[ai-companion]]></category>
            <category><![CDATA[intimacy]]></category>
            <category><![CDATA[artificial-intimacy]]></category>
            <category><![CDATA[relationships]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Sun, 24 Aug 2025 19:05:08 GMT</pubDate>
            <atom:updated>2025-08-24T19:25:52.309Z</atom:updated>
            <content:encoded><![CDATA[<h4>The real threat isn’t AI taking your job — it’s AI taking your partner…</h4><p>Whenever I speak at an event, I am often asked about AI’s impact on jobs. I’ve never truly believed AI is a job killer. In fact, I see it as a powerful booster, reshaping and elevating job profiles. Thanks to AI, people can truly realize their potential rather than having to slog for hours in low-end work. There will always be something for people to do — for those who <em>want</em> to do it. The real challenge now is: what if people no longer want to do anything? What if they lose the excitement for life itself… thanks to artificial intimacy? That is what came out of a recent discussion with Gen Z, and their willingness to “<em>try AI as a partner</em>” worried me!!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/499/1*ugfgHPdkI63RyldaqMOOkw@2x.jpeg" /><figcaption>Credit- HBO</figcaption></figure><h4>The first question is: why fall in love with AI?</h4><p>Because it is so eloquent!! Words feel powerful, but don’t tone &amp; touch convey more than words ever could?</p><p>Often, language itself is a barrier to truly expressing deep emotion. Written or spoken language has only existed for a few thousand years, while humans have been evolving for over a million. The fact that something can articulate in words and convince you it’s a real relationship only highlights how fragile our sense of intimacy is. Words can express feelings, but only up to a point. They can’t express empathy or true presence. That comes through the eyes, the touch, the presence of another human. Just as true art is the kind that touches us deeply and often leaves us speechless, a real relationship exists beyond words.</p><blockquote>Can AI ever give you that?</blockquote><p>If there isn’t even a single moment one has experienced that goes <em>beyond</em> words, that’s a shame. Even animals form relationships without language — and often, language is more of a barrier to emotion than a conduit for it.</p><h3>Humans are not made of Lego pieces!</h3><p>Then there’s the problem of <strong>customization</strong>. You can walk into any store and customize your coffee the way you want… from beans… to sugar… to type of sugar… to milk… to additives… to cup… to straw and whatnot!! That creates a sense that “I can get anything the way I want it” and often builds a belief that — because I deserve the best, another human must be the way “<em>I want it</em>”!</p><p>The digital world feeds this personalization mindset by taking it to the stratosphere, serving you exactly the content you want at all times. But in real life, you can’t “customize” a partner. They are a monolith… like stones and plants in nature… moldable but not customizable!</p><p>They simply <em>are who they are</em>, and from limited choices, you make a commitment — then work to make it last. That’s why AI intimacy feels so seductive. It gives you what you want with no effort, no compromise, and no hard work. That is just like a drug!! Temporary pleasure at bare minimum cost…</p><h3>Intimacy is at the core of what makes us human!</h3><p>And what is the true purpose of a relationship? Is it someone who always supports you, praises you, and tells you that you’re perfect? That kind of endless positive feedback doesn’t even work in real life. 
A real partner is a mirror: someone who helps you grow, tells you when you’re wrong, and still stands by you despite all the flaws.</p><blockquote>As they say, infatuation is when you find another person perfect. Love is when you know the flaws and still choose the same person!</blockquote><p>Therapy is different from partnership. A partner cares deeply enough to challenge you, not just comfort you. An AI companion also offers no real, meaningful growth over a long period of time! How can someone want a partner whom they can lose to a poor internet connection or a low battery?</p><p>And that’s a real problem… many not having anyone to talk to… anyone to share emotions with… It’s the social isolation that’s the problem, and AI is just a drug to escape it or feel good momentarily… just like social media or an Xbox.</p><h3>Loss of Innocence</h3><p>And lastly, the innocence part of a relationship.</p><p>The most beautiful part of any relationship is discovering the other person… not just the perfect parts but the imperfect ones!! Because the dynamism, the curiosity, the commitment tested by challenges — that’s what makes relationships meaningful. Losing one’s innocence not to another human but to a machine will create ever more unrealistic expectations of relationships, almost making AI the equivalent of emotional porn!</p><p>Super smart, super supportive AI can be very boring. All growth in nature, as the theory of evolution has shown, is based on the need to evolve in response to a challenge; but when the challenge is gone from life… will this be the end of evolution itself?</p><h3>Not just a social but an existential threat</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N7Bl3Ibyj3taqMktMA2WUA.png" /></figure><p>Despite all this, half of the Gen Z folks who spoke with me still said: <em>“Yeah, it’s true… but it feels like the writing is on the wall.”</em> Meaning: AI may soon replace boyfriends or girlfriends. At the very least, relationships built on the premise of a future, growth, and family are at risk. And those are the foundations of society and the economy. Without real human connection, what even motivates consumption, creativity, or progress? AI may add a temporary economic boost, but if people don’t “<em>desire another human,</em>” the true purpose of a consumer-based economy will vanish. Think of a Japan-like situation, where desire has vanished from society.</p><p>My view? Cocaine is better than an AI companion — at least you won’t be deluded into thinking you’re in a real relationship. Preferring machines over humans is a cop-out at best and a hallucination at worst! (pun intended)</p><p>Everything worthwhile in life is hard — and that’s exactly why it’s worth it.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=42600aef7fcd" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/its-the-rise-of-other-ai-that-all-should-worry-about-artificial-intimacy-42600aef7fcd">It’s the rise of the other AI that all should worry about… Artificial Intimacy!</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What Training AI on Math Teaches Us About Hiring the Right People?]]></title>
            <link>https://medium.com/@KunalSavvy/what-training-ai-on-math-teaches-us-about-hiring-the-right-people-5d6b4dc3d86c?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/5d6b4dc3d86c</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[hiring]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[genai]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Wed, 06 Aug 2025 21:32:05 GMT</pubDate>
            <atom:updated>2025-08-06T21:32:05.943Z</atom:updated>
            <content:encoded><![CDATA[<p>AI models generalize well when they perform well on math benchmarks, which is why math is a key part of the training process. As we train AI models on mathematical datasets — from arithmetic to algebra to competition-level reasoning — we’re not just teaching machines. We’re learning how humans should be evaluated, hired, and developed.</p><p><strong>LLaMA Training on Math &amp; Reasoning Tasks</strong></p><p>LLaMA shows major improvement when fine-tuned on structured math datasets. Meta researchers (and others who used open LLaMA weights) discovered that even base LLaMA models struggle with multi-step reasoning unless fine-tuned on:</p><ul><li><strong>GSM8K</strong>: Grade school math problems requiring step-by-step logic.</li><li><strong>MATH Dataset</strong>: High school and competition-level math.</li><li><strong>Code and formal logic datasets</strong>: Help the model reason and structure its answers better.</li></ul><p>📌 Result: LLaMA models trained with these datasets learn to compose solutions, not just guess answers — performing far better than models trained solely on language tasks.</p><p><strong>Here’s what we’ve learned:</strong></p><p>Insight #1: Structured learning makes generalization possible</p><ul><li>A LLaMA model trained on math + logic generalizes better to other domains like code generation, reasoning tasks, and even legal document summarization.</li><li>Similarly, a human with strong logical and analytical foundations can be reskilled across roles faster than someone trained narrowly in a single tool or domain.</li></ul><p>Insight #2: Shallow pretraining limits deep reasoning</p><ul><li>LLaMA-2 or LLaMA-3 without any fine-tuning will often fail to reason, hallucinate, or shortcut answers.</li><li>Analogously, a candidate with surface-level experience (e.g., only copy-pasting code or following instructions) underperforms in novel or ambiguous environments.</li></ul><p>Insight #3: Chain-of-Thought (CoT) training enhances step-by-step reasoning</p><ul><li>When LLaMA models are trained to explain their steps (via CoT), performance jumps significantly.</li><li>For hiring, this is like valuing candidates who explain their thought process, not just give the right answer. It reflects deep understanding.</li></ul>
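<p>To make the CoT idea concrete, here is a minimal sketch (my illustration, not an actual LLaMA training recipe) of few-shot chain-of-thought prompting on a GSM8K-style problem, using the Hugging Face <code>transformers</code> pipeline. The model name and generation settings are assumptions for illustration.</p><pre>
# Minimal few-shot chain-of-thought prompting sketch (illustrative).
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

# One worked example teaches the model to show its steps.
few_shot = (
    "Q: A pen costs $2 and a notebook costs $3. "
    "How much do 2 pens and 1 notebook cost?\n"
    "A: Let's think step by step. 2 pens cost 2 * $2 = $4. "
    "1 notebook costs $3. $4 + $3 = $7. The answer is 7.\n\n"
)
question = (
    "Q: Tom has 3 boxes with 4 apples each. He eats 2 apples. "
    "How many apples are left?\n"
    "A: Let's think step by step."
)

result = generator(few_shot + question, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
</pre><p>A model fine-tuned on step-by-step solutions will usually complete the answer with its reasoning laid out, which is exactly the behavior the insight above rewards in candidates.</p>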
<p><strong>Key Takeaways</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/768/1*rxPMGh81uSNg_I4JHS8uLA@2x.jpeg" /></figure><p><strong>1. Core Skills Enable Generalization</strong></p><p>AI models trained on fundamentals solve new problems better than those trained only on task-specific data.</p><p>➡️ Hire for strong foundations — logic, reasoning, and structured thinking — not just narrow experience.</p><p><strong>2. Memorization Doesn’t Scale</strong></p><p>Models that “memorize” patterns fail when faced with variations.</p><p>➡️ Don’t hire based on resume keywords. Look for people who learn, not just those who have “seen it before.”</p><p><strong>3. Compositional Thinking Wins</strong></p><p>The best AI models solve problems by composing smaller building blocks of logic.</p><p>➡️ Hire people who can break down problems, recombine ideas, and adapt.</p><p><strong>4. Diverse Training = Better Abstraction</strong></p><p>Models trained on varied examples abstract better and generalize across domains.</p><p>➡️ Value diverse experience paths, not just linear ones.</p><p><strong>The Big Insight:</strong></p><p>Building great teams is like building great AI — train for depth, not just coverage.</p><p>Core reasoning scales. Superficial knowledge breaks.</p><p>Would love to hear from others building hiring practices or talent programs. What core skills do you screen for?</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5d6b4dc3d86c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What is AI SuperIntelligence — and How Would We Know If We’ve Reached It?]]></title>
            <link>https://medium.com/towards-generative-ai/what-is-ai-superintelligence-and-how-would-we-know-if-weve-reached-it-ecaa2ea63123?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/ecaa2ea63123</guid>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Sun, 03 Aug 2025 16:47:59 GMT</pubDate>
            <atom:updated>2025-08-03T16:47:59.910Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>What is AI SuperIntelligence — and How Would We Know If We’ve Reached It?</strong></h3><h4>And is Zuckerberg’s vision hype or the near future?</h4><p>Superintelligence is the hypothetical point when AI becomes smarter than humans in <em>every domain</em> — not just memory or math, but reasoning, creativity, emotional intelligence, and even social manipulation. It’s not just an evolution of today’s tools — it’s a fundamental shift in who (or what) leads the frontier of intelligence on Earth.</p><h3>How is SuperIntelligence defined?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7NZtGY3xW-xZmrlaRpCa7w.png" /></figure><p>The term comes primarily from philosopher <strong>Nick Bostrom</strong>, who described it in his book <em>Superintelligence: Paths, Dangers, Strategies</em> (2014). He defines superintelligence as:</p><p><em>“An intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.”</em></p><p>Though the credit for making it popular and cool in recent times has to go to former OpenAI chief scientist <strong>Ilya Sutskever</strong>, who started his own company under the name “<strong>Safe Superintelligence</strong>”… quite obviously to find a new term beyond AGI, which was sort of hijacked by OpenAI and Microsoft. The company is still in the pre-product stage but got an amazing $30B valuation.</p><h3>Types of AI on the spectrum</h3><p>To understand superintelligence, it helps to place it on the AI capability spectrum:</p><ol><li><strong>Narrow AI (ANI)</strong> — What we have now (e.g., GPT-4, AlphaFold, self-driving software)</li><li><strong>General AI (AGI)</strong> — Human-level intelligence across tasks</li><li><strong>Superintelligence (ASI)</strong> — Beyond human in <em>all</em> cognitive tasks</li></ol><p>But how would we even recognize it?</p><p>We know we’ve reached it when an AI consistently outperforms humans in all intellectual tasks, demonstrates general problem-solving ability, and exhibits self-improvement without human intervention, with measurable outcomes like solving previously unsolvable problems or creating novel advancements independently.</p><p>It might look like:</p><ul><li>AI solving problems we can’t even articulate (like reconciling quantum gravity and general relativity),</li><li>Creating its own scientific frameworks,</li><li>Optimizing economies, policies, or infrastructure with godlike foresight,</li><li>Or improving itself recursively — learning how to learn better, on its own.</li></ul><p>There might not be a single “Eureka” moment. The signs could be subtle — or intentionally hidden. That’s part of the risk.</p><p><strong>How Close Are We?</strong></p><p>We’re still in the <strong>narrow AI</strong> and early <strong>proto-AGI</strong> era. 
Today’s most advanced models (like GPT-4, Claude, Gemini) show impressive capabilities — but they lack persistent reasoning, autonomous goal-setting, or deep situational awareness.</p><p>What still stands in the way?</p><ul><li><strong>Memory &amp; agency</strong>: Current models don’t “think” in continuous time — they react in sessions.</li><li><strong>Grounded understanding</strong>: They’re brilliant mimics, but often guess instead of <em>know</em>.</li><li><strong>Alignment &amp; safety</strong>: Even if we <em>could</em> build it, how do we make sure it shares our goals?</li></ul><p>Building ASI requires overcoming challenges like:</p><ul><li><strong>Compute Power</strong>: Superintelligence needs vastly more processing than today’s systems, requiring breakthroughs in hardware efficiency.</li><li><strong>Algorithms</strong>: We need new paradigms for AI to learn broadly and reason like humans, not just mimic patterns.</li><li><strong>Energy Costs</strong>: Scaling AI demands unsustainable energy unless greener solutions emerge.</li><li><strong>Safety and Ethics</strong>: Uncontrolled superintelligence could pose risks, requiring robust safeguards.</li></ul><h3><strong>Zuckerberg’s Vision: Empowerment or Hype?</strong></h3><p>Mark Zuckerberg recently framed AI as a <em>“universal assistant”</em> — personal, embedded, always-on. It’s a compelling vision: AI not replacing you, but <strong>amplifying you</strong>.</p><h4><em>Is it Realistic?</em></h4><p>Yes — <strong>at the narrow level</strong>. I speculated almost two years back that having an “AI Buddy” who manages all the software and tools I use daily would be a game-changer. Though it turned out to be harder than expected, it’s still technically doable. We’re already seeing AI copilots across design, coding, learning, and personal productivity. Meta’s Llama models are a play to own that personal layer.</p><p>But as we approach AGI (and eventually superintelligence), the conversation has to shift.</p><p>Zuckerberg envisions “personal superintelligence” via devices like smart glasses, empowering individuals to achieve goals and enhance daily life. It’s a compelling idea — AI as a tailored assistant — and Meta’s focus on open-source models like Llama and billions in R&amp;D shows commitment, but the pivot to closed models suggests commercial control over altruism.</p><h3>Will We Ever Get There?</h3><p>I’m doubtful that, with current technology or our current understanding of intelligence, we’ll get there.</p><p>We still don’t fully understand the brain or have a clear mathematical model of how it functions — or how electrical signals translate into thoughts. Human (or natural) intelligence has been shaped by survival. What would motivate a superintelligence?</p><p>I am also skeptical that ASI will show true creativity of the kind needed to solve problems like the Riemann Hypothesis.</p><p>I’m skeptical of the commonly cited “self-preservation” instinct in AI — it doesn’t hold up under scrutiny.</p><p><strong><em>It may be possible in the far future, but not now.</em></strong></p><p>Superintelligence isn’t science fiction — it’s a design question. The sooner we get serious about alignment, the better shot we have at making it a feature of the future, not a bug. We’re inching toward superintelligence, but it’s not around the corner.</p><p>Zuckerberg’s vision blends possibility with self-serving spin. At this point, it feels more like great marketing — and possibly a hijacking of the term “superintelligence” to suit commercial product goals. That’s not necessarily a bad thing. 
Any AI that meaningfully enhances how we interact in the digital world will <em>feel</em> like a superintelligence. But once disconnected, it won’t hold up.</p><p>Now, of course, the real question is: <strong>can today’s human live a life without a digital connection?</strong></p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for the latest advancements in AI.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ecaa2ea63123" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/what-is-ai-superintelligence-and-how-would-we-know-if-weve-reached-it-ecaa2ea63123">What is AI SuperIntelligence — and How Would We Know If We’ve Reached It?</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What is Context Engineering? And Why Is Everyone Talking About It?]]></title>
            <link>https://medium.com/towards-generative-ai/what-is-context-engineering-and-why-is-everyone-talking-about-it-9138b8556af4?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/9138b8556af4</guid>
            <category><![CDATA[context-engineering]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[prompt-engineering]]></category>
            <category><![CDATA[llm]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Sun, 03 Aug 2025 15:46:25 GMT</pubDate>
            <atom:updated>2025-08-03T15:46:25.514Z</atom:updated>
            <content:encoded><![CDATA[<h4>Here is a primer on understanding how it differs from prompt engineering</h4><p>At a high level, Context Engineering is the systematic design and manipulation of everything you feed into an LLM to control its reasoning and output, without touching the model weights.</p><p>It is the LLM equivalent of programming, but instead of code + API, your inputs are natural language + structured context. It is a step up from prompt engineering, but short of fine-tuning.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*oKit6Sienv1aiOqh.png" /><figcaption>Overview</figcaption></figure><p>CE includes:</p><ul><li>Prompts (instructions, examples, few-shot demos)</li><li>Retrieval results (via RAG — Retrieval Augmented Generation)</li><li>Conversation history</li><li>Metadata and tool state (e.g., functions, documents, sensor data)</li><li>Agent memory and planning state</li></ul><p><strong>🧠 </strong>Let’s break down the technical pillars:</p><p><strong>1. Prompt Design</strong></p><ul><li>Includes prompt templates, instruction tuning, and few-shot examples.</li><li>Now evolving into dynamic prompting, where prompts are assembled based on user, task, and system state.</li><li>This is the lowest-level part of context engineering but still foundational.</li></ul><p>Example:</p><p>You are a legal expert. Answer the following legal question step-by-step using citations from relevant documents.</p><p><strong>2. Retrieval-Augmented Generation (RAG)</strong></p><ul><li>Key innovation: enrich the context window with relevant external data.</li><li>Typical architecture: user query → embedding → vector DB search → top-k docs → appended to prompt → LLM generates an answer using both the user input and the retrieved docs.</li></ul><p>Engineering challenges remain (see our paper — <a href="https://arxiv.org/abs/2404.07220">https://arxiv.org/abs/2404.07220</a>):</p><ul><li>Chunking &amp; encoding</li><li>Context window limits</li><li>Relevance scoring</li><li>Overlap/resolution of retrieved content</li></ul><p><strong>3. Context Composition</strong></p><p>We are now moving beyond static prompts to composable context stacks:</p><ul><li>Conversation history</li><li>Search results</li><li>Current tool usage</li><li>Agent goals &amp; plans</li><li>User profile/preference</li></ul><p>Engineers decide what to include, exclude, highlight, rephrase, or abstract. This is similar to designing the state representation in classical AI systems.</p><p><strong>4. Memory and Long-Term Context</strong></p><p>This has been the biggest challenge in working with LLMs:</p><ul><li>LLMs have fixed token windows (e.g., 8K, 128K, etc.)</li><li>Long-term memory systems inject past conversations, facts, or embeddings relevant to the current task.</li><li>Context engineers decide how and when to inject memory, and in what format (text vs. structured).</li></ul>
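<p>As a concrete illustration of pillars 3 and 4, here is a minimal sketch (my illustration, not a library API) of composing a context stack under a token budget; the priority order, budget, and whitespace “tokenizer” are assumptions for illustration:</p><pre>
# Minimal context-stack composition sketch (illustrative assumptions).

def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer.
    return len(text.split())

def compose_context(system: str, memories: list[str], docs: list[str],
                    history: list[str], budget: int = 4000) -> str:
    parts = [system]
    used = count_tokens(system)
    # Inject long-term memory first, then retrieved documents.
    for label, items in (("MEMORY", memories), ("DOC", docs)):
        for item in items:
            cost = count_tokens(item)
            if used + cost > budget:
                break
            parts.append(f"[{label}] {item}")
            used += cost
    # Keep the most recent conversation turns that still fit.
    recent = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        recent.append(turn)
        used += cost
    parts.extend(reversed(recent))  # restore chronological order
    return "\n".join(parts)

context = compose_context(
    system="You are a legal expert. Cite relevant documents.",
    memories=["User prefers concise answers."],
    docs=["Doc 1: Fair use covers commentary and criticism."],
    history=["User: What is fair use?"],
)
print(context)
</pre><p>The design choice that matters is the priority order: what gets dropped first when the window fills up is as much a part of context engineering as what goes in.</p>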
<p><strong>5. Tool &amp; API Context</strong></p><ul><li>In agentic systems, the LLM sees tools (e.g., code interpreter, browser, calculator) as part of context.</li><li>You design how tool outputs are injected: tool traces, intermediate steps, and function outputs as inline context.</li></ul><p>Example:</p><pre>{
  "tool": "python",
  "input": "plot a sine wave",
  "output": "&lt;image bytes&gt;"
}</pre><p>This becomes part of the next LLM step’s context.</p><p><strong>🧱 Key Design Patterns</strong></p><ul><li>Chain-of-Thought + Tool Invocation: e.g., “Think step-by-step. If calculation is needed, use the calculator tool.”</li><li>Reflexion Loops: Let the model critique its previous answer and improve.</li><li>Role Prompting: “You are a financial analyst with 20 years of experience.”</li><li>Scratchpad Contexts: Store intermediate state as the model works through a task.</li></ul><p>As models get smarter, the bottleneck moves to context quality and orchestration. That’s why context engineering is becoming one of the most in-demand skills in AI systems development.</p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more on the latest advancements in AI.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9138b8556af4" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/what-is-context-engineering-and-why-is-everyone-talking-about-it-9138b8556af4">What is Context Engineering? And Why Is Everyone Talking About It?</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What you need to know on Agentic Protocols: MCP vs A2A vs ACP]]></title>
            <link>https://medium.com/towards-generative-ai/difference-between-agentic-protocols-mcp-vs-a2a-vs-acp-7ad6fcf5c91c?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/7ad6fcf5c91c</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[genai]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Sun, 27 Apr 2025 18:46:19 GMT</pubDate>
            <atom:updated>2025-05-03T00:12:06.122Z</atom:updated>
            <content:encoded><![CDATA[<h4>There are so many agentic protocols now. But what is the difference between MCP, A2A, and ACP, and when should you use which one?</h4><p>Here’s a concise comparison of the Model Context Protocol (MCP), Agent-to-Agent Protocol (A2A), and Agent Connect Protocol (ACP) in agentic AI, focusing on usability:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/961/1*bIXajIGtIJE5OojNfpKUvw.png" /></figure><h3>1. MCP from Anthropic</h3><ul><li>Standardizes how AI agents access tools and data, providing context to LLMs.</li><li>Simplifies integration with external tools (e.g., databases, APIs) via a client-server model, reducing custom coding needs.</li><li>Reusable and secure connections enhance developer efficiency and scalability.</li><li>Intuitive for developers familiar with LLM workflows, as it acts like a “universal adapter” for resources.</li></ul><p><strong>Challenges:</strong></p><ul><li>Primarily focused on data access, not agent-to-agent communication, limiting its scope for collaborative tasks.</li><li>Component naming (Hosts, Clients, Servers) can be confusing, potentially steepening the learning curve.</li></ul><h3>2. A2A (Agent-to-Agent Protocol) from Google</h3><ul><li>Enables seamless communication and collaboration between AI agents across vendors and platforms.</li><li>Streamlines agent orchestration with clear task delegation and capability discovery (via Agent Cards), making multi-agent workflows intuitive.</li><li>Supports dynamic, human-like interactions (e.g., bidirectional streaming), enhancing the user and developer experience.</li></ul><h4>Challenges:</h4><ul><li>Requires developers to manage complex agent interactions, which can increase setup time for smaller projects.</li><li>Overlaps with MCP in some use cases, potentially causing confusion.</li></ul><h3>3. ACP (Agent Connect Protocol) from IBM</h3><p>Purpose: Facilitates an “internet of agents” by enabling interoperability among diverse AI agents, such as those built with frameworks like LangChain.</p><p>Usability Strengths:</p><ul><li>Designed for cross-organizational agent networks, offering high interoperability for enterprise-scale systems.</li><li>Simplifies integration of heterogeneous agents, reducing setup complexity in distributed environments.</li></ul><h4>Challenges:</h4><ul><li>Focus on broad interoperability may sacrifice depth in specific use cases, requiring additional customization.</li></ul><h3>Key Contrast:</h3><p>MCP excels in tool integration for individual agents, prioritizing data access simplicity. Best for developers needing quick, secure tool integration.</p><p>A2A focuses on agent collaboration, offering robust communication for multi-agent systems. Best for collaboration.</p><p>ACP emphasizes broad interoperability, but it is in its early stages compared to the more established MCP and A2A. Best for enterprises that want to take their agentic AI solutions to production.</p>
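<p>To make the most established of the three concrete, here is a minimal sketch of exposing a tool over MCP using the FastMCP helper from the official Python SDK (<code>pip install mcp</code>); the server name, tool, and return value are my illustrative assumptions, not from any real deployment:</p><pre>
# Minimal MCP tool server sketch (illustrative; tool logic is assumed).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order.

    An MCP client (e.g., an LLM host application) discovers this tool
    and calls it with structured arguments.
    """
    # A real server would query a database or API here.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
</pre>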
<p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more content related to the latest advancements in AI.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7ad6fcf5c91c" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/difference-between-agentic-protocols-mcp-vs-a2a-vs-acp-7ad6fcf5c91c">What you need to know on Agentic Protocols: MCP vs A2A vs ACP</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Just How Did They Discover Life on an Alien Planet Far Far Away?]]></title>
            <link>https://medium.com/towards-generative-ai/just-how-did-they-discover-life-on-an-alien-planet-far-far-away-efbd10a5707c?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/efbd10a5707c</guid>
            <category><![CDATA[astrophysics]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[aliens]]></category>
            <category><![CDATA[astronomy]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Sun, 20 Apr 2025 17:45:47 GMT</pubDate>
            <atom:updated>2025-04-20T17:54:41.684Z</atom:updated>
            <content:encoded><![CDATA[<h4>The discovery of DMS on the alien planet K2–18b has a lot to do with statistical modelling (or AI, as companies want to call it now)! Here is a simplified version, from the paper, of how it was done.</h4><p>In case you missed it: we have detected a possible sign of alien life for the first time ever, on a planet about 124 light-years away. This is the first time we have discovered anything like it, and I was curious to know exactly what process and algorithms were used to deduce this. Here is a summary of the published paper.</p><h3>Unveiling Alien Atmospheres: How Scientists Detected DMS Gas on K2–18b</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*s1VwyH4DDv8WRvW1" /><figcaption>Comparison Between Two</figcaption></figure><p>Imagine peering into the atmosphere of a planet 124 light-years away and finding a clue that could hint at alien life. That’s exactly what scientists have done with K2–18b, an exoplanet where the James Webb Space Telescope (JWST) detected signs of dimethyl sulfide (DMS) — a molecule linked to life on Earth. In this blog post, we’ll dive into the cutting-edge data analysis and algorithms that made this discovery possible, breaking down the science behind this cosmic detective work.</p><h3>The Basics of observing a distant planet</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*5ndWWLMLwutL9oIu.jpg" /><figcaption>Source: Nature magazine</figcaption></figure><p>We can&#39;t really “see” a distant planet light-years away through a traditional telescope. What we do instead is watch for the shadow and gravitational pull of the planet as it orbits its star. The planet leaves an imprint while passing in front of the star, and we can then observe the star&#39;s spectrum.</p><p>The beauty of spectral analysis is that every element and every molecule has unique spectral lines, allowing us to detect what is in the atmosphere of the alien planet.</p><p>The idea is so simple that even a 5th grader can grasp it. So now let&#39;s jump into how this discovery was done.</p><h3>1. Capturing Starlight: The Power of Transit Spectroscopy</h3><p>To study K2–18b, a sub-Neptune in the habitable zone of a red dwarf star, scientists used JWST’s advanced instruments:</p><ul><li><strong>In the 2023 observations</strong>, the <strong>Near-Infrared Imager and Slitless Spectrograph (NIRISS)</strong> and <strong>Near-Infrared Spectrograph (NIRSpec)</strong> captured spectra from 0.9 to 5.2 micrometers, revealing methane, carbon dioxide, and a tentative DMS signal.</li><li><strong>In the 2025 observations</strong>, the <strong>Mid-Infrared Instrument (MIRI)</strong> targeted 6–12 micrometers, boosting the DMS signal.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*i4LzOEhFmf9trU6C.jpg" /><figcaption>(Credit: NASA)</figcaption></figure><p>The technique, called <strong>transit spectroscopy</strong>, involves analyzing starlight as it passes through the planet’s atmosphere during a transit. Molecules like DMS absorb specific wavelengths, creating a spectral “fingerprint” that scientists can decode.</p>
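<p>To see why a transit is detectable at all, here is a back-of-the-envelope sketch (my illustration, not from the paper; the radii are rough published values and everything else is an assumption): the fraction of starlight blocked is roughly (R_planet/R_star)², and molecules make that depth slightly wavelength-dependent.</p><pre>
# Toy transit-depth estimate for a K2-18b-like system (rough numbers).
R_SUN = 6.957e8            # meters
R_EARTH = 6.371e6          # meters

r_star = 0.45 * R_SUN      # K2-18 is a small red dwarf (~0.45 solar radii)
r_planet = 2.6 * R_EARTH   # K2-18b is a sub-Neptune (~2.6 Earth radii)

depth = (r_planet / r_star) ** 2
print(f"Transit depth: {depth:.2%}")   # on the order of 0.3% of the starlight
</pre><p>That tiny, wavelength-dependent dip is the entire signal the retrieval algorithms below have to work with.</p>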
<h3>2. Cleaning Up the Data: From Raw to Refined</h3><p>Raw JWST data is messy, filled with noise from the telescope’s detectors and cosmic rays. Here’s how scientists polished it:</p><ul><li><strong>Pipeline Processing</strong>: The <strong>JWST Science Calibration Pipeline</strong> removed noise, corrected for thermal background, and produced calibrated spectra.</li><li><strong>Spectral Binning</strong>: Spectra were binned to improve the signal-to-noise ratio, with resolutions of ~100 for NIRISS/NIRSpec and ~50–100 for MIRI.</li><li><strong>Wavelength Calibration</strong>: Spectra were aligned with lab references for DMS, methane, and CO₂ to ensure accurate feature identification.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*HJHp0tOzeQdI73AJ.jpg" /><figcaption>Raw (noisy) vs. calibrated (smoothed) spectra, showing the impact of data reduction. (Adapted from JWST documentation)</figcaption></figure><p>This preprocessing turned chaotic data into a clear signal, ready for analysis.</p><h3>3. Decoding the Atmosphere: Retrieval Algorithms at Work</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YjOG89COMloG1FFi.png" /></figure><p>To identify DMS, scientists used <strong>atmospheric retrieval algorithms</strong>, which compare observed spectra to models of the planet’s atmosphere. These algorithms belong to the class of Bayesian modelling, combining Bayesian inference with Markov Chain Monte Carlo (MCMC) simulation to find estimates. (It is also my favorite class of models, having used it years ago to measure plastics in the world’s oceans. Full blog here: <a href="https://www.ibm.com/think/insights/toward-a-world-of-plastic-free-beaches">https://www.ibm.com/think/insights/toward-a-world-of-plastic-free-beaches</a>)</p><ul><li><strong>NEMESIS</strong>: A Bayesian modelling tool that optimizes molecular abundances and temperature profiles.</li><li><strong>SCARLET</strong>: Tailored for hydrogen-rich atmospheres, using Markov Chain Monte Carlo (MCMC) to estimate uncertainties.</li></ul><p>These algorithms treat the atmosphere as layered, solving radiative transfer equations to predict spectral features. They also use <strong>molecular cross-sections</strong> to model how DMS, methane, and CO₂ absorb light. The challenge? DMS and methane spectra overlap, requiring careful analysis across wavelengths.</p>
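<p>To give a flavor of what such a retrieval does, here is a toy sketch (my illustration, vastly simpler than NEMESIS or SCARLET; the data are simulated and every number is an assumption): fitting the depth of a single absorption feature with a Metropolis-Hastings MCMC sampler.</p><pre>
# Toy Bayesian "retrieval" via Metropolis-Hastings MCMC (simulated data).
import numpy as np

rng = np.random.default_rng(42)
wl = np.linspace(6.0, 12.0, 120)          # wavelength grid (micrometers)

def model(depth):
    # One Gaussian absorption feature centered at 9 micrometers.
    return 1.0 - depth * np.exp(-0.5 * ((wl - 9.0) / 0.4) ** 2)

true_depth, noise = 0.003, 0.001
data = model(true_depth) + rng.normal(0.0, noise, wl.size)

def log_posterior(depth):
    if not 0.0 <= depth <= 0.02:          # flat prior on the depth
        return -np.inf
    return -0.5 * np.sum(((data - model(depth)) / noise) ** 2)

chain, current = [], 0.005
lp = log_posterior(current)
for _ in range(20000):
    proposal = current + rng.normal(0.0, 5e-4)
    lp_prop = log_posterior(proposal)
    if np.log(rng.random()) < lp_prop - lp:    # accept/reject step
        current, lp = proposal, lp_prop
    chain.append(current)

samples = np.array(chain[5000:])          # discard burn-in
print(f"depth = {samples.mean():.4f} +/- {samples.std():.4f}")
</pre><p>Real retrievals do the same thing with dozens of correlated parameters, full radiative transfer, and nested sampling for the model comparison described next.</p>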
<h3>4. Measuring Confidence: Statistical Significance</h3><p>How sure are scientists about the DMS detection? They used statistical models to find out:</p><ul><li><strong>Bayesian Evidence</strong>: <strong>Nested Sampling</strong> (via MultiNest/Dynesty) compared models with and without DMS, yielding a 3-sigma (99.7%) confidence in 2025, up from 1–2.4 sigma in 2023.</li><li><strong>Goodness-of-Fit</strong>: The chi-squared statistic ensured models matched the data, while AIC/BIC prevented overfitting.</li><li><strong>Degeneracy Handling</strong>: Combining NIRISS, NIRSpec, and MIRI data helped separate DMS from methane signals.</li></ul><p>While 3-sigma is promising, a 5-sigma threshold is needed for certainty, driving plans for more observations.</p><h3>5. Ensuring Reliability: Robustness Tests</h3><p>To confirm the DMS signal wasn’t a fluke, scientists ran rigorous tests:</p><ul><li><strong>Monte Carlo Simulations</strong>: Synthetic DMS signals were added to noisy spectra to check for false positives.</li><li><strong>Cross-Validation</strong>: Multiple retrieval codes (NEMESIS, SCARLET) produced consistent results.</li><li><strong>Systematic Error Correction</strong>: Gaussian Process Regression modeled noise, enhancing accuracy.</li></ul><p>These tests bolstered confidence in the 2025 findings.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*su2sK4_CZNaD6f6h.jpg" /><figcaption>Monte Carlo simulation showing detection significance vs. noise, with 3σ threshold. (Adapted from exoplanet studies)</figcaption></figure><h3>6. Modeling the Chemistry: Biological or Not?</h3><p>Could DMS indicate life? Can it come from non-biological sources? That is a hard question to answer. Scientists used <strong>photochemical models</strong> to explore its origins:</p><ul><li><strong>KINETICS/VULCAN</strong>: These models simulated atmospheric chemistry, suggesting a biological source for the high DMS levels (~10 ppm).</li><li><strong>Cloud and Haze Models</strong>: Mie scattering models indicated a clear atmosphere, ideal for detecting DMS.</li></ul><p>While promising, abiotic sources (like comets) remain a possibility, requiring further study.</p><h3>7. Overcoming Challenges</h3><ul><li><strong>Spectral Overlap</strong>: DMS and methane features blend, complicating identification.</li><li><strong>Low Signal Strength</strong>: The 2023 signal was weak; 2025 improved on it, but it still needs confirmation.</li><li><strong>Abiotic Sources</strong>: DMS in comets suggests possible non-biological origins, challenging its biosignature status.</li></ul><p>MIRI’s mid-infrared data helped, but more observations are crucial.</p><h3>8. What’s Next?</h3><p>This still needs more data and more analysis. We will now have to wait another two years to collect more data, when the planet next passes in front of its star. A few causes of false positives also remain to be ruled out.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/0*OmgNcI7mmVnDM2ho.jpg" /><figcaption>James Webb Telescope</figcaption></figure><ul><li><strong>More JWST Observations</strong>: 16–24 hours of data aim for a 5-sigma detection.</li><li><strong>Neural Networks</strong>: Advanced algorithms may improve sensitivity.</li></ul><p>With JWST’s MIRI leading the charge, we’re closer than ever to answering whether K2–18b harbors life.</p><h3>Conclusion</h3><p>The detection of DMS on K2–18b is a thrilling step in the search for extraterrestrial life. Through transit spectroscopy, sophisticated algorithms, and rigorous testing, scientists are peeling back the layers of an alien atmosphere. While challenges remain, the promise of JWST’s future observations keeps us on the edge of our seats.</p><p>Could K2–18b be home to life? Stay tuned! 
The quest for life continues…</p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more content on the latest advancements in AI.</p><p>Citations: Madhusudhan et al., 2023 &amp; 2025, <em>The Astrophysical Journal Letters</em>; NASA/ESA press releases; X posts on JWST’s MIRI.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=efbd10a5707c" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/just-how-did-they-discover-life-on-an-alien-planet-far-far-away-efbd10a5707c">Just How Did They Discover Life on an Alien Planet Far Far Away?</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why Grok 3 is the best LLM out there!!]]></title>
            <link>https://medium.com/towards-generative-ai/why-grok-3-is-best-llm-out-there-ba7ab315f8cc?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/ba7ab315f8cc</guid>
            <category><![CDATA[grok-3]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[llm]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Mon, 31 Mar 2025 21:59:28 GMT</pubDate>
            <atom:updated>2025-03-31T21:59:28.645Z</atom:updated>
            <content:encoded><![CDATA[<h4>Grok 3, built by xAI, is a groundbreaking AI that didn’t get as much attention as DeepSeek and others. But it is making a clear claim to be the best AI so far.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*smozxCaI6g9o3qd9Nkz6_w@2x.jpeg" /></figure><p>Here are my notes after trying to break it for over a month:</p><ul><li>It’s got a sharp wit and humor that other models can’t touch, making it stand out in a sea of dull AI. For the first time, this is an AI that comes to you — popping up on a vibrant platform like Twitter (X), tagging users, and dishing out opinions with a bold, active personality. Unlike the passive AIs of the past, Grok 3 doesn’t just sit there — it engages. Humor generation is a really hard problem in AI, as it has to connect with humans and surprise them.</li><li>It seamlessly blends cutting-edge knowledge with its training, delivering answers that feel fresh and relevant. What’s truly jaw-dropping is its knack for cutting through the noise.</li><li>In a place like India, where Twitter drowns in misinformation and hate, Grok 3 digs out the political truth — offering clear, factual takes instead of treating every source as if it were equal. That’s a game-changer in dealing with hallucinations.</li><li>And it’s not just smart — it’s versatile. Grok 3 can write in Hinglish or Hindi, switching between scripts and transliteration effortlessly. It even tosses in slang and cuss words, nailing the vibe. A truly multilingual model like this? We’ve been chasing that for years.</li><li>Plus, it can draft contracts with stunning precision — practical and powerful.</li><li>On the reasoning side, I have given it NP-hard problems in a cryptic coded format, and it was able to detect that they were NP-hard. I also gave it a mathematical conjecture (Goldbach’s Conjecture) without naming it, and it attempted to solve it, while other LLMs simply cheated by recognizing the famous problem. This is much more advanced reasoning capability than any other model I have seen so far!</li></ul><p>With all those 200K GPUs used to train Grok 3… they weren’t kidding around. This thing is a beast, and it’s rewriting what we thought AI could do.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ba7ab315f8cc" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/why-grok-3-is-best-llm-out-there-ba7ab315f8cc">Why Grok 3 is the best LLM out there!!</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NVIDIA GTC 2025- Key Takeaways]]></title>
            <link>https://medium.com/towards-generative-ai/nvidia-gtc-2025-key-takeaways-8da6abaeee3f?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/8da6abaeee3f</guid>
            <category><![CDATA[nvidia]]></category>
            <category><![CDATA[gtc]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Wed, 19 Mar 2025 16:36:33 GMT</pubDate>
            <atom:updated>2025-03-19T16:36:40.612Z</atom:updated>
            <content:encoded><![CDATA[<h4>What you need to know about Nvidia GTC 2025</h4><p>NVIDIA’s GPU Technology Conference (GTC) 2025 brought forth several key updates and breakthroughs, primarily highlighted by CEO Jensen Huang’s keynote on March 18. Here’s a rundown of the most significant announcements and advancements based on the latest available information:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*aGqzb7223k3NxLonbocpMA.jpeg" /></figure><h3>Key Updates from GTC 2025</h3><h4>1. Blackwell Ultra GPUs Launch</h4><p>NVIDIA unveiled the Blackwell Ultra GPU series, an enhanced version of its Blackwell architecture, set to release in the second half of 2025. These GPUs promise a substantial leap in performance, offering up to 40 times the inference capabilities of the previous Hopper GPUs. This upgrade targets the growing demands of AI factories and reasoning AI systems, positioning NVIDIA to maintain its lead in high-performance computing.</p><h4>2. Next-Generation Rubin Architecture</h4><p>Huang introduced the Vera Rubin platform, featuring a custom NVIDIA-designed CPU (Vera) and tens of terabytes of memory. The Rubin GPUs are slated for release in the second half of 2026, with an even more powerful Rubin Ultra (combining four GPUs) planned for 2027. This roadmap underscores NVIDIA’s focus on future-proofing AI and cloud computing infrastructures.</p><h4>3. AI and Robotics Innovations</h4><h4>Groot N1</h4><p>An open-source AI model for collaborative robots was debuted, developed in partnership with Google DeepMind and Disney. Groot N1 enhances robots’ tactile and learning capabilities, marking a step toward more adaptive and interactive robotics.</p><h4>Newton Physics Engine</h4><p>NVIDIA, alongside Disney Research and Google DeepMind, introduced Newton, a physics engine designed to simulate robotic movements in real-world settings. Disney plans to leverage this for entertainment robots, such as its “Star Wars”-inspired BDX droids.</p><h4>4. Automotive Partnerships</h4><p>NVIDIA announced a collaboration with General Motors (GM) to advance autonomous driving technology. This partnership integrates NVIDIA’s AI infrastructure, including the Halos system, to enhance self-driving car capabilities, reflecting a deepening focus on automotive applications.</p><h4>5. Data Center and Infrastructure Advances</h4><ul><li><strong>Dynamo OS</strong>: A new software solution to optimize custom data centers for AI workloads, aimed at reducing costs and boosting efficiency.</li><li><strong>Blackwell NVLink</strong>: A dynamic interconnect technology offering 40x inference performance, designed to scale AI computing across massive data center setups.</li><li><strong>NVIDIA Photonics and Laser Data Transmission</strong>: The Vera Rubin platform incorporates photonics for faster data speeds, addressing the trillion-dollar data center boom Huang emphasized during the keynote.</li></ul><h4>6. AI Reasoning and Software</h4><p>NVIDIA introduced the Llama Nemotron models, optimized for multi-step AI reasoning, further enhancing generative AI and agentic architectures for enterprise applications.</p><h4>7. Personal AI Supercomputers</h4><p>Huang teased “Project Digits,” a personal AI supercomputer based on the Blackwell platform, delivering 1 petaflop (1,000 TOPS) of performance. 
Priced at a minimum of $3,000, it’s set for release in May 2025, targeting offline AI inference for individual users.</p><h3>Main Breakthroughs</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hGqqo8MXfnSP8tW8_gpUnQ.jpeg" /></figure><h4>1. Performance Leap with Blackwell Ultra</h4><p>The 40x inference performance increase over Hopper GPUs represents a monumental breakthrough, enabling more complex AI models and real-time reasoning applications. This positions NVIDIA to meet the escalating computational needs of AI-driven industries.</p><h4>2. Robotics Revolution with Groot N1 and Newton</h4><p>The combination of Groot N1’s open-source AI and Newton’s physics simulation capabilities marks a significant advancement in robotics. These tools enable robots to learn, adapt, and operate in dynamic environments, with immediate applications in entertainment and beyond.</p><h4>3. Photonics Integration for Data Speed</h4><p>Incorporating photonics into the Rubin platform is a pioneering move to accelerate data transmission, addressing bottlenecks in AI cloud computing and supporting the massive scale of modern data centers.</p><h4>4. AI Infrastructure Scalability</h4><p>The Dynamo OS and Blackwell NVLink breakthroughs enhance the scalability and efficiency of AI infrastructure, catering to a projected $50 billion market. This reflects NVIDIA’s strategic pivot to dominate the trillion-dollar shift toward AI-driven computing.</p><h4>5. Consumer AI Accessibility</h4><p>Project Digits brings high-end AI capabilities to the desktop, democratizing access to petaflop-level performance. This could reshape how developers and enthusiasts engage with AI locally.</p><h3>Context and Impact</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*XfU7FZfodqOIkXyueF4eEQ.jpeg" /></figure><p>Huang emphasized that AI is at an “inflection point,” driven by a 100x increase in computational demand over recent years. These updates and breakthroughs reinforce NVIDIA’s dominance in AI hardware and software, while expanding its influence into robotics, automotive, and personal computing. The collaboration with industry giants like GM, Disney, and Google DeepMind further amplifies the real-world applicability of these innovations, setting the stage for transformative changes across multiple sectors.</p><p>These insights are drawn from various reports and posts circulating around the event, reflecting the immediate takeaways from GTC 2025.</p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more content related to the latest advancements in AI.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8da6abaeee3f" width="1" height="1" alt=""><hr><p><a href="https://medium.com/towards-generative-ai/nvidia-gtc-2025-key-takeaways-8da6abaeee3f">NVIDIA GTC 2025- Key Takeaways</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding TITANS: New Architecture to Redefine Transformers with Long-Term Memory]]></title>
            <link>https://medium.com/towards-generative-ai/understanding-titans-new-architecture-to-redefine-transformers-with-long-term-memory-f8c18a047a7e?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/f8c18a047a7e</guid>
            <category><![CDATA[persistent-memory]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[transformers]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Tue, 04 Mar 2025 19:33:03 GMT</pubDate>
            <atom:updated>2025-03-05T13:22:53.908Z</atom:updated>
            <content:encoded><![CDATA[<h4>How does the new paper from Google handle the well-known limitations of Transformers, and what are its applications?</h4><p>In the ever-evolving world of artificial intelligence, transformers have emerged as one of the most significant breakthroughs behind LLMs. From chatbots to content generation, these models have revolutionized how machines understand and generate human-like text. However, as the demand for more sophisticated applications grows, so do the limitations of traditional transformer architectures. Enter <strong>TITANS</strong> — a novel approach designed by Google to overcome these challenges by introducing a groundbreaking concept: persistent memory with a focus on surprising information.</p><h4>Current Limitations of Transformers</h4><p>Despite their success, transformers are not without flaws. One of the most notable limitations is their lack of persistent memory. While some premium models attempt to address this issue through developer heuristics, metadata tagging, and human feedback, these solutions are far from perfect. Transformers often struggle to retain context over extended interactions, making them less effective for tasks requiring long-term memory or nuanced understanding.</p><p>This limitation becomes especially apparent in applications where context continuity is critical. For example, conducting deep research or assisting users over prolonged conversations requires models to “remember” past interactions effectively. Traditional transformers fall short in such scenarios, paving the way for innovations like TITANS.</p><h4>Introducing TITANS: A New Paradigm in Memory</h4><p>TITANS (Transformers with Integrated and Tunable Attention for Novel Surprises) is a next-generation architecture that addresses the memory limitations of traditional transformers. What sets TITANS apart is its ability to retain and prioritize <strong><em>surprising</em></strong> information, mimicking how humans process and store memories.</p><h3>How TITANS Differs from Transformers</h3><p>TITANS operates on a simple yet powerful principle:</p><blockquote><strong>Memory = Past Memory + Surprise</strong>.</blockquote><p>The model assigns higher importance to surprising events or information, ensuring they are remembered more vividly than mundane details. This approach mirrors human cognition — think back to a surprising event in your life, and you’ll likely recall not just the event itself but also the surrounding context in vivid detail.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*P2YFssJDq08ZM3Ltlbq6jg.png" /></figure><p>The “surprise” factor is quantified using a gradient-based mechanism:</p><ul><li><strong>Surprise = Decayed Past Surprise − b × (Momentary Surprise)</strong><br> Here, the momentary surprise is the gradient of the loss with respect to the memory parameters (how strongly the new input deviates from what memory predicts), and “b” scales how much of it is encoded into memory.</li></ul><p>By integrating this dynamic memory mechanism, TITANS achieves a longer context window and enhanced recall capabilities, making it ideal for tasks requiring persistent memory.</p>
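<p>A toy sketch of this surprise-weighted update (my reading of the idea, not the paper’s code; the loss and all constants are assumptions for illustration):</p><pre>
# Toy surprise-weighted memory update (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
dim = 8
memory = np.zeros(dim)
past_surprise = np.zeros(dim)   # carries decayed surprise from earlier steps
eta, theta = 0.9, 0.1           # decay rate and surprise learning rate

for step in range(100):
    # A steady stream of familiar inputs, with one surprising outlier.
    x = rng.normal(0.0, 3.0, dim) if step == 50 else np.ones(dim)
    grad = memory - x           # gradient of 0.5*||memory - x||^2: the momentary surprise
    past_surprise = eta * past_surprise - theta * grad
    memory = memory + past_surprise
    if step in (49, 50, 51):
        # The outlier produces a much larger gradient, so it moves memory more.
        print(step, round(float(np.linalg.norm(grad)), 3))
</pre>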
<p>The Titans architecture differs from Transformers in the following ways:</p><ol><li>Titans incorporate a neural long-term memory module that can store and retrieve historical information dynamically. This allows the model to maintain context over much longer sequences than traditional Transformers.</li><li>Titans include a persistent memory component that contains task-specific knowledge in learnable, static parameters. This is absent in standard Transformers.</li><li>Titans use a brain-inspired memory system with short-term, long-term, and persistent memory, mimicking human cognitive processes. Traditional Transformers rely solely on their attention mechanism for information retention.</li><li>Titans can process sequences of up to 2 million tokens without a blow-up in computational cost. Traditional Transformers are typically limited to context windows of about 4–128K tokens.</li><li>Titans achieve O(n) linear scaling through a hierarchical memory architecture, compared to the quadratic O(n²) scaling of the Transformer’s self-attention mechanism.</li><li>Titans can update their persistent memory during inference, enabling real-time learning without retraining the entire model. Traditional Transformers require full retraining to incorporate new information.</li><li>Titans use a “surprise” metric to prioritize memorization of unexpected or significant events. This allows for more efficient use of memory capacity compared to Transformers’ uniform attention distribution.</li></ol><h3>How TITANS Works</h3><p>Titans introduce three architectural variants: Memory as a Context (MAC), Memory as a Gate (MAG), and Memory as a Layer (MAL). These offer different ways to integrate the memory components, providing flexibility not present in standard Transformers.</p><h4>Memory Types in TITANS</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eye8LZSR3GnfKMK8BXhY4Q.png" /></figure><p>TITANS incorporates three distinct memory configurations to cater to different use cases:</p><ol><li><strong>Memory as a Context (MAC):</strong><br> Long-term memories retrieved for the current input are concatenated with it as extra context that the attention layers can reference as needed.</li><li><strong>Memory as a Gate (MAG):</strong><br> The memory branch is combined with the attention branch through a learned gate, reminiscent of the gating in Long Short-Term Memory (LSTM) networks.</li><li><strong>Memory as a Layer (MAL):</strong><br> The memory module is inserted as its own layer in the stack, compressing past context before the attention layers process it.</li></ol><p>Each memory configuration offers unique advantages, allowing TITANS to adapt to diverse requirements while maintaining efficiency.</p><p>These memory modules allow Titans to overcome some of the limitations of traditional Transformers, particularly in handling long-term dependencies and adapting to new information without full retraining.</p>
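<p>As a rough illustration of the gated variant, here is a minimal MAG-style block in PyTorch. The long-term memory branch is collapsed into a single linear layer as a stand-in for the neural memory module, and every name and size here is an illustrative assumption rather than the paper’s implementation.</p><pre>import torch
import torch.nn as nn

class MAGBlock(nn.Module):
    """Memory as a Gate: blend attention output with a memory branch via a learned gate."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mem = nn.Linear(d, d)   # stand-in for the neural long-term memory read
        self.gate = nn.Linear(2 * d, d)

    def forward(self, x):            # x: (batch, seq, d)
        y, _ = self.attn(x, x, x)    # short-term attention branch
        m = self.mem(x)              # long-term memory branch
        g = torch.sigmoid(self.gate(torch.cat([y, m], dim=-1)))
        return g * y + (1 - g) * m   # the gate decides which branch to trust, per feature

x = torch.randn(2, 32, 64)
print(MAGBlock(64)(x).shape)         # torch.Size([2, 32, 64])</pre><p>MAC would instead prepend the retrieved memory tokens to x before attention, and MAL would run the memory module as a separate layer in sequence; the gated version is shown because it is the easiest to read.</p>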
<h3>Applications of TITANS</h3><p>Titans excel in tasks requiring extensive context, such as “needle-in-haystack” problems and genomic sequence analysis. They also show improved performance in language modeling, common-sense reasoning, and time series forecasting compared to traditional Transformers. The introduction of TITANS opens up a world of possibilities across various domains:</p><ol><li><strong>Needle-in-a-Haystack (NIAH) Tasks</strong><ul><li>Discovering additional indications for pharmaceuticals</li><li>Identifying fraudulent activities in complex datasets</li></ul></li><li><strong>General Persistent Memory</strong><ul><li>AI assistants capable of maintaining long-term conversational context</li><li>Deep research transformers for academic or industrial use</li></ul></li></ol><p>These applications highlight TITANS’ potential to revolutionize industries that rely on extensive data analysis and contextual understanding.</p><h4><strong>Experimental Results</strong></h4><p>Titans outperform Transformers and modern recurrent models across various tasks, including language modeling, commonsense reasoning, genomics, and time series forecasting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jv41qPzg8y7LT1g1ltW09A.png" /></figure><p>They scale effectively to context window sizes exceeding 2 million tokens, achieving higher accuracy in recall-intensive tasks like “<strong><em>needle-in-haystack</em></strong>” problems compared to baseline models.</p><h3>Challenges</h3><p>While TITANS represents a significant leap forward, it is not without challenges. The increased memory and parameter requirements result in higher computational costs — ranging from 30% to 100% more than traditional transformers. This cost factor poses a barrier to widespread deployment, particularly for smaller organizations with limited resources.</p><p>Currently, <strong>no large-scale deployments of TITANS exist.</strong> However, its potential has sparked interest across industries, with discussions around integrating TITANS into platforms like DeepSeek gaining momentum.</p><h3>Old wine in a new bottle? The RNN connection</h3><p>RNNs (recurrent neural networks), which were a precursor to transformers, come to mind when I read the TITANS paper. I worked on them in the first generation of GenAI (read my old blog on generating poems here: <a href="https://medium.com/towards-generative-ai/generating-poems-for-any-given-image-using-multi-modal-machine-learning-2be35b72f50a">https://medium.com/towards-generative-ai/generating-poems-for-any-given-image-using-multi-modal-machine-learning-2be35b72f50a</a>). The TITANS architecture looks like a tweak on the RNN architecture.</p><p>RNNs were capable of holding memory but did not scale well, and they suffered badly from exploding or vanishing gradients. Getting rid of the recurrent unit and focusing only on attention is what made LLMs so impactful (hence the title of that famous paper, “<em>Attention is all you need</em>”). It will be important to see how TITANS scales.</p>
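<p>The scaling worry is easy to see with a toy calculation: in a recurrent unit, the gradient through T steps grows roughly like the recurrent weight raised to the power T, so it either vanishes or explodes. A tiny illustration (the scalar weight is a simplifying assumption; in real RNNs the same effect is governed by the weight matrix’s eigenvalues):</p><pre># Gradient through T recurrent steps scales like w**T in the scalar toy case.
for w in (0.9, 1.1):
    print(w, [w ** t for t in (1, 10, 100)])
# w=0.9 -> ~2.7e-05 at t=100 (vanishing); w=1.1 -> ~1.4e+04 (exploding)</pre>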
<h3>Moving Forward</h3><p>As we look ahead, the development and deployment of TITANS could redefine what transformers are capable of achieving. By addressing the limitations of traditional architectures and introducing human-like memory capabilities, TITANS opens new doors for AI applications that were previously unattainable.</p><p>In conclusion, TITANS represents not just an incremental improvement but a paradigm shift in transformer technology. By prioritizing surprising information and extending context windows, it bridges the gap between human cognition and machine learning — paving the way for smarter, more capable AI systems.</p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more content related to the latest advancements in AI.</p><h3>Citations:</h3><ol><li><a href="https://arxiv.org/abs/2501.00663">https://arxiv.org/abs/2501.00663</a> from Google</li><li>All images credit to Google</li></ol><hr><p><a href="https://medium.com/towards-generative-ai/understanding-titans-new-architecture-to-redefine-transformers-with-long-term-memory-f8c18a047a7e">Understanding TITANS: New Architecture to Redefine Transformers with Long-Term Memory</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why FOBO around DeepSeek is unjustified?]]></title>
            <link>https://medium.com/towards-generative-ai/why-fobo-around-deepseek-is-unjustified-8c3261fa50e1?source=rss-bee09cdc3c5b------2</link>
            <guid isPermaLink="false">https://medium.com/p/8c3261fa50e1</guid>
            <category><![CDATA[nvidia]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[deepseek]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[openai]]></category>
            <dc:creator><![CDATA[Kunal Sawarkar]]></dc:creator>
            <pubDate>Mon, 03 Feb 2025 17:40:46 GMT</pubDate>
            <atom:updated>2025-02-03T17:40:46.223Z</atom:updated>
            <content:encoded><![CDATA[<h4>DeepSeek AI seems to have caused a crazy level of FOBO (Fear of Better Options) in the GenAI industry. Here is a short summary of why the hype is exceeding the reality, and why this may not really be a “Sputnik moment” after all.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/275/0*ZiYZKstutN7l6-33" /><figcaption>Sputnik</figcaption></figure><p>1. Imitation is always cheaper than innovation. DeepSeek used Llama from <a href="https://www.linkedin.com/company/aiatmeta/">AI at Meta</a> as the base architecture and tokens harvested from other models’ traffic (possibly OpenAI) as data, and then built upon a proven architecture. This saves a lot of cost if you don’t have to pay for good-quality data and instruct sets, or for finding the right architecture and innovating the right metrics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*pQXf36-U4vCpTHd4" /><figcaption>Deepseek vs OpenAI</figcaption></figure><p>2. Thus it should be seen as a “refreshed” or “refined” version of existing models rather than a truly pre-trained model: something more than a distilled version, with tweaks to the architecture, but with a lot of things already baked in.</p><p>3. The reasoning capabilities are overblown; it’s not true reasoning. The right way to think of it is as smaller models outperforming larger ones on the same tasks (which has been a trend in open source for a while).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Ie4dUNYY8tiLVHhp.jpg" /><figcaption>Nvidia</figcaption></figure><p>4. What was interesting, and what spooked markets, was their claim (which needs verification) that they trained the model on the older A100 and not the H100. This means all the hyped-up demand for Nvidia H100 and H200 and TSMC chips is overblown, and you can train at a much lower footprint; hence, I guess, that rout in the stock market. I also look at it as a much-overdue correction, as valuations for some stocks were anyway way off any EBITDA projections for chipmakers.</p><p>5. I, for one, don’t believe this will dominate the enterprise AI space. The model has China-specific guardrails, like blocking questions about Tiananmen Square, which most countries and companies will find hard to govern and adopt in the long term. Llama will still be a much better option for building a reliable architecture.</p><p>6. What is good about DeepSeek is that it’s not just an open model but open weights, which means others can quickly distill it to even smaller footprints (shifting the focus to smaller models that can go to prod rather than to large chip investments). This can be a game changer for enterprises, as the cost of taking GenAI to prod is still exorbitant. A rough sketch of what such distillation looks like follows after the last point.</p><p>7. Or the hype could cause severe disruption, with geopolitical implications, and hurt the entire open-source ecosystem through government intervention. Hawks like Vinod Khosla have always argued against open-source models in SoTA AI (with vested interests, of course).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yZk39JmrVHOa7GXW" /><figcaption>Credit- Google</figcaption></figure>
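<p>For readers unfamiliar with how point 6 plays out in practice, here is a minimal sketch in PyTorch of the standard logit-distillation recipe (soft targets from a teacher model, KL-divergence loss with a temperature). The tensor shapes and temperature are illustrative assumptions, not DeepSeek’s actual training setup.</p><pre>import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target distillation: KL divergence between temperature-softened logits."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * T * T

# Toy example: a batch of 4 positions over a vocabulary of 10 tokens.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)       # in practice: outputs of the frozen open-weight model
loss = distill_loss(student, teacher)
loss.backward()                    # gradients flow only into the student</pre><p>Open weights matter here precisely because the teacher’s full logits are available; with a closed API you typically only get sampled text, which makes distillation far noisier and costlier.</p>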
<p>I still don’t think this is a “Sputnik moment”, for a simple reason: remember who won the space race in the end. The joke in the space race was that Americans spent a million dollars to invent a zero-gravity pen while the Russians used pencils. But that joke misses the whole point of innovation: it’s about the flywheel effect. The US space program resulted in countless innovations, from the TV remote control to the de-icing of jet planes to nonstick cookware. The USA is still the favorite to win the race in the long term.</p><p>Follow <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> for more content related to the latest advancements in AI.</p><hr><p><a href="https://medium.com/towards-generative-ai/why-fobo-around-deepseek-is-unjustified-8c3261fa50e1">Why FOBO around DeepSeek is unjustified?</a> was originally published in <a href="https://medium.com/towards-generative-ai">Towards Generative AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>