OpenAI Achieved AGI With Its New o3 model?

Vishal Rajput
Published in AIGuys
13 min read · Dec 23, 2024

This week OpenAI demoed its latest model, o3, and the results seem almost impossible. It has crushed many benchmarks that were supposed to last for decades, and many people on Twitter and other social media claim that OpenAI has just achieved AGI with it. Is it really groundbreaking, or is it just hype? In today's article, we will try to shed more light on this using whatever information is available to us. Let's start.

Table Of Contents

  • The Promise Of o3: Crushing Benchmarks?
  • What is ARC-AGI?
  • Let’s Go Deeper Beyond Benchmarks
  • How o3 Is Doing All This?
  • My Take On This Supposed New AGI !!!

Fail cases. There were 34 tasks that even it couldn't solve with 16 hours of thinking.

The Promise Of o3: Crushing Benchmarks?

o3 has become the talk of the town because of two benchmarks it crushed: FrontierMath and ARC-AGI.

SWE benchmarks

The SWE benchmark (SWE-bench) is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories…
