Introducing OpenAI Strawberry šŸ“ o1-preview

Shravan Kumar
4 min read · Sep 13, 2024


OpenAI has started releasing a new series of reasoning models aimed at solving hard, complex problems.

Now OpenAI Strawberry šŸ“ (o1) is out!

What are OpenAI's new o1 models, and how do they "think"? OpenAI yesterday released two preview models, o1-preview and o1-mini, designed to spend more time "thinking" before responding, which OpenAI claims improves their reasoning on complex tasks. Unlike older models, o1 pauses to reason before it answers; that extra moment is about delivering thoughtful, accurate responses, especially for tough questions in math, science, and coding.

After the initial buzz about a limited rollout, o1 is now available, at least in part. OpenAI is collecting user feedback and refining the model on real-world interactions, so expect continued improvements as adoption grows. This model isn't just about generating text; it's about reasoning. With o1, AI is stepping into more advanced problem-solving roles across industries, paving the way for innovative breakthroughs.

šŸ“ Is test-time compute all you need šŸ¤£? This new OpenAI o1 is here, reportedly surpassing human PhD-level accuracy on benchmarks in physics, biology, and chemistry!

šŸ§  The model uses a "hidden" chain-of-thought process, letting it think through problems in a more human-like way (whatever that means šŸ˜…).
šŸ•°ļø This deeper reasoning significantly boosts test-time performance: prolonged processing (10–20 seconds) yields better, more accurate results.
ā›³ The more time the model spends analyzing a task, the stronger and more precise its outcomes tend to be.
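One popular way to turn extra test-time compute into accuracy is self-consistency: sample several reasoning chains and majority-vote on the final answer. OpenAI has not said this is what o1 does; the sketch below, with a faked `solve_once` standing in for a real sampled model call, only illustrates the general idea:

```python
from collections import Counter

def solve_once(problem: str, seed: int) -> str:
    # Stand-in for one sampled reasoning chain; a real system would
    # call an LLM with temperature > 0. Here we fake noisy runs where
    # two out of every three samples land on the right answer.
    fake_chains = {0: "42", 1: "42", 2: "41"}
    return fake_chains[seed % 3]

def solve_with_more_compute(problem: str, n_samples: int) -> str:
    # More test-time compute = more sampled chains; a majority vote
    # over the final answers picks the consensus.
    answers = [solve_once(problem, s) for s in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(solve_with_more_compute("What is 6 * 7?", 9))  # prints "42"
```

With 9 samples the occasional wrong chain ("41") is outvoted, which is the intuition behind "more thinking time, stronger outcomes."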

šŸ“ˆ Performance and Benchmarks
ā›³ Programming: Ranked in the 89th percentile on Codeforces, showcasing advanced problem-solving and coding abilities. It's not just generating code; it's tackling complex problems like a pro. Imagine having an AI partner with real-world problem-solving skills!
ā›³ Mathematics: Placed in the top 500 in the USA Math Olympiad qualifier, solving 74% of problems and exceeding GPT-4o's performance.
ā›³ Science: Surpassed PhD-level experts on physics, biology, and chemistry benchmarks (GPQA).

šŸ’° Pricing alert: For developers, accessing o1 via the API costs $15 per 1 million input tokens and $60 per 1 million output tokens. Why so steep? It's specialized for complex problem-solving; think of it as paying for premium AI intelligence.

šŸ”’ "Reasoning output tokens" are hidden from the user in ChatGPT and the API, but they are still billed (you pay for tokens you never see).
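To make the billing concrete, here is a tiny cost estimator at the published o1-preview rates. The token counts in the example are made up; the key point is that hidden reasoning tokens are billed at the output rate:

```python
# Published o1-preview API rates (per 1M tokens).
INPUT_PER_M = 15.00   # $ per 1M input tokens
OUTPUT_PER_M = 60.00  # $ per 1M output tokens

def o1_cost(input_tokens: int, visible_output: int, reasoning_tokens: int) -> float:
    # Hidden reasoning tokens are billed as output tokens even though
    # the UI never shows them to you.
    billed_output = visible_output + reasoning_tokens
    return (input_tokens / 1e6) * INPUT_PER_M + (billed_output / 1e6) * OUTPUT_PER_M

# Hypothetical request: 2,000 prompt tokens, 500 visible answer tokens,
# and 8,000 hidden reasoning tokens.
print(round(o1_cost(2_000, 500, 8_000), 2))  # prints 0.54
```

Note that in this made-up example the hidden reasoning accounts for most of the bill: $0.48 of the $0.54 total.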

šŸš« At present there is no support for system prompts, streaming, tool use, batch calls, or image inputs.

šŸ’° API access limited to high-tier accounts (min. $1,000 spent)

šŸ“Š Increased output token limits (32,768 for o1-preview, 65,536 for o1-mini), likely to accommodate the hidden reasoning tokens.

OpenAI has also introduced o1-mini, a smaller, faster, and more affordable version of the o1-preview model that's especially good at coding tasks. It's 80% cheaper, making it a great option for developers who need powerful reasoning abilities without breaking the bank.

A good visualization from Tom Yeh, shown below, addresses the question: how might OpenAI have trained the Strawberry šŸ“ (o1) model to spend more time thinking? This is illustrative guesswork on Tom's part about how the model could have been trained, but I believe it was done in a similar way.

šŸ’” In RLHF + CoT, the CoT tokens are also fed to the reward model, which scores them to update the LLM for better alignment; in traditional RLHF, only the prompt and response are fed to the reward model to align the LLM.
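Sketched as code, following Tom Yeh's guess rather than any confirmed OpenAI detail, the difference is simply what text the reward model gets to see. The `reward_model` here is a dummy word-count stand-in, not a real reward model:

```python
def reward_model(text: str) -> float:
    # Dummy scorer for illustration only; a real reward model is a
    # trained network that predicts human preference.
    return float(len(text.split()))

def rlhf_score(prompt: str, response: str) -> float:
    # Traditional RLHF: the reward model scores only prompt + response.
    return reward_model(prompt + "\n" + response)

def rlhf_cot_score(prompt: str, cot: str, response: str) -> float:
    # RLHF + CoT: the chain-of-thought tokens are scored too, so the
    # policy is rewarded for *how* it thinks, not just what it answers.
    return reward_model(prompt + "\n" + cot + "\n" + response)
```

The training loop around this (PPO-style policy updates against the score) would be the same in both cases; only the reward model's input changes.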

šŸ’” At inference time, the model has learned to always start by generating CoT tokens, which can take up to 30 seconds, before starting to generate the final response. That's how the model spends more time thinking!
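In pseudocode terms, this inference loop might look like the sketch below. The `think` and `respond` helpers are hypothetical stubs standing in for token-by-token decoding, not OpenAI's actual implementation:

```python
def think(prompt: str) -> str:
    # Hidden "reasoning" phase: the model emits CoT tokens first.
    # This phase can be long (the 10-30 second pause users observe).
    return "step 1: parse the question; step 2: work it out"

def respond(prompt: str, cot: str) -> str:
    # Visible phase: the final answer is conditioned on the CoT.
    return "final answer"

def generate(prompt: str) -> tuple[str, str]:
    cot = think(prompt)
    answer = respond(prompt, cot)
    return cot, answer

cot, answer = generate("hard question")
print(answer)  # only the answer is shown; the CoT stays hidden (but is billed)
```

This two-phase structure is also why output token limits are so large: the hidden phase consumes output tokens before the visible answer even begins.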

Other important technical details are still missing, such as how the reward model was trained and how human preferences for the "thinking process" were elicited.

Finally, as a disclaimer: this animation represents Tom Yeh's best educated guess, and we can't verify its accuracy at present. We hope someone from OpenAI will step in to correct the chart, because if they do, we'll all learn something useful! šŸ™Œ

Shravan Kumar

Indian | AI Leader | Associate Director @ Novartis | Alumnus, IIT Madras & IIM Bangalore. Follow me for more on AI and Data Science.