Introducing OpenAI Strawberry 🍓 o1-preview
OpenAI has started releasing its new series of reasoning models for solving hard, complex problems.
Now OpenAI Strawberry 🍓 (o1) is out!
What are OpenAI's new o1 models, and how do they "think"? OpenAI released two new preview models yesterday, o1-preview and o1-mini, designed to spend more time "thinking" before responding, which the company claims improves their reasoning on complex tasks. Unlike older models, o1 pauses to "think" before it responds. That brief moment is all about delivering thoughtful, accurate answers, especially for tough questions in areas like math, science, and coding.

After all the initial buzz about a limited rollout, o1 is now available, at least in part. OpenAI is gathering user feedback and refining the model based on real-world interactions, so expect continued improvements as adoption grows. This model isn't just about generating text; it's about reasoning. With o1, AI is stepping into more advanced roles in problem-solving across industries, paving the way for innovative breakthroughs.
🤔 Is test-time compute all you need 🤣? The new OpenAI o1 is here, reportedly surpassing human PhD-level accuracy on benchmarks in physics, biology, and chemistry!
🧠 The model uses a "hidden" chain-of-thought process, enabling it to think through problems in a more human-like way (whatever that means 😅)
🕰️ Turns out that this deeper reasoning significantly boosts performance at test time, allowing better, more accurate results with prolonged processing (10-20 seconds).
⏳ The more time the model takes to analyze a task, the stronger and more precise its outcomes tend to be.
📊 Performance and Benchmarks
⏳ Programming: Ranked in the 89th percentile on Codeforces, showcasing advanced problem-solving and coding abilities. It's not just generating code; it's tackling complex problems like a pro. Imagine having an AI partner with real-world problem-solving skills!
⏳ Mathematics: Placed in the top 500 in a qualifier for the USA Math Olympiad (AIME), solving 74% of problems and exceeding GPT-4o's performance.
⏳ Science: Surpassed PhD-level experts on physics, biology, and chemistry benchmarks (GPQA).
💰 Pricing Alert: For developers, accessing o1 will cost $15 per 1 million input tokens and $60 per 1 million output tokens via the API. Why so steep? It's specialized for complex problem-solving; think of it as paying for premium AI intelligence.
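As a quick sanity check on those numbers, here is a small cost estimator (a hypothetical helper, not an official SDK utility). The key wrinkle it captures is that the hidden reasoning tokens count toward output and are billed:

```python
# Hypothetical cost estimator for o1-preview API usage.
# Prices from this post: $15 per 1M input tokens, $60 per 1M output tokens.
O1_PREVIEW_INPUT_PER_M = 15.00
O1_PREVIEW_OUTPUT_PER_M = 60.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one o1-preview call.

    Note: output_tokens must include the hidden reasoning tokens,
    which are billed even though they are not shown to the user.
    """
    cost = (input_tokens / 1_000_000) * O1_PREVIEW_INPUT_PER_M
    cost += (output_tokens / 1_000_000) * O1_PREVIEW_OUTPUT_PER_M
    return round(cost, 6)

# Example: a 2,000-token prompt that produces 10,000 output tokens
# (visible answer plus hidden reasoning) costs about $0.63.
print(estimate_cost(2_000, 10_000))
```

Because reasoning tokens are invisible, the only reliable way to know the true output count is the usage field returned by the API, not the length of the text you see.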
👀 "Reasoning" output tokens are hidden from the user in ChatGPT and the API, but they are billed (you pay for tokens you don't see)
🚫 At present: no support for system prompts, streaming, tool use, batch calls, or image inputs
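Since system prompts are unsupported, any instructions have to be folded into the user message itself. A minimal sketch of building such a request payload (the message shape follows OpenAI's chat-completions format; the helper name is my own, and no network call is made here):

```python
# Sketch: building an o1-preview request payload without a system prompt.
# o1 models at launch reject the "system" role, so instructions are
# folded into the user message instead.
def build_o1_request(instructions: str, question: str) -> dict:
    return {
        "model": "o1-preview",
        "messages": [
            # No {"role": "system", ...} entry: unsupported at launch.
            {"role": "user", "content": f"{instructions}\n\n{question}"},
        ],
        # "stream": True and tool definitions are likewise unsupported.
    }

payload = build_o1_request(
    "Answer concisely and show your final result on the last line.",
    "What is 12 * 17?",
)
assert all(m["role"] != "system" for m in payload["messages"])
```

This payload dict is what you would pass to the chat-completions endpoint once your account tier has access.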
💰 API access limited to high-tier accounts (min. $1,000 spent)
📈 Increased output token limits (32,768 for o1-preview, 65,536 for o1-mini), presumably to leave room for the hidden thinking tokens.
OpenAI has also introduced o1-mini, a smaller, faster, and more affordable version of the o1-preview model that's especially good at coding tasks. It's 80% cheaper, making it a great option for developers who need powerful reasoning abilities without breaking the bank.
A good visualization from Tom Yeh is shown below: how does OpenAI train the Strawberry 🍓 (o1) model to spend more time thinking? This is purely illustrative guesswork by Tom about how the model might have been trained, but I believe it was done in a similar way.
💡 In RLHF+CoT, the CoT tokens are also fed to the reward model to get a score used to update the LLM for better alignment, whereas in traditional RLHF only the prompt and response are fed to the reward model to align the LLM.
💡 At inference time, the model has learned to always start by generating CoT tokens, which can take up to 30 seconds, before starting to generate the final response. That's how the model spends more time thinking!
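The difference between the two reward-scoring setups can be sketched in a few lines. This is an illustration of Tom Yeh's guess, not OpenAI's actual training code; the toy reward model and both scoring functions are made up for the example:

```python
# Illustrative sketch (guesswork, not OpenAI's implementation):
# in RLHF+CoT the chain-of-thought tokens are scored by the reward
# model alongside the prompt and response; in plain RLHF they are not.
from typing import Callable

def rlhf_score(reward_model: Callable[[str], float],
               prompt: str, response: str) -> float:
    # Traditional RLHF: the reward model sees only prompt + response.
    return reward_model(prompt + "\n" + response)

def rlhf_cot_score(reward_model: Callable[[str], float],
                   prompt: str, cot: str, response: str) -> float:
    # RLHF+CoT: the hidden reasoning tokens are scored too,
    # so the policy is also rewarded for *how* it thinks.
    return reward_model(prompt + "\n" + cot + "\n" + response)

# Toy reward model: longer, more detailed text scores higher.
toy_rm = lambda text: len(text.split()) / 100.0

p, c, r = "Solve 2+2.", "Two plus two is four because ...", "4"
print(rlhf_score(toy_rm, p, r) < rlhf_cot_score(toy_rm, p, c, r))  # True
```

The scored text in the second function is longer because it includes the CoT, which is exactly why a reward model trained this way can shape the thinking process itself, not just the final answer.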
There are other important technical details missing, such as how the reward model was trained and how human preferences for the "thinking process" were elicited.
Finally, as a disclaimer, this animation represents Tom Yeh's best educated guess; we can't verify its accuracy at present. We hope someone from OpenAI will step in and correct the chart, because if they do, we will all learn something useful! 😊
References:
- Credits (LinkedIn family): Jim Fan, Sonu Kumar, Philipp Schmid, Tom Yeh, Aishwarya
- Blog: https://lnkd.in/ezAzb-Fp
- https://openai.com/index/introducing-openai-o1-preview/