Announcing AI2 OLMo, an Open Language Model Made by Scientists, for Scientists
Today, the Allen Institute for AI is excited to announce that we are embarking on the creation of an open, state-of-the-art generative language model: AI2 OLMo (Open Language Model). At 70 billion parameters, OLMo will be comparable in scale to other state-of-the-art large language models, and is expected in early 2024.
OLMo will be a uniquely open language model intended to benefit the research community by providing access and education around all aspects of model creation. AI2 is developing OLMo in collaboration with AMD and CSC, using the new GPU portion of LUMI, a pre-exascale supercomputer powered entirely by AMD processors and one of the greenest supercomputers in the world.
OLMo will be a new avenue for many people in the AI research community to work directly on language models for the first time. We will be making all elements of the OLMo project accessible — not only will our data be available, but so will the code used to create the data. We will release the model, the training code, the training curves, and evaluation benchmarks. We will also openly share and discuss the ethical and educational considerations around the creation of this model to help guide the understanding and responsible development of language modeling technology.
This broad availability of all aspects of OLMo will allow the research community to directly take what we create and work to improve it. We believe that millions of people want to better understand and engage with language models, and we aim to create the environment where they actually can, leading to faster and safer progress for everyone. Our goal is to collaboratively build the best open language model in the world — follow along with us on Twitter, our blog, and our newsletter to become a part of this important undertaking.
“With the scientific community in mind, OLMo will be purpose-built to advance the science of language models,” says Hannaneh Hajishirzi, an OLMo project lead, a senior director of NLP research at AI2, and a professor at the University of Washington’s Allen School of Computer Science & Engineering. “OLMo will be the first language model specifically designed for scientific understanding and discovery.”
“AI2’s deep heritage in natural language processing (NLP) and AMD’s history of supporting the scientific community through our high-performance computing efforts are a perfect match for OLMo,” said Ian Ferreria, senior director, AI Solutions, AMD. “With the new OLMo initiative from AI2, which is geared for science, we have the capability to extend our knowledge into generative AI using the impressive capabilities from the LUMI Supercomputer powered by AMD EPYC™ CPUs and AMD Instinct™ accelerators.”
A truly open model
As a transparent, collaborative, nonprofit institution, we are well-positioned to build a language model that is truly open and uniquely valuable to the AI research community. Our OLMo endeavor will include more than just building an open language model — we’re purposely building a platform that will allow the research community to take each component we create and either use it themselves or seek to improve it. Everything we create for OLMo will be openly available, documented, and reproducible, with very limited exceptions and under suitable licensing. The artifacts released as part of the OLMo project will include training data, code, model weights, intermediate checkpoints, and ablations. A release strategy for the model and its artifacts is in development. We also plan to build a demo and release interaction data from consenting users.
Furthering AI research
As we build OLMo, we will make decisions that make the final model as usable and efficient as possible without sacrificing performance. Our aim is to make our model accessible to the full breadth of the AI research community, increasing the diversity of perspectives and pace of improvement in language model development. We will also build and release the most rigorously studied and documented model training dataset to date — this will include pretraining data, instruction data, and human interaction data.
Ethical and educational
With OLMo, we are taking a pragmatic approach to ethics and openness. We will lead with transparency by documenting the decisions, considerations, and trade-offs we make in weighing the ethical and societal impacts of creating and releasing the OLMo model. Along the way, we will promote AI knowledge and understanding by sharing our progress, describing our challenges, and explaining our discoveries. The OLMo team is working closely with AI2’s legal department and outside legal experts and has included multiple checkpoints in the model-building process to assess and reassess privacy and intellectual property rights issues.
Partnerships and support
In addition to the collaboration on hardware and computing resources with AMD and LUMI, AI2 is partnering with organizations including Surge AI and MosaicML for data and training code. We have created an ethics review committee that includes both internal and external advisors to provide feedback throughout the process. The OLMo model and API will be a powerful new resource for the broader community to better understand and participate in the generative AI revolution. AI2 welcomes support and partnership from organizations aligned with our values of AI for the common good and invested in building responsible, beneficial artificial intelligence technologies — please let us know of your interest here.
“OLMo will be something special,” notes Noah Smith, also an OLMo project lead, a senior director of NLP Research at AI2, and a professor at the Allen School. “In a landscape where many are rushing to cash in on the business potential of generative language models, AI2 has the unique ability to bring our world-class expertise together with world-class hardware from AMD and LUMI to produce something explicitly designed for scientists and researchers to engage with, learn from, and use to create the next generation of safe, effective AI technologies.”
Pekka Manninen, Director of Science and Technology at CSC, adds: “Generative AI carries the potential of being the breakthrough technology of this decade, analogous to how search engines and smartphones penetrated our society in the previous decades. Open, transparent, and explainable LLMs are vital for the democratization of this technology. We are proud to be part of this collaboration for its great societal impact and technological ambition level, and happy that we can contribute to it with the LUMI supercomputer and our expertise. Supercomputers like LUMI can accelerate LLM training by an order of magnitude, and many other features of the LUMI infrastructure position it as a leading platform for natural language processing.”
AMD, the AMD Arrow logo, EPYC, AMD Instinct, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Check out our current openings, follow @allen_ai on Twitter, and subscribe to the AI2 Newsletter to stay current on news and research coming out of AI2.