The AI Landscape in 2024: Training Costs, Open Source, and Running Out of Data
The 2024 AI Index Report from the Stanford Institute for Human-Centered Artificial Intelligence (HAI) provides a comprehensive overview of the AI landscape. In a series of articles, we highlight the report's key findings, focusing on trends and insights that are particularly relevant for business leaders.
In this article, we dive into the rising costs of training AI models, the potential for data depletion, the evolution of foundation models, and the shift towards open-source AI.
Skyrocketing Training Costs and Compute Trends
One of the most striking findings from the report is the exponential increase in the cost of training state-of-the-art AI models. In 2017, the original Transformer model cost around $900 to train. Fast forward to 2023, and the estimated training costs for OpenAI’s GPT-4 and Google’s Gemini Ultra are $78 million and $191 million, respectively. This trend is driven by the growing complexity of AI models and the vast amounts of data they require.
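To put "exponential increase" in concrete terms, we can derive the implied year-over-year growth factor from the report's own figures. The following is a rough back-of-envelope sketch (the variable names and the geometric-mean approach are our own, not the report's methodology):

```python
# Implied annual growth factor of frontier training costs, using the
# report's estimates: ~$900 for the original Transformer (2017),
# ~$78M for GPT-4 and ~$191M for Gemini Ultra (2023).

def annual_growth_factor(cost_start: float, cost_end: float, years: int) -> float:
    """Geometric-mean cost growth per year between two estimates."""
    return (cost_end / cost_start) ** (1 / years)

TRANSFORMER_2017 = 900
GPT4_2023 = 78_000_000
GEMINI_ULTRA_2023 = 191_000_000
YEARS = 2023 - 2017

print(f"GPT-4:        ~{annual_growth_factor(TRANSFORMER_2017, GPT4_2023, YEARS):.1f}x per year")
print(f"Gemini Ultra: ~{annual_growth_factor(TRANSFORMER_2017, GEMINI_ULTRA_2023, YEARS):.1f}x per year")
```

In other words, these figures imply frontier training costs multiplying by roughly a factor of seven every year over that six-year span, which is what makes the barrier-to-entry concern below more than hypothetical.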
Key Takeaway: As AI models become more sophisticated, the financial and computational resources required to train them are becoming a significant barrier to entry. This could lead to a concentration of AI capabilities among a few well-resourced companies and institutions.
Will Models Run Out of Data?
The report highlights concerns about the potential depletion of data for training AI models. Researchers estimate that high-quality language data could be exhausted by 2024, with low-quality language data lasting up to two decades and image data running out by the mid-2040s. While synthetic data generated by AI models themselves could potentially address this issue, recent research suggests that models trained predominantly on synthetic data may suffer from reduced output diversity and quality.
Key Takeaway: The potential scarcity of training data could become a significant constraint for the development of AI models in the coming years. Businesses should consider strategies for efficiently using and preserving high-quality data.
The Evolution of Foundation Models
Foundation models, which are large AI models trained on massive datasets and capable of performing a wide range of tasks, have seen rapid growth in recent years. The number of foundation models released annually has more than doubled since 2022, with the majority now originating from industry rather than academia. Notably, the United States leads in the development of foundation models, followed by China and the European Union.
Key Takeaway: Foundation models are becoming increasingly important in the AI landscape, with industry players taking the lead in their development. Businesses should keep a close eye on advancements in foundation models and consider how they could be leveraged for their specific use cases.
The Shift Towards Open-Source AI
The report shows a significant shift towards open-source AI models. In 2023, 65.8% of newly released foundation models were open-source, compared to only 44.4% in 2022. This trend is also reflected in the explosive growth of AI-related projects on GitHub, with the number of projects increasing by 59.3% in 2023 alone.
Key Takeaway: The growing availability of open-source AI models and tools lowers the barrier to entry for businesses looking to adopt AI. However, it also means that AI capabilities are becoming more widely accessible, potentially leveling the playing field for competitors.
Conclusion
The 2024 HAI AI Index Report reveals a rapidly evolving AI landscape characterized by rising training costs, potential data constraints, the dominance of foundation models, and a shift towards open-source AI. Business leaders must stay informed about these trends to make strategic decisions about AI adoption and investment. By understanding the challenges and opportunities presented by these developments, businesses can position themselves to harness the power of AI in the coming years.