Top Takeaways from the 2024 Artificial Intelligence Index Report

Renata Liborio
Published in Version 1
Apr 26, 2024 · 10 min read

The 2024 AI (Artificial Intelligence) Index Report, recently released by Stanford University, provides a wealth of data that helps us grasp the magnitude of technological advancements and their impact. This comprehensive report covers various aspects of artificial intelligence, including research and development, technical performance, ethics, the economy, education, policy, governance, diversity, and public opinion.

It sheds light on critical topics such as foundation models, training costs, environmental implications, K-12 AI education, and global legislation related to AI.

In this context, we will delve into the key insights highlighted by the researchers in this publication.

1. AI beats humans on some tasks, but not on all.

As of 2023, AI has achieved levels of performance that surpass human capabilities across a range of tasks. Over the years, it has exceeded human baselines on a number of benchmarks, such as image classification in 2015, basic reading comprehension in 2017, visual reasoning in 2020, and natural language inference in 2021.

However, there are still some task categories where AI fails to exceed human ability. These tend to be more complex cognitive tasks, such as visual common sense reasoning and advanced-level mathematical problem-solving (competition-level math problems).

Despite remarkable achievements, LLMs (Large Language Models) remain susceptible to factual inaccuracies and content hallucination — creating realistic, yet false, information. The presence of real-world instances where LLMs have produced hallucinations — in court cases, for example — underscores the growing necessity of closely monitoring trends in LLM factuality.

While existing research has aimed to understand the causes of hallucinations, less effort has been directed toward assessing the frequency of LLM hallucinations and identifying specific content areas where they are especially vulnerable.

One study indicates that ChatGPT fabricates unverifiable information in approximately 19.5% of its responses, with these fabrications spanning a variety of topics such as language, climate, and technology. The study also examines how well current LLMs can detect hallucinations across various tasks, including question answering, knowledge-grounded dialogue, and text summarisation. The findings reveal that many LLMs struggle with these tasks, highlighting that hallucination is a significant ongoing issue.
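
To make that kind of measurement concrete, here is a minimal sketch of how a hallucination-rate evaluation could be scored. The annotated responses and topics below are hypothetical stand-ins; real studies rely on human annotators or on retrieval against a knowledge source to judge verifiability.

```python
# Minimal sketch of scoring a hallucination-rate evaluation.
# The sample data and topics are hypothetical stand-ins; real studies
# use human annotation or retrieval against a knowledge base.

from collections import defaultdict

# Hypothetical annotated responses: (topic, claim_was_verifiable)
annotated_responses = [
    ("climate", True), ("climate", False),
    ("language", True), ("language", True),
    ("technology", False), ("technology", True),
]

def hallucination_rate(annotations):
    """Fraction of responses whose claims could not be verified."""
    total = len(annotations)
    fabricated = sum(1 for _, verifiable in annotations if not verifiable)
    return fabricated / total if total else 0.0

def rate_by_topic(annotations):
    """Break the rate down per content area."""
    buckets = defaultdict(list)
    for topic, verifiable in annotations:
        buckets[topic].append(verifiable)
    return {t: sum(1 for v in vs if not v) / len(vs) for t, vs in buckets.items()}

print(f"overall: {hallucination_rate(annotated_responses):.1%}")
for topic, rate in sorted(rate_by_topic(annotated_responses).items()):
    print(f"{topic}: {rate:.1%}")
```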

2. Industry continues to dominate frontier AI research.

In 2023, industry produced 51 notable machine learning models, while academia contributed only 15. There were also 21 notable models resulting from industry-academia collaborations in 2023, a new high. While the academic sector predominantly contributes to AI publications, accounting for 81% of published articles, the level of contribution significantly diminishes when discussing AI models.

These findings underscore that the expensive nature of model training poses a significant barrier for universities aiming to create state-of-the-art models. Even prestigious institutions in the United States, like MIT, Harvard, and Stanford, frequently encounter infrastructure gaps and must collaborate with large technology companies. Unfortunately, this reliance can potentially jeopardize scientific and academic autonomy.

This dynamic affects even the most advanced nations. Last year, the US Congress proposed the "CREATE AI Act," aiming to establish an infrastructure that grants universities the computational resources needed to maintain autonomy in AI initiatives. It's a crucial consideration that many countries should integrate into their strategic planning.

Learn more about AI in industry by checking out Version 1’s AI Labs and Innovation pages.

3. Frontier models get way more expensive.

According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of computing to train, while Google’s Gemini Ultra cost $191 million for computing.

These figures cover only the computational cost. We must also consider the cost of attracting and retaining top talent and of keeping AI systems maintained and running.
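
As a rough illustration of how such compute costs can be estimated, here is a back-of-envelope sketch: total training FLOPs divided by sustained cluster throughput gives GPU-hours, priced at a cloud rental rate. The AI Index derives its figures differently (from details such as training duration and hardware), and every number below is an illustrative assumption, not an input from the report.

```python
# Back-of-envelope training-cost estimate. All numbers are
# illustrative assumptions, not the AI Index's actual inputs.

def estimate_training_cost(total_flops, peak_flops_per_gpu,
                           utilization, usd_per_gpu_hour):
    effective_flops = peak_flops_per_gpu * utilization  # sustained FLOP/s per GPU
    gpu_hours = total_flops / effective_flops / 3600    # GPU-seconds -> GPU-hours
    return gpu_hours * usd_per_gpu_hour

cost = estimate_training_cost(
    total_flops=2e25,          # assumed total training compute, FLOPs
    peak_flops_per_gpu=1e15,   # ~1 petaFLOP/s peak at low precision (assumed)
    utilization=0.35,          # assumed sustained fraction of peak
    usd_per_gpu_hour=2.0,      # assumed cloud rental price
)
print(f"estimated compute cost: ${cost:,.0f}")
```

With these toy inputs the estimate lands in the tens of millions of dollars, the same order of magnitude as the report's figures, though the report's own methodology and inputs differ.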

Nowadays, the leading AI systems are concentrated within a handful of organizations, predominantly situated in the United States. This dominance stems from the extensive data and traffic ecosystem already established by Big Tech. However, it's not solely about data; it's also about financial resources.

This surge of capital is essential due to the escalating costs of model training. Fierce competition drives companies to pursue ever larger and more intricate models to secure favourable positions in AI performance rankings.

4. The United States leads China, the EU, and the U.K. as the leading source of top AI models.

In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European Union’s 21 and China’s 15.

There are two main reasons why the European Union ranks second as a source of leading AI models. First, the bloc benefits from a strong research base, hosting cutting-edge research institutions and development centres. These entities foster innovation and collaboration, playing a crucial role in advancing AI technologies and models.

The second factor involves the widespread adoption of the open-source movement within the bloc. Numerous companies within the European Union develop their AI solutions using open-source frameworks, fostering a dynamic ecosystem of collaborative knowledge sharing and development.

On the other hand, China dominates the field of industrial robotics. In 2013, China surpassed Japan to become the world leader in industrial robot installations. That year, China accounted for 20.8% of global industrial robot installations, a figure that surged to an impressive 52.4% by 2022.

China’s remarkable growth in the robotic industry reflects its commitment to technological advancement and automation. The widespread adoption of robots across various sectors underscores the nation’s strategic focus on innovation and efficiency.

5. Robust and standardized evaluations for LLM responsibility are seriously lacking.

New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models.

The AI Index examined a selection of leading AI model developers, specifically OpenAI, Meta, Anthropic, Google, and Mistral AI. The Index identified one flagship model from each developer (GPT-4, Llama 2, Claude 2, Gemini, and Mistral 7B) and assessed the benchmarks on which they evaluated their model. A few standard benchmarks for general capabilities evaluation were commonly used by these developers, such as MMLU, HellaSwag, ARC Challenge, Codex HumanEval, and GSM8K.

Unlike general capability evaluations, there is no universally accepted set of responsible AI benchmarks used by leading model developers. TruthfulQA is the most widely used, reported by three of the five selected developers. Other notable responsible AI benchmarks, such as RealToxicityPrompts, ToxiGen, BOLD, and BBQ, are each utilized by at most two of the five profiled developers. Furthermore, one of the five developers did not report any responsible AI benchmarks, though all developers mentioned conducting additional, non-standardized internal capability and safety tests.
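
The comparison problem is easy to see once the reported benchmarks are tabulated. The sketch below anonymizes the developers (A through E) and only mirrors the aggregate pattern described above, not the report's exact developer-to-benchmark mapping.

```python
# Sketch of the benchmark-coverage comparison. The mapping is illustrative
# and anonymized; only the aggregate pattern (TruthfulQA 3/5, other
# responsible-AI benchmarks at most 2/5, one developer reporting none)
# mirrors what the report describes.

reported = {
    "Developer A": {"MMLU", "GSM8K", "TruthfulQA", "ToxiGen"},
    "Developer B": {"MMLU", "GSM8K", "TruthfulQA", "BOLD"},
    "Developer C": {"MMLU", "GSM8K", "TruthfulQA", "BBQ"},
    "Developer D": {"MMLU", "GSM8K", "RealToxicityPrompts"},
    "Developer E": {"MMLU", "GSM8K"},  # no responsible-AI benchmarks reported
}

responsible_ai = {"TruthfulQA", "RealToxicityPrompts", "ToxiGen", "BOLD", "BBQ"}

# Only benchmarks every developer reports allow a direct comparison.
common = set.intersection(*reported.values())
print("comparable across all five:", sorted(common))  # capability benchmarks only

# Count how many developers report each responsible-AI benchmark.
for bench in sorted(responsible_ai):
    count = sum(bench in benches for benches in reported.values())
    print(f"{bench}: {count}/5 developers")
```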

The inconsistency in reported benchmarks complicates the comparison of models, particularly in the domain of responsible AI. The diversity in benchmark selection may reflect existing benchmarks becoming quickly saturated, rendering them ineffective for comparison, or the regular introduction of new benchmarks without clear reporting standards. Additionally, developers might selectively report benchmarks that positively highlight their model’s performance. To improve responsible AI reporting, a consensus must be reached on which benchmarks model developers should consistently test.

Benchmarks play an important role in tracking the capabilities of state-of-the-art AI models. In recent years there has been a shift toward evaluating models not only on their broader capabilities but also on responsibility-related features. This change reflects the growing importance of AI and the growing demands for AI accountability. As AI becomes more ubiquitous and calls for responsibility mount, it will become increasingly important to understand which benchmarks researchers prioritize.

6. Generative AI investment skyrockets.

Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.

This surge has been fuelled by lofty expectations and ambitious promises about what the technology can deliver.

Generative AI models, such as GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders), excel at creating new content. They can generate realistic images, text, music, and even entire video sequences. This versatility appeals to various industries, including entertainment, design, and marketing.
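
For readers curious about the mechanics, here is a minimal GAN training loop in PyTorch that teaches a generator to mimic a simple one-dimensional distribution. It is a toy sketch of the adversarial idea only; production generative models are vastly larger, and many modern systems use different architectures entirely.

```python
# Minimal GAN sketch in PyTorch: a generator learns to mimic a 1-D Gaussian
# while a discriminator learns to tell real samples from generated ones.
# A toy illustration of the adversarial idea, not a production model.

import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 8

generator = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real" data: N(3.0, 0.5)
    fake = generator(torch.randn(64, latent_dim))  # generated samples

    # Discriminator step: label real samples 1, generated samples 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = generator(torch.randn(1000, latent_dim))
print(f"generated mean {samples.mean().item():.2f}, "
      f"std {samples.std().item():.2f}")  # should approach 3.0 and 0.5
```

VAEs take a different route, pairing an encoder and decoder trained on a reconstruction objective with a latent-space regularizer, but the underlying principle of sampling new content from a learned distribution is similar.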

Because generative AI enhances recommendation mechanisms and creates personalized content, such as product suggestions, news articles, and social media posts, it drives user engagement and revenue, which in turn attracts broader investor interest.

7. The data is in: AI makes workers more productive and leads to higher quality work.

In 2023, several studies assessed AI’s impact on labour, suggesting that AI enables workers to complete tasks more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can lead to diminished performance.

Evidence is crucial for us to consider public policies on Artificial Intelligence literacy. These policies are needed to foster local AI development and prepare the population for the future. The skills needed for the workforce — such as data literacy, critical thinking, and adaptability — are closely tied to AI literacy. Public policies should bridge this gap by emphasizing lifelong learning and upskilling.

Ensuring that citizens understand AI concepts, applications, and implications is vital. Crafting effective policies involves promoting AI education, addressing ethical concerns, and fostering equitable access. By integrating AI literacy into educational curricula and professional development, countries can prepare their workforce for the AI-driven future.

8. Scientific progress accelerates even further, thanks to AI.

In 2022, AI began to advance scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications — from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of materials discovery.

GraphCast is a new weather forecasting system that delivers highly accurate 10-day weather predictions in under a minute. Utilizing graph neural networks and machine learning, GraphCast processes vast datasets to forecast temperature, wind speed, atmospheric conditions, and more. It can be a valuable tool in deciphering weather patterns, enhancing preparedness for extreme weather events, and contributing to global climate research.
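
As a hint of how graph neural networks process this kind of data, the sketch below runs a single message-passing step over a toy graph of weather stations. GraphCast's actual architecture (a learned encoder, many propagation steps over a multi-scale mesh of the globe, and a decoder) is far more elaborate; this illustrates only the core operation.

```python
# One message-passing step over a toy graph, the core operation in graph
# neural networks such as GraphCast's. Real systems use learned weights,
# many steps, and a multi-scale global mesh; this shows the idea only.

import numpy as np

rng = np.random.default_rng(0)

# 5 weather-station "nodes", each with 3 features (e.g. temp, wind, pressure).
node_features = rng.normal(size=(5, 3))

# Directed edges (sender -> receiver) connecting neighbouring nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 0)]

W_msg = rng.normal(size=(3, 3))  # learned in practice; random here
W_upd = rng.normal(size=(6, 3))

# Each node aggregates messages from its in-neighbours...
incoming = np.zeros_like(node_features)
for sender, receiver in edges:
    incoming[receiver] += np.tanh(node_features[sender] @ W_msg)

# ...then updates its own state from [own features, aggregated messages].
updated = np.tanh(np.concatenate([node_features, incoming], axis=1) @ W_upd)
print(updated.shape)  # (5, 3): a new feature vector per node
```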

A team of Google researchers has used AI to develop highly accurate hydrological simulation models that are also applicable to ungauged basins. These innovative methods can predict certain extreme flood events up to five days in advance, with accuracy that matches or surpasses current state-of-the-art models, such as GloFAS. The model is open-source and is already being used to predict flood events in over 80 countries.

In 2023, several significant medical systems were launched, including EVEscape, which enhances pandemic prediction, and AlphaMissense, which assists in AI-driven mutation classification. AI is increasingly being utilized to propel medical advancements.

Over the past few years, AI systems have shown remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, marking a 22.6 percentage point increase from the highest score in 2022. Since the benchmark’s introduction in 2019, AI performance on MedQA has nearly tripled.
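
Those percentages imply two earlier data points worth making explicit; this is simple arithmetic on the figures quoted above.

```python
# Implied earlier MedQA scores, derived from the figures quoted above.
best_2023 = 90.2                  # GPT-4 Medprompt accuracy, %
implied_2022 = best_2023 - 22.6   # 22.6 pp lower: ~67.6%
implied_2019 = best_2023 / 3      # "nearly tripled": roughly 30%
print(f"2022 best ~{implied_2022:.1f}%, 2019 level ~{implied_2019:.0f}%")
```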

9. The number of AI regulations in the United States sharply increases.

The number of AI-related regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there were 25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
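
A quick sanity check on those figures: the quoted growth rate and the 2023 count together imply the 2022 total.

```python
# 56.3% year-over-year growth ending at 25 regulations implies
# roughly 16 AI-related regulations in 2022.
regulations_2023 = 25
implied_2022 = regulations_2023 / 1.563
print(round(implied_2022))  # 16
```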

At the same time, the report notes that in fiscal year 2023, U.S. government agencies allocated a total of $1.8 billion to AI research and development (R&D). AI R&D funding has risen annually since 2018, more than tripling over that period. For 2024, a larger budget of $1.9 billion has been requested.

In 2023, policymakers on both sides of the Atlantic put forth substantial AI regulatory proposals. The European Union reached a deal on the terms of the AI Act, a landmark piece of legislation enacted in 2024. Meanwhile, President Biden signed an Executive Order on AI, the most notable AI policy initiative in the United States that year.

Mentions of AI in legislative proceedings across the globe have nearly doubled, rising from 1,247 in 2022 to 2,175 in 2023. AI was mentioned in the legislative proceedings of 49 countries in 2023. Moreover, at least one country from every continent discussed AI in 2023, underscoring the truly global reach of AI policy discourse.

The number of U.S. regulatory agencies issuing AI regulations increased to 21 in 2023 from 17 in 2022, indicating a growing concern over AI regulation among a broader array of American regulatory bodies. Some of the new regulatory agencies that enacted AI-related regulations for the first time in 2023 include the Department of Transportation, the Department of Energy, and the Occupational Safety and Health Administration.

10. People across the globe are more cognizant of AI’s potential impact — and more nervous.

A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of Americans report feeling more concerned than excited about AI, rising from 37% in 2022.

In an Ipsos survey, only 37% of respondents feel AI will improve their jobs. Only 34% anticipate AI will boost the economy, and 32% believe it will enhance the job market.

Significant demographic differences exist in perceptions of AI’s potential to enhance livelihoods, with younger generations generally more optimistic. For instance, 59% of Gen Z respondents believe AI will improve entertainment options, versus only 40% of baby boomers. Additionally, individuals with higher incomes and education levels are more optimistic about AI’s positive impacts on entertainment, health, and the economy than their lower-income and less-educated counterparts.

About the author

Renata Liborio is a Project Manager at Version 1.
