Scale AI: The Data Foundry for AI

DFJ Growth
DFJ Growth News
May 24, 2024


By Randy Glein, Sam Fort, Kevin Tu, and Brian Akin

Scale AI is on a path to be a major driver and beneficiary of the emerging AI megacycle.

When ChatGPT burst onto the scene in late 2022, it signaled the dawn of a new era of technological advancement and human-machine interaction in modern society. We have entered the era of generative AI. With a simple prompt backed by massive computing power, generative AI can now answer complex questions about nearly any topic, create mind-blowing art and music, optimize business operations, and push the boundaries of scientific discovery with unprecedented speed and capabilities.

As these advancements in machine learning and computing capacity reshape industries and redefine possibilities, data has become the new currency of the AI economy. Empirical AI scaling laws indicate that artificial intelligence systems improve predictably as the quality and quantity of the data they are built on increase. As society races to harness the power of artificial intelligence, the need for high-quality data, both to ensure access to verifiable truths and to combat false or misleading content, has never been greater. Data accuracy is central to meeting this challenge.
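
For readers who want the quantitative shape behind scaling laws, one widely cited formulation comes from Hoffmann et al. (2022), often called the Chinchilla scaling law. It is not part of the original announcement and is shown here only as an illustration. It models pre-training loss as a sum of power-law terms in model size and training data:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here N is the number of model parameters, D is the number of training tokens, E is the irreducible loss, and A, B, \alpha, and \beta are constants fitted to experiments. Under this form, loss falls as a power law in the amount of data. The formula tracks data quantity only; data quality is not an explicit term.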

Building the AI Data Foundry

We are thrilled to announce our investment in Scale AI’s $1 billion Series F financing as part of our expanding thesis and growing portfolio in the generative AI movement.

Since its inception in 2016, Scale AI has been on a relentless mission to revolutionize how data is made available and utilized in AI development.

This mission has positioned Scale AI as the data foundry for AI, fueling exciting advancements over the past eight years.

During the industrial revolution, foundries were the beating heart of progress. These factories transformed raw materials into the engines, rails, and machinery that powered society’s rapid advancement and economic expansion. In the AI revolution, Scale AI plays a similar, pivotal role, leveraging data as the key input that enables AI. Just as foundries turned ore into the material that produced engines, turbines, skyscrapers and bridges, Scale AI molds raw unstructured data into the engine that powers the AI economy.

Transforming Data to Accelerate AI Development

Scale AI sits at the center of the generative AI movement, ensuring model builders and enterprises have access to the rich, organized, and accurate data needed to deploy AI confidently. The Scale Data Engine leverages human insight to generate vast amounts of supervised fine-tuning data and apply human preferences to model outputs for reinforcement learning from human feedback (RLHF). This process is crucial for companies developing AI systems, as high-quality data correlates directly with more accurate and reliable outcomes.
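
To make the two kinds of data in this paragraph concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Scale AI's actual schema or API: the names `SFTExample`, `PreferencePair`, and `pairwise_preference_loss` are hypothetical. The loss shown is the Bradley-Terry style pairwise objective commonly used to train reward models in RLHF pipelines.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTExample:
    """Hypothetical supervised fine-tuning record: a prompt and a human-written answer."""
    prompt: str
    response: str


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical preference record: a human picked `chosen` over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    The loss is small when a reward model scores the human-preferred
    output higher than the rejected one, and large when it disagrees.
    """
    margin = score_chosen - score_rejected
    # Numerically stable softplus(-margin), split by sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative records; the text and scores below are made up.
sft = SFTExample(prompt="Explain RLHF in one sentence.",
                 response="RLHF tunes a model using human feedback on its outputs.")
pair = PreferencePair(prompt="Summarize the report.",
                      chosen="A concise, accurate summary.",
                      rejected="An off-topic reply.")

# Reward model agrees with the human label: low loss.
agree = pairwise_preference_loss(2.0, -1.0)
# Reward model disagrees with the human label: high loss.
disagree = pairwise_preference_loss(-1.0, 2.0)
print(agree < disagree)
```

Supervised fine-tuning records teach a model what a good answer looks like, while preference pairs teach a reward model which of two answers people prefer. That is why the quality of human labeling matters for both.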

Scale AI today draws on a global network of data experts: tens of thousands of active contributors with capabilities across 70+ languages, 30+ coding languages, and 40+ subject domains.

Operating atop this foundation, the Scale GenAI Platform (SGP) streamlines enterprise-grade development and deployment of generative AI applications. Using SGP, enterprises can leverage their own proprietary data and supplement it with Scale AI’s Data Engine to rapidly fine-tune, test, and deploy custom generative AI applications purpose-built for their needs.

The Scale Data Engine and SGP have created a data flywheel for the company, powering an internal catalog of specialized AI applications and proprietary datasets that customers can use to jumpstart their AI journey across myriad industries and use cases. This expanding trove of emerging applications features Donovan, a chat interface for government and national defense information that can surface critical national security insights, and AFM-1, a foundation model trained specifically for enabling autonomous vehicles.

The company has reached escape velocity in recent years, demonstrating exceptional growth at significant revenue scale.

Scale AI’s value is evidenced through its close partnerships with leading AI innovators like OpenAI*, Microsoft, Meta, Nvidia, US DoD, Toyota, and GM. Led by founder and CEO Alexandr Wang, who left MIT at age 19 to start the company, Scale AI has rapidly adapted and expanded its product portfolio to keep pace with the advancing AI landscape. Alex has noted that AI is built from three fundamental pillars: data, compute, and algorithms. Scale AI supplies the data pillar, which advances AI by fueling its entire development lifecycle. Today, Scale AI delivers the data layer to power nearly every leading AI model and has a broad array of partnerships across these pillars.

[Image: Scale AI team]

Scaling Into the Future

Scale AI is positioned at the center of the AI value chain alongside the foundation layer of large language models (LLMs) and major compute infrastructure providers. This has set the company on a path to be a major driver and beneficiary of the emerging AI megacycle. We are in the early stages of this technological revolution, and Scale AI is uniquely positioned to enable the market’s exponential growth. The company’s pioneering work has become a key enabler of the AI wave, and we are honored to partner with Alex and the entire Scale team on their mission to forge the future of AI.

* DFJ Growth Portfolio Company