A looming electricity shortage and other AI dinner-table gossip

Chase Roberts
Published in Vertex Ventures US · 4 min read · Dec 8, 2023

Vertex US hosted an intimate dinner with executives and researchers from institutions including Google, OpenAI, Cohere, NVIDIA, and UC Berkeley. Like any good AI discussion, the conversation covered topics like AI adoption in the enterprise (companies don’t know where to start), prompting (it will be automated), agents (the future is bright), and OpenAI DevDay announcements (😅). This dinner occurred before Sam Altman’s round trip to/from OpenAI; otherwise, I’m sure this mesmerizing saga would have consumed the evening’s chatter. Beyond the standard AI fodder, a few nuggets stood out from the conversation.

Will we run out of electricity before GPUs?

AI workloads favor GPUs over CPUs because GPUs excel at parallel processing; comparable workloads on CPUs would take decades longer to execute, so GPUs should remain in vogue for the foreseeable future. As AI applications increase in prominence, as we expect they will thanks to recent breakthroughs in large language models (LLMs), so will the demand for GPUs and the corresponding data centers to host them. Because of greater power and cooling requirements and different physical configurations, it’s unlikely that data center providers will be able to retrofit existing data centers for GPUs easily. Not to mention, operators are holding onto old servers for longer than expected, meaning existing capacity won’t turn over quickly. On the demand side, analysts expect the data center GPU market to grow 4.5x over the next five years to $63B.

Source: Midjourney (“one transformer giving a lightbulb to another transformer”)

Here’s the thing, though: transformers need transformers. Yes, NVIDIA has more demand for its powerful GPUs than it can supply. But before we have to worry about hitting peak GPU, a looming electricity shortage could present an even bigger problem.

The typical data center requires 100–300 megawatts of electricity. GPU workloads consume 2–3x the power, suggesting an equivalent GPU data center could require upwards of a gigawatt of electricity. The evening’s discourse suggested that global power grids may lack the transformer stations needed to support the pending GPU data center infrastructure.
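To make that back-of-the-envelope math concrete, here is a small sketch using only the figures quoted above (the 100–300 MW baseline and the 2–3x multiplier); everything else is illustrative:

```python
# Back-of-the-envelope estimate of GPU data center power draw,
# using the ranges quoted above (100-300 MW baseline, 2-3x multiplier).
baseline_mw = (100, 300)      # typical data center, in megawatts
gpu_multiplier = (2, 3)       # extra power and cooling demand for GPU workloads

low = baseline_mw[0] * gpu_multiplier[0]    # 200 MW
high = baseline_mw[1] * gpu_multiplier[1]   # 900 MW

print(f"GPU-equivalent data center: {low}-{high} MW "
      f"(~{high / 1000:.1f} GW at the high end)")
```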

Prompting will become programming

Natural language prompts are the only way to communicate with large language models. Prompt engineering describes the process of discovering the word combination that maximizes the performance of an LLM for a given task. Prompts aren’t transferable between models and represent the lowest level of communication between developers and powerful LLMs. In that way, prompts are analogous to assembly languages, where code is written directly against a specific processor’s instruction set without the benefit of compilers or other abstractions.

Source: Midjourney (“Prompting becomes programming”)

We all know that few developers write software in assembly language today, thanks to the wide availability of higher-level languages and compilers. Higher-level languages like Python, JavaScript, and Rust are transferable across hardware, abstracting the tedium of machine code away from developers. The same will happen for prompting. Developers will program declaratively, and compiler-like functions will translate these tasks, SLAs, and I/O guarantees into optimized prompts. The prompts will compile differently depending on the underlying model, offering much more modularity for developers and masking the stochastic nature of these powerful predictive programs. Recent research projects like DSPy and Promptbreeder are early examples of these coming abstractions.
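As a rough illustration of what that declarative style looks like, here’s a minimal sketch in the spirit of DSPy (module and class names follow the DSPy 2.x API as of late 2023 and may differ between versions; the signature and example data below are made up):

```python
import dspy

# Point DSPy at an underlying model; swapping this line is the only change
# needed to "recompile" the program for a different LLM. (Client class names
# vary by DSPy version.)
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

# Declare *what* we want (inputs and outputs), not the literal prompt wording.
class AnswerQuestion(dspy.Signature):
    """Answer the question using the provided context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a short, factual answer")

# The module assembles and optimizes the actual prompt behind the scenes.
qa = dspy.ChainOfThought(AnswerQuestion)
prediction = qa(
    context="Vertex US hosted an AI dinner in late 2023.",
    question="Who hosted the dinner?",
)
print(prediction.answer)
```

The separation of concerns is the point: the developer states the task, and an optimizer (a “teleprompter,” in DSPy’s terminology) searches for the prompt that best satisfies it on whichever model sits underneath.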

Model ensembles replace foundational model providers

My dinner colleagues described a development lifecycle they frequently observe among early LLM adopters: AI developers prototype applications using foundational model (FM) providers like OpenAI, then transition to a combination of fine-tuned open-source (OSS) models overlaid with FM models for output analysis. The core desire driving this pattern is reducing the cost of the primary workload by shifting it to fine-tuned models. Using the more powerful FM provider only to check the fine-tuned models’ outputs requires far less processing, so it adds minimal incremental cost. The net effect of this strategy is optimizing first for prototyping speed and then for cost-to-performance.
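A hedged sketch of that two-tier pattern might look like the following (the fine-tuned OSS model endpoint, the grader prompt, and the helper names are all hypothetical; the FM calls assume the OpenAI 1.x Python SDK):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_with_oss_model(prompt: str) -> str:
    """Placeholder for a fine-tuned open-source model serving the primary workload
    (e.g., behind a local vLLM or llama.cpp endpoint)."""
    raise NotImplementedError("wire this to your fine-tuned OSS model")

def check_with_fm(prompt: str, draft: str) -> bool:
    """Use the more powerful FM only as a lightweight grader of the draft output."""
    review = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Does the response below correctly answer the request? "
                f"Reply YES or NO.\n\nRequest: {prompt}\n\nResponse: {draft}"
            ),
        }],
    )
    return review.choices[0].message.content.strip().upper().startswith("YES")

def answer(prompt: str) -> str:
    draft = generate_with_oss_model(prompt)   # cheap primary generation
    if check_with_fm(prompt, draft):          # small, cheap verification call
        return draft
    # Fall back to the FM itself for the (hopefully rare) rejected drafts.
    fallback = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return fallback.choices[0].message.content
```

The design choice is that the expensive model reads a short draft and emits a yes/no verdict rather than generating the full answer, which is what keeps the incremental cost minimal.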

Source: Midjourney (“combining large language models”)

Lastly, there are early signs that fine-tuning embeddings instead of language models offers a better price-to-performance and effort-to-performance tradeoff. I’d add another section revealing the anecdotes and research, but these insights were shared off the record. 😉

Vertex US regularly hosts AI and infrastructure dinners like these. If you’re a researcher or practitioner interested in attending a future meeting of the minds, email me at chase@vvus.com.
