The Rise of AI and GPU Shortages: How Blockchain Alleviates Machine Learning Bottlenecks
With AI’s trajectory and increasing demand for GPUs, the machine learning industry faces problems with GPU costs and accessibility. Let’s look at how blockchain technology offers solutions.
Authored by Accelerated Compute Engineering Lead Tommy Eastman.
The GPU Industry
In the past year, AI-based applications and integrations have grown tremendously. OpenAI’s ChatGPT became the fastest-growing application of all time, reaching 100M monthly active users just two months after launching. For comparison, TikTok took nine months, and Instagram took 18 months to achieve that same milestone.
The demand for AI has significantly affected the price and availability of graphics processing units (GPUs). GPUs are processors optimized for parallel computation: they operate on many pieces of data simultaneously, which makes them useful for machine learning, video editing, and gaming. Demand for GPUs has increased because they are used at multiple stages of the AI pipeline.
GPUs are designed and distributed by a select few companies, so manufacturing and supply chain delays are felt across the whole market. GPUs have been closely tied to the blockchain industry since the 2017 bull run and the 2018 shortage, when Ethereum proof-of-work miners bought almost all available GPUs. Ethereum has since moved to proof-of-stake, but with the explosion of AI, blockchain technology still offers useful solutions to common problems around obtaining GPUs, the cost of training, distributed inference, and more.
Machine Learning Process and Bottlenecks
Machine learning is a vast and rapidly expanding industry. The machine learning pipeline is generally broken into three steps, each with its own bottlenecks.
1. Foundation Model Training
Foundation model training involves taking large data sets (e.g., Wikipedia) and training an initial base model, which can be used as a general-purpose model or fine-tuned later. The resulting model uses learned patterns and relationships to predict the next item in a sequence.
For example, image generation models are trained to associate image patterns with corresponding text, so when given text inputs, they produce images based on those learned patterns. Similarly, with text, the model predicts the next word in a string of text based on the previous words and context.
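The idea of predicting the next word from the words before it can be illustrated with a toy example. The sketch below is a simple bigram counter, not how a foundation model is actually built: real models use neural networks trained on far larger data sets. The function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real training set would be many orders of magnitude larger.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

Even at this scale the pattern holds: the "model" is just statistics learned from the data, and a prediction is a lookup into those statistics.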
The training of foundation models is expensive in terms of labor, infrastructure, time, and energy. The cost is compounded by the current supply chain difficulty in obtaining state-of-the-art NVIDIA GPUs, even for companies with significant capital.
For instance, the iterative training of OpenAI’s GPT-3 spanned several months and consumed millions of dollars in energy costs alone. Consequently, the training of foundation models remains a prohibitively expensive endeavor, within reach only for a select few private enterprises.
2. Fine-Tuning
Notably less resource-intensive than foundation model training, fine-tuning optimizes a model for a specific task (e.g., a language model learning a new vernacular). Foundation models’ performance on specific tasks can be increased drastically with fine-tuning.
While GPU scarcity affects all three areas, fine-tuning is impacted the least. However, fine-tuning depends entirely on foundation models being open-sourced. If private companies decide to stop open-sourcing their models, community models will quickly fall behind state-of-the-art (SOTA) models.
3. Inference
Inference, or querying a trained model, is the last step of the pipeline: for example, receiving an answer to a question from ChatGPT or an image generated from a prompt by Stable Diffusion. Every query requires GPU resources, and the computational demands of inference, particularly GPU spend, are escalating rapidly.
Inference serves both end users and developers who incorporate models into their applications, and it is the pathway to a model's economic viability. It is critical to integrating AI systems into society, as shown by the rapid adoption of tools like ChatGPT.
GPU scarcity is driving inference costs up rapidly. While the baseline requirements for inference are low compared to foundation model training, the scale at which companies are deploying applications places an enormous load on the GPUs serving those models. As model diversity increases (through fine-tuning and new foundation model development), application diversity will increase, and GPU demand derived from inference will increase drastically.
Blockchain Offers Solutions to Machine Learning Bottlenecks
In the past, GPUs were used to mine Ethereum and other proof-of-work coins. Now, blockchain is seen as a unique opportunity to broaden access and improve coordination around bottlenecks in the GPU space, specifically for machine learning.
Significant upfront capital is required for large-scale GPU deployments. This prevents all but the largest companies from developing in the space. Blockchain incentives create the potential for GPU owners to earn from spare compute, creating a cheaper and more accessible market for users.
Anyone can supply or use compute, host a model, and query a model. This is a stark difference from the traditional space, where access is often limited or gated behind a beta program.
A significant feature blockchain can provide to the machine learning space is distributed access. Traditionally, large datacenters are needed, because foundation model training (FMT) hasn’t been done at scale across non-clustered GPUs. Protocols are attempting to tackle this issue and, if successful, will open the floodgates for FMT.
Blockchain marketplaces help coordinate GPU purchasing, allowing people and companies that own GPUs to find others who want to rent them, rather than have them sit idle. Generating income while GPUs would otherwise sit idle can help offset the upfront costs of purchasing GPUs, allowing more entities to participate in GPU hosting.
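The coordination problem described above, pairing idle GPUs with people who want to rent them, can be sketched as a simple matching routine. This is a minimal illustration under stated assumptions, not the mechanism of any specific marketplace: the `Offer` and `Request` types, the greedy cheapest-first rule, and all names and prices are hypothetical, and each owner is assumed to post a single offer.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    owner: str
    gpu_hours: int
    price_per_hour: float  # asking price, arbitrary units


@dataclass
class Request:
    renter: str
    gpu_hours: int
    max_price_per_hour: float


def match(offers, requests):
    """Greedily fill each request from the cheapest offers within its budget."""
    remaining = {o.owner: o.gpu_hours for o in offers}  # assumes one offer per owner
    by_price = sorted(offers, key=lambda o: o.price_per_hour)
    deals = []
    for req in requests:
        needed = req.gpu_hours
        for offer in by_price:
            if needed == 0:
                break
            if offer.price_per_hour > req.max_price_per_hour:
                break  # every later offer is at least as expensive
            take = min(needed, remaining[offer.owner])
            if take > 0:
                deals.append((req.renter, offer.owner, take, offer.price_per_hour))
                remaining[offer.owner] -= take
                needed -= take
    return deals


# Hypothetical participants and prices, for illustration only.
offers = [Offer("alice", 10, 2.0), Offer("bob", 5, 1.5)]
requests = [Request("lab", 8, 2.5), Request("startup", 10, 1.8)]
deals = match(offers, requests)
for renter, owner, hours, price in deals:
    print(f"{renter} rents {hours} GPU-hours from {owner} at {price}/hour")
```

In this run the second request goes unfilled because the only capacity left is priced above its budget. A real marketplace would also need to handle payment, verification of the work performed, and settlement.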
Foundry’s Commitment to Responsible AI
The blockchain machine learning space is a fledgling industry with very few projects on mainnet. Currently, Foundry supports the Bittensor AI project as well as Akash, and both are proving to be meaningful ways to advance distributed AI.
Bittensor is a decentralized and permissionless inference network that allows easier access to models and creates a cheaper model marketplace with crypto incentives. Anyone can host a model, and user prompts are matched with the top-ranked model for a given modality.
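The idea of matching a prompt with the top-ranked model for a modality can be sketched in a few lines. This is only an illustration of ranked routing in general; it does not use or reflect Bittensor's actual API or scoring mechanism. The registry layout, the `top_model` function, and the model names and scores are all hypothetical.

```python
def top_model(registry, modality):
    """Return the name of the highest-scoring model registered for `modality`."""
    candidates = [m for m in registry if m["modality"] == modality]
    if not candidates:
        raise LookupError(f"no model registered for modality {modality!r}")
    return max(candidates, key=lambda m: m["score"])["name"]


# Hypothetical registry; names and scores are invented for illustration.
registry = [
    {"name": "text-a", "modality": "text", "score": 0.71},
    {"name": "text-b", "modality": "text", "score": 0.84},
    {"name": "image-a", "modality": "image", "score": 0.65},
]

print(top_model(registry, "text"))   # text-b has the higher text score
print(top_model(registry, "image"))  # image-a is the only image model
```

Because anyone can add an entry to such a registry, the ranking step is what keeps quality high: a newly hosted model only receives prompts once its score beats the incumbents.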
Bittensor has grown to be one of the largest AI projects in crypto, leveraging blockchain to create a large-scale ranked inference network. The network recently released subnetworks that incentivize different modalities, including image generation, prediction markets, and more.
Foundry is validating and mining on the network, and running proof-of-authority nodes to secure consensus.
Akash is a general compute marketplace that allows easier access to GPUs at scale, enables more foundation models to be trained, and drives down the costs of GPUs.
Akash recently launched their GPU marketplace with a similar goal of reducing onboarding overhead, lowering GPU compute costs, and increasing accessibility. Foundation model training is planned for development on Akash.
Foundry is providing GPU compute to the network and working with the team to develop features.
As machine learning continues to be integrated into businesses, the demand for GPUs will continue to skyrocket, posing ongoing supply chain issues in the machine learning space. Blockchain technology offers a bridge to lower-cost GPU compute by allowing distributed access to models and creating a cheaper model marketplace with crypto incentives. At Foundry, we’re committed to participating both as node operators and compute providers in AI-related networks that support these advancements.
The contents of this post have been provided by Foundry Digital LLC (“Foundry” or “we”) for informational purposes only, and should not be construed as giving legal, financial or any other kind of advice. Although we strive to provide quality information, we do not guarantee or warrant any particular results from the use of this information or any opinions provided. Foundry accepts no liability whatsoever for any damages, costs or any other consequences resulting from any actions taken on the basis of the information or opinions provided. Furthermore, Foundry has no control over information provided in any third-party sites linked herein, and Foundry accepts no liability whatsoever over any consequences resulting from any actions taken on the basis of that information. Foundry reserves the right to make changes to this information at any time without prior notice and makes no commitment to update the information contained in this post.