Issues with Centralized GPU Resources in the Tech Industry
The monopolization of GPU resources by major cloud service providers is a critical issue that demands immediate attention in the computing industry. Their dominance has resulted in exorbitant costs, restricted accessibility, and data privacy concerns.
Cloud computing emerged as a game-changer in the late 2000s, enabling companies to access computing resources like servers, storage, and applications over the internet, eliminating the need for expensive on-premises data centers. However, today, a few hyperscalers, including AWS, GCP, and Azure, control the cloud-based infrastructure market, creating significant challenges for users.
With the proliferation of Large Language Models (LLMs) and compute-intensive AI applications, the demand for high-performance GPUs has reached an all-time high. From large corporations to bootstrapping startups, everyone is scrambling to secure their access to compute.
Current Issues/Problems with GPUs
The exponential growth of compute-intensive AI has created an unprecedented demand for high-end GPUs, vastly outpacing the available supply. Most of these GPUs are controlled by the hyperscalers, who use their position to set market prices, and this has led to accusations of price gouging. AI researchers and developers need multiple GPUs for distributed training, which forces them to rely on the major clouds for GPU access. The result is vendor lock-in that limits their bargaining power and their ability to switch when prices rise. It’s high time this issue was addressed to ensure a level playing field for all stakeholders in the AI industry.
For example, an instance in the Ohio region on AWS costs $4.10 per GPU-hour. According to Amazon’s financial statements and some outside research, the marginal COGS (cost of goods sold) for delivering that GPU-hour is $1.60, which equates to a 61% gross margin. The marginal operating expense is a further $1.50, which leaves about $1.00 of operating income per GPU-hour, or an operating margin of roughly 24% (Q4 2023). Microsoft has an even higher gross margin for its cloud services: 72%.
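The margin arithmetic behind these figures can be checked with a short sketch. The dollar amounts are the ones quoted above; the variable and function names are illustrative assumptions, not part of any AWS tooling.

```python
# Per-GPU-hour figures quoted in the text, in US dollars.
price = 4.10  # on-demand price per GPU-hour
cogs = 1.60   # marginal cost of goods sold
opex = 1.50   # marginal operating expense


def margin(profit: float, revenue: float) -> float:
    """Return profit as a percentage of revenue."""
    return 100.0 * profit / revenue


# Gross margin: what is left after COGS, as a share of the price.
gross_margin = margin(price - cogs, price)

# Operating margin: what is left after COGS and operating expense.
operating_margin = margin(price - cogs - opex, price)

print(f"gross margin:     {gross_margin:.1f}%")
print(f"operating margin: {operating_margin:.1f}%")
```

Run as written, this gives a gross margin of about 61% and an operating margin of about 24%, in line with the figures cited.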
The skyrocketing demand for AI compute has resulted in an acute shortage of GPUs, leading to capacity constraints across major clouds. AWS, Azure, and GCP have waiting lists and limited GPU quotas, with customers waiting weeks or even months for access. These delays slow development and leave teams under-provisioned. Buying GPUs outright is also challenging, with Nvidia reportedly sold out until possibly mid-2024. It is high time we acknowledged that the centralized nature of GPU cloud infrastructure cannot keep up with today’s AI compute needs.
Moreover, users are often forced to accept permissioned, vendor-locked access, knowing that the desired GPUs may not even be available. It’s common for the highest-performance GPUs to be prioritized for reserved instances, which are expensive and often reserved for the largest customers, who prioritize control and access over cost.
To make matters worse, many alternative data centers around the world have declining GPU utilization. Their GPUs sit idle, collecting dust, because potential users either do not know they exist or have no relationship with the operators. These data centers also lack the robust tooling that hyperscalers provide, and regulatory concerns over the origins of the compute make the alternatives even harder to use.
Ultimately, it’s the innovators driving the AI/ML industry forward who are getting hurt. The intelligence revolution driven by AI/ML will clearly be powered by compute, yet with the current GPU bottleneck, the startups, individual consumers, and researchers who normally drive creative innovation cannot experiment. Past technological revolutions were powered by a surplus of some commodity, whether steam, electricity, or internet access. The GPU shortage is causing Open AI to become Closed AI, powered only by the largest tech companies in the world. This situation is unacceptable and needs to be addressed immediately.
We will uncover the implications of centralization in the GPU resource market below.
1. High Pricing and Cost Barriers
The pricing models established by dominant cloud service providers can be prohibitively expensive, particularly for smaller entities and individual developers. This high cost of accessing GPU resources creates barriers to entry and innovation, limiting the democratization of advanced computing technologies.
2. Limited Customization and Flexibility
Users often encounter restrictions in terms of customization and flexibility when relying on centralized providers. The availability of only predefined instances and configurations limits the ability of users to tailor resources to their specific needs, potentially hindering optimal performance and efficiency.
3. Vendor Lock-in
Vendor lock-in is a prevalent challenge, with users becoming heavily dependent on a single provider’s services, tools, and APIs. This dependence makes migration to other platforms difficult, reducing market competition and potentially stifling innovation.
4. Data Privacy and Security Concerns
Because they concentrate user data in one place, centralized structures are attractive targets for cyber-attacks and data breaches. The degree of control and access these providers have over user data also raises significant privacy and security concerns, creating a need for alternative ways to safeguard information.
5. Innovation Stifling
The concentration of GPU resources in the hands of a few is not just a challenge but a potential threat to innovation. It restricts access and affordability, making it difficult for smaller players to compete in a market dominated by well-resourced corporations. The result is less diversity and slower advancement in computing applications, which is a call for action.
6. Market Concentration and Reduced Competition
The dominance of a few major players leads to market concentration, which in turn diminishes competition among cloud service providers. With less competition, providers have fewer incentives to improve service, reduce prices, and innovate.
Decentralized Compute as a Solution
Decentralized compute is the ultimate solution to tackle the limitations of centralized GPU resources. A peer-to-peer cloud model allows anyone with computing resources to offer them in exchange for compensation. Some networks have emerged specifically to provide GPUs for AI workloads, making it possible for AI researchers, startups, and other users to tap into a global network of hundreds to thousands of GPUs.
On the supply side, anyone with a spare GPU can join a decentralized network and offer their computing power. They can install software to connect their hardware to a decentralized compute platform, and their GPUs will be matched with demand from users wanting to train ML models. This way, they can make extra money when their GPUs are unused.
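The matching step described above can be sketched as a small greedy allocator. This is a minimal illustration under stated assumptions, not how any particular network does it: the `GpuOffer`, `TrainingJob`, and `match_job` names, the prices, and the cheapest-first rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GpuOffer:
    """A supplier's listing (hypothetical structure)."""
    provider: str
    gpus: int              # idle GPUs currently on offer
    price_per_hour: float  # asking price per GPU-hour, in USD


@dataclass
class TrainingJob:
    """A user's request for compute (hypothetical structure)."""
    user: str
    gpus_needed: int
    max_price_per_hour: float  # highest acceptable price per GPU-hour


def match_job(job: TrainingJob, offers: List[GpuOffer]) -> List[Tuple[str, int]]:
    """Fill a job from the cheapest acceptable offers first.

    Returns (provider, gpu_count) pairs, or an empty list when the job
    cannot be filled in full. Offers are only decremented on success.
    """
    picked = []
    remaining = job.gpus_needed
    for offer in sorted(offers, key=lambda o: o.price_per_hour):
        if remaining == 0:
            break
        if offer.gpus == 0 or offer.price_per_hour > job.max_price_per_hour:
            continue
        take = min(offer.gpus, remaining)
        picked.append((offer, take))
        remaining -= take

    if remaining > 0:
        return []  # not enough affordable capacity; leave offers untouched

    for offer, take in picked:
        offer.gpus -= take  # reserve the matched GPUs
    return [(offer.provider, take) for offer, take in picked]


offers = [
    GpuOffer("alice", gpus=2, price_per_hour=1.20),
    GpuOffer("bob", gpus=4, price_per_hour=0.90),
    GpuOffer("carol", gpus=8, price_per_hour=2.50),
]
job = TrainingJob("research-lab", gpus_needed=5, max_price_per_hour=1.50)
allocation = match_job(job, offers)
print(allocation)
```

In this example the job is filled from the two suppliers priced under the user's limit, and the more expensive offer is left untouched. A real platform would also need to account for hardware type, location, reliability, and network bandwidth between nodes.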
Decentralized platforms use blockchains for transparency, encryption, and incentives to facilitate transactions between individual GPU suppliers and model trainers. Smart contracts guarantee payment for contributors while keeping order in these peer-to-peer networks.
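The payment guarantee can be pictured as an escrow: the user locks funds up front, and they are released to the supplier only once the job is confirmed complete. The sketch below models that idea in plain Python as a toy state machine. It is not a real smart contract and does not reflect any specific platform's implementation; the `JobEscrow` class and its methods are hypothetical.

```python
from typing import Tuple


class JobEscrow:
    """Toy escrow: funds are locked at creation and paid out exactly once."""

    def __init__(self, user: str, provider: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("escrow amount must be positive")
        self.user = user
        self.provider = provider
        self.amount = amount
        self.state = "FUNDED"  # FUNDED -> PAID or FUNDED -> REFUNDED

    def _require_funded(self) -> None:
        # Guard against paying out or refunding the same escrow twice.
        if self.state != "FUNDED":
            raise RuntimeError(f"escrow already settled: {self.state}")

    def confirm_completion(self) -> Tuple[str, float]:
        """Release the locked funds to the GPU supplier."""
        self._require_funded()
        self.state = "PAID"
        return (self.provider, self.amount)

    def cancel(self) -> Tuple[str, float]:
        """Return the locked funds to the user, e.g. if the job never ran."""
        self._require_funded()
        self.state = "REFUNDED"
        return (self.user, self.amount)


escrow = JobEscrow(user="research-lab", provider="bob", amount=3.60)
payout = escrow.confirm_completion()
print(payout, escrow.state)
```

On a blockchain, the same rules would be enforced by contract code that neither party can alter after the funds are locked, which is what lets strangers transact without a trusted intermediary.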
Decentralized compute is an excellent way to democratize access to AI model training. It helps distribute power away from Big Tech clouds and makes innovations in ML more accessible and affordable. By fostering a network where individuals around the globe can contribute their GPU power, we’re not just building, but BUIDLing a more inclusive and democratized computing ecosystem.
Given these benefits, companies and developers, especially those immersed in AI and high-performance computing, are turning to decentralized solutions like Spheron Network. With decentralized compute, we can realize the full potential of AI and build a better future for all.
Originally published at https://blog.spheron.network on May 6, 2024.