Reshaping the Boundaries of Computing: The Current State and Prospects of Decentralized Computing Power
Demand for Computing Power
When “Avatar” opened a new era of 3D cinema in 2009 with its unparalleled realism, Weta Digital handled all of the film’s visual effects rendering. In its 10,000-square-foot server farm in New Zealand, the computer cluster processed up to 1.4 million tasks per day and 8 GB of data per second. Even with such capacity, the rendering took over a month of continuous operation to complete. The massive hardware deployment and investment behind “Avatar” set a remarkable milestone in film history.
Earlier that same year, on January 3, Satoshi Nakamoto mined the Bitcoin genesis block on a small server in Helsinki, Finland, earning a 50 BTC block reward. Ever since the birth of cryptocurrency, computing power has played a vital role in the industry.
“The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power.” — Bitcoin Whitepaper
In the context of the PoW (Proof of Work) consensus mechanism, investment in computing power is what secures the chain, and the continual rise in hashrate reflects miners’ ongoing hardware investment and their positive revenue expectations. This real demand for computing power has significantly propelled chip manufacturers: mining chips evolved through CPU, GPU, FPGA, and ASIC phases. Today, Bitcoin mining machines typically use ASIC (Application-Specific Integrated Circuit) chips designed to execute one specific algorithm, SHA-256, as efficiently as possible. The significant economic rewards of Bitcoin drove demand for mining power, but the highly specialized equipment and economies of scale created a siphoning effect among participants, miners and machine manufacturers alike, trending toward capital-intensive centralization.
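The core of SHA-256 proof-of-work is simple to sketch, even though ASICs exist purely to run it at enormous scale. Below is a minimal, illustrative Python version of the nonce search (not real Bitcoin block serialization; the header bytes and difficulty are made up for demonstration):

```python
import hashlib

def mine(block_header: bytes, difficulty_bits: int) -> int:
    """Find a nonce such that SHA-256(SHA-256(header || nonce)) falls
    below a target with `difficulty_bits` leading zero bits. This is
    the loop Bitcoin ASICs execute trillions of times per second."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(hashlib.sha256(
            block_header + nonce.to_bytes(8, "little")).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# A toy difficulty of 16 bits takes ~65,000 attempts on average.
nonce = mine(b"demo block header", difficulty_bits=16)
print("found nonce:", nonce)
```

Because the work is nothing but this one fixed hash function repeated, fixed-function ASIC silicon decisively beats general-purpose chips here, which is exactly the centralizing force the paragraph describes.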
With the advent of Ethereum’s smart contracts, programmability and composability enabled a wide range of applications, especially in DeFi, driving a continual rise in the price of ETH. Ethereum’s mining difficulty, while still in its PoW phase, rose accordingly. Unlike Bitcoin’s SHA-256, Ethereum’s Ethash algorithm was memory-hard and resistant to ASICs, so mining relied on Graphics Processing Units (GPUs) such as the Nvidia RTX series. This reliance on general-purpose hardware set off a market scramble for GPUs and caused a shortage of high-end graphics cards.
On November 30, 2022, ChatGPT, developed by OpenAI, demonstrated epoch-making significance for the AI field. Users marveled at the new experience it offered: like a real person, it could fulfill various requests based on context. The version launched in September 2023 added voice and image capabilities, elevating the user experience to a new level.
However, GPT-4 reportedly involves over a trillion parameters, and its pre-training and subsequent fine-tuning represent the two most computing-intensive stages of building such a model. During pre-training, the model learns from a vast amount of text to grasp language patterns, grammar, and context, which allows it to understand language rules and generate coherent, context-relevant text from input. After pre-training, GPT-4 is fine-tuned to better adapt to specific types of content or styles, enhancing performance and specialization for particular scenarios.
GPT is built on the Transformer architecture, whose self-attention mechanism lets the model weigh relationships between all parts of an input sequence. This drives a sharp increase in demand for computing power: processing long sequences requires extensive parallel computation and the storage of a large matrix of attention scores, which in turn demands significant memory and high-speed data transfer. The mainstream reliance on high-performance GPUs by LLMs (Large Language Models) of similar architecture indicates the scale of investment the field requires. According to estimates by SemiAnalysis, a single training run of GPT-4 costs up to $63 million. And to ensure a smooth interactive experience, GPT-4 also requires substantial computing power for daily inference.
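The quadratic cost described above is visible even in a toy single-head attention implementation. The NumPy sketch below is illustrative only (dimensions are made up, and this is not GPT-4’s actual code): the score matrix has one entry per pair of tokens, so its memory grows with the square of sequence length.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention. The score matrix is
    (seq_len x seq_len): doubling the sequence length quadruples the
    memory and compute spent on scores, which is why long contexts
    are so GPU-hungry."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d_model, d_head = 128, 64, 16                     # toy sizes
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

At n = 128 the score matrix holds 16,384 entries; at a 128K-token context it would hold over 16 billion per head, which is the memory pressure the paragraph refers to.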
Classification of Computing Hardware
Let’s understand the main types of computing hardware: CPU, GPU, FPGA, and ASIC, and how they cater to different computing needs.
Comparing CPU and GPU architectures, GPUs contain far more cores, allowing them to process many computing tasks simultaneously. Their stronger parallel-processing capability makes them suitable for handling large volumes of concurrent work, which is why they are widely used in machine learning and deep learning. CPUs, with fewer but more powerful cores, are better suited to single complex computations or sequential tasks, but are less efficient at parallel workloads. Rendering and neural network computations, which consist largely of repetitive, independent calculations, therefore run more efficiently on GPUs than on CPUs.
FPGA (Field Programmable Gate Array) is a type of semi-custom circuit in the ASIC (Application Specific Integrated Circuit) domain. Comprising an array of small processing units, FPGAs can be considered programmable digital logic circuit chips. Their current applications mainly focus on hardware acceleration, with other tasks still completed on CPUs, allowing FPGAs and CPUs to work in tandem.
ASICs, designed for specific user requirements and electronic system needs, have advantages over general integrated circuits in terms of smaller size, lower power consumption, improved reliability, enhanced performance, increased security, and reduced costs. Hence, in scenarios like Bitcoin mining where only specific computational tasks are needed, ASICs are the most fitting choice. Google has also introduced TPUs (Tensor Processing Units) for machine learning, a type of ASIC, mainly available through Google Cloud’s computing power rental services.
Compared to FPGAs, ASICs are fixed once designed, while FPGAs consist of a large number of digital circuit basic gates and memory integrated into an array. Developers can define the circuit by programming the FPGA configuration, and this programming is changeable. However, given the rapid pace of updates in the AI field, custom or semi-custom chips cannot be reconfigured in time to perform different tasks or adapt to new algorithms. Therefore, the general adaptability and flexibility of GPUs have made them shine in the AI field. Major GPU manufacturers have also optimized their GPUs for AI, such as Nvidia’s Tesla series and Ampere architecture GPUs, which include hardware units optimized for machine learning and deep learning computations (Tensor Cores), enabling them to execute neural network forward and backward propagation more efficiently and with lower energy consumption. They also provide a wide range of tools and libraries to support AI development, like CUDA (Compute Unified Device Architecture) to help developers utilize GPUs for general parallel computing.
Decentralized Computing Power
Decentralized computing refers to a method of providing processing power through distributed computing resources. This decentralized approach often combines blockchain technology or similar distributed ledger technologies to pool and distribute idle computing resources to users in need, achieving resource sharing, trading, and management.
Background of Emergence:
Strong demand for computing hardware. The prosperity of the creator economy has led to the era of mass digital media processing, resulting in an increased demand for visual effects rendering. Specialized rendering outsourcing studios and cloud rendering platforms have emerged, but these methods also require significant initial investment in computing hardware.
Single source of computing hardware. The development of the AI field has intensified the demand for computing hardware, with global GPU manufacturing enterprises, led by Nvidia, profiting handsomely in the AI computing race. Their supply capacity has even become a key factor that can constrain the development of certain industries, with Nvidia’s market value exceeding one trillion dollars for the first time this year.
Reliance on centralized cloud platforms for computing power. Currently, centralized cloud providers like AWS, which offer GPU cloud computing services, are the main beneficiaries of the surge in high-performance computing demand. For instance, renting an AWS p4d.24xlarge machine, specialized in ML and HPC, with 8 Nvidia A100 40GB GPUs, costs $32.8 per hour, with an estimated gross margin of up to 61%. This has prompted other cloud giants to join the fray, hoarding hardware to gain an advantageous position in the industry’s early stages.
Political and human interventions leading to uneven industry development. It’s not hard to see that the ownership and concentration of GPUs are skewed towards organizations and countries with abundant capital and technology, and they are heavily reliant on high-performance computing clusters. This has led countries like the United States, a leading semiconductor manufacturing powerhouse, to impose stricter restrictions on AI chip exports, aiming to weaken other countries’ research capabilities in general-purpose AI.
Centralized distribution of computing resources. The AI field’s development is largely controlled by a few giant companies, with OpenAI, backed by Microsoft’s Azure’s abundant computing resources, leading the way. Each new product launch by OpenAI reshapes and consolidates the AI industry, leaving other teams struggling to compete in the large model domain.
So, facing high hardware costs, regional restrictions, and uneven industry development, is there an alternative solution?
Decentralized computing platforms have emerged in response, aiming to create an open, transparent, and self-regulating market to more effectively utilize global computing resources.
Adaptability Analysis
Supply Side of Decentralized Computing Power
The high prices of hardware and the artificial control on the supply side have provided fertile ground for the construction of decentralized computing power networks.
Looking at the composition of decentralized computing power, providers range from personal PCs and small IoT devices to data centers and IDCs. The aggregated power offers more flexible and scalable computing solutions, helping more AI developers and organizations use limited resources more effectively. However, the availability and stability of this power depend on the providers’ own usage patterns and the limits they place on sharing.
A potential high-quality source of computational power could be the resources provided by mining farms transitioning after Ethereum’s switch to Proof of Stake (PoS). For example, Coreweave, a leading GPU-integrated computational power provider in the United States, was formerly North America’s largest Ethereum mining farm, built on a robust infrastructure. Additionally, retired Ethereum mining machines, including a large number of idle GPUs (estimated at around 27 million working GPUs at the peak of Ethereum’s mining era), could further serve as a significant source of computational power for decentralized networks.
Demand Side of Decentralized Computational Power
Technically, decentralized computational resources are currently used in tasks like graphic rendering and video transcoding, which have lower computational complexity. Combined with blockchain technology and the Web3 economic system, these tasks provide tangible profit incentives for network participants, developing effective business models and customer bases. The AI field, however, involves extensive parallel computing and requires high standards for network environment due to the need for communication and synchronization between nodes. Therefore, applications in this area are currently focused more on fine-tuning, inference, and AI-generated content (AIGC).
From a business perspective, a market built solely on raw computational power offers little differentiation, forcing competition on supply chains and pricing strategies, areas where centralized cloud services already excel. The market ceiling is therefore low, with limited room for growth. This is why networks originally focused on graphic rendering, like Render Network, are pursuing AI transformation: in Q1 2023, Render Network launched an integrated Stability AI toolset that lets users submit Stable Diffusion jobs, extending its services beyond rendering into AI.
In terms of primary customer base, large B2B clients tend to prefer centralized integrated cloud services due to their ample budgets and the need for efficient computational power aggregation for developing fundamental large models. Decentralized computational power, on the other hand, caters more to small and medium-sized development teams or individuals working on model fine-tuning or application layer development. These users are more price-sensitive and benefit from the fundamental cost reduction of decentralized computational power. For instance, Gensyn’s previous cost estimates show that their computational power, equivalent to V100, costs only $0.4 per hour, compared to AWS’s $2 per hour for the same type, a reduction of 80%. Though this segment doesn’t currently dominate industry expenses, the expanding use cases of AI applications suggest a significant future market potential.
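The 80% figure follows directly from the two hourly rates quoted above:

```python
aws_hourly = 2.00      # quoted AWS on-demand rate for V100-class compute, $/hour
gensyn_hourly = 0.40   # Gensyn's estimated V100-equivalent rate, $/hour

# Relative cost reduction for a price-sensitive fine-tuning workload.
reduction = (aws_hourly - gensyn_hourly) / aws_hourly
print(f"cost reduction: {reduction:.0%}")   # cost reduction: 80%
```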
In terms of services provided, current projects resemble decentralized cloud platforms, offering a complete set of tools for development, deployment, launch, distribution, and trading. This attracts developers by simplifying development and deployment, thereby enhancing efficiency. It also attracts users to the platform to use these complete application products, forming an ecosystem moat based on the network’s own computational power. However, this also imposes higher operational demands on the projects, making it crucial to attract and retain skilled developers and users.
Application in Different Fields
1. Digital Media Processing
Render Network is a blockchain-based global rendering platform aimed at assisting creators with digital creativity. It enables creators to scale GPU rendering jobs to global GPU nodes on demand, offering faster and cheaper rendering capabilities. Payments to nodes are made in tokens via the blockchain network after creators approve the rendered results. This approach is more cost-effective compared to traditional methods of establishing local rendering infrastructure or adding GPUs in purchased cloud services.
Since its inception in 2017, Render Network users have rendered over 16 million frames and nearly 500,000 scenes. The data released in Q2 2023 by Render Network shows a growing trend in the number of rendered frames and active nodes. Additionally, in Q1 2023, Render Network integrated the Stability AI toolset natively, allowing users to incorporate Stable Diffusion tasks, expanding its services beyond rendering to AI.
Livepeer offers real-time video transcoding services to creators by leveraging the GPU power and bandwidth contributed by network participants. Broadcasters send videos to Livepeer for transcoding into various formats and distribution to end users, facilitating the dissemination of video content, and pay fees for services like transcoding, transmission, and storage.
In the Livepeer network, anyone can contribute personal computing resources (CPU, GPU, and bandwidth) for transcoding and distributing videos to earn fees. The native token (LPT) represents participants’ stake in the network, and the amount of staked tokens determines a node’s weight in the network, influencing its chances of receiving transcoding tasks. LPT also guides nodes to complete assigned tasks securely, reliably, and quickly.
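Stake-weighted task assignment can be sketched in a few lines. This illustrates the principle only, not Livepeer’s actual orchestrator-selection code; node names and stake figures are made up:

```python
import random

def pick_node(stakes: dict, rng: random.Random) -> str:
    """Stake-weighted random selection: a node's chance of winning a
    transcoding job is proportional to the tokens staked on it."""
    nodes = list(stakes)
    return rng.choices(nodes, weights=[stakes[n] for n in nodes], k=1)[0]

# Hypothetical nodes with 50% / 30% / 20% of total stake.
stakes = {"node-a": 5000.0, "node-b": 3000.0, "node-c": 2000.0}
rng = random.Random(42)

wins = {n: 0 for n in stakes}
for _ in range(10_000):
    wins[pick_node(stakes, rng)] += 1
print(wins)  # win counts roughly proportional to stake
```

Larger stakes win jobs more often, which is how the staking requirement ties a node’s earning power to its economic commitment to behave honestly.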
2. Expansion in the AI Field
In the current AI ecosystem, the requirements for computational power vary at different stages of the industry. For instance, in the development of underlying models, the pre-training phase demands high standards for parallel computing, storage, and communication, requiring large computational clusters. Currently, the main supply of computational power primarily relies on self-built data centers and centralized cloud service platforms. In later stages like model fine-tuning, real-time inference, and application development, the requirements for parallel computing and inter-node communication are not as high, providing an opportunity for decentralized computational power.
Akash Network has made some attempts in decentralized computational power. It combines various technical components, allowing users to efficiently and flexibly deploy and manage applications in a decentralized cloud environment. Users can package applications using Docker container technology and then deploy and scale them on Akash-provided cloud resources via CloudMOS. Akash uses a “reverse auction” method, making its prices lower than traditional cloud services. In August of this year, Akash Network announced the 6th major upgrade to its mainnet, incorporating GPU support into its cloud services, expanding computational power supply to more AI teams.
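The “reverse auction” idea is that providers bid downward for a tenant’s workload and the lowest qualifying price wins, the opposite of an ordinary auction. The sketch below is a minimal illustration (provider names and prices are hypothetical; real Akash bids also carry provider attributes and audit requirements):

```python
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price_per_block: int   # price quoted per block of compute time

def reverse_auction(bids: list) -> Bid:
    """Providers compete downward on price; the tenant accepts the
    cheapest bid. This competitive pressure is why marketplace prices
    can undercut fixed cloud list prices."""
    if not bids:
        raise ValueError("no bids received")
    return min(bids, key=lambda b: b.price_per_block)

bids = [Bid("dc-us-east", 120), Bid("home-rig-eu", 85), Bid("idc-asia", 95)]
winner = reverse_auction(bids)
print(winner.provider, winner.price_per_block)   # home-rig-eu 85
```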
Gensyn.ai, a notable project this year that raised a $43 million Series A led by a16z, is a layer-1 proof-of-stake protocol built in the Polkadot ecosystem, focused on deep learning. It aims to push the boundaries of machine learning by creating a global supercomputing cluster network, connecting devices ranging from data centers with surplus computational power to personal PCs with idle GPUs, as well as custom ASICs and SoCs.
To address some issues in decentralized computational power, Gensyn has drawn on new academic research findings:
1. Probabilistic learning proofs, which use metadata from gradient-based optimization processes to build proofs of task execution, speeding up verification.
2. The Graph-based Pinpoint Protocol (GPP), which bridges the off-chain execution of DNNs (Deep Neural Networks) with the blockchain’s smart contract framework, resolving inconsistencies across hardware devices and ensuring consistent verification.
3. Truebit-style incentives, combining staking and penalties to establish a system in which economically rational participants honestly execute their assigned tasks. The mechanism uses cryptographic and game-theoretic methods essential for maintaining the integrity and reliability of large-scale model training computations.
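The economic logic of the staking-and-penalty point can be made concrete with a small expected-value sketch (all parameters here are illustrative, not Gensyn’s actual values): as long as the stake is large relative to the reward and challenges occur often enough, cheating has negative expected payoff.

```python
def expected_payoff(honest: bool, reward: float, stake: float,
                    p_challenge: float) -> float:
    """Truebit-style incentive model: a solver posts `stake` before
    computing. Honest work always earns `reward`; cheating earns the
    reward only if no one challenges, and loses the stake when a
    challenge catches the wrong result."""
    if honest:
        return reward
    return (1 - p_challenge) * reward - p_challenge * stake

# With a stake 5x the reward, even a 30% challenge rate makes
# cheating unprofitable for a rational participant:
print(expected_payoff(True,  reward=1.0, stake=5.0, p_challenge=0.3))  # 1.0
print(expected_payoff(False, reward=1.0, stake=5.0, p_challenge=0.3))  # negative
```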
However, it’s important to note that the above content focuses more on task completion verification, rather than on decentralized computational power for model training, especially regarding the optimization of parallel computing and communication/synchronization among distributed hardware. Current challenges like network latency and bandwidth limitations can increase iteration time and communication costs, potentially reducing training efficiency. Gensyn’s approach to managing node communication and parallel computing in model training might involve complex coordination protocols to manage the distributed nature of computing. However, without more detailed technical information or a deeper understanding of their specific methods, the exact mechanism by which Gensyn achieves large-scale model training through its network remains to be revealed upon project launch.
The Edge Matrix Computing (EMC) protocol applies computational power to scenarios such as AI, rendering, scientific research, and AI e-commerce integration through blockchain technology. It distributes tasks to different computational nodes via elastic computing, improving the efficiency of computational power usage and securing data transmission. EMC also offers a computational power marketplace where users can access and exchange computing resources, making deployment easier for developers and helping products reach users faster. Combined with the Web3 economic model, computational power providers earn real profits and protocol subsidies based on actual usage, while AI developers benefit from lower inference and rendering costs.
EMC is expected to launch GPU-based RWA (Real-World Asset) products, which aim to mobilize hardware typically fixed in data centers, by dividing and circulating them as RWAs to gain additional financial liquidity. High-quality GPUs can serve as the underlying assets for RWAs because computational power is a hard currency in the AI field, with a clear supply-demand contradiction that is not expected to be resolved in the short term, thus stabilizing GPU prices.
Deploying IDC (Internet Data Center) rooms to build computational clusters is also a key focus of the EMC protocol. This lets GPUs operate efficiently in a unified environment for large-scale tasks like model pre-training and meets the needs of professional users. IDC rooms can also centrally manage and operate large numbers of GPUs, guaranteeing uniform technical specifications across similar high-quality hardware, which makes it convenient to package them as RWA products for the market and opens new possibilities in DeFi.
Recent academic developments in edge computing have introduced new technological theories and practical applications. As a complement and optimization of cloud computing, a part of artificial intelligence is rapidly moving from the cloud to the edge, entering increasingly smaller IoT devices. These devices are often small in size, hence the preference for lightweight machine learning to address issues of power consumption, latency, and accuracy.
Network3 has built a specialized AI Layer2, offering services to AI developers globally through AI model algorithm optimization and compression, federated learning, edge computing, and privacy computing. It enables them to train or validate models quickly, conveniently, and efficiently. By leveraging a large number of intelligent IoT hardware devices, Network3 focuses on smaller models for computational power supply. It also constructs a TEE (Trusted Execution Environment) that allows users to complete related training by uploading only model gradients, ensuring the privacy and security of user data.
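The gradient-only upload Network3 describes is the core idea of federated learning. Below is a minimal FedAvg-style round as an illustrative NumPy sketch (the TEE itself is not modeled; weights and gradients are toy values): devices keep their raw data local and upload only gradients, which the server averages into the shared model.

```python
import numpy as np

def federated_round(global_w, client_grads, lr=0.1):
    """One FedAvg-style round: edge devices never upload raw data,
    only model gradients; the server averages the gradients and
    applies one update to the shared weights."""
    avg_grad = np.mean(client_grads, axis=0)
    return global_w - lr * avg_grad

w = np.zeros(4)                               # shared model weights
grads = [np.array([1.0, 0.0, 2.0, 0.0]),      # gradient from device 1
         np.array([3.0, 0.0, 0.0, 4.0])]      # gradient from device 2
w = federated_round(w, grads)
print(w)
```

A TEE, as described above, would additionally keep each uploaded gradient sealed from the platform operator, so even the aggregation step cannot inspect individual contributions.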
Conclusion
Based on the above discussion, as fields like AI continue to develop, many industries will undergo significant fundamental changes. Computational power will rise to a more prominent position, and all related aspects will spark widespread exploration within the industry. Decentralized computational power networks have their own advantages, capable of reducing centralized risks, and can complement centralized computational resources.
Moreover, teams in the AI field stand at a crossroads: whether to build products on top of pre-trained large models or to invest in training large models of their own. Such choices are often dialectical. The ability of decentralized computational power to meet diverse business needs is therefore a welcome trend, and with technological updates and algorithm iterations, breakthroughs in key areas are bound to occur.