Sora Emerges: Will 2024 Be the Year of AI+Web3 Revolution?

YBB Capital · Feb 23, 2024

Author: Zeke, YBB Capital

Foreword

On February 16th, OpenAI announced its latest text-to-video generative diffusion model, "Sora," marking another milestone in generative AI with its ability to produce high-quality video across a wide range of visual data types. Unlike AI video tools such as Pika, which generate a few seconds of video from a handful of images, Sora is trained in a compressed latent space of videos and images, breaking them down into spatiotemporal patches for scalable video generation. The model also demonstrates the ability to simulate both the physical and digital worlds, and its 60-second demos have been described as a "universal simulator of the physical world."

Sora continues the "big data-Transformer-Diffusion-emergence" technical path of earlier GPT models, which means its maturity likewise depends on computational power; and because video training requires far more data than text, the demand for computational power is expected to increase further. In our earlier article "Promising Sector Preview: The Decentralized Computing Power Market," we already discussed the importance of computational power in the AI era, and with AI's rising popularity, numerous computing power projects have emerged, lifting the value of other Depin projects (storage, computing power, etc.) along the way. Beyond Depin, this article aims to update and extend those past discussions, considering the sparks that might fly from the intertwining of Web3 and AI, and the opportunities along this trajectory in the AI era.

The Three Major Directions in the Development of AI

Artificial Intelligence (AI) is an emerging science and technology aimed at simulating, extending, and enhancing human intelligence. Since its inception in the 1950s, AI has evolved over more than half a century and has become a key technology driving the transformation of social life and industry. Throughout this process, three major research directions, symbolism, connectionism, and behaviorism, have developed in an intertwined way, laying the foundation for today's rapid advancement of AI.

Symbolism

Also known as logicism or rule-based reasoning, symbolism holds that human intelligence can be simulated through the processing of symbols. This approach uses symbols to represent and manipulate objects, concepts, and their relationships within a problem domain, and employs logical reasoning to solve problems. Symbolism has achieved significant success, especially in expert systems and knowledge representation. Its core idea is that intelligent behavior can be realized through symbol manipulation and logical reasoning, where symbols represent high-level abstractions of the real world.

Connectionism

Also known as the neural network approach, connectionism aims to achieve intelligence by mimicking the structure and function of the human brain. This method constructs networks of numerous simple processing units (similar to neurons) and adjusts the strength of the connections between them (similar to synapses) to facilitate learning. Connectionism emphasizes learning and generalizing from data, making it particularly suitable for pattern recognition, classification, and continuous input-output mapping problems. Deep learning, as an evolution of connectionism, has made breakthroughs in fields such as image recognition, speech recognition, and natural language processing.

Behaviorism

Behaviorism is closely related to the research of biomimetic robotics and autonomous intelligent systems, emphasizing that intelligent agents can learn through interaction with the environment. Unlike the previous two, behaviorism does not focus on simulating internal representations or thought processes but achieves adaptive behavior through the cycle of perception and action. Behaviorism posits that intelligence is manifested through dynamic interaction with the environment and learning, making it especially effective for mobile robots and adaptive control systems operating in complex and unpredictable environments.

Although these three research directions have fundamental differences, they can interact and integrate with each other in practical AI research and applications, collectively propelling the development of the AI field.

The Principles of AIGC

The explosively developing field of Artificial Intelligence Generated Content (AIGC) represents an evolution and application of connectionism, enabling the generation of novel content by mimicking human creativity. These models are trained using large datasets and deep learning algorithms, learning the underlying structures, relationships, and patterns within the data. Based on user prompts, they generate unique outputs including images, videos, code, music, designs, translations, answers to questions, and text. Currently, AIGC is fundamentally composed of three elements: Deep Learning (DL), Big Data, and Massive Computational Power.

Deep Learning

Deep learning, a subfield of Machine Learning (ML), employs algorithms modeled after the human brain's neural networks. The human brain consists of billions of interconnected neurons working together to learn and process information. Similarly, deep learning neural networks (artificial neural networks) are composed of multiple layers of artificial neurons working together inside a computer. These artificial neurons, known as nodes, process data through mathematical computations, and the network uses them to solve complex problems via deep learning algorithms.

Neural networks are organized into layers: an input layer, hidden layers, and an output layer, with parameters connecting the layers (a minimal code sketch follows this list).

· Input Layer: The first layer of the neural network, responsible for receiving external input data. Each neuron in the input layer corresponds to a feature of the input data. For example, in processing image data, each neuron might correspond to a pixel value of the image.

· Hidden Layers: These layers receive data from the input layer and pass it deeper into the network, processing information at different levels and adjusting their behavior as new information arrives. Deep learning networks can have hundreds of hidden layers, which lets them analyze a problem from many perspectives. For instance, when presented with an image of an unknown animal that needs to be classified, you would compare it with animals you already know by examining ear shape, the number of legs, pupil size, and so on. Hidden layers in deep neural networks work in a similar way: if a deep learning algorithm is trying to classify animal images, each hidden layer processes a different feature of the animal in an attempt to classify it accurately.

· Output Layer: The final layer of the neural network, responsible for generating the network’s output. Each neuron in the output layer represents a possible output category or value. For example, in classification problems, each output layer neuron might correspond to a category, while in regression problems, the output layer might have only one neuron, whose value represents the predicted outcome.

· Parameters: In neural networks, connections between layers are represented by weights and biases, which are optimized during training so that the network can accurately recognize patterns in the data and make predictions. Increasing the number of parameters enhances the network's model capacity, i.e., its ability to learn and represent complex patterns in the data, but it also increases the demand for computational power.
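To make the layer structure above concrete, here is a minimal, hypothetical sketch in PyTorch (the framework, layer sizes, and dimensions are illustrative assumptions, not taken from any specific model): a small feed-forward classifier with an input layer, two hidden layers, and an output layer, where each layer's weights and biases are the trainable parameters.

```python
import torch
import torch.nn as nn

# A minimal feed-forward network: input layer -> two hidden layers -> output layer.
# Dimensions are illustrative: 784 input features (e.g. 28x28 pixels), 10 output classes.
class SimpleClassifier(nn.Module):
    def __init__(self, in_features: int = 784, hidden: int = 128, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),  # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden, hidden),       # second hidden layer
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # output layer: one neuron per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SimpleClassifier()
# Each nn.Linear holds weights and biases; these are the "parameters" being trained.
print(sum(p.numel() for p in model.parameters()), "trainable parameters")
```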

Big Data

For effective training, neural networks typically require large, diverse, high-quality, multi-source data. Such big data forms the foundation for training and validating machine learning models: by analyzing it, models can learn the patterns and relationships within the data, enabling predictions or classifications.

Massive Computational Power

The high demand for computational power comes from several factors combined: the complex multi-layer structure of neural networks and their enormous number of parameters; the need to process big data; the iterative nature of training, where every iteration runs forward and backward propagation through each layer, including activation function calculations, loss calculations, gradient calculations, and weight updates; the need for high-precision computation; parallel computing capability; optimization and regularization techniques; and model evaluation and validation. A simplified sketch of the training loop follows.
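To give a rough sense of why these steps are compute-intensive, the hypothetical sketch below reuses the SimpleClassifier from the previous snippet, with random tensors standing in for a real dataset, and runs one training loop: each iteration performs a forward pass, a loss calculation, a backward pass that computes gradients, and a weight update. Real models repeat this over millions of samples and far larger layers.

```python
import torch
import torch.nn as nn

model = SimpleClassifier()                     # defined in the previous sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Random tensors stand in for a real dataset; shapes match the 784-in / 10-class model above.
inputs = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

for step in range(100):                        # real training runs many more iterations
    optimizer.zero_grad()
    logits = model(inputs)                     # forward propagation through every layer
    loss = loss_fn(logits, labels)             # loss calculation
    loss.backward()                            # backward propagation: gradient calculation
    optimizer.step()                           # weight update
```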

Sora

As OpenAI’s latest video generation AI model, Sora represents a significant advancement in artificial intelligence’s ability to process and understand diverse visual data. By employing video compression networks and spatiotemporal patch techniques, Sora can convert massive visual data captured worldwide and from different devices into a unified representation, enabling efficient processing and understanding of complex visual content. Leveraging text-conditioned Diffusion models, Sora can generate videos or images highly matched to text prompts, demonstrating high creativity and adaptability.
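OpenAI has not published Sora's implementation, so the following is only a conceptual sketch of the "spatiotemporal patch" idea: a video tensor is cut into small blocks along time, height, and width, and each block is flattened into a token-like vector, analogous to how a Transformer treats text tokens. All shapes and patch sizes here are illustrative assumptions.

```python
import torch

# A toy "video": 16 frames of 64x64 RGB images (time, channels, height, width).
video = torch.randn(16, 3, 64, 64)

# Illustrative patch size: 4 frames x 16 x 16 pixels per spatiotemporal patch.
pt, ph, pw = 4, 16, 16
T, C, H, W = video.shape

# Cut the video into non-overlapping blocks along time, height, and width,
# then flatten each block into a single vector ("patch token").
patches = (
    video.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
         .permute(0, 3, 5, 1, 2, 4, 6)          # group the patch-index axes together
         .reshape(-1, pt * C * ph * pw)          # one row per spatiotemporal patch
)
print(patches.shape)  # (64, 3072): 4*4*4 patches, each a 4x3x16x16 block flattened
```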

However, despite Sora's breakthroughs in video generation and in simulating real-world interactions, it still faces limitations, including the accuracy of its physical-world simulation, consistency in long video generation, understanding of complex text instructions, and the efficiency of training and generation. Essentially, Sora continues the established "big data-Transformer-Diffusion-emergence" technical path, achieving a kind of brute-force aesthetics through OpenAI's near-monopoly on computational power and its first-mover advantage; other AI companies still have the potential to overtake it through technological innovation.

Although Sora itself has little direct connection to blockchain, I believe that over the next year or two its influence will drive the emergence and rapid development of other high-quality AI generation tools, which will in turn impact Web3 sectors such as GameFi, social platforms, creative platforms, and Depin. A general understanding of Sora is therefore necessary, and how AI will effectively integrate with Web3 in the future is the key question.

The Four Pathways of AI x Web3 Integration

As previously discussed, we can understand that the foundational elements required by generative AI are essentially threefold: algorithms, data, and computing power. On the other hand, considering its universality and the effects of its outputs, AI is a tool that revolutionizes production methods. Meanwhile, the greatest impacts of blockchain are twofold: restructuring production relationships and decentralization. Therefore, I believe the collision of these two technologies can generate the following four pathways:

Decentralized Computing Power

As previously discussed, this section aims to update the status of the computing power landscape. Computing power is indispensable to AI, and the scale of AI's demand for it, which became hard to imagine after the emergence of Sora, has been thrown into sharp relief. Recently, at the 2024 World Economic Forum in Davos, Switzerland, OpenAI's CEO Sam Altman openly stated that computing power and energy are the biggest current constraints, suggesting their future importance may even be comparable to that of currency. Then, on February 10th, Sam Altman announced a startling plan on Twitter to raise 7 trillion USD (roughly 40% of China's 2023 GDP) to overhaul the global semiconductor industry and build a semiconductor empire. My previous thinking on computing power had been limited to national blockades and corporate monopolies; the idea of a single company trying to dominate the global semiconductor industry is genuinely crazy.

The importance of decentralized computing power is therefore self-evident. Blockchain's features can indeed address the current extreme concentration of computing power, as well as the high cost of acquiring specialized GPUs. From the perspective of AI's needs, computing power usage falls into two directions: inference and training. Projects focused on training are still few, because a decentralized network would need to coordinate neural network design across nodes and places extremely high demands on hardware and bandwidth, making it a high-barrier direction that is difficult to implement. Inference, by contrast, is relatively simpler: the decentralized network design is less complex and the hardware and bandwidth requirements are lower, making it the more mainstream direction.

The imagination space for the decentralized computing power market is vast, often attached to the "trillion-dollar" keyword, and it is also the most hype-prone topic of the AI era. However, looking at the plethora of projects that have recently emerged, most seem to be ill-conceived attempts to ride the trend. They wave the banner of decentralization while avoiding any discussion of a decentralized network's inefficiencies. There is also a high degree of homogeneity in design, with many projects looking very similar (a one-click L2 plus mining design), which may ultimately lead to failure and make it difficult to carve out a niche against the traditional AI race.

Algorithm and Model Collaboration System

Machine learning algorithms are algorithms that learn patterns and rules from data and use them to make predictions or decisions. Algorithms are technology-intensive, because designing and optimizing them requires deep expertise and technical innovation. They are the core of training AI models, defining how data is transformed into useful insights or decisions. Common generative AI algorithms include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers, each designed for a particular domain (such as painting, speech recognition, translation, or video generation) or purpose, and then used to train specialized AI models.
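To illustrate what one of these algorithms looks like in code, here is a minimal GAN skeleton (layer sizes and shapes are purely illustrative, not from any production model): a generator maps random noise to fake samples, and a discriminator scores samples as real or fake; training would alternate updates between the two.

```python
import torch
import torch.nn as nn

# Minimal GAN skeleton: a generator maps noise to fake samples,
# a discriminator outputs a real/fake score. Sizes are illustrative.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),           # fake "image" as a 784-dim vector
)
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                         # real/fake logit
)

noise = torch.randn(32, 100)
fake = generator(noise)
score = discriminator(fake)
print(fake.shape, score.shape)                 # torch.Size([32, 784]) torch.Size([32, 1])
```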

So, with so many algorithms and models, each with its unique strengths, is it possible to integrate them into a versatile model? Bittensor, a project that has gained much attention recently, is leading in this direction by incentivizing different AI models and algorithms to collaborate and learn from each other, thereby creating more efficient and capable AI models. Other projects focusing on this direction include Commune AI (code collaboration), but algorithms and models are closely guarded secrets for AI companies and are not easily shared.

Thus, the narrative of an AI collaborative ecosystem is novel and interesting. Such an ecosystem uses blockchain's strengths to connect otherwise isolated AI algorithms and offset their individual weaknesses, but whether it can create corresponding value remains to be seen. After all, the leading AI companies, with their proprietary algorithms and models, have strong capabilities in updating, iterating, and integration; OpenAI, for instance, has evolved from early text generation models to multi-domain generative models in less than two years. Projects like Bittensor may need to find new paths in the domains their models and algorithms target.

Decentralized Big Data

At a simple level, using private data to feed AI, and data annotation, are directions that fit naturally with blockchain, with the main considerations being how to prevent junk data and malicious behavior. Data storage can also benefit Depin projects like FIL and AR. At a more complex level, using blockchain data for machine learning (ML) to address the accessibility of blockchain data is another intriguing direction (one of Giza's explorations).

Theoretically, blockchain data is accessible at any time and reflects the state of the entire blockchain. However, for those outside the blockchain ecosystem, accessing these vast amounts of data is not straightforward. Storing a complete blockchain requires extensive expertise and significant specialized hardware resources. To overcome the challenges of accessing blockchain data, several solutions have emerged within the industry. For example, RPC providers offer node access via APIs, and indexing services make data retrieval possible through SQL and GraphQL, playing a crucial role in addressing the issue. However, these methods have their limitations. RPC services are not suitable for high-density use cases that require large amounts of data queries and often fail to meet the demand. Meanwhile, although indexing services offer a more structured way of retrieving data, the complexity of Web3 protocols makes it extremely difficult to construct efficient queries, sometimes requiring hundreds or even thousands of lines of complex code. This complexity presents a significant barrier for general data practitioners and those with a limited understanding of Web3 details. The cumulative effect of these limitations highlights the need for a more accessible and utilizable method of obtaining and leveraging blockchain data, which could foster broader application and innovation in the field.
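To make the contrast concrete, here is a hypothetical sketch of the two access paths described above, assuming an Ethereum JSON-RPC endpoint and a GraphQL indexing service; the endpoint URLs and the "transfers" entity are placeholders that depend on the actual provider and subgraph schema.

```python
import requests
from web3 import Web3

# 1) Raw node access through an RPC provider (endpoint URL is a placeholder).
w3 = Web3(Web3.HTTPProvider("https://eth-mainnet.example-rpc.com"))
block = w3.eth.get_block("latest")
print(block.number, len(block.transactions))

# 2) Structured retrieval through a GraphQL indexing service.
# The "transfers" entity is hypothetical; real fields depend on the subgraph schema.
query = """
{
  transfers(first: 5, orderBy: blockNumber, orderDirection: desc) {
    from
    to
    value
  }
}
"""
resp = requests.post("https://indexer.example.com/subgraphs/my-subgraph",
                     json={"query": query})
print(resp.json())
```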

Therefore, combining ZKML (Zero-Knowledge Machine Learning, which uses zero-knowledge proofs so that machine learning results can be verified on-chain without redoing the heavy computation there) with high-quality blockchain data could produce datasets that solve the accessibility of blockchain data. AI could significantly lower the barrier to accessing blockchain data, and over time developers, researchers, and ML enthusiasts would gain access to more high-quality, relevant datasets for building effective and innovative solutions.

AI Empowerment for Dapps

Since the explosion of ChatGPT in 2023, AI empowerment for Dapps has become a very common direction. Broadly applicable generative AI can be integrated through APIs to simplify and smarten up data platforms, trading bots, blockchain encyclopedias, and other applications. It can also act as a chatbot (like Myshell) or AI companion (Sleepless AI), and even generate NPCs in blockchain games. However, because the technical barrier is low, most of these are merely light tweaks on top of an API integration, and the integration with the product itself is often far from seamless, so it is rarely highlighted.
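As an example of the low-barrier API integration described above, a hypothetical blockchain-game NPC could be wired to a hosted chat-completion API roughly as follows (shown with the OpenAI Python SDK for illustration; the model choice, prompt, and NPC framing are assumptions, not any specific project's design).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def npc_reply(player_message: str) -> str:
    """Generate an in-game NPC reply from a hosted chat-completion API."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You are a shopkeeper NPC in an on-chain fantasy game. "
                        "Stay in character and keep replies under two sentences."},
            {"role": "user", "content": player_message},
        ],
    )
    return response.choices[0].message.content

print(npc_reply("Do you buy enchanted swords?"))
```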

But with the arrival of Sora, I personally believe that AI empowerment for GameFi (including the metaverse) and creative platforms will be the focus going forward. Given the bottom-up nature of the Web3 field, it is unlikely to produce products that can compete head-on with traditional game or creative companies. However, the emergence of Sora could break this deadlock, perhaps within just two to three years: judging from its demos, Sora already has the potential to compete with micro-drama companies. Web3's active community culture can also spawn a plethora of interesting ideas, and when the only limit is imagination, the barrier between the bottom-up industry and the top-down traditional industry will be broken.

Conclusion

As generative AI tools continue to evolve, we will witness more groundbreaking “iPhone moments” in the future. Despite skepticism towards the integration of AI with Web3, I believe that the current directions are largely correct, with only three main pain points needing resolution: necessity, efficiency, and fit. Although the fusion of these two is still in an exploratory phase, it does not prevent this path from becoming mainstream in the next bull market.

Maintaining sufficient curiosity and openness to new things is an essential mindset. Historically, the transition from horse-drawn carriages to automobiles was settled almost overnight, as inscriptions and earlier NFTs have likewise shown. Holding too many biases only leads to missed opportunities.

About YBB

YBB is a Web3 fund dedicated to identifying Web3-defining projects, with a vision of creating a better online habitat for all internet residents. Founded by a group of blockchain believers who have been active in the industry since 2013, YBB is always willing to help early-stage projects evolve from 0 to 1. We value innovation, self-driven passion, and user-oriented products, while recognizing the potential of crypto and blockchain applications.

Website | Twitter: @YBBCapital
