The Web3 Evolution and the Next Generation Data Center

Nov 4, 2022

Written by Yaqi Zhang (Twitter: @Alphatu4) & Shawn Chang (HardenedVault CEO, Chaos Communication Club member)

Introduction

Web3 has gradually gone mainstream. As the core of IT infrastructure, data centers also play a vital role in the evolution of the Internet. In this article, we analyze why data centers evolved from simple computer rooms to industry clouds, try to predict what the next-generation data center will look like from the perspectives of both traditional business and Web3 business, and examine the underlying reasons for the change.

1. What is a data center?

To understand the shift in data centers, we must first understand what a data center is. According to Wikipedia, a data center is a dedicated space within a building that houses computer systems and related components, such as telecommunications and storage systems. As IT operations are critical to business continuity, it typically includes redundant or backup components and infrastructure for power supply, data communications connectivity, environmental controls, and various security devices. A large data center is an industrial-scale operation that uses as much power as a small town.

Source: https://en.wikipedia.org/wiki/Data_center

In the 1980s, people could deploy computers in various locations because operational requirements were not complex. However, as the complexity of information technology (IT) operations increased, it became necessary to consolidate and control IT resources in specially designed computer rooms, which became known as data centers.

After the 1980s, information technology advanced, the Internet industry grew rapidly, and the dot-com bubble followed in the United States around 2000. Although many companies crashed when the bubble burst, this period also witnessed the development of data centers. Many small and medium-sized Internet startups needed fast Internet connections and uninterrupted operation to deploy their systems, yet lacked the resources to build large Internet data centers of their own. As a result, IDCs (Internet Data Centers) gradually emerged to meet these market needs.

IDC

Source: https://www.datacenterknowledge.com/archives/2014/11/11/idc-amount-of-worlds-data-centers-to-start-declining-in-2017

How has the data center evolved from the earliest computer rooms to today’s common industry clouds?

Data centers date back to the mid-1940s, when the first computer rooms housed large military machines that handled specific data-processing tasks. By the 1960s, with the birth of the mainframe, IBM was deploying dedicated rooms for large companies and government agencies. Growing demand eventually required separate buildings to host these large computers, which marked the birth of the earliest data centers.

In the 1980s, personal computers (PCs) were introduced, and computers needed to be networked to remote servers so that large data files could be accessed. By the 1990s, when the Internet (Web 1.0) became widely available, Internet Exchange (IX) buildings had gradually sprung up in major international cities to meet the needs of the World Wide Web.

What is an Internet exchange point?

An Internet exchange point (IXP) is a physical location through which Internet infrastructure companies such as Internet Service Providers (ISPs) and CDNs connect. These locations exist on the “edge” of different networks and allow providers to share transit outside their network. By having a presence inside an IXP location, companies can shorten their path to the transit coming from other participating networks, thereby reducing latency, improving round-trip time, and potentially reducing costs.

Source: https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/

Data centers have gone through three main periods:

Data Center 1.0: mainly computer rooms (facilities used to house computer systems) between 1990 and 2006. During this period, telecommunications companies provided large enterprises with data center sites, power supplies, networks, communications equipment, and other basic telecommunications resources, along with hosting and line maintenance services.

Data Center 2.0: as the macro environment developed, the Internet industry was gradually commercialized and the number of websites increased. Demand grew for the centralized placement and maintenance of servers, hosts, export bandwidth, and other equipment and resources, and business models such as colocation and website hosting emerged. Later, IDC service providers appeared, offering data storage management, security management, network interconnection, and export bandwidth services built around colocation. Internet companies built or leased their own data centers, but the same problems remained: construction and maintenance costs were relatively high, and it was difficult to scale flexibly as the business grew. In this context, cloud computing emerged, and the period from 2007 to 2013 became the era of the general-purpose computing cloud. The main feature of data centers in this period was a business model based primarily on user leasing.

Data Center 3.0: the industry cloud era from 2014 to 2021. By this time, cloud service providers had become the mainstream business model, and industry clouds reached an unprecedented scale, which in turn implied a high degree of centralization of computing and data. At the beginning of this century, AWS truly kicked off cloud computing: computing became an out-of-the-box public service, and data centers gradually moved from scattered “small stations” to centralized “big factories”, the large-scale, virtualized, integrated data centers built by technology giants. By virtualizing storage and computing power into on-demand capacity, centralization reduced costs for users while providing flexible scaling.

2. What drove the evolution of data centers?

It all comes down to market supply and demand, starting with the demand for web hosting.

As more and more websites appeared, the Internet industry continued to grow. Around the 2000 dot-com boom, many companies were trying to build their own websites. Data centers, with their many servers and cables from different network operators, could provide hosting services and remote servers for these websites. They could also provide technical support: if a website ran into technical problems, the data center operator could immediately replace the machine or switch the connection to keep the website running.

These were the main reasons users chose data centers during this period.

Over the past decade, cloud providers like Microsoft, Google, Amazon, IBM, Oracle, SAP, Salesforce, and others began to emerge. Initially, AWS offered hosted solutions for businesses (IaaS), giving companies the flexibility to access remote servers.

The growth of cloud data centers began when many companies started accessing critical business software applications remotely through the cloud rather than deploying and managing those applications on servers in their own server rooms.

To summarize:

From 1980 to 2000, when the Internet first appeared, general-purpose computers and systems like UNIX were still very costly. The performance of general-purpose x86 machines was not yet sufficient for business needs, but as general-purpose computing ushered in the next era, balancing cost and efficiency became a hard requirement for every business.

From 2000 onward: AWS was established in 2006, and the public cloud gradually became recognized as standard infrastructure. In its early years the public cloud was inexpensive and offered enormous cost advantages; by 2022, however, there is a clear trend of large companies building their own clouds instead of relying on public clouds like AWS.

According to datacenterdynamics.com (2016), Dropbox moved off Amazon Web Services (AWS) and set up its own data center facilities in the US. Akhil Gupta, Dropbox's vice president of infrastructure engineering, said: “Few companies in the world have the same requirements for scale of storage as we do. And even fewer have higher standards for safety and security. We built reliability and security into our design from the start, ensuring that the system stores the data in a safe and secure manner, and is highly available.”

3. What will Web3 and the next-generation (fourth-generation) data center look like?

First, the model of the next-generation data center will differ from today's, driven by the evolution of business and the iteration of the underlying technology.

Second, the next-generation data center will be distributed (confederated).

Why?

Let us analyze this from two different perspectives: traditional business and Web3 business.

First, looking at non-Web3 business, we should pay attention to the Service Mesh movement, which brings distributed properties to many scenarios.

This part draws on Phil Calçado's 2017 article, Pattern: Service Mesh, in which he describes the development of distributed systems and notes that they enable use cases we could not even think about before them.

What happened when people first started networking computers? Since people first thought about getting two or more computers to talk to each other, they envisioned something like this:

A service talks to another to accomplish some goal for an end-user. This is an obviously oversimplified view, as the many layers that translate between the bytes your code manipulates and the electric signals that are sent and received over a wire are missing. The abstraction is sufficient for our discussion, though. Let’s just add a bit more detail by showing the networking stack as a distinct component:

Variations of the model above have been in use since the 1950s. In the beginning, computers were rare and expensive, so each link between two nodes was carefully crafted and maintained. As computers became less expensive and more popular, the number of connections and the amount of data going through them increased drastically. With people relying more and more on networked systems, engineers needed to make sure that the software they built was up to the quality of service required by their users.

And there were many questions that needed to be answered to get to the desired quality levels. People needed to find ways for machines to find each other, to handle multiple simultaneous connections over the same wire, to allow machines to talk to each other when not connected directly, to route packets across networks, encrypt traffic, etc.

Amongst those, there is something called flow control, which we will use as our example. Flow control is a mechanism that prevents one server from sending more packets than the downstream server can process. It is necessary because in a networked system you have at least two distinct, independent computers that don’t know much about each other. Computer A sends bytes at a given rate to Computer B, but there is no guarantee that B will process the received bytes at a consistent and fast-enough speed. For example, B might be busy running other tasks in parallel, or the packets may arrive out-of-order, and B is blocked waiting for packets that should have arrived first. This means that not only A wouldn’t have the expected performance from B, but it could also be making things worse, as it might overload B that now has to queue up all these incoming packets for processing.
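To make flow control concrete, here is a minimal sketch in Go (our illustration, not code from Calçado's article): a bounded buffer stands in for B's receive window, so A blocks instead of flooding B when the buffer is full.

```go
// A minimal sketch of receiver-driven flow control. A bounded channel
// plays the role of B's receive window: when it is full, A must wait
// instead of overwhelming B. Illustrative only; real stacks implement
// this inside TCP rather than in application code.
package main

import (
	"fmt"
	"time"
)

func main() {
	window := make(chan []byte, 4) // B advertises room for 4 packets

	// Computer B: drains packets more slowly than A produces them.
	go func() {
		for pkt := range window {
			time.Sleep(100 * time.Millisecond) // pretend to do work
			fmt.Printf("B processed %d bytes\n", len(pkt))
		}
	}()

	// Computer A: sending into a full window blocks, which is exactly
	// the backpressure that prevents A from overloading B.
	for i := 0; i < 10; i++ {
		window <- make([]byte, 1024)
		fmt.Printf("A sent packet %d\n", i)
	}
	close(window)
	time.Sleep(2 * time.Second) // give B time to drain the remaining packets
}
```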

For a while, it was expected that the people building networked services and applications would deal with the challenges presented above in the code they wrote. In our flow control example, it meant that the application itself had to contain logic to make sure we did not overload a service with packets. This networking-heavy logic sat side by side with your business logic. In our abstract diagram, it would be something like this:

Fortunately, technology quickly evolved and soon enough standards like TCP/IP incorporated solutions to flow control and many other problems into the network stack itself. This means that that piece of code still exists, but it has been extracted from your application to the underlying networking layer provided by your operating system:

This model has been wildly successful. There are very few organizations that can’t just use the TCP/IP stack that comes with a commodity operating system to drive their business, even when high-performance and reliability are required.

What happened when we first started with microservices

Over the years, computers became even cheaper and more omnipresent, and the networking stack described above has proven itself as the de-facto toolset to reliably connect systems. With more nodes and stable connections, the industry has played with various flavors of networked systems, from fine-grained distributed agents and objects to Service-Oriented Architectures composed of larger but still heavily distributed components.

The Service Mesh

This extreme distribution brought up a lot of interesting higher-level use cases and benefits. Take the Service Mesh as an example: in this model, each service has a companion sidecar proxy. Given that services communicate with each other only through the sidecar proxy, we end up with a deployment similar to the diagram below:

Buoyant’s CEO William Morgan observed that the interconnections between proxies form a mesh network. In early 2017, William wrote a definition for this platform and called it a Service Mesh:

A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud native application. In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware.

Probably the most powerful aspect of his definition is that it moves away from thinking of proxies as isolated components and acknowledges the network they form as something valuable in itself.
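To illustrate what such a lightweight proxy does, below is a minimal sidecar-style reverse proxy sketched in Go (our illustration with assumed addresses and ports, not code from any particular mesh implementation): it sits next to the application, forwards traffic to it, and records per-request metrics without the application being aware. In a real mesh, a fleet of these proxies would be configured by a control plane rather than hard-coded.

```go
// A minimal sketch of a sidecar-style reverse proxy: it forwards all
// traffic to the co-located application and records request latency,
// the kind of commodity concern a service mesh data plane handles
// outside the application code. Assumes the app listens on :8080.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	app, err := url.Parse("http://127.0.0.1:8080") // the co-located service
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(app)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		proxy.ServeHTTP(w, r) // forward the request to the application
		// Metrics collection lives in the proxy, not in business logic.
		log.Printf("%s %s took %v", r.Method, r.URL.Path, time.Since(start))
	})

	// The sidecar listens on its own port; other services talk to it,
	// never to the application directly.
	log.Fatal(http.ListenAndServe(":15001", handler))
}
```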

As organisations move their microservices deployments to more sophisticated runtimes like Kubernetes and Mesos, people and organisations have started using the tools made available by those platforms to implement this idea of a mesh network properly. They are moving away from a set of independent proxies working in isolation to a proper, somewhat centralised, control plane.

Looking at our bird’s eye view diagram, we see that the actual service traffic still flows from proxy to proxy directly, but the control plane knows about each proxy instance. The control plane enables the proxies to implement things like access control and metrics collection, which requires cooperation:

The recently announced Istio project is the most prominent example of such a system.

It is still too early to fully understand the impacts of a Service Mesh in larger scale systems. Two benefits of this approach are already evident to me. First, not having to write custom software to deal with what is ultimately commodity code for a microservices architecture will allow many smaller organizations to enjoy features previously only available to large enterprises, creating all sorts of interesting use cases. The second one is that this architecture might allow us to finally realize the dream of using the best tool/language for the job without worrying about the availability of libraries and patterns for every single platform.

Source: https://philcalcado.com/2017/08/03/pattern_service_mesh.html

To sum up, the Service Mesh is representative of the shift from centralized business towards distributed business, a shift supported by both technology and business needs.

Will more foundational businesses gradually move towards decentralization? If so, they will be associated with distributed next-generation data centers.

4. Why will the next generation of data centers be distributed?

Let us explain this in terms of business needs and societal culture.

Firstly, enterprises need to reduce costs, including the cost of building data centers. Some well-known companies have tried to build their own clouds, mainly because current public cloud costs are exceptionally high, while a self-built cloud can significantly reduce them. A typical example is Dropbox, which, by building its own cloud infrastructure, saved nearly $75 million in operating costs over two years.

As noted earlier, datacenterdynamics.com reported in 2016 that Dropbox had moved off Amazon Web Services (AWS) and set up its own data center facilities in the US, with Akhil Gupta, Dropbox's vice president of infrastructure engineering, citing the company's unusual requirements for storage scale, safety, and security.

It is worth noting that Dropbox built its large user base and brand image on AWS in its early years, and that many companies still use AWS to manage their cloud infrastructure. Nevertheless, let us ponder the question: will this reliance on well-known cloud providers last forever?

For example, suppose a startup grows into a large company with hundreds of millions of users. Such companies understand their computing needs very well, and in that case it is more efficient to build computing infrastructure designed entirely around their own business.

If Dropbox builds its own data center, will this be a trend in the industry?

Secondly, there is a correlation between the growth of data centers and Moore's Law. In 1965, Gordon Moore posited that the number of transistors on microchips would double roughly every two years. However, Moore's Law is breaking down, and this will become a bifurcation point for general-purpose computing.
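Stated as a rough formula (our paraphrase of the observation, assuming a two-year doubling period), the transistor count after t years is:

```latex
N(t) = N_0 \cdot 2^{t/2}
```

A decade of that trend thus implies roughly a 32x increase in transistor count.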

Meanwhile, AMD EPYC3 delivers high performance because its microarchitecture is well optimized and built on an advanced process node. Moreover, EPYC3's cost advantage is enormous and will significantly reduce costs for Web3 data centers.

Third, just as BTC emerged as a rebellion against the traditional financial system after the financial crisis, a similar revolution will also occur in IT infrastructure (data centers). Web 1.0 was decentralized initially, but by the Web 2.0 era a few monopolistic technology companies gained huge benefits by exploiting the data of ordinary users, and Web3 tries to solve this problem. The distributed data center, then, could serve as an essential support point for the underlying Web3 business.

Fourth, mobile data centers are gradually developing and can supplement Web3's data center business. A mobile data center can be, for example, a containerized, movable data center. What is its relationship with Web3? In the PoW era, where what matters is demonstrating large amounts of computing power, such mobile data centers can simply be stacked in quantity to supply computing power for Web3 node verification.

5. Web3, Blockchain Features, and Next-Generation Data Centers

What will next-generation data centers look like in the Web3 world? Unlike Web 2.0, Web3's business and infrastructure are deeply bound together. After Ethereum switched to PoS, validators must stake to keep the system running and nodes carry more responsibility as well; becoming a node is usually demanding in terms of equipment and network requirements.

In PoW mode, all nodes can participate in verification, but in a PoS scenario only the elected super nodes become verification nodes; these super nodes usually run on GNU/Linux. Under this scheme, each verification needs more than 2/3 of the nodes to vote in agreement. Let us assume a specific business has dozens of super nodes worldwide: one node is placed in a Hetzner data center in Germany, another in an OVH data center in France, and the Japanese node is hosted in a local server room.
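As a toy illustration of the 2/3 rule (our sketch, not any chain's actual consensus code), a check like the following decides whether a round of votes from the super nodes reaches the required supermajority:

```go
// A minimal sketch of a 2/3 supermajority check, as used conceptually
// in BFT-style PoS validation. Not any chain's real consensus code.
package main

import "fmt"

// hasSupermajority reports whether the "yes" votes strictly exceed
// two thirds of the total number of validator nodes.
// Comparing 3*yes > 2*total avoids floating-point rounding issues.
func hasSupermajority(yesVotes, totalNodes int) bool {
	return 3*yesVotes > 2*totalNodes
}

func main() {
	total := 30 // e.g. dozens of super nodes spread across data centers
	for _, yes := range []int{19, 20, 21} {
		fmt.Printf("%d/%d votes -> supermajority: %v\n", yes, total, hasSupermajority(yes, total))
	}
}
```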

How can we ensure that these servers operate securely? For example, how do we know they have not been tampered with by server room staff or subjected to an Evil Maid attack?

The Impossible Triangle of Public Blockchains

As we discuss next-generation data centers, it is also essential to think about how Web3 business and the inherent characteristics of blockchain relate to them.

In the context of public blockchains, the impossible triangle presents the idea that a blockchain can have decentralization, security, and scalability to varying degrees, but cannot achieve all three to a sufficient extent at once; current blockchain systems are designed to meet only two of the three. Extremely decentralized solutions like BTC (Bitcoin) and XMR (Monero) come at the expense of scalability, which leaves BTC/XMR with a technical architecture that cannot support more complex business. Therefore, for blockchain solutions requiring decentralization and security, a second layer is inevitable if we want to grow and accommodate a broader business ecosystem.

However, layer 2 of the blockchain faces many problems in terms of security.

First, as Ethereum switched to PoS, the introduction of super nodes increases systemic security risk. Under the PoW model, an attacker who wanted to launch something like a 51% attack had to overcome tens of thousands of nodes. Under PoS, the number of nodes an attacker must control to launch an attack is significantly smaller than under PoW, so there is a potential security risk.

Second, with the popularity of cross-chain bridges in the Web3 era, all kinds of cross-chain protocols have flaws; the various bugs that exist in today's cross-chain bridges demand serious attention to security.

The third is supply chain security, chiefly the risk of developers implanting backdoors; infrastructure construction therefore needs to consider security hardening from the bottom up.

Is there a solution? Suppose we sacrifice some of the decentralized features and adopt a confederated architecture. In that case, the impossible triangle in the figure above becomes satisfiable (that is, it can meet both the scalability and security requirements). We believe scalability and security cannot be sacrificed, because complex business cannot be carried out without them.

6. Distributed Web3 data center application example

The application scenarios of the fourth-generation data center are as follows.

As shown in the figure, A is an agricultural town, B is a financial town, C is a mining town, and D is a factory town; the blue ovals represent business nodes, and the green ovals represent confederated servers. These four interconnected towns use a distributed/confederated data center architecture.

First, each node in a town's data center can interact with the business nodes of other towns based on specific business needs, balancing extreme decentralization against efficiency (PoS represents this balance, since PoW cannot carry complex business). We consider this one of the mainstream forms the next-generation Internet (Web3) will take.

Second, in the distributed ledger (blockchain), the confederated servers are somewhat similar to super nodes, and such nodes need to make their security levels public.

Third, the four towns use the fourth-generation data center architecture, and the servers in a single cabinet offer high performance (for example, servers equipped with AMD EPYC3 can provide more than 4,000 CPU cores per cabinet), so computing power that originally required 10 cabinets now needs only about 2.

Fourth, the server firmware used by the four towns is an open-source implementation. While open source does not directly guarantee security, its auditable nature reduces security risk.

Fifth, if users in the four towns deploy their servers in their own homes, the risk of Evil Maid attacks is reduced. However, if users are allowed to place physical servers in any data center in the four towns, then hardware-level security features are required so that users can verify at any time, via remote attestation, that their machines are in the “expected” state of operation.
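As a simplified illustration of the verification step behind remote attestation (our sketch only; real deployments rely on a hardware root of trust such as a TPM measuring and signing boot components, which this toy comparison omits), the owner keeps a set of expected firmware and kernel measurements and rejects any machine whose reported values deviate:

```go
// A toy sketch of checking that a remote machine reports the "expected"
// measurements. Real remote attestation has hardware (e.g. a TPM)
// measure and sign boot components; this only shows the comparison step.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func sha256Hex(data string) string {
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

// expected maps a component name to the hash the owner trusts
// (placeholder values for illustration).
var expected = map[string]string{
	"firmware": sha256Hex("known-good firmware image"),
	"kernel":   sha256Hex("known-good kernel image"),
}

// verify returns false if any reported measurement differs from the
// expected one, i.e. the machine is not in its "expected" state.
func verify(reported map[string]string) bool {
	for component, want := range expected {
		if reported[component] != want {
			fmt.Printf("measurement mismatch for %s\n", component)
			return false
		}
	}
	return true
}

func main() {
	// A tampered kernel (e.g. after an Evil Maid attack) changes its hash.
	reported := map[string]string{
		"firmware": sha256Hex("known-good firmware image"),
		"kernel":   sha256Hex("tampered kernel image"),
	}
	fmt.Println("machine in expected state:", verify(reported))
}
```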

7. Looking to the future

Current technological progress has gradually approached the utopian vision of the future Internet held by early 0ldsk00l hackers and cypherpunks. Drawing spirit from the cypherpunk manifesto, the free software movement, open-source culture, and cryptocurrency, this philosophy and its underlying technology have gradually accelerated the development of Web3.

The recent Ethereum Merge also marks the end of the era of trying to build complex business on the main chain; layer 2 and layer 3 will now play a more important role. What opportunities and challenges might we face in the future?

The boundary between ETH’s main chain and layer 2 will become blurred.

If BTC keeps its status as digital gold in the public consensus, then BTC will carry most of the PoW work; and if that consensus comes to value privacy, some of the computing power will shift to XMR/ZEC.

Web3, which carries complex operations, is clearly moving gradually towards confederation rather than a fully decentralized route. Many people predict that a large number of super nodes will move from the public cloud (mainly AWS) into data centers, which avoids some security risks but still leaves a large number of infrastructure risks at the OS (operating system) level and below the OS (firmware and hardware).

The next-generation data center will become the main force of Web3, and the Ethereum ecosystem will become more diversified.

If you have any suggestions about this article, we want to hear from you. We are eager to communicate with friends who are building in this space.

Twitter: @Alphatu4 @Citypw
