WEB 3.0 or life without websites

Alexander Boldachev
8 min read · Apr 1, 2023


Tim Berners-Lee introduced the term “semantic web” in 1998. The idea was simple: let’s teach computers to differentiate the meaning (sense) of content posted on web pages and combine this content into a single semantic web. Later, the concept of meaningful data linking was associated with the term web 3.0, and the upcoming version of the internet was thought of as a semantic network.

A special format for recording semantic data (RDF), a language for describing subject areas (OWL), and a query language for searching linked data (SPARQL) were developed and standardized. It was assumed that everyone would start semantically tagging their website pages… But something went wrong, and sometime shortly after 2010, web 3.0 was almost forgotten.
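For readers who have never seen these formats, here is a minimal sketch (in Python, using the rdflib library; the example.org URIs and the data are invented purely for illustration) of how a fact is recorded as RDF triples and then retrieved with a SPARQL query:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

# Invented namespace and resources, purely for illustration.
EX = Namespace("http://example.org/")

g = Graph()
g.bind("foaf", FOAF)

# Three triples: ex:alice is a person named "Alice",
# and she is the maker of ex:article1.
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.article1, FOAF.maker, EX.alice))

# A SPARQL query over the linked data: who made ex:article1?
results = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE {
        <http://example.org/article1> foaf:maker ?maker .
        ?maker foaf:name ?name .
    }
""")
for row in results:
    print(row.name)  # -> Alice
```

The point is that the statement "this article was made by Alice" becomes machine-readable: any agent that understands the vocabulary can query it, on this site or any other.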

And now, as several waves of mass interest in digital technologies (big data, the Internet of Things, artificial intelligence, blockchain) have rolled back, the term "web 3.0" is re-emerging, not without the help of Tim Berners-Lee himself. This time, though, it usually carries a new epithet, "decentralized," picked up from the quietly fading blockchain scene. Out of old habit, semantic data search is sometimes added to the general stack of decentralized web 3.0 technologies alongside the Internet of Things and artificial intelligence, but how it is supposed to combine with decentralization is usually left unspecified. Before discussing the place of semantic data representation in the new decentralized internet, though, let us try to understand why the first, purely semantic, approach to web 3.0 failed.

The semantic web was initially conceived exclusively as an extension of the existing internet (then still, of course, web 1.0). That is, the carriers of semantically marked-up data were supposed to be the ordinary pages and other content of millions of diverse websites. The proposal was to give each object (each web page, file, or description of an offline object on a web page) a unique identifier and, through these links, weave all network content into a single semantic network…

Even this much knowledge about the plans for weaving a semantic web is enough to see why they were futile. The weakest link in the project is clearly the use of ordinary web pages as its foundation. First, pages periodically change their addresses or simply disappear along with the entire website and its content. Second, website owners are the last people who should be entrusted with semantic markup: they have no incentive to do it, and if they did take it on, they would write anything at all just to attract users (this is exactly why search engines long ago stopped taking keywords into account when ranking pages). Third, the same content (articles, images, records of offline objects) is duplicated across thousands upon thousands of websites, which rules out unique addressing in principle: each website owner would obviously mark the content as their own. Fourth, even if a semantic network could somehow be laid perfectly over the websites, implementing semantic search would still require storing all pages, with all their duplicate content, in one place, just as modern search engines do. In the end, semantic technologies took hold only where centralized verification of content is possible, as with mega-portals like Google, which began using the semantic data markup proposed by Schema.org.
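For context, this is roughly what such markup looks like: a page embeds a small, machine-readable description of itself using the schema.org vocabulary. The sketch below simply builds that JSON-LD object in Python; the field values are taken from this very article.

```python
import json

# A minimal schema.org-style description of an article, of the kind
# search engines read as JSON-LD embedded in a page's HTML inside
# <script type="application/ld+json"> ... </script>.
article_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "WEB 3.0 or life without websites",
    "author": {"@type": "Person", "name": "Alexander Boldachev"},
    "datePublished": "2023-04-01",
}

print(json.dumps(article_markup, indent=2))
```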

So what is needed for a full-fledged implementation of a global semantic network? First of all, unique identification of resources: semantic linking of content only makes sense if each online and offline object has a single identifier shared by all of its copies. Clearly, this identifier should be assigned to the object either by its author or legal owner, by some authorized party, or ultimately by consensus among many independent individuals. At the same time, the identity of content copies must be guaranteed, that is, it must be possible to reliably verify their authenticity. And, of course, fast access to all content must be provided.
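One common way to satisfy the "identity of copies" requirement (not something prescribed here, just a standard technique) is content addressing: the identifier is derived from the content itself, so every faithful copy yields the same identifier and any copy can be checked against it. A minimal sketch:

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive the identifier from the content itself (content addressing):
    every faithful copy yields the same identifier."""
    return hashlib.sha256(data).hexdigest()

def is_authentic_copy(copy: bytes, claimed_id: str) -> bool:
    """Any node can check a copy against its identifier without
    trusting whoever served it."""
    return content_id(copy) == claimed_id

original = b"Some article text ..."
cid = content_id(original)
print(is_authentic_copy(original, cid))          # True
print(is_authentic_copy(b"tampered text", cid))  # False
```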

It is clear that the simplest way to meet these conditions is to create a single centralized repository of semantically linked data, managed by one organization, which should ensure both unique identification of content and protection against falsification. Such semantic repositories were created (DBpedia, Freebase, OpenCyc), but they did not live up to the hopes placed on them.

However, in recent years, a fundamentally new comprehensive solution for implementing the global semantic network project has emerged — distributed ledger technology (DLT). Initially, the technology was implemented in the form of blockchain networks, and now there are solutions based on directed acyclic graphs.

So, what does the new technology offer us as mandatory architectural solutions? (1) Entries in the DLT network initially have unique identification, (2) are signed by private keys of users who added them to the network, (3) are cryptographically protected from falsification, and (4) are decentralized, meaning they are stored on multiple equal nodes of the network, which prevents their loss and ensures fast search. At the same time, there is no single point of failure and no single control center in the DLT network — changes in the state of data on all nodes occur as a result of consensus, which provides increased resistance to attacks and malicious actions by users.
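A toy sketch of properties (1) to (3) in Python (using the cryptography package for signatures; replication across nodes, property (4), is a network concern and is not shown). This illustrates the general idea, not any particular DLT implementation:

```python
import hashlib
import json
import time

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The author's key pair: entries are signed by the user who adds them (property 2).
author_key = Ed25519PrivateKey.generate()

def make_entry(payload: dict, prev_hash: str) -> dict:
    """Build a ledger entry: a content hash gives it a unique identifier (1),
    and the author's signature plus the link to the previous entry make
    tampering detectable (3)."""
    body = {"payload": payload, "prev": prev_hash, "ts": time.time()}
    serialized = json.dumps(body, sort_keys=True).encode()
    return {
        **body,
        "id": hashlib.sha256(serialized).hexdigest(),
        "sig": author_key.sign(serialized).hex(),
    }

def verify_entry(entry: dict, public_key) -> bool:
    """Recompute the hash and check the signature (raises if the signature is forged)."""
    body = {k: entry[k] for k in ("payload", "prev", "ts")}
    serialized = json.dumps(body, sort_keys=True).encode()
    if hashlib.sha256(serialized).hexdigest() != entry["id"]:
        return False
    public_key.verify(bytes.fromhex(entry["sig"]), serialized)
    return True

genesis = make_entry({"note": "first record"}, prev_hash="0" * 64)
print(verify_entry(genesis, author_key.public_key()))  # True
```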

It turns out that the idea of the semantic web was ahead of its time. The technology on which it could be implemented, one that resolves the very problems that doomed the attempt to deploy the semantic web on web pages, appeared only ten years later. And it took another ten years to understand not just the possibility but the necessity of a symbiosis of semantics and DLT. With DLT, the semantic web finally gets reliable, verifiable, decentralized data storage with unique identification of content and users.

The combination of semantic web technology and distributed ledger technology (DLT) brings the vision of a global semantic network closer to reality. By addressing the challenges faced in earlier attempts to create a semantic web, DLT provides a solid foundation for unique identification, data integrity, and decentralized storage. This symbiosis opens up new possibilities for organizing and searching the internet’s vast amounts of information, making it more accessible, trustworthy, and efficient for users.

So, what does DLT gain from its symbiosis with semantics?

The main advantage of DLT is traditionally considered to be direct (intermediary-free), non-falsifiable interaction between independent counterparties. For cryptocurrency this is indeed the case: a single algorithm for everyone (for the entire network) and no intermediaries. The situation changes fundamentally, however, in networks with so-called smart contracts, the program controllers designed specifically to support free interaction between independent counterparties. The programmer who wrote a given contract-controller becomes another party, another agent in the relationship. We have to trust them without understanding what they have written and, as a rule, without any way to independently verify the contract code. In this situation the programmer acts as an intermediary and as the proverbial single point of failure in the interaction of business agents.

Or consider the legitimacy of the term "decentralized applications" (DApps). What exactly is decentralized about them? Only that they run in a peer-to-peer decentralized DLT network, with network transactions processed uniformly on each of its nodes. In essence they are ordinary contract-controllers written by one programmer (or a team of programmers). And if you run such a "decentralized" application on a separate server (on a single node), it will work perfectly well, losing none of its functionality; all it loses is the security provided by the decentralized DLT network.
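To make the point concrete, here is a toy "contract controller" (not written in any real contract language): a deterministic state-transition function that every node applies to the same transactions in the same order. Nothing in the code itself is decentralized; it is simply replicated.

```python
def transfer(state: dict, tx: dict) -> dict:
    """Move tx["amount"] from tx["from"] to tx["to"] if the balance allows it."""
    balances = dict(state)
    if balances.get(tx["from"], 0) >= tx["amount"]:
        balances[tx["from"]] -= tx["amount"]
        balances[tx["to"]] = balances.get(tx["to"], 0) + tx["amount"]
    return balances

state = {"alice": 10, "bob": 0}
state = transfer(state, {"from": "alice", "to": "bob", "amount": 4})
print(state)  # {'alice': 6, 'bob': 4} -- the same result on one node or on thousands
```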

It turns out that DLT, having solved the problem of decentralizing the control and exchange of transactions at the level of the network protocol and data storage, cannot ensure decentralization at the level of interaction between counterparties in specific activities: to connect counterparties and implement some business function, a special application is needed, written in an entirely centralized way and operated centrally. And, clearly, to extend the functionality of that application you will have to go back to the same intermediary-programmer. As for organizing interaction (data exchange) between different business functions, between working DLT applications written by different programmers, one hardly dares to think about it…

And this is when it is time to remember semantics, Tim Berners-Lee's original idea of teaching computers to distinguish the meaning (sense) of content, and a universal semantic language. While the idea of describing web page content in such a language ultimately proved questionable, using a semantic format for data exchange between DLT network applications looks not just like a good solution but like a natural next step for the technology.

So, on one hand, the DLT network is seen as a basic storage for semantically linked data, ensuring their unique identification, immutability, and accessibility. On the other hand, describing data in a single semantic language for the network solves the problem of implementing truly decentralized interaction of independent DLT applications. In essence, it’s about storing all data in the network as a single semantic graph, and network transactions must have a unified semantic format that is understandable to all applications. This opens up the possibility of writing applications/contracts in a human-readable, semantically defined language. Using a single semantic transaction format is extremely important for the development of the Internet of Things, that is, for the unification of data from all kinds of sensors and the use of this data by independent applications. A single format for recording various data in the global graph is simply a goldmine for artificial intelligence.
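What a "semantic transaction" might look like is sketched below, again with rdflib and invented example.org vocabularies: instead of an opaque byte payload, the transaction is itself a small set of triples that any application on the network can read once it is merged into the shared graph. This is a hypothetical shape, not an existing standard.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

# Invented vocabularies, purely to illustrate the idea of a semantic transaction.
TX = Namespace("http://example.org/tx/")
SENSOR = Namespace("http://example.org/sensor/")

transaction = Graph()
transaction.add((SENSOR.thermo42, TX.reports, TX.reading1))
transaction.add((TX.reading1, TX.value, Literal(21.5, datatype=XSD.decimal)))
transaction.add((TX.reading1, TX.unit, Literal("celsius")))

# Serialized, this is what would be appended to (merged into) the shared graph,
# where any application that understands the vocabulary can use it.
print(transaction.serialize(format="turtle"))
```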

It is significant that the semantic approach offers natural, not contrived solutions to other known problems of DLT networks — such as limited horizontal scalability and low transaction exchange speed. After all, the single semantic graph containing all network data is not homogeneous — it is divided into many weakly connected sectors corresponding to different subject areas, which can serve as the basis for natural network clustering. Users and applications working with data in one subject area can be combined into relatively independent subnetworks or clusters, the nodes of which store only the relevant sectors of the semantic graph. This not only solves the problem of storing the entire volume of data on each DLT network node but also the problem of parallelizing transaction processing from different subject areas. Moreover, the semantic definition of transactions will allow organizing their parallel processing within a single cluster in cases where they are not semantically connected (which is unambiguously determined by the graph).
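One simplistic way to read the condition "not semantically connected" is that two transactions touch disjoint sets of resources in the graph. The sketch below (plain Python, invented identifiers) uses exactly that test to decide whether transactions can be processed in parallel:

```python
def resources(triples):
    """The set of graph resources (subjects and objects) a transaction touches."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

def can_run_in_parallel(tx_a, tx_b) -> bool:
    """Transactions touching disjoint parts of the graph are not semantically
    connected and can be processed concurrently."""
    return resources(tx_a).isdisjoint(resources(tx_b))

tx1 = [("sensor:thermo42", "tx:reports", "reading:1")]
tx2 = [("sensor:hygro7", "tx:reports", "reading:2")]
tx3 = [("reading:1", "tx:flaggedBy", "app:monitor")]

print(can_run_in_parallel(tx1, tx2))  # True:  different graph sectors
print(can_run_in_parallel(tx1, tx3))  # False: both touch reading:1
```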

So, Web 3.0 is a semantic decentralized network consisting of multiple subject-oriented clusters, with a unified semantic transaction format and a single graph data store. Web 3.0 is a new internet that not only implements storage, exchange, and semantic search of any content, but also ensures truly decentralized interaction of independent agents in any activity.

And where are the websites, the web pages for which the semantic web was invented? They are gone. They are simply not needed. In the new network, each user is their own site, or rather a node in one or several clusters of the global peer-to-peer network, as well as the owner of the content they create. And the browser, now a semantic browser, is used not for navigating between millions of pages but for searching and presenting fragments of a single semantic, cryptographically protected graph, stored in a distributed way across those same user nodes.

P.S. It should be noted that the RDF/OWL semantic data description tools mentioned at the beginning of the text turned out to be unsuitable for modeling complex, dynamic, evolving systems, and new solutions are needed to implement a unified semantic space for Web 3.0.
