A Brief Summary of the Linked Data

Published in Thinkerfox, Sep 26, 2019

Listening to some music and wondering about the lead guitarist’s other affiliations? For this kind of information, you might check Wikipedia, or you can “Google it”: find a page on the Web and get the data you seek.

What we call a “web page” is a document in a particular format located on some computer. Many of these pages together make up what we call “the Web.” The relationship between these web pages is what we call a “hyperlink.” Traditionally, this relationship is implicit: the data format, HTML, is not sufficiently expressive to let individual items described in one document be connected by typed links to related entities. In other words, a link from one web page to another is not enough. There must be links between the “entities” on these web pages.

The last sentence raises a fair question: What is an entity? As far as the Semantic Web is concerned, everything that you can describe by its properties is an entity. This article is an entity; the reader of this article is an entity; John Lennon is an entity, and “The Beatles” is an entity.

In today’s digital world, the data on the Web is losing much of its structure and semantics. The semantics of the information is important because, as humans, we can recognize entities when we see them (most of the time), but it is nearly impossible for a machine to interpret them as a human would. Users of a web browser jump from one piece of information to the next; search engines, by contrast, index the documents and analyze the links between them to return the most relevant results. This approach would be much more effective on a Web where all the data is linked across documents in a globally agreed way, so that a machine cannot misinterpret it.

What is Linked Data?

Before describing Linked Data, we should first understand structured data. As described above, an entity has properties and relationships, often more than a single document mentions. With that in mind, let’s review the following paragraph in terms of structured data:

“9-time GRAMMY winner Norah Jones comes full circle with Day Breaks, her stunning sixth solo album which is a kindred spirit to the singer’s breakout debut Come Away with Me and finds Norah returning to the piano and her roots. The album features jazz luminaries including her Blue Note label mates saxophonist Wayne Shorter, organist Dr. Lonnie Smith, and drummer Brian Blade on a 12-song set that presents nine new originals alongside covers of songs by Horace Silver, Duke Ellington, and Neil Young.”

We can almost immediately pick up the fact that we are talking about an artist, a human being named Norah Jones, who has won Grammy awards, a globally recognized prize given yearly in the music industry. This person has recorded a new album. What is an album? An album is a collection of songs. What is a song? A song is a piece of music, usually with lyrics. Norah Jones is returning to the piano, which is an instrument. An instrument is a thing used for performing music. Other people, like Duke Ellington and Dave Brubeck, play this instrument as well. This explanation is a human’s interpretation, and it is meaningful to a human. For a machine, however, these are still ones and zeros, and there is no difference between “Norah Jones is recording an album” and “I’m going to the moon with a bathtub filled with batteries for the safety of Donald Duck.” There should be a difference, because the latter makes no sense at all. To a machine, it is just a different combination of bits, and that is all that matters.

Let’s take the first sentence from the paragraph above and decompose it into its entities: “9-time GRAMMY winner Norah Jones comes full circle with Day Breaks, her stunning sixth solo album which is a kindred spirit to the singer’s breakout debut Come Away with Me and finds Norah returning to the piano and her roots.”

Now imagine that these explanations are properly encoded in a machine-understandable way. By reading this encoded information, a machine could easily see the similarity between “Day Breaks” and “Come Away with Me”: both have the same type (MusicAlbum), and therefore they are similar entities. If we continue to define these entities along with their properties, we get more detailed results. For example, a MusicAlbum entity for the “Day Breaks” album would carry properties such as its name, its artist, and its tracks.

Describing an entity with its properties is what we mean by “semantics” for a machine. We use a “schema” to describe the features of a music album. For more details of a MusicAlbum entity, you can visit https://schema.org/MusicAlbum
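To make the idea of an entity with typed properties more concrete, here is a minimal sketch in Python that writes two albums as JSON-LD-style dictionaries using schema.org terms. The property values come from the paragraph quoted earlier; the choice of properties (name, byArtist, numTracks, albumRelease, recordLabel) and the helper same_type are illustrative assumptions, not a complete or authoritative description of either album.

```python
import json

# A hand-written description of "Day Breaks" using schema.org vocabulary.
# The selection of properties is an assumption made for illustration.
day_breaks = {
    "@context": "https://schema.org",
    "@type": "MusicAlbum",
    "name": "Day Breaks",
    "byArtist": {"@type": "Person", "name": "Norah Jones"},
    "numTracks": 12,
    "albumRelease": {
        "@type": "MusicRelease",
        "recordLabel": {"@type": "Organization", "name": "Blue Note Records"},
    },
}

come_away_with_me = {
    "@context": "https://schema.org",
    "@type": "MusicAlbum",
    "name": "Come Away with Me",
    "byArtist": {"@type": "Person", "name": "Norah Jones"},
}


def same_type(entity_a, entity_b):
    """Return True when both entities declare the same schema.org type."""
    return entity_a["@type"] == entity_b["@type"]


if __name__ == "__main__":
    # Serialize the entity the way it could be embedded in a web page.
    print(json.dumps(day_breaks, indent=2))
    print(same_type(day_breaks, come_away_with_me))
```

Because both dictionaries declare the type MusicAlbum, a program can tell that they describe the same kind of thing without understanding any of the surrounding prose.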

Relationship between Entities

Now that we have established the idea of structured data, it is time to define the relationships between these structured entities. In the example above, it is clear that the MusicAlbum object is related to other objects such as a Person (Norah Jones) or an Organization (Blue Note Records). In other words, every entity-card refers to other entity-cards. These cards do not have to be in the same document: some information might be on one page, and more detailed information might be on another. For example, we can say that Norah Jones was born in the United States, and we know that the United States is a “Country.” As a “Country,” the United States has many properties, such as its population, current president, flag, and national anthem. However, we can’t embed every piece of information about the United States into a document that only contains an article about Norah Jones’ latest album. That detailed information appears on other pages, such as Wikipedia. Instead of embedding it all in the Norah Jones article, we can link our “United States” entity to another entity on Wikipedia, where it is defined in greater depth. As you can imagine, in that document the president of the United States would also be an entity, which means it would take only two links to go from Norah Jones to the president of the United States.
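The idea of hopping from one entity to another across documents can be sketched as a small graph search. In the Python sketch below, every URI is a hypothetical example.org address standing in for an entity described on some page, and the property names (byArtist, parent) are borrowed from schema.org for illustration only. It uses a different pair of links than the paragraph above, going from an album to its artist and then to one of her parents.

```python
from collections import deque

# Hypothetical entity URIs. Each entry lists the typed links that the
# document describing that entity makes to other entities.
LINKS = {
    "https://example.org/album/day-breaks": {
        "byArtist": "https://example.org/person/norah-jones",
    },
    "https://example.org/person/norah-jones": {
        "parent": "https://example.org/person/ravi-shankar",
    },
    "https://example.org/person/ravi-shankar": {},
}


def hops_between(start, goal, links=LINKS):
    """Breadth-first search: number of links from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for target in links.get(node, {}).values():
            if target not in seen:
                seen.add(target)
                queue.append((target, distance + 1))
    return None


if __name__ == "__main__":
    print(hops_between(
        "https://example.org/album/day-breaks",
        "https://example.org/person/ravi-shankar",
    ))
```

The album document never mentions the artist's parents, yet a program that follows typed links can still reach that entity in two steps.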

Finally, Linked Data:

Linked Data is a set of best practices for using the Web to connect entities by creating machine-readable links between data from different sources.

In the following section, we’ll explain Linked Data further and define the characteristics of Linked Data.

Elements of Linked Data

In 2006, Tim Berners-Lee introduced the fundamental design principles of Linked Data:

a. Use URIs as names for things

b. Use HTTP URIs so that people can look up those names.

c. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

d. Include links to other URIs, so that they can discover more things.
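As a rough illustration of principles b and c, the Python sketch below prepares an HTTP lookup of a thing's URI and asks for an RDF serialization (Turtle) instead of an HTML page. The DBpedia resource URI is used as an example, and the assumption that the server honors the Accept header in this way is mine; the request is only built here, not sent. The helper name build_lookup is hypothetical.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for RDF data."""
    # The Accept header tells the server which representation we prefer.
    return Request(uri, headers={"Accept": media_type})


# Example URI naming a thing (principles a and b). Assumed, not verified here.
request = build_lookup("http://dbpedia.org/resource/Norah_Jones")

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup. The returned data would then contain further URIs to follow, which is principle d.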

We will now briefly address these elements mentioned above.

Uniform Resource Identifiers (URIs):

As we all know, a URL (Uniform Resource Locator) is a universal way of expressing the location of a resource on the Web. A URI (Uniform Resource Identifier) is the more general notion: any string that identifies a resource, whether or not it says where to find it. For example, imagine typing “http://www.example.com/my-page/details" into a browser. This URL has three parts: the scheme (here HTTP, which determines the type of request), the authority (www.example.com), and the path (/my-page/details). We could specify a “query” and a “fragment” as well. The main idea, however, is that a URL gives the “location” of a resource, not just its identity.
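The parts of a URL named above can be inspected with Python's standard library. In this short sketch, the query (lang=en) and fragment (summary) are made-up additions to the example address, included only to show where those two optional parts appear.

```python
from urllib.parse import urlsplit

# The query and fragment below are hypothetical additions for illustration.
parts = urlsplit("http://www.example.com/my-page/details?lang=en#summary")

if __name__ == "__main__":
    print("scheme:   ", parts.scheme)
    print("authority:", parts.netloc)
    print("path:     ", parts.path)
    print("query:    ", parts.query)
    print("fragment: ", parts.fragment)
```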

For example, “Serhat Uzunçavdar” is my name. My name can be a URI because it identifies a resource. However, it cannot be a URL, because it doesn’t say anything about how to reach me. On the other hand, “Boğaziçi University, Northern Campus, Department of Computer Engineering” can be a URL because it holds information about a “location” as well as a resource.

Resource Description Framework (RDF):

RDF is a widely adopted standard for describing resources on the Web. DBpedia, the “semantic sibling” of Wikipedia (the most recognized encyclopedia on the Web), publishes data extracted from Wikipedia as RDF and has become a core hub of Linked Open Data. RDF improves on the linkage structure of the current Web: its links define the relationships among entities as well as the entities themselves. Each statement is a triple of subject, predicate, and object, so the structure forms a directed graph in which nodes are the entities and edges are the relationships.
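The directed-graph view of RDF can be sketched without any RDF library. In the Python example below, each statement is a (subject, predicate, object) tuple. The example.org namespace and the predicate names are hypothetical; a real dataset would use terms from an established vocabulary. The helper to_ntriples prints the statements in the line-based N-Triples syntax, handling only the URI-to-URI case.

```python
from collections import defaultdict

EX = "https://example.org/"  # hypothetical namespace for illustration

# Each triple is one edge: subject --predicate--> object.
triples = [
    (EX + "DayBreaks", EX + "byArtist", EX + "NorahJones"),
    (EX + "ComeAwayWithMe", EX + "byArtist", EX + "NorahJones"),
    (EX + "NorahJones", EX + "plays", EX + "Piano"),
]


def build_graph(statements):
    """Group outgoing edges by subject node."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def to_ntriples(statements):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


if __name__ == "__main__":
    graph = build_graph(triples)
    print(graph[EX + "NorahJones"])
    print(to_ntriples(triples))
```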

Works & Toolsets

SOLID (Social Linked Data Project):

SOLID is one of the most significant projects on Linked Open Data. It is led by Tim Berners-Lee himself at MIT and supported by Inrupt, a company that Berners-Lee also co-founded. The SOLID project is based on existing web standards and has been under substantial development for the last 15 years. The basic idea behind SOLID is to encourage everyone to contribute to the Web instead of just reading its content.

So, what is SOLID? SOLID is an ecosystem, where you can store your data: photos, contacts, events, activities… As of November 2018, anyone can get a free space called a SOLID POD. These PODs can connect to various SOLID APPS. However, as of November 2018, there was no published app in the SOLID ecosystem. Once there is, you’ll be able to give these apps permissions to read/write to your PODs. Also, you’ll be able to assign similar permissions to other people as well. This way, your data will always stay with you in one place, synced all the time.

DBpedia & Wikidata:

DBpedia is simply the sibling of Wikipedia, built with RDF, where you can perform SPARQL queries through its APIs. Interestingly, Wikidata serves much the same purpose. The difference is that DBpedia relies on Wikipedia, extracting its information and converting it into a machine-readable form, while Wikidata is an independent platform to which everyone can contribute, as with Wikipedia. In other words, Wikidata combines the machine-readable data of DBpedia with the open editing model of Wikipedia.
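To give a feel for what a SPARQL query against DBpedia might look like, here is a hedged Python sketch that builds (but does not send) a request to DBpedia's public SPARQL endpoint. The query asks for albums whose artist is Norah Jones. The class and property names (dbo:Album, dbo:artist) are taken from the DBpedia ontology as I understand it, and the results, if any, depend on the data DBpedia holds at the time. The helper name build_sparql_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Album and dbo:artist from the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?album WHERE {
  ?album a dbo:Album ;
         dbo:artist dbr:Norah_Jones .
}
LIMIT 10
"""


def build_sparql_request(query, endpoint=ENDPOINT):
    """Prepare a SPARQL-over-HTTP GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(QUERY)
    print(request.full_url)
```

Sending the request with urllib.request.urlopen and decoding the body with json.loads would yield the result bindings, assuming the endpoint is reachable.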

Google Knowledge Graph:

In 2012, Google launched the Google Knowledge Graph, which interprets search queries and gives direct answers along with the search results. For example, when we search for “Norah Jones” on Google, we get a direct answer to the question “Who is Norah Jones?”.

What you see for such a search is not just a list of results. On the right, there is a “snippet” about Norah Jones, which comes directly from the Google Knowledge Graph. Within this snippet, you can jump to other nodes related to Norah Jones, such as her parents Ravi Shankar and Sue Jones. More interestingly, you can visit the genres of Norah Jones’ music, which are entities in their own right, as discussed earlier.

Open Graph Protocol

Facebook introduced Open Graph in 2010. According to the official website: “The Open Graph protocol enables any web page to become a rich object in a social graph.” In practice, Open Graph converts a web page into a node in the social graph.
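In practice, a page joins the graph by carrying meta tags whose property names start with “og:”. The Python sketch below reads such tags from a page using only the standard library. The sample page, its URL, and the helper names are made up for illustration; og:title, og:type, and og:url are basic properties defined by the protocol.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Day Breaks" />
<meta property="og:type" content="music.album" />
<meta property="og:url" content="https://example.org/albums/day-breaks" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_PAGE))
```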

Conclusion

As Tim Berners-Lee said, “The future is still so much bigger than the past.” The Web today already contains a tremendous amount of data, but it is not too late to give it a global structure. We need to recognize that unless we contribute to Linked Open Data, every piece of information we create will someday be thrown away instead of adding to collective human knowledge. Linked Open Data is the most widely agreed-upon and well-structured way to prevent this. In other words, we need Linked Data for the Web’s future.

Work and studies at universities like MIT are highly important, and it is promising to see contributions from digital giants like Google and Facebook. Beyond these efforts, public awareness plays a significant part in achieving this goal globally. People must therefore be informed about the importance of Open Data, beginning with the universities. Contributions from universities will enrich the Web in terms of data quality and data dependability.

Companies should also inform software developers about Linked Data toolsets, and the development of easier toolsets must be encouraged. Software companies such as Microsoft and programming language communities such as Python or Java should be encouraged to build well-crafted, easy-to-use wrappers around Linked Data toolsets.

Finally, companies outside the digital industry must be informed about how they can benefit from the Linked Data concept. For example, well-structured web documents will improve search engine performance, but this is only one side of the coin. More importantly, these companies can make use of Open Data in their R&D departments. That is why their contribution to Open Data is so important.

References

Bizer, Christian; Heath, Tom; Berners-Lee, Tim. Linked Data: The Story So Far. International Journal on Semantic Web and Information Systems (IJSWIS), 2009.

Heath, Tom; Bizer, Christian. Linked Data: Evolving the Web into a Global Data Space. 2011.

Rodríguez-Doncel, Víctor; Santos, Cristiana; Casanovas, Pompeu; Gomez-Perez, Asuncion. Legal Aspects of Linked Data: The European Framework. Computer Law & Security Review, 2016, Volume 32. 10.1016/j.clsr.2016.07.005.

Berners-Lee, Tim. Linked Data. https://www.w3.org/DesignIssues/LinkedData