Part One demonstrated how OST Music, a hypothetical music streaming service, was able to link and enrich large datasets of music industry data into a unified knowledge graph.
This article will explain how the same music service was able to validate and query their knowledge graph using RDFox, without compromising speed or correctness.
You can read Part One here.
With RDFox, the music platform can validate the music industry data integrated from the various sources. During the data integration process outlined in Part One, inconsistencies can be highlighted, for example by flagging records on which the three datasets disagree. Once these inconsistencies are found and fixed, the information stored in the knowledge graph is more accurate and its correctness can be verified.
Large datasets may have broken links, i.e. one of the datasets links to an entry that does not exist in another dataset (see diagram below). To provide the highest level of service to their users, OST Music can use rules to find broken links within the data, and then use queries to find out all the information needed to fix or remove the links.
For example, the ‘Marginal Prophets’, a hip hop/garage band from San Francisco, were found in Wikidata, but their code in Discogs is incorrect, so following the link leads to a 404 page. Using this information, the user can replace the broken link with the correct one, resulting in a correct and comprehensive dataset.
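The idea behind the broken-link check can be sketched in a few lines of plain Python. This is only an illustration of the logic, not RDFox rule syntax; the identifiers, the sample data and the function name `find_broken_links` are hypothetical.

```python
def find_broken_links(links, target_ids):
    """Return the (source, target) pairs whose target is missing from the target dataset."""
    return [(source, target) for source, target in links if target not in target_ids]


# Hypothetical links from one dataset to entries in another.
wikidata_to_discogs = [
    ("wikidata:artist-A", "discogs:1001"),
    ("wikidata:artist-B", "discogs:9999"),  # no such entry in the target dataset
    ("wikidata:artist-C", "discogs:1003"),
]

# Hypothetical set of identifiers that actually exist in the target dataset.
discogs_ids = {"discogs:1001", "discogs:1002", "discogs:1003"}

broken = find_broken_links(wikidata_to_discogs, discogs_ids)
for source, target in broken:
    print(f"{source} links to missing entry {target}")
```

In a knowledge graph the same check would be expressed declaratively, with a rule marking the link as broken and a query retrieving the details needed to repair it.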
The music platform can use RDFox’s reasoning capabilities to collect and aggregate data on user activity periodically. This allows suspicious activity to be flagged. For instance, if an account listens to more than 5000 songs a day, and more than 95% of these are from the same publisher, the account could be fake and part of a scam.
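The flagging criterion described above (more than 5000 plays in a day, with more than 95% from a single publisher) can be illustrated with a minimal Python sketch. It assumes a day's listening events are available as `(user, publisher)` pairs; the data layout, the sample events and the function name `flag_suspicious` are assumptions made for illustration, and in RDFox the aggregation would be done with rules or queries instead.

```python
from collections import Counter, defaultdict

MAX_DAILY_PLAYS = 5000      # threshold taken from the example in the text
MAX_PUBLISHER_SHARE = 0.95  # threshold taken from the example in the text


def flag_suspicious(plays):
    """Flag users with too many plays in a day, almost all from one publisher.

    `plays` is an iterable of (user, publisher) pairs covering a single day.
    Returns a list of (user, total_plays, dominant_publisher) tuples.
    """
    per_user = defaultdict(Counter)
    for user, publisher in plays:
        per_user[user][publisher] += 1

    flagged = []
    for user, counts in per_user.items():
        total = sum(counts.values())
        top_publisher, top_count = counts.most_common(1)[0]
        if total > MAX_DAILY_PLAYS and top_count / total > MAX_PUBLISHER_SHARE:
            flagged.append((user, total, top_publisher))
    return flagged


# Hypothetical listening events for one day.
plays = (
    [("user-1", "publisher-X")] * 5900 + [("user-1", "publisher-Y")] * 100
    + [("user-2", "publisher-X")] * 40 + [("user-2", "publisher-Z")] * 60
    + [("user-3", "publisher-X")] * 3000 + [("user-3", "publisher-Y")] * 3000
)

suspicious = flag_suspicious(plays)
print(suspicious)
```

Only the first user meets both conditions: the second has too few plays, and the third has many plays spread evenly across two publishers.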
Queries are requests for information from the database. RDFox supports the standard query language SPARQL. SPARQL queries are sent to the knowledge graph through the command line interface, referred to as the shell, or through a web interface, known as the RDFox console. The shell can also be accessed through an integrated development environment (IDE) such as Visual Studio Code.
The following images show the result of a query requesting Japanese Alternative Rock bands in which all the members are female, as displayed in the RDFox console and the RDFox Data Explorer tool.
For more information on SPARQL and RDFox, read this article.
Most users of modern-day applications do not know how to write queries in SPARQL or read query results. As a result, OST Music had to create a way for users to interact easily with the music database. This can be done with a predefined set of queries for the user to ask. However, to provide a high-quality experience for users, OST Music also supports Natural Language Processing (NLP). NLP can translate a natural language question into a SPARQL query for RDFox, and convert the answer back into natural language.
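The simpler of the two options, a predefined set of queries, can be sketched as a small table of question patterns mapped to SPARQL templates. This is a minimal illustration rather than a description of how OST Music or RDFox implements it: the `ex:` vocabulary, the question patterns and the function name `question_to_sparql` are all hypothetical, and a full NLP pipeline would replace the pattern matching with language understanding.

```python
import re

# Hypothetical vocabulary and question patterns, for illustration only.
PREFIXES = "PREFIX ex: <https://example.org/music#>\n"

QUERY_TEMPLATES = [
    (
        re.compile(r"^which bands play (?P<genre>[\w -]+?)\??$", re.IGNORECASE),
        PREFIXES + 'SELECT ?band WHERE {{ ?band a ex:Band ; ex:genreLabel "{genre}" . }}',
    ),
    (
        re.compile(r"^which bands are from (?P<country>[\w -]+?)\??$", re.IGNORECASE),
        PREFIXES + 'SELECT ?band WHERE {{ ?band a ex:Band ; ex:countryLabel "{country}" . }}',
    ),
]


def question_to_sparql(question):
    """Return a SPARQL query string for a recognised question, or None otherwise."""
    text = question.strip()
    for pattern, template in QUERY_TEMPLATES:
        match = pattern.match(text)
        if match:
            # Captured values are limited to word characters, spaces and hyphens,
            # so they cannot break out of the quoted SPARQL literal.
            params = {name: value.strip() for name, value in match.groupdict().items()}
            return template.format(**params)
    return None


query = question_to_sparql("Which bands play alternative rock?")
print(query)
```

Questions that match no pattern return `None`, which an application could use to fall back to a keyword search or to ask the user to rephrase.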
NLP allows for advanced semantic search, which goes beyond keyword matching by interpreting the meaning of a question, so that answers are drawn from a broader and deeper set of relevant results. Semantic search lets users interact with the database in their own words, resulting in more personal search experiences. Advanced semantic search also has benefits for other industries, such as e-commerce and e-services.
To learn about RDFox and faceted search, read this article.
Music streaming services and other real-time applications need to be supported by appropriate software. RDFox’s speed and incremental capabilities allow the platform to operate in real time. It can load data, materialise data through reasoning, and incrementally update the knowledge graph with very little delay. For example, this music knowledge graph contained 83 million triples and 7.4 million artists, all of which loaded into RDFox in 133 seconds, or roughly 624,000 triples per second.
The music industry is a dynamic space with new releases and new artists around the clock. The incremental features of RDFox, a novel development within the knowledge graph industry, allow for incremental updates to both the data and the rules. This underpins truly responsive applications, which allow for constantly evolving data, updated in real time.
By linking, validating, enhancing and querying data at high speed, the music streaming platform can provide exceptional service for its users. The methods and benefits of using a high-performance knowledge graph and semantic reasoning engine to underpin responsive applications can be extrapolated from this example and applied to many industries and use cases.
To learn more about RDFox, visit our website or check out our Medium publication. To try RDFox yourself, you can request a free 30-day trial license here. To request a demo, contact us at firstname.lastname@example.org.
Team and Resources
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.