The music industry is a dynamic space, with new releases, artists, bands and albums appearing daily. The volume of information on the industry is vast, which presents music platform providers with a great challenge if they aim to provide a complete, up-to-date service for their users.
This two-part article demonstrates how RDFox can be used within a music streaming service to link, enrich, validate and query large datasets quickly and accurately. The result is a responsive application that obtains real value from the provider's data. This use case nicely demonstrates the power of RDFox and its applicability to modern applications.
RDFox is a knowledge graph and semantic reasoning engine. As a triplestore, it offers more flexibility than the strict tabular structure of an SQL database and, through reasoning, allows various data sources to be linked easily. As an in-memory solution, RDFox is extremely fast, even for very large datasets. Its reasoning engine is unmatched in power, and RDFox is the fastest graph-based querying system available.
RDFox uses semantic reasoning, expressed as rules, to encode domain expertise into the knowledge graph. Rules can be used to compute metrics, identify missing features, categorise behaviours, discover repeated patterns and highlight inconsistencies. What's more, RDFox applies these rules incrementally: when data is added or removed, the derived information is updated without recomputing everything from scratch. The music provider no longer has to worry about keeping the database consistent or the service responsive; RDFox manages both.
In this case study, a hypothetical music platform, ‘OST Music’, was created. Users can create an account, listen to music, search for artists and songs, and receive song recommendations.
To function, the platform needed a complete understanding of the music industry. A dataset covering a broad range of music industry data was created and stored in a knowledge graph that users can query in natural language. The music knowledge graph incorporates three data sources, Wikidata, Discogs and MusicBrainz, each with its own strengths and weaknesses.
None of these data sources holds enough information for the music platform on its own, but linked together they provide a wealth of knowledge that can be cross-referenced for validity. Each data source is in a different format, and because RDFox is a Resource Description Framework (RDF) triplestore, all information must be imported as triples.
- Wikidata already exists as triples. However, the data had to be trimmed to pull out the information relevant to the music industry, which can be done with SPARQL CONSTRUCT queries. After trimming, the triples were imported using a simple import command in RDFox.
- Discogs data is in XML format, so it had to be converted to triples using a custom-built parser that extracts an equivalent RDF representation.
- Finally, MusicBrainz is built on PostgreSQL, which can be integrated with RDFox.
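The Discogs conversion step can be illustrated with a short Python sketch. This is not the custom parser used for OST Music: the XML layout, the `discogs:` prefixes, the IDs and the member names are all assumptions for illustration. It walks each `<artist>` element with the standard library's `xml.etree.ElementTree` and emits (subject, predicate, object) tuples, which would still need serialising (for example as Turtle) before being imported into RDFox.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: element names are an assumption, and the IDs and
# member names are placeholders rather than real Discogs data.
DISCOGS_XML = """
<artists>
  <artist>
    <id>123</id>
    <name>The Knife</name>
    <members>
      <name id="456">Member A</name>
      <name id="789">Member B</name>
    </members>
  </artist>
</artists>
"""


def discogs_to_triples(xml_text):
    """Convert <artist> elements into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for artist in root.iter("artist"):
        subject = f"discogs:artist/{artist.findtext('id')}"
        triples.append((subject, "rdf:type", "discogs:Artist"))
        # findtext("name") only searches direct children, so member names are not picked up here.
        triples.append((subject, "discogs:name", artist.findtext("name")))
        members = artist.find("members")
        if members is not None:
            for member in members.findall("name"):
                member_subject = f"discogs:artist/{member.get('id')}"
                triples.append((member_subject, "discogs:memberOf", subject))
                triples.append((member_subject, "discogs:name", member.text))
    return triples


for triple in discogs_to_triples(DISCOGS_XML):
    print(triple)
```

Each band member becomes a resource of its own, linked to the band by a `memberOf` triple; this is the shape that later rules can count and link.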
Initially, the information was stored in three separate knowledge graphs within RDFox. Using RDFox's reasoning capabilities, rules were created that link the three knowledge graphs together into a fourth, unified graph containing information on artists, bands and recordings.
In RDFox, rules are expressed in Datalog and represent ‘if-then’ statements. Rules are used to determine that an artist named in Discogs is the same artist found in MusicBrainz and Wikidata; the information on this artist is then stored in the unified knowledge graph (in grey).
The artist in the unified knowledge graph is equivalent to the artist records in each of the three source knowledge graphs. To prevent four separate artists being returned when the knowledge graph is queried, rules establish that the data from all three sources represents the same artist (ostmusic:artist/1), as seen below.
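The logic of such a linking rule can be sketched in plain Python. The article does not say which key the rules match on, so this sketch assumes that each Wikidata record stores the artist's Discogs and MusicBrainz IDs; the record contents, the `owl:sameAs` links and the numbering scheme are illustrative assumptions. In RDFox the same step would be a Datalog rule; the code only mimics its if-then behaviour.

```python
# Hypothetical source records; the matching key (Wikidata holding the Discogs and
# MusicBrainz IDs) is an assumption made for this sketch.
wikidata = {
    "wd:artist-1": {"name": "The Knife", "discogs_id": "123", "musicbrainz_id": "mb-abc"},
}
discogs = {"123": {"name": "The Knife", "profile": "Swedish electronic duo"}}
musicbrainz = {"mb-abc": {"name": "The Knife", "country": "SE"}}


def link_artists(wikidata, discogs, musicbrainz):
    """If a Wikidata artist's Discogs and MusicBrainz IDs both resolve,
    then derive one canonical artist that merges the three records."""
    unified, same_as = {}, []
    next_id = 1
    for wd_id in sorted(wikidata):
        wd = wikidata[wd_id]
        d_id, m_id = wd.get("discogs_id"), wd.get("musicbrainz_id")
        if d_id not in discogs or m_id not in musicbrainz:
            continue  # rule body not satisfied, so nothing is derived
        canonical = f"ostmusic:artist/{next_id}"
        next_id += 1
        for source_id in (wd_id, f"discogs:artist/{d_id}", f"mb:artist/{m_id}"):
            same_as.append((canonical, "owl:sameAs", source_id))
        merged = {}
        for record in (wd, discogs[d_id], musicbrainz[m_id]):
            for key, value in record.items():
                if not key.endswith("_id"):
                    merged.setdefault(key, value)  # first source wins on conflicts
        unified[canonical] = merged
    return unified, same_as


unified, links = link_artists(wikidata, discogs, musicbrainz)
print(unified)  # one canonical artist rather than four
```

A query over `unified` now returns a single artist, while the `owl:sameAs` links keep the connection back to each source record for cross-referencing.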
This example of linking data is applicable to other use cases and demonstrates the flexibility and power of reasoning with knowledge graphs.
For tips and tricks on writing rules, read the article here.
To enrich the data, rules were used to materialise new information. This allows users to ask simpler queries and get results more quickly. One example of enrichment is calculating the number of members in a band.
‘The Knife’ has two members. This count is stored in the unified knowledge graph, so the music platform's users can query it directly. Thanks to RDFox's unique incremental reasoning capabilities, if a new member joins ‘The Knife’, the count is updated to three; similarly, if one of the members leaves, the count is updated to one. Both updates happen immediately and automatically.
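The effect of incremental maintenance can be illustrated with a toy Python class. In RDFox the count would typically be expressed as a Datalog rule with an aggregate; this sketch does not reproduce RDFox's algorithm, it only shows the idea that a materialised value is adjusted on each insertion or deletion instead of being recomputed from scratch. The class name and the `person:`/`band:` identifiers are hypothetical.

```python
from collections import defaultdict


class MemberCounts:
    """Toy materialised member count, adjusted on each insert or delete
    instead of being recomputed from scratch."""

    def __init__(self):
        self.members = defaultdict(set)  # band -> member IDs (the base facts)
        self.counts = defaultdict(int)   # band -> derived member count

    def add(self, person, band):
        if person not in self.members[band]:  # ignore duplicate facts
            self.members[band].add(person)
            self.counts[band] += 1

    def remove(self, person, band):
        if person in self.members[band]:
            self.members[band].remove(person)
            self.counts[band] -= 1


graph = MemberCounts()
graph.add("person:a", "band:the-knife")
graph.add("person:b", "band:the-knife")
print(graph.counts["band:the-knife"])  # 2
graph.add("person:c", "band:the-knife")  # hypothetical new member joins
print(graph.counts["band:the-knife"])  # 3
graph.remove("person:c", "band:the-knife")
graph.remove("person:b", "band:the-knife")
print(graph.counts["band:the-knife"])  # 1
```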
Using rules again, RDFox knows that a band with two members is a ‘duo’, three a ‘trio’, four a ‘quartet’, and so on. This increases users' ability to query the knowledge graph: for example, if asked for a ‘Swedish Electronic Duo’, RDFox knows that the user means ‘a band with two members’.
The following image provides an example of how to label a quartet:
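As a rough Python sketch of the same idea, a size label can be derived from each band's member count and then used to answer a query such as ‘Swedish Electronic Duo’. The label table, the band records (apart from The Knife) and the function names are hypothetical; in RDFox the labels would come from rules rather than a dictionary lookup.

```python
# Hypothetical mapping from member count to a size label.
SIZE_LABELS = {2: "Duo", 3: "Trio", 4: "Quartet", 5: "Quintet"}

# Illustrative band facts; only The Knife is a real band here.
bands = {
    "band:the-knife": {"country": "Sweden", "genre": "Electronic", "members": 2},
    "band:example-quartet": {"country": "Sweden", "genre": "Electronic", "members": 4},
    "band:example-duo": {"country": "Norway", "genre": "Electronic", "members": 2},
}


def label_bands(bands):
    """Derive a size label for every band whose member count has one."""
    return {
        band: SIZE_LABELS[info["members"]]
        for band, info in bands.items()
        if info["members"] in SIZE_LABELS
    }


def find_bands(bands, country, genre, label):
    """Answer e.g. 'Swedish Electronic Duo' using the derived labels."""
    labels = label_bands(bands)
    return sorted(
        band
        for band, info in bands.items()
        if info["country"] == country and info["genre"] == genre and labels.get(band) == label
    )


print(find_bands(bands, "Sweden", "Electronic", "Duo"))  # ['band:the-knife']
```

Because the label is materialised, the query only has to match on ‘Duo’ rather than reason about member counts itself.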
OST Music want to provide recommendations to their users. This requires the data to be enriched further, which is done in a number of ways.
Hierarchies were established to enrich the data within the knowledge graph, harnessing RDFox's ontological reasoning. Mapping genres and locations into hierarchies adds knowledge about the music industry to the knowledge graph, stored as relationships between data points. By understanding genres and locations, the music platform can offer a more diverse recommendation experience without compromising user satisfaction.
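Genre and location hierarchies are typically enriched with a recursive rule: if X is under Y and Y is under Z, then X is under Z. The Python sketch below computes that transitive closure by repeating the rule until nothing new is derived (a fixpoint). The hierarchies are simplified and illustrative, and the `genre:`/`place:` identifiers are hypothetical.

```python
def transitive_closure(edges):
    """Materialise every (child, ancestor) pair implied by direct (child, parent)
    edges, repeating until no new pair is derived (a fixpoint)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for child, parent in list(closure):
            for mid, ancestor in list(closure):
                if parent == mid and (child, ancestor) not in closure:
                    closure.add((child, ancestor))
                    changed = True
    return closure


# Simplified, illustrative hierarchies.
genre_edges = {("genre:electropop", "genre:synth-pop"), ("genre:synth-pop", "genre:electronic")}
location_edges = {("place:stockholm", "place:sweden"), ("place:sweden", "place:europe")}

genres = transitive_closure(genre_edges)
places = transitive_closure(location_edges)
print(("genre:electropop", "genre:electronic") in genres)  # True
print(("place:stockholm", "place:europe") in places)  # True
```

With these pairs materialised, a listener of electropop can be offered other tracks filed under the broader electronic genre, and an artist from Stockholm is found by a query about Swedish or European artists.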
Additionally, recommendation services can suggest tracks based on streaming history, or songs listened to by similar users. With rules, RDFox can also determine trends by assessing the increasing popularity of genres or artists, based on the interaction of users with the platform.
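One simple way to flag a rising genre is to compare play counts across two time windows. The sketch below is a hypothetical illustration of that idea in Python, not the rule used by OST Music; the growth factor, the minimum play count and the sample plays are all assumptions that a real service would tune.

```python
from collections import Counter


def rising_genres(previous_plays, current_plays, min_growth=1.5, min_plays=10):
    """Return genres whose play count grew by at least min_growth between two
    time windows; both thresholds are hypothetical tuning parameters."""
    previous = Counter(previous_plays)
    current = Counter(current_plays)
    trending = []
    for genre, now in current.items():
        before = previous[genre]  # Counter returns 0 for unseen genres
        if now >= min_plays and now >= min_growth * max(before, 1):
            trending.append(genre)
    return sorted(trending)


# Each list holds one genre entry per play event in that window.
last_week = ["electronic"] * 10 + ["jazz"] * 20
this_week = ["electronic"] * 25 + ["jazz"] * 18 + ["folk"] * 3
print(rising_genres(last_week, this_week))  # ['electronic']
```

The minimum play count stops a genre with a handful of plays from being flagged just because it started from zero.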
It is also possible to discover similar entities by determining compatibility or finding similar patterns within the rich graph of connections. Thus, the music platform includes information on covers of the same track (e.g. ‘Que Sera, Sera’, performed in different styles by different artists), alternative versions (e.g. remixes or reworks) and songs by the same artist. For more information on how RDFox can be used to determine compatibility between entities, read our article on configuration.
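The three relationships above can be sketched as simple pattern matches over (recording, work, artist) facts: the same work by a different artist is a cover, the same work by the same artist is an alternative version, and a different work by the same artist is another song by that artist. The data model and all IDs below are hypothetical placeholders, and the Python only mirrors what rules would derive in the graph.

```python
# Hypothetical (recording, work, artist) facts; all IDs are placeholders.
recordings = [
    ("rec:1", "work:que-sera-sera", "artist:x"),
    ("rec:2", "work:que-sera-sera", "artist:y"),  # same work, different artist
    ("rec:3", "work:que-sera-sera", "artist:x"),  # same work, same artist
    ("rec:4", "work:other-song", "artist:x"),     # different work, same artist
]


def related_recordings(recording_id, recordings):
    """Group other recordings as covers, alternative versions or same-artist songs."""
    _, work, artist = next(r for r in recordings if r[0] == recording_id)
    related = {"covers": [], "versions": [], "same_artist": []}
    for rid, w, a in recordings:
        if rid == recording_id:
            continue
        if w == work and a != artist:
            related["covers"].append(rid)
        elif w == work:
            related["versions"].append(rid)
        elif a == artist:
            related["same_artist"].append(rid)
    return related


print(related_recordings("rec:1", recordings))
# {'covers': ['rec:2'], 'versions': ['rec:3'], 'same_artist': ['rec:4']}
```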
Part One has explained how RDFox can be used to link and enrich data, giving users a unified knowledge graph to query and enriched results. To find out how OST Music validated and queried their data, and to view performance statistics, read Part Two.
To learn more about RDFox, visit our website or check out our Medium publication. To try RDFox yourself, you can request a free 30-day trial license here. To request a demo, contact us at firstname.lastname@example.org.
Team and Resources
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford, with the conviction that flexible, high-performance reasoning was possible for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University's investment arm (OUI). The author is proud to be a member of this team.