IKEA’s Knowledge Graph and Why It Has Three Layers

Katariina Kari
Published in Flat Pack Tech
5 min read · Aug 24, 2022


At IKEA we are building a knowledge graph to improve the overall experience of our customers in both the physical and the digital space. When explaining to stakeholders what a knowledge graph (KG) consists of, I like to use the following layered pyramid. I first saw it in a video by Dave McComb explaining the gist ontology.

Layers of a Knowledge Graph: concepts, categories, data

The pyramid defines three layers: concepts, categories, and data. For example, at IKEA a concept would be a product, a category would be a bookcase, and one data point would be the BILLY bookcase in white, 80x28x202 cm.

Technically, a KG consists of two layers: kinds of things and individual things. This is analogous to classes and instances in object-oriented programming, or to ideas and the real world in Plato’s Theory of Ideas. In the three-layered pyramid, the middle layer, the categories, technically consists of individual things, but these express information very close to that of kinds of things. That is why it makes sense to distinguish them as a layer of their own.
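The analogy to classes and instances can be sketched in a few lines of Python. This is only an illustration of the three layers, not how the IKEA KG is implemented; the class names `Category` and `Product` and their fields are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """Category layer: an individual thing that acts almost like a kind of thing."""
    label: str


@dataclass(frozen=True)
class Product:
    """Concept layer: the class itself plays the role of the concept 'product'."""
    name: str
    colour: str
    size_cm: str
    category: Category  # link from the data layer into the category layer


# Category layer: one entry of the controlled vocabulary (an instance, not a class).
bookcase = Category(label="bookcase")

# Data layer: one concrete data point.
billy = Product(name="BILLY", colour="white", size_cm="80x28x202", category=bookcase)

print(type(billy).__name__)   # the concept this data point instantiates
print(billy.category.label)   # the category it is assigned to
```

Note that `bookcase` is an instance, just like `billy`, yet it is used to classify other instances. That is the sense in which categories are individual things that behave like kinds of things.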

At IKEA the top layer represents the business concepts central to what the company does. Typically, these concepts number in the hundreds, and as Dave McComb illustrates in his book The Data-Centric Revolution, they are something one can take home over the weekend and learn by heart. If one uses the W3C stack to define the KG, this part is the ontology, that is, the definitions of classes and properties. If one uses a labelled property graph (LPG) such as Neo4j to build the KG, the concepts are the node labels and relationship types.
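To make the two technology options concrete, here is a minimal sketch that writes a tiny concept layer as subject-predicate-object triples and then reads off what the equivalent node labels and relationship types would be in an LPG. The `ex:` namespace and the names `ex:Product`, `ex:ProductCategory`, and `ex:hasCategory` are hypothetical, and plain tuples stand in for a real RDF library.

```python
# Concept layer as (subject, predicate, object) triples.
# "ex:" is a made-up namespace used only for this illustration.
ontology = [
    ("ex:Product", "rdf:type", "owl:Class"),
    ("ex:ProductCategory", "rdf:type", "owl:Class"),
    ("ex:hasCategory", "rdf:type", "owl:ObjectProperty"),
    ("ex:hasCategory", "rdfs:domain", "ex:Product"),
    ("ex:hasCategory", "rdfs:range", "ex:ProductCategory"),
]


def declared(triples, kind):
    """Return the sorted subjects that are declared to be of the given kind."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == kind)


def local_name(term):
    """Strip the namespace prefix, e.g. 'ex:Product' becomes 'Product'."""
    return term.split(":", 1)[1]


# In an LPG, classes would surface as node labels and
# object properties as relationship types.
node_labels = [local_name(c) for c in declared(ontology, "owl:Class")]
relationship_types = [local_name(p) for p in declared(ontology, "owl:ObjectProperty")]

print(node_labels)
print(relationship_types)
```

Either way, the concept layer stays small: two classes and one property here, hundreds in a real enterprise.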

Categories represent the controlled vocabulary or terminology used by IKEA, and they number in the thousands. Different experts at IKEA determine what the terminology is. For example, someone is responsible for defining all the ways in which we categorise our products: bookcase, sofa, coffee table, etc. In general, categorisation is different in every company and depends on its use cases: some terminology is represented on the concept level, some on the category level, and some even on both levels. In the music industry, for instance, an instrument would be a concept, while the individual instruments, such as violin, piano, and cello, would be the categories. A particular musical composition would then be categorised by the key instruments needed to perform it, e.g. a piano trio or a violin concerto.

The third layer is the data layer. At IKEA these are all the products we offer to our customers, defined by their iconic Swedish name, their colour, and their size information. Our KG is customer-facing and enhances the digital experience of IKEA. If we used our KG for logistics, the data layer would distinguish between every single physical product, that is, each of the many BILLY bookcases in white of the size 80x28x202 cm sold in stores around the world.

We use the distinction between these three layers to organise our work on building a KG. Since the concept layer is small and creates a company-overarching terminology, it makes sense to define it manually with robust governance policies. Categories by definition need domain expertise and agreement, so defining them is an internal collaboration involving many subject matter experts. Finally, the data layer is so large that it needs to be automated and developed in a test-driven way.

My friend and colleague in the semantic web community, Juan Sequeda, pointed out that one recurring theme when companies form data strategies is that of centralisation versus decentralisation. Inspired by this, I ventured to look at how the three-layered pyramid is authored, owned, and stored in an enterprise, based on my applied experience in building enterprise knowledge graphs.

Authoring, Ownership, and Storage

Layers of a Knowledge Graph: authorship is centralised for concepts and decentralised for categories, while the data graph is generated automatically; concepts and categories are stored centrally, and the data graph is stored in a decentralised manner.

Concepts are authored centrally by the team that owns the business concepts, which in practice is a team of ontologists: those who translate human concepts into a machine-readable format. The concepts need to form a logically coherent structure because they are used for reasoning and for creating powerful queries over the data layer. Since they number in the hundreds, it makes sense to define them manually and centrally.

Categories are best authored in a decentralised manner, because the subject matter experts who decide which terminology should be used are scattered around the company. Categories number in the thousands, so dividing the work between expert teams helps to manage the workload.

The data layer is a combination of one or more data sources with references to the category layer. Its ownership should therefore remain with the data source, as the data mesh idea of data as a product also states.

At IKEA we store the concepts and categories in a git repository. This way we can tag releases, use versioning, review changes line by line, and automatically trigger quality assurance tests via git actions.
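The article does not describe what these quality assurance tests check, so the following is only a guess at the kind of check such a pipeline might run on every commit: that category identifiers are unique and that each category refers to a concept that actually exists. The data layout, the field names `id` and `concept`, and the function `check_vocabulary` are all assumptions.

```python
# Hypothetical snapshot of the two centrally stored layers.
concepts = {"Product", "ProductCategory"}
categories = [
    {"id": "bookcase", "concept": "ProductCategory"},
    {"id": "sofa", "concept": "ProductCategory"},
    {"id": "coffee-table", "concept": "ProductCategory"},
]


def check_vocabulary(concepts, categories):
    """Return a list of problems; an empty list means the vocabulary passes."""
    problems = []
    seen = set()
    for cat in categories:
        # Each category identifier may be defined only once.
        if cat["id"] in seen:
            problems.append(f"duplicate category id: {cat['id']}")
        seen.add(cat["id"])
        # Each category must point at a concept defined in the concept layer.
        if cat["concept"] not in concepts:
            problems.append(f"{cat['id']}: unknown concept {cat['concept']}")
    return problems


problems = check_vocabulary(concepts, categories)
print("OK" if not problems else "\n".join(problems))
```

A check like this is cheap to run on every change, which is what makes storing the two small layers in version control attractive.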

Depending on performance and usage needs, the contents of the data layer can be stored in one of two ways: they can be persisted in a database, or virtualised by storing instructions on how to access the original data source. Virtualising the data layer means it can be stored in a decentralised manner and accessed at its point of origin. Persisting the data layer might lead to better performance. However, if data ownership is not clear and enforced, persisting leads to multiple differing copies of the same data and thus to inconsistency, which is the current data problem in most organisations. We have chosen to persist our data layer and keep it updated with respect to its upstream.

Envisioning our KG in three layers is useful not only for organising our work but also for communicating its contents to non-technical stakeholders. While they might not grasp all the technical implications of the layers, the picture helps them to focus on what kind of content they need to provide to make the KG successful. It also manages their expectations of what the KG can provide them in return.
