Semantic user management

Published in Wallscope · 8 min read · Sep 20, 2018

How we built access/permissions into our data

If you’ve heard about Wallscope before, it will come as no surprise that we like semantic technologies.
Linked Data and Semantic Web techniques are a big part of our work, and because we’re always moving/changing/updating/developing the ways we work with data, we are always keen to explore new solutions and learn how Linked Data can help us build better systems.

In this blog post, we’ll explore how Linked Data can help in managing a user base and how we can leverage Linked Data and open standards to build smarter data.

Before we start off, it’s good to define a few concepts.

What we mean by Linked Data

Linked Data is a group of practices and standards introduced by Tim Berners-Lee with the aim of bringing structure to the largely unstructured web of data that arose from the growth of the World Wide Web. Linked Data is often open, the so-called Linked Open Data (LOD), but it does not have to be. (See: Linked Data on Wikipedia.)

What we mean by user management

User management is the process of managing user entities in a database, along with the related processes. For the purposes of this post, user management will involve authorisation, storage/retrieval and modelling of user entities. This post will not, however, cover authentication, as that is a complex subject and this post should only be so long…

Building smarter data™

Since we started building Linked Data-enabled applications, we have realised that one of the major advantages of Linked Data is that you can make your data smarter.

By using RDF as our data model, we allow some of the logic to shift to the data itself, which means that our program can be simpler, which leads to less code, which leads to fewer bugs, which leads to happiness, which leads to world peace.

And that’s a fact.

Relations

With the advent of Object Oriented Programming, software programs started to model complex entities in ways that were hard to express in relational models, which led to years of database design that focused on getting around the limitations of the relational model.

I was once told that you know you’re building a big system when you start turning people into numbers.

Turning people into numbers gives us a comfortable primary key for our user tables, which makes it easier to create relationships and linking tables.

Raise your hand if you’ve encountered a table like this one:

Linking tables are indispensable in expressing complex relationships between entities. And many-to-many relationships are a taboo in relational DBs.

Many-to-many is the bloody mary of relational database design.

We’re essentially creating a table that does nothing more than store relationships between entities. This becomes a problem when the biggest value in your data is the relationships between entities, and when you have to traverse these links in a way that is more natural to a graph structure.
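To make that concrete, here is a minimal sketch of such a linker table, using Python’s built-in sqlite3 module. The table names, the column names and the roles given to Jane are assumptions made up for illustration, not the schema used in this post.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The linker table stores nothing but the relationship itself.
    CREATE TABLE user_roles (
        user_id INTEGER NOT NULL REFERENCES users(id),
        role_id INTEGER NOT NULL REFERENCES roles(id),
        PRIMARY KEY (user_id, role_id)
    );
    INSERT INTO users VALUES (1, 'jane@wallscope.co.uk');
    INSERT INTO roles VALUES (1, 'Programmer'), (2, 'C-level');
    INSERT INTO user_roles VALUES (1, 1), (1, 2);
""")


def roles_of(email):
    """Return the role names for a user, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT r.name
        FROM users u
        JOIN user_roles ur ON ur.user_id = u.id
        JOIN roles r ON r.id = ur.role_id
        WHERE u.email = ?
        ORDER BY r.name
        """,
        (email,),
    ).fetchall()
    return [name for (name,) in rows]


print(roles_of("jane@wallscope.co.uk"))
```

Note that a single hop from a user to their roles already costs two joins. Every extra hop (role to permission, permission to file type) adds more.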

Surely there’s a better way. But the world runs on relations.

Making users people again

Actually, there is something we can do about it.

Entities in a database don’t have to be just an ID referenced in multiple linker tables…

Embrace the many-to-many!

Graph databases and their widespread use in companies that deal with Big Data and big data problems have shown us that if we treat this kind of data as nodes in a graph, a lot of these problems become trivial.

Exploring the relationship between nodes becomes a lot easier as well.

But graph databases introduce new problems, one of the biggest ones being:

How do you expect BigCorp.Inc or Small-Startup.io to adopt this kind of technology when it hasn’t matured yet or is not bound to a standard?

Are you Intrigued? Hi Intrigued, I’m RDF

RDF lives in triplestores. Triplestores are cool beings. They understand that even though you have decided to stay with them for the past day/month/year, you might change your mind in the future and seek a different experience with a different or multiple other triplestores. So they make sure your RDF is portable. They make sure that when you inevitably decide that you must seek stability or performance elsewhere, you can take your RDF and be happy.

Sure, certain triplestores might try to keep you from going to a different one by offering extensions to RDF, but at core, RDF is still a very powerful model without those extensions.

Now that we’ve introduced a lot of different ideas, problems and peeked at solutions, let us explore this in a practical scenario.

So, you have your typical user management situation, where we have the following users

and we have a list of files with certain attributes
(some of which may be separated by commas because sometimes you just can’t be bothered creating linker tables)

and then we have a set of rules about who can do what on which kind of files

Accountants can read and write on financial files
Everyone can see files that are Fun, only secretaries can edit them (don’t ask why, corporate rules ¯\_(ツ)_/¯)
Only C-level people can edit HR documents, but secretaries can see them
Only programmers can see code files (because no one touches my code!!!)

We’re about to get technical. If that’s not your thing, you should probably skip to the end, where we talk about the advantages of the different approaches we’ll look at.
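As a warm-up, the rules listed above can be written down as plain data before we commit to any database model. This is only a sketch: the role and file type names are illustrative labels, and it assumes that anyone allowed to edit a file type is also allowed to see it, which the rules do not state explicitly.

```python
# Role and file type names are illustrative labels for the rules listed above.
READ = {
    "Accountant": {"Financial"},
    "Everyone": {"Fun"},
    "Secretary": {"HR"},
    "C-level": {"HR"},  # assumption: being allowed to edit implies being allowed to see
    "Programmer": {"Code"},
}
WRITE = {
    "Accountant": {"Financial"},
    "Secretary": {"Fun"},
    "C-level": {"HR"},
}


def allowed(table, roles, file_type):
    """True if any of the user's roles, or the catch-all 'Everyone', grants access."""
    effective_roles = set(roles) | {"Everyone"}
    return any(file_type in table.get(role, set()) for role in effective_roles)


print(allowed(READ, ["Programmer"], "Fun"))   # everyone can see Fun files
print(allowed(WRITE, ["Programmer"], "Fun"))  # only secretaries can edit them
```

Even here, the "Everyone" rule needs special handling in code. Keep that in mind when we get to the graph model.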

On a relational model

The scenario might look something like this:

Users

User Roles

User Roles linker table

Files

File types

At this point you’re thinking I should probably have drawn an ER diagram

File types linker table

Read permissions table

Write permissions table

This is quite verbose, and it illustrates the problem well. The permissions here are not even very granular (they are role-based), yet in this design we have to create a new table for every permission. Not very scalable.

You might be thinking: “Hey, you could put all the permissions in the same table”. True.

But you still have to create a new column for every permission, and this example is not exhaustive. For example, there is no row that says whether a SEC can or cannot read or write a FIN type file, which means this table would be even bigger, and that is for only 2 permissions and a few roles and file types.
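Here is a rough sketch of that single-table variant, again with Python’s sqlite3 module. The SEC and FIN codes come from the example above. The ACC, FUN and HR codes, the column names and the extra can_delete permission are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (role, file type), one column per permission.
    CREATE TABLE permissions (
        role      TEXT NOT NULL,
        file_type TEXT NOT NULL,
        can_read  INTEGER NOT NULL DEFAULT 0,
        can_write INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (role, file_type)
    );
    INSERT INTO permissions VALUES
        ('ACC', 'FIN', 1, 1),
        ('SEC', 'FUN', 1, 1),
        ('SEC', 'HR',  1, 0);
""")

# A third permission (hypothetical) means a schema change, not just new rows.
conn.execute(
    "ALTER TABLE permissions ADD COLUMN can_delete INTEGER NOT NULL DEFAULT 0"
)

KNOWN_COLUMNS = {"can_read", "can_write", "can_delete"}


def can(role, file_type, column):
    """Look up one permission flag; a missing row is treated as 'denied'."""
    if column not in KNOWN_COLUMNS:
        raise ValueError(f"unknown permission column: {column}")
    row = conn.execute(
        f"SELECT {column} FROM permissions WHERE role = ? AND file_type = ?",
        (role, file_type),
    ).fetchone()
    return bool(row and row[0])


print(can("SEC", "HR", "can_read"))
print(can("SEC", "FIN", "can_read"))  # no row at all for this pair
```

The lookup has to decide what a missing row means. Here it silently means "denied", which is exactly the kind of logic that ends up living in application code rather than in the data.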

Let us look at a different way of doing things.

On a graph model

For simplicity’s sake, the graph will be explained in an RDF format (Turtle), for lack of a standard way to represent non-RDF graph data. (I promise I’ll upload a nice picture in a future edit)

For consistency and comparability, we will use the same unique identifiers that we chose as primary keys in the relational example to uniquely identify our entities. When designing an RDF schema, some improvements could be made if we removed that constraint.

Do not forget that the relational model is conceptual: to actually implement it, we would have to write the SQL that creates the schema and then inserts the values in each table. Also remember that SQL is mostly portable but not 100% portable, which might mean we would have to tweak SQL statements (probably not for such a simple example). Because of the expressiveness of RDF, this graph representation is a perfectly understandable definition of our data model, while being 100% portable and parseable by any triplestore.
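If you have never worked with triples, the idea can be sketched in a few lines of plain Python: every fact is a (subject, predicate, object) tuple, and querying is pattern matching with wildcards. This is a toy stand-in for a triplestore, not a real one. w:User and w:canRead appear later in this post, while w:Programmer, w:Secretary, w:canWrite, w:Fun and w:Code are names assumed for illustration.

```python
# Prefixed names are kept as plain strings; "w:" stands in for the post's vocabulary.
# Which triples exist here is an assumption made for illustration.
JANE = "mailto:jane@wallscope.co.uk"

TRIPLES = {
    ("w:Programmer", "rdfs:subClassOf", "w:User"),
    ("w:Secretary", "rdfs:subClassOf", "w:User"),
    ("w:User", "w:canRead", "w:Fun"),
    ("w:Secretary", "w:canWrite", "w:Fun"),
    ("w:Programmer", "w:canRead", "w:Code"),
    (JANE, "rdf:type", "w:Programmer"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t
        for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Every "who can read what" fact, with no linker table in sight.
for triple in match(p="w:canRead"):
    print(triple)
```

Roles, permissions and users all live in the same structure, so adding a new permission is one more triple rather than a new table or column.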

Results, results, results

Ok, sure, the RDF example is shorter and there are no elements that make no sense taken out of context, like linker tables, but what else is there about RDF that makes it better?

Let’s have a look at how we could query this database.

It’s a file database, so we might want to get a list of files that we can read.

So, say we’re logged in as jane@wallscope.co.uk

A SPARQL query that would get all files that Jane can see would be something like:

Which returns

As expected, Jane can see the code file because of the programmer role, the World Cup Sweepstakes file because everyone can see it (due to it being visible to w:User and all user types being a subclass of w:User), and the Hiring file because of the implicit w:C_level role.

If you remove the comments and the CONSTRUCT part of the query, we have a very expressive query that exploits the advantages of RDF/Linked Data, such as inference, property paths and filtering by triple pattern matching.

?file a w:File ; rdf:type ?types ; rdfs:label ?label ; w:size ?size .
<mailto:jane@wallscope.co.uk> rdf:type/w:canRead ?types .

Yes I know some of those are features of the SPARQL language, not RDF, before you complain.
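For readers who don’t speak SPARQL, here is a hand-rolled Python sketch of what that two-line pattern asks for. It is not how a triplestore evaluates the query. It only mimics the `rdf:type/w:canRead` property path plus subclass inference over a tiny in-memory set of triples. The file identifiers, the file type names, the "Source code" and "Budget" labels, w:Accountant and w:CTO (standing in for whatever gives Jane her implicit w:C_level role) are all assumptions for illustration.

```python
JANE = "mailto:jane@wallscope.co.uk"

# Illustrative data only; identifiers other than w:User, w:File, w:canRead
# and w:C_level are made up for this sketch.
TRIPLES = {
    ("w:Programmer", "rdfs:subClassOf", "w:User"),
    ("w:CTO", "rdfs:subClassOf", "w:C_level"),
    ("w:C_level", "rdfs:subClassOf", "w:User"),
    ("w:User", "w:canRead", "w:Fun"),
    ("w:Programmer", "w:canRead", "w:Code"),
    ("w:C_level", "w:canRead", "w:HR"),
    ("w:Accountant", "w:canRead", "w:Financial"),
    (JANE, "rdf:type", "w:Programmer"),
    (JANE, "rdf:type", "w:CTO"),
    ("w:file1", "rdf:type", "w:File"), ("w:file1", "rdf:type", "w:Code"),
    ("w:file1", "rdfs:label", "Source code"),
    ("w:file2", "rdf:type", "w:File"), ("w:file2", "rdf:type", "w:Fun"),
    ("w:file2", "rdfs:label", "World Cup Sweepstakes"),
    ("w:file3", "rdf:type", "w:File"), ("w:file3", "rdf:type", "w:HR"),
    ("w:file3", "rdfs:label", "Hiring"),
    ("w:file4", "rdf:type", "w:File"), ("w:file4", "rdf:type", "w:Financial"),
    ("w:file4", "rdfs:label", "Budget"),
}


def superclasses(cls):
    """Return cls plus every class reachable via rdfs:subClassOf (the inference step)."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(
                o for s, p, o in TRIPLES
                if s == current and p == "rdfs:subClassOf"
            )
    return seen


def readable_files(user):
    """Mimic: ?file a w:File ; rdf:type ?types . <user> rdf:type/w:canRead ?types ."""
    user_types = set()
    for s, p, o in TRIPLES:
        if s == user and p == "rdf:type":
            user_types |= superclasses(o)
    readable_types = {o for s, p, o in TRIPLES if p == "w:canRead" and s in user_types}
    labels = {s: o for s, p, o in TRIPLES if p == "rdfs:label"}
    return sorted({
        labels[s]
        for s, p, o in TRIPLES
        if p == "rdf:type" and o in readable_types
        and (s, "rdf:type", "w:File") in TRIPLES
    })


print(readable_files(JANE))
```

The point of the comparison: in the SPARQL version, the subclass walk and the path traversal are the query engine’s job, so none of this code has to exist in your application.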

Finale

(Not a typo, just a fancy way of saying conclusion)

RDF and Friends (mainly SPARQL, RDF is not one of the popular kids) can provide new ways of making sense of our data.
The fact that it derives from an open standard means that our data does not become vendor-locked.

In the age of rebelling against SQL’s monopoly on data storage, graph databases provide us with a way of modelling our data that is closer to the real world, and smarter data opens the doors to smarter applications.

Hopefully this will allow us to shift the focus to the things in our application that really make it different from any other application, rather than having to build yet another user system based on the only data structure known to database-people-kind (relations).

Knowledge Graphs are making the rounds, and Big Data keeps on growing. Amazon just got behind RDF by releasing Neptune and realising that a graph database makes more sense when it’s backed by a standard and technologies that have been in development for decades.

By turning our otherwise siloed data into a standards-based format, we can make it part of our knowledge graph. Why should we keep users in our databases as randomly generated artificial primary keys, when we can have them as schema:Agents or foaf:Persons (sorry, foaf:People isn’t a thing)?

I’ll leave you with a perfectly looped GIF, because everything makes sense.