RDBMS vs. GRAPH

Kelsey Whitehead
4 min readMar 20, 2017

--

While researching graph databases, I have frequently been struck by how simple and usable they seem to be, and how much use they could provide for systems requiring a relationship driven approach to data storage and querying. I have been wondering regularly why I haven’t heard of them before beginning this project, and why it seems like they have been so vastly underutilized. The obvious answer is that people like to do things how they’ve always done them, or because of concerns regarding the process involved with switching to a new system. In this entry we’ll look closely at what type of data could benefit from a graph, and how to begin the process of transferring your system to a graph database.

As previously mentioned, the strength of a graph database lies in its relationships — the ability to directly model the connections of a set of data. For data sets that feature a large amount of relationships the overhead of modelling them can be significant. Let’s look at an example of this

Used with permission — Southern Alberta Institute of Technology

This is an ERD I’ve used during my studies at SAIT. It features quite a large amount of connections between the tables, with bridging tables used to resolve many to many relationships. Within a traditional data model, the foreign keys are how the model defines relationships — for example, within the COURSE_REGISTRATION table there are a number of foreign keys (student_id, course_code, section, semester, year). These are used to create a relationship between a row from the STUDENT table and a row from the CLASS_SECTION table, in this case a ‘registered in’ relationship. Looking a little closer though, you realize that this is a table composed of six columns with only one containing data that is actually unique to this table (the grade column). The rest of the data is already stored within the STUDENT and CLASS_SECTION tables, and is repeated purely to define the relationships between objects, and link a single student to a single class section. It might not seem like much, but with any reasonably sized school this table could quickly balloon out of control.

(As an illustration, SAIT had 4,378 students in its graduating class of 2015. If each was enrolled in 5 classes for 4 semesters then that would be 5 classes x 4 semesters x 5 columns x 4,378 students — a total of 437,800 pieces of data within the COURSE_REGISTRATION table per graduating class.)

Using this model for data-sets like this can also create a headache for developers trying to perform complicated queries requiring a large amount of joins, and all those joins can slow down a system that regularly needs to perform those complicated queries. This is where a graph database can assist with the simplification of a data set like this.

RDBMS to Graph

Let’s convert the above database into a graph. To do this, we’ll need to complete 3 steps

Step 1: Find the nodes

Step 2: Find the relationships

Step 3: Manage bridging tables

First, let’s find the nodes. The simplest way to accomplish this is to locate the objects within the database. In this case the STUDENT, FACULTY, CLASS_SECTION, and COURSE tables are all representing real life objects and will be our nodes, while EXPERTISE and COURSE_REGISTRATION are bridging tables providing a basis for defining a relationship between some of those objects.

Second, we need to find our relationships. This is done by examining the foreign keys. Whenever a table contains a foreign key from another table, we can assume that there is some type of relationship between them. For this step, we’ll ignore bridging table relationships.

For our example, the relationships would be (using the cypher notation utilized by Neo4j)

· (FACULTY) –[INSTRUCTS]-> (CLASS_SECTION)

· (CLASS_SECTION) –[ONE_OF]-> (COURSE)

· (COURSE) –[PREREQUISITE]->(COURSE)

Next we need to turn the bridging tables into relationships as well. Graph databases are able to contain many to many relationships, so the process of converting these tables is as simple as making them a relationship that contains properties for any unique data they contain (like that grade value discussed earlier)

· (STUDENT) –[ENROLLED_IN]-> (CLASS_SECTION) /grade

· (FACULTY) –[EXPERTISE]-> (COURSE) /expertise_level

Now we can make our ERD

Simpler!

This looks much simpler than our original ERD, but is able to contain all of the information the original could.

Relational databases have been in use for a long time. Oracle was originally released in 1979 — and the relational model continues to see widespread adoption across a large variety of applications, for better or worse. For many of these applications, using a RDBMS may be the easiest option for now, but in the long run may be adding unnecessary difficulties to their systems. There are a variety of NoSQL database types available, and I believe many companies would benefit from taking a close look at their current system to see if there is any room for improvement.

--

--