The Artist and the Data Model

(Translation: 3.1 Data Modelling)

Its been a while since the last post, mainly because I’ve had to dig deep to get this week’s work done. But I tried not to just rush it to completion because this week was just like the work we did with prototyping - an important part of the foundation.

The first task for this week was creating a data model for our application.

What is a data model?

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities.A data model explicitly denotes the structure of data in graphical form.

Why create a data model?

When you create a data model, it makes you focus on the details. Also, you can present your model to a person and gather feedback/ideas before starting development. Even if it’s a business-oriented person (usually the customer) who does not understand the details of creating software, data models work as a common language and enable mutual understanding.

Different stages of this process will require representations of different complexity and detail. There are three styles/levels of data models that are usually used:

  1. Conceptual
  2. Logical
  3. Physical

The models evolve from ‘Human-oriented’ to ‘Computer-oriented’.
Instead of describing each in detail, I will use this table which manages to capture their properties and differences perfectly:

Source: Data Modeling for Big Data apps - LinkedIn.com

With that understanding of data models, I can now discuss the one that I designed for my application.

Refer my previous blog-posts if you would like to get some context about what I will be discussing:

Idea:

Prototype:


Time to dive in to my data modelling process.

Stage 1: Pencil and paper (as always)
Before I started working on a data modelling software, I wanted an outline of the model that I would be making, so that I could just fill in the details. To get this done fast, I just sketched it on a piece of paper (the other side of the same page on which I made my prototype! Talk about saving trees!)

Same camera. Same horrible quality.

Even though I did not know the features of Conceptual and Logical data models at the time of making this diagram, I think I have captured them appropriately here. To me, this is a great indicator of how intuitive the design process stages are, if you take small steps.

As you may notice later, my final data model is not exactly like the one I have made here. Also notice the ‘?’ after the ‘categories’ attribute in the User entity. This was because I had not yet been able to decide how I would be implementing a multi-valued attribute.
These highlight the fact that I used this an iteration in my process in order to have a base to improve upon later, instead of listing out details from scratch.

Stage 2: Data modelling software - Vertabelo
Now that I had the basic idea in front of me, it was simply a task of choosing an appropriate software and translating the sketch into a schema with details.
For the software, there were many good options such as DbDesigner.net, LucidChart and PonyORM (which was recommended in one of our webinars). However, I ended up using Vertabelo, mainly because it had all the features I was looking for (while the other options had a feature or two missing, or just implemented differently). There is one drawback to Vertabelo: it is available only on Safari and Chrome; this was not a problem to me since I had Chrome installed.

Finally, the complete data model:

I will briefly discuss the tables, views and relations in this schema.

User: Captures all the information required for a particular user (recall that only registered users will be using this application). Apart from the obvious ‘uid’ (primary key), ‘username’ and ‘email’, it has the ‘type’ attribute which denotes whether the user is a Helpee or Helper. The dual state of this attribute made me put it down as a boolean, but I will decide if I want to implement it so. The ‘available’ attribute denotes the Offline/Online status of the user, to be implemented as a boolean. This is important since in the application, Helpers will only be able to respond to Helpees who are online.

Problem: Captures all the details for a particular problem; ‘pid’ (primary key), ‘category’, ‘title’ and ‘description’. Additionally, ‘solved’ denotes the solved/unsolved status of the problem and hence, is a boolean. This is important since only unsolved problems will be displayed for the users.
‘user_id’ is a foreign key referencing the table ‘User’ and denotes which user has submitted that problem. The relation between ‘User’ and ‘Problem’ is a one-to-many relationship since a user may ask several problems while a problem can only have been submitted by a particular user.

Connection: Represents the state after a Helper responds to a particular problem and the Helpee (who is online) also follows the notification for the response.
‘helpee_id’ and ‘helper_id’ are foreign keys, referencing the ‘User’ table. I have shown two different one-to-one relationships instead of a single one-to-many relationship, to signify that only two users, each of different ‘type’, can be part of the connection/room.
‘prob_id’ is a foreign key referencing the ‘Problem’ table. It also forms a one-to-one relationship since a connection is formed over a single problem only.
‘helpee_id’, ‘helper_id’ and ‘prob_id’ together form a primary key since all three are required to identify a unique instance of a connection.

Categories and Expertise: This is an extension from the Stage 1 model. I had left the implementation of ‘categories’ attribute from the ‘User table’ to be decided later. The idea is to have multiple categories for each user which they can claim to have expertise in. So that would form a multi-valued attribute, which is not desired. I considered three ways of implementing this:

  1. Have a fixed number of categories (say 3) for a user which can all be filled or left empty i.e., a user can have 0 to 3 categories of expertise only. For this, three attributes in the ‘User’ table would suffice.
  2. Have an ‘Expertise’ table with two attributes: ‘user_id’ (foreign key) and ‘category’ (a string). Both the attributes together would form the primary key.
  3. The way I have implemented it (explained below).

The problem with the first was that it wasn’t scalable. I did not want to have to change the data model later in order to expand the functionality.
The second had no problems, except that the categories could be anything. What I wanted was to have some fixed categories from which the user could choose while signing up. Now these 10–15 could later be scaled to 50–100 categories or more. 
So I wanted a table which was purely a list of categories that can be chosen. For this, I have the ‘Categories’ table, which is essentially a integer-to-string mapping, from ‘cid’ (primary key) to ‘category’ (a string).
To link the user to his chosen categories, I set up the ‘Expertise’ table. ‘user_id’ is a foreign key referencing the ‘User’ table while ‘cat_id’ is a foreign key referencing the ‘Categories’ table. Together these foreign keys form a primary key.
The relation between ‘User’ and ‘Expertise’ is one-to-many, signifying that one user can have multiple categories of expertise. The relation between ‘Categories’ and ‘Expertise’ is also one-to-many, signifying that multiple users can have the same category of expertise.

Views:

  1. Helper_screen_list
    From the Helper screen of the prototype we can see that there is a scrollable list which contains all the problems which belong to the Helper’s categories of expertise. To populate this list, we have a view which contains ‘pid’, ‘title’ and ‘category’ for problems such that the categories are matching the Helper’s expertise. The title and the category of the problem will be displayed in the list; and only when a particular problem is clicked, its complete details will be fetched from the ‘Problem’ table using the ‘pid’.
  2. Helpee_screen_list
    From the Helpee screen of the prototype we can see that there is a list of problems that the user has already submitted, which are still unsolved. To populate this list, we have a view which contains ‘pid’, ‘category’, ‘title’ and ‘description’ of all problems, whose solved status is FALSE i.e., unsolved, and are asked by a particular user.
    You can see how Vertabelo allows you to generate the view using an SQL query:
Forgot to include ‘pid’ in the query, which is definitely required. Will update the model during implementation.

That description turned out to be not-so-brief.


With that piece of the foundation in place, I will barely have to think while creating tables in the Hasura console. That reminds me…

Note: With an update in Hasura local development, it is now possible to expose your local project to public Internet with a single command. Check my blog-post about Hasura local development to see the updates.

After creating the tables, I will move on to the data and authorization API calls, which are can be smoothly executed using the Hasura Data and Auth APIs. That will be the focus of my next two posts.
Look forward to it.