Photo by Taylor Vick on Unsplash

Basics of Databases

Eliot Howes
Eliot’s Journey
Published in
6 min readSep 16, 2019

--

A brief overview of databases and how to choose a database for your project

imgur.com/gallery/E5gN1eK

Data really powers everything that we do.
Jeff Weiner — CEO LinkedIn

Data is information — both quantitative and qualitative, numbers and words, really just about anything is data. It can be measured, collected, reported on and analysed and as programmers this is what we do when build our applications. We collect and use data.

A database is a storage solution for this data, somewhere for it to go — just like a Billy Bookshelf from Ikea or Tupperware for your lunch. As the data we want to store can be so varied there are many different types of databases, each with its own pro’s and con’s.

This will be a VERY brief (as the world of databases is vast and sometimes complex) overview of databases and what to think about when selecting a database for your project.

Relational vs Non-Relational / SQL vs NoSQL Databases

Relational Databases

Relational databases are structured around tables which is something we are all likely to be familiar with from programs like Excel. In a relational database we can have many different tables which each have unique names for example Users.

As you would expect our tables are divided into rows or columns. Columns hold the specific characteristics or attributes that we give to our table and the Rows hold data that belongs to a single record (sometimes known as instances) in our table:

A basic Users table showing attributes (columns), and records 0–6 (rows)

As stated we can have multiple tables that make up our database. For example if we were creating a blog site a User might write blog posts so we could also write a Posts table. If we think about ‘the real world’, these two things are linked as a user would write a blog post and that blog post would belong to that user, so we would say these two things have a relationship — hence Relational Databases.

The way in which these relationships are defined is through the use of Keys. When data is inputted into a table or a new row is create (a new instance or dataset is created) we can assign a Key to this row to allow us to easily identify our data in the future (this is know as the Primary Key). Keys tend to be a number (but don’t have to be) and most Relational Database Management Systems (RDBMS) will automatically assign a numerical key to our data which auto increments when new data is inputted.

We can then use the Key from one table in another to demonstrate our relationship for example:

Our Posts table has rows which have their own primary key (1–3) but also have a column called user_id which takes the key from a row in our Users table (known as the Foreign Key). This means that the blog post now belongs to that specific user (here the user with the id of 1).

This is how Relational Databases work as the most basic level. When we would like to retrieve data from our tables we would then use a language like SQL (Structured Query Language), it is beyond the scope of this blog to discuss SQL but they are thousands of resources out there if you want to know more.

Non-Relational Databases or NoSQL Databases

Usually called NoSQL databases, non-relational databases are quite different in the way they are structured to relational databases. NoSQL does not mean that these databases cannot be queried using SQL although often they are not — NoSQL actually stands for Not Just SQL meaning they can be queried in other ways.

The biggest difference between these two types of database is that non-relational databases do not store information in tables and instead use different structures for storing data.

Broadly these structures are document, key/value pair, column or graph data stores.

Document data stores are a collection of key value pairs where its values are referred to as documents. These documents are often in JSON format and provide some structure and encoding to manage the data. It is usual for a document to contain the entire data set for an entity for example a customer and their previous shopping orders.

https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data

The application would then be able to retrieve this data by using the documents key. This key is a unique identifier for the document and is often hashed.

Key/value pair data stores are similar to document data stores — they are really just large has tables where data values are associated to a unique key. Values are hashed and stored but unlike document data stores can only be queried by their keys. This means that more complex queries are difficult and sometimes not possible.

Column data stores on the face of it look like relational database tables as they are organised into columns and rows. Instead of only containing a single piece of data like relational database tables, column data store columns hold multiple pieces of related data that are often queried and manipulated as a unit for example:

https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data

Graph data stores are made up of nodes and edges. Nodes represent entities and edges the relationship between these entities. This type of data storage is vert useful when querying the relationship between data:

https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data

So how do you know which database to use

In short, this is a challenging question to answer as there are many factors that would influence your decision.

It mostly boils down to three things:

  • Structure
  • Size
  • Speed and scalability

The main advantages of NoSQL databases are they do not require strict schema as in relational databases and are usually more scalable and therefore tend to be used when dealing with large data sets or more specific data sets. You can get databases up quickly and start building your app out without worrying too much about your data structure by defining a schema.

Relational databases on the other hand have strict schema and therefore have non dynamic data. SQL databases are ACID complaint which is very useful for things like banking where you do not want data to be taken from one table if the other table is unable to receive the date for example when exchanging money.

Like I said this is a very brief overview of the pro’s and con’s but there are many resources where you can read more and much of the data in this blog came from here which is a good starting point: https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data

Until next time…

--

--

Eliot Howes
Eliot’s Journey

Personal and Technical blog about all things software development. Full Stack Software Developer - London.