How I choose between SQL and No-SQL solutions

Martin Hodges
14 min readMay 30, 2024

--

In this article I describe how I would choose between SQL and No-SQL databases for a solution. As part of that decision, I explore what part structured and unstructured data plays in the decision as well as other factors. It can be a complex decision [Article on software management].

No SQL vs SQL

I remember one university lecturer stating ‘I am not here to teach you to be an engineer. Engineering is about money!”

When I look back on my engineering career, I see he was right. Decisions on technology are as much about the cost as it is about whether the technology is right for the job. Both must be right to make a decision.

The same is true when deciding on No-SQL vs SQL.

Any discussion on No-SQL vs SQL, is normally a discussion on the database technology to use for a given solution.

Let’s begin with ‘being right for the job’ and for that we have to look at our data. Inevitably this starts with the question of whether our data is structured or unstructured.

We will then look at the non-technical aspects of the decision.

Structured vs unstructured

I hear many people talking about the fact that their data is unstructured. Some reports state that businesses run with about 80% unstructured data.

It is much rarer to hear people talking about their data being structured. This is probably because, at some level, all data is structured.

So, what does it mean to have structured or unstructured data?

There are two ways we can decide if we have structured or unstructured data:

  • Is the information about the data (its metadata) known before we write our code?
  • Will the data we receive change its form at runtime?

It is likely that the answer to both of these questions is ‘well, yes and no’.

Let’s take a page of text. This is generally classed as unstructured. This is kind of strange as it is very well structured. It is written in a language with a limited set of symbols and (generally) written using the grammatical rules of that language. All this demonstrates that our page of unstructured text has very structured metadata.

The actual meaning of the text could be anything. Thus, the semantics is unstructured.

So, we have data that is structured at one level but unstructured at another. This is the normal case for most data.

Truly unstructured data is random and can have few rules applied to it. They are limited to presence, size and the ability to transform it (such as with encryption).

For a solution with any extensive function, the software relies on the data or its metadata being structured. When you realise this, talking about whether you have structured or unstructured data is irrelevant. It is likely you have both.

Data Fluidity

Instead of talking about structure, I think it is better to talk about the fluidity of the data.

If your data conforms to strict typing rules, then it can be considered as solid, like house bricks. Its structure is set and immutable.

If your data does not conform to any typing rules, then is can be considered as fluid, like water. Its structure is not defined and is effectively random.

Now we can now look at the fluidity of our data. Undoubtedly it will fit somewhere between bricks and water, more like jelly (or jello).

Depending on the problem you are solving, data fluidity will tend to be closer to bricks or closer to water. Understanding that fluidity will help with understanding the technology you will require to support it.

Persisted data

I mentioned earlier that the decision about No-SQL vs SQL is actually a decision about the database technology or the persistence layer. Let’s examine what that means.

If you Google structured vs unstructured, you will see that most descriptions come down to the way data is saved (or persisted) in your database. You will see that this definition tends to polarise the storage solutions into SQL-based databases to store structured data (like bricks) or No-SQL-based databases to store unstructured data (like water).

There are other options but for this article I will use these two.

Let’s look at how this plays out.

Structured vs unstructured data

Structured data in SQL databases

Structured data has a fixed data definition (or schema). The database stores data in tables, with each new piece of data (or record) being saved as a row in the table. Each field within the record is saved as a column in the row. Each column is a specific type of data. In this way, all structured data is stored within a fixed, 3 dimensional grid of tables, rows and columns.

Because of the strict structure used to store the data, a Structured Query Language (SQL) is used to create, read, update and/or delete the data. Your code can now rely on the data and its structure when it is processing it.

You should understand that SQL databases can also support unstructured data as a blob, a string or even a JSON field. More on this later.

Unstructured data in No-SQL databases

Unstructured data has no defined schema. The database has no strict definition of what a record contains or what it looks like. It is completely agnostic and does not care what the data is.

You may now be asking yourself, ‘if there is no structure, how can we access the data?’

Well, even unstructured data needs to be referenced in some way. Typically, the tables and rows of an SQL database are replaced with collections and documents.

This makes sense. In common vocabulary, a document could have anything in it, and so it is with unstructured data. A collection is a set of documents that have some common theme to them, such as a collection of documents about fishes. Documents can have any type of data in them, even documents within the same collection can have different types of data.

It should now be clear that very fluid data is a much better match for a No-SQL database.

Relational databases

When we talk about an SQL database, we are actually referring to a Relational Database Management System (RDBMS).

In structured data, there can be relationships between our data. For instance, if we had a table of fishes and a table of fish tanks, we may want to know which fishes are in which tanks. This implies a relationship between the data, the fishes and fish tanks.

A relational database understands these relationships and uses references between the data records to represent those relationships.

As the relationships are so important, an RDBMS will specifically manage them, such as using referential integrity rules to ensure the references are never broken.

It is these relationships that allow SQL to manage the data across the database even when it is in different tables, rows and columns.

Non-relational databases

In a truly unstructured data set, there is no defined relationship between the data. Take books in a library. They may not be related to each other in anyway. They are just a collection (the library) of documents (the books).

Without a structure, to access unstructured data, you need an alternative to SQL. Whilst SQL is fairly-well standardised, No-SQL query languages are more dependent on the underlying storage technology. One thing they all have in common is that they are all referred to as No-SQL.

It is useful to think of No-SQL queries as more like searching for something such as a description of a goldfish in a library.

Whilst it is tempting to think that items of data (documents) in a No-SQL database have no relation to other documents, it is rarely the case. No-SQL databases support relationships but do not, necessarily enforce them with the same integrity as an RDBMS. This may or may not be a problem to you.

Many No-SQL databases now support a form of SQL.

Best technology for the job

Ok, so far I have mentioned that data is generally not fully structured and not fully unstructured. I have also stated that SQL databases can handle both and that No-SQL databases can handle both.

So how do we make a decision?

You need to consider how fluid your data is. If it is more brick-like, you will benefit from the strict rules enforced by an SQL database. If it is more water-like, you will benefit from the flexibility of the No-SQL databases.

There is one more scenario. In some cases, the structure of your data is not yet known and yet you need to store it. You will process it when you get better at identifying the structure. In this case, you do not yet know the level of fluidity of the data and you may be better to use a No-SQL database.

Hopefully you have already gathered that, when it comes to fluidity, most data does not sit at one extreme or the other but sits someone in between. It may be nearer to one end or the other and so you will tend to go for an SQL or No-SQL database as appropriate.

But how do you handle the other type of data?

Unstructured data with an SQL database

As I mentioned earlier, SQL databases can generally handle unstructured data. They can handle it as a blob of data. Similar to a file, a blob can contain any type of data.

Alternatively, you may also store it as text fields. A common practice is to store it as a JSON formatted string within a text field. JSON is a common format for fluid data (but actually has a strict structure itself).

The latest Postgres SQL database actually has a field type of JSONB that natively manages JSON formatted data. You can then access the data in the JSON field using SQL.

So you can see, there are several options for storing unstructured data in an SQL database.

Structured data with a No-SQL database

On the other hand, you may have selected a No-SQL database but need to have some structure to it so you can process it.

Like SQL databases, there options.

Given a completely undefined database structure (beyond collections and documents), you can store whatever you like, including structured data. The only thing that you need to do is to ensure that, before you store your data in the database, that you apply validation rules to ensure it conforms to the structure you want.

Doing this ensures that you can now rely on the structure of the data within your code.

Alternatively, if the data was not entered by your system, you can apply validation rules when you read it to ensure that it confirms to the structure you want. Of course, any failure in this validation may mean you cannot process that data and you will require a mechanism to remediate it first.

Need for consistency

There is another consideration when it comes to the storage technology decision and that is the need for consistency.

When a change is made to the data by, say, one user, how do other users experience that change?

There are two main experiences:

1. Given one or more changes occur during a single transaction, other users do not see the change until the transaction ends. At that point everything that changed during the transaction will be saved together and the other users always see a consistent set of data.

2. Under the same circumstance, each change is saved separately and so other users will see those changes immediately, meaning the data could be inconsistent until the transaction is complete.

The second option is called eventual consistency, referring to the fact that there is inconsistent data for a short period. In the case of a failure, the data could be left in an inconsistent or even an invalid state.

You have probably already worked it out that SQL databases deliver the option 1 experience whilst No-SQL databases deliver option 2.

So, if your solution requires all data to be shown in a consistent state, you may need to use an SQL database.

However, if you can tolerate inconsistency for a while, there are benefits. By not ensuring consistency, the No-SQL database can scale horizontally. The replication across multiple nodes can be done as needed. This means that the No-SQL database performs better with larger data sets and also allows content replication across geographic areas, improving responsiveness, especially if the data is read more than it is written.

You may be asking why ‘eventual consistency’ could ever be tolerated in a solution. After all, isn’t the point of using computers because they provide that level of certainty?

In actual practice, if you are adopting a microservice architecture, backed by an asynchronous event or messaging queue, then you have already accepted eventual consistency as a valid design. For most solutions it appears to be an acceptable strategy.

Now we have consistency and fluidity as technical factors in the decision to choose one technology over another.

How I make the choice

Up to now, you have probably thought about your particular problem and solution and decided on one technology or the other. Remember, though, I said it was not just about the ability to do the job but how much it will cost you.

This cost is not only the cost to develop the solution but also the cost to operate and support it. Together these form the Total Cost of Ownership (TCO). The TCO considers costs over a period of time and so can include costs such as the following:

  • Missed opportunity costs (time to market)
  • Initial development
  • Enhancements and evolution
  • Maintenance
  • Quality assurance
  • 3rd party licensing and support
  • Service reliability and availability
  • Scalability

In reality, your technology choice may be more to do with TCO than it is with technical capability. This is generally because there is a budget allocated to the solution that caps the TCO.

These costs can be broadly categorised as:

  • Time to market
  • Development effort
  • Operational support and maintenance

Let’s look at each of these in a bit more detail.

Time to market

Every project has a deadline. This may be because it needs to meet a market condition, a revenue target, cost savings or other dependency.

Missing that deadline will likely have a financial impact on the company.

This drives the need for a solution that minimises the risk to delivery timeframes.

As such, a decision based on time to market does not favour one technology or another but the one with minimal implementation risk. This is generally the one that you already have or with which the delivery team is most familiar with.

This type of decision could lead to a higher level of technical debt but like all debt, it may enable you to meet your current business targets earlier.

Development effort

It is often thought that the adoption of a No-SQL database speeds up development as the storage layer does not have to be designed and implemented. This, theoretically, speeds up and de-risks delivery but, in reality, any saving in time will be marginal.

In fact, whilst the data layer does not need to be configured, it still needs to be designed and the code written to enforce that design.

No-SQL databases do tend to favour unknown data structures and unknown business requirements. This makes them ideal for scenarios where you need a solution now but know that the requirements will not be confirmed until later. Using a No-SQL database can lower the amount of technical debt that such decisions create.

Before you make a decision on technology, you need to consider your development team. When it comes to time to market, quality of the solution (in terms of functional, non-functional and security requirements) and reduction in technical debt, using a technology that the development team knows is better.

If the time and budget is available for the development team to either acquire skills in the new technology (whichever it is) or for the team to be augmented with relevant skills, the development time is not a factor in the decision but experience shows this is rarely the case.

Operational support and maintenance

Ok, you have selected a different technology to your current one and have developed a solution around it. Everything has gone well and you are happy with your decision. You place the solution into production and now your problems start.

You find that your maintenance support costs increase because the new technology licensing does not fall under existing agreements and so they become an additional cost.

Your operations team (including level 1, 2 and 3 support) does not understand the new technology and need to go up a similar learning curve to the developers. This impacts your project delivery timeframes and/or the quality of your service.

New operational support tools need to be built to allow your operations team to manage faults of different severities. Mistakes are made as people learn and this could impact your users. Restoration times could increase.

All of these are impacts to your business and need to be considered (and planned for) when you think about changing technology.

Wrapping it all up

So far, in this article I have examined things to be considered when selecting a database technology. In summary, the decision is more complex that ‘I want to work on xyz’. At a high-level it includes:

  • Fluidity of data
  • Need for consistency
  • Time to market
  • Development effort
  • Operational support and maintenance

Hopefully I have shown that the gap and differences between No-SQL and SQL databases narrows each day. Whilst each has different set of characteristics that may benefit your project, overall, these unique benefits are becoming increasingly smaller.

Everything being equal on a technology basis, we then turn to the business impacts, which generally come down to financial impacts. Unless you have the luxury of a greenfield project and the ability to recruit a new development team, there will be existing factors that will tend to favour the incumbent technology.

In my career I have been involved with technology selection and seen the challenges it poses. I have had the benefit of a greenfield project and have been able to recruit my own development team. I have also inherited embedded technology and development teams and have had to look at different options.

In almost every case, the question that I have to answer is about the impact to the budget and timeframes. Like all debt, technical debt will be sorted out later and will, most likely, be amassed either way.

Anecdotal evidence suggests software becomes obsolescent in 5–8 years before it is replaced. This time frame is based on movements in overall contemporary architecture, changes in the market, competitor pressure for suppliers to improve their technologies etc. How would you use a credit card that vanished in 5 years?

So, when it comes to selecting between No-SQL and SQL, I would apply the following priorities to the decision:

  • Data fluidity (either at design time or runtime)
  • Need for data consistency
  • Available budget / timeframes
  • Existing technology and team capability
  • Scale and performance

We can examine this through a set of scenarios:

Let’s say we have two different projects:

Project #1 — a fluid data set, no need for data consistency and the need to massively scale.

Project #2 — relatively solid data set, need for consistency and no requirements for massive scale

  1. Given a greenfield project, #1 would see me select a No-SQL option.
  2. Given a Greenfield project, #2 would see me select an SQL option.
  3. Given we have existing technology and associated skills, and sufficient budget and timeframe to change technology, the same selections would apply (#1 = No-SQL) and (#2 = SQL).
  4. Given we have existing technology and associated skills, with no budget or timeframe to change technology, I would use existing technology to address either #1 or #2 with an appropriate strategy to address any compromises.

Summary

In this article I examined some of the factors I consider when deciding between a No-SQL and SQL database.

Primarily the decision is based on the ability to ‘do the job’ but increasingly this is a mute argument as technologies improve to reduce the gap.

Next, the decision is based on the TCO both in terms of the cost of development, the cost of support and maintenance and the cost of operations.

it is to be noted that the business case for introducing a new technology is much harder to justify than utilising what you already have, given there is no ‘deal breaker’ on the technology side.

Everyone’s project, business and decision framework is different but I hope I have shone a light on the factors that you should consider.

I hope you enjoyed this article and that you have extended your skills by learning something new, even if it is something small.

If you found this article of interest, please give me a clap as that helps me identify what people find useful and what future articles I should write. If you have any suggestions, please add them as notes or responses.

--

--