NoSQL for Data Storage

daniel paes
Sep 4, 2018 · 4 min read

Introduction

Within the Big Data world, it is no secret that there are tons of buzzwords and hype terms that are only useful for boosting your sales pitch to unwary clients. But beyond this swamp of words there is a lot of good stuff to be excited about, no questions asked! This domain has brought a lot of emerging technologies that help us solve some problems in a more optimised way. This article gives an overall introduction to using NoSQL databases for data storage and shows the characteristics of each type. Hopefully, by the end, it will give you enough understanding to help with your choice (or at least it will have helped you kill some time lol).

What is NoSQL:

The acronym is usually read as “Not Only SQL” (SQL being Structured Query Language), and it is very popular in big data projects thanks to, among other things, its horizontal scalability (a fancy name for adding more hardware to process stuff :) ), fast data access and high availability (imagine those junk food restaurants you crave at 1 am, but better!). It is a good data storage option even in non big data projects, such as a product catalogue system, because it handles not only structured data like Excel sheets, but also semi-structured data such as user-generated content on social media (yeah man, you can store those useless tweets from your beloved one) and even unstructured data (yep, you can also store their photos in a way that lets you extract some information from them). The way the data is stored and processed in this kind of system can make it easier to access the data as well as to manage the environment.
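To make "adding more hardware" a bit more concrete, here is a minimal sketch of one way keys can be spread across several servers. It assumes simple hash-modulo partitioning; the node names and the `pick_node` helper are hypothetical and do not come from any particular database.

```python
import hashlib
from collections import Counter
from typing import List


def pick_node(key: str, nodes: List[str]) -> str:
    """Map a key to one node with a stable hash, so every client agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap around
    # the number of available nodes.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
keys = [f"user:{i}" for i in range(1000)]

# Count how many keys land on each node.
load = Counter(pick_node(key, nodes) for key in keys)
print(load)
```

Scaling out then means adding an entry to `nodes`. Note that with plain modulo most keys move when the node count changes, which is why real systems often prefer schemes such as consistent hashing.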

Differences between SQL and NoSQL

Before starting, it is good to mention that NoSQL is not a substitute for SQL. SQL databases, also known as relational databases, are good at storing structured data and offer more analytical functions than NoSQL databases, so each type has its own use cases. NoSQL works as a complement within what is called a full analytical platform, and it is usually used for data with unconventional structures, cases in which using a relational database would be complex or even unviable. For analytical purposes, on the other hand, some may say that relational databases are still the best choice.

Types of NoSQL

Key based:

It consists of a key-value database, where the key used to reference the data can be user defined (such as a name) or auto-generated (sequenced numbers), while the value can be of a variety of data types. The key-value type basically uses a hash table in which each unique key points to an item of data, which makes it very useful for reference data, for example.
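The idea can be sketched in a few lines of Python. This is a toy in-memory store built on a plain `dict` (Python's hash table), not the API of any real product; the `KeyValueStore` class and the sample keys are made up for illustration.

```python
import itertools
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._sequence = itertools.count(1)  # source of auto-generated keys

    def put(self, value: Any, key: Optional[Any] = None) -> Any:
        """Store a value under a user-defined key, or the next sequence number."""
        if key is None:
            key = next(self._sequence)
        self._data[key] = value
        return key

    def get(self, key: Any, default: Any = None) -> Any:
        """Look a value up by key; return the default when the key is unknown."""
        return self._data.get(key, default)


store = KeyValueStore()
# User-defined key holding a piece of reference data.
store.put({"name": "Brazil", "currency": "BRL"}, key="country:BR")
# Auto-generated key; the value can be any data type.
auto_key = store.put("any value type works")

print(store.get("country:BR"))
print(auto_key, store.get(auto_key))
```

The store never looks inside the value, which is what keeps lookups by key fast and also why you cannot query by content.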

Document Based:

The document is the data stored as a collection of key-value pairs, usually encoded in a format such as JSON, and it is handled in a way quite similar to the key-value store. The main difference is that you can have nested fields of labelled data, which makes each element individually accessible. This makes it possible to query the data based on its contents, making it easy to work with complex data such as event logs. It is good to remember that, since each entity is a self-contained document, a lot of data redundancy is to be expected.

Columnar based:

In a column-oriented NoSQL database, data is stored in columns that are grouped with related data based on its business lineage; these groups are known as column families. Each column family can contain a virtually unlimited number of sub-columns that can be created at run time, keeping the schema-on-read characteristic. This grouping property, in addition to fast access for searches over time intervals, is one of the reasons why this choice is good for dimensional analysis such as time-series analysis.
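Here is a minimal sketch of that layout, assuming each row keeps its column names sorted inside a family so that a time interval becomes a simple slice. The `ColumnFamilyStore` class, the sensor data and the use of ISO timestamps as column names are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from typing import Any, List, Tuple


class ColumnFamilyStore:
    """Toy column-family store: row key -> column family -> columns."""

    def __init__(self) -> None:
        # row key -> family -> {column name: value}
        self._rows = defaultdict(lambda: defaultdict(dict))
        # (row key, family) -> column names kept in sorted order
        self._sorted_columns = defaultdict(list)

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        """Write a value; new columns are created on the fly, no schema needed."""
        columns = self._rows[row_key][family]
        if column not in columns:
            insort(self._sorted_columns[(row_key, family)], column)
        columns[column] = value

    def scan(self, row_key: str, family: str, start: str, end: str) -> List[Tuple[str, Any]]:
        """Return the columns whose names fall in [start, end], in order."""
        names = self._sorted_columns[(row_key, family)]
        lo = bisect_left(names, start)
        hi = bisect_right(names, end)
        columns = self._rows[row_key][family]
        return [(name, columns[name]) for name in names[lo:hi]]


store = ColumnFamilyStore()
# ISO timestamps sort lexicographically, so they work as ordered column names.
store.put("sensor-42", "temperature", "2018-09-04T10:00", 21.5)
store.put("sensor-42", "temperature", "2018-09-04T10:05", 21.9)
store.put("sensor-42", "temperature", "2018-09-04T10:10", 22.4)
store.put("sensor-42", "metadata", "location", "warehouse-a")

window = store.scan("sensor-42", "temperature", "2018-09-04T10:05", "2018-09-04T10:10")
print(window)  # [('2018-09-04T10:05', 21.9), ('2018-09-04T10:10', 22.4)]
```

Because the names are sorted, the interval search is a binary search plus a slice instead of a scan over every reading.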

Graph Based:

Graph databases differ from the other types, since they store not only the data itself but also focus on the relationships between the data and on how it is correlated. A graph database is defined by nodes, edges and properties. The nodes (also known as entities) are connected to each other by edges, which represent relationships or actions, and properties describe both the nodes and the edges. This provides a database structure with index-free adjacency, meaning each node references its neighbours directly. Data can be easily transformed from one model to the other within the database server, making it a good choice for systems in which the data is referenced based on the interrelations between items; a good example is social media interactions.
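A small sketch of nodes, edges and properties, assuming every node simply holds direct references to its outgoing edges (a simplified take on index-free adjacency). The `Node`, `Edge`, `connect` and `neighbours` names and the sample users are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class Node:
    """An entity; it keeps direct references to its outgoing edges."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    edges: List["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    """A labelled relationship pointing at a target node."""
    label: str
    target: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(source: Node, label: str, target: Node, **properties: Any) -> None:
    """Add a directed edge from source to target."""
    source.edges.append(Edge(label, target, properties))


def neighbours(node: Node, label: str) -> List[Node]:
    """Follow outgoing edges with the given label; no global index lookup."""
    return [edge.target for edge in node.edges if edge.label == label]


alice = Node("alice", {"city": "Lisbon"})
bob = Node("bob")
carol = Node("carol")

connect(alice, "FOLLOWS", bob, since=2017)
connect(bob, "FOLLOWS", carol, since=2018)
connect(alice, "LIKES", carol)

# Two-hop traversal: who do the people Alice follows follow?
suggestions = [
    second.name
    for first in neighbours(alice, "FOLLOWS")
    for second in neighbours(first, "FOLLOWS")
]
print(suggestions)  # ['carol']
```

Each hop only touches the edges stored on the current node, which is why this kind of "friends of friends" question is a natural fit for the model.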

Conclusion

There is no such thing as a silver bullet; different preferences and distinct requirements will lead you to prefer one database over another. So figure out your requirements and find the database that best supports your project's development. Each type will handle some problems, and rarely all of them. Every business will have its own preferences based on its project requirements, so focus on them.

There are some cases in which you could get biased by the technology, so be careful not to implement something really complex to solve a problem that could be handled by a simple SQL database (or even a well-managed Hadoop environment).

Keep that in mind while choosing your solution. It could be something brand new for your company to manage but, since it is hard to teach an old dog new tricks, always remember that your team might not be open to learning or supporting new stuff. Implementing new technology is satisfying from one point of view, but it could be a real pain in the ass from another, so do the maths to see if it is worth the risk. Try to find the right sponsors and team, always keeping it simple and thinking through each step carefully. That way you’ll be doing your colleagues a favour and even helping yourself out in the future, since you might be the one who gets called at 1 am.