NoSQL Options

How to Determine If NoSQL Is Right for You

Choosing the right architecture and some possible solutions

William D'Souza
The Startup


Time and time again, relational databases have proven superior to the challengers that have arisen since their creation, even though the relational model itself has changed very little since its inception. With good reason, many companies are highly dependent on them. They are used not only to hold your data but also to perform analytics and infer insights from the many relationships they hold.

Designing a database is not a simple task, and implementing a proper architecture enables the database to be powerful and consistent and to preserve integrity. From an architect's perspective, a relational database's ability to be atomic (each transaction is treated as a single unit that either fully succeeds or fully fails), consistent (a transaction is only valid if it obeys the defined rules), isolated (concurrency controls keep simultaneous transactions from interfering with one another), and durable (a committed transaction remains committed) is the reason it is an ideal choice to store your data and run your back-end.
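
As a minimal sketch of atomicity and consistency, the example below uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the names, and the transfer helper are all made up for illustration. The CHECK constraint plays the role of a rule, and the failed transfer is rolled back as a whole.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK rule, so the credit above was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)         # valid transfer
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in the balances; the rejected one leaves no partial credit behind.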

NoSQL became popular because of growing concern over the ability of relational databases to handle different types of big data, along with the need for applications to be scalable. Relational databases were seen as a subpar solution, but they have come quite a way, with hybrid approaches and specific database improvements that address these concerns. While there are some additional inherent concerns about using NoSQL databases for analytics, they are very useful for operational applications. With the right stack, it is possible to perform analytics with these data stores, but relational models with hybrid approaches can still perform extremely well.

Do You Need a NoSQL Architecture?

If you find yourself starting a new application or have encountered issues with running your system off of a relational framework, it’s worth exploring the world of NoSQL and finding a tool that will fit your needs. Here are a few questions that can help you make the choice.

Do I need the ability to handle big data with speed?

Relational databases generally don’t handle big data as well as NoSQL databases. Relational systems are powerful in terms of consistency and availability. If the system you are designing is going to be on a small scale and is unlikely to outgrow a single database server, then a relational database can work for you! However, if the application is already experiencing the problems that come with harvesting large amounts of varying data, a NoSQL solution will help. There are a handful of tools to choose from that will sacrifice either consistency or availability for the ability to scale.

Are you storing structured, semi-structured and unstructured data?

Does your application involve storing a variety of data types? Will there be a need to store audio/video, transcripts, webpages? Is there the possibility that you could be mixing structured and unstructured data? If you answered yes to any of these questions, then there’s a good chance that a NoSQL architecture is a good choice. Creating wide tables to store your data is not the best solution, and creating a table for every subdomain isn’t either. Although these could potentially work, a schema-less architecture is better suited to this, as every entry can have its own set of attributes, each with its own values.

Is scalability important to you?

If money is not an issue and you can afford to burn it, then scaling vertically with a relational database will work fine. However, buying more powerful hardware (in-house or cloud) gets expensive as your storage and performance requirements grow. If you find yourself constantly restrained by a budget or just want to be economical, then the ability to scale horizontally with a NoSQL model will work amazingly! If your data storage requirements grow too much, you can add inexpensive servers and connect them to your cluster, which continues to work as a single service. This is one of its biggest strengths.
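
To make horizontal scaling concrete, here is a small sketch of how keys might be spread over a cluster and what happens when a server is added. It uses rendezvous hashing, which is my choice for the illustration and not something a particular NoSQL product is claimed to use; the node names and keys are hypothetical.

```python
import hashlib

def owner(key, nodes):
    """Pick the node responsible for `key` using rendezvous hashing."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return int(digest, 16)
    # Every node scores the key; the highest score wins ownership.
    return max(nodes, key=score)

keys = [f"user:{i}" for i in range(1000)]
cluster = ["node-a", "node-b", "node-c"]
before = {k: owner(k, cluster) for k in keys}

# Scale out: add one inexpensive server to the cluster.
after = {k: owner(k, cluster + ["node-d"]) for k in keys}

# Keys whose owner changed after the new server joined.
moved = [k for k in keys if before[k] != after[k]]
```

With this scheme, the only keys that move are the ones the new server takes over, so the existing servers do not need to reshuffle data among themselves.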

Are you shipping lots of data around a network?

In a NoSQL cluster, processing jobs and queries are spread across several servers, providing very high levels of parallelization for ingestion and workloads. Being able to calculate aggregations right beside the data is essential, and there isn’t much of a need for a warehouse system that updates itself every night. Data does not need to be passed around a larger network to be analyzed, since aggregating where the data lives and handling queries efficiently pave the way for a well-performing system. If there isn’t a lot of data being moved around, then a relational model will work just fine.
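
The idea of aggregating beside the data can be sketched in a few lines. In this toy example each list stands in for the rows held by one server, and the event names and counts are invented. Each server computes a small partial result, and only those partial results are merged.

```python
from collections import Counter

# Hypothetical shards: each list stands in for the rows held by one server.
shards = [
    [("page_view", 3), ("click", 1)],
    [("page_view", 5), ("signup", 2)],
    [("click", 4), ("page_view", 1)],
]

def local_aggregate(rows):
    """Runs on the node that holds the rows; returns a small partial result."""
    totals = Counter()
    for event, count in rows:
        totals[event] += count
    return totals

def merge(partials):
    """Runs on the coordinator; only the partial totals cross the network."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return combined

totals = merge(local_aggregate(rows) for rows in shards)
```

The raw rows never leave their shard; the coordinator only ever sees one small Counter per server.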

What are the different types of NoSQL architectures?

NoSQL databases come in different flavours, each with its own strengths and capabilities. With relational databases, although the internals differ from system to system, the foundations are the same: data is stored in tables composed of rows and columns. Depending on the NoSQL architecture, either consistency or availability will be sacrificed for the ability to partition (the trade-off described by the CAP theorem), and because of this, the needs of your application can be met with one of these architectures:

Key-Value

  • These databases are simple and store data as a group of key-value pairs. Because of this simplicity, the architecture enables speed, flexibility, scalability, and ease of use.
  • They do not enforce a specific schema; they treat data as a single collection in which each key is an arbitrary string.
  • They can use less memory, which improves performance for certain types of workloads.
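
A toy in-memory version shows how little there is to the key-value model. The class and the keys below are hypothetical and are not the API of any real key-value database.

```python
class KeyValueStore:
    """A toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store any value under an arbitrary string key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look a value up by its key; there is no other way to query."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; returns True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False

store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})
store.put("feature:dark-mode", True)
```

Note that the store never inspects the values, which is why no schema is needed, and also why lookups by anything other than the key are not available.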

Document Storage

  • These databases typically use a file format to store documents in a certain structure. They are generally built around JSON-like documents, which have become a popular alternative to tabular data.
  • Documents map easily to the objects in your code, eliminating the need to decompose them into tables and worry about costly joins. Data that is accessed together is also stored together, which makes the model intuitive.
  • Because the data is in a JSON format, it is self-describing and its structure doesn’t need to be defined in advance.
  • Items in a document can vary and modifications can be made at any time, largely eliminating the need for schema migrations.
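
The points above can be illustrated with plain Python dictionaries standing in for documents. This is only a sketch: the catalogue, its field names, and the find helper are invented, and a real document database would offer a much richer query language.

```python
import json

# Two product documents in one collection; their fields differ and no schema
# was declared beforehand. All field names here are made up for illustration.
catalogue = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"], "colour": "blue"},
]

def find(collection, **criteria):
    """Return documents whose top-level fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Adding a field to one document needs no migration of the others.
catalogue[0]["warranty_years"] = 2

blue_items = find(catalogue, colour="blue")
serialised = json.dumps(catalogue[0])  # the document converts to JSON as-is
```

The nested specs stay inside the laptop document, so reading a product never requires a join.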

Graph

  • These databases are composed of two elements: a node and a relationship. Nodes represent entities and relationships determine how nodes are connected.
  • Graph databases use graph storage, but can also use other formats (document storage, columnar and object-oriented). They also use index-free adjacency: each node points directly to its neighbours, which makes traversing relationships efficient.
  • The overall benefits are increased agility, performance, and flexibility.
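
As a rough sketch of nodes and relationships, the example below keeps each node's neighbours in an adjacency list and walks the relationships with a breadth-first search. The follows graph and the hops function are hypothetical, and the dictionary only loosely mirrors how a real graph database stores adjacency.

```python
from collections import deque

# Each node lists the nodes it is directly related to (an adjacency list),
# so a traversal moves from neighbour to neighbour.
follows = {
    "ana": ["ben", "cara"],
    "ben": ["dev"],
    "cara": ["dev"],
    "dev": ["eli"],
    "eli": [],
}

def hops(graph, start, goal):
    """Breadth-first search: fewest relationships between two nodes, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # goal is not reachable from start

distance = hops(follows, "ana", "eli")
```

A question such as "how many steps separate two people" is a short traversal here, whereas in a relational model it would typically need a chain of self-joins.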

Wide-Column

  • These can be seen as two-dimensional key-value stores. The model is similar to the concept of tables, rows, and columns. However, the columns stored can vary from row to row.
  • Data is usually grouped into “column families”, where each column family contains multiple columns that are used together. Data is stored by row, in that the columns of a row within a family are stored together.
  • Can be a good choice for data warehouses as it increases speed and performance. Analytical data doesn’t change often and is usually created with data dumps, so column-based storage lets you ignore data that doesn’t apply to your queries, retrieving information from just the columns you want.
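
One way to picture the wide-column layout is as nested dictionaries: row key, then column family, then column. The sketch below is illustrative only. The table contents and the read_columns helper are invented, and real systems store and index this data very differently.

```python
# row key -> column family -> column -> value. Names are illustrative only.
table = {
    "user:1": {
        "profile": {"name": "Ada", "country": "UK"},
        "activity": {"last_login": "2021-03-01", "logins": 42},
    },
    "user:2": {
        "profile": {"name": "Lin"},                # no "country" column here
        "activity": {"logins": 7, "plan": "pro"},  # extra "plan" column
    },
}

def read_columns(table, family, columns):
    """Project selected columns from one family, skipping absent ones."""
    result = {}
    for row_key, families in table.items():
        cells = families.get(family, {})
        result[row_key] = {c: cells[c] for c in columns if c in cells}
    return result

logins = read_columns(table, "activity", ["logins"])
```

The query touches only the activity family and only the logins column, leaving the rest of each row unread.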

A few use cases of my favourite NoSQL databases

Elasticsearch

Elasticsearch is by far one of my favourite tools. It is extremely fast and can store different types of content. It is powerful for qualitative data and has a decent analytics aspect to it. Elasticsearch is commonly paired with Kibana, a management and data visualization tool for Elasticsearch. Kibana is elegant and user-friendly, and it is just one part of the Elastic Stack (a set of tools built around Elasticsearch). Some use cases for Elasticsearch are:

  • Website Search
  • Log Analytics
  • Application Search
  • Application Performance Monitoring
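
Search use cases like these rest on an inverted index, the core structure of Apache Lucene, which Elasticsearch is built on. The sketch below is a deliberately tiny version of that idea in plain Python. It does not use the Elasticsearch API, and the log lines and function names are made up.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)

docs = {
    1: "Disk usage warning on server alpha",
    2: "User login failed on server beta",
    3: "Disk replaced on server beta",
}
index = build_index(docs)
disk_hits = search(index, "disk server")
```

Because the index is built once at write time, a query only intersects a few small sets instead of scanning every document.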

Cassandra

Cassandra is the go-to tool when your application has a workload that is much heavier on writes than on reads. It has its own SQL-like syntax, CQL (Cassandra Query Language), which makes the learning curve a lot less steep for most people. The main benefits are that it is fault-tolerant (failed nodes can be replaced with no downtime), decentralized (no single point of failure) and durable (no loss of data). Some use cases for Cassandra are:

  • Storing time-series data
  • Tracking Health Data
  • IoT History
  • Logging transactions
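
For time-series and IoT data, a common modelling pattern is to group readings into partitions (for example, one per sensor per day) and keep each partition sorted by time. The sketch below imitates that layout in plain Python. It is an assumption-laden illustration and not Cassandra's API, and the sensor ids and readings are invented.

```python
from bisect import insort
from collections import defaultdict

# partition key (sensor_id, day) -> readings kept sorted by timestamp
partitions = defaultdict(list)

def write(sensor_id, timestamp, value):
    """Add a reading; `timestamp` is an ISO-8601 string, e.g. 2021-03-01T10:15:00."""
    day = timestamp[:10]
    insort(partitions[(sensor_id, day)], (timestamp, value))

def read_range(sensor_id, day, start, end):
    """Read one partition and slice it by time; no other partition is touched."""
    return [
        (ts, value)
        for ts, value in partitions.get((sensor_id, day), [])
        if start <= ts <= end
    ]

write("s1", "2021-03-01T10:15:00", 21.5)
write("s1", "2021-03-01T09:00:00", 20.1)  # arrives late, still stored in order
write("s1", "2021-03-02T08:00:00", 19.8)
write("s2", "2021-03-01T09:30:00", 30.0)

morning = read_range("s1", "2021-03-01", "2021-03-01T00:00:00", "2021-03-01T23:59:59")
```

Writes are cheap insertions into a single partition, and a typical read ("one sensor, one day") maps onto exactly one partition.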

MongoDB

MongoDB has become a brand name for NoSQL databases. I have found that when anyone mentions “NoSQL”, the first thought that comes to most people’s minds is MongoDB. The software has come a long way and is widely adopted for its simple installation and setup and its thorough documentation. Some use cases include:

  • Mobile development
  • Real-time analytics
  • Product catalogues
  • Content management

Are NoSQL frameworks useful for data teams? (A light introduction)

I have found that teams performing analytics and data science have a few fears about adopting NoSQL. We have become so used to relational systems that adopting new frameworks for analyzing data is not as easy as sticking to traditional systems. Querying relational systems is relatively easy, and the results are easy to interpret when explaining context or providing sheets of data to others. There is a learning curve when it comes to NoSQL architectures and, much like the rising cost of general training within organizations, there is a cost associated with adopting these systems. Even so, many analytical programs need data in a tabular format to create visualizations or dashboards, so a relational framework works extremely well for this.
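
When documents do need to feed a tool that expects a table, one option is to flatten them first. The sketch below does this with the Python standard library only. The sample documents, the dotted column naming, and both helper functions are my own assumptions for illustration.

```python
import csv
import io

def flatten(doc, prefix=""):
    """Turn nested dicts into one flat dict with dotted column names."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

def to_csv(docs):
    """Write documents as CSV; fields missing from a document become blanks."""
    rows = [flatten(doc) for doc in docs]
    columns = sorted({column for row in rows for column in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

docs = [
    {"id": 1, "user": {"name": "Ada", "country": "UK"}},
    {"id": 2, "user": {"name": "Lin"}, "plan": "pro"},
]
table = to_csv(docs)
```

The resulting text has one column per distinct field across all documents, which is the shape most dashboard and spreadsheet tools expect.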

How can we create statistical models or perform deep analysis with big data? It is much harder to write queries or perform analytics with a non-relational system. Spark is an excellent engine for analytics and big data; it is known for its ease of use, speed, flexibility, and generality. Spark supports a wide array of operators and algorithms for processing data and can handle advanced computations. Its machine learning library (MLlib) can work in tandem with these to serve up powerful predictions. Analysis can be performed quickly because jobs can typically run directly on the data itself, which makes real-time analysis possible. With its powerful Python package (PySpark), developers can use skills they already have to build deep analytical workflows.

William D'Souza
The Startup

providing solutions for common data problems @ Kizmet Solutions. www.kizmetsolutions.com