System Design: What to Consider When Choosing Data Stores

Wahidyan Kresna Fridayoka (Yoka)
AyoKoding
Published in
6 min readJun 30, 2023

Overview

One of the most important considerations when designing a system is the type of data store used. Many different types of data stores are available, each with unique characteristics and trade-offs. Understanding the key characteristics of different data stores is essential for making informed decisions about system design.

This section will explore the key characteristics of data stores, including the CAP theorem, data model, query language, scalability, best practices, common pitfalls, and similar databases. By the end of this section, you will better understand how to choose the right data store for your system.

Key Characteristics

The key characteristics of a data store can include structured data, complex queries, transactions, joins, and more. These characteristics can impact the performance and scalability of the data store. For example, a data store optimized for complex queries may not be the best choice for a system requiring high write throughput.

Considering the system’s specific needs is essential when evaluating a data store’s critical characteristics. For example, a system requiring high read throughput may prioritize a data store optimized for read performance. In contrast, a system requiring high write throughput may prioritize a data store optimized for write performance.

CAP Theorem

The CAP theorem is a concept that describes the trade-offs between consistency, availability, and partition tolerance in distributed systems. Understanding how a data store handles these guarantees is essential for system design considerations.

Consistency requires that all nodes in a distributed system see the same data simultaneously. Availability refers to the requirement that a distributed system continues functioning even if some nodes fail. Partition tolerance requires that a distributed system continues functioning even if network partitions occur.

Different data stores handle these guarantees in different ways. For example, some data stores prioritize consistency over availability, while others prioritize availability over consistency.

Considering the system’s specific needs is essential when evaluating how a data store handles the CAP theorem. For example, a system that requires strong consistency may prioritize a data store that prioritizes consistency over availability. In contrast, a system that requires high availability may prioritize a data store that prioritizes availability over consistency.

Usage

Understanding a data store’s best, neutral, and worst usage scenarios can help determine if it fits a particular system correctly. It is essential to consider the system’s specific needs and how the data store can meet them.

For example, a data store optimized for high read throughput may not be the best choice for a system requiring high write throughput. Similarly, a data store optimized for consistency may not be the best choice for a system that requires high availability.

It is crucial to evaluate the usage scenarios of a data store in the context of the system’s specific needs. For example, a system requiring high write throughput may prioritize a data store optimized for write performance. In contrast, a system that requires strong consistency may prioritize a data store optimized for consistency.

Data Model

The data model used by a data store can be relational or non-relational. Understanding the advantages and disadvantages of the data model can help determine if it is the right fit for a particular system.

Relational data stores use a table-based model, while non-relational data stores use a variety of models, including document-based, key-value, and graph-based models. Each model has its own unique characteristics and trade-offs.

For example, relational data stores are well-suited for systems that require complex queries and transactions. In contrast, document-based data stores are well-suited for systems that require flexible schema and high write throughput.

It is essential to evaluate the data model of a data store in the context of the system’s specific needs. For example, a system that requires complex queries and transactions may prioritize a relational data store. In contrast, a system that requires flexible schema and high write throughput may prioritize a document-based data store.

Query Language

The query language used by a data store can be SQL or NoSQL. Understanding the advantages and disadvantages of the query language can help determine if it is the right fit for a particular system.

SQL is a standardized query language used by many relational data stores. NoSQL refers to a variety of non-relational data stores that use a variety of query languages.

For example, SQL is well-suited for systems that require complex queries and transactions, while NoSQL is well-suited for systems that require flexible schema and high write throughput.

It is essential to evaluate the query language of a data store in the context of the system’s specific needs. For example, a system requiring complex queries and transactions may prioritize a SQL data store. In contrast, a system that requires flexible schema and high write throughput may prioritize a data store that uses NoSQL.

Scalability

Scalability is an essential consideration for any data store. Understanding how to make a data store performant, handle high traffic, and scale can help ensure that it can meet the needs of a growing system.

Many strategies for scaling a data store include indexing, sharding, and replication. Each strategy has its own unique characteristics and trade-offs.

For example, indexing can improve query performance, while sharding can improve write throughput. Replication can improve availability and fault tolerance.

It is crucial to evaluate the scalability of a data store in the context of the system’s specific needs. For example, a system that requires high write throughput may prioritize a data store that can be easily shared. In contrast, a system that requires high availability may prioritize a data store that can be easily replicated.

In Practice

Best Practices

There are best practices to follow when using any data store. Understanding these best practices can help avoid common pitfalls and ensure the data store is used effectively.

For example, indexing a data store to improve query performance is vital. It is also important to properly configure replication to ensure data is appropriately distributed across nodes.

It is crucial to evaluate the best practices of a data store in the context of the system’s specific needs. For example, a system requiring high query performance may prioritize a data store with solid indexing capabilities. In contrast, a system requiring high availability may prioritize a data store with solid replication capabilities.

Common Pitfalls

There are common pitfalls to avoid when using any data store. Understanding these pitfalls can help ensure the data store is used effectively and efficiently.

For example, avoiding over-indexing a data store is important, as this can decrease write performance. It is also essential to avoid over-replicating data, leading to increased network traffic and decreased performance.

It is essential to evaluate the common pitfalls of a data store in the context of the system’s specific needs. For example, a system that requires high write performance may prioritize a data store that can handle high write throughput without over-indexing. In contrast, a system that requires high availability may prioritize a data store that can handle high replication without over-replicating.

Similar Databases

Many data stores have similar characteristics. Understanding these data stores can help determine whether they fit a particular system better.

For example, MongoDB and Couchbase are document-based data stores well-suited for systems requiring flexible schema and high write throughput. MySQL and PostgreSQL are relational data stores well-suited for systems requiring complex queries and transactions.

It is essential to evaluate similar databases in the context of the system’s specific needs. For example, a system that requires high write throughput may evaluate MongoDB and Couchbase to determine the best fit for its needs.

Conclusion

Choosing the right data store is essential for system design considerations. You can make informed decisions about system design by understanding the key characteristics of different data stores, including the CAP theorem, data model, query language, scalability, best practices, common pitfalls, and similar databases. It is essential to evaluate each of these characteristics in the context of the system’s specific needs to ensure that the data store fits the system.

PS: Hi there! My name is Yoka, and I am a software engineering manager with a deep passion for software engineering craftsmanship. If you like this article, you might like https://www.ayokoding.com, a website where I write about software engineering craftsmanship, engineering management, and my learning experience. Or, if you are in the Twitter space, feel free to follow AyoKoding at https://twitter.com/ayokoding.

--

--

Wahidyan Kresna Fridayoka (Yoka)
AyoKoding

Yoka is a software engineering manager with a diverse portfolio and more than two years of experience leading software engineering teams.