An Overview of Databases — Part 4: How to Choose a Database?

5 min readJul 24, 2024

Part 1: DBMS Flow
Part 2: Non-Relational DB vs Relational
Part 3: CAP and BASE Theorem
Part 4: How to choose a Database?
Part 5: Different Solutions for Different Problems
Part 6: Concurrency Control
Part 7: Distributed DBMS
>> Part 7.1: Distributed DBMS (Apache Spark, Parquet + Pyspark + Node.js)
Part 8: Clocks
>> Part 8.1: Clocks (Causal Consistency With MongoDB)
>> Part 8.2: Clocks (MongoDB Replica and Causal Consistency)
Part 9: DB Design Mastery
Part 10: Vector DB
Part 11: An interesting case, coming soon!

Discovering available DBs

Picking the right database for your app or service isn’t just about matching patterns; You gotta get into the down of what your app needs and what each type of database brings to the table. A database that seems like a perfect match at the outset might not keep up as your app scales up, or it might throw in some twists and turns down the line that you could have avoided with a different pick.

So, before you commit, take a deep dive into your app’s demands, really understand the kind of data you’re dealing with, how much it’s gonna grow, and how fast you need to access it. Also, think about how flexible you need your database to be. Can it handle a few changes without a problem? Then, peek into the future and consider where your app is headed.

Selecting the right database as a senior engineer involves a strategic and deeply informed approach. Here’s a guide:

Deep Dive into Requirements: Understand your data intimately. Is it structured or unstructured? Anticipate the volume and growth. Discuss with your team and stakeholders to align on current needs and future goals.
Assess Workload and Performance Needs: Analyze whether your workload leans more towards reads or writes, and consider the importance of transactions.
Research and Compare: Explore a variety of databases, focusing on their scalability, performance under similar workloads, and compatibility with your existing tech stack.
Scalability and Integration: Ensure the database can scale with your needs and integrate seamlessly with other tools and services you use.
Operational Demands: Consider the maintenance and operational requirement, Are they within your team’s capacity?
Prototype: Test your top choices with real world scenarios to see how they perform with your specific data and queries.
Community, Support, and Cost: Look into the community support, vendor stability, and the total cost of ownership, including maintenance and operational expenses.

Make your choice based on a balance of technical capabilities, scalability, ease of integration, support ecosystem, and cost-effectiveness, always with an eye toward future needs and goals.

After taking all above points into account, I usually go with the following three steps to choose a few correct options for my problems. It is a good practice to document your finding under these three topics:

Step 1: Assess Application Requirements

Data Structure and Complexity:

Relational (SQL) databases, like MySQL, PostgreSQL, and Microsoft SQL Server, when dealing with structured data and complex queries.
They enforce data integrity and relationships through table joins, primary and foreign keys, and transactions. For instance, an e-commerce platform can benefit from a relational database to keep track of users, orders, and products, where relationships between these entities are crucial.

NoSQL databases handle unstructured, semi-structured, or schema-less data. They are classified into types like document (MongoDB, CouchDB), key-value (Redis, DynamoDB), wide-column (Cassandra, HBase), and graph (Neo4j, Amazon Neptune). These databases scale well horizontally, making them suitable for applications with large amounts of diverse data or variable schemas. A content management system can use a document-oriented database to store diverse content types more flexibly.

Scalability and Performance

For applications that expect significant growth or have highly variable workloads, scalability becomes a critical factor.

Cassandra and DynamoDB are designed for linear scalability and high availability without compromising performance.

Redis and Memcached are in-memory data stores used for high-speed operations and caching. Consider using these for session stores or caching frequently accessed but rarely updated data to reduce the load on your primary database.

Step 2: Technical Considerations

Consistency, Availability, Partition Tolerance (CAP Theorem)

Applications requiring strong data consistency (all database reads receive the most recent write) might prioritize SQL databases or certain configurations of NoSQL systems.
For services where availability is paramount (every request receives a response, regardless of individual node failures), some NoSQL databases designed for high distribution, like Cassandra, might be more appropriate.

Transaction Support

Complex financial systems or applications requiring atomic operations should consider databases that support ACID transactions (e.g., PostgreSQL, MySQL, SQL Server).
For applications where eventual consistency is acceptable, and performance is valued over strict atomicity, NoSQL databases offer flexibility.

Step 3: Operational and Business Considerations

Cost

Open-source databases like PostgreSQL and MongoDB can reduce licensing costs, but consider operational expenses.
Managed database services (AWS RDS, Google Cloud SQL, Azure SQL Database, MongoDB Atlas) to reduce operational complexity but incur monthly fees.

Maintenance and Community Support

Consider the ease of maintenance and the availability of expertise. Popular databases like MySQL, PostgreSQL, MongoDB, and Redis have large communities and extensive documentation.

Examples

1. Real-time Analytics Platform:
Use Cassandra for its ability to handle large volumes of data spread across many nodes with minimal latency.

Use Redis as a caching layer to store and quickly access frequently read but infrequently updated data.

2. Online Retail Application:
PostgreSQL can manage user data, inventory, and order transactions with its strong consistency model and support for complex queries.

Integrate Elasticsearch for full-text search capabilities across product descriptions to improve the user’s search experience.

3. Social Media Application:
Neo4j or another graph database to efficiently model and query complex social relationships and networks.

MongoDB for storing user-generated content and activity logs due to schema flexibility.

4. IoT Application:
InfluxDB for time-series data generated by IoT devices for its efficient storage and querying of time-stamped data.

Cassandra for managing data across many devices with the need for horizontal scalability.

Conclusion

Choosing the right database for your app means carefully looking at what each option offers and how well it fits with what your app needs now and what it will need as it grows. There isn’t always one perfect choice; often, you have to weigh different pros and cons.

Using more than one type of database to make the most of their strengths is becoming popular for complicated apps. Before you decide on a database, testing it thoroughly to see if it works well with your app’s tasks is crucial. This helps ensure the database can handle your needs without going over your budget or making things too complicated to manage.

When I talk about testing it thoroughly that does not necessarily mean you should write a program to test your production access patterns, you could have done your own testing by a wrong choice a few years ago and lessons you learned, or maybe you can search and check online benchmarks to see the how different DBs work. It is up to you.