Using Ramen DB as an example to explain bad practices in database design (logical design)

Published in

Goalist Blog

6 min readMay 20, 2024

Recently, I read Chapter 7 of “達人に学ぶDB設計徹底指南書” and found it extremely educational. To reinforce my understanding, I decided to summarize the bad practices covered in the chapter, along with my own examples.

0. “Do Not” in Logical Design

“You can’t fix a strategic failure with tactics.” Bad designs arise from “not thinking about anything.”

1. Non-Scalar Values (Below 1NF)

Do not use array types ➡ Adhere to the First Normal Form (1NF)

Bad Example

Although array types are included in standard SQL, adopting them requires considering not only the internal design of the database but also its consistency with applications and middleware. As a rule, arrays should be represented in rows, not columns.

Row-Based Table

| Ramen ID | Sub-ID | Toppings |
| — — — — — | — — — — | — — — — — — |
| 0001 | 1 | Chashu |
| 0001 | 2 | Ajitama |
| 0003 | 1 | Nori |
| 0003 | 2 | Spinach |
| 0003 | 3 | Quail Egg |

Designing with a column-based table is also an option, but it has drawbacks like “difficulties in adding/removing columns” and “having to use unnecessary NULL values,” limiting its use cases.

It is best to store information as divided as possible (without breaking its meaning). For example, names can be divided into last and first names:

- (Name) Ichiro Suzuki ➡ (Last Name, First Name) “Suzuki” “Ichiro”
- (Menu Name) Tonkotsu Ramen ➡ (Soup Type, Dish Type) “Tonkotsu” “Ramen”
- (Phone Number) 03–1234–5678 ➡ (Area Code, Exchange Code, Subscriber Number) “03” “1234” “5678”

Reason: It’s easy to combine divided information later, but it’s relatively difficult to separate combined information afterward. However, avoid breaking down meanings like splitting “Suzuki” into “Su” and “Zuki.”

2. Double Meanings

Columns are not variables. Once a meaning is assigned, it cannot be changed. If the same column has more than one meaning, it becomes ambiguous and can cause bugs.

Bad Example

Improved Table

| Ramen ID | Noodle Type | Noodle Firmness | Soup Richness | Oil Amount | Toppings |
| — — — — — | — — — — — — — — | — — — — — — — — -| — — — — — — — -| — — — — — — | — — — — — |
| 0001 | Thick Noodles | Hard | | | |
| 0002 | Medium Noodles | | Rich | | |
| 0003 | Thin Noodles | | | Extra | |
| 0004 | Thin Noodles | | | | Chashu |

3. Single Reference Table

Polymorphism is unnecessary in tables. A single reference table aggregates various types of master tables with structurally similar shapes. Single reference tables have the following pros and cons:

Pros

- Reduces the number of master tables, simplifying ER diagrams and schemas.
- Allows common SQL for code searches.

Cons

- The required column length for “Code Type,” “Code Value,” and “Code Content” varies by code system, so large variable-length strings must be declared.
- Aggregating records into one table can degrade search performance if there are many types and numbers of code systems.
- Mistakes in specifying code type or code value in SQL searches do not trigger errors, making bugs harder to detect.
- Although the ER diagram looks simpler, it lacks accuracy as an ER model, reducing readability.

Thus, the cons outweigh the pros, presenting significant issues in practice.

Single Reference Table (Miscellaneous Code Systems)

4. Table Partitioning

Table partitioning can be horizontal or vertical.

Horizontal Partitioning

- Divides the table by rows.

Vertical Partitioning

- Divides the table by columns.

Both have more cons:

1. No semantic reason for partitioning.
2. Poor scalability.
3. There are alternative methods:
— Horizontal partitioning ➡ Partitioning
— Vertical partitioning ➡ Aggregation

Partitioning

Instead of dividing the table, separate the physical storage based on a partition key. This reduces the amount of data SQL accesses to 1/n (where n is the number of partitions). Generally, partitioning uses columns with lower cardinality than indexes and infrequent changes as keys. For example, in a ramen order table, partitioning by date by month allows each month’s data to be managed independently, optimizing queries for specific months.

Order Table

| Order ID | Customer ID | Type | Date | Amount |
| — — — — — | — — — — — — -| — — — — — — — | — — — — — — | — — — — |
| 1 | 101 | Shoyu Ramen | 2024–01–15 | 800 |
| 2 | 102 | Miso Ramen | 2024–01–20 | 850 |
| 3 | 103 | Shio Ramen | 2024–02–05 | 750 |
| 4 | 101 | Tonkotsu Ramen| 2024–02–25| 900 |
| 5 | 104 | Shoyu Ramen | 2024–03–10 | 800 |
| 6 | 105 | Miso Ramen | 2023–03–15 | 850 |
| 7 | 106 | Shio Ramen | 2023–03–20 | 750 |
| 8 | 107 | Tonkotsu Ramen| 2023–03–30| 900 |

Aggregation

There are two types of aggregation:

1. Column Reduction: Create tables that simply reduce the columns maintained. Such tables are called data marts or marts. When using marts, consider two points: Storage capacity and Data synchronization. Marts are often updated in batches one to several times a day. However, this can lead to inconsistencies between the original table and the mart, so it’s important to carefully consider this against the requirements. For example, if only date and amount are frequently used in the order table, create a data mart like this:

Order Table (Date and Amount Only)

| Order ID | Date | Amount |
| — — — — — | — — — — — — | — — — — |
| 1 | 2023–01–15 | 800 |
| 2 | 2023–01–20 | 850 |
| 3 | 2023–02–05 | 750 |
| 4 | 2023–02–25 | 900 |
| 5 | 2023–03–10 | 800 |
| 6 | 2023–03–15 | 850 |
| 7 | 2023–03–20 | 750 |
| 8 | 2023–03–30 | 900 |

2. Summary Tables: Maintain records in an aggregated state using aggregate functions. This allows simple SELECT queries without performing aggregation each time. Like column reduction, consider storage capacity and data synchronization. For example, if business requires ramen sales by type, create a summary table like this:

Ramen Sales by Type Summary Table

| Type | Total Sales | Orders |
| — — — — — — — | — — — — — — -| — — — — |
| Shoyu Ramen | 1600 | 2 |
| Miso Ramen | 1700 | 2 |
| Shio Ramen | 1500 | 2 |
| Tonkotsu Ramen| 1800 | 2 |

5. Inappropriate Keys

Variable-length strings lack stability, making them

unsuitable as keys. Variable-length string columns do not possess the stability required for keys. Confusing with fixed-length strings (CHAR). Fixed-length string “code” columns are desirable as keys. Fixed-length and variable-length strings do not become the same “value” even if they use the same string physically because fixed-length strings pad with spaces.

6. Double Masters

What is a Double Master? A case where two master tables with the same role exist. Double masters often arise from system consolidation. Occurs when masters used in separate systems become part of the same domain due to system integration. The process of scrutinizing and consolidating entities is called data cleansing. This process plays a crucial role during system consolidation or existing system renovations. Proper data cleansing prevents double masters from occurring.

— -

Here is the article in Japanese. Please take a look and press the heart button!

https://zenn.dev/ka2/articles/2392f450c851d6

— -

Reference Book:「達人に学ぶDB設計徹底指南書」ミック. 翔泳社