Database Design for Tagging Service — Learn how the load on your system can impact your design.

Published in

Technical Insights

2 min readMar 15, 2024

Whenever we create a Medium post, a Jira ticket or an issue on StackOverflow, we assign tags to our post.

Today, let's discuss various approaches on how data should be stored and how database should be selected based on read and write load.

I will not cover complete system design for this problem but will only discuss database design. Do comment and clap for the post if you find it helpful.

Requirements for Tagging Service:

Add a tag to an item (Jira ticket, Confluence page, StackOverflow question etc)
Search for a tag and get all items that have that tag

Use cases

We will consider 3 use cases and will see the approach for each use case.

Low read and low writes
Heavy reads and low writes (read heavy)
Heavy reads and heavy writes (read-write heavy)

Use case 1 (low read and low writes)

Use SQL db as data is relational.
Tags and items will have many-to-many mapping.
Schema and Queries

-- Table for posts
CREATE TABLE posts (
    post_id SERIAL PRIMARY KEY,
    post_content TEXT,
    tags TEXT  -- this is optional
);

-- Table for tags
CREATE TABLE tags (
    tag_id SERIAL PRIMARY KEY,
    tag_name VARCHAR(255)
);

-- Junction table to associate posts with tags
CREATE TABLE post_tags (
    post_id INT,
    tag_id INT,
    FOREIGN KEY (post_id) REFERENCES posts(post_id),
    FOREIGN KEY (tag_id) REFERENCES tags(tag_id),
    PRIMARY KEY (post_id, tag_id)
);

-- Data insertion
insert into post (post_content, tags)
values
('post_1', '#t1 #t2 #t3'),
('post_2', '#t1 #t4 #t5'),
('post_3', '#t2 #t4 #t5');

insert into tags (tag_name)
values
('t1'),
('t2'),
('t3'),
('t4'),
('t5');

insert into post_tags (post_id, tag_id)
values
(1,1),
(1,2),
(1,3),
(2,1),
(2,4),
(2,5),
(3,2),
(3,4),
(3,5);


-- Query to get list of posts based on a particular tag
SELECT p.post_id, p.post_content
FROM posts p
JOIN post_tags pt ON p.post_id = pt.post_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE t.tag_name = 't1';

Use case 2 (heavy reads and low writes)

Similar to use case 1. Have multiple read replicas of your RDBMS to handle heavy read load.

Use case 3 (read-write heavy)

SQL won’t work here as the system is write-heavy. So we will use NoSQL DB. Cassandra seems to be a good option here based on the data and query pattern.
The partitioning key can be tag_id so that all data related to a tag is stored in a single partition.
Schema and Queries

CREATE TABLE posts (
    post_id UUID PRIMARY KEY,
    title TEXT,
    content TEXT,
    tags SET<TEXT>
);

CREATE TABLE tags_by_post (
    post_id UUID,
    tag TEXT,
    PRIMARY KEY (tag, post_id)
);

-- Query TagsByPost table to get post IDs for a specific tag
SELECT post_id FROM tags_by_post WHERE tag = 'your_tag';

-- Fetch posts based on post IDs retrieved
SELECT * FROM posts WHERE post_id IN (post_id1, post_id2, ...);

Feel free to leave your comments or suggestions if any.

I hope you found this helpful. Let me know.