[System Design Study] Designing a URL Shortening service like TinyURL, goo.gl.

Jason Lee
8 min read · Aug 10, 2020

--

What is URL Shortening?

  • Used to create shorter aliases (“short links”) for long URLs
  • Users are redirected to the original URL when they hit these short links
  • Example services: TinyURL, bit.ly, goo.gl, qlink.me

Problem Solving

[Step 1] — Requirement Elicitation

What are the Functional requirements?

Given a URL, our service should generate a shorter and unique alias for it (a “short link”)

When users access a short link, our service should redirect them to the original URL

Users should optionally be able to pick a custom short link for their URL

Links should expire after a standard default timespan, and users should be able to specify their own expiration time

What are the Non-Functional requirements?

Should be highly available (because if the service is down, all the URL redirections will start failing)

URL redirection should happen in real-time with minimal latency

Shortened links should not be guessable (not predictable)

Possible Extended Requirements

Storing analytics information, e.g. how many times a redirection happened for each original URL

Our service should also be accessible through REST APIs by other services

Analysis

“Shorter and unique”, “custom short link” (we must check whether a requested link is already in use), “should not be guessable”, and “redirect” all point to the concepts of hashing and key-to-URL mapping.

“Highly available” and “expiration time” imply that there should be many application servers (to prevent a single point of failure at the application level) and a replicated primary-secondary (redundant) DB (to prevent a single point of failure at the storage level, keep redirections consistent, and stay available while half of the DB is being garbage-cleaned). That rules out a CP store like MongoDB (Consistency + Partition Tolerance); we favor availability.

[Step 2] — Estimation of the scale of the system

Usually big organizations like companies or schools use URL shortening a lot, and they typically send a shortened link to a group of people, say 10–1000 recipients, about 500 on average. Not every recipient will click, so let’s assume roughly 150 redirections per created link, i.e. a create : read ratio of about 1 : 150. Say there are about 100 million such organizations using URL shortening, and each of them uses the service once a month on average. Then each month there will be 100 million new original URLs, which is about 100M / 30 ≈ 3.3M a day, and 3.3M / (24 hr * 60 min * 60 s) ≈ 38, so call it roughly 30 new original URLs per second. That gives about 30 * 150 = 4,500 URL redirections per second (QPS). So I would say we need about 20 application servers to provide a fast service to users.

Since we won’t store any static media in this service, we don’t need large storage for media. Let’s say we store every URL-shortening record for one year, because people usually use this service for temporary links (surveys, posts that won’t be visited much later), and any analytics data loses its value after a long time. So the total number of objects we expect to store is 30 * 24 hr * 60 min * 60 s * 365 ≈ 1 billion. As a ballpark, each record holds about four strings of length 10 (10 + 1 null-terminator bytes each, for user_token, original_URL, short_link, location) plus roughly 100 bytes of extra information, i.e. about 50 B + 100 B = 150 B per record. So we need storage of roughly 150 B * 1 billion ≈ 150 GB. The write bandwidth will be 30 (new original URLs per second) * 150 B ≈ 4.5 KB/s, and for read requests it will be 4,500 (redirections per second) * 150 B ≈ 675 KB/s.

For the cache: since outgoing traffic is about 675 KB/s * 60 s * 60 min * 24 hr ≈ 58 GB per day, I would cache about 20% of these requests, which is roughly 10 GB of memory. Since many requests are duplicates, we could probably get away with even less than 10 GB. Separately, I think we would need about 20 * 2 = 40 DB servers to store the data, where half of them are primaries and the other half secondaries, each pair monitoring the other to maintain availability in failure cases.
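
These back-of-envelope numbers are easy to get wrong, so here is a small script that reproduces them under the assumptions stated above (150 B per record, ~150 redirections per created link, rounding new URLs down to 30/s):

```python
# Back-of-envelope estimation for the URL shortener (assumptions as stated above).
SECONDS_PER_DAY = 24 * 60 * 60

new_urls_per_month = 100_000_000     # 100M organizations, one new link each per month
reads_per_link = 150                 # assumed create : read ratio of about 1 : 150
record_size_bytes = 150              # 4 short strings + ~100 B of extra metadata
retention_days = 365

exact_new_per_sec = new_urls_per_month / 30 / SECONDS_PER_DAY   # ~38.6 per second
new_urls_per_sec = 30                                            # rounded down, as in the text

redirects_per_sec = new_urls_per_sec * reads_per_link            # 4,500 QPS
incoming_bps = new_urls_per_sec * record_size_bytes              # ~4.5 KB/s
outgoing_bps = redirects_per_sec * record_size_bytes             # ~675 KB/s

stored_objects = new_urls_per_sec * SECONDS_PER_DAY * retention_days   # ~0.95 billion
storage_bytes = stored_objects * record_size_bytes                     # ~142 GB, call it 150 GB

cache_bytes = outgoing_bps * SECONDS_PER_DAY * 0.20               # ~20% of ~58 GB/day

print(f"redirects/s: {redirects_per_sec:,}, outgoing: {outgoing_bps / 1e3:.0f} KB/s")
print(f"storage/year: {storage_bytes / 1e9:.0f} GB, cache: {cache_bytes / 1e9:.1f} GB")
```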

To sum up,

New URL requests: about 30 URLs per second

URL redirections: about 4,500 URLs per second

Incoming data (new URL requests): 30 * 150 B ≈ 4.5 KB/s

Outgoing data (redirections): 4,500 * 150 B ≈ 675 KB/s

Storage for one year: about 150 GB

Memory for cache: about 10 GB (at most)

[Step 3] — System interface definition

To secure better availability with good latency, and to prevent our storage from being filled up quickly by malicious users, we will ask users to create an account and will generate a user_token, which we can use to detect a malicious user creating links indefinitely.

create_account(user_id, user_pw, user_email) // Create an account for this user and generate token

generate_token(user_id) // Generate a unique token for a user

We will also need create_shortlink and redirect_to_shortlink APIs to support the create and redirect operations.

create_shortlink(user_token, original_url, custom_link=None, expire_date=default_date) // return a shortlink on success, otherwise return an error

redirect_to_shortlink(shortlink) // return the original URL (as a redirect) on success, otherwise return a 404 (Not Found) error

delete_shortlink(user_token, shortlink) // delete the shortlink on success, otherwise return an error
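
As a rough illustration of the create and redirect paths, here is a minimal sketch. The in-memory stores, the token scheme, and the error types are assumptions for the sketch, not part of a finalized design; the short-key placeholder is replaced by hashing or a KGS in Step 5.

```python
import secrets
import time

# In-memory stand-ins for the real key-value store and user table (assumption).
url_table: dict[str, dict] = {}      # short_link -> {original_url, user_token, expire_at}
user_tokens: set[str] = set()

DEFAULT_TTL_SECONDS = 365 * 24 * 3600

def generate_token(user_id: str) -> str:
    """Generate a unique, unguessable token for a registered user."""
    token = secrets.token_urlsafe(16)
    user_tokens.add(token)
    return token

def create_shortlink(user_token, original_url, custom_link=None, expire_after=DEFAULT_TTL_SECONDS):
    """Return a short link on success, otherwise raise an error."""
    if user_token not in user_tokens:
        raise PermissionError("unknown user token")
    short = custom_link or secrets.token_urlsafe(4)[:6]   # placeholder for hashing / KGS (Step 5)
    if short in url_table:
        raise ValueError("short link already in use")
    url_table[short] = {
        "original_url": original_url,
        "user_token": user_token,
        "expire_at": time.time() + expire_after,
    }
    return short

def redirect_to_shortlink(shortlink):
    """Return the original URL to redirect to, otherwise raise a 404-style error."""
    entry = url_table.get(shortlink)
    if entry is None or entry["expire_at"] < time.time():
        raise KeyError("404: short link not found or expired")
    return entry["original_url"]
```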

[Step 4] — Define DB model (data entities)

As we saw, we will store a lot of data and each object is pretty small.

Also, there isn’t a strong relationship between records; for a redirection we simply receive the short_url, use it (or its hash) as a key, and retrieve the original URL.

Finally, our service is very read-intensive.

Data Schema

Since we don’t have to worry much about relationships and we do have to be scalable, I would choose a NoSQL store with good availability, such as Cassandra.
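
A minimal sketch of the two main records follows; the field names and sizes are assumptions consistent with the estimation above, not a finalized schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UrlMapping:
    """One row per short link; keyed by short_link (the partition key, in Cassandra terms)."""
    short_link: str          # e.g. a 6-8 character key
    original_url: str
    user_token: str          # creator, for quotas / abuse detection
    created_at: datetime
    expire_at: datetime

@dataclass
class User:
    """One row per registered user."""
    user_id: str
    user_token: str
    email: str
    created_at: datetime
    links_created: int = 0   # simple counter for rate limiting / analytics
```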

[Step 5] — High-level Design

The key problem I need to solve for this service is how to generate a short_link. I can think of two ways to solve this; here are the pros and cons of each.

1 Encoding the URL using a hash

Compute a hash function such as MD5 or SHA-256 over the original URL, then encode and truncate the result

pro(s):

Quite straightforward, simple

con(s):

We need some heuristic (e.g. base62 encoding plus truncation) to make the result fit the short length of the short_link

Due to the one-to-one mapping, multiple users will get the same short_link if they enter the same original_url (improvement: append some unique value like the user_token before hashing. But then, when a user wants to create multiple short_links of the same original_url using the default behavior, they will get the same result every time. I can additionally pad with a sequence number for each attempt in this case, hoping that a user won’t create millions of short_urls from one original_url through the default behavior). A sketch of this approach follows below.
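
Here is a minimal sketch of the hashing approach (MD5 plus base62 truncation); the collision-handling loop and the 7-character key length are illustrative assumptions:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base62
KEY_LEN = 7

def base62_encode(n: int) -> str:
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

def shortlink_from_url(original_url: str, salt: str = "") -> str:
    """Hash the URL (optionally salted with a user_token or retry counter) and truncate."""
    digest = hashlib.md5((original_url + salt).encode()).hexdigest()
    return base62_encode(int(digest, 16))[:KEY_LEN]

def create_unique_shortlink(original_url: str, user_token: str, existing: set[str]) -> str:
    """Retry with a sequence suffix until we find a key that is not already taken."""
    for attempt in range(1000):   # collisions at 7 characters are rare in practice
        key = shortlink_from_url(original_url, salt=f"{user_token}:{attempt}")
        if key not in existing:
            return key
    raise RuntimeError("could not find a free short link")
```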

2 Pre-computed Key Generation Service (KGS)

Pre-compute short keys through a KGS and assign one whenever a create request arrives

pro(s):

KGS can guarantee uniqueness of the key

Small key size (short length) → we can also pre-generate a very large number of keys cheaply, which helps when an application server dies before using a key it was handed; we simply hand out another one

Fast and simple by using KGS

con(s):

Concurrency issues: two application servers must never be handed the same key. To resolve this, we separate the key store into tables for unused and used keys and move keys between them atomically; for this store I would use an RDBMS for ACID compliance

It introduces a new single point of failure, the KGS itself. To mitigate this, we would run multiple (replicated) KGS instances. A minimal sketch of the KGS idea follows below.
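
Here is a single-process sketch of the KGS idea; the lock stands in for the transactional move between the “unused” and “used” tables, and the key format and batch size are assumptions:

```python
import secrets
import string
import threading

ALPHABET = string.digits + string.ascii_letters
KEY_LEN = 7

class KeyGenerationService:
    """Pre-generates random keys and hands each one out exactly once."""

    def __init__(self, batch_size: int = 10_000):
        self._lock = threading.Lock()     # stands in for an ACID transaction in the RDBMS
        self._unused: list[str] = []
        self._used: set[str] = set()
        self._batch_size = batch_size
        self._refill()

    def _refill(self) -> None:
        while len(self._unused) < self._batch_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LEN))
            if key not in self._used and key not in self._unused:
                self._unused.append(key)

    def next_key(self) -> str:
        """Atomically move one key from 'unused' to 'used' and return it."""
        with self._lock:
            if not self._unused:
                self._refill()
            key = self._unused.pop()
            self._used.add(key)
            return key

    def release_key(self, key: str) -> None:
        """Return an expired/deleted key to the unused pool so it can be reused (see Step 7)."""
        with self._lock:
            if key in self._used:
                self._used.remove(key)
                self._unused.append(key)
```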

Another Problem: Consistent length of custom URL

To keep the short_url (key) length consistent, we will enforce a maximum length (max_len) on user-supplied custom links

System Diagram

[Step 6] — Detailed Design

For scalability, we will need multiple DB servers and data partitioning.

Choice A) Round-Robin partitioning

pro(s):

uniform distribution of data

simple

con(s)

If we use a cache between the application servers and the DB, we won’t exploit it very well; by focusing so heavily on equal distribution, we spread related requests across partitions and end up with more cache misses

Choice B) Hash-based partitioning

pro(s):

We can simply reuse the short_link (key) itself to decide which partition a record belongs to

con(s):

It is possible that a certain range of keys will be requested more often, causing an unbalanced load. To improve this, we can use “Consistent Hashing” with replicas (virtual nodes) to increase cache hits and achieve a good load balance; a sketch follows below.
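
Here is a minimal consistent-hashing sketch with virtual nodes; the node names and replica count are illustrative assumptions:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to DB/cache nodes; virtual nodes smooth out hot key ranges."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Usage: route a short_link to one of the partitioned DB shards.
ring = ConsistentHashRing(["db-shard-1", "db-shard-2", "db-shard-3"])
print(ring.node_for("aZ3k9Qp"))   # hypothetical short link
```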

For the Cache,

Use Memcached (a distributed memory-caching system) to reduce complexity

Our estimation says we would need about 10 GB at most, and this can fit on a single machine, as a modern-day server can have 256 GB of memory

Since we still have spare cache capacity, we can dedicate some of it to hot URLs

I would prefer LFU over LRU: a short link is usually temporary and used actively within a certain time window, so evicting by frequency rather than recency should be more effective here (a small sketch of the policy follows below)

I would place caches at two points: between the clients and the application servers (at the load balancer) and between the application servers and the DB
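
To make the eviction choice concrete, here is a tiny in-process LFU cache sketch; in production Memcached would be used, and this only illustrates the frequency-based eviction policy:

```python
from collections import Counter

class LFUCache:
    """Evicts the least-frequently-used short link when the cache is full."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._data: dict[str, str] = {}      # short_link -> original_url
        self._hits = Counter()

    def get(self, short_link: str) -> str | None:
        if short_link in self._data:
            self._hits[short_link] += 1
            return self._data[short_link]
        return None                           # miss: caller falls back to the DB

    def put(self, short_link: str, original_url: str) -> None:
        if short_link not in self._data and len(self._data) >= self._capacity:
            coldest, _ = min(self._hits.items(), key=lambda kv: kv[1])
            del self._data[coldest]
            del self._hits[coldest]
        self._data[short_link] = original_url
        self._hits[short_link] += 1
```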

For the Load Balancer,

Assuming our DB is well replicated and consistent, and all application servers handle the same type of task, I will focus solely on even distribution; Round Robin seems to be a good policy. The load balancer can also handle the case where a recipient server or DB is dead, since it receives health status from them periodically (sketched below).
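
A minimal sketch of round-robin selection that skips unhealthy backends; the health-check loop that updates the healthy set is assumed and not shown:

```python
import itertools

class RoundRobinBalancer:
    """Cycles through application servers, skipping ones marked unhealthy."""

    def __init__(self, servers: list[str]):
        self._servers = servers
        self._healthy = set(servers)          # updated by a periodic health-check loop (not shown)
        self._cycle = itertools.cycle(servers)

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self._healthy.add(server)

    def next_server(self) -> str:
        for _ in range(len(self._servers)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")

# Usage
lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
print(lb.next_server())   # app-1 or app-3, never app-2
```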

[Step 7] — Identifying and resolving bottlenecks

1 When to delete?

For the sake of efficiency, instead of regularly scanning for expired links, we will wait until a user tries to access an expired link. If the link is found to be expired, we delete it from our DB and invalidate the cache entry as well. The freed key then goes back to the pool of unused keys for reuse, which also keeps us from exhausting the number of keys the KGS can create. A sketch of this lazy cleanup follows below.
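
A minimal sketch of lazy expiration on the redirect path; the db, cache, and kgs objects are the stores sketched earlier and are assumptions here (the cache is assumed to hold the full record, including expire_at):

```python
import time

def redirect(short_link: str, db, cache, kgs):
    """Resolve a short link, deleting it lazily if it has expired."""
    entry = cache.get(short_link) or db.get(short_link)
    if entry is None:
        raise KeyError("404: short link not found")
    if entry["expire_at"] < time.time():
        db.delete(short_link)          # lazy cleanup on first access after expiry
        cache.delete(short_link)
        kgs.release_key(short_link)    # return the key to the unused pool for reuse
        raise KeyError("404: short link expired")
    return entry["original_url"]
```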

2 What about those unused keys (shortlink)?

For keys that have gone unused for a long time, we will let them stay in our DB, because storage is not a constraint for us.

3 How about the cleanup?

When we do delete that garbage data (expired entries) in bulk, we have to stop the DB for the cleanup, which would cost us availability. To resolve this, I would keep a shadow (secondary) DB for each DB, where the primary and secondary monitor each other. This not only improves fault tolerance but also allows us to stay available during cleanup by swapping the two groups.

[Step 8] — Others

We can collect other user data such as location, date and time of access, and access counts per URL. We can also analyze which URLs are popular. Like search engines do, we can collect this valuable data.

We can also allow users to create private URLs, which will require an invitation key that can be generated by computing a hash function over the host’s unique key, such as the user_token (a small sketch follows below).
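
A small sketch of generating and checking such an invitation key; the HMAC-based scheme and parameter names are assumptions, not a prescribed design:

```python
import hashlib
import hmac

def invitation_key(host_user_token: str, short_link: str) -> str:
    """Derive an invitation key for a private link from the host's token."""
    mac = hmac.new(host_user_token.encode(), short_link.encode(), hashlib.sha256)
    return mac.hexdigest()[:12]

def is_invited(host_user_token: str, short_link: str, presented_key: str) -> bool:
    """Check the invitation key presented by a visitor, in constant time."""
    expected = invitation_key(host_user_token, short_link)
    return hmac.compare_digest(expected, presented_key)
```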

It was fun to apply my knowledge to designing this system. There may still be open problems or better solutions, and that is exactly where I can learn more from experienced developers’ feedback and have a great engineering talk! Feel free to give me feedback; I would be excited to learn more from others!
