Cloud storage system design

Jul 18, 2019

Let's design a file storage system like Google Drive or Dropbox. Cloud storage lets users store their data on remote servers and makes those files available over the network.

Why?

Cloud storage became popular because it simplifies storing files and synchronizing them across a user's subscribed devices.

The main goals of a cloud storage system are:

Availability
Reliability and durability
Scalability

Requirements

Users should be able to upload files to and download files from cloud storage.
Files should synchronize across all the devices the user has subscribed.
Users should be able to share files with other users.
Offline editing should be supported.

Design considerations:

We should expect a huge volume of reads and writes.
The read-to-write ratio will be nearly equal.
Internally, files can be stored as chunks. When a file is edited, only the changed chunks need to be uploaded or downloaded, and when a read or write operation fails, only the affected chunk is retried rather than the whole file.
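To make the chunk idea concrete, here is a minimal sketch of splitting a file into fixed-size chunks and retrying only the chunk that failed. The names `split_into_chunks`, `upload_file`, and `send_chunk`, the tiny 4-byte chunk size, and the choice of SHA-256 as a chunk identifier are all assumptions made for illustration; a real system would use much larger chunks and a real network call.

```python
import hashlib
from typing import Callable, List

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo (assumption)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 digest of its content."""
    return hashlib.sha256(chunk).hexdigest()


def upload_file(
    data: bytes,
    send_chunk: Callable[[str, bytes], None],  # hypothetical transport function
    max_retries: int = 3,
) -> List[str]:
    """Upload chunk by chunk and return the ordered list of chunk ids.

    A failure (IOError) is retried for the failing chunk only; chunks that
    were already sent are never re-sent.
    """
    ids: List[str] = []
    for chunk in split_into_chunks(data):
        cid = chunk_id(chunk)
        for attempt in range(1, max_retries + 1):
            try:
                send_chunk(cid, chunk)
                break
            except IOError:
                if attempt == max_retries:
                    raise  # give up on this chunk after the last attempt
        ids.append(cid)
    return ids
```

The ordered list of chunk ids is what a metadata store would need to keep in order to reassemble the file later.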

High-level design:

For example, the user designates a folder, such as Documents, as the workspace on each device. Any modification inside that folder is uploaded to cloud storage. The user can set up the same workspace on all their devices, and a modification made on one device is propagated to the others so that every device has the same view of the workspace.

We can have some servers that help clients upload and download files, and others that notify clients about changes made on another device.

As shown in the diagram below, block servers work with the clients to upload and download files from cloud storage, and metadata servers keep file metadata up to date in a SQL or NoSQL database. Synchronization servers handle the workflow of notifying all clients about changes so that they can synchronize.

Component design:

Major components:

A. Client

Here are some of the essential operations for the client:

Upload and download files.
Detect file changes in the workspace folder.
Handle conflicts caused by offline or concurrent updates.
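One simple way a client could detect changes in the workspace folder is to compare two snapshots of it. The sketch below is only an illustration under that assumption: the function names `snapshot` and `diff_snapshots` are made up, and hashing every file on each scan would be too slow for large workspaces, where a file-system watcher or modification times would be a more realistic choice.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    result: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            result[os.path.relpath(full_path, root)] = digest
    return result


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }
```

The client would take a snapshot, wait, take another, and hand the resulting diff to the upload logic.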

B. Synchronization services

This service processes the file updates made by a client and applies these changes to the other subscribed clients. It also updates the local database with the information stored in the remote metadata database.

It should be designed to transmit as little data as possible between the client and cloud storage, which improves performance.
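One way to transmit less data, building on the chunk idea from the design considerations, is to send only the chunks whose content the server does not already have. This is a rough sketch under assumptions: `chunk_hashes` and `chunks_to_send` are hypothetical helpers, the server's known chunks are modelled as a plain set of SHA-256 digests, and the chunk size is tiny for readability.

```python
import hashlib
from typing import List, Set


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """Return the SHA-256 digest of each fixed-size chunk, in order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_send(new_data: bytes, known_hashes: Set[str], chunk_size: int = 4) -> List[int]:
    """Return the indexes of chunks in new_data that the server does not have yet."""
    return [
        index
        for index, digest in enumerate(chunk_hashes(new_data, chunk_size))
        if digest not in known_hashes
    ]


# Usage: only the middle chunk changed, so only chunk 1 needs to be uploaded.
old_version = b"aaaabbbbcccc"
new_version = b"aaaaXXXXcccc"
server_has = set(chunk_hashes(old_version, 4))
changed = chunks_to_send(new_version, server_has, 4)
```

Note that with fixed-size chunks, inserting bytes near the start of a file shifts every later chunk boundary, so all following chunks look changed. This sketch does not try to handle that case.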

How can clients efficiently listen for changes made by other clients? One solution is for the clients to periodically ask the server whether anything has changed. The problem with this approach is that changes show up locally only after a delay, because clients check on a schedule instead of being notified by the server as soon as a change happens. If a client checks frequently, it not only wastes bandwidth, since the server returns an empty response most of the time, but also keeps the server busy. Polling in this manner is not scalable.

A solution to the above problem could be to use HTTP long polling. With long polling, the client requests information from the server with the expectation that the server may not respond immediately. If the server has no new data for the client when the poll is received, instead of sending an empty response, the server holds the request open and waits until new information becomes available. Once it does have new information, the server immediately sends an HTTP/S response to the client, completing the open HTTP/S request. On receiving the response, the client can immediately issue another request for future updates.
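The hold-the-request-open behaviour can be sketched without any networking. The class below is an in-memory stand-in for the server side of a long-poll endpoint, not a real HTTP server: `ChangeFeed`, `publish`, `poll`, and the integer `cursor` are names invented for this example, and a blocking method call plays the role of the open HTTP request.

```python
import threading
from typing import List


class ChangeFeed:
    """In-memory stand-in for the server side of a long-poll endpoint."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._events: List[str] = []

    def publish(self, event: str) -> None:
        """Record a change and wake up every waiting poll."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def poll(self, cursor: int, timeout: float) -> List[str]:
        """Block until events past `cursor` exist or the timeout expires.

        Returns the new events, or an empty list on timeout, which mirrors
        the server finally answering a held request with nothing new.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout=timeout)
            return self._events[cursor:]


# Usage: a change published 50 ms later completes the waiting poll early.
feed = ChangeFeed()
timer = threading.Timer(0.05, feed.publish, args=["report.txt changed"])
timer.start()
updates = feed.poll(cursor=0, timeout=2.0)
timer.join()
```

After each response, the client advances its cursor by the number of events received and polls again, which matches the "issue another request" step described above.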