The Dropbox challenge for the AWS cloud

Krishna SL
@DivumCorp
3 min read · Aug 10, 2018

What if a million users were suddenly to sync data from their mobile devices to the cloud in the same second? Can the current AWS infrastructure support it? How do we scale? How do we store the data so that it syncs across multiple devices?

We were game for it and we took this up to deliver a solution for India’s largest telecom provider.

Challenge:

1. A million users will sync data in the morning, at 3 am

2. Data can be photos, videos, contacts, SMS and documents

3. Data must be available across all of a user’s devices

4. No user’s data can be accessible to other users (TRAI regulations)

5. On the day of launch, a million people would use it, since the user base was already established for the main application and this was an add-on feature

The solution used AWS S3 as the core storage component; the challenge was to deliver the entire solution at scale, performant and secure.

To reduce cost and scale further, we decided to use a serverless architecture on AWS.

Here is the data flow diagram of how tokens are created and exchanged, and how the data transfer happens.

Scalable Architecture to store Data in AWS S3

EC2 instances are pinged once in a user’s lifetime to get a request token for that user. This request token is exchanged with Cognito to get the identity ID and identity pool ID.

This is then exchanged with AWS STS to get a short-term token.
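Under these assumptions, the exchange can be sketched with boto3 using Cognito’s classic (basic) flow; the pool ID, role ARN and region below are placeholders, and the article’s actual exchange (including the EC2-issued request token) may differ:

```python
def get_short_term_credentials(identity_pool_id, role_arn, region="ap-south-1"):
    """Classic Cognito flow: identity ID -> OpenID token -> STS short-term creds.
    All identifiers here are placeholders, not values from the article."""
    import boto3  # imported lazily; requires the AWS SDK for Python

    cognito = boto3.client("cognito-identity", region_name=region)
    identity_id = cognito.get_id(IdentityPoolId=identity_pool_id)["IdentityId"]
    token = cognito.get_open_id_token(IdentityId=identity_id)["Token"]

    # Exchange the Cognito OpenID token with STS for temporary credentials.
    sts = boto3.client("sts", region_name=region)
    resp = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName=identity_id.replace(":", "-"),
        WebIdentityToken=token,
        DurationSeconds=3600,  # valid between 15 minutes and 1 hour
    )
    return resp["Credentials"]
```

The returned credentials carry an `Expiration` timestamp, which the client uses to decide when to refresh.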

A sample policy for denying unauthorised access

Using the short-term credentials, the client assumes an IAM role that restricts each user to their own folder.
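This per-user restriction is typically expressed with an IAM policy variable that expands to the caller’s Cognito identity at evaluation time. A minimal sketch of such a policy, with a hypothetical bucket name:

```python
# Each authenticated user can only read/write under a prefix equal to their
# own Cognito identity ID; the bucket name is illustrative.
USER_SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::user-sync-bucket/${cognito-identity.amazonaws.com:sub}/*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::user-sync-bucket",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "${cognito-identity.amazonaws.com:sub}/*"
                }
            },
        },
    ],
}
```

Because the identity ID is substituted by IAM itself, no user can craft a request that reaches another user’s prefix.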

This token is valid for 15 minutes to 1 hour and is used with the AWS S3 SDK to sync the data to the cloud. Upon expiry, the token is refreshed before further uploads are triggered.
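The refresh-on-expiry behaviour can be sketched as a small credential cache that re-fetches shortly before the token lapses; the class name and refresh margin are illustrative, not part of the original implementation:

```python
import datetime


class CredentialCache:
    """Caches short-term credentials, re-fetching a little before expiry."""

    def __init__(self, fetch, margin_seconds=120):
        # fetch() must return (credentials, expiry_datetime_utc)
        self._fetch = fetch
        self._margin = datetime.timedelta(seconds=margin_seconds)
        self._creds = None
        self._expiry = datetime.datetime.min

    def get(self):
        now = datetime.datetime.utcnow()
        if self._creds is None or now >= self._expiry - self._margin:
            self._creds, self._expiry = self._fetch()
        return self._creds
```

Every upload then calls `get()` first, so a refresh happens transparently before the SDK ever sees an expired token.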

Security of the data was the next consideration. We used data encryption at rest and in transit via the S3 SDK.
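A minimal sketch of such an upload, assuming SSE-S3 (AES-256) for encryption at rest and the SDK’s default HTTPS endpoints for encryption in transit; the helper takes the client as a parameter, and the names are illustrative:

```python
def encrypted_upload(s3_client, bucket, key, body):
    """Upload one object with server-side encryption at rest (SSE-S3).
    Encryption in transit comes from the SDK's default HTTPS endpoints."""
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="AES256",  # SSE-S3; SSE-KMS is the alternative
    )
```

With SSE-S3, S3 manages the keys itself; a bucket policy can additionally reject any `PutObject` that omits the `ServerSideEncryption` header.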

Not everything works as expected, so how do we support the concurrent uploads that happen in the morning? How do we handle failure conditions? We explored exponential backoff with jitter in the client SDKs to distribute the load over a 10-minute window when the sync starts. This helped us handle the massive 3 am sync.
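Full-jitter backoff, plus a random start offset within the 10-minute window, might look like the sketch below; the base, cap and window values are illustrative:

```python
import random


def full_jitter_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with 'full jitter': wait a random time in
    [0, min(cap, base * 2**attempt)] so retries don't synchronise."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def sync_start_offset(window_seconds=600):
    """Random offset so a million clients don't all start at 3 am exactly,
    spreading the initial sync across a 10-minute window."""
    return random.uniform(0, window_seconds)
```

The jitter is what matters: plain exponential backoff still makes failed clients retry in lockstep, while randomising each delay spreads the retry load evenly.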

Optimising cost was another key consideration, and S3 IA (Infrequent Access) was the best choice. S3 Standard’s access charges are roughly half those of S3 IA, but since the data volume was high, storage cost took precedence overall.
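A back-of-the-envelope comparison shows why storage dominates at large data sizes. The rates below are purely illustrative, not current AWS pricing:

```python
def monthly_cost(storage_gb, retrieved_gb, storage_rate, retrieval_rate):
    """Simple monthly cost model: storage charge plus per-GB retrieval charge."""
    return storage_gb * storage_rate + retrieved_gb * retrieval_rate


# 10 TB stored, 500 GB retrieved per month; rates are illustrative only.
standard = monthly_cost(10_000, 500, storage_rate=0.023, retrieval_rate=0.0)
ia = monthly_cost(10_000, 500, storage_rate=0.0125, retrieval_rate=0.01)
```

With a sync workload that writes far more than it reads back, the cheaper IA storage rate outweighs its retrieval surcharge.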

Scaling up is required for any established brand, and with the innovations we continue to deliver, it has been made not just easy but also affordable.
