Offloading traffic from servers by allowing clients to directly access blob storage

István Hartung
Tresorit Engineering
9 min read · Sep 27, 2019

Have you ever faced a problem where an ever-growing user base requires an ever-growing number of server instances? Of course you scale your service horizontally, but what if you are forced to do it more than you would like to? That is exactly what we faced a few years ago here at Tresorit. This is the story of how we managed to drastically reduce the number of API instances we need for our service.

What is Tresorit?

So, Tresorit is an end-to-end encrypted file sync and sharing solution which safeguards confidential information by design. What we do is:

  • Make unauthorized file access technically impossible by encrypting all information on the client side — before it leaves your device. So we are working with files, lots and lots of files that basically look like random trash when uploaded to our server.
  • Empower seamless work from anywhere, at any time. Our end-user client runs in the browser, and we also have native clients across all major desktop and mobile platforms.
  • Leverage powerful control and monitoring options. So the basic use case isn’t backup, it is collaboration — many users download and modify the same files. We keep all copies until the user explicitly deletes them and provide a log of who modified what.

Our service is completely cloud-native: it was initially written for AWS, but first released with a backend running entirely on Microsoft Azure. We make heavy use of Azure's platform services, running our API and web applications on Cloud Services, App Services & Azure Functions. For storing the uploaded encrypted content we use Azure Blob Storage, Table Storage for the metadata, and Azure SQL for data wherever Table Storage doesn't make sense (which is a whole different topic deserving a full article).

The extremely detailed architectural diagram of our service

The Problem

At the core of our service, we store files on Azure Blob Storage. Every file version is uploaded as a separate blob and never modified again until it is deleted. We use Azure Table Storage to store all metadata related to these changes (e.g. file ID — blob URL pairings) as well as all other information required to serve our clients (ACLs, activity history, exact order of changes). The blob URL always contains random characters generated by our server, which ensures that when a new file version is uploaded in parallel only one upload can get committed into Table Storage, while no upload blocks another.
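To make the "only one parallel upload wins" idea concrete, here is a minimal sketch of how such a commit could look with the azure-data-tables Python SDK. The table layout, naming scheme and helpers are made up for illustration and are not our actual schema.

```python
import secrets

from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableClient


def new_blob_name(file_id: str, version: int) -> str:
    # Random suffix generated by the server when the upload is requested,
    # so parallel uploads of the same version target different blobs.
    return f"{file_id}/{version}-{secrets.token_hex(16)}"


def commit_file_version(table: TableClient, file_id: str, version: int, blob_name: str) -> bool:
    """Try to commit an already uploaded blob as the given version of a file."""
    try:
        # Insert (not upsert): if two clients race on the same version, only the
        # first create_entity succeeds, so exactly one blob URL wins while
        # neither upload blocked the other.
        table.create_entity({
            "PartitionKey": file_id,
            "RowKey": f"{version:010d}",
            "BlobName": blob_name,
        })
        return True
    except ResourceExistsError:
        return False  # another client already committed this version
```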

Ideally, we want our clients to upload data directly to and download it directly from the blob storage, while we maintain the same control over storage access as when it goes through our servers. Just to give you an idea of the things that we want to keep on the server side:

  • Permission checking. Naturally, we don’t want to allow any client to upload or download content that they shouldn’t have access to. Even though everything is encrypted on the client side, we don’t want to hand out the encrypted content to just anyone.
  • We don’t want a faulty (or malicious) client to be able to overwrite a file version that was uploaded earlier. To be able to give full version history to our clients we never allow these files to be modified.
  • We want to stay in control of exactly where a file ends up in blob storage. We don’t want to tie our hands regarding the exact storage account/container/blob URL etc.
  • We want to write the change log from the server side once a change is made, and not allow it to be faked.
  • We don’t want a file to be created without us knowing about it.

However, we want to make sure that the data traffic does not go through our servers, for two reasons:

  1. Performance. If files are not proxied through an additional server, transfers should be faster — especially given the current performance targets of Blob storage.
  2. Costs. Regardless of which technology we use (Cloud Services, App Services, Functions), relaying terabytes of data through it every day costs a lot of money.

A deep dive into the Azure Blob API

Azure Storage offers three types of blobs: block blobs, append blobs and page blobs. Each is optimized for different scenarios; we use block blobs, as our main use case is uploading large, immutable files.

The blocks will be committed after the Put Block List API call

Smaller blobs can be uploaded via a single call (Put Blob), but we are interested in the larger ones for now.

Block blobs consist of separate blocks, which can be uploaded in parallel via the Put Block API call; this is where the content is actually transferred to the storage. Once uploaded, blocks are discarded automatically if they are not committed within 7 days.

Each block has a unique block ID; at the end, the blocks are committed by calling the Put Block List API and specifying their exact order.
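As a quick illustration of this block model (using the plain azure-storage-blob Python SDK with your own credentials, not yet the delegated signatures described later), staging and committing blocks looks roughly like this; the connection string, container and blob names are placeholders:

```python
from azure.storage.blob import BlobBlock, BlobClient


def upload_block_blob(blob: BlobClient, data: bytes, block_size: int = 4 * 1024 * 1024) -> None:
    """Stage the blocks of one file version, then commit them in a single call."""
    block_ids = []
    for offset in range(0, len(data), block_size):
        block_id = f"{offset:032d}"  # fixed-length IDs; blocks could also be staged in parallel
        blob.stage_block(block_id=block_id, data=data[offset:offset + block_size])  # Put Block
        block_ids.append(block_id)
    # Put Block List: commits the staged blocks in this exact order and makes the blob readable.
    blob.commit_block_list([BlobBlock(block_id=b) for b in block_ids])


blob_client = BlobClient.from_connection_string("<connection string>", "mycontainer", "myblob")
upload_block_blob(blob_client, b"encrypted content goes here")
```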

For download, the Get Blob API call can be used; its optional Range header makes it possible to download separate chunks of a file in parallel.
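The same ranged download through the SDK, again only a sketch with placeholder names; under the hood the offset/length parameters become the Range header mentioned above:

```python
from azure.storage.blob import BlobClient

blob_client = BlobClient.from_connection_string("<connection string>", "mycontainer", "myblob")
# Several calls like this, with different offsets, can fetch chunks of the same blob in parallel.
first_chunk = blob_client.download_blob(offset=0, length=4 * 1024 * 1024).readall()
```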

It seems clear that the API calls we want our clients to make directly are Put Block and Get Blob. The tricky part is authorization. Currently there are four ways to handle it:

  • Anonymous access: this is the easiest way, however, only read access can be granted this way. Thinking back to the constraints above, it is also not sufficient, since anyone could download the encrypted content.
  • Azure Active Directory: allows you to give fine-grained access to users, groups or apps. However, you cannot limit access to uploading or downloading a single blob.
  • Shared Access Signatures (SAS): allow the server to delegate access to a resource, with specified permissions for a specified time interval, via a small signed token that the client appends to the URL of the request (a sketch of generating such a token follows this list). This is almost good enough for us. To be honest, if we were designing our solution now (and not a few years ago), this could be the path we would choose. However, it does not let us allow the Put Block API call while forbidding the Put Block List call. So with a SAS token, a malfunctioning/malicious client would be able to upload as large a file as it wishes without notifying the server afterwards (thus causing us to store data we don’t know about) — so some kind of custom delayed cleanup logic would be required.
  • Shared Key: can be used to sign the Authorization header of each request towards the storage account. Sharing the key with the client would be an extremely bad idea; however, we can calculate this signature on the server side and hand it to the client, which can use it for 15 minutes to make the API call against the storage account.
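As mentioned in the SAS bullet, this is roughly what delegating write access with a SAS token looks like using the azure-storage-blob Python SDK; the account, container and blob names are placeholders. The very limitation discussed above shows up here: a write SAS on the blob permits Put Block and Put Block List alike.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# The account key stays on the server; only the short-lived token is handed to the client.
sas_token = generate_blob_sas(
    account_name="myaccount",
    container_name="mycontainer",
    blob_name="myblob",
    account_key="<account key>",
    permission=BlobSasPermissions(create=True, write=True),  # allows Put Block *and* Put Block List
    expiry=datetime.now(timezone.utc) + timedelta(minutes=15),
)
upload_url = f"https://myaccount.blob.core.windows.net/mycontainer/myblob?{sas_token}"
```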

Solution

Let’s look into the download use-case first!

Instead of downloading the data through our API server, the client requests a signed request from that same server. After the proper permission (and other) checks, the server responds with the following information:

  • The URL and VERB that the client should call, e.g. GET to https://myaccount.blob.core.windows.net/mycontainer/myblob
  • Additional headers that the client must provide for the request: x-ms-date, the current date (so the request is valid for ±15 minutes to tolerate clock skew), and optionally Range, if the client only wants part of the file
  • And the Authorization header, which contains an HMAC-SHA256 signature, keyed with the account key, over the canonicalized resource and the two headers above. See the documentation if you want to implement it yourself.

With this information the client is now able to download the data directly from the blob storage.
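A minimal server-side sketch of this signing step, assuming the account name and key live in server configuration. It follows the Shared Key string-to-sign format from the documentation linked above, trimmed to the headers a GET Blob request needs; all names are placeholders.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone

ACCOUNT = "myaccount"                 # placeholder, matches the example URL above
ACCOUNT_KEY = "<base64 account key>"  # never leaves the server


def sign_get_blob(container: str, blob: str, byte_range: str = ""):
    """Return the VERB, URL and headers a client needs for a direct Get Blob call."""
    headers = {
        "x-ms-date": datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT"),
        "x-ms-version": "2019-02-02",
    }
    if byte_range:
        headers["Range"] = byte_range  # e.g. "bytes=0-4194303"

    # Canonicalized x-ms-* headers (lower-cased, sorted) and the canonicalized resource.
    canonical_headers = "".join(
        f"{name.lower()}:{value}\n"
        for name, value in sorted(headers.items())
        if name.lower().startswith("x-ms-")
    )
    canonical_resource = f"/{ACCOUNT}/{container}/{blob}"

    # Shared Key string-to-sign for a GET: the ten content/conditional header slots are
    # empty, followed by Range, the canonicalized headers and the canonicalized resource.
    string_to_sign = (
        "GET\n" + "\n" * 10
        + headers.get("Range", "") + "\n"
        + canonical_headers
        + canonical_resource
    )
    signature = base64.b64encode(
        hmac.new(
            base64.b64decode(ACCOUNT_KEY),
            string_to_sign.encode("utf-8"),
            hashlib.sha256,
        ).digest()
    ).decode()
    headers["Authorization"] = f"SharedKey {ACCOUNT}:{signature}"
    return "GET", f"https://{ACCOUNT}.blob.core.windows.net/{container}/{blob}", headers
```

The client then issues the GET with exactly these headers; the account key itself is never exposed.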

The logic is pretty similar in the upload case. The client requests a signed request from the server while providing two parameters: the hash and length of the block it wants to upload. The server returns the URL, VERB and headers (including the Authorization header) that should be used to make the request. After uploading all blocks to the blob storage, the client calls a different API endpoint on the server to commit these blocks, providing the proper block IDs; the server then executes the Put Block List call and writes/updates the proper entries in Azure Table Storage.

Uploading data directly to the blob storage
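A hedged client-side sketch of this flow. The /signed-block-upload and /commit endpoints, the JSON field names and the api session are invented placeholders standing in for whatever your service API looks like, not our actual routes.

```python
import base64
import hashlib

import requests


def upload_file(api: requests.Session, api_base: str, file_id: str, chunks) -> None:
    """Upload each block straight to blob storage, then ask our own server to commit."""
    block_ids = []
    for index, chunk in enumerate(chunks):
        # 1. Ask the API server for a signed Put Block request, sending the hash and
        #    length so they can be baked into the signature.
        signed = api.post(f"{api_base}/signed-block-upload", json={
            "fileId": file_id,
            "blockIndex": index,
            "contentLength": len(chunk),
            "contentMd5": base64.b64encode(hashlib.md5(chunk).digest()).decode(),
        }).json()
        # 2. Upload the block directly to blob storage with the returned VERB/URL/headers.
        requests.request(signed["verb"], signed["url"], headers=signed["headers"], data=chunk)
        block_ids.append(signed["blockId"])
    # 3. The server executes Put Block List and records the change in Table Storage.
    api.post(f"{api_base}/commit", json={"fileId": file_id, "blockIds": block_ids})
```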

Compatibility with Amazon S3

If you are running your application on Amazon and/or using Amazon S3, you can apply a very similar solution. At Tresorit, we are also able to store the encrypted data in Amazon S3 (or any other storage provider that offers a similar API). This is how direct upload/download looks in the case of S3:

  • The construction of the Authorization header is very similar to Azure. No extra information is required to sign a given request.
  • Downloading data works the same way: the GET Object API call requires the exact same headers (Date, Range, Authorization).
  • The major difference is in the upload logic, which takes 3 API calls instead of the 2 above (sketched after this list). The Initiate Multipart Upload call returns an upload ID, which is later used to associate all parts of the specific upload. For transferring the chunks of the file, the Upload Part API call is used, where you must supply the upload ID and the part number (which uniquely identifies a part and also defines its position within the created object). Finally, you have to call Complete Multipart Upload to finish the object.
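A sketch of that three-call flow on the server side using boto3. The bucket and key names are placeholders, and the per-part delegation is shown here with a boto3 pre-signed URL rather than the hand-built Authorization header described above; both achieve the same kind of time-limited access.

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-encrypted-blobs", "files/1234/version-5"  # placeholder names

# 1. Initiate Multipart Upload: the server starts the upload and keeps the upload ID.
upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]

# 2. Upload Part: for each chunk the client asks about, hand back a pre-signed request
#    it can send directly to S3.
part_url = s3.generate_presigned_url(
    "upload_part",
    Params={"Bucket": BUCKET, "Key": KEY, "UploadId": upload_id, "PartNumber": 1},
    ExpiresIn=900,  # roughly the same 15-minute window as on the Azure side
)

# 3. Complete Multipart Upload: once the client reports the uploaded parts (ETags),
#    the server finishes the object.
s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload_id,
    MultipartUpload={"Parts": [{"ETag": "<etag reported by the client>", "PartNumber": 1}]},
)
```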

It is easy to see what extensions are needed on the service API to make it compatible with both Amazon & Azure: by adding one additional call to initialize the upload of a file, the client does not even need to know which cloud provider it is uploading its content to. It can communicate with either storage service without depending directly on its API. This setup is also a big step towards preventing vendor lock-in: if the price of either provider changes drastically in either direction, you are able to react and start storing data on the other one.

Conclusion

By introducing additional calls on the client side, the data traffic can easily be offloaded from the application servers, and content can be uploaded and downloaded directly to and from the storage service of the cloud provider. What you have to consider is whether it is worth it for your use case, since instead of a single API call you might need two extra requests (as in the case of uploading to Amazon). Of course, the logic can still be chosen dynamically based on file size. Another easy optimization is to acquire the signed requests in batches, either for the separate chunks of the same file or for multiple files, should it be needed.

This setup also adds extra complexity to an application (as mentioned above, SAS tokens nowadays provide a very similar level of control, if that suits your application logic). On top of that, the client no longer connects to a single server for uploading/downloading data, which brings many other things to think about, such as setting up CORS headers properly or notifying your customers about additional domains/IPs they need to whitelist on their firewalls.

Although measuring/limiting the traffic of a single user is also tricky in a setup like this (since we have to rely on logs from the underlying storage system), we gain a lot in return: a natural network speed-up whenever the chosen storage account is geographically closer to the customer than the API server, and the ability to use the bandwidth of the Azure storage accounts directly.

We were also able to cut costs with this method, since we need far fewer servers. This logic was also our first step towards providing data residency options for customers who need to satisfy a company policy or industry- and jurisdiction-specific regulations. On top of everything, this setup allows us to store data in any storage location our customer supplies — as our client will work exactly the same way with any underlying storage.
