Optimizing performance with virtual file systems

Peter Eisenberg
Tresorit Engineering
Sep 11, 2020


Cloud services provide multiple ways to reach stored files. Users want to access their files in the cloud the same way they are used to accessing them on a local system. But direct access from the cloud is only comfortable if the files can be reached without performance issues.

In this post I would like to share the performance bottlenecks the Tresorit development team faced while making files accessible directly from the cloud.

Virtual File Systems in a nutshell

Modern operating systems provide an interface between the kernel and an application’s concrete file system (FS), known as a virtual file system (VFS). A VFS can be attached (mounted) like a normal local FS.

A VFS is expected to handle the following features:

  • Permission handling
  • File / directory operations (create, remove, open, close, read, write)
  • Extended attributes (preferred, but not necessary)

When a user tries to open a file for reading, the following kernel calls are made:

  1. Validate existence and permission: Checks whether the requested path and its parent directories exist and the user has sufficient permissions to access them.
  2. Open the file.
  3. Read the necessary chunk(s) of the file.
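In FUSE-like terms, these three steps map to separate callbacks. Here is a minimal sketch of that shape; the class and method names are illustrative stand-ins, not a real kernel interface:

```python
# Illustrative sketch of the three kernel-driven steps; the class and
# method names are hypothetical, not an actual VFS API.
import errno

class SimpleVfs:
    def __init__(self, files):
        # files: path -> bytes (a flat namespace keeps the sketch short)
        self.files = files

    def lookup(self, path):
        # Step 1: validate existence (permission checks omitted for brevity)
        if path not in self.files:
            raise FileNotFoundError(errno.ENOENT, path)

    def open(self, path):
        # Step 2: open the file, returning a handle (here: the path itself)
        self.lookup(path)
        return path

    def read(self, handle, offset, size):
        # Step 3: read the requested chunk
        return self.files[handle][offset:offset + size]

vfs = SimpleVfs({"/docs/report.txt": b"hello world"})
handle = vfs.open("/docs/report.txt")
assert vfs.read(handle, 0, 5) == b"hello"
```

On a local FS each of these steps is cheap; the rest of the post is about what happens when they are not.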

Tresorit’s virtual file system: Tresorit Drive

Tresorit provides a VFS, called Tresorit Drive, for direct access: users can manage their files in the cloud from native file managers such as Windows Explorer or Finder.

In our case, the client builds up the directory structure from the top down by downloading, decrypting and parsing the encrypted metadata of all files and directories. This metadata contains the information needed to represent the files and directories within their folder (e.g. file name, size, creation and modification dates).

OK, you may say: That sounds simple, so what can be the problem?

I would say nothing, if you have a simple directory structure that is represented in the cloud and on your disk with a similar layout. In that case a file or directory can be verified with a single query (on Linux, an inode lookup).

However, our philosophy at Tresorit is to know nothing about the user’s stored data and their file structure. To support end-to-end encryption, the client side handles all the logic needed to build the directory structure from the encrypted data. In cloud storage, all items (metadata, directories and files) are stored as encrypted file chunks in a flat directory hierarchy, identified by randomly generated IDs.

So let’s go through the three steps above again:

  1. Validate existence and permission
  • Download the metadata of the root directory or the relevant sub-directory.
  • Decrypt it and check whether the sub-directory or file exists.
  • If the matched item has children, download its metadata and repeat the previous step.
  • Repeat until the requested file or directory is found.

  2. Open the file
  • Download the necessary file chunks.
  • Decrypt the downloaded chunks and write them to the local disk as a temporary file.

  3. Read the necessary chunk(s) of the file from the temporary file

As you can see, instead of a single step, Tresorit performs multiple, far more complex steps to reach the same result.
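Step 1 in particular turns a single local lookup into one network round-trip per path component. A minimal sketch of that walk, assuming a hypothetical `download_and_decrypt` helper and made-up IDs:

```python
# Hypothetical sketch of step 1 on an end-to-end encrypted store: each
# directory's metadata must be downloaded and decrypted before its
# children can even be named. METADATA and the IDs are illustrative.
METADATA = {
    # item id -> decrypted child listing (name -> child id)
    "root": {"docs": "id-1"},
    "id-1": {"report.txt": "id-2"},
    "id-2": {},  # a file has no children
}

def download_and_decrypt(item_id):
    # Stand-in for a network fetch followed by decryption.
    return METADATA[item_id]

def resolve(path):
    """Walk the path component by component, paying one metadata
    round-trip per level, instead of a single local inode lookup."""
    current = "root"
    for name in filter(None, path.split("/")):
        children = download_and_decrypt(current)
        if name not in children:
            raise FileNotFoundError(path)
        current = children[name]
    return current

assert resolve("/docs/report.txt") == "id-2"
```

Every level of depth adds another fetch-and-decrypt round-trip, which is exactly the cost the cache described below is meant to avoid.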

Reducing resource usage

To avoid this unnecessary resource usage on each directory change, Tresorit stores an encrypted cache in the profile directory. This cache contains the current state of the cloud and the necessary metadata (children, files, sizes and versions). While the user is logged in, the client maintains the cache by refreshing it on user interactions (e.g. modifying a file) and periodically, to pick up changes made on other devices.

The client uses this cache to respond faster: the first two steps described above can be replaced with a simple cache query. However, the cached metadata may not reflect the current state of the cloud’s directory structure.
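In effect, the cache collapses the download-and-decrypt walk into a local lookup. A sketch, with an illustrative cache layout (not Tresorit’s actual schema):

```python
# Sketch of replacing steps 1-2 with a local cache query. The cache
# structure and field names are illustrative assumptions.
cache = {
    "/docs": {"type": "dir", "children": ["report.txt"], "version": 7},
    "/docs/report.txt": {"type": "file", "size": 11, "version": 3},
}

def stat_from_cache(path):
    # A single dictionary lookup stands in for the download/decrypt
    # round-trips of steps 1 and 2.
    entry = cache.get(path)
    if entry is None:
        raise FileNotFoundError(path)
    return entry

assert stat_from_cache("/docs/report.txt")["size"] == 11
```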

The problem with non-existent files

We noticed that our Drive solution on macOS was much slower than on Linux. By “much slower”, we mean minutes, or even longer.

The use case was quite simple: use Finder or ls to list a directory containing thousands of files.

After investigating the problem, we realized that Finder and ls were looking up at least two non-existent entries for each file. These queries happen because file managers store attributes and meta-information about directory contents for their internal logic (e.g. desktop.ini, .thumbnails or .git).

At that time, the client did not trust the cache’s state for the current parent folder. Therefore, when the kernel looked up these non-existent entries, the client tried to query them from the cloud. In other words, when the client listed 100 files, the kernel fetched 100 cached elements and triggered another 200 queries to the cloud.
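A rough model of this amplification: for every real file the file manager also probes sidecar names, and an untrusted cache turns each miss into a cloud round-trip. The probe names below are made up for illustration.

```python
# Rough model of the lookup amplification. With an untrusted cache,
# every miss must be confirmed against the cloud; with a trusted
# cache, a miss is answered locally. Probe names are illustrative.
cloud_queries = 0

def lookup(name, cached_names, trust_cache):
    global cloud_queries
    if name in cached_names:
        return True          # served from the local cache
    if trust_cache:
        return False         # cache is authoritative: the miss is final
    cloud_queries += 1       # untrusted cache: confirm the miss remotely
    return False

files = [f"file{i}" for i in range(100)]
# Two extra (non-existent) probes per listed file, like the sidecar
# entries Finder and ls were asking for:
probes = files + [f"{n}.thumb" for n in files] + [f"._{n}" for n in files]

for name in probes:
    lookup(name, set(files), trust_cache=False)
assert cloud_queries == 200   # two cloud round-trips per listed file
```

Running the same probes with `trust_cache=True` performs zero cloud queries, which is the behavior the fix below enables.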

The problem with cache updates

When the kernel called the function that returns permission and attribute information for a file (stat on Linux), our Drive requested an ad-hoc metadata update on the current root directory. This update was performed only when the most recent successful update of that directory was older than 15 seconds, and it repeated the steps described above for the root directory’s metadata.

Here I have to highlight one word, the root of the evil: successful. Tresorit and the FS interface both work in a multi-threaded environment, so when the kernel called the stat function for multiple files, the application handled the calls in parallel. Each thread queued a refresh, and the refreshes were processed sequentially. That happened because the timestamp of the latest update was set only once an update succeeded. In other words, every 15 seconds the Drive performed N unnecessary lookups, where N was the number of threads.
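One way to see the shape of the fix: record the refresh attempt (not only its success) and serialize the check, so concurrent stat() calls inside the 15-second window trigger at most one update. This is a hedged sketch, not Tresorit’s actual code:

```python
# Sketch of avoiding the N-fold refresh: the attempt timestamp is
# written under a lock *before* the update runs, so concurrent
# callers within the window skip the refresh entirely.
import threading
import time

REFRESH_INTERVAL = 15.0
_lock = threading.Lock()
_last_attempt = 0.0
refreshes = 0

def maybe_refresh():
    global _last_attempt, refreshes
    with _lock:
        now = time.monotonic()
        if now - _last_attempt < REFRESH_INTERVAL:
            return False      # another thread refreshed recently
        _last_attempt = now   # recorded per attempt, not per success
    refreshes += 1            # stand-in for the actual metadata update
    return True

threads = [threading.Thread(target=maybe_refresh) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert refreshes == 1         # only one thread performed the refresh
```

Recording the attempt rather than the success trades a small risk (a failed update delays the next try by up to 15 seconds) for eliminating the redundant per-thread lookups.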

The solution: using prefetched cache data

To improve performance, we had to rethink the workflow of file handling on the Drive. Until then, as described above, the Drive requested cache updates through ad-hoc queries.

In the background, Tresorit does a lot of work to keep the cache up to date. This mechanism, called prefetch, runs periodically and iterates through all directories. In each iteration, the algorithm checks whether anything has changed in the current directory and updates the cache accordingly. On user interaction, however, the prefetch algorithm prioritizes the update of the actively browsed directory.
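The periodic iteration with user-driven prioritization can be sketched as a simple queue where the browsed directory jumps to the front. All names and scheduling details here are illustrative assumptions:

```python
# Hedged sketch of a prefetch loop: iterate all directories
# periodically, but let a user-browsed directory jump the queue.
from collections import deque

class Prefetcher:
    def __init__(self, directories):
        self.queue = deque(directories)
        self.refreshed = []

    def prioritize(self, directory):
        # User interaction: move the browsed directory to the front.
        if directory in self.queue:
            self.queue.remove(directory)
        self.queue.appendleft(directory)

    def step(self):
        # One iteration: refresh the next directory's cached metadata.
        directory = self.queue.popleft()
        self.refreshed.append(directory)  # stand-in for the real update
        self.queue.append(directory)      # re-schedule for the next round

p = Prefetcher(["/a", "/b", "/c"])
p.prioritize("/c")      # the user just opened /c
p.step()
assert p.refreshed == ["/c"]
```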

This logic was already implemented for Tresorit’s internal file browser; the challenge was to activate it for Drive operations as well. Because the cache is maintained independently of Drive operations, the new implementation no longer requires any cloud query to look up file attributes and permissions.
