File-like access to IBM Cloud Object Storage using s3fs

Or Ozeri · Jun 7, 2018

s3fs is an open-source tool for mounting an S3-compatible bucket as a local file system on a Linux machine. With the s3fs mountpoint, you can access your objects as if they were local files: list them with ls, copy them with cp, and access them seamlessly from any application built to work with local files. Unlike many file-to-object solutions, s3fs maintains a one-to-one mapping of files to objects.

s3fs can yield good performance results when used with workloads reading or writing relatively large files (say, 20MB+) sequentially. On the other hand, you probably do not want to use s3fs with workloads accessing a database (as file locking is not supported), or workloads requiring random read or write access to files (because of the one-to-one file to object mapping). Moreover, s3fs is not suitable for accessing data that is being mutated (other than by the s3fs instance itself).

Next, I will discuss how to install and customize s3fs for use with IBM Cloud Object Storage.

Installation

I recommend installing from source code, as you get the latest version, including all the latest bug fixes and performance improvements. Alternatively, you can install a pre-built package by running sudo apt-get install s3fs.

To install from source, follow these instructions:

Install dependencies

sudo apt-get install automake autotools-dev fuse g++ git libcurl4-openssl-dev libfuse-dev libssl-dev libxml2-dev make pkg-config

Note that the official s3fs readme suggests installing libcurl4-gnutls-dev instead of libcurl4-openssl-dev. I found the OpenSSL version to be much faster than GnuTLS; however, either should work.
Update: In commit 7a4696 the readme was changed to recommend OpenSSL.

Download s3fs Source Code

git clone https://github.com/s3fs-fuse/s3fs-fuse.git

Build s3fs from source code

cd s3fs-fuse
./autogen.sh
./configure
make

Install s3fs Binary

sudo make install
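
You can then run a quick sanity check; make install places the binary under /usr/local/bin by default:

which s3fs
s3fs --version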

Basic Usage with IBM Cloud Object Storage

First, create a file holding your credentials, either an API key or HMAC keys (similar to AWS’s access key and secret key; see the IBM Cloud Object Storage documentation for instructions on generating HMAC keys). The file content should be:

<access_key>:<secret_key>
or
:<api_key>

The credentials file must not be readable by other users, so restrict its permissions:

chmod 0600 <credentials_file>
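
As a minimal sketch, assuming HMAC keys and a hypothetical file name ~/.passwd-cos (replace the placeholder values with your own keys):

cat > ~/.passwd-cos <<'EOF'
<access_key>:<secret_key>
EOF
chmod 0600 ~/.passwd-cos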

You can now mount your COS bucket via s3fs with the following command:

s3fs <bucket_name> <mountpoint> -o url=http{s}://<COS_endpoint> -o passwd_file=<credentials_file>

where <mountpoint> is the (existing) directory where you want to mount your bucket.
Running ls <mountpoint> should list the objects in your bucket together with its pseudo-directories (objects that reside inside pseudo-directories are not shown at the top level).
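
For example, a sketch with hypothetical bucket and mountpoint names (substitute your bucket’s regional endpoint, which you can find in the IBM Cloud console):

mkdir -p /mnt/mybucket
s3fs mybucket /mnt/mybucket -o url=https://<COS_endpoint> -o passwd_file=~/.passwd-cos
ls /mnt/mybucket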

Advanced Configuration — Optimize for Performance

I recommend running s3fs with the following advanced options, in order to achieve better throughput performance.

s3fs <bucket_name> <mountpoint> -o url=http{s}://<COS_endpoint> -o passwd_file=<credentials_file> -o cipher_suites=AESGCM -o kernel_cache -o max_stat_cache_size=100000 -o multipart_size=52 -o parallel_count=30 -o multireq_max=30 -o max_background=1000 -o dbglevel=warn -o sigv2

Some details about the advanced options used above:

  1. -o cipher_suites=AESGCM
    This option is only relevant when using an HTTPS endpoint.
    By default, secure connections to IBM COS use the AES256-SHA cipher suite. Using an AESGCM suite instead greatly reduces the CPU overhead on your client machine, caused by the TLS crypto functions, while offering the same level of cryptographic security. See OpenSSL documentation for the full list of supported cipher suites.
  2. -o kernel_cache
    This option enables the kernel buffer cache on your s3fs mountpoint. This means that objects will only be read once by s3fs, since repeated reads of the same file can be served from the kernel’s buffer cache. The kernel buffer cache only uses free memory that is not in use by other processes. The only case where I would not recommend setting this option is if you expect the bucket objects to be overwritten from another process/machine while the bucket is mounted, and your use case requires live access to the most up-to-date content. Note that even without this option, you are still not guaranteed to get the most up-to-date content from the object store service.
  3. -o max_stat_cache_size=100000
    This setting reduces the number of redundant HTTP HEAD requests sent by s3fs and reduces the time it takes to list a directory or retrieve file attributes. Typical file system usage accesses a file’s metadata frequently via the stat() system call, which maps to a HEAD request on the object storage system. By default, s3fs caches the attributes (metadata) of up to 1000 objects. Each cached entry takes up to 0.5 KB of memory. Ideally, you would want the cache to be able to hold the metadata for all of the objects in your bucket. However, you may want to consider the memory usage implications of this caching. Setting it to 100000 as I suggest above will take no more than 0.5 KB * 100000 = 50 MB. You should also consider how many objects are in your bucket when setting this value.
    Update: In commit 4a72b6 the default was increased to 100000.
  4. -o multipart_size=52
    This option sets the maximum size, in MB, of requests and responses sent to and received from the COS server. s3fs sets this to 10 MB by default. Increasing this value also increases the throughput (MB/s) per HTTP connection. On the other hand, the latency for the first byte served from the file will increase accordingly. Therefore, if your use case only reads a small amount of data from each file, you probably do not want to increase this value. Furthermore, for large objects (say, over 50 MB), throughput increases if this value is small enough to allow the file to be fetched concurrently using multiple requests. I find that the optimal value for this option is around 50 MB. COS best practices suggest using requests that are multiples of 4 MB, and thus my recommendation is to set this option to 52 (MB).
  5. -o parallel_count=30
    This option sets the maximum number of requests sent concurrently to COS, per single file read/write operation. By default, this is set to 5. For very large objects, you can get more throughput by increasing this value. As with the previous option, keep this value low if you only read a small amount of data from each file.
  6. -o multireq_max=30
    When listing a directory, an object metadata request (HTTP HEAD) is sent for each object in the listing (unless its metadata is found in the cache). This option limits the number of such requests sent concurrently to COS for a single directory listing operation. By default it is set to 20. Note that this value must be greater than or equal to the parallel_count option above.
  7. -o max_background=1000
    Setting this option improves s3fs concurrent file reading performance. By default, FUSE supports file read requests of up to 128 KB. When asked to read more than that, the kernel splits the large request into smaller sub-requests and lets s3fs process them asynchronously. The max_background option sets the global maximum number of such concurrent asynchronous requests. By default, it is set to 12. I recommend setting it to an arbitrarily high value (1000), so that your read requests won’t be blocked, even when reading a large number of files simultaneously.
  8. -o dbglevel=warn
    s3fs prints logging messages to /var/log/syslog. By default, the logging level is set to crit (critical messages only). For better health monitoring, I recommend setting the debug level to warn or error. This should not result in excessive log printing.
  9. -o sigv2
    This option sets s3fs to use AWSv2 signatures instead of AWSv4 signatures. AWSv4 signatures are much more compute intensive, since they include a SHA256 hash of the request payload. Thus, setting this option can greatly boost s3fs write performance.
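
If you want the mount to persist across reboots, the same options can also be expressed as an /etc/fstab entry. A minimal sketch, with placeholders for the bucket, mountpoint, endpoint and credentials file (allow_other is optional and lets non-root users access the mount):

<bucket_name> <mountpoint> fuse.s3fs _netdev,allow_other,url=https://<COS_endpoint>,passwd_file=<credentials_file>,cipher_suites=AESGCM,kernel_cache,max_stat_cache_size=100000,multipart_size=52,parallel_count=30,multireq_max=30,max_background=1000,dbglevel=warn,sigv2 0 0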

Is s3fs Code Stable for Production Workloads?

I have tested s3fs extensively by executing long-running deep learning workloads using an s3fs mount point as input. Some of these workloads lasted for more than a week. While I initially encountered a small number of issues when using the -o use_cache option, as well as issues related to SSL connectivity, these were easily fixed, and I contributed patches to the s3fs repository. My fixes were committed to the master branch, and no further issues have been found since commit dbe98dc (May 6th, 2018). s3fs continues to be used in production by IBM’s Deep Learning Service, since its official launch in March 2018.

You should note that my testing, and the deep learning workloads in general, are mostly read-only, and I have spent less time evaluating other workflows.

Performance

I tested s3fs performance with IBM Cloud Object Storage from a bare-metal machine in IBM Cloud.

Single-threaded file reading

I measured the throughput of reading a single file using a single client thread, for a non-cached object. The throughput improved almost linearly with the size of the read object. Reading small files was, unsurprisingly, relatively slow (compared to reading from a local disk), due to the time to first byte from the object store service. As I read bigger files, s3fs used more concurrent connections under the single thread, and thus I got better overall throughput. I was able to achieve a maximum throughput of 540 MB/s from a single client thread.
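
A simple way to reproduce this kind of measurement is to read a file from the mountpoint with dd, which reports the achieved throughput when it finishes (a sketch; the file name is a placeholder, and the object should not already be in the kernel cache):

dd if=<mountpoint>/large_file of=/dev/null bs=4M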

Concurrent file reading

I was able to get an aggregate throughput of 1000 MB/s with s3fs installed on a bare-metal machine on IBM Cloud, when reading multiple files at once (each file was about 200 MB). The machine was equipped with a 10 Gbit/s NIC. For comparison, the same set of objects was downloaded using Python’s HTTP library (httplib) at 1158 MB/s. While s3fs achieves very good performance, its performance is limited by overhead from the FUSE file system and client-side copies.
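
A concurrent read of this sort can be sketched by launching several readers in parallel (the file name pattern is a placeholder):

for f in <mountpoint>/file_*; do
  dd if="$f" of=/dev/null bs=4M &
done
wait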

Conclusion

s3fs allows you to conveniently access your data on IBM Cloud Object Storage via the common POSIX interface. This opens up many opportunities for easy integration of IBM Cloud Object Storage with many POSIX-compatible applications.

It is important to remember that s3fs may not be suitable for all applications, as object storage services have high-latency for time to first byte and lack random write access. On the other hand, workloads that only read big files, like deep learning workloads, can achieve good throughput using s3fs. My long-running tests suggest that s3fs may be used in production, at least for read-only workloads.

Thanks to Ronen Kat
