A comparative analysis of Mountpoint for S3, S3FS and Goofys

Maksym Lutskyi
14 min read · Sep 6, 2023


In the realm of cloud computing, Amazon Web Services (AWS) has revolutionized how organizations manage and store their data. Amazon S3 (Simple Storage Service) stands out as a cornerstone of this ecosystem, providing scalable, durable, and highly available object storage. While S3 offers unparalleled flexibility, its interaction model differs from that of traditional file systems. Enter Mountpoint for Amazon S3, a tool that bridges this gap by enabling users to mount Amazon S3 buckets as file systems on their machines. AWS announced the general availability of Mountpoint for Amazon S3 on 9 August 2023, describing it as a production-ready, open-source file client that makes it easy for Linux-based applications to connect directly to Amazon S3 buckets and access objects using file APIs.

With Mountpoint for Amazon S3, applications can access objects stored in Amazon S3 through file operations like “open” and “read”. Mountpoint for Amazon S3 automatically translates local file system API calls to REST API calls on S3 objects and is optimized for applications that need high read throughput to large objects, potentially from many clients at once. It supports sequential and random read operations on existing files and writes new objects sequentially from a single client at a time. It is ideal for large-scale analytics applications that read and generate large amounts of S3 data in parallel but don’t require the ability to write to the middle of existing objects. This means it’s a great fit for applications that use a file interface.

It is a powerful tool that can be used to accelerate a variety of workloads, including:

  • Data lakes.
  • Machine learning training.
  • Image rendering.
  • Autonomous vehicle simulation.
  • Extract, transform, and load (ETL) processes.

While Mountpoint may seem like an excellent choice for mounting S3 buckets as a local file system, it is not the first tool to address this challenge. Other open-source projects have offered similar functionality, among the most well-known being s3fs, Goofys, RioFS, and ObjectiveFS. It is therefore possible to conduct a comparative analysis of AWS's mounting tool alongside two of the most popular community open-source solutions: s3fs and Goofys.

S3FS is an open-source FUSE (Filesystem in Userspace) implementation and popular command-line client for managing object storage files quickly and easily on Linux, macOS and FreeBSD machines. It is frequently updated and has a large community of contributors on GitHub. It enables users to mount Amazon S3 buckets as local file systems. S3FS aims to simplify data migration, processing, and manipulation tasks by offering a file system abstraction layer over S3 objects.

Goofys is another open-source FUSE-based tool for Linux and macOS designed to mount Amazon S3 buckets as local file systems. It is characterized by its simplicity and lightweight nature and is written in Go. It provides almost the same functionality as s3fs-fuse but with better performance.

Comparing AWS Mountpoint for S3, S3FS and Goofys:

Ease of setup:

S3FS: On Linux, pre-built packages can be installed via the built-in system package manager. Further configuration involves setting up a ${HOME}/.aws/credentials file or a separate ${HOME}/.passwd-s3fs file with AWS credentials.

Goofys: On Linux, Goofys can be installed using pre-built binaries from its GitHub repository. Further configuration involves setting up a ${HOME}/.aws/credentials file or environment variables with AWS credentials.

Mountpoint for S3: The Mountpoint utility, named mount-s3, can be installed via a pre-built package from the Amazon repository, or built from source if needed. Further configuration involves setting up a ${HOME}/.aws/credentials file or environment variables with AWS credentials.
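For all three tools, the shared AWS credentials file takes the same form (the values below are placeholders):

```ini
# ${HOME}/.aws/credentials — read by s3fs, Goofys and mount-s3
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```

The S3FS-specific alternative is a ${HOME}/.passwd-s3fs file containing a single ACCESS_KEY_ID:SECRET_ACCESS_KEY line, which must be protected with chmod 600.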

Performance:
Performance testing of the mounting tools was conducted using the FIO utility, a widely used tool for such purposes, as well as the JuiceFS bench utility. Sequential write and read operations were performed with FIO on files of 4 GB and 16 GB, along with mixed read/write workloads. Testing was conducted in single-process and 16-process modes, together with random read and write tests.

Via JuiceFS bench, testing was performed for write and read operations on a 1 GB file with a block size of 1 MB, and for write and read operations on 100 files of 128 KB each, in single-process and 16-process modes. The tests were run on an EC2 m5.4xlarge instance with Amazon Linux 2 in the us-east-1a Availability Zone, connected to a bucket in the us-east-1 region.

The following application versions were used during throughput testing:

  • Fio — 3.7
  • JuiceFS — 1.0.4
  • Goofys — 0.24
  • S3FS — 1.93
  • Mount-s3 — 1.0.0

The command used for testing:

fio --name=seq_1_thread --directory=<mounted_directory> --size=<size_of_file> --rw=<type_of_IO_pattern> --bs=128k --direct=1 --numjobs=<number_of_jobs> --ioengine=libaio --iodepth=16

  • size — 4 GB and 16 GB values were applied.
  • rw — write, read, rw, randwrite, and randread options were tested.
  • bs — a 128 KB block size was used as an average value.
  • direct — non-buffered I/O.
  • numjobs — number of processes performing operations.
  • ioengine — defines how the job issues I/O to the file; libaio was chosen as Linux native asynchronous I/O.
  • iodepth — number of I/O units to keep in flight against the file; a value of 16 was chosen to emulate asynchronous access to files.
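As an illustration, the sequential-write run on a 4 GB file in 16 parallel processes expands to the following (the mount directory path is illustrative):

```shell
fio --name=seq_write_16 --directory=/tmp/mount --size=4G --rw=write \
    --bs=128k --direct=1 --numjobs=16 --ioengine=libaio --iodepth=16
```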

Commands that were used for mounting:

Goofys

goofys test-bucket /tmp/goofys

S3FS

s3fs test-bucket /tmp/s3fs -o bucket_size=1PB,parallel_count=400,ensure_diskfree=1024,del_cache,use_cache=/tmp/

mount-s3

mount-s3 test-bucket /tmp/mount --allow-delete

Test results:

Goofys demonstrated high performance in sequential write and read operations, but this behavior is the result of certain limitations in its functionality: the absence of file metadata transfer and weak POSIX compatibility. In addition, Goofys does not store file mode/owner/group, symlinks, or hardlinks, and supports only sequential writes. In general, Goofys demonstrated high stability and reliability while working with files during testing; no interruptions or failures were observed during its operation.

S3FS showed the slowest file-operation throughput among the tools. In exchange, it has a set of distinguishing features that make working with a mounted S3 bucket more similar to an ordinary file system. Metadata transfer is accomplished using headers, which results in lower data transfer speed. It maintains a high level of POSIX compatibility, supporting a large subset of POSIX including reading/writing files, directories, symlinks, mode, uid/gid and extended attributes, and renames via server-side copy.

Another important feature of S3FS is data caching on the local disk. Caching enables support for random write operations and appending data to existing files. Goofys also supports appending data to a file, but only by recreating it, while mount-s3 lacks this function entirely. This can be observed on the dashboard, where only S3FS was able to perform random and mixed read and write operations thanks to the caching approach. However, caching may also cause issues during operations, and this function cannot be disabled. While working with large files, the local file system may run out of disk space, making further operations impossible.

Several issues noticed during throughput testing had a negative effect on data-operation speed. The first is related to caching: after several tests with 16 GB file operations in 16 parallel processes, the local disk ran out of free space, with all of it occupied by the S3FS cache, causing data operations to halt. This problem can be resolved only by manually deleting the cache, because S3FS mounting options only allow reserving free space on the disk and deleting the cache after unmounting the directory. The second issue is related to the program's handler pool: S3FS may run out of handlers in the pool while performing operations on large files in multipart-upload mode, depending on the block size and the number of requests. Such behavior is visible in the S3FS logs.

While handlers are being re-created and added back to the pool, no data operations take place and the average data transfer rate is significantly reduced.
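A simple guard against the cache-exhaustion issue described above is to watch the free space on the filesystem backing the cache directory (a minimal sketch, assuming the cache lives under /tmp as in the mount command used for testing):

```shell
# Report available space (in KB) on the filesystem holding the s3fs cache.
# When this approaches the ensure_diskfree reserve, the cache should be purged.
avail_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
echo "available: ${avail_kb} KB"
```

In the tests, only manually deleting files under the cache directory (with use_cache=/tmp/, s3fs keeps them under /tmp/<bucket-name>/) restored throughput.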

Mountpoint for S3 showed results very similar to those of Goofys in terms of data throughput and overall features and capabilities. Mountpoint for S3 also has very low POSIX compatibility, supporting only the most basic operations such as open, close, read, write, create, and opendir. It lacks file metadata transfer and caching as well, and does not support features such as directory renaming, symlinks, hardlinks, file mode/owner/group, or random writes.

However, the most significant drawback is the inability to append to or modify existing files. Every file created in the mounted file system becomes immutable, requiring manual deletion and rewriting for any changes. Goofys behaves similarly, although it can automatically recreate files when appending. Additionally, there is no compatibility with fstab options on Linux systems, making it challenging to automate mounting on OS startup. Another limitation is the inability to delete files from the mounted directory by default, although this can be addressed by mounting the directory with the appropriate flag.

The FIO utility was used instead of JuiceFS bench to test Mountpoint for S3 for writing and reading a large number of small files. This is because Mountpoint for S3 does not support changing or overwriting already-created files, while JuiceFS bench creates files first and only then writes to them, with no way to change this behavior or to write into a file immediately after creation, as is possible in FIO. Therefore, the test results for Mountpoint for S3 in picture 3 are less accurate.

Scalability:

S3FS: Potentially, S3FS can accommodate multiple client mounts. However, there is no coordination among multiple clients mounting the same bucket, which carries a certain level of risk during write operations, as the concurrency of such actions remains uncontrolled. Nevertheless, some level of data consistency is maintained for concurrent read and write requests from multiple clients; this is ensured by the strong read-after-write consistency AWS has provided since December 2020.

Goofys: Since Goofys leverages the same FUSE (Filesystem in Userspace) implementation, its scalability is equivalent to that of s3fs.

Mountpoint for S3: Mountpoint can scale up and down over thousands of instances. Mountpoint for S3 builds on the same AWS Common Runtime (CRT) library that is used by most AWS SDKs. For Amazon S3, the CRT includes a client that implements performance design patterns, including timeouts, retries, and automatic request parallelization for high throughput.

Flexibility of setting:

Goofys doesn’t have too many options for mounting an S3 bucket, it has only basic ones. There are the ability to enable caching, support for system-specific mount options, allowing the use of fstab for automounting among the important features. Available options also include cache TTL management, ACL for objects, encryption including SSE-C. On the other hand, there are no support for configuring multipart uploads, S3 requests or proxies and insufficient documentation.

S3FS offers an exceptionally large number of mounting options. Particularly useful are the fine-grained cache configuration, including its storage and utilization on local disks, the ability to add an HTTP header for each file, and the configuration of the number of requests to S3 and of multipart-upload options for each request. S3FS provides options to disable the multipart approach entirely, choose which APIs to work with, select which IAM role to use, enable xattr usage, and configure proxies. Additionally, it supports fstab for automounting, system-specific mount options, multi-threaded operation, and running in foreground mode. In general, S3FS offers a multitude of options for fine-tuning, but its default settings show only average performance, and using these options is often required to improve it.

Mountpoint for S3 has limited mounting configuration capabilities, roughly on par with Goofys. However, it shows the highest level of integration with S3 among these tools. Its most important features include region detection, which attempts to automatically determine the bucket's region at startup; support for S3 Access Points, which allows mounting a bucket via its access point ARN and provides fine-grained security control; and support for S3 Multi-Region Access Points, S3 Object Lambda Access Points, and S3 Transfer Acceleration. It can operate in multi-threaded or foreground modes. At this time, the main drawback is the lack of support for system-specific mount options, which prevents the use of fstab for automounting. Additionally, SSE-C encryption is not supported, nor are IAM and ACL configurations, which must be done on the S3 side. The ability to enable caching is absent, as is the ability to enable a proxy or configure requests.
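A few of these integration features translate into mount flags such as the following (flag names as of mount-s3 1.0; bucket, prefix, region, and account ID are illustrative):

```shell
# Pin the region instead of relying on auto-detection
mount-s3 --region us-east-1 test-bucket /mnt/s3

# Mount only a prefix of the bucket, read-only
mount-s3 --prefix logs/ --read-only test-bucket /mnt/logs

# Mount through an S3 Access Point by its ARN
mount-s3 arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap /mnt/ap
```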

In addition, all three tools support choosing the storage class for new objects, enabling requester-pays buckets, mounting a bucket prefix, and connecting to an S3 bucket via an endpoint. Furthermore, these tools allow editing the permission bits of directories and files, as well as their UID and GID.
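These shared capabilities are exposed through tool-specific flags; a side-by-side sketch (flag spellings are believed correct for the versions tested, but may vary between releases):

```shell
# Storage class for newly created objects
s3fs test-bucket /mnt/s3fs -o storage_class=standard_ia
goofys --storage-class STANDARD_IA test-bucket /mnt/goofys
mount-s3 --storage-class STANDARD_IA test-bucket /mnt/mp

# Requester-pays buckets and fixed ownership of files
s3fs test-bucket /mnt/s3fs -o requester_pays,uid=1000,gid=1000
goofys --requester-pays --uid 1000 --gid 1000 test-bucket /mnt/goofys
```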

Use Cases:

S3FS allows mounting S3 buckets as if they were directories on a local machine. This is beneficial when S3 data must be accessed through familiar file operations, with compatibility across a wider range of applications and utilities that expect a standard POSIX file system. S3FS is suitable for streaming large media files or caching frequently accessed data locally, which can reduce data transfer costs and enhance performance, as well as for creating backups and archives directly to S3. On the other hand, S3FS may not be as performant as other methods such as the AWS CLI or SDKs, especially for high-throughput applications, and setting up and configuring S3FS can be more complex than using Goofys or Mountpoint.

Goofys is known for its simplicity. It is an excellent choice when a lightweight, easy-to-use tool for mounting S3 buckets is needed. Goofys can be an efficient way to access S3 data without extensive setup for quick prototyping or small-scale projects. It is less resource-intensive than S3FS, which can be crucial for low-resource environments, so it is better suited to projects where simplicity, lightweight operation, and resource efficiency are the primary concerns. On the other hand, Goofys may lack some advanced features available in S3FS, such as performance-tuning options, and due to its lightweight nature and weak POSIX compatibility, it might not be the best choice for applications that rely on file mode/owner/group permissions or random operations.

Mountpoint for S3 is conceptually very simple. It strives to make good use of network bandwidth, increases throughput, and reduces compute costs by getting more work done in less time, with a focus on performance and stability. It supports file-based workloads that perform sequential and random reads and sequential (append-only) writes. Mountpoint is the only option for projects that need an official, enterprise- and production-ready client for performant access to S3 at scale, since no other FUSE implementation is claimed to be production-ready. On the other hand, the absence of caching, the weak POSIX compatibility, and the lack of appends without a full rewrite make this tool suitable mainly for read-heavy applications, with some exceptions.

In general, any FUSE implementation or other bridge between object and file storage has a limited scope of application due to its inherent constraints and, rare exceptions aside, is not a universal solution. Such a solution can be used for legacy applications that co-exist with cloud-native applications: migrating these legacy applications to a cloud environment can be costly and time-consuming, so they are forced to communicate with cloud storage as they are.

There are also cases where data storage has been moved to the cloud, but the legacy applications that rely on it can only perform read and write operations against a traditional file system. Even in such scenarios, the application must meet certain conditions to work productively and reliably with object storage.

  • First and foremost, the application that will perform file operations should not require high processing power and should not have a strong dependence on POSIX semantics.
  • If an application is both reading and writing files, it’s best to use a real file system for working data and copy only the final results to an object store. This is because concurrent ‘read’ and ‘write’ operations are either not supported or have low throughput.
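The second recommendation above can be sketched as a trivial pipeline (the application name and bucket are hypothetical; assumes a configured AWS CLI):

```shell
# Do all working I/O on a real local filesystem...
workdir=$(mktemp -d)
some-etl-job --input /data/raw.csv --output "$workdir/result.parquet"  # hypothetical app

# ...then copy only the final artifact to object storage
aws s3 cp "$workdir/result.parquet" s3://test-bucket/results/
rm -rf "$workdir"
```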

A FUSE client is most useful and practical when only a small part of the workflow requires a classic file system to work with, primarily conducting simple ‘read’ and ‘write’ operations, while the majority of the workflow involves direct interactions with object storage. It’s essential to keep in mind the limitations of FUSE implementations.

  • Avoid relying on ownership or permissions within the FUSE client; instead, manage permissions through S3 key policies.
  • No atomic renames of files or directories (such as the ‘mv’ command).
  • Minimize directory listing operations because of their low efficiency.
  • Only sequential operations are allowed; random writes or appends to files require rewriting the entire object, optimized with multipart upload copy.
  • Do not use hardlinks or symlinks.
  • Keep realistic expectations regarding consistency across clients, and avoid sharing files between multiple FUSE-mounted clients, because there is no coordination between multiple clients mounting the same bucket.
  • Consider that very large files (1 TB or larger) may not be well suited to this approach.

In essence, the application utilizing the FUSE filesystem should primarily function as a straightforward reader or writer of files.
It’s very easy to think of S3 like a file system, but it’s not. S3 is a distributed NoSQL database for large objects that does not support custom attributes or content searching. It’s fully atomic, eventually consistent, higher latency than GlusterFS or even NFS. S3 doesn’t immediately guarantee consistency. Eventual consistency can create very odd, transient situations such as one server serving new content and others serving old. If web app is properly RESTful and not session based, then users may not even get a ‘consistent set’ of site files. S3FS tries to provide a very POSIX-compliant filesystem, but it can only do so much. For example one of the limitations of S3FS is ‘inotify detects only local modifications, not external ones by other clients or tools’ might be confusing. If web server software caches content in RAM and relies on watching the disk via inotify, the web server will continue to serve old content indefinitely. Even worse, if s3fs mount was run without use_cache option, the latency on requests goes up dramatically, slowing down the site, as it fetches from S3 during each request. This also can increase costs via per-request fees and negatively affects user experience, as page load speed is considered. Generally, serving static assets from web applications will likely be served slower as throughput for high quantities of small files is low, that can be seen on the dashboard. Indeed, while FUSE clients like Mountpoint for S3 usage is free, there are costs associated with each request made to Amazon S3 (except for requests within the same region) and for data storage in the bucket. Frequent queries to S3 bucket can significantly increase the cost of storage usage depending on where requests to S3 will originate from.

Conclusion:

Mountpoint for S3, Goofys, and s3fs are clients that enable the use of an object store by applications that expect files. S3FS is a powerful tool for seamlessly integrating Amazon S3 into a Linux environment, making it an excellent choice for scenarios that involve mounting S3 as a file system with a certain level of POSIX compatibility where performance is not critical. It allows advanced configuration with decent features and can handle backup and archiving, streaming, and caching.

Goofys can be chosen when simplicity, lightweight operation, performance, and resource efficiency are the primary concerns, particularly for smaller-scale projects or resource-constrained environments.

In cases where a straightforward, official, enterprise- and production-ready client with high performance for reliable access to S3 is needed, Mountpoint for S3 is the clear choice, as it ensures high-speed data throughput. It is particularly well suited to read-heavy applications such as data lakes and machine learning training, especially when dealing with scalable applications.

When choosing between S3 access methods, consider the specific use case and whether the convenience and compatibility of FUSE clients align with the project's requirements.
