Exploring Apache Ozone Snapshots

Prashant Pogde
Jun 25, 2024


Overview

This post explores the snapshot feature of the Apache Ozone object store in detail. It covers what the feature is, how it is exposed in the object store namespace, the operations it offers and its storage space implications. This post is part of a series on Apache Ozone Snapshots, and subsequent posts will cover further aspects of the feature.

Apache Ozone Snapshots

Apache Ozone is a highly scalable, highly available, distributed and secure object store that can handle billions of keys. It is a fully n-way replicated and strongly consistent system that offers both object-storage and file-system semantics. Apache Ozone has no single point of failure for either metadata or data. It is compatible with the Amazon S3 APIs as well as the Hadoop Compatible FileSystem (HCFS) interface. It integrates seamlessly with YARN, Hive, Impala, Spark and more out of the box, and is a preferred choice for on-prem storage at large enterprises running analytics and machine learning workloads. It’s also gaining accelerated adoption for a variety of use cases including backup, storage, archival and incremental scale-out storage.

Ozone Snapshots

The first release of the Apache Ozone Snapshot feature allows users and applications to take snapshots at bucket granularity. A snapshot of a bucket captures a point-in-time image of the active object store bucket at the moment the snapshot is created. This is explained in the picture below.

The picture above shows two snapshots taken of a bucket “/vn/bn/”. Snapshots are accessed through a hidden namespace, “.snapshot”. For example, you can access snapshot “snap1” through the path “/vn/bn/.snapshot/snap1”.
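The path convention above can be sketched as a small helper. This is only an illustration of the naming scheme: the function name `snapshot_path` and the optional `key` argument (for addressing an object inside a snapshot) are assumptions made for this example, not part of any Ozone client API.

```python
def snapshot_path(volume: str, bucket: str, snapshot: str, key: str = "") -> str:
    """Build the path of a snapshot, or of a key inside it, under '.snapshot'.

    Hypothetical helper for illustration only; it just joins path components.
    """
    base = f"/{volume}/{bucket}/.snapshot/{snapshot}"
    # Append the key (if any) without producing a double slash.
    return f"{base}/{key.lstrip('/')}" if key else base


print(snapshot_path("vn", "bn", "snap1"))                  # /vn/bn/.snapshot/snap1
print(snapshot_path("vn", "bn", "snap1", "dir/file.txt"))  # /vn/bn/.snapshot/snap1/dir/file.txt
```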

Ozone Snapshot Operations

Apache Ozone supports the following basic operations on snapshots:

Snapshot Creation

Users and applications can create new snapshots at the bucket level. Snapshot creation is instantaneous regardless of the size of the object store: a store may contain billions of keys, and the operation is still designed to complete instantaneously. Snapshot creation is also synchronous. When the operation returns, the snapshot has captured the current version of every object in the bucket at that instant. No background activity is associated with snapshot creation.
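To see how a snapshot can be created in constant time without copying any keys, consider the toy model below. It keeps every version of a key tagged with a sequence number, and a snapshot simply remembers the sequence number at creation time. This is a sketch of the general idea only; the class `ToyBucket` and all of its methods are hypothetical, and nothing here describes how Ozone implements snapshots internally.

```python
class ToyBucket:
    """Toy versioned bucket: snapshot creation is O(1) and copies no keys."""

    def __init__(self):
        self._seq = 0          # monotonically increasing write counter
        self._versions = {}    # key -> list of (seq, value); value None means deleted
        self._snapshots = {}   # snapshot name -> sequence number at creation

    def put(self, key, value):
        self._seq += 1
        self._versions.setdefault(key, []).append((self._seq, value))

    def delete(self, key):
        # A delete is recorded as a new version holding None.
        self.put(key, None)

    def create_snapshot(self, name):
        if name in self._snapshots:
            raise ValueError(f"snapshot {name!r} already exists")
        # Constant-time: only the current sequence number is recorded.
        self._snapshots[name] = self._seq

    def get(self, key, snapshot=None):
        # Read the newest version visible at the snapshot (or at "now").
        limit = self._seq if snapshot is None else self._snapshots[snapshot]
        for seq, value in reversed(self._versions.get(key, [])):
            if seq <= limit:
                return value
        return None


bucket = ToyBucket()
bucket.put("k1", "v1")
bucket.create_snapshot("snap1")
bucket.put("k1", "v2")
bucket.put("k2", "new")

print(bucket.get("k1"))            # v2
print(bucket.get("k1", "snap1"))   # v1
print(bucket.get("k2", "snap1"))   # None
```

The snapshot is readable as soon as `create_snapshot` returns, and later writes to the active bucket do not change what the snapshot sees.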

Snapshot List

Users and applications can list all the snapshots created for a given bucket.

Reading and Restoring from Snapshots

Users and applications can start reading and restoring from a snapshot as soon as the snapshot create operation returns.

Snapshot Diffs

Given any two snapshots of the same bucket, users and applications can efficiently identify the differences between them. The time taken to compute a snapshot diff is proportional to the actual difference between the two snapshots. For example, the bucket may contain 10 million objects, but if only 1000 objects changed between the two snapshots, the snapshot diff operation takes time proportional to iterating over those 1000 objects.
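One way to make a diff cost proportional to the amount of change, rather than to the bucket size, is to keep a journal of changes and scan only the slice written between the two snapshots. The sketch below illustrates that idea with a hypothetical `JournaledBucket` class. It is an assumption chosen for clarity and is not a description of Ozone's actual snapshot diff mechanism.

```python
class JournaledBucket:
    """Toy bucket whose snapshot diff scans only changes between snapshots."""

    def __init__(self):
        self._live = {}        # current key -> value
        self._journal = []     # ordered (key, old_value, new_value) entries
        self._snapshots = {}   # snapshot name -> journal length at creation

    def put(self, key, value):
        self._journal.append((key, self._live.get(key), value))
        self._live[key] = value

    def delete(self, key):
        self._journal.append((key, self._live.pop(key, None), None))

    def create_snapshot(self, name):
        self._snapshots[name] = len(self._journal)

    def diff(self, from_snap, to_snap):
        start, end = self._snapshots[from_snap], self._snapshots[to_snap]
        if start > end:
            raise ValueError("from_snap must be older than to_snap")
        first_old, last_new = {}, {}
        # Only journal entries written between the two snapshots are visited.
        for key, old, new in self._journal[start:end]:
            first_old.setdefault(key, old)
            last_new[key] = new
        report = {}
        for key, new in last_new.items():
            old = first_old[key]
            if old is None and new is not None:
                report[key] = "created"
            elif old is not None and new is None:
                report[key] = "deleted"
            elif old != new:
                report[key] = "modified"
            # Keys created and then deleted in between net out to no change.
        return report


bucket = JournaledBucket()
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    bucket.put(key, value)
bucket.create_snapshot("snap1")
bucket.put("a", 10)      # modified
bucket.delete("b")       # deleted
bucket.put("d", 4)       # created
bucket.put("e", 5)
bucket.delete("e")       # created then deleted: no net change
bucket.create_snapshot("snap2")

print(bucket.diff("snap1", "snap2"))
# {'a': 'modified', 'b': 'deleted', 'd': 'created'}
```

Key "c" is never visited by `diff`, because nothing touched it between the two snapshots.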

Snapshot Delete

Users and applications can delete a specific snapshot of a given bucket. Snapshots are inherently read-only, so it is not possible to delete an individual object within a snapshot.

Active Object store in presence of Snapshots

Delete operations on the active object store are snapshot-aware. Deleting an object from the active object store does not reclaim the space associated with that object if the object is still referenced by any snapshot.
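The snapshot-aware delete behavior can be illustrated with a toy reference check: space is freed only once neither the active bucket nor any snapshot still points at an object. The `ToyStore` class below is hypothetical and deliberately naive (its snapshots copy the key map, which a real system would avoid); it only demonstrates the reclamation rule.

```python
class ToyStore:
    """Toy store where deletes reclaim space only for unreferenced objects."""

    def __init__(self):
        self.active = {}       # key -> object id in the active bucket
        self.snapshots = {}    # snapshot name -> {key: object id}
        self.blocks = {}       # object id -> size in bytes still on "disk"
        self._next_id = 0

    def put(self, key, size):
        old = self.active.get(key)
        oid = self._next_id
        self._next_id += 1
        self.blocks[oid] = size
        self.active[key] = oid
        if old is not None:
            self._reclaim_if_unreferenced(old)

    def create_snapshot(self, name):
        # Toy simplification: copy the mapping (not the data).
        self.snapshots[name] = dict(self.active)

    def delete_key(self, key):
        """Remove a key from the active bucket; return the bytes reclaimed."""
        oid = self.active.pop(key)
        return self._reclaim_if_unreferenced(oid)

    def delete_snapshot(self, name):
        """Remove a snapshot; return the bytes reclaimed as a result."""
        snap = self.snapshots.pop(name)
        return sum(self._reclaim_if_unreferenced(oid) for oid in set(snap.values()))

    def _referenced(self, oid):
        in_active = oid in self.active.values()
        in_snapshot = any(oid in snap.values() for snap in self.snapshots.values())
        return in_active or in_snapshot

    def _reclaim_if_unreferenced(self, oid):
        if oid in self.blocks and not self._referenced(oid):
            return self.blocks.pop(oid)
        return 0

    def used_bytes(self):
        return sum(self.blocks.values())


store = ToyStore()
store.put("a", 100)
store.create_snapshot("snap1")

print(store.delete_key("a"))           # 0   (snap1 still references the object)
print(store.used_bytes())              # 100
print(store.delete_snapshot("snap1"))  # 100 (last reference is gone)
print(store.used_bytes())              # 0
```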

Space usage in presence of Snapshots

Snapshots consume only an incremental amount of space, which depends on how many keys they share with the active object store and with other snapshots in the system. This is explained in the picture below.
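A small worked example makes the space accounting concrete. The sketch below assumes each view (the active bucket and every snapshot) is a map from key to an object id, and that shared object ids are stored only once. The function `space_report` and the sample sizes are made up for illustration.

```python
def space_report(active, snapshots, sizes):
    """Return (total physical bytes, bytes held exclusively by each view)."""
    views = {"active": active, **snapshots}
    owners = {}  # object id -> set of view names that reference it
    for name, view in views.items():
        for oid in view.values():
            owners.setdefault(oid, set()).add(name)

    # Each distinct object is stored once, however many views share it.
    total = sum(sizes[oid] for oid in owners)

    exclusive = {name: 0 for name in views}
    for oid, names in owners.items():
        if len(names) == 1:
            exclusive[next(iter(names))] += sizes[oid]
    return total, exclusive


sizes = {"o1": 100, "o2": 200, "o3": 50, "o4": 70}   # bytes per object
snap1 = {"a": "o1", "b": "o2"}
snap2 = {"a": "o1", "b": "o3"}                       # "b" was overwritten
active = {"a": "o1", "b": "o3", "c": "o4"}           # "c" was added later

total, exclusive = space_report(active, {"snap1": snap1, "snap2": snap2}, sizes)
print(total)      # 420
print(exclusive)  # {'active': 70, 'snap1': 200, 'snap2': 0}
```

The three views add up to 670 logical bytes (300 + 150 + 220), but only 420 bytes are stored. In this example "snap2" costs nothing extra, because every object it references is also referenced by the active bucket.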

What’s next?

The Apache Ozone Snapshot feature is designed to offer a meaningful and easy-to-use primitive for users, system administrators and applications. The feature hides much of the complexity from users and applications by doing the bulk of the work inside the object store implementation. The next blog in the series will discuss the various use cases that the Snapshot feature enables. In subsequent blogs on Apache Ozone, you can learn more about the Snapshot feature as well as how to use it effectively.

Previous Blogs in the Series:

Introducing Apache Ozone Snapshots

Object Stores: The Case for Snapshots vs Object Versioning

Next in the Series:

Apache Ozone Snapshots: Addressing different use cases

Apache Ozone: Using the Snapshot feature
