Secure Multi-Tenancy in Apache Ozone

Prashant Pogde
Aug 21, 2023

What is Apache Ozone?

Apache Ozone is a highly scalable, highly available, distributed and secure object store that can handle billions of keys. It is a fully n-way replicated and strongly consistent system that offers both object storage and file system semantics. Apache Ozone has no single point of failure for either the metadata or the data. It is compatible with the Amazon S3 APIs as well as the Hadoop Compatible FileSystem (HCFS) interface. It integrates seamlessly with YARN, Hive, Impala, Spark and more out of the box, and is a preferred choice for on-prem storage at large enterprises.

Apache Ozone is architected to scale horizontally as well as vertically, which makes it well-suited for large and growing datasets. It can accommodate dense nodes with 500+TB of storage attached to a single datanode. This allows for cost optimization on top of query optimization.

Because it is built for modern large enterprises that put an emphasis on data isolation and data security, Apache Ozone has security and isolation features baked into its core architecture. It offers data encryption at rest with unique encryption keys per object, as well as strong access controls. This helps to ensure that sensitive data is protected from unauthorized access at all times. The Ozone multi-tenancy feature adds a data isolation mechanism on top of encryption and access control.

What is Multi-Tenancy?

Let’s take the example of a large enterprise. It has multiple sub-organizations, e.g. Engineering, Product Management, IT, Support, Sales, Finance, etc. They all have different storage needs that grow over time. Provisioning and managing a separate storage cluster for each of these sub-organizations could add too much overhead. Further, each sub-organization can contain several smaller groups of its own, e.g. Engineering can be further divided into various product groups. A key constraint in such an organization is to keep tight isolation between the data that belongs to the various sub-organizations.

Instead of provisioning several storage clusters, Apache Ozone can accommodate the scale and capacity requirements of growing organizations with a single cluster. It offers strict data isolation between the various sub-groups within the organization using the Multi-Tenancy feature. Each of these sub-organizations can be represented by a unique tenant in the cluster. Each tenant in the shared Ozone cluster gets a unique namespace where it can create its own buckets. Two different tenants can each create a bucket with the same name, because their namespaces are completely isolated. Ozone makes this possible by using volumes as namespaces. Similarly, access to each namespace is tightly restricted by default. Only the users that belong to a tenant can access that tenant's bucket namespace, a.k.a. its volume.

Before we introduced this feature in Cloudera Data Platform (CDP) 7.1.8, Ozone volumes other than the special s3v volume were not accessible through the AWS S3 APIs. The reason is that the S3 APIs have no concept of a bucket namespace: all customers on AWS S3 share the same namespace. As you can guess, this limitation has created its own set of challenges on the AWS S3 platform. With the introduction of the Apache Ozone Multi-Tenancy feature, each tenant gets its own volume, or bucket namespace. Each user on the system can be assigned to one or more tenants. The following sequence diagram illustrates the end-to-end flow:

It is possible for a Kerberos principal on an Ozone cluster to access multiple tenants. However, for every tenant that a principal is assigned to, the user gets a unique S3 credential pair, i.e. an Access Key ID and a Secret Access Key. An S3 credential is always tied to a bucket namespace, a.k.a. an Ozone volume. An S3 API request is always routed to, and confined within, the corresponding Ozone volume based on the S3 credential the request provides. All bucket accesses in the S3 API request are relative to that specific Ozone volume only.

Ozone Multi-Tenancy can also be used through the Ozone shell and the HCFS (Ozone FS, ofs) APIs. However, authentication for these APIs has to be done through Kerberos. If a Kerberos principal is tied to more than one tenant, its access over these APIs is, by default, limited to the volumes of the tenants it is assigned to.

Ozone Multi-tenancy requires that we use Apache Ranger as the authorization engine. Required access policies are automatically set up in Apache Ranger by Ozone Manager during tenant operations. The diagram below gives an overview of the Ozone multi-tenant interactions in a cluster:

  • Ozone cluster administrators. They have unrestricted access to the entire Ozone cluster. They can create new tenants as well as assign and revoke users in any tenant.
  • Ozone tenant administrators. They have unrestricted access within their tenancy and the associated bucket namespace (Ozone volume). They can also assign and revoke users in tenants where they are an admin.
  • Ozone tenant users. These are Kerberos principals that are assigned as users in specific tenants. They can only access volumes that are associated with tenants they are assigned to.

Using Ozone Multi-Tenancy in Practice

Ozone Multi-Tenancy is available starting in Cloudera Runtime 7.1.8.

Prerequisites

  • Kerberized cluster running Ozone parcels, or Cloudera Runtime 7.1.8.0+.
  • Ozone is set up to use Ranger as the ACL provider.

On a Kerberized cluster running Cloudera Runtime 7.1.8 or later, an administrator can enable the Ozone multi-tenancy feature by ticking “Enable Ozone S3 Multi-Tenancy feature (ozone.om.multitenancy.enabled)” on the Ozone service configuration page in Cloudera Manager.

To start, authenticate as an Ozone administrator with kinit. Assuming MIT Kerberos 5 client config is correctly set up in /etc/krb5.conf, to authenticate with a keytab:

~]# kinit -kt /path/to/ozoneadmin.keytab ozoneadmin@EXAMPLE.COM

Or to authenticate with a password, without keytabs:

~]# kinit ozoneadmin@EXAMPLE.COM
~]# klist
Default principal: ozoneadmin@EXAMPLE.COM

Now you are ready to create your first tenant. Run the following command:

~]# ozone tenant --verbose create tenantone --om-service-id=ozone1

You should expect a similar output to this:

23/05/30 22:05:53 INFO rpc.RpcClient: Creating Tenant: 'tenantone', with new volume: 'tenantone'
{
  "tenantId": "tenantone"
}

Note that the examples in this post assume Ozone Manager HA is enabled and that the cluster’s Ozone service ID is ozone1. If HA is not enabled, remove --om-service-id=ozone1 from all tenant command examples in this post.
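Because the --om-service-id option only applies when Ozone Manager HA is enabled, a script that has to work on both kinds of cluster can add it conditionally. The following is a minimal sketch, not part of Ozone itself: the OM_SERVICE_ID variable and the tenant_cmd helper are hypothetical names chosen for illustration, and the helper only prints the command line instead of running it.

```shell
# OM_SERVICE_ID is a hypothetical variable; leave it empty on a non-HA cluster.
OM_SERVICE_ID="ozone1"

# tenant_cmd is a hypothetical helper that prints (does not run) an
# "ozone tenant" command line, appending --om-service-id only when it is set.
tenant_cmd() {
  if [ -n "$OM_SERVICE_ID" ]; then
    echo "ozone tenant $* --om-service-id=$OM_SERVICE_ID"
  else
    echo "ozone tenant $*"
  fi
}

tenant_cmd list --json
```

Setting OM_SERVICE_ID to an empty string makes the helper print the same command without the service ID option.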

To list the tenant just created, along with others that are already there (if any):

~]# ozone tenant list --om-service-id=ozone1

Output:

tenantone

Or if you would like more details or a machine-readable format:

~]# ozone tenant list --json --om-service-id=ozone1

Output:

[
  {
    "tenantId": "tenantone",
    "bucketNamespaceName": "tenantone",
    "userRoleName": "tenantone-UserRole",
    "adminRoleName": "tenantone-AdminRole",
    "bucketNamespacePolicyName": "tenantone-VolumeAccess",
    "bucketPolicyName": "tenantone-BucketAccess"
  }
]

To assign a user to the tenant, run:

~]# ozone tenant --verbose user assign testuser --tenant=tenantone --om-service-id=ozone1

If the command returns TENANT_AUTHORIZER_ERROR with the error message “user with name: testuser does not exist”, check the Ranger Admin Web UI to make sure the user testuser exists in Ranger. Typically, the Ranger Usersync service takes care of syncing the user list from AD or LDAP when either is in use.

Upon successful user assignment, the command prints the Access Key ID and Secret Access Key pair. Example output:

export AWS_ACCESS_KEY_ID='tenantone$testuser'
export AWS_SECRET_ACCESS_KEY='<SECRET>'
Assigned 'testuser' to 'tenantone' with accessId 'tenantone$testuser'.

Note the dollar sign ‘$’ in the Access Key ID. A pair of single quotation marks is crucial in order to prevent bash parameter expansion.
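To see why the quoting matters, the short sketch below compares the two quoting styles in a POSIX shell. The variable names wrong, right and escaped are illustrative only, and testuser is set to an empty string to mimic a shell where no such variable is defined.

```shell
# Simulate a shell where $testuser has no value.
testuser=''

# Double quotes: the shell expands $testuser, so the user part is lost.
wrong="tenantone$testuser"

# Single quotes: the dollar sign stays literal.
right='tenantone$testuser'

# A backslash-escaped dollar sign inside double quotes is also literal.
escaped="tenantone\$testuser"

printf 'double quotes: %s\n' "$wrong"   # prints: double quotes: tenantone
printf 'single quotes: %s\n' "$right"   # prints: single quotes: tenantone$testuser
```

With double quotes the exported Access Key ID would silently become tenantone, which does not match the access ID issued for the user.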

With that, the tenant user is able to access the tenant volume over S3. Here is a short list of awscli commands you could try out once the credentials above are imported:

~]# alias aws-ozone="aws s3api --endpoint-url <S3G>"
~]# aws-ozone create-bucket --bucket bucket1
~]# aws-ozone list-buckets
~]# aws-ozone put-object --bucket bucket1 --key key1 --body /local/file1
~]# aws-ozone list-objects --bucket bucket1
~]# aws-ozone get-object --bucket bucket1 --key key1 /tmp/file1-get

Confirm tenant user list with:

~]# ozone tenant user list --json tenantone --om-service-id=ozone1

Output:

[
  {
    "user": "testuser",
    "accessId": "tenantone$testuser"
  }
]
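The access IDs shown in this post follow the pattern of tenant name, dollar sign, user name. Assuming that pattern holds (it is only shown by example here, so it may not apply to every access ID in a cluster), a script can split an access ID into its two parts with plain POSIX parameter expansion. The variable names are illustrative.

```shell
# Single quotes keep the dollar sign literal.
access_id='tenantone$testuser'

# Remove everything from the first "$" onward to get the tenant name.
tenant="${access_id%%\$*}"

# Remove everything up to and including the first "$" to get the user name.
user="${access_id#*\$}"

printf 'tenant=%s user=%s\n' "$tenant" "$user"   # prints: tenant=tenantone user=testuser
```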

Or in the other direction, to check which tenants the user has already been assigned to, and the user’s admin status in each tenant:

~]# ozone tenant user info --json testuser --om-service-id=ozone1

Output:

{
  "user": "testuser",
  "tenants": [
    {
      "accessId": "tenantone$testuser",
      "tenantId": "tenantone",
      "isAdmin": false,
      "isDelegatedAdmin": false
    }
  ]
}

To assign a new admin for a tenant, an existing admin of that tenant or an Ozone cluster admin can use the following command:

~]# ozone tenant user assignadmin 'tenantone$testuser' --tenant=tenantone --om-service-id=ozone1

Similarly, to revoke admin permission for a tenant user:

~]# ozone tenant user revokeadmin 'tenantone$testuser' --om-service-id=ozone1

For more information on the usage and output of the various tenant commands, and on more advanced features such as bucket links, check out the Ozone documentation and the CDP Private Cloud documentation.

Glossary

Active Directory (AD)

Active Directory (AD) is a directory service developed by Microsoft for Windows domain networks. Windows Server operating systems include it as a set of processes and services. Originally, only centralized domain management used Active Directory. However, it ultimately became an umbrella title for various directory-based identity-related services.

Amazon S3

Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.

Apache Ozone

Apache Ozone is a highly scalable, distributed storage for Analytics, Big data and Cloud Native applications. Ozone supports S3 compatible object APIs as well as a Hadoop Compatible File System implementation. It is optimized for both efficient object store and file system operations.

Cloudera Data Platform (CDP)

Cloudera Data Platform (CDP) is a hybrid data platform designed for unmatched freedom to choose — any cloud, any analytics, any data.

High availability (HA)

High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

Kerberos

Kerberos is a computer-network authentication protocol that works on the basis of tickets to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.

LDAP

The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network.

Multi-Tenancy / Multitenancy

Software multitenancy is a software architecture in which a single instance of software runs on a server and serves multiple tenants.

Ozone S3 Gateway (S3G)

Ozone S3 Gateway is a component in Ozone which exposes S3 compatible APIs to be consumed by Amazon S3 clients. One or more S3 Gateways can be started in addition to the regular Ozone components.
