Rook: Kubernetes Storage Adventures

Omer Slik
7 min read · Dec 13, 2023

As organizations increasingly deploy Kubernetes in on-premise clusters, addressing the storage challenges becomes paramount. This Medium Story is tailored for those navigating the complexities of on-premise Kubernetes clusters, providing insights into the transformative capabilities of Rook in meeting the demands of industry trends, such as AI edge workloads.

Start with the basics

What is Rook?

Rook turns distributed storage systems into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.

Rook uses the power of the Kubernetes platform to deliver its services via a Kubernetes Operator for Ceph.

(From Rook Official Website)

While Rook acts as the orchestrator for distributed storage solutions within Kubernetes, it is tied to the robust capabilities of Ceph, an open-source distributed storage system that underlies Rook’s functionality.

What is Ceph?

Reliable and scalable storage designed for any organization

Use Ceph to transform your storage infrastructure. Ceph provides a unified storage service with object, block, and file interfaces from a single cluster built from commodity hardware components. (From Ceph website)

Ceph, an open-source distributed storage system, offers unified object, block, and file storage within a single, cohesive cluster. Its distributed architecture of monitors (MONs), object storage daemons (OSDs), and metadata servers (MDSs) provides high availability and fault tolerance. At the core of Ceph lies RADOS, the Reliable Autonomic Distributed Object Store, which uses the CRUSH algorithm to decide where data is placed and replicated. This scalable, fault-tolerant storage is the foundation that solutions like Rook build on to streamline storage management in Kubernetes environments.
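To make the idea of computed data placement more concrete, here is a minimal Python sketch. It is not the CRUSH algorithm: it uses rendezvous (highest-random-weight) hashing as a simplified stand-in, to show how any client can work out which OSDs hold an object's replicas without a central lookup table, while keeping replicas on separate hosts. The OSD names, host names, and the `place` function are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import hashlib

# Hypothetical cluster map: OSD id -> host it runs on.
OSDS = {
    "osd.0": "node-a",
    "osd.1": "node-a",
    "osd.2": "node-b",
    "osd.3": "node-b",
    "osd.4": "node-c",
    "osd.5": "node-c",
}


def _score(object_name: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}/{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: dict[str, str], replicas: int = 3) -> list[str]:
    """Pick up to `replicas` OSDs for an object, at most one per host."""
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    chosen: list[str] = []
    used_hosts: set[str] = set()
    for osd in ranked:
        host = osds[osd]
        if host in used_hosts:
            continue  # keep replicas in separate failure domains
        chosen.append(osd)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    print(place("my-object", OSDS))
```

Because the result depends only on the object name and the cluster map, every client computes the same placement, and removing an OSD that holds no replica of an object leaves that object's placement unchanged. Real CRUSH adds weights, hierarchical buckets, and placement groups on top of this basic idea.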

[Figure from https://rook.io/docs/rook/v1.12/Getting-Started/storage-architecture/]

Before Rook

Before Rook, our storage landscape was fragmented and posed numerous challenges.

We had a Minio cluster that, although crucial for a fraction of our services, proved complex to install and added a layer of difficulty to our operations. Limited to object storage, it couldn’t fulfill the diverse storage needs of our entire ecosystem.

Meanwhile, other services with persistent storage requirements wrote directly to local disks, which introduced a host of drawbacks. These services were tied to specific hosts, making mobility a significant hurdle: if a service had to move to a different host, the risk of data loss was high.

Moreover, the lack of Kubernetes-native visibility deprived us of essential insights into the storage consumption of these services, hindering our ability to set meaningful limits. Compounding these issues, the data stored on a single disk lacked the resilience needed for a highly available system.

Our storage infrastructure was not just decentralized; it was plagued by inefficiencies, posing substantial operational and scalability challenges.

[Figure: Before Rook]

Enter Rook

To alleviate these challenges, we turned to Rook. Through its seamless integration with Ceph, Rook not only replaces our Minio installation but also extends our capabilities well beyond object storage.

With Rook, we now have a unified storage solution that caters to all our services, eliminating the need for disparate storage setups. Services are no longer tethered to specific hosts and can move across the cluster without the fear of data loss.

Rook’s integration with Kubernetes provides native visibility into storage consumption, enabling us to set precise storage limits for each service.

Most importantly, Rook ensures data is replicated across the storage cluster, providing the high availability our critical workloads demand. Our storage landscape, once riddled with complexities, has been transformed into a cohesive and resilient environment, thanks to Rook’s intervention.

The following architecture diagram gives a glimpse into how Rook and Ceph work together to provide a highly available storage system.

[Figure: After Rook]

Implementation

Rook simplifies the implementation journey with two Helm charts:

  1. rook-ceph
  • This official Helm chart installs the Rook Operator, the component that automates the installation and administration of Ceph clusters within Kubernetes.
  2. rook-ceph-cluster
  • This Helm chart creates and configures the custom resource (CR) that describes a single Ceph cluster. Once the “CephCluster” CR is in place, the Rook Operator detects it and creates the Ceph cluster according to the CR’s configuration.

These Helm charts provide a seamless experience, allowing for efficient management and scalability. Moreover, Rook’s support for a GitOps approach enhances the installation across multiple clusters, bringing a higher level of automation to storage management.
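As a rough illustration of the two-chart flow, the Python sketch below only assembles the Helm commands and prints them; it does not run anything. The repository name and URL, the chart names, and the `operatorNamespace` value follow the Rook Helm documentation as we understand it, so treat them as assumptions and check them against the Rook version you deploy. The `helm_commands` helper is hypothetical.

```python
import shlex

# Assumed values, taken from the Rook Helm docs; verify for your Rook version.
ROOK_REPO_NAME = "rook-release"
ROOK_REPO_URL = "https://charts.rook.io/release"


def helm_commands(namespace: str = "rook-ceph") -> list:
    """Return the Helm invocations for the operator chart and the cluster chart."""
    return [
        # Register the chart repository.
        ["helm", "repo", "add", ROOK_REPO_NAME, ROOK_REPO_URL],
        # 1. The operator chart: installs the Rook Operator.
        ["helm", "install", "--create-namespace", "--namespace", namespace,
         "rook-ceph", f"{ROOK_REPO_NAME}/rook-ceph"],
        # 2. The cluster chart: creates the CephCluster CR the operator acts on.
        ["helm", "install", "--create-namespace", "--namespace", namespace,
         "rook-ceph-cluster", "--set", f"operatorNamespace={namespace}",
         f"{ROOK_REPO_NAME}/rook-ceph-cluster"],
    ]


if __name__ == "__main__":
    for command in helm_commands():
        print(shlex.join(command))
```

In a GitOps setup the same two releases would typically be declared in the repository and reconciled by a tool such as Argo CD or Flux rather than installed by hand.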

Operational Benefits

Embracing Rook in your Kubernetes cluster brings plenty of advantages, particularly for those navigating the pain of Kubernetes On-Premise environments. While users of managed Kubernetes solutions might find these benefits obvious, the impact on On-Premise setups is truly remarkable.

Seamless Pod Migration between Nodes: Without Rook or another Container Storage Interface (CSI) driver, relocating pods between nodes is cumbersome, often involving manual data copying and a touch of hope for data consistency. Rook changes this by directing all persistent data to Ceph, a distributed storage system that replicates the data across nodes. Moving a pod to a different node therefore becomes routine: the CSI driver makes the same volume available on the new node, with no manual data handling.

Granular Visibility into Storage Usage at the Service Level: Before adopting a CSI driver, we relied on the “local-path-provisioner” service, which limited us to host-level insight, that is, the total storage consumption of each host. In contrast, Rook exposes usage percentages for each Persistent Volume Claim (PVC). This granular insight makes alerting easy and gives administrators a precise picture of storage consumption for every application, a significant improvement in monitoring and resource allocation.
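The per-PVC alerting described above boils down to simple arithmetic. The Python sketch below assumes the used and capacity figures have already been collected, for example from the kubelet's `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` metrics; the `PvcSample` type, the sample values, and the 80% threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PvcSample:
    """One usage reading for a PVC (hypothetical data model)."""
    namespace: str
    pvc: str
    used_bytes: int
    capacity_bytes: int


def usage_percent(sample: PvcSample) -> float:
    """Return used space as a percentage of capacity (0.0 if capacity is unknown)."""
    if sample.capacity_bytes <= 0:
        return 0.0
    return 100.0 * sample.used_bytes / sample.capacity_bytes


def pvcs_over_threshold(samples, threshold_percent: float = 80.0) -> list:
    """List 'namespace/pvc' for every PVC at or above the threshold."""
    return [
        f"{s.namespace}/{s.pvc}"
        for s in samples
        if usage_percent(s) >= threshold_percent
    ]


if __name__ == "__main__":
    gib = 1024 ** 3
    samples = [
        PvcSample("shop", "orders-db", 9 * gib, 10 * gib),   # 90% full
        PvcSample("shop", "image-cache", 2 * gib, 10 * gib),  # 20% full
    ]
    print(pvcs_over_threshold(samples))
```

In practice the same ratio is usually expressed as an alerting rule in the monitoring system rather than computed in application code.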

Automated Management of Physical Disks: Rook’s magic unfolds during the cluster’s inception, courtesy of a dedicated component, the “Rook Discovery Daemon.” It detects new disks connected to a node, checks that they are clean and available, and prepares them for use by Rook and Ceph. Without this component, we had to rely on flaky operations to automate this crucial process, which is why Rook has so greatly simplified our management of physical disks.
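The "clean and available" check can be pictured with a small Python sketch. This is not Rook's discovery code, which is more involved; it only illustrates the kind of filter such a component applies. The `BlockDevice` model and the three criteria (no filesystem signature, no partitions, not mounted) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDevice:
    """Simplified view of a disk on a node (hypothetical data model)."""
    name: str
    filesystem: str = ""      # empty: no filesystem signature found
    partitions: tuple = ()    # names of partitions on the device, if any
    mountpoint: str = ""      # empty: not mounted


def is_eligible(device: BlockDevice) -> bool:
    """A device is usable only if it is empty and not in use by the host."""
    return not device.filesystem and not device.partitions and not device.mountpoint


def eligible_devices(devices) -> list:
    """Return the names of devices that could be handed to the storage layer."""
    return [d.name for d in devices if is_eligible(d)]


if __name__ == "__main__":
    disks = [
        BlockDevice("sda", partitions=("sda1", "sda2")),   # OS disk
        BlockDevice("sdb"),                                # blank disk
        BlockDevice("sdc", filesystem="ext4", mountpoint="/data"),
    ]
    print(eligible_devices(disks))
```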

Effortless Storage Classes and Dynamic Resizing: Rook not only simplifies the deployment of storage clusters but also handles the Kubernetes storage classes that go with them. It sets up three distinct storage classes (Object, Block, and File) and designates block storage as the default, so users do not need to specify a storage class in their Persistent Volume Claims (PVCs) for the common use case. Rook’s integration with the Container Storage Interface (CSI) adds further flexibility: when volume expansion is enabled on the storage class, users can resize PVCs on the fly as storage requirements evolve, without disruption.
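To show what relying on the default storage class and resizing look like from the user's side, here is a small Python sketch that builds the relevant Kubernetes objects as plain dictionaries and prints them as JSON. The helper names, the PVC name, and the sizes are hypothetical; the manifest fields themselves are standard Kubernetes PVC fields.

```python
import json


def pvc_manifest(name, size, namespace="default", storage_class=None):
    """Build a PVC manifest; omitting storage_class defers to the cluster default."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def resize_patch(new_size):
    """Build a merge patch that requests a larger size for an existing PVC."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


if __name__ == "__main__":
    # No storageClassName: the default (block) storage class is used.
    print(json.dumps(pvc_manifest("orders-db", "10Gi"), indent=2))
    # Later, grow the volume by patching the requested size.
    print(json.dumps(resize_patch("20Gi")))
```

The first document can be piped to `kubectl apply -f -`, and the second can be passed to `kubectl patch pvc orders-db --type merge -p '<patch>'`. The resize only takes effect if the storage class has `allowVolumeExpansion: true`.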

Community Support

Rook stands out in the realm of storage solutions, notably due to its vibrant and responsive developer community. A firsthand experience with the Rook community underscores their unwavering commitment to user satisfaction.

For instance, when we encountered an issue with the Rook Operator failing to create an S3 secret in our app namespace, we promptly opened an issue.

The Rook maintainers responded quickly, opening a pull request to address the problem.

In less than two weeks, the pull request was merged, released, and ready for use. This collaborative and responsive atmosphere not only resolves user challenges effectively but also contributes to the ongoing improvement of the Rook project.

Final Thoughts

In the ever-evolving landscape of Kubernetes storage, Rook emerges as a transformative force, reshaping the way we perceive and manage storage solutions. From overcoming the intricacies of fragmented storage landscapes to ushering in an era of unified storage management, Rook’s journey is a testament to innovation in Kubernetes environments. As we navigate the complexities of on-premise clusters, Rook not only addresses existing pain points but also introduces new possibilities — seamless migrations, granular visibility, automated disk management, disaster recovery resilience, and the promise of optimized resource utilization. Whether you are embarking on your Kubernetes journey or looking to enhance your existing infrastructure, Rook stands as a beacon of efficiency, resilience, and simplicity. As we embrace the future of storage adventures, Rook remains at the forefront, shaping the narrative of Kubernetes storage with its transformative capabilities.

Omer Slik

Senior DevOps Engineer at Trigo, a company that runs autonomous retail stores using AI and computer vision.