Expanding Your Kubernetes Toolbox: The Power of CRDs

CRDs fill the gap when you need a specific feature that Kubernetes lacks. Follow this step-by-step guide to see how straightforward it is to create one.

Eliran Cohen

Everything as Kubernetes (EaK)

What used to be a "Container Orchestrator" is now a rich — arguably the richest — ecosystem and the true champion in the public cloud arena.

These days, more and more companies are adopting the trend of managing everything as K8S native objects, be it secrets, certificates, monitoring and alerts, databases, networks, CI/CD pipelines, or even Infrastructure as Code (IaC).
The benefits include remaining cloud-agnostic (although now tied to K8S) and having a uniform platform to manage all aspects of the application.

But as rich as it is, there are times when you need to perform specific tasks or add new functionality, and no off-the-shelf solutions are available.

This article is not about the "everything," but rather about how to deal with this frustrating "nothing" with CRDs and Operators. To illustrate, we will walk through a short demo.

Custom Resources (CRDs) and Custom Controllers

A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It represents a customization of a particular Kubernetes installation.
On their own, custom resources let you store and retrieve structured data. When you combine a custom resource with a custom controller, custom resources provide a true declarative API.

I won't go too deep, but I do encourage you to read the official docs.
In general, CRDs are an extension mechanism in Kubernetes that lets you define new resource types tailored to your application's needs. Once a CRD is created, you can manage the custom resources it defines with kubectl, just like any native Kubernetes object.

Operators (custom controllers) watch for changes to custom resources and, based on their defined behavior, take the appropriate actions to manage the resource lifecycle and keep it in the desired state.

It's worth noting that all these 'Everything' functionalities we've been discussing are implemented as CRDs with custom controllers.
To get a sense of what CRDs are currently deployed in your cluster, you can run the following:

kubectl get crds

Don't Reinvent the Wheel

Before you rush to develop your CRDs, it's important to ask yourself a few key questions:

  1. Have we searched online for a ready-made solution we can use?
  2. Are we doing the right thing? Are CRDs the right answer? If it's truly the best approach, others would have already implemented something similar.
  3. Are we capable of maintaining the code we create?

If the answer to any of these questions is 'no,' you probably need to stop and rethink your requirements.
However, if the answer to all of the above is 'yes,' let me show you how easy it is to write your own K8S extension.

WorkSchedule

Update: Just 10 minutes after publishing this article, I came across kube-green on LinkedIn. This tool accomplishes the same goal as my demo CRD, which conflicts with my “Don’t reinvent the wheel” rule. I wasn’t aware of it when writing this piece, and the name “kube-green” doesn’t immediately reveal what it does. Nevertheless, the simplicity and educational value of this demo, as well as its guiding principles, remain intact.

The following GitHub repository contains all the code featured in this demo.

The pain

In our Kubernetes environment, we often encounter the issue of long-living developer deployments. These are crucial for our work but come with two main challenges.
First, constructing a new namespace from scratch wastes valuable developer time. Second, keeping these deployments up and running 24/7 incurs costs that add up quickly. We needed functionality to automatically stop and start workloads in these namespaces based on each developer's working hours, balancing time efficiency and cost-effectiveness.

The solution

The objective is to create an automated system that scales Kubernetes deployments based on pre-defined working hours, optimizing resource usage. Here's how the solution works in more detail:

  1. Create a WorkSchedule Object: We introduce a new custom Kubernetes resource named WorkSchedule. This object holds information about the working hours and the time zone.
  2. Use Annotations on Deployments: Link the WorkSchedule to a deployment via the workschedule.example.com/policy annotation (see the example right after this list).
  3. Auto-Scale Down: An operator monitors all deployments with a WorkSchedule annotation and scales a deployment to zero replicas if the current time is outside the defined working hours.
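
For example, once the CRD and Operator from this demo are in place, an existing deployment can be linked to a policy with a single command (the deployment name below is just a placeholder; ny-work-hours is the schedule we create later in this demo):

kubectl annotate deployment my-app workschedule.example.com/policy=ny-work-hours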

Prerequisites

Before diving into the demo, make sure you meet the following prerequisites:

  1. Python: Ensure Python is installed on your machine.
  2. Test Kubernetes Environment: You will need a Kubernetes environment for testing. This can be set up using Docker Desktop, Rancher, k3s, EKS, or any other Kubernetes distribution of your choice.
  3. Basic Kubernetes Knowledge: A fundamental understanding of Kubernetes is required to follow along with this demo.

Creating the CRD Definition

We'll create the CRD for our WorkSchedule custom resource in this initial step. This will define the WorkSchedule object that we will use to specify working hours for automating the management of workloads in developer namespaces.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: workschedules.example.com
spec:
  group: example.com
  scope: Cluster
  names:
    plural: workschedules
    singular: workschedule
    kind: WorkSchedule
    shortNames:
      - ws
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required:
                - startTime
                - endTime
              properties:
                startTime:
                  type: string
                  pattern: '^([0-1]?[0-9]|2[0-3]):[0-5][0-9]$'
                  description: "Start time for the pod to be active, in HH:MM format."
                endTime:
                  type: string
                  pattern: '^([0-1]?[0-9]|2[0-3]):[0-5][0-9]$'
                  description: "End time for the pod to be active, in HH:MM format."
                timeZone:
                  type: string
                  description: "Time zone for the start and end times, in IANA Time Zone Database format. Defaults to UTC."
                  default: "Etc/UTC"
                  pattern: '^[A-Za-z]+/[A-Za-z_]+$'
      additionalPrinterColumns:
        - name: "Start-Time"
          type: "string"
          jsonPath: ".spec.startTime"
        - name: "End-Time"
          type: "string"
          jsonPath: ".spec.endTime"
        - name: "Time-Zone"
          type: "string"
          jsonPath: ".spec.timeZone"

  • Spec: Within the spec, we set the group (example.com) and scope (Cluster). The scope: Cluster means this CRD will be cluster-scoped, i.e., applicable across all namespaces in the cluster.
  • Names: We define how our custom resource will be referred to. We provide a plural and singular name, as well as the kind (WorkSchedule). We also define a short name (ws) for easier reference.
  • Versions: We specify the versions of the custom resource that can be used. Here, it's v1.
  • Schema: The schema defines the structure of the WorkSchedule resource. It uses OpenAPI v3 schema to enforce data integrity. For example, startTime and endTime are mandatory fields and must comply with a specific time format (HH:MM). The timeZone field is optional, defaulting to UTC.
  • additionalPrinterColumns: This part allows us to customize the output when running kubectl get ws. We show columns for Start-Time, End-Time, and Time-Zone based on the JSON path of the attributes in the WorkSchedule object.

To create the CRD in your cluster, run the following command:

kubectl apply -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/kubernetes/crd.yaml

You can now create your first WorkSchedule policy:

apiVersion: example.com/v1
kind: WorkSchedule
metadata:
  name: ny-work-hours
spec:
  startTime: "09:00"
  endTime: "17:00"
  timeZone: America/New_York

To apply, run the following kubectl command:

kubectl apply -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/examples/ny-working-hours.yaml

After this, you should be able to see your WorkSchedule by running:

kubectl get ws

The output should look something like this:

NAME            START-TIME   END-TIME   TIME-ZONE
ny-work-hours   09:00        17:00      America/New_York

Now we have custom resources in place, defined by the WorkSchedule object. However, these custom resources alone won't do anything yet. To manage the workloads based on this schedule, we need to move on to the second part of our demo: creating the Operator.

The Operator

While Go is often the first language that comes to mind for writing Kubernetes Operators, I wanted to showcase that it's possible to write an Operator in other languages. Specifically, I chose Python for this demo, as it is widely adopted by DevOps engineers.

We'll use the KOPF framework for this demo, the leading Python framework for building Kubernetes Operators. KOPF handles much of the heavy lifting, allowing us to focus solely on the core logic of our Operator.
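
To get a feel for the shape of a KOPF operator before diving into the demo code, here is a minimal, illustrative handler; the resource and function names are for demonstration only:

import kopf

# A minimal, illustrative Kopf handler: runs whenever a WorkSchedule
# custom resource is created.
@kopf.on.create('example.com', 'v1', 'workschedules')
def on_workschedule_created(spec, name, logger, **kwargs):
    logger.info(f"WorkSchedule {name} created: {dict(spec)}")

During development, handlers like this are typically started locally with the kopf run CLI against your kubeconfig.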

Below is a focused explanation of the relevant parts of the Python-based Kubernetes Operator code using the KOPF framework.

@kopf.timer('deployments',
            interval=60,  # 1 minute
            annotations={'workschedule.example.com/policy': kopf.PRESENT},
            )
async def deployment_timer_handler(meta: Dict, spec: Dict, **kwargs):

This snippet showcases a KOPF timer decorator applied to the function deployment_timer_handler. This decorator performs several roles:

  1. Resource Filtering: The decorator actively listens to all Kubernetes deployments and filters them based on annotations, specifically those that carry the annotation defined by POLICY_NAME_ANNOTATION_KEY (workschedule.example.com/policy).
  2. Interval-Based Execution: The function is set to trigger at regular intervals, every 60 seconds in this example, as specified by the interval=60 parameter.
  3. Dynamic Parameters: When the function is triggered, KOPF automatically passes relevant details of the filtered deployments, such as metadata (meta) and specifications (spec), as arguments to deployment_timer_handler.
  4. Extensibility: While this example uses a timer-based event, KOPF also provides other triggering options like on.create and on.delete, enabling you to act on resources based on different lifecycle events.

This is the magic of KOPF: the decorators seamlessly handle many complexities, eliminating the need for you to code these functions from scratch.

# Get the WorkSchedule name from the deployment annotations
ws_policy = meta['annotations'][POLICY_NAME_ANNOTATION_KEY]

Here, the Work Schedule policy name is extracted from the deployment annotations. This allows the code to know which WorkSchedule object it needs to follow.

try:
    ws = custom_api.get_cluster_custom_object(
        group=API_GROUP,
        version=API_VERSION,
        plural='workschedules',
        name=ws_policy
    )
    return ws
except client.exceptions.ApiException as e:
    if e.status == 404:
        logger.warning(f"WorkSchedule {ws_policy} not found")
    else:
        logger.error(f"Failed to get WorkSchedule {ws_policy}: {e}")
    return None
except Exception as e:
    logger.error(f"An unexpected error occurred: {e}")
    return None

The get_work_schedule function retrieves the WorkSchedule custom resource based on the name specified in the deployment annotation.
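
For context, the excerpt above relies on a few module-level objects that aren't shown. Here is a minimal sketch of how they might be wired up with the official kubernetes Python client; the exact code in the repo may differ:

import logging

from kubernetes import client, config

# Constants mirrored from the excerpts above.
API_GROUP = "example.com"
API_VERSION = "v1"
POLICY_NAME_ANNOTATION_KEY = "workschedule.example.com/policy"

# Assumed client setup: in-cluster config when running as a Pod,
# falling back to the local kubeconfig during development.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

custom_api = client.CustomObjectsApi()  # reads WorkSchedule objects
v1 = client.AppsV1Api()                 # patches Deployments
logger = logging.getLogger("workschedule-operator")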

Now we can compare the working-hours policy against the current time:

# Check if the current time is within the working hours
if current_time_obj < start_time_obj or current_time_obj > end_time_obj:
    go_to_sleep(meta, current_replicas, logger)
else:
    wake_up(meta, current_replicas, logger)

Based on the time specified in the WorkSchedule, the Operator will either put the deployment to sleep by scaling it down or wake it up by scaling it back up.
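
The comparison above uses three time objects that aren't constructed in the excerpt. One plausible way to build them from the WorkSchedule spec, assuming Python 3.9+ and the standard zoneinfo module (the repo may do this differently):

from datetime import datetime
from zoneinfo import ZoneInfo

def get_schedule_times(ws_spec: dict):
    """Return (current, start, end) time objects in the policy's time zone."""
    tz = ZoneInfo(ws_spec.get("timeZone", "Etc/UTC"))
    now = datetime.now(tz)
    start_h, start_m = map(int, ws_spec["startTime"].split(":"))
    end_h, end_m = map(int, ws_spec["endTime"].split(":"))
    current_time_obj = now.time()
    start_time_obj = now.replace(hour=start_h, minute=start_m, second=0, microsecond=0).time()
    end_time_obj = now.replace(hour=end_h, minute=end_m, second=0, microsecond=0).time()
    return current_time_obj, start_time_obj, end_time_obj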

def patch_deployment(name: str, namespace: str, body: Dict, logger):
    """
    Patches the Kubernetes deployment with a given body.
    """
    try:
        v1.patch_namespaced_deployment(
            name=name,
            namespace=namespace,
            body=body
        )
        logger.debug(
            f"Successfully patched deployment {name} in namespace {namespace}")
    except client.exceptions.ApiException as e:
        logger.error(
            f"Failed to patch deployment {name} in namespace {namespace}: {e}")
    except Exception as e:
        logger.error(
            f"An unexpected error occurred while patching deployment: {e}")

The operator scales the deployment up or down using the Kubernetes SDK's patch_namespaced_deployment method.
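
The go_to_sleep and wake_up helpers referenced earlier aren't shown in the excerpts. Here is a sketch of how they could be built on top of patch_deployment; remembering the original replica count in an annotation is an assumption of this sketch, not necessarily what the repo does:

# Hypothetical annotation this sketch uses to remember the original replica count.
REPLICAS_ANNOTATION_KEY = "workschedule.example.com/original-replicas"

def go_to_sleep(meta: dict, current_replicas: int, logger):
    """Scale the deployment to zero, stashing the original replica count."""
    if current_replicas == 0:
        return  # already asleep
    body = {
        "metadata": {"annotations": {REPLICAS_ANNOTATION_KEY: str(current_replicas)}},
        "spec": {"replicas": 0},
    }
    patch_deployment(meta["name"], meta["namespace"], body, logger)

def wake_up(meta: dict, current_replicas: int, logger):
    """Restore the replica count stored by go_to_sleep (default to 1 if missing)."""
    if current_replicas > 0:
        return  # already awake
    desired = int(meta.get("annotations", {}).get(REPLICAS_ANNOTATION_KEY, "1"))
    patch_deployment(meta["name"], meta["namespace"], {"spec": {"replicas": desired}}, logger)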

Deploying the Operator

Before deploying the Operator to the cluster, two primary components require our attention:

  1. A Deployment file that will run the Operator as a Pod inside the cluster.
  2. Necessary RBAC permissions for the Pod so it can watch and patch deployments and interact with the Custom Resource Definitions (CRDs).

Here is the snippet that handles permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workschedule-operator-clusterrole
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["patch", "watch", "list", "get"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]
  - apiGroups: ["example.com"]
    resources: ["workschedules"]
    verbs: ["get"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: workschedule-operator-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: workschedule-operator-clusterrole
subjects:
  - kind: ServiceAccount
    name: workschedule-operator
    namespace: workschedule-operator # Change this if your ServiceAccount is in another namespace

The WorkSchedule Operator is granted permission to watch and patch deployments (for scaling and for adding annotations), to create events so KOPF can log its activities, to read WorkSchedule objects, and to read the CustomResourceDefinitions it relies on.

You can deploy the Operator by running the following command:

kubectl apply -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/kubernetes/deployment.yaml

Test the Operator

Now that we have the Operator up and running, let's put it to the test. We'll use a dummy deployment annotated with the ny-work-hours WorkSchedule policy we created earlier.

Here is an example of a dummy deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    workschedule.example.com/policy: ny-work-hours
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          resources:
            limits:
              memory: "128Mi"
              cpu: "100m"
          image: nginx:1.21
          ports:
            - containerPort: 80

Apply this deployment with the command:

kubectl apply -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/examples/dummy-service.yaml

To see what the Operator is doing, watch its logs:

kubectl logs -f <operator_pod_name>

Keep an eye on the log output; it should indicate changes to the deployment's replica count based on the working schedule defined in the ny-work-hours policy.

To further test the Operator's behavior, try modifying the working hours in real time:

kubectl edit workschedules ny-work-hours

Change the startTime and endTime under the spec section to different values. Save and exit the editor.

With the modified working hours, observe how the Operator acts both within and outside of working hours. You should see the number of replicas scale down to zero outside working hours and scale back up during working hours.
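
You can also watch the replica count change directly:

kubectl get deployment nginx-deployment --watch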

Improvements

While the Operator does perform its intended function, it's important to note that this is a very basic example and is not intended for production usage.

To make this Operator production-ready, several considerations must be addressed:

  • Cover Edge Cases: The current implementation doesn't handle scenarios such as daylight saving time, leap years, or variations in time zones.
  • Write Tests: Comprehensive unit and integration tests are essential to ensure the Operator behaves as expected under different conditions.
  • Enhanced Error Handling and Logging: The logging and error handling mechanisms could be more robust.
  • Additional Functionality: There's room for adding more features, such as recognizing working days and holidays or even integrating with a calendar service for dynamically fetching working schedules.

Cleanup

After you’ve finished running through this demo and have verified the functionality of the Operator and the custom WorkSchedules, it’s important to clean up the resources to avoid any unexpected behavior in your cluster.

1. Delete the Dummy Service

kubectl delete -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/examples/dummy-service.yaml

2. Delete the Operator

kubectl delete -f https://raw.githubusercontent.com/eliran89c/k8s-work-schedules-demo/main/kubernetes/deployment.yaml

3. Delete the WorkSchedule Resource

kubectl delete workschedules ny-work-hours

4. Delete the CustomResourceDefinition (CRD)

kubectl delete crd workschedules.example.com

Conclusion

Kubernetes' strength lies in its extensibility through CRDs and custom controllers. However, custom solutions should be a well-considered choice. Always look for existing solutions first and weigh the long-term maintenance cost. If you've done your homework and still see a gap, I hope this article has provided a solid starting point for extending Kubernetes to meet your needs.
