Using Rancher Fleet to deploy an IoT Platform and edge modules

ELCA IT · Feb 13, 2023

Authors: Kevin Fairise, Florent Martin

In a previous article, we introduced the ELCA open-source MVP IoT Health Platform. Moving toward a robust and industrialized platform, we have decided to tackle continuous integration and deployment.

The objective is to provide an end-to-end solution, i.e. supporting deployment for both server and edge (Gateway) components. In this article, we will describe the setup and usage of Fleet in the context of our IoT platform. We will conclude with the limitations we faced.

High-level IoT platform architecture

For modern platforms, continuous integration and deployment (CI/CD) is a key feature to accelerate time to market, ensure resiliency and ease the developer’s life. The simplest solution would be to create a CI/CD pipeline with tools like Jenkins or GitHub Actions; this pipeline would execute Helm commands to deploy the charts on our Kubernetes clusters. While this solution would work for some use cases, it does not fit requirements such as:

  • Performance:
    - Support for a potentially large number of clusters
    - Lightweight footprint to run on constrained gateways/devices at the edge
    - Fast and flexible deployment of processing modules at the edge (gateway), depending on the connected devices and business logic
  • Security:
    - No inbound connections are authorized on the gateway for code deployment.

With these requirements (and a few more) in mind, we evaluated several GitOps solutions and found that Rancher Fleet matched all our criteria.

In the next section, we will introduce Fleet and describe how we use it to deploy our IoT Platform and edge modules.

Fleet Introduction

Fleet is a GitOps solution developed by Rancher. It is specifically designed for large scale and can support up to a million clusters, which is more than enough for our use case. Like most GitOps solutions, Fleet is composed of a controller deployed on the server and an agent deployed on each managed cluster. The controller is the centralized component that orchestrates the deployment of Kubernetes assets from Git. The agent runs in the downstream cluster and communicates back to the Fleet controller; it is responsible for deploying Kubernetes assets in its cluster. This is very interesting in our context because it allows Fleet to work in pull mode, i.e. the agent listens for changes and pulls the desired state when needed. It therefore avoids any inbound connection to the downstream (gateway) cluster, which was an important security requirement for our platform.

Fleet architecture (source: https://fleet.rancher.io/)

Fleet is responsible for the synchronization between the cluster state (all the deployed resources) and the bundles defined by the user in the Git repository. If a resource is modified in the cluster, Fleet will automatically reconcile it with the state described in the Git repository.

Fleet Setup

We describe here the installation of Fleet for one server and one gateway. It is done in two steps:

  • Controller installation on the server
  • Agent installation on the gateways

Both steps require Helm to be installed on your computer, as well as access to the Kubernetes clusters through kubectl.
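A quick sanity check before starting (assuming standard Helm 3 and kubectl installations) is to verify that both tools are available and that the two clusters are reachable from your kubeconfig:

helm version
kubectl config get-contexts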

Controller Setup

To run, Fleet relies on a set of objects such as GitRepo and Cluster. These are custom resources, and their definitions must be installed first as Custom Resource Definitions (CRDs):

helm -n fleet-system upgrade --install --create-namespace --wait fleet-crd https://github.com/rancher/fleet/releases/download/v0.3.8/fleet-crd-0.3.8.tgz
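If you want to verify that the definitions were imported correctly, the Fleet CRDs can be listed with:

kubectl get crds | grep fleet.cattle.io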

Once the definition is imported, the controller can be installed:

API_SERVER_CA=$(kubectl config view -o json --raw  | jq -r '.clusters[0].cluster["certificate-authority-data"]' | base64 -d)
API_SERVER_URL=$(kubectl config view -o json --raw | jq -r '.clusters[0].cluster.server')

helm -n fleet-system upgrade --install --create-namespace --wait \
--set apiServerURL="${API_SERVER_URL}" \
--set apiServerCA="${API_SERVER_CA}" \
fleet https://github.com/rancher/fleet/releases/download/v0.3.8/fleet-0.3.8.tgz

/!\ The first two lines will only work if jq is installed on your computer and if the cluster you want to deploy Fleet on is the first cluster in your kubeconfig file. Otherwise, you need to provide the correct server CA and server URL yourself when installing the controller.
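A jq-free alternative (a sketch, assuming the context of the target cluster is currently selected) is to rely on kubectl only, using --minify to keep just the active cluster:

API_SERVER_URL=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.server}')
API_SERVER_CA=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d)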

Installing the controller also automatically installs the agent on your server, which is useful since we also want to use Fleet to manage resources on the server cluster.
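At this point, the Fleet pods should be running; a quick check on the server cluster:

kubectl -n fleet-system get pods

You should at least see the fleet-controller pod in a Running state.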

Once the controller is installed, we need to register our Git repository with a GitRepo custom resource, by creating the YAML file below:

kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: fleet-gateway
  namespace: fleet-default
spec:
  repo: ssh://<ssh-key-id>@git-codecommit.eu-west-3.amazonaws.com/v1/repos/<repo>
  branch: feat/orion
  clientSecretName: my-ssh-key
  paths:
  - modules
  - base
  targets:
  - name: gateway
    clusterSelector:
      matchLabels:
        type: gateway
The example above shows the configuration required when using AWS CodeCommit to store the state of your cluster. Note that with this configuration, bundles from this repository will be deployed only on clusters carrying the label type: gateway, i.e. not on our server.

Finally, we need to create a secret holding a private SSH key to allow Fleet to authenticate to the repository.

kubectl create secret generic my-ssh-key -n fleet-default --from-file=ssh-privatekey=/home/kevin/.ssh/id_rsa_fleet_agent  --type=kubernetes.io/ssh-auth
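Once the secret exists, Fleet should be able to clone the repository. The synchronization status can be checked on the server with:

kubectl -n fleet-default get gitrepo fleet-gateway
kubectl -n fleet-default get bundles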

Agent Setup

First, we need to create a ClusterRegistrationToken on the cluster where the Fleet controller is running. This token is required for the agent to authenticate to the controller.

cat <<EOF | kubectl apply -f -
kind: ClusterRegistrationToken
apiVersion: "fleet.cattle.io/v1alpha1"
metadata:
  name: "token-gateway-<gateway name>"
  namespace: fleet-default
spec:
  ttl: 1h
EOF

Then, export the secret generated from this token to a values.yaml file:

kubectl -n fleet-default get secret token-gateway-{{ inventory_hostname_short }} -o 'jsonpath={.data.values}' | base64 --decode > values.yaml

The values.yaml file is used by Helm to configure the agent release. The labels are key to targeting deployments at the right type of cluster, e.g. only a specific gateway type. The values.yaml file can be extended with custom labels:

labels:
  name: <your gateway name>
  type: gateway
  host-type: raspberry-pi
  # ... add all the labels you want

Once the configuration is done on the server side, we need to switch kubectl to execute commands on the gateway instead of the server. This is done by changing the “current-context” entry in the kubeconfig file.
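For example (assuming the gateway’s context is named gateway1 in your kubeconfig):

kubectl config get-contexts
kubectl config use-context gateway1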

Finally, the command below will install the agent on the gateway.

helm -n fleet-system upgrade --install --create-namespace -f values.yaml fleet-agent https://github.com/rancher/fleet/releases/download/v0.3.8/fleet-agent-0.3.8.tgz
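After a short delay, the gateway registers itself against the controller. Switching kubectl back to the server cluster, the newly registered cluster and its labels should be visible:

kubectl -n fleet-default get clusters.fleet.cattle.io --show-labels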

Fleet Usage

Here comes the most interesting part: how do we use Fleet to deploy our platform?

First, to deploy resources on our Kubernetes clusters we need to create bundles. A bundle is defined by a fleet.yaml file, for example:

defaultNamespace: auth
helm:
  releaseName: keycloak
  chart: ./keycloak
  values:
    keycloakEndpoint: keycloak.iot-health.aws.elca-cloud.com
    keycloak:
      service:
        type: ClusterIP
      resources:
        limits:
          cpu: "100m"
          memory: "128Mi"
        requests:
          cpu: "100m"
          memory: "128Mi"

More information on the fleet.yaml file can be found in the Fleet documentation (https://fleet.rancher.io/).

We use two Git repositories to deploy all our infrastructure:

  • fleet-server: Hosts all the bundles deployed on the server. This repository is mainly used to manage the configuration of the deployed Helm charts; all the bundles defined in this repository are deployed.
  • fleet-gateway: Hosts all the modules that can be deployed on gateways. It is the most interesting repository, and the one we will focus on in the following.

When deploying multiple gateways to connect our IoT devices, we need to ensure that each gateway only gets the modules required to communicate with its devices. To solve this problem, we created a repository with two bundles. The first bundle, base, contains all the modules that must be deployed on every gateway, no matter what devices will be connected to it; it includes a monitoring module, an MQTT broker module and some others. The second bundle, modules, contains all the available modules. Each module is a single Helm template YAML file containing all the needed Kubernetes resources (deployment, service, PVC, …), and all the content of the YAML file must be wrapped inside a Helm if condition:

{{ if .Values.moduleName.enabled }}
< module resources >
{{ end }}
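As an illustration, a module file for the http2mqtt module could look like the sketch below (the resources shown are illustrative, not the real module); the actual resources depend on the module, but the pattern is always the same: wrap everything in the enabled flag.

# chart/templates/http2mqtt.yaml -- illustrative sketch only
{{ if .Values.http2mqtt.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http2mqtt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http2mqtt
  template:
    metadata:
      labels:
        app: http2mqtt
    spec:
      containers:
      - name: http2mqtt
        # image path is an assumption; in our setup it points to the private ECR registry
        image: "{{ .Values.registry.url }}/http2mqtt:latest"
        ports:
        - containerPort: 8080
{{ end }}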

With this syntax, we can easily decide whether to enable or disable each module. To achieve this, we use the targetCustomizations option of Fleet. In the example below, we deploy the http2mqtt module on gateways running on a Raspberry Pi and the mqttTester module on gateways running inside a virtual machine.

defaultNamespace: default
helm:
  chart: ./chart
  values:
    rootDomain: "iot-health.aws.elca-cloud.com"
    registry:
      url: <AWS account id>.dkr.ecr.eu-west-1.amazonaws.com
    gatewayName: global.fleet.clusterLabels.name
targetCustomizations:
- name: raspberry-pi-gateway
  helm:
    values:
      http2mqtt:
        enabled: true
  clusterSelector:
    matchLabels:
      host-type: raspberry-pi
- name: virtual-machine-gateway
  helm:
    values:
      mqttTester:
        enabled: true
  clusterSelector:
    matchLabels:
      host-type: virtual-machine
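For this to work smoothly, the chart’s default values.yaml should disable every module, so that a module is only deployed when a targetCustomizations entry enables it (a convention we follow, sketched below):

# chart/values.yaml (sketch): all optional modules disabled by default
http2mqtt:
  enabled: false
mqttTester:
  enabled: false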

Fleet knows whether each gateway is running inside a virtual machine or on a Raspberry Pi thanks to the labels applied to each gateway. Those labels must be added when registering the gateway, as explained in the previous part. In the context of our IoT Platform, we use an Ansible script to set up the gateway and register it in Fleet. The tags are automatically added to the values.yaml by Ansible; a user only needs to define them once in the inventory file.

all:
  children:
    gateways:
      children:
        virtual-machine:
          hosts:
            gateway1:
              ansible_host: 192.168.56.163
              ansible_user: user
            gateway2:
              ansible_host: 192.168.56.162
              ansible_user: user
          vars:
            tags:
              host-type: virtual-machine
              test: test2
        raspberry-pi:
          hosts:
            gateway3:
              ansible_host: 192.168.137.71
              ansible_user: pi
          vars:
            tags:
              host-type: raspberry-pi

When running the Ansible script to provision a gateway, the tags dictionary is added to the labels of the cluster and can be used in our target customizations. For further customization, we could add a different tag to each entry in the inventory; for our use case, we just needed different tags for virtual machines and Raspberry Pis.
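As a rough sketch (hypothetical task and file names, not the actual playbook), the label injection step essentially appends the inventory tags to the values.yaml generated from the registration secret, before running the fleet-agent Helm install:

# Hypothetical Ansible task: add cluster labels to the agent values file
- name: Append inventory tags as Fleet cluster labels
  ansible.builtin.blockinfile:
    path: /tmp/fleet-agent-values.yaml
    block: |
      labels:
        name: "{{ inventory_hostname_short }}"
        type: "gateway"
        host-type: "{{ tags['host-type'] }}"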

Fleet Limitations

After a few months of intensive Fleet usage, we started to reach some limits of the tool. The first thing we struggled with was bundle diffs.

When a bundle is deployed, it can be in three different states:

  • Ready: when the resources in your cluster match the state described in your git repository
  • Not Ready: when modifications are currently being applied to your cluster or when there is an error in your bundle resources
  • Modified: when your cluster state is different from the state described in your Git repository

If some resources are modified at runtime, Fleet detects it and the related bundle appears as Modified on the server side. This can be a problem, as the deployment of other linked bundles is then blocked: a bundle is deployed only if all of its dependencies are in a Ready state.

The solution to avoid unnecessary bundle diffs and an unwanted Modified status is to filter resource attributes. To do so, we need to add a diff.comparePatches option to the fleet.yaml file and define filtering rules with JSON patch / JSON pointer syntax. This can lead to complex and very verbose configurations. Moreover, in the version we are currently using (v0.3.8), Fleet only supports one filtering rule per resource. Fortunately, this problem is solved in the latest release (v0.3.9), but that release introduces another bug breaking CodeCommit SSH authentication, as mentioned in this issue. It should only be a matter of time before this feature works properly!
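For reference, a comparePatches entry in fleet.yaml looks roughly like the following (resource names are illustrative); each operation removes a field from the comparison so that runtime changes to that field no longer mark the bundle as Modified:

diff:
  comparePatches:
  - apiVersion: apps/v1
    kind: Deployment
    name: http2mqtt
    namespace: default
    operations:
    - {"op": "remove", "path": "/spec/replicas"}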

Conclusion

In conclusion, we would say that Fleet is a really powerful tool. It is the most suitable tool for our IoT Platform and gives users a lot of flexibility to manage a large number of gateways. For now, Fleet does the job and covers our requirements, but some very interesting features are still on the roadmap, and we hope to integrate them into the platform to improve the user experience, such as the Fleet image scanner that automatically detects new image versions and updates deployments.
