How to test your ArgoCD apps with Kind and k8s e2e-framework (and why you probably don’t want to do that anyway)


Testing in Kubernetes can be a very nuanced topic, but I’ll try to stay broad without losing precision.

First, let me introduce this post

Initially, I just wanted to write a very simple post explaining how to use k8s e2e-framework to test your ArgoCD apps, but while writing it I realized that this might not be a desirable testing scenario for many.

This made me pivot the post into a conversation about the tradeoff between speed and deeper integration, with the added hope of introducing more people to the powerful k8s e2e-framework.

Let’s go over some glossary first

KinD (Kubernetes in Docker): a tool for running local Kubernetes clusters using Docker containers as the cluster nodes.

k8s e2e-framework: a Go testing framework maintained by the Kubernetes project that lets you run end-to-end tests against a Kubernetes cluster, typically one running locally on top of KinD.

What will we learn here?

How to load an ArgoCD app into a KinD cluster using k8s e2e-framework, then test that the configuration for this app can become healthy… and why that may be too much testing.

The setup

Psst! You can check the whole code for this test here:

For this lab, I wrote a test that loads an ArgoCD Application, applies it to a KinD cluster, and then waits for the ArgoCD app to sync.

e2e-framework offers a standardized way to structure tests, in which you divide them into features. Each feature has a setup, one or more assessments, and a teardown. This compartmentalizes the tests into smaller, digestible units.

For this test, we will be using a single feature.
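To make that structure concrete, here is a minimal, self-contained sketch of the setup/assess/teardown lifecycle. Note that this is NOT the real e2e-framework API (which uses `features.New(...).Setup(...).Assess(...).Teardown(...).Feature()` and an `env.Environment` to run it); the `feature` struct and `run` function below are invented stand-ins that only mimic the shape of a feature:

```go
package main

import "fmt"

// A conceptual sketch of an e2e-framework feature: a named unit with a
// setup, one or more assessments, and a teardown. This is NOT the real
// e2e-framework API; it only mimics the lifecycle.
type feature struct {
	name     string
	setup    func()
	assess   []func() bool
	teardown func()
}

// run executes the lifecycle in order: setup, every assessment, and
// finally (even if an assessment fails) the teardown.
func run(f feature) bool {
	f.setup()
	defer f.teardown()
	for _, assess := range f.assess {
		if !assess() {
			return false
		}
	}
	return true
}

func main() {
	f := feature{
		name:     "Nginx server",
		setup:    func() { fmt.Println("setup: install the chart") },
		assess:   []func() bool{func() bool { fmt.Println("assess: app is synced"); return true }},
		teardown: func() { fmt.Println("teardown: delete the release") },
	}
	fmt.Println("passed:", run(f))
}
```

The real framework follows the same sequence, which is why a single feature is enough here: one setup, one assessment, done.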

In the setup section of the code, we do the following things:

  1. We initialize the Helm management tool provided by e2e-framework. This allows us to invoke helm from the shell.
  2. We initialize a helper management tool that I created to manage Argo applications.
  3. We install ArgoCD in the already-running KinD cluster.
  4. We apply the ArgoCD app that will install an Nginx server chart to the cluster.

Setup(func(ctx context.Context, t *testing.T, config *envconf.Config) context.Context {
	// Register the Argo CRD types so the client can decode Applications.
	err := argo.AddResourcesToScheme(config)
	require.NoError(t, err)

	helmMgr := helm.New(config.KubeconfigFile())
	argoClient := argo.NewResourceManager(config)

	// Add the Argo Helm repository.
	err = helmMgr.RunRepo(helm.WithArgs(
		"add",
		"argo",
		"https://argoproj.github.io/argo-helm",
	))
	require.NoError(t, err)

	// Install ArgoCD into the cluster.
	err = helmMgr.RunInstall(
		helm.WithName("argo-cd"),
		helm.WithNamespace(argocdNamespace),
		helm.WithChart("argo/argo-cd"),
		helm.WithVersion("5.34.1"),
	)
	require.NoError(t, err)

	// Load the ArgoCD Application manifest and create it in the cluster.
	argoNginxAppSpec, err := os.ReadFile(filepath.Join(currentDir, "..", "argo-apps", "nginx", "app.yaml"))
	require.NoError(t, err)

	nginxApp, err := objectfromfile.GetArgoApplicationFromYAML(argoNginxAppSpec)
	require.NoError(t, err)

	err = argoClient.CreateApplicationWithContext(ctx, nginxApp)
	require.NoError(t, err)

	return ctx
})

This setup sets the stage for our assessment.

Assess(
	"Testing the app syncs",
	func(ctx context.Context, t *testing.T, config *envconf.Config) context.Context {
		app := &applicationV1Alpha1.Application{ObjectMeta: metav1.ObjectMeta{
			Name:      "nginx-server",
			Namespace: argocdNamespace,
		}}

		// The app is ready when ArgoCD reports it both Healthy and Synced.
		var isAppHealthyAndSynced = func(object k8s.Object) bool {
			argoApp := object.(*applicationV1Alpha1.Application)

			return string(argoApp.Status.Health.Status) == "Healthy" &&
				string(argoApp.Status.Sync.Status) == "Synced"
		}

		err := wait.For(
			conditions.New(config.Client().Resources()).ResourceMatch(app, isAppHealthyAndSynced),
			wait.WithTimeout(time.Minute*5),
		)
		assert.NoError(t, err, "Error waiting for ArgoCD app to sync")

		return ctx
	}).

This assessment takes advantage of the wait.For utility provided by e2e-framework. This utility lets us periodically poll a Kubernetes resource until a particular condition is met.

In this case, we want to validate that the ArgoCD app is both Healthy and Synced. The wait.For utility will call isAppHealthyAndSynced until it returns true or the timeout is exceeded.

Running these tests gives a PASS result:

=== RUN   TestNginxAppWithArgo
=== RUN   TestNginxAppWithArgo/Nginx_server
=== RUN   TestNginxAppWithArgo/Nginx_server/Testing_the_app_syncs
--- PASS: TestNginxAppWithArgo (214.63s)
    --- PASS: TestNginxAppWithArgo/Nginx_server (214.63s)
        --- PASS: TestNginxAppWithArgo/Nginx_server/Testing_the_app_syncs (210.02s)
PASS

Process finished with the exit code 0

So what’s the problem with this setup?

On its own, nothing. It checks that the app can sync, and that’s its intended behavior, but it comes with a big tradeoff: speed.

This test exercises the integration with ArgoCD in the cluster, and while more complete tests are generally desirable, there’s a point at which they become redundant or wasteful.

If we think about it, ArgoCD is a dependency: as a developer using ArgoCD, you have very little say in how ArgoCD grows and changes.

And of course, you should always voice your concerns and offer help through issues and PRs, but I digress.

If we don’t write the code for ArgoCD why should we test that it integrates properly with Kubernetes?

The most likely scenario is that you’re using a pinned version of ArgoCD whose behavior you already know, one that has been validated by the maintainers and the community as a working version. That removes the need to test the ArgoCD layer in this integration.

Testing this layer spends valuable test time on a component we cannot change. We must be mindful that the tests we run justify the time they take. Slow tests penalize developers and make them not want to run tests as often.

So how can we improve this test, then?

It’s possible to improve this test by taking the ArgoCD app definition and just applying its contents directly to the cluster. We would effectively be testing that the Helm chart defined in that app can be successfully installed in our cluster, which is what we care about.

You can check the whole code for this version here:

For the setup, we remove the need to install ArgoCD in the cluster and just load all the data from the ArgoCD app file instead. Then we install the chart as it is defined by the ArgoCD Application.

Setup(func(ctx context.Context, t *testing.T, config *envconf.Config) context.Context {
	err := argo.AddResourcesToScheme(config)
	require.NoError(t, err)

	helmMgr := helm.New(config.KubeconfigFile())

	// Parse the ArgoCD Application manifest, but only to extract the
	// chart coordinates; we never create it in the cluster.
	argoNginxAppSpec, err := os.ReadFile(filepath.Join(currentDir, "..", "app", "nginx", "app.yaml"))
	require.NoError(t, err)

	nginxApp, err := objectfromfile.GetArgoApplicationFromYAML(argoNginxAppSpec)
	require.NoError(t, err)

	helmRepoURL := nginxApp.Spec.Source.RepoURL
	helmRepoName := path.Base(helmRepoURL) // "https://charts.bitnami.com/bitnami" => "bitnami"

	err = helmMgr.RunRepo(helm.WithArgs(
		"add",
		helmRepoName,
		helmRepoURL,
	))
	require.NoError(t, err)

	destinationNamespace = nginxApp.Spec.Destination.Namespace
	helmChartName := nginxApp.Spec.Source.Chart
	releaseName = helmChartName
	helmChartVersion := nginxApp.Spec.Source.TargetRevision
	fullChartName := fmt.Sprintf("%s/%s", helmRepoName, helmChartName)

	// Install the chart exactly as the ArgoCD app defines it.
	err = helmMgr.RunInstall(
		helm.WithName(releaseName),
		helm.WithNamespace(destinationNamespace),
		helm.WithChart(fullChartName),
		helm.WithVersion(helmChartVersion),
	)
	require.NoError(t, err)

	return ctx
})
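A quick note on the path.Base trick used above to derive the repo name: it simply returns the last slash-separated segment of the URL. A tiny stdlib sketch (repoNameFromURL is a hypothetical helper name, not from the repo):

```go
package main

import (
	"fmt"
	"path"
)

// repoNameFromURL derives a Helm repo name from its URL by taking the
// last slash-separated path segment, exactly what path.Base does.
func repoNameFromURL(repoURL string) string {
	return path.Base(repoURL)
}

func main() {
	fmt.Println(repoNameFromURL("https://charts.bitnami.com/bitnami"))   // bitnami
	fmt.Println(repoNameFromURL("https://argoproj.github.io/argo-helm")) // argo-helm
}
```

This works when the last URL segment is a sensible repo name; for URLs with query strings you would want something sturdier.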

Finally, we run our assessment:

Assess(
	"Testing the chart is installed correctly",
	func(ctx context.Context, t *testing.T, config *envconf.Config) context.Context {
		deployment := &appsv1.Deployment{
			ObjectMeta: metav1.ObjectMeta{Name: releaseName, Namespace: destinationNamespace},
		}

		// The chart is up when the available replicas match the ready replicas.
		var isDeploymentFullyRunning = func(object k8s.Object) bool {
			dep := object.(*appsv1.Deployment)

			return dep.Status.AvailableReplicas == dep.Status.ReadyReplicas
		}

		err := wait.For(
			conditions.New(config.Client().Resources()).ResourceMatch(deployment, isDeploymentFullyRunning),
			wait.WithTimeout(time.Minute*5),
		)
		assert.NoError(t, err, "Error waiting for nginx-server to start")

		return ctx
	})

This assessment is pretty similar to the previous one, except that we check the status of the Kubernetes Deployment object instead of the ArgoCD Application status.

In this case, we wait for the Deployment’s available replicas to equal its ready replicas.
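That readiness predicate boils down to comparing two status fields. Here is a stdlib-only sketch with a hand-rolled stand-in for the real appsv1.DeploymentStatus type (only the two fields the check touches):

```go
package main

import "fmt"

// deploymentStatus is a minimal stand-in for appsv1.DeploymentStatus;
// we only model the two fields the readiness check compares.
type deploymentStatus struct {
	AvailableReplicas int32
	ReadyReplicas     int32
}

// isFullyRunning mirrors the predicate used in the assessment.
func isFullyRunning(s deploymentStatus) bool {
	return s.AvailableReplicas == s.ReadyReplicas
}

func main() {
	fmt.Println(isFullyRunning(deploymentStatus{AvailableReplicas: 3, ReadyReplicas: 3})) // true
	fmt.Println(isFullyRunning(deploymentStatus{AvailableReplicas: 2, ReadyReplicas: 3})) // false
}
```

One caveat worth noting: this check passes trivially while both counts are still zero, so comparing ReadyReplicas against the desired Spec.Replicas would be a stricter readiness signal.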

And when we run the tests we see that the runtime is considerably shorter:

=== RUN   TestNginxAppWithHelm
=== RUN   TestNginxAppWithHelm/Testing_nginx_helm_chart_no_ArgoCD_sync
=== RUN   TestNginxAppWithHelm/Testing_nginx_helm_chart_no_ArgoCD_sync/Testing_the_chart_is_installed_correctly
--- PASS: TestNginxAppWithHelm (5.96s)
    --- PASS: TestNginxAppWithHelm/Testing_nginx_helm_chart_no_ArgoCD_sync (5.96s)
        --- PASS: TestNginxAppWithHelm/Testing_nginx_helm_chart_no_ArgoCD_sync/Testing_the_chart_is_installed_correctly (5.01s)
PASS

Process finished with the exit code 0

From 214.63s to 5.96s. A mere drop of 97% in total run time…
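For the record, the arithmetic behind that figure, as a trivial sketch:

```go
package main

import "fmt"

// dropPercent returns the percentage reduction from before to after.
func dropPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// (214.63 - 5.96) / 214.63 * 100 ≈ 97.2
	fmt.Printf("%.1f%% faster\n", dropPercent(214.63, 5.96))
}
```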

Conclusion

Integration tests walk a fine line: it is possible to test too much integration. It’s very important to be mindful of the tradeoffs we incur when we add more layers to the integration.

This isn’t to say that having a slower test with more integration layers is always undesirable. For critical applications, a deeper level of integration is expected. Reliability takes precedence over speed in such cases.

I hope this demonstration shows why keeping these tradeoffs in mind is important.

References

You can check all the code referenced in this post in the following repo: k8s-e2e-demos
