What’s next with Istio Service Mesh?

Charley Eveno
11 min read · Jul 29, 2024


Note: Istio Ambient Mesh has been in Beta since the release of Istio 1.22 (May 2024) and should not be used in production environments until it reaches General Availability (GA).

Prerequisites

Once upon a time

A few weeks ago, we celebrated Istio’s 7th anniversary. Indeed, seven years ago, IBM, Google and Lyft teamed up to define a new approach to the service mesh.

But if you are just getting started with microservices, a service mesh such as Istio might be new to you. In one sentence: Istio is an open-source service mesh that lets you connect, monitor and secure microservices. If you would like to know more about service meshes, I recommend Istio’s website as a starting point.

Until recently, Istio’s architecture has been based on the sidecar pattern, where an Envoy proxy is deployed alongside each application pod. This proxy intercepts all incoming and outgoing traffic, enabling features like traffic management, security and observability without requiring changes to the application itself.

Even if Istio has given many capabilities to developers and platform teams, the sidecar deployment model introduces additional resource overhead, as each pod requires its own Envoy proxy container. This increases memory and CPU consumption, especially in large-scale deployments. Additionally, managing, deploying and upgrading sidecar proxies adds complexity to the operational side of the system.

That’s why, in 2022, Google and Solo.io teamed up to design a new Istio data plane to simplify operations, reduce infrastructure cost and broaden application compatibility. Ambient Mesh was born!

The ambient mesh dataplane in Istio eliminates the need for sidecar proxies by introducing a shared proxy layer at the node level, known as ztunnel, along with optional namespace-level proxies called waypoint for handling Layer 7 traffic. This architecture reduces resource overhead and simplifies deployment and management compared to the sidecar model.

Ambient Mesh architecture with the control and data plane and the respective L4 and L7 features
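Enrollment into the ambient mesh is driven purely by Kubernetes namespace labels, which you will apply later in this walkthrough. Conceptually, a namespace opted into the mesh looks like the sketch below (the optional use-waypoint label only matters once a waypoint proxy has been deployed):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    # L4: ztunnel intercepts traffic for every pod in this namespace
    istio.io/dataplane-mode: ambient
    # optional L7: send the namespace's traffic through a waypoint proxy
    istio.io/use-waypoint: waypoint
```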

Platform setup — Google Kubernetes Engine

  1. Create a new private GKE cluster for Istio in a single zone in order to reduce costs. In my example I created a three-node cluster, but you can also deploy your own with a minimum of two nodes to spread the pods and observe the traffic between the worker nodes.
export PROJECT_ID=`gcloud config get-value project` && \
export M_TYPE=e2-standard-4 && \
export REGION=us-central1 && \
export ZONE=us-central1-a && \
export CLUSTER_NAME=${PROJECT_ID}-${RANDOM} && \
gcloud services enable container.googleapis.com && \
gcloud container clusters create $CLUSTER_NAME \
--cluster-version latest \
--machine-type=$M_TYPE \
--num-nodes 3 \
--zone $ZONE \
--project $PROJECT_ID

After a few minutes, your cluster will be up-and-running. Please authenticate and connect to your cluster by running the following command:

gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE --project $PROJECT_ID

2. Create a Cloud NAT for your private GKE cluster so that your pods can access the Internet:

# Cloud NAT must be attached to a Cloud Router, so create one first
# (the router and NAT config names below are just examples)
gcloud compute routers create nat-router \
--network=default \
--region=$REGION

gcloud compute routers nats create nat-config \
--router=nat-router \
--region=$REGION \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging

3. Install the Kubernetes Gateway API CRDs, which don’t come installed by default on most Kubernetes clusters. You will use them for north-south traffic into your cluster:

kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
{ kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.1.0" | kubectl apply -f -; }

Download and install istioctl

  1. Download the latest version of istioctl:
curl -L https://istio.io/downloadIstio | sh -

2. Move to the directory. For example, if the package is istio-1.22.1 (available version when publishing this blog post):

cd istio-1.22.1

3. Add the istioctl client to your path (Linux or MacOS):

export PATH=$PWD/bin:$PATH

4. For this installation, we use the ambient configuration profile, which installs the components required by the ambient data plane. There are other profiles for demos, production or performance testing. If you want to know more about configuration profiles, here is more information.

istioctl install --set profile=ambient --skip-confirmation

5. Verify the installed components (Istio control and data plane) using the following commands:

kubectl get pods,daemonset -n istio-system

istioctl also ships with a set of addons that help observe the service mesh, such as Prometheus and Kiali; you will install Kiali in the next section. Also verify that the Istio CNI node agent is running:

kubectl get pods,daemonset -n kube-system | grep cni

Access to the Kiali interface for observability

If you don’t know Kiali, I suggest reading the documentation of this very nice and handy tool.

  1. Install Kiali via the Istio addons shipped in the istioctl download directory from the previous section (here, ISTIO_HOME points to that directory, e.g. istio-1.22.1):
kubectl apply -f ${ISTIO_HOME}/samples/addons/kiali.yaml

Access the UI by running the following command:

kubectl port-forward svc/kiali 20001:20001 -n istio-system

Then access Kiali by visiting http://localhost:20001/ in your preferred web browser, or via the port-forward option in the GCP Cloud Shell.

Activate the security badge in Kiali’s Display options in order to visualize the mTLS information that you will set up in the next steps.

Deploy the Istio sample application

No need to search for or build an application to test Ambient Mesh, as Istio comes with samples to install in the service mesh. Here you will use the bookinfo sample application.

  1. Navigate to the istioctl download directory and apply the following commands to install the application:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f samples/bookinfo/platform/kube/bookinfo-versions.yaml

Also deploy the following helper applications in your cluster; you will use them to run curl commands and simulate traffic:

kubectl apply -f samples/sleep/sleep.yaml
kubectl apply -f samples/sleep/notsleep.yaml

2. For the north-south traffic coming into your cluster, install the Kubernetes Gateway with its respective HTTPRoute:

kubectl apply -f samples/bookinfo/gateway-api/bookinfo-gateway.yaml

Get the public IP address of the bookinfo-gateway:

kubectl get gateway

# grab the gateway's public IP address as a variable for some curl commands later
export GATEWAY_HOST=$(kubectl get gateway bookinfo-gateway -o jsonpath='{.status.addresses[0].value}')

And access it from your preferred browser: http://YOUR_PUBLIC_IP/productpage

If you refresh your browser several times, the application will display different versions of the reviews service. Switch to Kiali to check the traffic graph:

3. Test the application from both outside and inside the cluster by running the following curl commands:

kubectl exec deploy/sleep -- curl -s "http://$GATEWAY_HOST/productpage" | grep -o "<title>.*</title>"
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

In Kiali, click on Applications in the left menu to examine the list of pods; notice that the Details column shows whether each one is part of the mesh or not.

Kiali Dashboard with the bookinfo pods running in the default namespace

Ambient Mesh in Action at Layer 4

Here is the wow moment of Ambient Mesh.

  1. You can enable all pods in a given namespace to be part of an ambient mesh by simply labelling the namespace:
kubectl label namespace default istio.io/dataplane-mode=ambient

That’s it! All your pods in the default namespace have been added to the mesh. The best part is that you did not have to restart or redeploy anything.

Double check by running kubectl get pods: each pod still has a single container (1/1), compared to the sidecar approach where you would read 2/2. And if you check the AGE column, no restart!

On the Kiali side, check the Applications menu again: the “out of mesh” information is no longer shown in the Details column.

2. Now let’s send some traffic to our application, with the same commands as previously:

kubectl exec deploy/sleep -- curl -s "http://$GATEWAY_HOST/productpage" | grep -o "<title>.*</title>"

kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

All the commands succeed. So what is different? Thanks to Istio Ambient Mesh, all communications are now secured with mTLS and produce L4 telemetry.

Let’s double check with Kiali. Head over to the application graph: it now shows mTLS communication (arrows with a lock icon) between the pods.

Where can we get more logs about what’s going on? The answer is the ztunnel pod. As seen during the istioctl installation, ztunnel runs as a DaemonSet, with one pod on each worker node.

kubectl get pods -n istio-system

Pick one of your ztunnel pods and explore its logs:

kubectl logs -f YOUR_ZTUNNEL_POD -n istio-system

3. Layer 4 Authorization Policy

Beyond mTLS, Ambient Mesh, with the help of ztunnel, also brings secure application access using Layer 4 authorization policies. This feature gives you access control to and from a service based on client workload identities, but still at Layer 4. (Layer 7 policies are handled by the next component of Ambient Mesh, the waypoint proxies.)

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/default/sa/sleep
        - cluster.local/ns/default/sa/bookinfo-gateway-istio
EOF

Let’s double check how this L4 authorization policy is applied to our application:

# this should succeed
kubectl exec deploy/sleep -- curl -s "http://$GATEWAY_HOST/productpage" | grep -o "<title>.*</title>"

# this should succeed
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

# this should fail with a connection reset error code 56
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
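When scripting such checks, it helps to assert on curl’s exit code rather than its (empty) output: a ztunnel policy denial surfaces as a connection reset, which curl reports as exit code 56. A minimal helper sketch (check_denied is a made-up name for illustration):

```shell
# Classify a curl probe's exit code:
# 56 = connection reset by the mesh (L4 policy denial), 0 = request allowed
check_denied() {
  [ "$1" -eq 56 ] && echo denied || echo allowed
}

check_denied 56   # → denied
check_denied 0    # → allowed
```

In practice you would run the kubectl exec curl probe, then call the helper with `$?`.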

Beyond what we have just learnt about ztunnel’s capabilities, if we now analyze the ztunnel pods’ infrastructure footprint, another advantage shows up. Any guess?

kubectl top pod -n istio-system | grep ztunnel
ztunnel-l7pdh 3m 4Mi
ztunnel-llx8v 1m 2Mi
ztunnel-n8k29 3m 3Mi

The ztunnel pods request far fewer memory and CPU resources. A real game changer if you only need L4 capabilities in your mesh, compared to the sidecar architecture where an Envoy proxy is deployed per pod for both L4 and L7 features, with much higher CPU and memory consumption.
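To put a number on that footprint, you can sum the reported memory directly from the kubectl top output. A small sketch of the counting pipeline, run here against the sample output above so it works offline:

```shell
# Sample `kubectl top pod` lines for the ztunnel DaemonSet (copied from above)
top_output='ztunnel-l7pdh 3m 4Mi
ztunnel-llx8v 1m 2Mi
ztunnel-n8k29 3m 3Mi'

# Strip the Mi suffix from column 3 and sum it across all ztunnel pods
total=$(echo "$top_output" | awk '{gsub(/Mi/, "", $3); sum += $3} END {print sum}')
echo "ztunnel total memory: ${total}Mi"   # → ztunnel total memory: 9Mi
```

Against a live cluster, feed it with `kubectl top pod -n istio-system | grep ztunnel` instead of the inlined sample.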

Ambient Mesh in Action at Layer 7 (optional component)

For any implementation that needs to go beyond Layer 4, Ambient Mesh offers a new component to add to the data plane: the waypoint proxy. Again, if you don’t need Layer 7 capabilities, there is no need to deploy this part (or you can do it at a later stage). This is another benefit of Ambient Mesh: its incremental deployment model.

But what is behind the scenes of this waypoint proxy? It is a Layer 7 proxy that runs on a per-namespace or per-service-account basis and handles all traffic entering that namespace. The waypoint proxy is based on the same Envoy proxy used in the sidecar data plane.

  1. Using the Kubernetes Gateway API, you can deploy a waypoint proxy for your namespace:
istioctl x waypoint apply --enroll-namespace --wait

2. View the waypoint proxy; you should see the details of the gateway resource with Programmed=True status:

kubectl get gtw waypoint

3. Update the previous AuthorizationPolicy to explicitly allow the sleep service to GET the productpage service, but perform no other operations:

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/default/sa/sleep
    to:
    - operation:
        methods: ["GET"]
EOF

4. Let’s generate some traffic to verify the L7 policies:

# this should fail with an RBAC error because it is not a GET operation
kubectl exec deploy/sleep -- curl -s "http://productpage:9080/productpage" -X DELETE

# this should fail with an RBAC error because the identity is not allowed
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/

# this should continue to work
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

5. If you have already deployed Istio in the past, you might know that it also brings various traffic control mechanisms such as canary deployments, circuit breaking, fault injection and many more. Here we will use the same waypoint to split traffic to reviews, sending 90% of requests to reviews v1 and 10% to reviews v2:

kubectl apply -f samples/bookinfo/gateway-api/route-reviews-90-10.yaml
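For reference, that sample manifest defines an HTTPRoute with weighted backends roughly along these lines (a sketch; check the file in your Istio download for the exact content):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: reviews
    port: 9080
  rules:
  - backendRefs:
    - name: reviews-v1
      port: 9080
      weight: 90   # 90% of requests
    - name: reviews-v2
      port: 9080
      weight: 10   # 10% of requests
```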

6. Let’s generate 200 curl requests to the application and check that the load balancing looks correct (most requests hit v1, and some hit v2):

kubectl exec deploy/sleep -- sh -c "for i in \$(seq 1 200); do curl -s http://productpage:9080/productpage | grep reviews-v.-; done"
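Rather than eyeballing 200 lines, you can pipe the loop’s output through `sort | uniq -c` to tally the versions. An offline sketch of that counting stage, using a few made-up sample lines:

```shell
# Made-up sample of what the curl loop prints (one line per matched review version)
hits='reviews-v1-
reviews-v1-
reviews-v2-
reviews-v1-'

# Tally occurrences per version; with real traffic, expect a ~90/10 split
counts=$(echo "$hits" | sort | uniq -c)
echo "$counts"
```

With real traffic, simply append `| sort | uniq -c` to the kubectl exec command above.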

7. If you want to test other Layer 7 features, I recommend the Istio user guides, which include several examples covering traffic routing, security and observability capabilities. Here is the guide.

Cleanup

  1. Remove the ambient and waypoint labels if you no longer want the application to be part of the mesh:
kubectl label namespace default istio.io/dataplane-mode-
kubectl label namespace default istio.io/use-waypoint-

2. Remove waypoint proxies

istioctl x waypoint delete --all

3. Uninstall Istio

istioctl uninstall -y --purge
kubectl delete namespace istio-system

4. Remove the sample application

kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/bookinfo/platform/kube/bookinfo-versions.yaml
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/sleep/sleep.yaml

5. Remove the Kubernetes Gateway API CRDs

kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.1.0" | kubectl delete -f -

6. Delete your GKE cluster by running:

gcloud container clusters delete $CLUSTER_NAME --zone $ZONE

7. Alternatively, you can delete your project from the Google Cloud console.

Conclusion

Flexibility and efficiency come to mind after playing around with Istio Ambient Mesh and comparing it with the sidecar deployment mode.

The flexibility of mixing both data planes (sidecar and sidecarless) in the same environment, the ability to deploy only L4 proxies, and the option to inject L7 ones later will help existing customers deploy Istio according to their needs, scale and infrastructure budget.

One operational aspect that I foresee as key for Ambient Mesh concerns upgrades: with future versions of the service mesh, you will no longer need to restart your applications. This is one of its killer features 🙂 When I demoed it a few weeks ago at the Kubernetes Community Day in Zürich, it was one of the wow moments.

Let me know in the comments if you found this article useful and interesting. And don’t hesitate to share any other topics you would like to read about.

Happy Meshing!


Charley Eveno

Enterprise Customer Engineer @ Google Cloud Content and opinions expressed here are my own