Can I use Istio to upgrade my HTTPS Kubernetes services to mTLS?

Saverio Proto · Microsoft Azure · Aug 25, 2023

If the containers in your Kubernetes cluster expose plaintext HTTP endpoints, installing Istio and injecting sidecar containers into the Pods to enforce mTLS encryption for both north-south and east-west traffic is a straightforward process.
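For reference, in that plaintext scenario enforcing mTLS mostly comes down to labeling the namespaces for sidecar injection and applying a PeerAuthentication policy in STRICT mode. The snippet below is a minimal sketch of such a policy, not something taken from the setup described in this article:

# Minimal sketch: enforce mTLS for every sidecar-injected workload.
# Applied in the Istio root namespace (istio-system), it acts mesh-wide.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT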

However, when your Pods exclusively serve HTTPS endpoints and you intend to inject an Istio sidecar to enhance mTLS capabilities, this particular scenario lacks comprehensive documentation. As a result, I’ve chosen to detail my experimentation and findings to address this gap.

Sidecar proxy network connections. Image from official istio.io documentation. The red arrow indicates the HTTPS endpoint of the workload container

Install Istio

For this article, I used Istio version 1.18.2, installed through the Helm Terraform provider for a streamlined setup. As part of the installation I customize the accessLogFormat, and I also install cert-manager to obtain valid certificates. For north-south traffic, I opt for the Kubernetes Gateway API instead of the classic Istio ingress-gateway API.

resource "helm_release" "istio-base" {
chart = "base"
namespace = "istio-system"
create_namespace = "true"
name = "istio-base"
version = "1.18.2"
repository = "https://istio-release.storage.googleapis.com/charts"
#force_update = var.force_update
#recreate_pods = var.recreate_pods
}

resource "helm_release" "istiod" {
depends_on = [helm_release.istio-base]
name = "istiod"
namespace = "istio-system"
dependency_update = true
repository = "https://istio-release.storage.googleapis.com/charts"
chart = "istiod"
version = "1.18.2"
atomic = true
lint = true

postrender {
binary_path = "${path.module}/istiod-kustomize/kustomize.sh"
args = ["${path.module}"]
}
values = [
yamlencode(
{
meshConfig = {
accessLogFile = "/dev/stdout",
accessLogEncoding = "JSON",
accessLogFormat: "{\n \"authority\": \"%REQ(:AUTHORITY)%\",\n \"bytes_received\": \"%BYTES_RECEIVED%\",\n \"bytes_sent\": \"%BYTES_SENT%\",\n \"connection_termination_details\": \"%CONNECTION_TERMINATION_DETAILS%\",\n \"downstream_local_address\": \"%DOWNSTREAM_LOCAL_ADDRESS%\",\n \"downstream_remote_address\": \"%DOWNSTREAM_REMOTE_ADDRESS%\",\n \"duration\": \"%DURATION%\",\n \"method\": \"%REQ(:METHOD)%\",\n \"path\": \"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\",\n \"protocol\": \"%PROTOCOL%\",\n \"request_id\": \"%REQ(X-REQUEST-ID)%\",\n \"requested_server_name\": \"%REQUESTED_SERVER_NAME%\",\n \"response_code\": \"%RESPONSE_CODE%\",\n \"response_code_details\": \"%RESPONSE_CODE_DETAILS%\",\n \"response_flags\": \"%RESPONSE_FLAGS%\",\n \"route_name\": \"%ROUTE_NAME%\",\n \"start_time\": \"%START_TIME%\",\n \"trace_id\": \"%REQ(TRACE-ID)%\",\n \"upstream_cluster\": \"%UPSTREAM_CLUSTER%\",\n \"upstream_host\": \"%UPSTREAM_HOST%\",\n \"upstream_local_address\": \"%UPSTREAM_LOCAL_ADDRESS%\",\n \"upstream_service_time\": \"%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%\",\n \"upstream_transport_failure_reason\": \"%UPSTREAM_TRANSPORT_FAILURE_REASON%\",\n \"user_agent\": \"%REQ(USER-AGENT)%\",\n \"downstream_tls_version\": \"%DOWNSTREAM_TLS_VERSION%\",\n \"downstream_local_subject\":\"%DOWNSTREAM_LOCAL_SUBJECT%\",\n \"downstream_peer_subject\":\"%DOWNSTREAM_PEER_SUBJECT%\",\n \"downstream_peer_issuer\":\"%DOWNSTREAM_PEER_ISSUER%\",\n \"downstream_local_uri_san\":\"%DOWNSTREAM_LOCAL_URI_SAN%\",\n \"downstream_peer_uri_san\":\"%DOWNSTREAM_PEER_URI_SAN%\"\n}"

}
}
)
]
}

resource "helm_release" "cert-manager" {
name = "cert-manager"
namespace = "cert-manager"
create_namespace = "true"
dependency_update = true
repository = "https://charts.jetstack.io"
chart = "cert-manager"
atomic = true
values = [
yamlencode(
{
installCRDs = true,
extraArgs = ["--feature-gates=ExperimentalGatewayAPISupport=true"]
}
),
]
}

Here is what my accessLogFormat looks like in readable form:

"meshConfig":
"accessLogEncoding": "JSON"
"accessLogFile": "/dev/stdout"
"accessLogFormat": |-
{
"authority": "%REQ(:AUTHORITY)%",
"bytes_received": "%BYTES_RECEIVED%",
"bytes_sent": "%BYTES_SENT%",
"connection_termination_details": "%CONNECTION_TERMINATION_DETAILS%",
"downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
"downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
"duration": "%DURATION%",
"method": "%REQ(:METHOD)%",
"path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
"protocol": "%PROTOCOL%",
"request_id": "%REQ(X-REQUEST-ID)%",
"requested_server_name": "%REQUESTED_SERVER_NAME%",
"response_code": "%RESPONSE_CODE%",
"response_code_details": "%RESPONSE_CODE_DETAILS%",
"response_flags": "%RESPONSE_FLAGS%",
"route_name": "%ROUTE_NAME%",
"start_time": "%START_TIME%",
"trace_id": "%REQ(TRACE-ID)%",
"upstream_cluster": "%UPSTREAM_CLUSTER%",
"upstream_host": "%UPSTREAM_HOST%",
"upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
"upstream_service_time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
"upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
"user_agent": "%REQ(USER-AGENT)%",
"downstream_tls_version": "%DOWNSTREAM_TLS_VERSION%",
"downstream_local_subject":"%DOWNSTREAM_LOCAL_SUBJECT%",
"downstream_peer_subject":"%DOWNSTREAM_PEER_SUBJECT%",
"downstream_peer_issuer":"%DOWNSTREAM_PEER_ISSUER%",
"downstream_local_uri_san":"%DOWNSTREAM_LOCAL_URI_SAN%",
"downstream_peer_uri_san":"%DOWNSTREAM_PEER_URI_SAN%"
}
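
One more prerequisite worth calling out: the Kubernetes Gateway API CRDs are not shipped with Kubernetes or with the Istio charts, so they have to be installed separately. The snippet below is a minimal sketch; the Gateway API version tag is an assumption, so check the release notes for the version supported by your Istio release:

# Install the Kubernetes Gateway API CRDs only if they are missing.
# The v0.7.1 tag is an assumption; pick the version matching your Istio release.
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.7.1/standard-install.yaml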

The workload

I will be using the echoserver container image gcr.io/google_containers/echoserver:1.10. At startup, this container generates a self-signed certificate for CN=example.com and serves an HTTPS endpoint on port 8443.
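
If you want to confirm this behavior before touching the cluster, a quick local check is to run the image with Docker and inspect the certificate it presents. This is my own sketch and assumes Docker and openssl are available locally:

# Run the echoserver locally and inspect the self-signed certificate it serves.
docker run -d --rm --name echoserver -p 8443:8443 gcr.io/google_containers/echoserver:1.10
openssl s_client -connect localhost:8443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# Expected subject: CN=example.com (the issuer is identical, since the certificate is self-signed)
docker stop echoserver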

The echoserver namespace

Let’s begin by creating the echoserver namespace and appropriately labeling it for injection. Subsequently, we will deploy the echoserver Pod.

kubectl create namespace echoserver
kubectl label namespace echoserver istio-injection=enabled
kubectl apply -f - <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  namespace: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      run: echoserver
  template:
    metadata:
      labels:
        run: echoserver
    spec:
      containers:
      - name: echoserver
        image: gcr.io/google_containers/echoserver:1.10
        ports:
        - containerPort: 8443
          protocol: TCP
          name: https-port
        resources:
          requests:
            memory: "40Mi"
            cpu: "20m"
---
apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: echoserver
spec:
  ports:
  - port: 8443
    protocol: TCP
    targetPort: 8443
    appProtocol: https # this seems to have no effect in Istio https://pkg.go.dev/k8s.io/api/core/v1#ServicePort
  selector:
    run: echoserver
EOF

To confirm the injection worked, look for 2/2 in the READY column: the Pod is running with two containers, echoserver and the istio-proxy Istio sidecar.
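
For example (the Pod name will of course differ in your cluster):

kubectl get pods -n echoserver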

NAME                          READY   STATUS    RESTARTS   AGE
echoserver-7666645894-2476q   2/2     Running   0          7s

The east-west traffic scenario

Let's run a curlclient Pod in the same echoserver namespace to generate some HTTPS requests:

kubectl run --rm -ti -n echoserver curlclient --image=nicolaka/netshoot /bin/bash

Inside the Pod shell, run curl to make an HTTPS request:

curl -kv https://echoserver:8443

While I generate HTTPS requests, which consistently return successful HTTP 200 OK responses, I capture the corresponding logs from the Envoy sidecar inside the echoserver Pod:

stern --only-log-lines -o raw -c istio-proxy --tail=2 echoserver-7666645894-2476q |jq

This is what the log line of a single request looks like:

{
  "downstream_remote_address": "10.52.0.137:51156",
  "downstream_local_uri_san": "spiffe://cluster.local/ns/echoserver/sa/default",
  "response_flags": "-",
  "upstream_local_address": "127.0.0.6:46199",
  "method": null,
  "response_code_details": null,
  "downstream_peer_uri_san": "spiffe://cluster.local/ns/echoserver/sa/default",
  "downstream_peer_issuer": "O=cluster.local",
  "downstream_peer_subject": null,
  "bytes_sent": 3558,
  "route_name": null,
  "user_agent": null,
  "requested_server_name": "outbound_.8443_._.echoserver.echoserver.svc.cluster.local",
  "upstream_host": "10.52.0.152:8443",
  "duration": 7,
  "start_time": "2023-08-23T15:14:23.079Z",
  "upstream_service_time": null,
  "upstream_transport_failure_reason": null,
  "authority": null,
  "bytes_received": 2245,
  "protocol": null,
  "downstream_local_subject": null,
  "trace_id": null,
  "connection_termination_details": null,
  "upstream_cluster": "inbound|8443||",
  "request_id": null,
  "path": null,
  "response_code": 0,
  "downstream_local_address": "10.52.0.152:8443",
  "downstream_tls_version": "TLSv1.3"
}

The values of downstream_peer_uri_san and downstream_peer_issuer confirm that an mTLS connection is taking place. Because both Pods run with the default service account in the echoserver namespace, the values of downstream_peer_uri_san and downstream_local_uri_san are identical. The original simple TLS request is tunneled inside an mTLS connection between the sidecars. Because the original request is never terminated or decrypted, any VirtualService resource will have no effect.
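
If you want a second confirmation besides the access logs, istioctl can summarize the configuration that applies to the workload, including its mTLS setup. A quick sketch, using the Pod name from my cluster:

# Describe the workload: the output includes the effective PeerAuthentication
# policy and how the sidecar handles inbound traffic on the port.
istioctl experimental describe pod echoserver-7666645894-2476q -n echoserver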

The north-south traffic scenario

To evaluate the north-south scenario, let's create both a Gateway and an HTTPRoute. Thanks to cert-manager, the annotation cert-manager.io/issuer: letsencrypt can be used to obtain a valid TLS certificate automatically. In addition, the Azure Load Balancer annotations on the Gateway object propagate to the Kubernetes Service of type LoadBalancer that exposes the Istio Gateway Pod.

---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: echoserver
  namespace: echoserver
  annotations:
    service.beta.kubernetes.io/azure-dns-label-name: "echoserveristio"
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz/ready"
    service.beta.kubernetes.io/port_80_health-probe_protocol: http
    service.beta.kubernetes.io/port_80_health-probe_port: "15021"
    service.beta.kubernetes.io/port_443_health-probe_protocol: http
    service.beta.kubernetes.io/port_443_health-probe_port: "15021"
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    hostname: echoserveristio.westeurope.cloudapp.azure.com
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
  - hostname: echoserveristio.westeurope.cloudapp.azure.com
    name: https
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate # Can be only Terminate or Passthrough
      certificateRefs:
      - name: echoserver-tls
        kind: Secret
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: echoserver
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: youremail@domain.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: issuer-account-key
    solvers:
    - http01:
        gatewayHTTPRoute:
          parentRefs:
          - name: echoserver
            namespace: echoserver
            kind: Gateway
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: echoserver
  namespace: echoserver
spec:
  parentRefs:
  - name: echoserver
    namespace: echoserver
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: echoserver
      port: 8443
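
Before testing, it is worth checking that the Gateway has been programmed and that cert-manager has issued the certificate. The commands below are my own quick sketch of that check:

# The Gateway should report an address and PROGRAMMED=True once the
# underlying Service of type LoadBalancer gets its public IP.
kubectl get gateway echoserver -n echoserver
# The Certificate created from the cert-manager.io/issuer annotation should
# become READY=True, and the echoserver-tls Secret referenced by the listener should exist.
kubectl get certificate -n echoserver
kubectl get secret echoserver-tls -n echoserver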

Now it is possible to test whether the north-south connection works:

curl -v https://echoserveristio.westeurope.cloudapp.azure.com

The expected result is an HTTP 400 error:

< HTTP/2 400
< server: istio-envoy
< date: Wed, 23 Aug 2023 17:26:54 GMT
< content-type: text/html
< content-length: 271
< x-envoy-upstream-service-time: 1
<
<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx/1.13.3</center>
</body>
</html>
* Connection #0 to host echoserveristio.westeurope.cloudapp.azure.com left intact

The problem lies in the necessity for our Gateway to terminate the TLS connection. Unlike the transparent packet interception via iptables that occurs in the east-west scenario, where the original connection is tunneled over mTLS, here the Gateway needs to establish a new connection with the backend echoserver Pod. While the connection between the Gateway and the Sidecar remains protected by mTLS encryption, the Sidecar subsequently directs a plaintext HTTP request towards the HTTPS endpoint of the workload.

Looking at the Sidecar logs, we see the following:

{
  "trace_id": null,
  "upstream_service_time": null,
  "downstream_local_address": "10.52.0.152:8443",
  "downstream_local_uri_san": "spiffe://cluster.local/ns/echoserver/sa/default",
  "request_id": null,
  "start_time": "2023-08-23T17:41:57.761Z",
  "connection_termination_details": null,
  "upstream_cluster": "inbound|8443||",
  "upstream_local_address": "127.0.0.6:42829",
  "downstream_local_subject": null,
  "response_flags": "-",
  "route_name": null,
  "duration": 1,
  "downstream_peer_uri_san": "spiffe://cluster.local/ns/echoserver/sa/echoserver-istio",
  "downstream_peer_subject": null,
  "bytes_received": 2278,
  "authority": null,
  "bytes_sent": 423,
  "response_code_details": null,
  "method": null,
  "user_agent": null,
  "protocol": null,
  "requested_server_name": "outbound_.8443_._.echoserver.echoserver.svc.cluster.local",
  "upstream_transport_failure_reason": null,
  "path": null,
  "downstream_tls_version": "TLSv1.3",
  "downstream_remote_address": "10.52.0.143:40892",
  "downstream_peer_issuer": "O=cluster.local",
  "response_code": 0,
  "upstream_host": "10.52.0.152:8443"
}

The traffic between the Gateway and the Sidecar has been verified as mTLS-secured. As expected, the downstream_peer_uri_san, which corresponds to the Service Account of the Istio Gateway, has the value spiffe://cluster.local/ns/echoserver/sa/echoserver-istio.

To address the issue and achieve an HTTP 200 response instead of an HTTP 400, we can implement a solution involving a DestinationRule. The DestinationRule enables the application of policies for a specific destination when a client initiates a connection.

The following DestinationRule configures the Gateway to establish a Simple TLS connection (not mTLS) to the hostname "echoserver" on port 8443. However, adding the DestinationRule alone leads to an HTTP 503 error, because we are essentially asking the Gateway to implement a configuration that is not feasible.

To rectify this, we also need a PeerAuthentication object that disables mTLS on port 8443 for the echoserver Pods. Unfortunately, this has an unintended consequence: it disables mTLS for the east-west scenario as well:

---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: echoserver
  namespace: echoserver
spec:
  host: echoserver
  workloadSelector:
    matchLabels:
      istio.io/gateway-name: echoserver
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 8443
      tls:
        mode: SIMPLE
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: allow-echoserver-simple-tls-endpoint
  namespace: echoserver
spec:
  selector:
    matchLabels:
      run: echoserver
  mtls:
    mode: STRICT
  portLevelMtls:
    8443:
      mode: DISABLE
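
With both objects applied, repeating the curl against the public hostname should now return an HTTP 200 response instead of the HTTP 400 seen earlier. A quick sketch of the check:

# Repeat the north-south test and print only the response status line.
curl -sv https://echoserveristio.westeurope.cloudapp.azure.com -o /dev/null 2>&1 | grep "< HTTP"
# Expected output: < HTTP/2 200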

Conclusion

In conclusion, when working with Istio, the typical assumption is that workload containers expose plaintext endpoints, with the responsibility for encryption in transit delegated to the sidecar containers. Istio manages the east-west traffic scenario, seamlessly encapsulating traffic into mTLS connections, regardless of whether the intercepted request is encrypted or not.

However, the north-south traffic scenario presents greater complexity. This stems from the fact that, upon termination of the TLS connection at the mesh's edge, the Istio API lacks a mechanism to articulate a dual-layered connection. Specifically, this would encompass an mTLS connection between the Gateway and the Sidecar, followed by a Simple TLS connection between the sidecar and the actual workload inside the Pod network namespace.

Customer Experience Engineer @ Microsoft. Opinions and observations expressed in this blog post are my own.