Connecting to a Kafka cluster running in Kubernetes from outside
OK, so you’re trying to connect your machine to a remote Kafka cluster running in Kubernetes. What are the options? There’s port forwarding with kubectl, VPN-ing into the cluster, Telepresence, …
We’ll look at the hard way, using port forwarding and DNAT to the Kafka pods, mostly for fun.
The hard way
…with a single broker
kubectl port-forward svc/kafka 9092 opens port 9092 on localhost, but if you connect a Kafka consumer to it, the Kafka protocol will point the consumer to the broker’s IP address in the cluster (the address the broker advertises in its metadata) and it’ll fail:
$ kubectl port-forward svc/kafka 9092
$ kafkacat -C -b localhost -t foobar -o -1 -c 1
%3|1533029229.817|FAIL|rdkafka#consumer-0| 100.111.134.183:9092/bootstrap: Failed to connect to broker at 100.111.134.183:9092: Network is unreachable
%3|1533029229.817|ERROR|rdkafka#consumer-0| 100.111.134.183:9092/bootstrap: Failed to connect to broker at 100.111.134.183:9092: Network is unreachable
%3|1533029229.817|ERROR|rdkafka#consumer-0| 1/1 brokers are down
% ERROR: Failed to query metadata for topic foobar: Local: Timed out
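The IP in that error is not invented by the client: it is the address the broker advertises in the cluster metadata, and the bootstrap metadata request itself does travel through the tunnel. kafkacat -L prints that metadata, so you can pull the broker addresses out of it instead of fishing them out of error messages. A minimal sketch; broker_hosts is a hypothetical helper, and it assumes kafkacat’s listing shows each broker as a line like "broker 0 at 100.111.134.183:9092" (newer versions may append "(controller)"):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host part of every broker address found in
# `kafkacat -L` output read from stdin.
# Assumes broker lines look like "  broker 0 at 100.111.134.183:9092",
# possibly followed by " (controller)".
broker_hosts() {
  awk '$1 == "broker" && $3 == "at" {
    addr = $4
    sub(/:[0-9]+$/, "", addr)   # strip the trailing ":port"
    print addr
  }'
}

# Usage, with `kubectl port-forward svc/kafka 9092` running in another terminal:
#   kafkacat -L -b localhost:9092 | broker_hosts
```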
100.111.134.183 is an internal cluster IP, which is unreachable from your laptop. You can, however, DNAT that IP to localhost to force the traffic through the tunnel like so (iptables needs root):
$ iptables -t nat -I OUTPUT -d 100.111.134.183 -j DNAT --to-destination 127.0.0.1
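That rule stays in the nat table until you delete it, and it redirects all traffic from your machine to that IP, not just Kafka. A small sketch that prints the matching insert and delete commands for one or more broker IPs, so you can review them before handing them to a root shell; dnat_cmds is a hypothetical helper and, like the rule above, it keeps the original destination port:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the iptables commands that redirect
# each given broker IP to 127.0.0.1 ("add") or undo that redirect ("del").
dnat_cmds() {
  local flag
  case "${1:-}" in
    add) flag="-I" ;;   # insert the rule at the top of the nat OUTPUT chain
    del) flag="-D" ;;   # delete the rule with exactly the same match/target
    *) echo "usage: dnat_cmds add|del IP..." >&2; return 1 ;;
  esac
  shift
  local ip
  for ip in "$@"; do
    echo "iptables -t nat $flag OUTPUT -d $ip -j DNAT --to-destination 127.0.0.1"
  done
}
```

For example, dnat_cmds add 100.111.134.183 | sudo sh installs the rule, dnat_cmds del 100.111.134.183 | sudo sh removes it again, and sudo iptables -t nat -L OUTPUT -n shows what is currently in the chain.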
Now our consumer will merrily connect and fetch data from kafka via the tunnel:
$ kafkacat -C -b localhost -t foobar -o -5 -c 3
foo
bar
baz
For OS X, check out this Stack Overflow answer on how to do DNAT.
Tzapulica kindly wrote a gist for setting this up on OS X. Thanks tz! :)
…with multiple brokers
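This requires port-forwarding every broker pod and adding a DNAT rule for each one. Since the brokers usually all listen on the same port, each tunnel also needs its own local endpoint (a separate loopback address or port).
I’ve written a proof-of-concept script here (Linux only).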
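One way to extend the single-broker trick, not necessarily what the proof-of-concept script does: give each broker its own loopback address, DNAT its cluster IP there, and bind a dedicated port-forward to that address with kubectl port-forward --address. On Linux the whole 127.0.0.0/8 range is routed to the loopback interface, so 127.0.0.2, 127.0.0.3 and so on work without extra setup. The sketch only prints the commands; plan_broker_tunnels is hypothetical, and it assumes the brokers advertise their pod IPs and share one port:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "POD_NAME POD_IP" lines on stdin and print the
# commands that give each broker its own loopback address (fine for a handful
# of brokers). Optional argument: the broker port (default 9092).
plan_broker_tunnels() {
  local port="${1:-9092}"
  local n=2                 # start at 127.0.0.2, leaving 127.0.0.1 alone
  local pod ip lo
  while read -r pod ip; do
    [ -n "$pod" ] && [ -n "$ip" ] || continue   # skip blank/incomplete lines
    lo="127.0.0.$n"
    # Redirect the broker's cluster IP to its private loopback address...
    echo "sudo iptables -t nat -I OUTPUT -d $ip -j DNAT --to-destination $lo"
    # ...where a port-forward dedicated to that pod is listening.
    echo "kubectl port-forward --address $lo pod/$pod $port &"
    n=$((n + 1))
  done
}

# Example (the app=kafka label is an assumption; use your brokers' labels):
#   kubectl get pods -l app=kafka \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.status.podIP}{"\n"}{end}' \
#     | plan_broker_tunnels > tunnels.sh
#   bash tunnels.sh   # after reviewing it
```

The port-forwards keep running in the background after tunnels.sh exits; stop them with pkill -f 'kubectl port-forward' and remove each rule by repeating its iptables command with -D instead of -I.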
The easy way
…and now, after all that, there’s Telepresence, which is infinitely easier to set up and use:
Telepresence substitutes a two-way network proxy for your normal pod running in the Kubernetes cluster. This pod proxies data from your Kubernetes environment (e.g., TCP connections, environment variables, volumes) to the local process. The local process has its networking transparently overridden so that DNS calls and TCP connections are routed through the proxy to the remote Kubernetes cluster.
telepresence --run-shell
More on the Linux network/netfilter stack here:
https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg