Your laptop as part of a service mesh

Lorenzo Fundaró
Published in Omio Engineering
Jan 19, 2021

This blog post shows how we used Istio’s EnvoyFilter to dynamically reroute requests from a test cluster to anywhere and from anywhere back to the test cluster.

The problem

At Omio, we need to run end-to-end tests covering dozens of services. Historically, we ran those on dedicated VMs that developers provisioned with all the required dependencies. As our microservice architecture grew, this option stopped being effective: developers had to keep track of an ever-growing set of dependencies, which sometimes would not even fit on a single VM. Developers then started running these tests in our QA environment, a Kubernetes cluster running all production services but without the side effects (credit card charges, refunds, etc.). We soon realized this method had some shortcomings:

  • It slows down the development feedback loop. To get your software tested with the latest release of your dependencies, you have to open a Pull Request, get someone to review it, pass unit tests, merge to master, wait for deployment, and start the test. If the test fails, you start the process all over again.
  • It needs coordination within a team, i.e. no other commit should be merged to master while a test is in progress, to avoid overlapping tests.

Candidate solutions

We considered other approaches, but none of them worked for us:

  • Creating mocks for all of our services. With so many services, we would have to invest too much time implementing the mocks and, not to forget, maintaining them.
  • Having a dedicated VM with a lot of resources per team. Unfortunately, this slows down development: people within the team need to take turns using the machine to run tests. It’s also not cost-effective.

The solution

We decided to keep using our QA environment but focused on solving the shortcomings that we mentioned above.

Avoiding the process of merging our code to master and deploying to QA could significantly speed up the development feedback loop. For this to happen, we needed to make our service in development join the cluster. When running a test, we wanted requests to hit our development service and not the one running in QA. The following illustrates this better:

A client makes a request which would normally travel through A, B and C in QA. Instead, this request contains routing information that tells Envoy that when hitting B, the request should instead go to B’. Service B’ should be configured to talk to C on QA to continue the normal flow of the request in QA.

Implementation

We discovered that this dynamic routing is possible with the help of Istio’s EnvoyFilter. Istio injects a sidecar into every pod in our cluster, and each filter we apply runs in all sidecars. This filter is a wrapper around Envoy’s Lua filter, which provides a simple API that consists of two functions:

function envoy_on_request(request_handle)
function envoy_on_response(response_handle)

A consumer of this API can provide an implementation of each function in a custom Lua script. Envoy runs envoy_on_request on the request path, before the request is proxied to the service, and envoy_on_response on the response path, before the response is returned to the caller. We are interested in the first function, envoy_on_request. For every inbound request, we would like to decide whether the request should be rerouted somewhere else. To trigger the routing logic, a request must carry two pieces of information: which service is subject to rerouting, and where its traffic should be sent.

If a developer wants a request to be dynamically rerouted, the following HTTP header is needed:

GET / HTTP/1.1
....
x-devroute: { "foo": "192.168.1.12:8080" }
....

This tells the Lua function that if the service it is proxying to is called “foo”, then the request should be rerouted to “192.168.1.12” on port “8080”. The following is pseudo-code for the Lua function:

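function envoy_on_request(request_handle)
  local contract = request_handle:headers():get("x-devroute")
  -- no contract header: let the request reach the original Foo
  if contract == nil then
    return
  end
  -- look up the address registered for "foo", e.g. 192.168.1.12:8080
  local address = string.match(contract, '"foo"%s*:%s*"([^"]+)"')
  if address == nil then
    return
  end
  -- we have a match, send the request somewhere else
  local headers = request_handle:headers()
  -- make a call to a Foo running somewhere else
  local response_headers, response_body = request_handle:httpCall(address, headers, ...)
  -- respond immediately and don’t proxy request to original Foo
  request_handle:respond(response_headers, response_body)
end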
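
The header lookup in the pseudo-code above can be tried outside Envoy with plain Lua. The following is a minimal sketch, not the code from our reference implementation: the helper names parse_devroute and override_for are made up for illustration, and the parser assumes a flat object with straight double quotes, no nesting and no escaped quotes, because a JSON library may not be available inside the filter.

```lua
-- Parse a flat x-devroute value such as { "foo": "192.168.1.12:8080" }
-- into a Lua table mapping service name -> "host:port".
-- Assumption: no nesting and no escaped quotes inside names or addresses.
local function parse_devroute(header_value)
  local routes = {}
  if header_value == nil then
    return routes
  end
  local pattern = '"([^"]+)"%s*:%s*"([^"]+)"'
  for service, address in string.gmatch(header_value, pattern) do
    routes[service] = address
  end
  return routes
end

-- Return the override address for a service, or nil when the request
-- should continue to its original destination.
local function override_for(header_value, service)
  return parse_devroute(header_value)[service]
end

local header = '{ "foo": "192.168.1.12:8080", "bar": "10.0.0.7:9090" }'
print(override_for(header, "foo")) -- 192.168.1.12:8080
print(override_for(header, "baz")) -- nil
print(override_for(nil, "foo"))    -- nil
```

Matching on the quoted key rather than on the bare substring avoids rerouting a service just because its name happens to appear inside another service's name or address.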

Gotchas

Unfortunately, this Lua function can only call members inside the mesh, so putting your laptop’s IP in the header would not work out of the box. One would need to define a cluster for the destination (see Cluster Manager) and fill in the details. This makes our contract hard to work with, i.e. whenever you need to test something, you have to take steps to configure infrastructure. We have worked around this by telling the Lua code to route traffic to a proxy within the mesh, which in turn reads the address value and forwards the request to the final destination. With this last component, we close the loop and are able to dynamically route requests from a test environment to anywhere we want and back.
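
One way to picture this workaround: the filter always calls the same in-mesh cluster and passes the real destination along in a header, and the proxy splits that value into host and port before forwarding. The sketch below is only an illustration under those assumptions. The cluster name devroute-proxy, the header x-devroute-target and both helper functions are hypothetical and may differ from what the reference implementation does.

```lua
-- Hypothetical names, chosen for illustration only.
local PROXY_CLUSTER = "devroute-proxy"    -- in-mesh forwarding proxy
local TARGET_HEADER = "x-devroute-target" -- carries the final destination

-- Filter side: build the arguments for a call to the in-mesh proxy, i.e.
-- the cluster to call and a copy of the original headers plus the target.
local function build_proxy_call(original_headers, address)
  local headers = {}
  for name, value in pairs(original_headers) do
    headers[name] = value
  end
  headers[TARGET_HEADER] = address
  return PROXY_CLUSTER, headers
end

-- Proxy side: read the destination and split it into host and port.
-- Returns nil, nil when the header is missing or malformed.
local function split_target(headers)
  local target = headers[TARGET_HEADER]
  if target == nil then
    return nil, nil
  end
  local host, port = string.match(target, "^([^:]+):(%d+)$")
  if host == nil then
    return nil, nil
  end
  return host, tonumber(port)
end

local cluster, headers = build_proxy_call(
  { [":method"] = "GET", [":path"] = "/" }, "192.168.1.12:8080")
local host, port = split_target(headers)
print(cluster, host, port)
```

Because the filter only ever names one fixed cluster, no per-developer infrastructure change is needed: the destination travels with the request.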

The following gives a final picture of all the components involved in the solution:

We have prepared a reference implementation that you can use to play and better understand how all the pieces fit together.

Drawbacks

A limitation of this contract worth mentioning is that it requires all services in the cluster to forward the contract header (x-devroute). Without this header, the Lua code cannot make any routing decision and simply proxies the request to its original destination within the cluster.
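
What each service has to do is small: copy the header from the inbound request onto every outbound call it makes. A minimal sketch in Lua, assuming headers are plain tables keyed by lowercase names (the helper forward_contract is hypothetical, and real services would do this in their own language and HTTP client):

```lua
local CONTRACT_HEADER = "x-devroute"

-- Copy the contract header from an inbound request onto the headers of an
-- outbound call, leaving every other outbound header untouched.
local function forward_contract(inbound_headers, outbound_headers)
  local contract = inbound_headers[CONTRACT_HEADER]
  if contract ~= nil then
    outbound_headers[CONTRACT_HEADER] = contract
  end
  return outbound_headers
end

local inbound = { ["x-devroute"] = '{ "foo": "192.168.1.12:8080" }' }
local outbound = forward_contract(inbound, { accept = "application/json" })
print(outbound["x-devroute"])
```

If a single service in the chain skips this step, every hop after it sees no contract and the request silently follows the normal QA route.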

Conclusions

In this post, we have demonstrated one of the many possibilities that are available when the networking layer of a cluster offers an API. Thanks to Istio, we gain fine-grained control of the traffic that travels through our cluster and can make routing decisions on the fly. With this new contract, we save developers time and reduce costs. Finally, end-to-end tests can now be run from anywhere with minimal infrastructure configuration.

P.S.: check out our reference implementation to learn more.
