How to implement Istio Service Mesh — AMA with Kissflow DevOps team

Dinesh Kumar P
Published in CTOtalk
4 min read · Jul 7, 2021

Though microservices solve many problems, they also bring challenges such as traffic management, discoverability, security, and observability.

This blog is about Kissflow’s journey to improve its microservices architecture by deploying the Istio service mesh. Rajesh A, DevOps Architect at Kissflow, walks through how these tools solved the challenges in Kissflow’s existing microservices architecture. We hope this helps anyone who wants to know how an Istio service mesh is implemented in an existing system.

1. What were the problems/challenges faced in Kissflow’s microservices architecture before implementing Istio service mesh?

When a large system is split into smaller pieces, a few general challenges arise.

a. Communication: We didn’t have a system to control or troubleshoot traffic flows and API calls between services. By traffic I mean two forms:

  • North-South traffic: Traffic flowing in and out of the app’s perimeter, driven by application users. E.g. API requests coming in and data (payload) sent back out of the perimeter as a response.
  • East-West traffic: Traffic flowing between services inside the app’s perimeter. E.g. Service A calling Service B, the type of communication, and the responses exchanged between them.

b. Security: Even within the app, we wanted secure communication between microservices to prevent attacks such as MITM (man-in-the-middle).

c. Visibility: Which services are running, how heavily each one is loaded with requests and responses, which ones are failing, and how frequently.

d. Monitoring: On top of these, making sure every component is up and running all the time is not an easy job.

These were the challenges, and we expected Istio to solve them.
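For the security point (b), meshes such as Istio typically protect East-West traffic with mutual TLS, where both the caller and the callee present certificates. The sketch below is not Kissflow’s setup; it only illustrates the idea with Python’s standard `ssl` module. The function name and the file-path parameters are hypothetical.

```python
import ssl

def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that also demands a client certificate.

    All three paths are hypothetical placeholders; they are optional here
    only so the sketch can be constructed without real certificate files.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a client certificate is what makes the TLS "mutual".
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own identity (certificate and private key).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA used to verify certificates presented by callers.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

In a mesh, the sidecar proxies perform this handshake on behalf of the services, so the application code itself does not need to change.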

2. Nice! I hope Istio solved these challenges post-implementation. Can you take us through the journey of how the Istio service mesh was implemented in Kissflow’s architecture, phase by phase?

When a production environment is used by thousands of customers, we can’t take a lift-and-shift approach. We progressed carefully, in quick phases. Let me explain in detail.

Phase 1 - Evaluation: We listed criteria to filter the different tools. These criteria vary between companies; in our case they were good support, a future-ready system, and a managed service. With those in mind, we investigated tools such as Istio, Linkerd, and Consul, and settled on Istio in the form of “Anthos Service Mesh” from Google.

Phase 2 - Interactions with Google Architects: We worked out solutions for our problem statement, iterated on the design architecture with the pros and cons of each option, and made sure the design followed best practices.

Phase 3 - Deploy in dev environments first: Once the architecture was ready, we implemented it in the development environment first, together with the engineering team. We kept it loosely coupled and ensured the code had no dependency on the infrastructure. We tried out a few traffic management features, such as defining a circuit breaker for a service, timeouts, retries, canary deployments, and A/B testing. We were able to do all of this at the infra level, without touching the code at all.
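To make the circuit breaker idea concrete, here is a minimal sketch of the behaviour it provides. In a mesh this is configured declaratively at the proxy level rather than written in application code; the Python below is only an illustration, and the class name, parameters, and thresholds are all hypothetical.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable clock, handy for tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast instead of hitting the service.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens it.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_orders)`, where `fetch_orders` is a stand-in for any function that calls another service. Once the downstream service fails repeatedly, further calls are rejected without touching it until the cool-down passes.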

Phase 4 - Up and Shift production infrastructure: We were aware that the base of the architecture was being changed. Our engineers rolled it out swiftly, region by region, with minimal downtime.

With all that hard work, we implemented this architecture in our GCP environment through automation, using Terraform and shell scripts.

3. There are many open-source components for service mesh. Can you talk a little more about why we chose a managed service?

Yes. Istio is also an open-source component and we could deploy it ourselves. But then the burden of maintaining it, following security best practices, etc. falls on us. We also wanted certain functionality, such as blue-green deployment. As mentioned earlier, Kissflow’s standpoint is always to go with a managed service with better support.

Google has bundled Istio as “Anthos Service Mesh” with all the features we had in mind. And since Kissflow is hosted on GCP, we already had a good relationship with Google, which led us to go for Istio in the form of Anthos Service Mesh.

4. What have you planned to do next in Kissflow’s cloud infrastructure?

a. Insights from telemetry data: A service mesh produces a lot of telemetry data, such as success response rate, service dependencies, RPS, ingress traffic, and audit tracing. We are planning to set up a better monitoring tool so we can handle issues both proactively and reactively.
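As a rough illustration of what can be derived from such telemetry, the sketch below computes two of the metrics mentioned above, response success rate and RPS, per service. The record layout and function name are hypothetical, and counting any non-5xx status as a success is an assumption made for this example, not something stated by the Kissflow team.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    timestamp: float  # seconds since the start of the window (hypothetical layout)
    service: str      # destination service name
    status: int       # HTTP status code of the response

def summarize(records, window_seconds):
    """Return per-service RPS and success rate over one time window."""
    counts = {}
    for r in records:
        c = counts.setdefault(r.service, {"total": 0, "ok": 0})
        c["total"] += 1
        # Assumption: anything below 500 counts as a successful response.
        if r.status < 500:
            c["ok"] += 1
    return {
        service: {
            "rps": c["total"] / window_seconds,
            "success_rate": c["ok"] / c["total"],
        }
        for service, c in counts.items()
    }
```

A monitoring tool could evaluate a summary like this per window and alert when a service’s success rate drops below a chosen threshold, which covers the reactive side; trends in RPS over time help with the proactive side.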

b. Edge computing: This distributed computing paradigm brings computation and data storage closer to where they are needed, improving response time and saving bandwidth. That is, we would deploy clusters to multiple edge locations while keeping them secure and retaining control over those microservices clusters from our end. While this is being implemented, the infra team should be ready to support the engineering team. I think we will sort it out when the system gets a bit more mature.

5. Who should implement a service mesh? Is it only for those who have microservices as their architecture pattern?

A service mesh is a proxy pattern, built on the sidecar/ambassador pattern. So a service mesh is not just for microservices; it can be implemented for multiple patterns, including legacy architectures. If you have an architecture where the application’s service should not handle I/O interaction directly and it should instead happen via a proxy, i.e. you regulate the system via a proxy, then you can think of using a service mesh.
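The proxy idea can be sketched in a few lines: the application handler contains only business logic, while a wrapper in front of it decides who may call and records what happened. This is a toy, in-process illustration under assumed names (`SidecarProxy`, `allowed_callers`), not how a real sidecar such as Envoy is implemented.

```python
class SidecarProxy:
    """Sits in front of a service handler and regulates access to it."""

    def __init__(self, handler, allowed_callers):
        self.handler = handler                    # the app's own logic
        self.allowed_callers = set(allowed_callers)
        self.request_log = []                     # minimal telemetry

    def handle(self, caller, request):
        # Policy is enforced here, so the handler never sees denied calls.
        if caller not in self.allowed_callers:
            self.request_log.append((caller, "denied"))
            return {"status": 403, "body": "caller not allowed"}
        body = self.handler(request)
        self.request_log.append((caller, "ok"))
        return {"status": 200, "body": body}
```

Because the handler knows nothing about callers or logging, the same wrapper could sit in front of a legacy service just as easily as a microservice, which is the point made above.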

Finally,

Tinkering with a running production system is not an easy task. Engineering changes such as performance improvements and infra changes are INVISIBLE, and the work might not appear to add much to the product from an external/user point of view. Still, standing for the I in the WINES framework followed by Kissflow, such invisible changes are also taken into consideration, like this Istio service mesh rollout.

See you soon in the next blog :)

Dinesh Kumar P, writer for CTOtalk
Product @Kissflow | Microsoft MVP - Data Platform | Low code & No code passionate