Google Cloud DevOps Series: Observability with SRE principles
Google Cloud DevOps Series: Part-5
Welcome to Part 5 of the Google Cloud DevOps series. You can find the complete series here.
As Samajik becomes more agile and increases its development velocity, the Samajik DevOps team wants to respond to urgent issues that impact customers, rather than reacting to minor issues and being woken up in the middle of the night. To accomplish this objective, many organisations are adopting the SRE model defined by Google.
GKE Platform Observability-Logging and Monitoring (Demo)
In the earlier parts of this series, we created multiple GKE clusters. Let’s explore observability techniques using the Production cluster.
One of the key benefits of ‘Google Cloud Operations’ is out-of-the-box collection of system metrics, logs, and rich context without any agent deployment.
Salient Features:
- Key metrics for GKE clusters, nodes, and namespaces without installing agents
- Default collection of application logs
- Metadata and relationships among GKE entities
Navigate to Monitoring under Operations and click on Dashboards. This page shows dashboards for all Google Cloud services.
In the Dashboards section, click on GKE.
Navigate to the Services section. This ‘services overview’ page lists the services, along with a summary of services that have alerts firing on their SLOs and services that have exhausted their error budgets. Observe that currently there are no SLOs defined for the services.
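To make the idea of an error budget concrete, here is a minimal Python sketch of the arithmetic behind it. The `Slo` class, the helper names, and the numbers (a 99.5% goal over 28 days) are illustrative assumptions, not values taken from the Samajik setup or from the Cloud Monitoring API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical container for an SLO; not a Cloud Monitoring type."""
    goal: float        # target fraction of good events, e.g. 0.995
    period_days: int   # compliance period in days


def error_budget_fraction(slo: Slo) -> float:
    """The error budget is whatever the goal leaves over: 1 - goal."""
    return 1.0 - slo.goal


def allowed_bad_minutes(slo: Slo) -> float:
    """Minutes of 'badness' the budget tolerates over the whole period."""
    total_minutes = slo.period_days * 24 * 60
    return total_minutes * error_budget_fraction(slo)


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = total * error_budget_fraction(slo)
    actual_bad = total - good
    if allowed_bad == 0:
        # A 100% goal leaves no budget at all.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = Slo(goal=0.995, period_days=28)  # assumed example values
    print(f"Allowed bad minutes: {allowed_bad_minutes(slo):.1f}")
    print(f"Budget remaining: {budget_remaining(slo, good=9990, total=10000):.2f}")
```

A service with a negative remaining budget is what the overview page would report as out of error budget.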
Let’s define the service first. Click on the DEFINE SERVICE option and select a service, for example the ‘frontend’ service.
Before we can set an SLO, we need to identify a Service Level Indicator (SLI) on which to base it. Click on the ‘frontend’ service and select the ‘Create SLO’ option. Follow the steps given below.
Step-1: Set your Service Level Indicator by selecting either the Request-based or the Window-based option, and then click on Continue.
Step-2: Define the Service Level Indicator details by selecting the appropriate Performance metric, e.g. ‘kubernetes.io/container/restart_count’.
Step-3: Define your Service Level Objective by selecting the ‘Compliance period’ and the ‘Performance goal’.
Step-4: Review the settings and create the SLO.
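The difference between the Request-based and Window-based options in Step-1 is easier to see with numbers. The sketch below is plain Python with made-up counts. The function names and the 0.95 per-window threshold are assumptions for illustration, and it does not call any Cloud Monitoring API.

```python
from typing import Sequence, Tuple


def request_based_sli(good: int, total: int) -> float:
    """Request-based: ratio of good requests to all requests in the period."""
    if total == 0:
        return 1.0  # no requests means nothing went wrong
    return good / total


def window_based_sli(
    windows: Sequence[Tuple[int, int]], threshold: float
) -> float:
    """Window-based: ratio of good windows to all windows.

    Each window is a (good, total) pair; a window counts as good when its
    own good/total ratio is at least `threshold`.
    """
    if not windows:
        return 1.0
    good_windows = sum(
        1 for good, total in windows
        if request_based_sli(good, total) >= threshold
    )
    return good_windows / len(windows)


def meets_slo(sli: float, goal: float) -> bool:
    """The SLO is met when the measured SLI reaches the performance goal."""
    return sli >= goal


if __name__ == "__main__":
    # Four hypothetical measurement windows of 100 requests each.
    windows = [(100, 100), (90, 100), (99, 100), (100, 100)]
    total_good = sum(g for g, _ in windows)
    total_reqs = sum(t for _, t in windows)

    print(request_based_sli(total_good, total_reqs))   # 389 / 400
    print(window_based_sli(windows, threshold=0.95))   # 3 good windows of 4
```

The same traffic gives different results under the two options: one bad window barely moves the request-based ratio, but it costs a full quarter of the window-based one. That is why the choice in Step-1 matters before a performance goal is picked in Step-3.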
Navigate to the Logging section under Operations. Logs for the desired microservice can be explored by entering a query in the search section.
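As a rough illustration of what such a query can look like, the Python helper below assembles a Logging query string for a GKE container. The helper itself is hypothetical, and the cluster name ‘production’, the namespace ‘default’, and the container name ‘frontend’ are assumed values. The result is meant to be pasted into the query box.

```python
from typing import Optional

# Severity levels accepted by Cloud Logging, lowest to highest.
SEVERITIES = (
    "DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
    "ERROR", "CRITICAL", "ALERT", "EMERGENCY",
)


def build_log_filter(
    container: str,
    cluster: Optional[str] = None,
    namespace: Optional[str] = None,
    min_severity: Optional[str] = None,
) -> str:
    """Build a Logging query for one container's logs (hypothetical helper)."""
    clauses = ['resource.type="k8s_container"']
    if cluster:
        clauses.append(f'resource.labels.cluster_name="{cluster}"')
    if namespace:
        clauses.append(f'resource.labels.namespace_name="{namespace}"')
    clauses.append(f'resource.labels.container_name="{container}"')
    if min_severity:
        if min_severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {min_severity}")
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Assumed names; replace them with the values from your own cluster.
    print(build_log_filter(
        "frontend",
        cluster="production",
        namespace="default",
        min_severity="ERROR",
    ))
```

Narrowing by severity first is usually a quick way to separate urgent, customer-impacting errors from routine noise.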
You can refer to the SRE implements DevOps Series on YouTube to understand more about SRE principles.
Coming up…
In this blog, we learned about SRE principles and how to implement them using Cloud Operations tools. Guhan was impressed with the DevOps philosophy and Google's SRE practice. Guhan and his team at Samajik implemented not just CI/CD and the developer workflow but observability as well. Stay tuned for the next discussion, which will cover agility with cost optimisation…
Contributors: Shijimol A K, Dhandus, Anchit Nishant, Tushar Gupta
Update: You can read Part-6 here.