Choreo observability for improved developer productivity
How Choreo observability helps developers to build robust applications
One major challenge of distributed systems is observability. If you are developing a system with tens, hundreds, or thousands of services, implementing observability in each service so that any failure scenario can be troubleshot is a tedious task. Developers need to add different levels of logging (e.g., debug, info, warning, fatal) and write code to publish the logs to log monitoring systems. This takes a considerable amount of developer time that would otherwise be spent on business-critical implementations.
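To make that manual effort concrete, here is a minimal sketch in Python of the kind of boilerplate each service would carry without platform support. It is illustrative only: the `ListHandler` class (an in-memory stand-in for a publisher to a log monitoring system), the `build_logger` helper, and the `process_order` function are all hypothetical names and are not part of Choreo. Note that Python's standard `logging` module names the "fatal" level `CRITICAL`.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a publisher to a log monitoring system."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # A real handler would ship the record over the network instead.
        self.records.append((record.levelname, record.getMessage()))


def build_logger(service_name, handler):
    """Wire up a per-service logger; every service repeats this setup."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.propagate = False  # keep this sketch's output out of the root logger
    return logger


def process_order(logger, order):
    """Hypothetical business function with hand-written log statements."""
    logger.debug("received order %s", order["id"])
    if order["quantity"] <= 0:
        logger.warning("order %s has non-positive quantity", order["id"])
        return False
    try:
        total = order["quantity"] * order["unit_price"]
    except KeyError:
        logger.critical("order %s is missing unit_price", order["id"])
        return False
    logger.info("order %s processed, total=%s", order["id"], total)
    return True
```

Even in this small function, roughly half of the lines exist only for logging, and the same setup has to be repeated and maintained in every service.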
Choreo treats observability as a first-class feature of any enterprise system. It therefore has a built-in observability feature that is enabled by default for each application that you develop. Choreo offers details such as throughput, latency, and logs from the platform itself, without any manual intervention from developers.
Choreo frees developers from the tasks of setting up log monitoring systems, adding log entries, and publishing logs from the code. Instead, it provides a built-in observability mechanism that captures the logs from the underlying infrastructure, monitors and analyzes them, and presents the details in a user-friendly manner.
Choreo observability provides details such as throughput, latency, log entries, and performance breakdown, along with direct access to the log files. Developers can troubleshoot errors and failures in the production environment itself by analyzing these details and the log files. In a typical enterprise setup, developers spend a lot of time recreating production failures in pre-production environments because they are hesitant to touch the production systems. With Choreo, developers need not worry about this, since availability levels are guaranteed by the platform itself.
The figure below depicts the observability interface of Choreo for a given application.
As shown in the preceding figure, the platform observes the applications that you deploy in Choreo without requiring you to write any additional code. Once you choose the program that you need to troubleshoot or debug, you will see different views of the observability parameters on the right-hand side of the interface.
Throughput and Latency view
This view provides a graphical representation of the throughput in transactions per second (TPS), plotted against time in one graph with successful and failed requests shown separately. Another graph shows the corresponding latency of the requests against time. These graphs help developers identify failure scenarios. In addition to these two graphs, a section at the bottom shows the log entries of the application. If you have added extra logging for debugging purposes, this view helps considerably in identifying the root cause of a failure. The figure below depicts this view for a sample application.
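As a rough illustration of what the two graphs in this view represent, the following Python sketch groups request records into fixed time buckets and derives the successful TPS, failed TPS, and average latency for each bucket. It does not show how Choreo computes these values internally; the `Request` record and the `summarize` function are hypothetical names used only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float   # seconds since an arbitrary reference point
    latency_ms: float  # time taken to serve the request
    success: bool      # True for a successful request, False for a failed one


def summarize(requests, bucket_seconds=1):
    """Return per-bucket throughput (split by outcome) and average latency."""
    buckets = defaultdict(lambda: {"success": 0, "failed": 0, "latencies": []})
    for r in requests:
        # Align each request to the start of its time bucket.
        start = int(r.timestamp // bucket_seconds) * bucket_seconds
        bucket = buckets[start]
        bucket["success" if r.success else "failed"] += 1
        bucket["latencies"].append(r.latency_ms)

    summary = {}
    for start in sorted(buckets):
        bucket = buckets[start]
        summary[start] = {
            "success_tps": bucket["success"] / bucket_seconds,
            "failed_tps": bucket["failed"] / bucket_seconds,
            "avg_latency_ms": sum(bucket["latencies"]) / len(bucket["latencies"]),
        }
    return summary
```

A bucket in which `failed_tps` rises while `avg_latency_ms` also climbs is the kind of pattern that points to a failure scenario worth investigating in the log entries.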
Another view in the observability section shows the details of every log entry along with the system and application metrics. It combines the log entries with metrics such as:
- Error count
- CPU usage
- Memory usage
By analyzing these parameters, you can get a good understanding of where things have gone wrong and take the necessary measures to fix the problem. The figure below depicts this view with a sample application.
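The idea of reading log entries side by side with metrics can be sketched as a simple join on timestamps. The Python example below is a minimal sketch under stated assumptions, not Choreo's implementation: the tuple layouts and the `attach_metrics` and `error_count` functions are hypothetical, and each log entry is paired with the most recent metric sample taken at or before it.

```python
from bisect import bisect_right


def attach_metrics(log_entries, metric_samples):
    """Pair each log entry with the latest metric sample at or before it.

    log_entries:    list of (timestamp, level, message)
    metric_samples: list of (timestamp, cpu_percent, memory_mb), sorted by time
    """
    sample_times = [sample[0] for sample in metric_samples]
    combined = []
    for ts, level, message in log_entries:
        # Index of the last sample whose timestamp is <= the log timestamp.
        idx = bisect_right(sample_times, ts) - 1
        sample = metric_samples[idx] if idx >= 0 else None
        combined.append({
            "timestamp": ts,
            "level": level,
            "message": message,
            "cpu_percent": sample[1] if sample else None,
            "memory_mb": sample[2] if sample else None,
        })
    return combined


def error_count(log_entries):
    """Count the log entries recorded at the ERROR level."""
    return sum(1 for _, level, _ in log_entries if level == "ERROR")
```

With the data joined this way, an error message that coincides with a spike in CPU or memory usage stands out immediately, which is the kind of correlation this view is meant to surface.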
If you are interested in learning how to troubleshoot an example application with the Choreo observability feature, refer to the following article.
Performing a Root Cause Analysis (RCA) with Choreo observability
System-level monitoring with DevOps portal
In addition to these application-specific observability features, Choreo also provides a DevOps portal to monitor the system-level metrics of the applications. You can find more details at the following link.