Monitoring Serverless Applications with Epsagon

Tech@ProSiebenSat.1
ProSiebenSat.1 Tech Blog

by Daniele Frasca

Some say that “you cannot observe without monitoring”. You can go into production without monitoring in place, but it will make your life extremely complicated when you have to track down a problem.

Considering that almost everything around us runs on software and that competition is remarkably high in every sector, we cannot allow our applications to underperform or carry unresolved issues, because the consequences can be very costly.

All systems fail, and all software has bugs. Since application performance monitoring is a crucial part of development, we should aim for a solution that surfaces issues before they become problems and lets us deal with them. No one wants to be woken up in the middle of the night by an unexpected situation that could have been spotted by observing the application during development.

Selecting the right tool is essential for your business. Bugs are expensive, and finding them quickly will save you a lot of time, money, and mental health. We have all been in a situation where we jump from one log to another trying to put all the pieces together, and we all know it is no fun when you have a production bug and your customers are facing disruptions.

What should we look for?

Before shopping around, we made heavy use of what AWS offers. This was essential: every product out there has its strengths and weaknesses, and to choose successfully we first had to understand the needs of our application and the limitations of the existing tools. For us, the most important points are:

  1. Alert system
  2. Reports
  3. User Experience
  4. End-to-end tracing with multiple accounts/teams
  5. Search
  6. Service Maps

AWS observability offers multiple services built around Amazon CloudWatch, AWS X-Ray and CloudWatch Insights:

• Logging (why):

  • By default, logs are not centralised, and you end up with many log groups.
  • CloudWatch Logs Insights enables you to search and analyse your log data interactively, but a single query can cover at most 20 log groups.

• Metrics (what):

  • By default, several AWS services provide free metrics for their resources.
  • Amazon CloudWatch can display the metrics of all the services in your account.

• Tracing (where):

  • Distributed tracing with X-Ray helps you find out where failures happen, and it also gives you excellent visibility into performance.

• Insights:

  • Collects and aggregates Lambda function runtime performance metrics and logs for your serverless applications.
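Because logs end up spread across many log groups and a single Logs Insights query is capped at 20 of them, searching a larger application means splitting the groups into batches and merging the results yourself. The sketch below shows only that bookkeeping, in Python; `run_query` is a hypothetical stand-in for whatever actually executes a query (for example a wrapper around an AWS SDK), and the limit is the one quoted above, which may have changed since.

```python
from typing import Callable, Iterable, Iterator, List

# Per-query limit quoted in the article; check the current
# CloudWatch Logs Insights quotas before relying on it.
MAX_LOG_GROUPS_PER_QUERY = 20


def batch_log_groups(log_groups: Iterable[str],
                     size: int = MAX_LOG_GROUPS_PER_QUERY) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` log group names."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for name in log_groups:
        batch.append(name)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # leftover groups that did not fill a full batch


def query_all(log_groups: Iterable[str], query: str,
              run_query: Callable[[List[str], str], List[dict]]) -> List[dict]:
    """Run the same query once per batch and concatenate the results.

    `run_query` is hypothetical: it receives one batch of log group
    names plus the query string and returns the matching rows.
    """
    results: List[dict] = []
    for batch in batch_log_groups(log_groups):
        results.extend(run_query(batch, query))
    return results
```

With 45 function log groups, for example, this issues three queries over 20, 20 and 5 groups. Note that results from separate queries are simply concatenated, so any sorting or aggregation across batches has to be redone afterwards.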

In theory, you have everything you need, but you must explicitly enable both tracing (AWS X-Ray) and Lambda Insights, and you may be surprised to find out that AWS X-Ray (at the time of writing) cannot trace a request end-to-end.

For example, if you want to see the connection between your Lambda function and a service like SQS, you have to instrument the code manually.
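To give an idea of what that manual work involves, here is a minimal sketch that does not use the X-Ray SDK at all: it swaps in a plain correlation ID, carried in an SQS message attribute, so the producer's and consumer's logs can be joined. The attribute name `correlation-id` and both helper functions are hypothetical; the dictionary shapes follow the SQS SendMessage parameters and the Lambda SQS event record format.

```python
import uuid
from typing import Any, Dict, Optional

# Hypothetical attribute name; producer and consumer only need to agree on it.
TRACE_ATTRIBUTE = "correlation-id"


def build_send_params(queue_url: str, body: str,
                      correlation_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for an SQS SendMessage call that carry a
    correlation ID, generating a new one if none is passed in."""
    cid = correlation_id or str(uuid.uuid4())
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageAttributes": {
            TRACE_ATTRIBUTE: {"DataType": "String", "StringValue": cid},
        },
    }


def correlation_id_from_record(record: Dict[str, Any]) -> Optional[str]:
    """Read the correlation ID from one record of a Lambda SQS event.
    Lambda delivers message attributes with camelCase keys."""
    attr = record.get("messageAttributes", {}).get(TRACE_ATTRIBUTE)
    return attr.get("stringValue") if attr else None
```

On the producer side the result would be passed to the SDK, e.g. `sqs.send_message(**build_send_params(url, body, cid))` with boto3; on the consumer side each record's ID is logged so both halves of the request can be found together. Every producer and consumer has to repeat this, which is exactly the per-service effort the article is complaining about.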

Let’s take a simple case:

X-Ray will show you two traces like this:

Because the two traces are not connected, you cannot follow a single request while debugging, so you have no idea which request caused the problem.

I will not go through all the other limitations of this service, but, sadly, AWS X-Ray did not fit our purpose, so we moved to Epsagon.

Epsagon stitches together metrics, logs and traces, which increases observability, and adopting it is almost painless.

Onboarding is straightforward: you deploy Epsagon's CloudFormation stack into your AWS account with a single click.

Once your account is integrated, you can auto-trace your Lambda functions, but I would not use this option: with CI/CD, you want to stay in control of your integration. Instead, a few steps are enough to get you ready to go.

With CloudFormation, you can configure your Lambda functions like this:

In the Lambda code, you only need to initialise the library and wrap your function:
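What "wrapping" buys you can be illustrated without the library itself. The following is not Epsagon's API; it is a hypothetical, minimal decorator (`trace_handler` and `RECORDED` are invented names) that shows what a tracing wrapper does conceptually: it times each invocation, captures the triggering event and any error, and re-raises so the function behaves exactly as before.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real tracing library would ship this
# data to its backend instead of keeping it in a list.
RECORDED: List[Dict[str, Any]] = []


def trace_handler(handler: Callable[[Any, Any], Any]) -> Callable[[Any, Any], Any]:
    """Wrap a Lambda-style handler and record the event, outcome and
    duration of every invocation."""
    @functools.wraps(handler)
    def wrapper(event: Any, context: Any) -> Any:
        span: Dict[str, Any] = {"handler": handler.__name__, "event": event}
        start = time.perf_counter()
        try:
            result = handler(event, context)
            span["error"] = None
            return result
        except Exception as exc:
            span["error"] = repr(exc)
            raise  # never swallow the function's own error
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            RECORDED.append(span)
    return wrapper


@trace_handler
def handler(event, context):
    return {"statusCode": 200}
```

Because the wrapper sits around the whole invocation, it sees the incoming payload and the final outcome without any change to the business logic, which is why a single wrap per function is enough.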

If you are using SQS, SNS, EventBridge, etc., Epsagon traces them automatically without you adding instrumentation for each service, and you end up with a Service Map that connects all your services like this:
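Conceptually, a service map is a graph derived from trace data: each span records which service did the work and which span triggered it, so following the parent links yields service-to-service edges. A minimal sketch under that assumption; the `Span` shape and `build_service_map` are hypothetical and not Epsagon's data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, TypedDict


class Span(TypedDict):
    """Hypothetical minimal span: which service handled the work and
    which span (if any) caused it."""
    span_id: str
    parent_id: Optional[str]
    service: str


def build_service_map(spans: Iterable[Span]) -> Dict[str, Set[str]]:
    """Return caller -> callees edges between services, derived from
    parent/child relationships between spans."""
    spans = list(spans)
    service_of = {s["span_id"]: s["service"] for s in spans}
    edges: Dict[str, Set[str]] = defaultdict(set)
    for s in spans:
        parent = s["parent_id"]
        if parent is None or parent not in service_of:
            continue  # root span, or the parent span was not captured
        caller = service_of[parent]
        if caller != s["service"]:  # ignore calls within the same service
            edges[caller].add(s["service"])
    return dict(edges)
```

The map is only as complete as the parent links: if one hop (for example a queue) is not instrumented, the chain breaks there, which is the same gap described above for X-Ray.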

A Service Map is not helpful if you cannot search or filter it. A convenient feature in Epsagon is the concept of tagging traces.

Tagging lets you enrich the data collected as part of a trace, for two main reasons:

• To pinpoint a specific event in our application.

• To detect trends based on a unique business dimension.

Tagging adds context to an existing trace. There are two ways to do it, depending on what you want to achieve:

  1. In code, by calling epsagon.label('key', 'value').
  2. Indexing custom tags from your trace.

You can then search traces and filter them by your custom tags.
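The two reasons for tagging, pinpointing a specific event and spotting trends along a business dimension, map onto two simple operations: filtering by tag values and counting by a tag. The sketch below is a hypothetical in-memory model (`Trace`, `filter_traces`, `count_by_tag` are invented names), with a `label` method that only mirrors the idea of `epsagon.label('key', 'value')`, not its implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Trace:
    """Hypothetical trace record with free-form string tags."""
    trace_id: str
    tags: Dict[str, str] = field(default_factory=dict)

    def label(self, key: str, value: str) -> None:
        # Attach business context to this trace (same idea as epsagon.label).
        self.tags[key] = value


def filter_traces(traces: Iterable[Trace], wanted: Dict[str, str]) -> List[Trace]:
    """Pinpoint events: keep traces whose tags match every key/value pair."""
    return [t for t in traces
            if all(t.tags.get(k) == v for k, v in wanted.items())]


def count_by_tag(traces: Iterable[Trace], key: str) -> Counter:
    """Spot trends: count traces per value of one business dimension."""
    return Counter(t.tags[key] for t in traces if key in t.tags)
```

Filters take a plain dictionary rather than keyword arguments so that tag keys such as `customer-id`, which are not valid Python identifiers, still work.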

Another thing I have found extremely useful is seeing the payload of the message that triggered the Lambda function, which you do not get with AWS X-Ray.

Finally, I would like to say a few words about the alert system. Compared to the AWS options, it is much easier to use, and it offers better options out of the box, such as visibility into timeouts and alerts when a Lambda function is close to reaching a configured limit such as memory. There are four types of alerts, and the Lambda alerts cover some interesting events.

You can easily configure what you want to be alerted on and receive the alerts through multiple channels.
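The "close to the configured limit" alerts boil down to comparing what an invocation used against what the function is configured with. A minimal sketch of that check; the `Invocation` fields are hypothetical (the duration and max memory used could come from a Lambda REPORT log line, the timeout and memory size from the function configuration), and the 90% threshold is an assumption, not an Epsagon default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Invocation:
    """Hypothetical per-invocation numbers plus the function's limits."""
    duration_ms: float
    max_memory_used_mb: int
    timeout_ms: int
    memory_size_mb: int


def near_limit_alerts(inv: Invocation, ratio: float = 0.9) -> List[str]:
    """Return alert messages when usage reaches `ratio` of a configured
    limit. The 0.9 default is an illustrative assumption."""
    alerts: List[str] = []
    if inv.duration_ms >= inv.timeout_ms:
        alerts.append("timeout")
    elif inv.duration_ms >= ratio * inv.timeout_ms:
        alerts.append("close to timeout")
    if inv.max_memory_used_mb >= ratio * inv.memory_size_mb:
        alerts.append("close to memory limit")
    return alerts
```

Alerting before the hard limit is what makes this useful: an invocation at 90% of its timeout still succeeds, so without such a check the first visible signal would be a failure.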

Conclusion

We are happy with our choice: Epsagon is currently ahead of the AWS offerings, although, for transparency, it does have a few issues (by the way, Epsagon does not pay us). As I mentioned earlier in this article, every solution out there has its strengths and weaknesses.

We are in constant contact with their support team, giving them feedback so that the product keeps improving for both parties.

Thanks to better observability than our previous solution offered, we can now find issues earlier, before they become a problem. And because the tool is easier to use, we can quickly decide where and how to improve our system.
