Serverless Monitoring: How to Funnel Lambda Errors to Slack

Listening to CloudWatch logs to alert developers about errors in Lambda functions

Ronny Roeller
NEXT Engineering
2 min read · Aug 13, 2023

The Challenge: Monitor a Dynamic Lambda Ecosystem

At NEXT, we pride ourselves on our expansive serverless ecosystem, which comprises over 100 Lambda functions. This architecture is dynamic, but it makes efficient error monitoring a challenge. It’s unrealistic to expect our Ops team to be familiar with the nuances of every Lambda function. It’s therefore crucial to promptly notify developers on Slack when something within their domain fails.

The Solution: Dynamically Managed Subscriptions to CloudWatch

We’ve devised a three-tiered solution to address this:

  1. Proactive monitoring with Lambda: We implemented a dedicated Lambda function that continuously monitors errors across our stack via CloudWatch. This function not only identifies errors but also promptly relays notifications to our developers on Slack.
  2. Adaptive design for change: Our ecosystem keeps evolving, with new Lambda functions introduced regularly, so adaptability is key. Whenever a new function is deployed, the subscription filters that feed our error-notification Lambda function are updated automatically, so the new function is monitored from the start.
  3. Daily health resync: We believe in double-checking. To that end, we’ve incorporated a daily resync powered by Amazon EventBridge. Think of it as a routine checkup to ensure we never miss an error.
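The resync step boils down to a diff: which Lambda log groups do not yet carry the error-notification filter? Below is a minimal sketch of that comparison as a pure function. The `LogGroupState` shape, the function name, and the idea of skipping the notifier's own log group (to avoid it alerting on its own errors) are assumptions for illustration, not our production code. The `/aws/lambda/` prefix is the default naming for Lambda log groups.

```typescript
// Hypothetical shape for illustration: one entry per CloudWatch log group.
interface LogGroupState {
  logGroupName: string;
  // Names of the subscription filters currently attached to this log group.
  filterNames: string[];
}

const FILTER_NAME = 'SendErrorNotificationFilter';
const LAMBDA_LOG_PREFIX = '/aws/lambda/';

// Returns the names of Lambda log groups that still need the filter.
function findLogGroupsMissingFilter(
  groups: LogGroupState[],
  notifierLogGroupName: string,
): string[] {
  return groups
    // Only Lambda log groups are relevant here.
    .filter((g) => g.logGroupName.startsWith(LAMBDA_LOG_PREFIX))
    // Assumption: skip the notifier's own log group to avoid a feedback loop.
    .filter((g) => g.logGroupName !== notifierLogGroupName)
    // Keep only groups where the filter is not attached yet.
    .filter((g) => !g.filterNames.includes(FILTER_NAME))
    .map((g) => g.logGroupName);
}

// Example run with made-up log groups.
const exampleGroups: LogGroupState[] = [
  { logGroupName: '/aws/lambda/checkout', filterNames: [FILTER_NAME] },
  { logGroupName: '/aws/lambda/new-feature', filterNames: [] },
  { logGroupName: '/aws/lambda/send-error-notification', filterNames: [] },
  { logGroupName: '/aws/apigateway/access-logs', filterNames: [] },
];
console.log(
  findLogGroupsMissingFilter(exampleGroups, '/aws/lambda/send-error-notification'),
);
```

The same function could serve both the deploy-time update and the daily resync: each run lists the log groups, computes the gap, and creates a subscription filter for every name returned.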

Prevent error fatigue by distinguishing genuine anomalies from expected errors

The risk with such proactive error monitoring is a flood of notifications, leading to “error fatigue”. Some errors are intentional: Cognito’s pre-signup trigger, for example, has to throw an error whenever a signup should be prevented. These aren’t system glitches but purposeful blocks.

To filter out such “expected errors”, we’ve introduced a special error class marked with a distinct identifier. This lets our alerting Lambda function differentiate genuine anomalies from the routine ones.

Here’s a look at our error class:

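class ExpectedError extends Error {
  // Marker string that the subscription filter pattern excludes.
  public marker = 'EXPECTED_ERROR';

  constructor(message: string) {
    super(message);
    // Restore the prototype chain so `instanceof ExpectedError` works
    // even when compiling to ES5.
    Object.setPrototypeOf(this, ExpectedError.prototype);
  }
}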
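To illustrate how such an error might be thrown and logged, here is a small sketch. The class is repeated so the snippet runs on its own. `assertSignupAllowed` and `formatErrorLogLine` are hypothetical helpers, and the log line layout is an assumption: the only requirement is that the marker string ends up in the logged text next to `ERROR`.

```typescript
class ExpectedError extends Error {
  public marker = 'EXPECTED_ERROR';

  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, ExpectedError.prototype);
  }
}

// Hypothetical pre-signup check: rejects emails outside an allowed domain.
function assertSignupAllowed(email: string, allowedDomain: string): void {
  if (!email.endsWith(`@${allowedDomain}`)) {
    // Blocking the signup is intended behavior, so it is an expected error.
    throw new ExpectedError(`Signup blocked for ${email}`);
  }
}

// Hypothetical helper: builds the log line and includes the marker
// only when the error is an ExpectedError.
function formatErrorLogLine(error: unknown): string {
  const marker = error instanceof ExpectedError ? ` ${error.marker}` : '';
  const message = error instanceof Error ? error.message : String(error);
  return `ERROR${marker} ${message}`;
}

try {
  assertSignupAllowed('someone@example.org', 'example.com');
} catch (error) {
  console.log(formatErrorLogLine(error));
}
```

An ordinary `Error` passed to `formatErrorLogLine` produces a line without the marker, so it would still trigger a notification.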

Our CloudWatch Logs subscription filter can then ignore these expected errors:

{
  destinationArn: sendErrorNotificationLambdaArn,
  filterName: 'SendErrorNotificationFilter',
  // Match log events containing ERROR but not EXPECTED_ERROR.
  filterPattern: 'ERROR -EXPECTED_ERROR',
  logGroupName,
}
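On the receiving side, CloudWatch Logs hands the matching log events to the destination Lambda as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The sketch below shows one way to unpack that payload and turn it into a Slack-ready text. The interfaces list only the fields used here, and `toSlackText` and its message layout are assumptions for illustration. Posting to Slack itself is left out.

```typescript
import { Buffer } from 'node:buffer';
import { gunzipSync, gzipSync } from 'node:zlib';

// Event shape delivered by a CloudWatch Logs subscription.
interface CloudWatchLogsEvent {
  awslogs: { data: string };
}

// Subset of the decoded payload used in this sketch.
interface DecodedLogs {
  logGroup: string;
  logStream: string;
  logEvents: { id: string; timestamp: number; message: string }[];
}

// Base64-decode, gunzip, and parse the payload.
function decodeLogsEvent(event: CloudWatchLogsEvent): DecodedLogs {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(gunzipSync(compressed).toString('utf8')) as DecodedLogs;
}

// Hypothetical formatting: one header line plus one line per log event.
function toSlackText(decoded: DecodedLogs): string {
  const lines = decoded.logEvents.map((e) => e.message.trim());
  return [`Errors in ${decoded.logGroup}:`, ...lines].join('\n');
}

// Local round trip with a made-up payload, encoded the way the event arrives.
const samplePayload: DecodedLogs = {
  logGroup: '/aws/lambda/checkout',
  logStream: 'example-stream',
  logEvents: [{ id: '1', timestamp: 0, message: 'ERROR Payment failed\n' }],
};
const sampleEvent: CloudWatchLogsEvent = {
  awslogs: {
    data: gzipSync(Buffer.from(JSON.stringify(samplePayload))).toString('base64'),
  },
};
console.log(toSlackText(decodeLogsEvent(sampleEvent)));
```

Because the subscription filter already dropped the expected errors, the notifier needs no extra filtering logic of its own in this sketch.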

In conclusion, our quest to centralize error notifications has offered valuable insights. As we grow and evolve, we’re reminded of the importance of proactive error handling in ensuring the reliability of our serverless ecosystem.

Happy coding!

Ronny Roeller
NEXT Engineering

CTO at nextapp.co # Product discovery platform for high performing teams that bring their customers into every decision