Reporting AWS X-Ray Exceptions to Slack Using Serverless and TypeScript

Arseny Yankovsky
Sep 9, 2018 · 4 min read

Once you’ve set up application tracing with AWS X-Ray, the next logical step is to use it for monitoring.

Today we’re going to build a simple AWS Lambda function that reports exceptions to Slack. The Lambda will run every 5 minutes, fetch all traces with exceptions from the last 5 minutes, and then report them.

Example exception report

Getting Exceptions From AWS X-Ray

First, we’re going to get exceptions from AWS X-Ray. To do that, we will use the getTraceSummaries and batchGetTraces methods of the AWS.XRay class.

Let’s implement a getTraceIds function that returns the IDs of traces with errors for a given time period.

import { XRay } from 'aws-sdk'

// Returns the ids of all traces matching the filter expression
// within the last `timePeriod` milliseconds
const getTraceIds = async (xray: XRay, timePeriod: number): Promise<string[]> => {
  const params: XRay.GetTraceSummariesRequest = {
    EndTime: new Date(),
    StartTime: new Date(Date.now() - timePeriod),
    FilterExpression: 'Error',
  }

  let traceIds: string[] = []

  while (true) {
    const response = await xray.getTraceSummaries(params).promise()

    // Keep only the trace ids (Id is optional in the SDK typings)
    const ids = (response.TraceSummaries || [])
      .map(trace => trace.Id)
      .filter((id): id is string => id !== undefined)
    traceIds = traceIds.concat(ids)

    // The last page has no NextToken: it is undefined, not null
    if (!response.NextToken) {
      return traceIds
    }

    params.NextToken = response.NextToken
  }
}

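Since getTraceSummaries results are paginated, we keep requesting pages until a response comes back without a NextToken. Note that the SDK leaves NextToken undefined on the last page rather than setting it to null.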
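The NextToken loop is a pattern shared by many AWS APIs. As a hedged sketch, it can be factored into a small generic helper; the `paginate` function and the `Page` type below are hypothetical names, not part of the AWS SDK, and an in-memory fake stands in for getTraceSummaries so the example runs without AWS credentials.

```typescript
// One page of results plus an optional continuation token
// (hypothetical helper type, not an AWS SDK type)
interface Page<T> {
  items: T[]
  nextToken?: string
}

// Calls fetchPage with the previous token until a page comes back
// without one, then returns all items in the order they arrived
async function paginate<T>(
  fetchPage: (nextToken?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = []
  let token: string | undefined = undefined
  do {
    const page: Page<T> = await fetchPage(token)
    all.push(...page.items)
    token = page.nextToken
  } while (token) // undefined (or empty) token ends the loop
  return all
}

// Example: an in-memory fake standing in for getTraceSummaries
const fakePages: Record<string, Page<string>> = {
  first: { items: ['trace-1', 'trace-2'], nextToken: 'page-2' },
  'page-2': { items: ['trace-3'] },
}

paginate<string>(async token => fakePages[token ?? 'first'])
  .then(ids => console.log(ids)) // [ 'trace-1', 'trace-2', 'trace-3' ]
```

With the real client, fetchPage would wrap `xray.getTraceSummaries({ ...params, NextToken }).promise()` and map `TraceSummaries` to their ids and `NextToken` to `nextToken`.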

Once we have all the IDs, we can use batchGetTraces to fetch the full trace details. The getTraces function below uses getTraceIds to get the list of traces with errors and then requests their details.

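import { XRay } from 'aws-sdk'
import { chain } from 'lodash'

// AWS_REGION is set automatically in the Lambda runtime
const region = process.env.AWS_REGION

// Fetches full trace details for every trace with an error in the last 5 minutes
const getTraces = async () => {
  const xray = new XRay({ region, apiVersion: '2016-04-12' })

  const timePeriod = 60 * 1000 * 5 // 5 minutes in milliseconds

  const traceIds = await getTraceIds(xray, timePeriod)

  // batchGetTraces accepts at most 5 ids per call, so request chunks in parallel
  const traceResponses = await Promise.all(
    chain(traceIds)
      .chunk(5)
      .map(currentChunk => xray.batchGetTraces({ TraceIds: currentChunk }).promise())
      .value(),
  )

  // Merge all responses into one flat array of traces
  return chain(traceResponses)
    .flatMap(response => response.Traces || [])
    .value()
}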

We need to split the trace IDs into chunks before requesting traces because batchGetTraces accepts at most 5 trace IDs per call. Sending the chunked requests in parallel with Promise.all keeps the Lambda fast.
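If you’d rather not pull in lodash just for `chunk`, the splitting step takes a few lines of plain TypeScript. A minimal sketch; `chunkArray` is a hypothetical name, and the sample ids are made up:

```typescript
// Splits an array into consecutive chunks of at most `size` elements.
// For batchGetTraces the size would be 5.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new RangeError('size must be at least 1')
  }
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// 12 trace ids -> 3 parallel batchGetTraces calls (5 + 5 + 2)
const sampleIds = Array.from({ length: 12 }, (_, i) => `trace-${i}`)
console.log(chunkArray(sampleIds, 5).map(chunk => chunk.length)) // [ 5, 5, 2 ]
```

Each chunk maps to one request, so the number of parallel calls is the number of IDs divided by 5, rounded up.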

Forming Exception Report Messages

Now that we have all of our error traces from the API, it’s time to turn them into human-readable messages. Before we can do that, we need to extract the exception data from the traces. Let’s implement an extractExceptions function for that.

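import { XRay } from 'aws-sdk'
import { chain, flatMap, map } from 'lodash'

// Produces one object per exception found in a segment document
const extractExceptions = (traces: XRay.Trace[]) =>
  chain(traces)
    .flatMap(trace => flatMap(trace.Segments, (segment) => {
      // Segment documents arrive as JSON strings
      const document = JSON.parse(segment.Document as string)

      if (document.cause && document.cause.exceptions) {
        return map(document.cause.exceptions, exception => ({
          exception,
          segment,
          trace,
          document,
        }))
      }
      // Segments without exceptions yield undefined, removed by compact()
      return undefined
    }))
    .compact()
    .value()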

This function walks through the traces array looking for exceptions and returns a flat array of objects, each holding the exception itself along with the segment and trace it was reported in. Let’s call these objects exceptionData.
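extractExceptions only inspects the `cause` of each top-level segment document. In the X-Ray segment document format, exceptions can also be recorded on nested `subsegments` (and a parent’s `cause` may then be just a string ID pointing at the child’s exception), so some of them could be missed. A hedged sketch of a recursive walk, assuming documents follow that format; `collectExceptions` and the trimmed-down types are hypothetical, and the sample document is made up:

```typescript
// Minimal subset of the X-Ray segment document format used here
interface XRayStackFrame {
  path?: string
  line?: number
  label?: string
}

interface XRayException {
  id?: string
  message?: string
  type?: string
  stack?: XRayStackFrame[]
}

interface SegmentDocument {
  name?: string
  // Either an object with exceptions, or a string id referencing
  // an exception recorded on a child subsegment
  cause?: string | { exceptions?: XRayException[] }
  subsegments?: SegmentDocument[]
}

interface FoundException {
  exception: XRayException
  document: SegmentDocument
}

// Depth-first walk over a segment and all nested subsegments,
// returning each exception with the (sub)segment it was recorded on
function collectExceptions(doc: SegmentDocument): FoundException[] {
  const found: FoundException[] = []
  if (doc.cause && typeof doc.cause === 'object' && doc.cause.exceptions) {
    for (const exception of doc.cause.exceptions) {
      found.push({ exception, document: doc })
    }
  }
  for (const sub of doc.subsegments ?? []) {
    found.push(...collectExceptions(sub))
  }
  return found
}

// Example: an exception recorded two levels deep
const sampleDoc: SegmentDocument = JSON.parse(`{
  "name": "my-function",
  "subsegments": [{
    "name": "Invocation",
    "subsegments": [{
      "name": "DynamoDB",
      "cause": { "exceptions": [{ "message": "Requested resource not found" }] }
    }]
  }]
}`)
console.log(collectExceptions(sampleDoc).map(e => e.document.name)) // [ 'DynamoDB' ]
```

Used inside extractExceptions, `document` would then refer to the subsegment where the exception was recorded, so the Slack message would name that subsegment rather than the top-level segment.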

Now we can use these objects to build a human-readable message. We will split this into two functions: printStackTrace prints only the exception stack trace, and generateExceptionMessage uses it and adds some general information about the exception.

import { map } from 'lodash'

// Formats X-Ray stack frames ({ label, path, line }) one per line
const printStackTrace = (exception) => {
  return map(
    exception.stack,
    traceElement => ` at ${traceElement.label} (${traceElement.path}:${traceElement.line})`,
  ).join('\n')
}

// Slack message: bold segment name, then message and stack trace in a code block
const generateExceptionMessage = (exceptionData) => {
  return `Exception occurred in *${exceptionData.document.name}*\`\`\`
${exceptionData.exception.message}
${printStackTrace(exceptionData.exception)} \`\`\``
}

We use Slack message formatting (bold text and a code block), so the result is a message like the example exception report shown at the top of this article.
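To show the exact text the formatter produces, here is a dependency-free, typed restatement of the two functions (using Array.prototype.map instead of lodash) applied to a hypothetical exception. The segment name, message, paths and line numbers are made up for illustration.

```typescript
interface StackFrame {
  label: string
  path: string
  line: number
}

interface ExceptionData {
  document: { name: string }
  exception: { message: string; stack: StackFrame[] }
}

// One line per frame, in the same " at label (path:line)" shape
const printStackTrace = (exception: ExceptionData['exception']): string =>
  exception.stack
    .map(frame => ` at ${frame.label} (${frame.path}:${frame.line})`)
    .join('\n')

// Slack mrkdwn: *bold* segment name followed by a ``` code block
const generateExceptionMessage = (data: ExceptionData): string =>
  `Exception occurred in *${data.document.name}*\`\`\`
${data.exception.message}
${printStackTrace(data.exception)} \`\`\``

// Hypothetical sample, not real trace data
const sample: ExceptionData = {
  document: { name: 'orders-api' },
  exception: {
    message: 'Cannot read property id of undefined',
    stack: [
      { label: 'getOrder', path: 'src/orders.js', line: 42 },
      { label: 'handler', path: 'src/handler.js', line: 10 },
    ],
  },
}

console.log(generateExceptionMessage(sample))
```

The printed text has four lines: the header ending in the opening code fence, the exception message, and one line per stack frame, with the closing fence after the last frame.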

Sending Messages To Slack

Now that we have the exceptionData array and functions that build human-readable messages, it’s time to send them to Slack.

First, we need to create a Slack App. Once it exists, add the chat:write:bot permission to the app and install it to your Slack workspace. With the setup done, save the app’s OAuth access token; the Lambda will use it to post messages.

The last thing we need is the Slack channel ID. The easiest way to find it is to copy the link to the channel, which looks like this:

https://workspace.slack.com/messages/CCM2D5ZPC

The last part of the URL is the channel ID we need.
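If you’d rather not copy the ID out by hand, it is simply the last path segment of the link. A small sketch, assuming the link has the same shape as the example above; `channelIdFromLink` is a hypothetical helper, not part of any Slack SDK:

```typescript
// Returns the last path segment of a Slack channel link,
// e.g. https://workspace.slack.com/messages/CCM2D5ZPC -> CCM2D5ZPC
function channelIdFromLink(link: string): string {
  // Drop any query string or fragment, then split on '/'
  const withoutQuery = link.split(/[?#]/)[0]
  const segments = withoutQuery.split('/').filter(segment => segment.length > 0)
  // Expect at least ['https:', '<host>', ..., '<channel id>']
  if (segments.length < 3) {
    throw new Error(`No channel id found in ${link}`)
  }
  return segments[segments.length - 1]
}

console.log(channelIdFromLink('https://workspace.slack.com/messages/CCM2D5ZPC')) // CCM2D5ZPC
```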

Now that we have everything we need, let’s implement a sendMessages function that accepts an array of exceptionData objects and sends one message per exception to Slack.

// Older releases exported WebClient from '@slack/client'
import { WebClient } from '@slack/web-api'

// `region` is the process.env.AWS_REGION constant declared next to getTraces
const sendMessages = async (exceptionDatas) => {
  const web = new WebClient(process.env.SLACK_TOKEN)

  // Post one message per exception, all in parallel
  await Promise.all(exceptionDatas.map((exceptionData) => {
    // Deep link to the trace in the X-Ray console
    const traceUrl = `https://${region}.console.aws.amazon.com/xray/home?region=${region}#/traces/${exceptionData.trace.Id}`

    return web.chat.postMessage({
      icon_emoji: ':fish:',
      channel: process.env.SLACK_CHANNEL as string,
      text: generateExceptionMessage(exceptionData),
      as_user: false,
      attachments: [
        {
          fallback: `Check trace at ${traceUrl}`,
          actions: [
            {
              type: 'button',
              text: 'View Trace',
              url: traceUrl,
            },
          ],
        },
      ],
    })
  }))
}

Note that each message also gets a small View Trace button that opens the trace details in the AWS X-Ray console.
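The console link appears twice in each message (fallback text and button). A hypothetical helper that builds it in one place, using the same URL pattern as sendMessages; `traceConsoleUrl` is an illustrative name, encodeURIComponent is added as a precaution, and the trace ID below is a made-up sample:

```typescript
// Builds a link to a trace in the AWS X-Ray console,
// following the URL pattern used for the "View Trace" button
function traceConsoleUrl(region: string, traceId: string): string {
  return `https://${region}.console.aws.amazon.com/xray/home?region=${region}#/traces/${encodeURIComponent(traceId)}`
}

console.log(traceConsoleUrl('eu-west-1', '1-5b94dbb4-0f6b8e4f2d0a36e4d3e5c7a1'))
```

Inside sendMessages, both `fallback` and `url` could then call `traceConsoleUrl(region as string, exceptionData.trace.Id)`.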

Now that all the functions are in place, let’s combine them into a Lambda handler that runs every 5 minutes.

// Lambda entry point, triggered every 5 minutes by the schedule event
export const reportExceptions = async () => {
  try {
    const traces = await getTraces()

    const exceptions = extractExceptions(traces)

    console.log(`There are ${exceptions.length} exceptions`)

    if (exceptions.length > 0) {
      await sendMessages(exceptions)
    }
  } catch (e) {
    // Log and rethrow so the invocation is marked as failed
    console.log(e)
    throw e
  }
}

Let’s define our service using the Serverless Framework:

service: xray-monitor

provider:
  name: aws
  runtime: nodejs8.10
  region: eu-west-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "xray:*"
      Resource:
        - "*"

plugins:
  - serverless-plugin-typescript

functions:
  report:
    handler: src/handler.reportExceptions
    events:
      - schedule: rate(5 minutes)
    memorySize: 128
    environment:
      SLACK_TOKEN: PASTE_YOUR_SLACK_TOKEN_HERE
      SLACK_CHANNEL: PASTE_SLACK_CHANNEL_ID_HERE

The full source code is available on GitHub, along with instructions for setting up a reporter for your own project in about 10 minutes.


We have now set up an easy and convenient way to monitor exceptions in an application.

It’s easy to send the reports to email or a dashboard instead of Slack. The same approach could also be useful for monitoring slow execution times of performance-critical endpoints or processes.
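As a hedged sketch of the performance idea: X-Ray filter expressions also support numeric keywords such as `responsetime` (in seconds), so the getTraceIds loop could be pointed at slow requests by changing only FilterExpression. The `slowTraceParams` name and the 3-second threshold are illustrative assumptions, and the parameter type is declared locally so the sketch runs without the AWS SDK.

```typescript
// The subset of getTraceSummaries request fields used here,
// declared locally so the sketch has no aws-sdk dependency
interface TraceSummaryParams {
  StartTime: Date
  EndTime: Date
  FilterExpression: string
}

// Selects traces whose response time exceeded `thresholdSeconds`
// during the last `timePeriodMs` milliseconds
function slowTraceParams(
  thresholdSeconds: number,
  timePeriodMs: number,
  now: Date = new Date(),
): TraceSummaryParams {
  if (!(thresholdSeconds > 0)) {
    throw new RangeError('thresholdSeconds must be positive')
  }
  return {
    EndTime: now,
    StartTime: new Date(now.getTime() - timePeriodMs),
    FilterExpression: `responsetime > ${thresholdSeconds}`,
  }
}

const slowParams = slowTraceParams(3, 5 * 60 * 1000, new Date('2018-09-09T12:00:00Z'))
console.log(slowParams.FilterExpression) // responsetime > 3
console.log(slowParams.StartTime.toISOString()) // 2018-09-09T11:55:00.000Z
```

The rest of the pipeline (batchGetTraces, formatting, Slack) would stay the same; only the message text would need to describe latency instead of an exception.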

