Discovering Issues Visually in your Serverless Architecture with Thundra

I was at Serverless Days Boston last week, and I would like to give a big thanks to the organizing team for a great day! James Beswick spoke about their story of how they started to get rid of servers. He stated that you either start writing serverless with a lift and shift of an existing architecture or, with a greenfield project, you start with one giant function that does everything. Most probably, you will end up building functions that are triggered by API Gateway and write their output to DynamoDB.

After you step into the land of serverless, you will keep optimizing your architecture to remove single points of failure, to decrease the time spent on a transaction, to make your code more readable, and for many other reasons. You will start to split your architecture into smaller pieces. In the end, you will have a complex serverless architecture with a handful of functions interacting with different resources to get a job done. You may end up with an architecture like this:

When your architecture reaches that level, it is hard to detect an error or a performance bottleneck by looking at a single function. You need a visual representation of your architecture, not to hang on the wall, but to understand the root cause of an error or a performance bottleneck. Such a representation should give you a general idea of which part of the architecture started to behave unexpectedly. Today, we’re proud to announce our architecture view, which enables serverless developers to detect errors and performance problems visually.

Discovering errors at a glance

When you first land on the “Architecture” page, you will see your serverless architecture presented visually, with functions grouped by project. To get a more meaningful architecture, we recommend that you group the functions that work together under one project.

Thundra’s new architecture view enables you to detect the errors in your system at a glance. A green edge between your Lambda function and a resource indicates a healthy interaction, and you will be relieved to see it. Yellow, orange, and red edges indicate increasing degrees of deterioration in the interaction. Thanks to this visual aid, you can focus on the problematic areas a lot faster. In the following architecture view, we can clearly see problematic interactions of the Redis service with the `user-get-lambda-java-staging` and `user-delete-lambda-java-staging` functions.
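To make the coloring idea concrete, here is a minimal sketch of how an edge could be classified by its error rate. The thresholds, the `Edge` type, and the `edge_color` function are all assumptions made up for illustration. This post does not describe the rules Thundra applies internally, so treat this as one possible reading of "green, yellow, orange, red", not as the actual implementation.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only. The article does not
# say which error rates map to which color.
THRESHOLDS = [
    (0.01, "green"),   # under 1% errors
    (0.05, "yellow"),  # under 5% errors
    (0.20, "orange"),  # under 20% errors
]


@dataclass
class Edge:
    """One function-to-resource interaction, aggregated over a time window."""
    function: str
    resource: str
    calls: int
    errors: int


def edge_color(edge: Edge) -> str:
    """Return a health color for an edge based on its error rate."""
    if edge.calls == 0:
        return "green"  # no traffic in the window, nothing to flag
    rate = edge.errors / edge.calls
    for limit, color in THRESHOLDS:
        if rate < limit:
            return color
    return "red"  # at or above the highest threshold


if __name__ == "__main__":
    edge = Edge("user-get-lambda-java-staging", "Redis", calls=200, errors=60)
    print(edge.function, "->", edge.resource, ":", edge_color(edge))
```

A real health score could just as well weigh latency alongside errors; the error rate alone simply keeps the sketch short.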

Symptom detected, what’s the diagnosis?

After you spot a problem, it is natural to look closer and understand why the errors occur. When you click on an edge in the architecture view, a detailed view opens on the right of the screen. It gives you a quick overview of how the interaction between the function and the resource is performing. The time series charts show how many times the function and the resource interacted and how the duration of the interactions changed. You can also see the individual invocations of the function within the selected interaction. From there, you can jump into the trace chart of an invocation and look for the reason why an interaction failed or why its duration jumped unexpectedly. In the image below, you can see that there was a problem around 5:30 with the interaction between our Lambda function and the connected S3 bucket. Even so, the interaction is a lot healthier in the last hour than in the preceding hour.

In the image below, you can see the individual invocations that occurred between the Lambda function and the S3 bucket. Clicking on an invocation in the table opens its details, where you can see the trace chart and the logs. Voila! With two clicks, you discovered an error in your system.

Tracking the changes with your architecture

Serverless is like Lego, and we love to play with it to build higher-performing systems. You sometimes try something new with your applications or introduce a new component to improve performance. Our architecture view has a time slider that lets you detect the changes in your serverless architecture over time.

You can play with the time slider to see the effect of a change you made in your serverless architecture. The following images illustrate how the architecture changed over time. As you can see, a new Lambda function that gets triggered by the S3 bucket was added to the architecture. The health of the new architecture is still good, and we can click on the new edge to get detailed information.

Architecture before change
Architecture after change

Settings in the architecture view

As I mentioned before, edges between functions and resources are colored according to their health. We also let you see some basic information about the architecture. If you enable the “Show Metrics” switch in the application, you will see how many times your function interacted with the resource and the average duration of those interactions.
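The two numbers shown per edge, the interaction count and the average duration, are simple aggregates. The sketch below shows one way to compute them from a list of interaction records. The record shape `(function, resource, duration_ms)` and the `edge_metrics` function are hypothetical stand-ins, not Thundra's data model or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (function name, resource name, duration in milliseconds).
# This shape is an assumption made for the example.
Interaction = Tuple[str, str, float]


def edge_metrics(records: Iterable[Interaction]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Aggregate the call count and average duration for every edge."""
    totals = defaultdict(lambda: [0, 0.0])  # edge -> [count, total duration]
    for function, resource, duration_ms in records:
        entry = totals[(function, resource)]
        entry[0] += 1
        entry[1] += duration_ms
    return {
        edge: {"count": count, "avg_duration_ms": total / count}
        for edge, (count, total) in totals.items()
    }


if __name__ == "__main__":
    sample = [
        ("user-get-lambda-java-staging", "Redis", 12.0),
        ("user-get-lambda-java-staging", "Redis", 18.0),
        ("user-delete-lambda-java-staging", "Redis", 40.0),
    ]
    for edge, metrics in edge_metrics(sample).items():
        print(edge, metrics)
```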

We loved the new AWS icons! However, our interviews with customers showed us that the majority of people are still having trouble getting used to them. For this reason, we draw the architecture view with the old icons by default, but you can always switch to the new ones.

When we first saw the architecture view of our own architecture in our development environment, we wanted to frame it. We believe that you may want to do so too, or at least put your architecture into a design document. For this reason, we let you download your architecture view as a PNG.

What do you need to do to have the architecture view?

For now, the architecture view is available for Java, Node.js, and Python functions. To see the architecture view for your functions, you need to upgrade your agents as follows:

  • For Node.js, the agent library version needs to be `2.2.1` or higher. The layer version needs to be `7` or higher.
  • For Python, the agent library version needs to be `2.2.5` or higher. The layer version needs to be `6` or higher.
  • For Java, the agent library version needs to be `2.1.7` or higher. The layer version needs to be `7` or higher.
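
If you manage many functions, a small script can tell you which ones still need an upgrade. The sketch below encodes the minimum versions listed above and compares them against a version you supply. The function names and the runtime keys (`node`, `python`, `java`) are made up for this example, and how you look up the installed agent or layer version of a function is left out.

```python
from typing import Tuple

# Minimum versions from the list above: runtime -> (agent library, layer).
# The runtime keys are labels chosen for this example.
MINIMUMS = {
    "node": ("2.2.1", 7),
    "python": ("2.2.5", 6),
    "java": ("2.1.7", 7),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.2.10' into (2, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def agent_ok(runtime: str, agent_version: str) -> bool:
    """True if the agent library version meets the minimum for the runtime."""
    minimum, _ = MINIMUMS[runtime]
    return parse_version(agent_version) >= parse_version(minimum)


def layer_ok(runtime: str, layer_version: int) -> bool:
    """True if the layer version meets the minimum for the runtime."""
    _, minimum = MINIMUMS[runtime]
    return layer_version >= minimum


if __name__ == "__main__":
    print(agent_ok("python", "2.2.4"))  # below the 2.2.5 minimum
    print(layer_ok("java", 7))          # meets the minimum of 7
```

Parsing into integer tuples matters here: a plain string comparison would rank `2.10.0` below `2.2.1`.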

What’s coming next?

With this feature, we opened a new page in Thundra. Rest assured that we will keep improving the architecture view. First, the nodes in the architecture will be clickable and will show more details about themselves. For example, you will be able to see the most problematic queries when you click on a ‘PostgreSQL’ node, and another view will be available when you click on an external service, and so on. Second, you will be able to see the traces of the invocations in which an interaction occurred. With this, you will be able to follow all the transaction paths with our upcoming distributed tracing feature and see the problematic workflows in your system. More features are on the way to make it easier to discover problems in your serverless architectures.

You can sign up for our web console and start experimenting. Don’t forget to explore our demo environment (no sign-up needed). We are very curious about your comments and feedback. Join our Slack channel, send us a tweet, or contact us through our website!


Originally published at blog.thundra.io.