Signoz: The Startup’s Answer For Distributed Tracing and Metrics Tracking

The Alternative to a DevOps Team for a Resource-Strapped Startup

Vikranth Kanumuru

Published in

Kanlanc

6 min read · Feb 1, 2022

One fine Friday evening,

Developer: “We have no clue where the error is coming from!!”

CTO: “What is the DevOps Team doing right now?!!”

Developer: “But sir, we don’t have a DevOps team! We will have to figure out which of the microservices our issue is coming from all by ourselves. This is gonna take quite a bit of effort and probably an all-nighter…”

CTO: “That doesn't sound so bad…”

Developer: “….for three days straight….”

CTO: “Why the heck would it take so long? Don’t we have a tracing tool in place like Datadog?!!”

Developer: “Sir…..you were too greedy to get more DevOps people for that and to pay for the service.”

CTO: “…. you are fired with immediate effect”

Developer: “I am the only one who can solve this issue in a reasonable time.”

CTO: “…You are rehired with immediate effect”

Developer: “With a hike and bonus?”

CTO: “…Damn it, alright with a hike and bonus”

Jokes aside, for a startup that is tight on resources of every kind, human and financial, building a microservices architecture with fully automated deployments and debugging is gonna be very hard. (Notice I said hard, not impossible.)

Generally, the pipeline built with open source tools for tracking metrics and tracing errors looks like this:

https://www.youtube.com/watch?v=kRkz7mCugus

The tools used in this pipeline are Prometheus, Jaeger, and Kafka. It’s easy to see how much effort and knowledge it takes to set this up and integrate it with your application, even more so when an error comes up on a Friday night after a deployment and you need tracing to figure out what went wrong and where before your boss confiscates your weekend.

You can read more about the “why” of Signoz over here

The developers of Signoz understood this. To make other developers’ lives easier and give them back some work-life balance, they built Signoz, which does what the above pipeline is meant to do, with easy setup and clean dashboards.

In this article, we are gonna build a bit more than the basic application given in the documentation to understand exactly why this tool is so life-changing (pun intended).

Setup

It’s pretty easy to get Signoz up and running just by following the documentation. The application we are gonna build is in Node.js and the docs for it can be found here.

Note: I couldn’t get it running on Windows, but I was strapped for time, so I just used my MacBook to get it running.

If everything is working on the terminal after you run the command given in the docs, go to http://localhost:3000/application and it should look like the image below.

If you don’t see your node_app on the dashboard, don’t worry and just send a few requests to that API and it will appear on the dashboard.

For example,

If the application you built using the docs is running on port 9090, send a few GET requests to it by going to http://localhost:9090/hello and reloading the page several times.
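If reloading the page by hand gets tedious, a small script can send the requests for you. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and the sample app listening on port 9090; the helper name sendRequests and its fetchFn parameter are made up for this example and are not part of Signoz or the docs.

```javascript
// Sends `count` sequential GET requests to `url` and returns the status codes.
// `fetchFn` defaults to the global fetch available in Node.js 18+; it is a
// parameter only so the helper can be tried without a running server.
async function sendRequests(url, count, fetchFn = fetch) {
  const statuses = [];
  for (let i = 0; i < count; i++) {
    const response = await fetchFn(url);
    statuses.push(response.status);
  }
  return statuses;
}

// Example (assumes the sample app from the docs is listening on port 9090):
// sendRequests("http://localhost:9090/hello", 10).then(console.log);
```

Uncomment the last line, run the file with node, and the service should show up on the dashboard shortly after.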

Enhancing The Basic Application

When I first went through the docs, I couldn’t really tell how useful Signoz would be because of the very basic application given in the documentation.

So, I went ahead and improved it by adding external API calls, internal setTimeout delays, a local DB, and a few other routes to truly understand the capabilities of Signoz.

The GitHub link for the project is here

Application Metrics and Routes Overview

These are the metrics of the application from the above GitHub link after you send GET requests to the routes.

You can see from the image here that I primarily added 4 routes to the basic application.

The externalapidelay route, true to its name, calls an API hosted on beeceptor, a free online API mocking tool that lets you configure delayed responses. I set the delay to 5 seconds; adding the time our route takes to make the call, the total response time comes to around 6 seconds.

The internalapidelay route introduces a delay with the help of the setTimeout function, which I set to 2 seconds.

The databaseread route (registered as /databaseget in the code below) reads information from a local DB created with lokiJS. It turns out that Signoz doesn’t recognize lokiJS as an actual database and didn’t count reads and writes to it as database calls, so I’ll possibly cover that in a later article with a true database like MySQL or MongoDB.

The sendError route is simply a route that returns an error like 404 or 500, so we can understand how Signoz tracing helps catch our bug-producing route.

// Code for all the above routes

app.get("/externalapidelay", (req, res) => {
  // Introduces a 5 second delay, configurable in beeceptor
  axios
    .get("https://infra.free.beeceptor.com/beerceptordelay")
    .then((response) => {
      console.log(response.data);
      res.status(200).send(response.data);
    });
});

app.get("/internalapidelay", (req, res) => {
  // Respond only after a 2 second timer fires
  setTimeout(() => {
    res.status(200).send({ message: "I am sorry I am late by 2 seconds" });
  }, 2000);
});

app.get("/senderror", (req, res) => {
  // Always respond with a server error
  res.status(500).send({ message: "We have an emergency!!" });
});

app.get("/databaseget", (req, res) => {
  // Load the lokiJS database and look up one URL record
  db.loadDatabase({}, () => {
    var urls = db.getCollection("urlList");
    var url = urls.findOne({ urlCode: 1 });
    if (url) {
      return res.json({ longUrl: url.longUrl });
    } else {
      return res.status(404).json({ message: "No URL Found" });
    }
  });
});

Finding Bottlenecks In Your Application

Signoz gives a clear picture of which routes in our application receive requests, how often, and how long they take to respond. The P50, P95, and P99 columns are latency percentiles: a P95 of 2 seconds, for example, means that 95% of requests finished within 2 seconds and the slowest 5% took longer.
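To make the percentile idea concrete, here is a small sketch that computes P50, P95, and P99 from a list of response times using the nearest-rank method. The sample numbers are invented for illustration, and Signoz may well estimate percentiles differently internally; this only shows what the figures mean.

```javascript
// Returns the p-th percentile (0 < p <= 100) of a list of latencies using the
// nearest-rank method: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(latenciesMs, p) {
  if (latenciesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical response times (ms) for ten requests to one route.
const samples = [120, 135, 140, 150, 160, 180, 200, 250, 900, 6000];

console.log("P50:", percentile(samples, 50)); // 160
console.log("P95:", percentile(samples, 95)); // 6000
console.log("P99:", percentile(samples, 99)); // 6000
```

Notice how a single 6-second request barely moves the P50 but dominates the P95 and P99, which is why the higher percentiles are the ones to watch when hunting for slow routes.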

This information can be crucial for improving the user experience because end users are usually an impatient bunch.

Website conversion rates drop by an average of 4.42% with each additional second of load time (between seconds 0–5)

If you can figure out which routes are hoarding the resources and work on optimizing them, the value you add to the company is immense.

From the above picture, we understand that externalapidelay is taking 6 seconds to give an answer back to the user. Let’s explore it further.

Clicking on the route, I am taken to this page that has more information about this particular route and I can apply various filters to get more info about it.

What’s better, let’s say that in the above picture this is not a common occurrence and only the latest call had this delay. We can click on that particular request to get more info about it.

By doing so, we now see every phase the request went through, which tells us exactly where our bottleneck originated. From the above picture, it is clear that the delay was actually caused by a GET request sent to the beeceptor endpoint.
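If the idea of a request being split into timed phases feels abstract, the toy sketch below mimics it by hand. It is not the OpenTelemetry or Signoz API (the instrumentation does this for you automatically); the names timePhase, sleep, and traceRequest are made up for this example, and the slow upstream call is simulated with a timer.

```javascript
// A toy stand-in for a tracing span: it records how long each named phase of
// a request takes. This is NOT the OpenTelemetry or Signoz API.
async function timePhase(spans, name, work) {
  const start = process.hrtime.bigint();
  try {
    return await work();
  } finally {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    spans.push({ name, durationMs });
  }
}

// Hypothetical helper that waits for `ms` milliseconds, standing in for a
// slow upstream API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulates one request with a fast local phase and a slow outbound call,
// then returns the recorded phases sorted slowest first.
async function traceRequest(upstreamDelayMs) {
  const spans = [];
  await timePhase(spans, "parse request", () => sleep(5));
  await timePhase(spans, "GET upstream API", () => sleep(upstreamDelayMs));
  return spans.sort((a, b) => b.durationMs - a.durationMs);
}
```

Calling traceRequest(5000).then(console.table) prints the phases with the outbound GET at the top, which is the same conclusion the Signoz trace view hands you without any extra code.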

Now, I know what you’re thinking…

But wait, there’s much more you can do with Signoz. You can figure out where errors in your backend are coming from using Traces, use the service map tab to get a bigger picture of all the services and their relationships with each other, and a lot more, which will be covered in a different article.

Conclusion

Signoz is one of those hidden gems that is both open source and extremely powerful when used correctly.

The setup is extremely easy and requires zero knowledge to get up and running, and the value it gives in return makes me want to pay for it willingly.

If there are any other use cases or new features that I might have missed, do let me know.