How I Made an Impact in My First 100 Days at Helios

How Helios helps simplify my daily dev tasks by providing full visibility into application design and architecture, generating E2E tests and more.

Published in Cloud Native Daily, Jul 20, 2023

Joining a new dev team can be an exciting but somewhat intimidating experience. On one hand, you’re jumping into new adventures and opportunities. On the other hand, most onboarding experiences are stressful, and it is easy to feel overwhelmed by how much you have to learn, and how fast, before you can contribute to your new team. To be honest, I’d never worked at a place where the developer onboarding experience was particularly memorable until I joined Helios.

I joined the team around three months ago and, like any newbie, I was eager to make an impact right away, but I knew I first needed to get comfortable with the product, architecture, code, setup, etc. Luckily, Helios (the product) was designed to simplify developers’ day-to-day tasks, and it helped me tremendously as I got down to business. I found myself using Helios again and again to get up to speed on Helios: a classic case of dogfooding, with the added benefit of a smooth onboarding. I share my experience below in the hope that other developers will see that enjoyable onboarding journeys are out there!

Full visibility into application design and architecture

One of the first things developers generally do in the onboarding process is get to know the system architecture they’re going to be working in. That is, what services are used, how they interact with each other, what the rest of the stack (queues, databases, etc.) consists of, and in general, the flows of the application.

Using Helios’s visualization graphs, I could easily view the entire microservices architecture and understand its flows. I quickly figured out not only which services interact with each other, but also the dependencies and shared models between APIs, how often each flow is used, which services are critical, and many other details about our architecture that would otherwise have taken several whiteboard sessions with colleagues to understand.
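To make the idea of a service map concrete, here is a minimal sketch of how caller-to-callee edges could be derived from trace spans. It is an illustration only, not how Helios builds its graphs: the span fields (`span_id`, `parent_id`, `service`) and the sample data are assumptions, and real OpenTelemetry spans carry many more fields.

```python
from collections import Counter

# Hypothetical, simplified spans; the service names are borrowed from this article.
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "helios-app"},
    {"span_id": "b", "parent_id": "a", "service": "entities-service"},
    {"span_id": "c", "parent_id": "a", "service": "account-service"},
    {"span_id": "d", "parent_id": "c", "service": "account-service"},
]


def service_edges(spans):
    """Count caller -> callee pairs where a child span runs in a different service."""
    by_id = {span["span_id"]: span for span in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Skip root spans and calls that stay inside a single service.
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


edges = service_edges(SPANS)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee}: {count} call(s)")
```

Counting edges rather than just listing them is what would let a graph show how often each flow is used.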

As an example, one of the main flows in our OpenTelemetry-based product is fetching traces for a specific customer. The flow starts in our main service, helios-app, which fetches all related traces upon customer login. The helios-app service queries Elasticsearch, where we store all raw traces (and spans) coming from the customer’s services.

Then, to present a trace, we perform span grouping and aggregation during the span collection phase. The helios-app service queries our entities-service, which holds an enriched trace object with additional peripheral data and metadata we collect for our customers.

We also query our account-service, which holds data about the organizations and users of our system, to validate the user and the organization in each query. And this is just one flow!

The trace visualization is truly worth a thousand words; it holds all the information described above and is equivalent to a whiteboard session with a colleague.
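The trace-fetching flow described above can be sketched in a few lines. This is a toy model under stated assumptions, not Helios code: the three dictionaries are in-memory stand-ins for Elasticsearch, entities-service and account-service, and every field name and value is made up. Grouping spans at query time is also a simplification, since in the real system grouping and aggregation happen during span collection.

```python
# In-memory stand-ins for the three dependencies; all data here is made up.
RAW_SPANS = [  # plays the role of the Elasticsearch index of raw spans
    {"trace_id": "t1", "org_id": "org-1", "name": "GET /orders"},
    {"trace_id": "t1", "org_id": "org-1", "name": "SELECT orders"},
    {"trace_id": "t2", "org_id": "org-2", "name": "GET /users"},
]
ENRICHED = {"t1": {"label": "orders flow"}}  # plays the role of entities-service
ACCOUNTS = {"org-1": {"alice"}}              # plays the role of account-service


def fetch_traces(user, org_id):
    """Return the organization's traces, validated and enriched."""
    # 1. Validate the user and the organization (account-service).
    if user not in ACCOUNTS.get(org_id, set()):
        raise PermissionError(f"{user} is not a member of {org_id}")
    # 2. Fetch the organization's raw spans and group them by trace ID.
    grouped = {}
    for span in RAW_SPANS:
        if span["org_id"] == org_id:
            grouped.setdefault(span["trace_id"], []).append(span["name"])
    # 3. Attach the enriched metadata for each trace (entities-service).
    return [
        {"trace_id": trace_id, "spans": names, **ENRICHED.get(trace_id, {})}
        for trace_id, names in grouped.items()
    ]


traces = fetch_traces("alice", "org-1")
print(traces)
```

Even in this toy form, one request touches three data sources, which is why a single trace view can replace a long explanation.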

Troubleshooting and dev velocity

As part of my onboarding, I was assigned small tasks that aren’t too complicated but touch many services and flows, so that as a new developer I’d get hands-on experience (in terms of code, deployment, etc.) with as many components of the system as possible.

As expected, I came across an issue while working on one of the tasks. The task was to add a cron job that collects data for our analytics service. I completed the task and deployed my code, but the next day I found out that the job hadn’t run and I needed to debug it. Thanks to Helios’s troubleshooting capabilities, this was a pain-free process. The application error logs (from both local and remote runs) had a visualization link generated by Helios that took me straight to the flow’s visualization. From there I was able to pinpoint the exact failure: an error between the analytics service and Postgres when trying to insert the data.
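For readers curious how a trace link can end up in an error log, here is a minimal sketch using only Python’s standard library. It is not Helios’s mechanism: the base URL, the logger name, the trace ID value and the `current_trace_id` variable are all hypothetical, and in a real service the tracing SDK would supply the trace ID.

```python
import contextvars
import io
import logging

# Hypothetical base URL; the link format Helios generates is not shown in this article.
TRACE_UI_BASE = "https://tracing.example.invalid/trace"

# Holds the current trace ID; a tracing SDK would normally provide this value.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceLinkFilter(logging.Filter):
    """Attach a trace link to every record logged while a trace ID is set."""

    def filter(self, record):
        trace_id = current_trace_id.get()
        record.trace_link = f"{TRACE_UI_BASE}/{trace_id}" if trace_id else "-"
        return True  # never drop the record, only annotate it


stream = io.StringIO()  # stands in for the real log destination
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s trace=%(trace_link)s"))
handler.addFilter(TraceLinkFilter())

logger = logging.getLogger("analytics-cron")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False

current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")  # made-up trace ID
logger.error("insert into Postgres failed")
print(stream.getvalue(), end="")
```

Putting the link on the log line itself means whoever reads the error, locally or remotely, is one click away from the full flow.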

Generating E2E tests

A popular onboarding task for new developers joining a team is writing an E2E test for a flow in the application. This task involves getting familiar with the code base, testing infrastructure, and CI/CD pipelines, as well as understanding the business logic behind critical paths, ‘happy flows’ and edge cases. Helios’s E2E test generation capability turned this task, which can usually be pretty frustrating, into a great learning (and dogfooding!) experience.

I was able to generate an E2E test for one of the first features I released: a new call to our account-service that gets all organizations from Postgres. It was a good opportunity to both add an E2E test and test my own work. I generated the test directly from Helios, using the trace visualization to help me understand the request paths and payloads, the number of requests, and their order.

This may sound trivial for a flow you’re familiar with, but it’s never trivial in distributed applications, let alone for flows you’re not familiar with. Through the eyes of a new developer getting to know the architecture, it simplifies the process and gives a major confidence boost. I quickly created an E2E test using the test code generation and ran it. It is still running today as part of our CI/CD pipeline.

The above screenshot shows my process of creating the test. First I chose my validation checkpoint; for example, I wanted to validate that the call to the DB contains a select of organization.id. Then you can see the generated code, with a placeholder for a hostname variable that I set to the relevant hostname for the test.
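The validation checkpoint described above amounts to an assertion on a span captured during the test run. The sketch below shows that idea in plain Python; it is not the code Helios generates. The captured spans, span names and helper function are hypothetical, and the `db.statement` key follows the OpenTelemetry database semantic conventions (newer versions of the conventions name it `db.query.text`).

```python
# Hypothetical spans captured while the flow under test ran.
captured_spans = [
    {"service": "helios-app", "name": "GET /organizations", "attributes": {}},
    {
        "service": "account-service",
        "name": "postgres.query",
        "attributes": {
            "db.system": "postgresql",
            "db.statement": "SELECT organization.id FROM organization",
        },
    },
]


def find_db_spans(spans, service):
    """Return the spans of `service` that recorded a database statement."""
    return [
        span
        for span in spans
        if span["service"] == service and "db.statement" in span["attributes"]
    ]


def test_organizations_query_selects_id():
    db_spans = find_db_spans(captured_spans, "account-service")
    # Checkpoint: exactly one DB call, and it selects organization.id.
    assert len(db_spans) == 1
    statement = db_spans[0]["attributes"]["db.statement"].lower()
    assert "select organization.id" in statement


test_organizations_query_selects_id()
print("checkpoint passed")
```

Asserting on the span rather than on the HTTP response is what lets a test verify behavior deep inside the flow, such as the exact query sent to the database.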

My first on-call shift

On-call shifts are a challenge for every developer, especially for new team members. My first on-call shift involved many alerts and errors that were totally new to me. The main difference between my other on-call experiences and being on call with Helios is that each error log and alert comes with a link to the relevant trace, so you can immediately understand the failure point.

Without Helios, just understanding where the error occurred (in which service) and seeing the whole context (in which flow, how many times, with which attributes and data) can take an hour, if not more. Helios expedited the investigation process and made my life on call much easier, while helping us meet our SLA. It also increased my confidence because I knew I could handle parts of the application even without being familiar with them ahead of time.

For example, I saw an alert about a failure to send a user an invitation to log into our product. From a business perspective, we can’t allow such flows to fail, and it is critical to understand the root cause of the issue and whether there are bugs in the flow. Fetching the trace showed me the exact error: input validation for the invite needed to be enforced in our code so that the API call to HubSpot would go through. The issue was quickly resolved.

Onboarding at — and with — Helios

Overall, my onboarding at Helios was a great experience thanks to Helios’s own capabilities and the amazing team members working with me. I was able to identify issues, reproduce scenarios, and generate E2E tests across local, staging and production environments, faster and more easily than I could have imagined. This gave me the confidence to contribute and make an impact a lot faster than I would have been able to without Helios.

Using Helios, I found that I was able to save my colleagues precious time. You may be a fast learner, but as a new team member, you still rely on ‘onboarding buddies’ to help you with tasks and walk you through a completely new code base and systems. I was able to easily find answers to many questions I had along the way, without needing to bother other team members.

Helios’s tools are invaluable for developers joining new teams because they help them get to know systems and streamline tasks in the developer journey. These tools and tricks are part of our day-to-day development DNA at Helios, beyond onboarding. I’m proud to have the chance to work on this product and share my experience with more dev teams around the world. The above is my own onboarding experience, but any developer who wants to onboard with confidence will benefit from using Helios.

Further Reading:

Testing Microservices - Trace Based Integration Testing Example

Microservices architectures require a new type of testing. Here's why traditional testing fail and the new automated…

gethelios.dev

API observability: Leveraging OTel to improve developer experience

A Helios developer shares her experience applying API observability to improve API discovery, enforcement and…

gethelios.dev

Deploying OpenTelemetry with Java: Your Guide

Learn to deploy OpenTelemetry in Java to collect services data. Use Jaeger Helios and other tools and troubleshoot with…

gethelios.dev

Serverless observability, monitoring, and debugging explained

Serverless troubleshooting requires E2E observability, through collecting trace data on top of logs and metrics- Here's…

gethelios.dev

Distributed tracing Node.js- OpenTelemetry-based monitoring

distributed tracing is critical to maintaining complex systems in Node.js for fast troubleshooting - This guide covers…

gethelios.dev

Kubernetes Monitoring with OpenTelemetry

Learn how to monitor Kubernetes using OpenTelemetry with real-time visibility and granular error data - Reduce MTTR by…

gethelios.dev

Combining OTel and Prometheus metrics for alerting machine

Using both OpenTelemetry and Prometheus, we delivered a trace-based alerting mechanism quickly and efficiently - here's…

gethelios.dev

Lambda monitoring: Combining the three pillars of observability to reduce MTTR

Discover real-world examples of how connecting metrics, logs and traces improves troubleshooting Lambda errors.

gethelios.dev

Written by Liron Kreiss