How we are improving developer experience at QuintoAndar with backstage.io

Gabriel Dantas
Blog Técnico QuintoAndar
10 min read · Jan 6, 2022

Our platform team’s journey using an open-source tool to solve our engineers’ experience challenges

When a tech company scales and the number of services and engineers grows, visualizing resources and tracking ownership becomes difficult. Organizing the engineering structure and its access controls, cataloging resources, and making all the tools accessible to everyone is a challenge for every company growing in a fast-paced environment.

At QuintoAndar, we were challenged by two major problems:

  • Making all the engineering tools easily accessible and up to date for everyone
  • Improving the visualization of our microservices, including their ownership, in a way that reflects how our engineering teams are organized

Below we’ll describe the problems mentioned and how Backstage has helped us succeed.

Access and visualization of engineering tools

It is typical for developers to use more than one tool in their daily development, such as ArgoCD, Drone CI, Grafana, Thanos, and Kibana.

Inside QuintoAndar the tools are dynamically provisioned and can have their addresses changed. This means that addresses saved in the favorites bar or in documentation frequently became outdated. Keeping our developers up to date with these changes is always a challenge, and not addressing this issue has a strong impact on engineering productivity. Only documenting where the tools live and how to access them wasn’t scalable enough for our scenario.

With that in mind, our platform teams started to elaborate on the idea of having a “portal” service so our developers could easily access infrastructure and product tools.

Image of a blue dog using a notebook surrounded by tools used in the DevOps ecosystem.
Homes is our listing sniffer and is surrounded by tools we use in our engineering

Improve the visibility of our microservices and their relationship with our squads and tribes

Accessing our tools wasn’t our only challenge. We have several squads and tribes within QuintoAndar creating new microservices almost every week. The high volume of new services has raised concerns about ownership:

  • How do we know which teams are responsible for each microservice?
  • Which team should we contact to know about business rules?
  • Can we find out how many services each squad/tribe is the owner of?
  • How can we relate microservice metrics to their respective squads and tribes? (e.g., API error rates, latencies, vulnerabilities)

This information is valuable not only for the product teams to organize themselves but also for the platform teams to be able to create feasible cost management and incident management practices in our engineering structure.

And for a long time, the solution to our problem was a spreadsheet 😵‍💫

A fictitious table of how QuintoAndar services were cataloged, with columns for repository, tribe, squad, and service owner’s email.
A fictitious table of how we organize our services

But like any spreadsheet, some problems started to become more and more evident:

  • The team listed as responsible for a service was often out of date
  • New services were not always inserted in the spreadsheet
  • Disabled or invalid emails were not updated in the spreadsheet
  • The spreadsheet was becoming a database
  • The repository name was not enough to correlate infrastructure resources to that microservice
  • New hires were unaware of the spreadsheet

In other words, we didn’t have confidence in the spreadsheet to be our source of truth about our services and which teams were working on them.

Thinking about how to solve these problems, our team started to look for a tool that could not only solve the issues but also empower the software development journey at QuintoAndar.

Backstage

What is Backstage?

Simply put, Backstage is an open-source project created by Spotify and donated to the CNCF (Cloud Native Computing Foundation) that lets us build a portal for developers. With Backstage we can expose, in a unified way, everything needed to create and maintain infrastructure resources, follow CI/CD pipelines, and visualize observability metrics, without requiring developers to have specific knowledge of the underlying tools. It acts as a true hub of information for any developer within the company.

Backstage Core Features

To accomplish this mission, Backstage has a few main features.

Service Catalog

The Service Catalog is one of the most important features of Backstage. It is where we store data about services, libraries, and data pipelines. Any component used within your company can be stored along with its metadata. [1]

Software Templates

Software Templates are a mechanism for creating components inside Backstage directly through the user interface with a few clicks. Using a YAML file as a template, new services can be created with the right resources attached. This gives developers a head start, improving productivity and developer experience.

Software Templates help a lot in standardizing the creation of services. A simple example is Spotify’s Golden Path. [2]

TechDocs

TechDocs brings the docs-like-code idea: the documentation for your service lives together with the source code and is accessible to any developer.

This core feature offers a set of resources so that service documentation written in Markdown can be centralized in Backstage. The documentation then becomes easy to search and accessible to anyone who needs it. [3]

These are the main features Backstage currently provides in its core, but there is also a list of useful plugins that can be installed individually, which you can find in the Plugins Marketplace.

The potential of connecting Backstage and QuintoAndar

Although all the main Backstage features caught our attention, we already have tools that help our developers with documentation and resource creation in their daily work, and those deserve a dedicated blog post.

With that, the Service Catalog became our first objective in bringing Backstage into QuintoAndar. Remember the spreadsheet? It was time to kill it with fire.

Integrating Backstage in our Workflows

We internally developed a project called Checklist-as-Code to initially solve the main problems mentioned.

Checklist Concept

It is a concise summary of all production availability standards and requirements: a set of questions that help to classify whether the service is ready to go into production. It was inspired by the “Production-Readiness Checklist” in Production-Ready Microservices.

As this data is specific and unique to each service and its context, we didn’t want it to get lost in some documentation. For this, we created a file pattern that could be filled in and versioned along with the microservice code. We called this file .checklist.yaml, and in it we stored information about the team, squad, and cost center, along with a description of the context in which the service was created.

Below you can see an example of this file:

But in order to trust this as a source of truth for our automation, we had to consider:

  • The values that are filled in must be valid, and developers must be notified in case of changes to the expected values.
  • The data needs to be stored and accessible so that we can analyze the services of each squad and tribe. It should also let us check which repositories in our organization do not have a .checklist.yaml file.

For this, we created an architecture to validate and notify our engineers.

Validation workflow diagram of our checklist files
The continuous validation workflow of our checklist files

Validation of the checklist file

The main concept is that we created a Go package to validate our checklists. Using the validator library, we exposed an endpoint that receives the contents of a file and returns whether the file is valid, checking that it has the mandatory fields and meets the requirements we set.

Below we can see a simple representation of how the code looked in relation to the fields of the file:

.checklist.yaml

    serviceOwners:
      tribe: tech_platform
      squad: platform_portal
      owner: peter.parker@quintoandar.com.br

checklist-struct.go

    type ServiceOwners struct {
        Tribe string `yaml:"tribe" validate:"required,snakecase"`
        Squad string `yaml:"squad" validate:"required,snakecase"`
        Owner string `yaml:"owner" validate:"required,email"`
    }
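
To make the three rules in those struct tags concrete, here is a minimal sketch that applies them by hand with only the Go standard library. It is not the validator library the team used: the snake_case pattern, the Validate method, and the error messages are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/mail"
	"regexp"
)

// ServiceOwners mirrors the struct shown above, without the struct tags.
type ServiceOwners struct {
	Tribe string
	Squad string
	Owner string
}

// snakeCase is an assumed definition of the "snakecase" rule:
// lowercase letters and digits separated by single underscores.
var snakeCase = regexp.MustCompile(`^[a-z0-9]+(_[a-z0-9]+)*$`)

// Validate returns every failing rule joined into one error, or nil.
func (s ServiceOwners) Validate() error {
	var errs []error

	// tribe and squad share the same rules: required and snake_case.
	checks := []struct{ name, value string }{
		{"tribe", s.Tribe},
		{"squad", s.Squad},
	}
	for _, c := range checks {
		switch {
		case c.value == "":
			errs = append(errs, fmt.Errorf("%s is required", c.name))
		case !snakeCase.MatchString(c.value):
			errs = append(errs, fmt.Errorf("%s must be snake_case", c.name))
		}
	}

	// owner must be a bare email address, not "Name <user@host>".
	addr, err := mail.ParseAddress(s.Owner)
	switch {
	case s.Owner == "":
		errs = append(errs, errors.New("owner is required"))
	case err != nil || addr.Address != s.Owner:
		errs = append(errs, errors.New("owner must be a bare email address"))
	}

	// errors.Join returns nil when there is nothing to report.
	return errors.Join(errs...)
}

func main() {
	valid := ServiceOwners{
		Tribe: "tech_platform",
		Squad: "platform_portal",
		Owner: "peter.parker@quintoandar.com.br",
	}
	fmt.Println("valid checklist error:", valid.Validate())

	invalid := ServiceOwners{Tribe: "TechPlatform", Owner: "not-an-email"}
	fmt.Println("invalid checklist errors:")
	fmt.Println(invalid.Validate())
}
```

Collecting every failure instead of stopping at the first one suits this use case, since the developer gets the full list of fields to fix in a single round.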

Due to the number of repositories and projects that we have in our CI system, adding a step to each pipeline was not a feasible alternative for us.

So, using organization-workflows, we could run a single GitHub Action for our entire organization. This GitHub workflow reads the .checklist.yaml file and makes a request to our checklist validation API, creating a check-run. If the checklist has any validation errors, the workflow posts a comment on the pull request so that the developer can work on the fix.
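
The validation API that the workflow calls could look roughly like the sketch below. This is a hedged illustration, not the team's service: the /validate path, the JSON request and response shapes, and the 422 status for invalid checklists are all assumptions, and the payload is assumed to arrive already converted to JSON so the example can stay within the Go standard library. Only the required-field check is shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// checklist is an assumed JSON rendering of the .checklist.yaml content.
type checklist struct {
	ServiceOwners struct {
		Tribe string `json:"tribe"`
		Squad string `json:"squad"`
		Owner string `json:"owner"`
	} `json:"serviceOwners"`
}

// result is a hypothetical response body for the workflow to read.
type result struct {
	Valid  bool     `json:"valid"`
	Errors []string `json:"errors,omitempty"`
}

// missingFields lists the required fields that were left empty.
func missingFields(c checklist) []string {
	fields := []struct{ name, value string }{
		{"tribe", c.ServiceOwners.Tribe},
		{"squad", c.ServiceOwners.Squad},
		{"owner", c.ServiceOwners.Owner},
	}
	var errs []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			errs = append(errs, "serviceOwners."+f.name+" is required")
		}
	}
	return errs
}

// validateHandler answers 200 for a valid checklist, 422 for an invalid
// one, and 400 when the body cannot be decoded at all.
func validateHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var c checklist
	if err := json.NewDecoder(r.Body).Decode(&c); err != nil {
		http.Error(w, "malformed payload", http.StatusBadRequest)
		return
	}
	errs := missingFields(c)
	w.Header().Set("Content-Type", "application/json")
	if len(errs) > 0 {
		w.WriteHeader(http.StatusUnprocessableEntity)
	}
	json.NewEncoder(w).Encode(result{Valid: len(errs) == 0, Errors: errs})
}

// check sends a payload to the handler in-process and returns the status.
func check(payload string) int {
	req := httptest.NewRequest(http.MethodPost, "/validate", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	validateHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(check(`{"serviceOwners":{"tribe":"tech_platform","squad":"platform_portal","owner":"peter.parker@quintoandar.com.br"}}`))
	fmt.Println(check(`{"serviceOwners":{"tribe":"tech_platform"}}`))
}
```

Returning the list of errors in the response body is what would let a workflow turn a failed validation into a readable pull request comment.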

A screenshot of how the comment is done on pull requests

Now that we are able to validate the required fields, we can store the data.

Persisting the checklist data

To do that, we added a webhook to QuintoAndar’s GitHub organization that sends all push events to a microservice, which propagates each event to an Amazon SNS topic and, from there, to an Amazon SQS queue. We followed this approach so that other automations can also consume these push events and act on them.

For our use case, a worker consumes these events and checks whether the push was made to the repository’s default branch, which most of the time is main or master. If it was, the worker checks whether the repository has a .checklist.yaml file. If so, the worker validates that the file is filled in correctly, aggregates some information from GitHub itself, such as the repository’s topics, and inserts the result into an Amazon DynamoDB table. The image below illustrates this process.

Persisting workflow diagram of our checklist files
How we read, validate, and insert data from our checklist files
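
The first step of that worker, deciding whether a push event targets the default branch, can be sketched as follows. The sketch assumes the queue message body is the raw GitHub push payload (with no SNS envelope around it) and reads only the ref and repository.default_branch fields that GitHub includes in push events. The function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pushEvent holds the few fields of a GitHub push payload used here.
type pushEvent struct {
	Ref        string `json:"ref"`
	Repository struct {
		FullName      string `json:"full_name"`
		DefaultBranch string `json:"default_branch"`
	} `json:"repository"`
}

// isDefaultBranchPush reports whether the payload describes a push to the
// repository's default branch. Tag pushes (refs/tags/...) return false.
func isDefaultBranchPush(body []byte) (bool, error) {
	var ev pushEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return false, fmt.Errorf("decode push event: %w", err)
	}
	const prefix = "refs/heads/"
	if !strings.HasPrefix(ev.Ref, prefix) || ev.Repository.DefaultBranch == "" {
		return false, nil
	}
	return strings.TrimPrefix(ev.Ref, prefix) == ev.Repository.DefaultBranch, nil
}

func main() {
	onMain := []byte(`{"ref":"refs/heads/main","repository":{"full_name":"org/example-service","default_branch":"main"}}`)
	onFeature := []byte(`{"ref":"refs/heads/feature-x","repository":{"full_name":"org/example-service","default_branch":"main"}}`)

	for _, body := range [][]byte{onMain, onFeature} {
		process, err := isDefaultBranchPush(body)
		fmt.Println(process, err)
	}
}
```

Comparing against the default_branch field carried in the payload, rather than a hard-coded name, covers repositories that use main and those that still use master.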

With this data validated and stored, it’s finally time for Backstage. From this data, we populate the Backstage Service Catalog through a Processor that we developed.

This processor reads items from DynamoDB and creates the respective entities in Backstage.

Diagram of how checklist entities are transformed into Backstage entities
How we transform checklist data into Backstage entities
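
To illustrate the mapping itself, the sketch below turns one stored checklist record into a Backstage Component entity descriptor. Backstage processors run inside the Backstage backend, so this Go code only demonstrates the data transformation and is not the processor described above. The checklistItem shape, the choice of squad as owner, and the fixed "service" type and "production" lifecycle are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checklistItem is a hypothetical shape for one record in the table.
type checklistItem struct {
	Repository string
	Tribe      string
	Squad      string
	Owner      string
	Topics     []string
}

// entity follows the Backstage catalog descriptor format for a Component.
type entity struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   metadata `json:"metadata"`
	Spec       spec     `json:"spec"`
}

type metadata struct {
	Name string   `json:"name"`
	Tags []string `json:"tags,omitempty"`
}

type spec struct {
	Type      string `json:"type"`
	Lifecycle string `json:"lifecycle"`
	Owner     string `json:"owner"`
}

// toComponent maps a checklist record to a Component owned by its squad.
// Type and lifecycle are fixed here as assumptions; the tribe and owner
// email are not mapped in this sketch.
func toComponent(item checklistItem) entity {
	return entity{
		APIVersion: "backstage.io/v1alpha1",
		Kind:       "Component",
		Metadata: metadata{
			Name: item.Repository,
			Tags: item.Topics, // GitHub topics become catalog tags
		},
		Spec: spec{
			Type:      "service",
			Lifecycle: "production",
			Owner:     "group:default/" + item.Squad,
		},
	}
}

func main() {
	item := checklistItem{
		Repository: "example-service",
		Tribe:      "tech_platform",
		Squad:      "platform_portal",
		Owner:      "peter.parker@quintoandar.com.br",
		Topics:     []string{"golang", "platform"},
	}
	out, err := json.MarshalIndent(toComponent(item), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Pointing spec.owner at the squad is what would let the catalog answer questions such as how many services each squad owns.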

Thanks to Backstage core features, the spreadsheet for service ownership is no longer necessary.

Adopting Backstage

But we didn’t stop there. With the entities being created and updated in Backstage, we developed some internal plugins to help us track service quality.

Security Overview (Plugin)

Our Application Security team used Backstage to show vulnerability data from various sources in a tab on each component, giving more visibility into the security of each repository and fostering a culture of security within engineering.

Screenshot of the Security tab in a service component

Service Assessment (Plugin)

The service assessment is a manual tool where teams can discuss the aspects an application needs, raise discussions about good software development practices, and propose ideas for improving application quality.

A gif of what our service assessment plugin looks like

Toolbox (Plugin)

For the first challenge we mentioned, it was easy to create a frontend plugin for Backstage that consumes the addresses of our tools from an internal API, giving all developers access to our tools grouped by context.

A gif of what our toolbox plugin looks like

Culture in service ownership

With the possibility of connecting components (applications, repositories, resources) with squads and tribes in Backstage, we were able to give our engineers visibility into application ownership, giving them much more transparency.

Now that we can trust the ownership data, we aim to use it to build high-quality services from the start. So, as QuintoAndar keeps growing, we have a much better tool to address scalability, reliability, and testability issues, delivering an amazing experience to our end users.

Why not build our own Backstage?

Certainly, we have thought about creating our own service and not using an open-source tool to do so. However, QuintoAndar is a tech company and we want to be part of this growing community. Also, we have two main reasons to believe in the Backstage community:

The right tool for the right problem, at the right time

As soon as Spotify announced Backstage, we saw that the project had enormous potential and that its proposal matched the challenges we were facing. Thinking about the Service Catalog, we knew the complexity and the effort that we would have to invest to develop something completely from scratch.

So, using Backstage to catalog our infrastructure made it easy to solve the spreadsheet and ownership problems.

The Backstage ecosystem and community surprised us

We followed the project closely and saw the astronomical growth of the community and how much Spotify itself invested in making the project grow. Seeing several plugins developed in a very short time, and the project joining the CNCF, motivated us not only to use Backstage but also to contribute and be part of the community.

QuintoAndar uses technology as its main growth driver, relying on several open-source community tools such as Kubernetes, ArgoCD, ArgoWorkflows, Apache Airflow, and Prometheus. We always strive to be close to the community.

Final Considerations

We have faced several challenges, and we know we still have a lot to build. Improving the experience of our developers is a core principle for scaling QuintoAndar in a healthy and efficient way.

Backstage fully meets our needs. The community keeps bringing great use cases and improving the tool, something we could hardly do at the same speed internally.

Being able to use Backstage and contribute to it is something that motivates us. Connecting QuintoAndar engineering to this developer portal “framework” will certainly help us to solve more and more challenges ahead.

Being part of an open-source community saves us a lot of time not reinventing the wheel, so we can think about what matters: be the destination for housing and conquer the world.

Gabriel Dantas
Blog Técnico QuintoAndar

Making developers’ lives better using yaml — Site Reliability Engineer @QuintoAndar — https://www.gdantas.com.br/