4 Lessons on Site Reliability Engineering That You Can Use for Anything, Even Parenting

Jan 21, 2021


From a Conversation with Ricardo, Director of Infrastructure at OLX Europe

Ask Ricardo why he became a site reliability engineer (SRE), and you’ll hear a great story.

“When I was a kid, I really wanted to have a ZX Spectrum 48. I got one when I was 12. At that time, games were on tapes and it took around 40 minutes to load (if all went well) — talk about managing expectations. I tried some programming on it for a while. Later, I got my first PC, and that’s when it truly started. I began investing a lot of time in learning computer languages and operating systems. I enjoyed creating small home networks (Ethernet and Token Ring), disassembling machines, and building my own custom-built PCs. With tons of trial and error, I learned a lot,” describes Ricardo, Director of Infrastructure at OLX Europe.

Recently, we sat down with Ricardo to talk about infrastructure and work in the engineering sector. We’ve put together the four lessons that stood out from that conversation.

Now, let’s begin…

Move fast through the unknowns

“We are not afraid of taking big tasks with a lot of unknowns and we move really fast with purpose,” responds Ricardo when asked about what makes work at OLX so exciting.

Since Ricardo joined OLX, his team has achieved a lot:

  • In the first stage, they migrated from data centers to AWS using a lift-and-shift approach. They then began rearchitecting our infrastructure in the cloud.
  • In the second stage, they moved from virtual machines to containerized workloads. All countries are now running on EKS with a number of native applications.
  • Since then, they’ve stayed busy identifying better system designs and constantly improving automation throughout the OLX platforms.

All of this work has delivered many benefits for OLX.

“We’ve achieved significant cost reduction, greater reliability, reduced idle time, and more efficient usage of resources. For example, we were using a well-known content delivery network, or CDN provider, for many years. This year, we migrated to another CDN provider and we’ve achieved millions in savings.”

This fast-moving approach and a commitment to making necessary changes have become part of the culture at OLX. It’s not easy work, but the challenge makes the job fulfilling.

“We’re always trying to improve. We’re seeing that our efforts will deliver long-term rewards for our team and our users. Our infrastructure is now more efficient, secure, and reliable,” emphasizes Ricardo.

OLX Group SRE Team

Design for failure

As an SRE expert, Ricardo thinks a lot about service reliability. To make the OLX platform more reliable for our users, Ricardo studies past issues and learns from those mistakes.

“What’s interesting is that one of my interests outside of site reliability engineering is studying history. I love history, especially Roman history and contemporary history. While learning about the past, I like to study mistakes because I always want to get to the root cause of why something happened. And I want to learn how to prevent that mistake from happening in the future. If we want to move forward successfully, we need to be bold without ignoring past experiences. We must learn from the past,” states Ricardo.

Throughout his career, Ricardo has made a point of learning from mistakes and issues. Before coming to OLX, Ricardo worked at Amazon, where he helped a lot of large customers solve infrastructure issues.

“At Amazon, it was kind of like dropping in on a parachute in highly stressful environments to solve problems. I gained a lot of experience working with a variety of big customers and recognized how to fix and prevent all sorts of issues. It was a great mix of experiences.”

After working at Amazon, Ricardo moved to Lisbon. His decision to join OLX actually came at the last minute, right before he was about to pursue another opportunity.

“OLX caught my attention because I recognized the ability to create and experiment here. That differed from my work at Amazon. While I had been building from scratch already in other roles, OLX really has provided me with the chance to build a large, scalable cloud native infrastructure from scratch. My mix of experiences has served me well in this role.”

The OLX team at AWS re:invent

Ricardo understands the type of issues that can arise with large-scale infrastructure. That’s why he’s ready to identify, address, and learn from mistakes. This way, the platform can become more reliable.

“Everything that we build at OLX is built with the intent of self-healing. Our infrastructure is built with the intent to recover from losses, inactivity, and other issues whenever possible. It’s the way we strive to architect our applications. It’s a well-known cloud principle: design everything for failure.”
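For readers who want to see the idea in miniature, here is a rough sketch of what "self-healing" can mean in code: a small supervisor that restarts a task when it crashes instead of giving up on the first failure. This is not OLX's implementation. The names `run_with_self_healing` and `FlakyService` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Tuple


def run_with_self_healing(task: Callable[[], str], max_restarts: int = 3) -> Tuple[str, int]:
    """Run a task and restart it after a failure, up to max_restarts times.

    Returns the task's result and the number of restarts that were needed.
    Hypothetical helper for illustration only.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            # Give up once the restart budget is exhausted, so a
            # permanently broken task surfaces instead of looping forever.
            if restarts >= max_restarts:
                raise
            restarts += 1


class FlakyService:
    """Simulated service that crashes a fixed number of times, then recovers."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success

    def __call__(self) -> str:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("simulated crash")
        return "healthy"


if __name__ == "__main__":
    result, restarts = run_with_self_healing(FlakyService(2))
    print(f"{result} after {restarts} restart(s)")  # prints "healthy after 2 restart(s)"
```

The restart limit is a deliberate choice in this sketch: recovering automatically is useful, but a failure that never clears should still become visible to the people running the system.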

As Ricardo states, don’t be afraid of failure. You can learn from it and improve!

“If I could tell my childhood self one thing, it would be to start taking risks sooner. Because you’ll have more experiences and learn more.”

That advice to his younger self also helps explain why Ricardo finds work at OLX so exhilarating.

“The technical challenge and scale of what we’re doing at OLX keep me motivated each day. SRE for such a large platform is incredibly important. Building for the public internet is a completely different ballgame. The pace is fast, as we have to keep up with the ever-changing landscape. We have to adapt and learn. What we create has a big impact on customers.”

Parenthood, law and order, and being there

During our discussion with Ricardo, we discovered something intriguing: the connection between parenthood and SRE.

“Your education and the way you are raised counts,” says Ricardo.

“That’s how you get your set of principles. My father was in the military and there was a lot of law and order and discipline. This attention to detail is what’s been instilled in me. I’ve taken the way I was raised and applied it to my work and my parenting. I do believe my parents have had a big influence on me.”

If you think about it, you can apply SRE principles to parenthood, or vice versa. There’s an insightful article on Medium about the connection between parenthood and roles in tech. Specifically, the article states that parenthood requires you to communicate mindfully and be consistent. These two principles apply to Ricardo’s role at OLX.

With the family

“I also think a lot about reliability as a parent. We have to be there for our children. That principle applies to my work at OLX. Reliability must be the focus of the platform. Additionally, as a leader at OLX, being there is more important than ever. My commitment to reliability, which has been instilled in me since I was a child, has helped me be a better leader during the COVID-19 pandemic,” attests Ricardo.

“Although I had remote teams before, I dedicated a lot of time to connect with my team in person. With the pandemic, you can’t have those events. My team isn’t about your rank. It’s about trust and the connection you have as a team. SREs can be in highly stressful situations and we need to be able to support and rely on each other. Rank isn’t given. During COVID-19, I have to be more available than ever for the team. I have to be present to my team’s problems. This way, we can achieve our goals.”

To ensure the company works as effectively as possible, Ricardo has placed extra importance on internal structure. Teams are embedded in packs in OLX Group’s vertical and horizontal platforms. Then, there is a concept of a central chapter of SRE. This ensures relationships can flourish and the team stays on the path toward achieving the mission.

“To empower teams, we must work closely with each team so that they have what’s necessary to meet objectives and goals. We must come together. Working in isolation isn’t an option for us.”

Ricardo has taken extra steps to ensure developers can work as efficiently as possible. He wants the best possible developer experience at OLX.

“We created a complex system to make things more efficient on the OLX platform. Now, we have to create easier interactions to abstract that complexity so developers can experiment and create new applications in a self-service way.”

Go to space…or listen to Nirvana and classical music

Outside of his work at OLX, Ricardo has a lot of interests and passions. For example, he loves learning about space.

“Space is the new frontier. I would like to be a part of it but maybe not in my lifetime,” says Ricardo.

When he’s not looking to the stars, you may find Ricardo listening to Nirvana and rock music. He has also recently gotten into classical music.

“I’ve always been fascinated by the unknown. That’s why I’m opening myself up to new music. Within classical music, there’s a lot of pondering of the unknown and musings about the universe. The connection intrigues me, and perhaps explains why I’m really getting into classical music. It also explains why I like space so much. We have to keep that sense of wonder.”

Back at OLX, the team may not be going to space (yet). But they are forging ahead, ready to build a better tomorrow.

From the cabins to the beaches

That concludes our conversation with Ricardo. We hope you’ve learned a lot from his insights into site reliability engineering and professional success.

Before we conclude, let’s hear some life advice from Ricardo:

“Enjoy the day. I come from Lisbon, so I love going to the beaches. Also, you can’t overstate the importance of family. I come from a big family and had eight cousins my own age. We would spend time at our aunt’s house in Costa da Caparica and have fun on the beach together. These are my best memories: being with friends and family.”



