Unlocking Efficiency: Optimizing Platform Engineering at Tamara

Lam Tran
Tamara Tech & Product
5 min read · Jun 10, 2024

Introduction

Engineering as a function has evolved significantly over the years, from building basic systems to managing complex, integrated environments. The rise of DevOps brought collaboration and automation, yet challenges persisted in handling the myriad tools and infrastructure. This need for a cohesive and scalable approach led to the development of Platform Engineering. Platform Engineering enables development teams to build, deploy, and operate applications efficiently and reliably by providing streamlined platforms and infrastructure.

What is Platform Engineering?

Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product, most often referred to as an “Internal Developer Platform,” which covers the operational necessities of the entire lifecycle of an application.

Platform Engineering at Tamara

At Tamara, our team comprises over 200 software engineers organized into multiple squads such as Checkout, Payment, and Card, as well as various chapters like PHP, Java, Go, and QA. Each squad includes engineers with different roles, including Backend, Frontend, and QA, ensuring a well-rounded approach to development.

While we embrace the diversity in software development and delivery approaches, it presents unique challenges and growth opportunities. To improve our current setup, we have identified several key areas for enhancement. These include better measurement of developer and software productivity, standardizing pipelines for a consistent experience during development and deployment, and creating a centralized technical hub for accessing project or repository-related information.

To address these needs, we have assembled a dedicated team of two software engineers and three DevOps engineers. Their primary function is to establish a robust foundation for other teams, such as development and data teams. By providing the best tooling and workflows, they aim to enhance the daily experience of our developers, fostering a productive and positive work environment.

Challenges

Development Containers

One of the most common challenges we face is giving new team members a seamless onboarding experience, particularly when they work on our existing repositories. Tamara adopted a polyrepo approach from the beginning, which has resulted in numerous repositories written in different programming languages, each potentially requiring its own language version and tooling configuration. For instance, an engineer might have Java 11 with Maven installed for Repository X, but need Java 17 with Gradle for another repository. Switching tools this frequently disrupts their workflow.

To address this issue, we have implemented Development Containers (dev containers), a widely recognized solution in the community. When developers check out a repository, they can open it in a dev container, ensuring consistent availability of IDE settings, programming language versions, configurations, and other customized tools. This unifies the development environment for all engineers working on the same repository.

We first introduced dev containers to most of our legacy repositories to streamline onboarding for new team members. For new repositories, we enforce consistent dev container settings based on per-language defaults. While this has brought significant benefits during development, there is still room for optimization, such as pre-building dev container images instead of building them from scratch. This would further improve the efficiency of our development process.
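To make the idea of per-language defaults concrete, here is a minimal Python sketch that generates a devcontainer.json document. It is an illustration, not our actual tooling: the function name, the defaults table, and the chosen versions are assumptions. The feature identifiers and the base image come from the public devcontainers collections.

```python
import json

# Hypothetical per-language defaults. The feature IDs exist in the public
# devcontainers/features collection; the versions are illustrative only.
LANGUAGE_DEFAULTS = {
    "java": {
        "feature": "ghcr.io/devcontainers/features/java:1",
        "options": {"version": "17", "installGradle": True},
    },
    "go": {
        "feature": "ghcr.io/devcontainers/features/go:1",
        "options": {"version": "1.22"},
    },
    "php": {
        "feature": "ghcr.io/devcontainers/features/php:1",
        "options": {"version": "8.3"},
    },
}


def devcontainer_for(repo_name, language, overrides=None):
    """Build a devcontainer.json document from the language defaults.

    `overrides` lets a single repository deviate from the defaults,
    for example a legacy service that still needs an older runtime.
    """
    defaults = LANGUAGE_DEFAULTS[language]
    options = {**defaults["options"], **(overrides or {})}
    return {
        "name": repo_name,
        "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
        "features": {defaults["feature"]: options},
    }


# Repository X from the example above: Java 11 with Maven instead of Gradle.
config = devcontainer_for(
    "repository-x",
    "java",
    {"version": "11", "installGradle": False, "installMaven": True},
)
print(json.dumps(config, indent=2))
```

Saving the printed JSON as .devcontainer/devcontainer.json in the repository gives every engineer who opens it the same runtime, regardless of what is installed on their machine.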

Internal Developer Platform

As teams and products grow, engineers introduce new tools to simplify product development workflows. These include central secret management tools for storing application and infrastructure secrets, continuous deployment tools for managing software deployments across teams and services, and version control systems for storing service codebases along with their documentation and dependencies.

However, this situation can pose challenges for developers. They may struggle to discover service dependencies, find API documentation, or perform testing with mock data. Additionally, they need to context-switch between different platforms to view the entire lifecycle of their features. This involves reaching out to other teams for API access through tools like Postman or Confluence, monitoring the CI process to ensure successful test runs before merging pull requests, and checking ArgoCD for deployment status and New Relic for service performance.

To address these challenges and allow developers to focus on business value more efficiently, Tamara has built an Internal Developer Platform (IDP). We have chosen Backstage as the IDP solution due to its flexibility for customization and its extensive community support. Our implementation of Backstage includes key features such as a software catalog, enabling developers to quickly discover service repositories within their scope and understand dependencies and API consumption between services. Each service also renders its own API documentation using the Redocly UI widget and OpenAPI v3.
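As a rough illustration of how a software catalog answers the question "who consumes this API?", the sketch below models Backstage Component descriptors as plain Python dictionaries. The field names (apiVersion, kind, spec.dependsOn, spec.consumesApis, spec.providesApis) follow the Backstage descriptor format. The service names, owners, and helper functions are hypothetical, and Backstage itself resolves these relations for you, so this only shows the underlying idea.

```python
def component(name, owner, depends_on=(), consumes_apis=(), provides_apis=()):
    """Return a Backstage Component entity descriptor as a plain dict."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": name},
        "spec": {
            "type": "service",
            "lifecycle": "production",
            "owner": owner,
            "dependsOn": [f"component:default/{d}" for d in depends_on],
            "consumesApis": list(consumes_apis),
            "providesApis": list(provides_apis),
        },
    }


def consumers_of(api_name, entities):
    """List the components that declare they consume the given API."""
    return sorted(
        entity["metadata"]["name"]
        for entity in entities
        if api_name in entity["spec"]["consumesApis"]
    )


# Hypothetical services, loosely named after the squads mentioned earlier.
catalog = [
    component("payment", "squad-payment", provides_apis=["payment-api"]),
    component("checkout", "squad-checkout",
              depends_on=["payment"], consumes_apis=["payment-api"]),
    component("card", "squad-card", consumes_apis=["payment-api"]),
]

print(consumers_of("payment-api", catalog))  # ['card', 'checkout']
```

In practice each repository would carry its descriptor in a catalog-info.yaml file, and the catalog UI would render these relations as a dependency graph.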

With the IDP, developers can focus on building features without jumping between multiple platforms during the development lifecycle, which also reduces lead time. The platform team is responsible for maintaining the integration of Backstage plugins and welcomes contributions from other developers to further enhance the platform.

Observability

Observability provides engineers with a proactive approach to analyze and optimize their systems based on the data they generate. Observability platforms offer a centralized solution for collecting, storing, analyzing, and visualizing logs, metrics, and traces, providing a connected real-time view of all the operational data in your software system. Additionally, these platforms offer the flexibility to ask questions about your applications and infrastructure, enabling a better understanding of system behavior and facilitating improvements in system performance.

For years, Tamara has relied on New Relic as its observability platform. However, it is difficult for developers to inspect or test telemetry data (logs, metrics, traces, profiles) on their local machines. To address this, our platform team has developed a reusable module based on OpenTelemetry and the Grafana LGTM stack. This module runs locally alongside developers' applications using Docker Compose. With the Compose include feature (the top-level include element of a Compose file), developers can easily pull the module into their local environments. We also plan to support it in Kubernetes environments.
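The following Python sketch shows what a Compose file that pulls in such a module could look like. It is only an assumption-laden illustration: the module path, the otel-collector host name, and the helper function are hypothetical and do not describe our real module. OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are standard OpenTelemetry SDK environment variables, and 4317 is the default OTLP gRPC port. The output is JSON, which is also valid YAML.

```python
import json


def compose_with_observability(service_name, module_path):
    """Build a Compose document that includes a shared observability module.

    `module_path` points at the Compose file shipped by the module; the
    value used below is a placeholder, not a real location.
    """
    return {
        # Top-level `include` merges another Compose file into this project.
        "include": [module_path],
        "services": {
            service_name: {
                "build": ".",
                "environment": {
                    "OTEL_SERVICE_NAME": service_name,
                    # Hypothetical collector host exposed by the module.
                    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
                },
            }
        },
    }


compose = compose_with_observability(
    "checkout", "./observability/compose.yaml"
)
print(json.dumps(compose, indent=2))
```

Because the application only depends on the standard OTLP endpoint variable, the same service can export to the local stack during development and to a different backend elsewhere without code changes.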

By shifting observability to the left, all engineers can have a unified troubleshooting experience for applications. If they utilize correlations among metrics, traces, and logs in the production environment, they should be able to apply the same approach in other environments, including local development environments.
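Correlation between logs and traces usually comes down to sharing identifiers. The sketch below, which uses only the Python standard library, generates a trace context in the W3C Trace Context traceparent format and stamps the same IDs onto a structured log line. The function names and log fields are assumptions for illustration; in a real service the OpenTelemetry SDK would create and propagate these IDs.

```python
import json
import secrets


def new_trace_context():
    """Create a random trace ID (16 bytes) and span ID (8 bytes) as hex."""
    return secrets.token_hex(16), secrets.token_hex(8)


def traceparent(trace_id, span_id, sampled=True):
    """Format a W3C traceparent header: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def log_record(message, trace_id, span_id, **fields):
    """Emit a JSON log line that carries the active trace and span IDs."""
    record = {"message": message, "trace_id": trace_id, "span_id": span_id}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


trace_id, span_id = new_trace_context()
header = traceparent(trace_id, span_id)            # sent to downstream services
line = log_record("payment authorized", trace_id, span_id, level="info")

print(header)
print(line)
```

With the trace ID present in every log line, a backend that stores both signals can jump from a slow trace to the exact logs it produced, whether the data comes from production or from a laptop.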

Environment as a Service

To facilitate fast and efficient shipping, it’s vital to have environments that support dynamic testing for all stakeholders. At Tamara, with approximately 10 teams, we decided to provide each team with an isolated development environment using ArgoCD and Kubernetes namespaces. Initially, this approach worked well. However, as the teams and features grew significantly, it posed challenges for both the development and DevOps teams. Developers within the same team started encountering conflicts related to application logic or configuration when testing in the shared environment. Additionally, the DevOps team had to invest considerable effort in managing development environments, including resource utilization and cost-effectiveness.

To address these issues, we aim to build ephemeral environments for each engineer. These environments will have dynamic provisioning and decommissioning capabilities, ensuring they do not share any resources with other environments. Furthermore, we will make it easy to clone existing environments when needed. This level of flexibility will allow developers to focus on building application features without worrying about the complexities of environment management.
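Since this work is still planned, the following Python sketch is purely hypothetical. It shows two small building blocks such a system might need: deriving an isolated Kubernetes namespace name per engineer and branch, and deciding when an environment is due for decommissioning. The naming scheme, the env- prefix, and the TTL policy are assumptions. The 63-character limit and the allowed characters come from the DNS label rules that Kubernetes applies to namespace names.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_LABEL_LENGTH = 63  # Kubernetes namespace names must be valid DNS labels


def namespace_for(engineer, branch):
    """Derive a namespace name that is unique per engineer and branch."""
    raw = f"env-{engineer}-{branch}".lower()
    # Replace every run of characters outside [a-z0-9-] with a single dash.
    slug = re.sub(r"[^a-z0-9-]+", "-", raw)
    # Truncate, then trim dashes so the name starts and ends alphanumeric.
    return slug[:MAX_LABEL_LENGTH].strip("-")


def is_expired(created_at, ttl, now):
    """Return True once the environment has outlived its time to live."""
    return now - created_at >= ttl


name = namespace_for("alice", "feature/PAY-123_refund")
print(name)  # env-alice-feature-pay-123-refund

created = datetime(2024, 6, 10, 9, 0, tzinfo=timezone.utc)
now = datetime(2024, 6, 11, 10, 0, tzinfo=timezone.utc)
print(is_expired(created, timedelta(hours=24), now))  # True
```

A provisioning job could create the namespace when a branch is pushed, and a scheduled cleanup job could delete every namespace for which is_expired returns True.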

References

  1. Platform Engineering Blog
  2. Containers Dev
  3. VS Code Dev Containers
  4. Five Reasons for an Internal Developer Platform
  5. Backstage
  6. What is Observability
  7. OpenTelemetry
  8. Preview Environments
