5 Lessons for a Successful Internal Development Platform

Platforms have been one of the most prominent trends in the State of DevOps Report over the last two years. You may also have heard related buzzwords such as IDP, DevX, developer experience, and platform engineering.

But let me step back for a moment and first define a platform:

An Internal Development Platform (IDP) is a collection of core infrastructure services that most development teams require in order to develop awesome products.

So why is it important?

Embedding an IDP can have an amazing impact on your organization, especially when you have several product teams that all have similar demands for shared infrastructure.

In the last three years, I have been part of the CyberArk DevOps group and led, together with the team, the architecture of the Internal Development Platform. During these three years, we’ve had many successes and failures and learned a lot along the way.

In this blog post, I will share five lessons we discovered for building a successful Internal Development Platform.

Lesson #1: Ensure that Platform Reliability is Essential

One of the first lessons we learned was the importance of reliability in our platforms.

Building the platform sometimes feels like a race in which we rush to deliver every capability that users request.

But if your infrastructure services are unreliable, the impact is multiplied by the number of teams that you serve (your customers). The conversation about the platform then shifts from its capabilities to the erosion of trust in the platform itself.

This led us to the first insight we gained about the platform team’s performance:

The platform team’s success will first be measured by how reliable their services are.
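
To make the multiplication effect concrete, here is a minimal sketch in Python. The service names and numbers are invented for illustration, and "team-minutes lost" is just one possible way to weight an outage by the number of teams it hits, not a metric from our platform.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    service: str         # hypothetical service name
    minutes: int         # how long the service was unavailable
    teams_affected: int  # how many teams depend on the service


def team_minutes_lost(outages: list[Outage]) -> int:
    """Total disruption, weighting each outage by the teams it hit."""
    return sum(o.minutes * o.teams_affected for o in outages)


outages = [
    Outage("ci-runners", minutes=30, teams_affected=12),
    Outage("artifact-store", minutes=10, teams_affected=20),
]
print(team_minutes_lost(outages))  # 30*12 + 10*20 = 560
```

A 30-minute outage looks small on its own, but it weighs far more once it is multiplied by every team that was blocked.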

Lesson #2: Pave a Golden Development Path

As a platform team, you will be asked by the development teams to support various overlapping capabilities and technologies. In the DevOps area, for example, this could mean supporting multiple branching strategies.

This is very tricky since the more capabilities you have, the more effort you will need to spend on maintenance and support.

We found it helpful to define a golden path for development teams who want to use the platform.

This golden path is a set of technologies, tools, and standards that the platform knows how to best work with. Using it gives developers the best experience the platform can offer.

You should define it accurately, including how wide the path is for the development team (i.e., how many deviations from the golden path are allowed). Then you can publish it.

Defining the golden path should be a process that includes talking with each team’s most influential developers and collecting their stack usage, processes, and thoughts about the stack and standards. This will allow the organization to stay updated with the latest toolsets and standards in the industry.
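
As a rough illustration of a golden path with a defined width, the Python sketch below compares a team's stack against a published set of supported choices. The categories, tool names, and the `MAX_DEVIATIONS` threshold are all hypothetical and only show the idea.

```python
# Hypothetical golden path: each category lists the supported choices.
GOLDEN_PATH = {
    "language": {"python", "typescript"},
    "ci": {"jenkins"},
    "branching": {"trunk-based"},
}

# How "wide" the path is: deviations tolerated before a team is off the path.
MAX_DEVIATIONS = 1


def deviations(team_stack: dict[str, str]) -> list[str]:
    """Return the categories where the team's choice is outside the golden path."""
    return [
        category
        for category, choice in team_stack.items()
        if choice not in GOLDEN_PATH.get(category, set())
    ]


def on_golden_path(team_stack: dict[str, str]) -> bool:
    """A team is on the path if it deviates in at most MAX_DEVIATIONS categories."""
    return len(deviations(team_stack)) <= MAX_DEVIATIONS


team = {"language": "python", "ci": "jenkins", "branching": "gitflow"}
print(deviations(team))      # ['branching']
print(on_golden_path(team))  # True
```

Keeping the path as data makes it easy to publish and to revisit after each round of conversations with the teams.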

Lesson #3: Ensure the Right Amount of Autonomy

When centralizing infrastructure services for several teams, you end up in a protective position toward changes.

You want to keep the service reliable and controlled, so limitations and restrictions spring up like mushrooms. You end up becoming a bottleneck for the customers’ requests.

Unfortunately, you are then heading full speed toward developer frustration and damaging the very developer experience you aim to achieve.

The key to success here is to categorize the privileged actions of the users into areas of impact:

  • System-level impact: Actions that might make your maintenance/operation more difficult (for example, deviating from the paved golden path), or actions that might create an outage for the service and affect your entire user base.
  • Team/User-level impact: Actions that only affect team efficiency from an operation/reliability perspective.

Any system-level actions should be provided via a managed self-service that will protect your operational and reliability interests.

Any team/user-level actions should be permitted without depending on the platform team. They should be accompanied by documented best practices for increasing work efficiency.

This way, you ensure the right amount of autonomy without compromising your needs as the platform owner.
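
The categorization above can be sketched as a small routing table in Python. The action names are hypothetical, and treating unknown actions as system-level is an assumption made for this sketch, not a rule taken from our platform.

```python
from enum import Enum


class Impact(Enum):
    SYSTEM = "system"  # may cause an outage or add maintenance burden
    TEAM = "team"      # only affects the requesting team's efficiency


# Hypothetical action catalogue; real action names would differ.
ACTION_IMPACT = {
    "install_ci_plugin": Impact.SYSTEM,
    "change_shared_runner_image": Impact.SYSTEM,
    "edit_own_pipeline": Impact.TEAM,
    "create_team_job": Impact.TEAM,
}


def route(action: str) -> str:
    """Decide how a requested action is handled."""
    # Assumption: unknown actions are treated as system-level, the cautious default.
    impact = ACTION_IMPACT.get(action, Impact.SYSTEM)
    if impact is Impact.SYSTEM:
        return "managed-self-service"
    return "allowed-directly"


print(route("edit_own_pipeline"))  # allowed-directly
print(route("install_ci_plugin"))  # managed-self-service
```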

Lesson #4: Simplify Your Services

One of the most frustrating things you may experience as a developer is using a service or library that requires unclear parameters or has incomplete documentation — if there is documentation at all.

You want to be careful not to provide these types of services, as they can damage the developer experience and increase the number of support tickets that you receive.

All products and services should strive for simplicity. Creating simple-to-use services will ensure quick adoption and happy developers.

One way to achieve this is to seriously consider each parameter that you require from the user. Always ask yourself before placing a new parameter: “Can I figure out this parameter automatically? Is there a default value that will catch the common use case?”

And of course, documentation. If you have a UI, embed the documentation links and provide a short description so the developer won’t need to travel across sites to understand how to use the service.
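
The two parameter questions above can be sketched in Python as follows. `PipelineRequest`, its fields, and the example URL are hypothetical. The point is only that one value is required, one is derived automatically, and one has a default that covers the common case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRequest:
    repo_url: str                       # the only value the user must supply
    service_name: Optional[str] = None  # derived from repo_url when omitted
    environment: str = "dev"            # default that covers the common case


def resolve(request: PipelineRequest) -> PipelineRequest:
    """Fill in every parameter that can be figured out automatically."""
    name = request.service_name
    if name is None:
        # Derive the name from the last path segment of the repository URL.
        name = request.repo_url.rstrip("/").split("/")[-1].removesuffix(".git")
    return PipelineRequest(request.repo_url, name, request.environment)


resolved = resolve(PipelineRequest("https://git.example.com/payments/billing-api.git"))
print(resolved.service_name, resolved.environment)  # billing-api dev
```

A developer who needs something different can still override either value, but the common case needs only one argument.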

Lesson #5: Make Your Platform Status Visible

As you take full ownership of the infrastructure that operates the platform, problems are bound to happen. We are human, and mistakes that can cause an outage are inevitable.

Your customers and your developers need to know what is going on in real time. Otherwise, they will reach out to you again and again to understand whether what they experienced was a system-level bug/failure or incorrect usage on their end.

Consider how other platform vendors always publish a status page to communicate relevant information to their customers, such as system-level issues that are occurring.

The key to success here is to clearly define the areas and workflows that cover your services, monitor them, and make their status visible to the developers.
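
A minimal Python sketch of that idea: define one check per workflow, run them all, and publish the result. The workflow names and the check functions are placeholders. A real check would probe the actual service.

```python
from typing import Callable


# Hypothetical checks: each returns True when its workflow is healthy.
def check_build_workflow() -> bool:
    return True


def check_deploy_workflow() -> bool:
    return False  # simulate an ongoing incident


CHECKS: dict[str, Callable[[], bool]] = {
    "build": check_build_workflow,
    "deploy": check_deploy_workflow,
}


def platform_status(checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run every check and report a per-workflow status plus an overall one."""
    status = {}
    for name, check in checks.items():
        try:
            status[name] = "operational" if check() else "degraded"
        except Exception:
            # A check that crashes is reported rather than hidden.
            status[name] = "unknown"
    healthy = all(s == "operational" for s in status.values())
    status["overall"] = "operational" if healthy else "degraded"
    return status


print(platform_status(CHECKS))
# {'build': 'operational', 'deploy': 'degraded', 'overall': 'degraded'}
```

The returned dictionary is the kind of data a status page could render, so developers can tell a platform incident from a usage mistake without opening a ticket.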

The Evolving Platform Journey

Architecting a successful platform can be challenging. I believe that following these core lessons we’ve learned will help you achieve both a better architecture and a better developer experience for your customers.

Behind each lesson lies a journey of failures and successes. It’s an evolving journey on which we continue to gain insights on how to create the best Internal Development Platform possible for our customers.

Shlomi Benita

DevOps system architect at CyberArk. Love to secure stuff, architecting, code, and solve problems.