A playbook for API-first transformation at scale: Delivery Infrastructure Platform, Part 2

Jayadeba Jena
Nov 18, 2021


In Part 1 of this post, we discussed the Delivery Infrastructure Platform's areas and its customers. In this part, we'll discuss the tools, services, and APIs that comprise the platform, the options available in the industry in each of these areas, and guidance on how to think about implementing them.

Now that we understand the customers of the Delivery Infrastructure Platform, let’s look at building one.

We are driven by product, design, and API-first thinking, so the delivery infrastructure platform tooling is focused on delivering an end-user-facing API product that is robust, reliable, and easy to consume. The API product is described by an API contract that services publish and consumers use to interact with a service. Essentially, in addition to describing the capability's operations and features, an API contract should also describe the non-functional features of a service (authentication, authorization, SLOs, security, privacy) and all of its metadata (such as team information, business capability mapping, visibility (external, internal, etc.), version, service name, go-live date, sunset date if it is a deprecated API, and so on).

So, an API contract can be thought of as the following:

API Contract = capability features/operations specifications + non-functional features specifications + metadata

With the API contract describing all functional features, non-functional features, and metadata, and thus becoming the source of truth for everything, it is important that the sanctity (accuracy and correctness) of the contract is always maintained. The API contract drives the service implementation (both functional and non-functional).

Choosing an Interface Definition Language (IDL) for describing an API Contract

There are myriad ways to describe your API contracts (proprietary or open source). For REST APIs, OpenAPI (formerly Swagger) is the industry-standard way to describe a contract, and AsyncAPI is becoming the industry standard for describing events. There are tooling ecosystems around both, so you can use them to your advantage and build only the missing features your organization needs. Regardless of the IDL format you choose, you will need to add many vendor extensions to describe the non-functional concerns in the specification (depicted in the previous diagram), because these are not yet addressed by the IDL specifications themselves.
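
To make this concrete, here is a minimal OpenAPI contract sketch. The x-* fields are hypothetical vendor extensions of the kind you would define in-house for non-functional features and metadata; they are illustrative, not a standard:

```yaml
# Minimal OpenAPI 3.0 contract sketch; x-* extensions are hypothetical.
openapi: 3.0.3
info:
  title: Payments API
  version: 2.3.0
  x-team: payments-platform              # metadata: owning team
  x-business-capability: commerce/payments
  x-visibility: external
paths:
  /v1/payments:
    post:
      summary: Create a payment
      operationId: createPayment
      x-authz-scope: payments.write      # non-functional: authorization
      x-slo:                             # non-functional: SLO targets
        latency_p99_ms: 250
        availability: "99.95"
      responses:
        "201":
          description: Payment created
```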

Governance Tools

The tools in this section help your teams follow a common set of guidelines and policies during service development, in an automated fashion via your CI/CD pipeline. This reduces developer friction around reviews and compliance and greatly improves developer velocity.

Linter/Validator: The API linter/validator runs many rulesets against the API contract IDL files (swagger.json, asyncapi.json, query/mutation.graphql, etc.) and reports violations. Examples include a design standards ruleset (e.g., API styling, protocol-specific (HTTP/REST/GraphQL) rules, authN/authZ, i18n, internet-standard formats such as RFC 3339, and documentation rules), IDL rulesets (OpenAPI, AsyncAPI, GraphQL), business domain and data quality rulesets, and compliance rulesets (security, privacy, legal).

The validator should be fully integrated into the CI/CD pipeline and should automatically update the maturity assessment validation criteria in the Align and Design phases.

You won't find an API validator in the industry that addresses all the concerns described above, but you may find validators that check your API specification's compliance against a specific IDL format (e.g., OpenAPI or AsyncAPI). When you choose one, make sure it supports adding new rulesets in a pluggable way (from simple to complex), and evaluate it thoroughly.
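
For example, Spectral is one open-source linter for OpenAPI/AsyncAPI documents with pluggable rulesets. A minimal custom ruleset sketch might look like this; the rule names and the x-owner extension are illustrative:

```yaml
# .spectral.yaml - a minimal custom ruleset sketch (illustrative rules).
extends: ["spectral:oas"]
rules:
  contract-must-declare-owner:
    description: Every contract must carry team-ownership metadata.
    severity: error
    given: "$.info"
    then:
      field: x-owner          # hypothetical vendor extension
      function: truthy
  paths-must-be-kebab-case:
    description: Path segments must be kebab-case per the API style guide.
    severity: warn
    given: "$.paths[*]~"      # the ~ targets the path key itself
    then:
      function: pattern
      functionOptions:
        match: "^(/[a-z0-9-]+|/\\{[a-zA-Z]+\\})+$"
```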

Code Generator: The code generator complements the API validator and generates compliance policies/configurations, functional service code, and all the code, policies, and configurations for the non-functional areas (authZ, authN, logging, data handling, etc.). It also generates API tests (functional, negative/fault-injection, and security/compliance test cases) and mocks.

Like the linter/validator, you won't find a code generator in the industry that meets all your functional and non-functional concerns (usually described using vendor extensions in OpenAPI/AsyncAPI IDL files, or via custom directives in GraphQL specifications), but you will find code generators that generate code from a standard IDL file (swagger.json, asyncapi.json). You may leverage one such code generator and add the missing features.
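
As an illustration, OpenAPI Generator is one such open-source option; a typical invocation looks like the sketch below, with custom templates being the usual extension point for the missing, organization-specific pieces (all paths here are placeholders):

```bash
# Generate a JAX-RS server skeleton from the contract; the custom template
# directory is where organization-specific (non-functional) code is layered in.
openapi-generator generate \
  -i payments-api/openapi.yaml \
  -g jaxrs-spec \
  -t templates/custom-jaxrs \
  -o generated/payments-service
```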

Language controls and directives: The code generator also generates appropriate language-specific controls (e.g., standard/custom JAX-RS annotations for Java-based apps) to enforce many of the runtime functions relating to API requests and responses. Examples are request/response validation, logging per data classification, field encryption, error handling/responses, data-access controls, and so on.

Except for the standard annotations (like the JAX-RS annotations), you would mostly need to build the custom annotations yourself.
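
Here is a sketch of what such generated, annotation-driven code could look like. The JAX-RS and Bean Validation annotations are standard; @Classified is a hypothetical custom annotation of the kind you would build in-house to drive logging and field handling per data classification, and the resource/types are invented for illustration:

```java
import javax.validation.Valid;
import javax.validation.constraints.NotNull;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical custom annotation: a runtime interceptor (not shown) would
// use it to log and mask fields according to the data classification.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Classified {
    String level();
}

// Minimal request type, stands in for the generated model class.
class PaymentRequest {
    public int amountInCents;
    public String currency;
}

@Path("/v1/payments")
public class PaymentsResource {

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    @Classified(level = "RESTRICTED")   // log/encrypt per data classification
    public Response create(@Valid @NotNull PaymentRequest request) {
        // Generated skeleton: validation and error mapping are enforced by
        // the annotations above; the team fills in the business logic.
        return Response.status(Response.Status.CREATED).build();
    }
}
```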

API contract accuracy and conformance: Once the service implements the complete API contract, a contract conformance utility checks that the implementation conforms to the contract by running all the test cases generated by the code generator. This utility should be integrated with the CI/CD pipeline and should automatically update the quality maturity criteria of the API PDLC Develop phase. There are some interesting developments in the industry in this area, but they still don't address all concerns (security/compliance/fault-injection tests, etc.), so you would mostly need to build this yourself, or use a testing tool from an external provider and add the missing features.
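
A generated conformance test might look like the following sketch, here using the open-source REST Assured library; the endpoint, payload, and base URI are hypothetical:

```java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.matchesPattern;

import org.junit.jupiter.api.Test;

// Sketch of a contract-conformance test a generator could emit: it asserts
// that the live implementation honors the contract's status code, content
// type, and response shape for POST /v1/payments (hypothetical API).
class PaymentsContractConformanceTest {

    @Test
    void createPaymentMatchesContract() {
        given()
            .baseUri("https://payments.test.example.com")  // placeholder env
            .contentType("application/json")
            .body("{\"amountInCents\": 1000, \"currency\": \"USD\"}")
        .when()
            .post("/v1/payments")
        .then()
            .statusCode(201)                            // contract: 201 on create
            .contentType("application/json")
            .body("id", matchesPattern("[a-z0-9-]+"));  // contract: id format
    }
}
```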

Gateways

API gateways are a very important component of your Delivery Infrastructure Platform. There are plenty of commercial offerings in this segment, some as a standalone API gateway and some as complete API management solutions (an API gateway together with a developer portal).

If you would like to build one, then Envoy is a very popular and performant choice.

Depending on the types of APIs you serve (e.g., GraphQL, REST, events, WebSockets), you may even consider a separate gateway for each API protocol.
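
As a flavor of what building on Envoy involves, here is a minimal, much-simplified route configuration sketch fronting a hypothetical payments service; a production gateway would add TLS, authN/authZ filters, rate limiting, observability, and more:

```yaml
static_resources:
  listeners:
    - name: api_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: api_gateway
                route_config:
                  name: local_routes
                  virtual_hosts:
                    - name: payments
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/v1/payments" }
                          route: { cluster: payments_service }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: payments_service
      type: STRICT_DNS
      load_assignment:
        cluster_name: payments_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: payments.internal, port_value: 8080 }
```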

Service Mesh

In a microservices architecture, the service mesh plays a crucial role and helps implement your non-functional/cross-cutting concerns. The code, configurations, and policies generated by the code generator for these concerns are deployed as part of the sidecar proxy in the service mesh infrastructure. A service mesh is also a very important infrastructure component for reliable service communication and observability.

If you would like to implement one yourself, Istio is a very popular service mesh infrastructure to consider.
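
For example, the kind of generated authorization policy discussed above could land in the mesh as an Istio AuthorizationPolicy like this sketch; the service names and namespaces are hypothetical:

```yaml
# Only the checkout service may call POST /v1/payments on the payments service.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-authz
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/checkout/sa/checkout"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/payments"]
```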

Data Aggregator

The area of data aggregation is a constant struggle for most companies. The ever-changing data needs of end-user-facing frontend apps and products are very difficult to meet, and access from different types of devices over varied bandwidths further complicates the data requirements. Yes, you can use patterns like “Materialized View” or a single-use data aggregation layer to alleviate cross-service data aggregation a bit, but it is never complete, because the data needs keep changing. In any company, you’ll find tens of single-use data aggregators, with most concerns duplicated across them. You are also leaving the burden of data-handling concerns to these aggregator teams, which weighs heavily on their productivity. GraphQL has changed this for the better, and the GraphQL federation protocol has made it an excellent approach for solving data aggregation use cases; it works very well with a microservice-based architecture. Using GraphQL federation, you can build one unified graph that end-user product teams interact with to fetch all the data they need from many different products, and extend the graph to add end-user-specific concerns as required.
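
As a small sketch of how federation stitches the graph together, a hypothetical Orders subgraph can reference an entity owned by another team by sharing its key:

```graphql
# Orders subgraph (hypothetical): owns Order, references Customer by key.
type Order @key(fields: "id") {
  id: ID!
  totalInCents: Int!
  customer: Customer!
}

# Stub of the Customer entity; the Customers subgraph declares the same
# @key and resolves the full fields (name, email, etc.).
type Customer @key(fields: "id") {
  id: ID!
}
```

The federation gateway composes both subgraphs into one unified graph, so product teams query orders and customer details in a single request without either team building a bespoke aggregator.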

We’ve specifically called out “data aggregation” here, so don’t confuse it with service aggregation (particularly the execution of mutation operations spanning multiple services). If you encounter scenarios where you always need to execute several mutation operations across multiple services, via a distributed transaction, to serve an end-user product feature, don’t immediately reach for patterns like “Saga”. Instead, go back to the drawing board and seriously inspect the service boundaries: therein lie both the problem and the solution.

Platform services

There are some platform services you will always need in order to enable common interaction patterns. Examples are request idempotency, asynchronous request/reply, and file upload/download services. These help with developer velocity, as many of the API operations you implement need idempotency (e.g., POST or any GraphQL mutation). You would need to build these services yourself.
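
As a sketch of the idempotency pattern (the header name follows a common industry convention; the endpoint and key are hypothetical):

```http
POST /v1/payments HTTP/1.1
Host: api.example.com
Content-Type: application/json
Idempotency-Key: 2f9c1a7e-0d5b-4d6a-9a2f-3c8e1b4d5f60

{"amountInCents": 1000, "currency": "USD"}
```

A retry carrying the same Idempotency-Key returns the originally stored response instead of executing the payment a second time; a shared platform service can provide this guarantee so every service team doesn't rebuild it.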

Discovery

The idea of building service assets with clearly defined business boundaries has not yet sunk in with many folks in the industry. So it is hard to find a discovery tool that can position/align an API product (and its service asset) to a business boundary (the Align step of the API PDLC); enable discovery of APIs and services via the business domain hierarchy (core domain -> subdomains (or bounded contexts) -> capability boundary (or service boundary), if you are using DDD); list all APIs along with their service assets; and provide discoverability the other way around (i.e., discovering the business domain hierarchy/alignment from an API product or service name). As we have said a few times in the previous posts, only when you fully integrate the business capability model with the product development lifecycle, and with everything you do in your organization, do you realize its full potential. You really need to build this in-house, as there's no alternative tooling available outside.
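
To make the two-way discovery concrete, a hypothetical in-house discovery API could answer both questions (API to business boundary, and back) from the same record; the names and shape here are invented for illustration:

```json
{
  "api": "payments-api",
  "service": "payments-service",
  "businessAlignment": {
    "coreDomain": "commerce",
    "subdomain": "payments",
    "capability": "charge-management"
  }
}
```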

The second aspect of discovery is the discovery of the API specifications themselves (call it the API explorer), written in a particular IDL format (for example, OpenAPI for REST), with features like “try it now” testing through mocks, generation of different views of the spec, and so on. There are many commercial offerings in this area, in case you want to use an external offering instead of building it yourself. The API explorer should provide all the features a Consumer role needs to integrate with the API.

The third aspect of discovery is the discovery of metadata relating to every asset. See the “Metadata” section for more details.

The fourth aspect of discovery is a set of aggregation dashboards on a per-API/service basis, where consumers (particularly business owners and service producers) can slice by various parameters (failure rates, adoption, tech debt, quality, usage, overall health).

Finally, search (advanced/fuzzy) is a very important feature of your discovery tool.

PDLC workflow tool

The PDLC (Product Development Lifecycle) workflow tool enables collaboration among all stakeholder roles in the API/service development lifecycle, so they can participate and complete the required artifacts in each phase. The tool should guide stakeholders on their artifact contributions, perform automatic state transitions from one lifecycle state to the next based on the maturity/quality assessment score in each phase, notify the appropriate stakeholders of SLA breaches in any phase, and automatically assign/reassign tasks to the right stakeholder role for each phase. The PDLC tool drives your API/service development from concept through launch, so it is one of the most important tools in the Delivery Infrastructure Platform.

There are some recent developments around PDLC workflow collaboration tools in the industry from a few vendors, so you can either choose to use one of them or build this on your own.

Metadata

The idea of maintaining metadata for all assets in your organization and making it available as an API product is a very powerful concept for building a self-service platform ecosystem. The metadata API product provides meta-information about all APIs/services (for example, the schemas of all APIs, their versions and current lifecycle state, ownership, support info, maturity/quality score, onboarding needs, etc.). When you API-enable metadata, it can be used in many different ways by a varied audience: your compliance team may want regulatory reporting of all APIs that handle personal data, along with the list of personal data per business domain, and your CI/CD jobs can call the metadata API to determine the quality/maturity score of an API/service and surface it as a compliance prerequisite for deployments. The value it provides is plentiful.
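
For illustration, a response from such a metadata API might look like the following; the endpoint shape and field names are invented:

```json
{
  "name": "payments-api",
  "version": "2.3.0",
  "lifecycleState": "LIVE",
  "owner": { "team": "payments-platform", "support": "#payments-support" },
  "businessCapability": "commerce/payments",
  "maturityScore": 87,
  "handlesPersonalData": true,
  "specUrl": "https://specs.example.internal/payments-api/2.3.0/openapi.json"
}
```

A CI/CD job could fetch this record and block a deployment when the maturity score falls below the required threshold, which is exactly the kind of self-service use the metadata product enables.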

Conclusion

Throughout this post and all the previous posts, we have always talked about microservices, API products, and business capabilities together, not in isolation. When the idea of “microservices” was first coined, the goal was always to create autonomous product teams, with each team focusing on delivering a distinct business capability and owning the complete end-user experience of that capability, and with APIs as the only way to interact with the services. So the ideas of focusing on the business and managing it as a product (essentially, the API is the product) to delight end-users with a great experience (both functional and non-functional) were always fundamental to developing a microservice. A product that is not reliable is not a great product, and a product that is not secure, or doesn’t protect customer data, is not a great product either. Thinking about the business architecture, microservices, APIs, and the underlying technology architecture together (enabled through Delivery Infrastructure Platform tooling) helps you thrive and enables teams to innovate at speed. Not thinking about them together results in unintended consequences: fuzzy service boundaries, fuzzy APIs, unintended service dependencies, developers spending time on non-value-adding features (lost productivity), API contract/service implementation mismatches (inaccurate or obsolete contracts), and so on.

Calling the tooling that drives API product development the “Delivery Infrastructure Platform” in this post is intentional, because there is no isolated, separate category of “API tooling”. There is just one set of tooling, the Delivery Infrastructure Platform, which helps you develop robust, high-quality APIs, services, and data products. The Delivery Infrastructure Platform is a key building block of an API-first transformation, and the tools that enable thinking about the business architecture, the service and technology architecture, and APIs together are key to success.

This article is the sixth of seven exploring the API-first transformation playbook in detail. In the next post, we’ll discuss cultural aspects, how to successfully navigate them, and strategies for operating at scale in organizations of any size.


Jayadeba Jena

Jayadeba Jena is an API enthusiast and loves talking about API-first transformations and how it helps organizations build successful digital platforms.