How to build a technology platform
This is a rumination on how we are thinking about building Myntra’s logistics capabilities. The attempt is to develop a mental framework for building software systems and teams that can be applied to any problem domain.
Let’s try to build a logistics technology platform for the world!
- SaaS platform for logistics companies.
- Out-of-the-box modelling and default behaviours for logistics entities.
- Standard method of customizing core behaviour to create new experiences without modifying the platform.
What are the problems in building a software system for the long term?
- Changing processes : Frequent changes and even complete overhauls of the logistics operations model.
- Evolving business : No one can predict how the business will evolve.
- Reducing go-to-market : Expectation of ever increasing agility and reduced time to market despite the above.
Given this, how do we write software that maintains/scales well and helps us keep pace with the business? We have two important observations here.
- Technical solutions built for specific business problems are not reusable in a fluid business landscape.
- We encounter different behaviours for the same entity more often than we encounter new entities.
A platform approach solves this unknown-unknowns dilemma neatly by creating a reusable set of business tools which can be arranged in different configurations to achieve new and varied outcomes.
But we already have a platform!
No we don’t.
- We often use “platform” to mean “this system/suite of systems does everything related to xxxxx”. Replace xxxxx with taxation, discounts, or logistics.
- Strictly speaking, a platform allows others to do xxxxx in a minimally opinionated way.
What we have is a product that is trying to fill the shoes of a platform.
- Our software is built around the business as we know it today.
- Changes to business will either result in ever increasing “if-else” or recurring rewrites.
- It is difficult to change behaviour of the system without touching core codebase. It is near impossible to evolve software at the edges.
We have something like this.
- A set of entities and behaviours that we consider core, but are then forced to modify for each use case. Different behaviours are considered core for their respective use case.
- A shallow business logic layer with fuzzy boundaries since we cannot properly define what is core and what is not.
- A trivial API layer which delegates everything to downstream components and does not abstract callers from the underlying architecture.
Because entities and their behaviours are locked together intrinsically, we end up with a system which is being pulled in directions it was never meant to go in.
Envisioning a true logistics platform
Characteristics of the platform
Taking a leaf out of Jeff Bezos’s playbook, we define our platform to have the following characteristics:
- Built first, then reused forever : Has to be built agnostic of specific business needs, and then used for addressing business needs.
- Externally programmable : Must expose hooks to build customizable experiences on top of it.
- No exceptions to the above two.
As we go about building a platform, how will we know if we are moving in the right direction? A few key benefits can be tracked to evaluate this.
- Components are increasingly isolated by rate of change. Some change often and some perhaps not at all.
- Business decoupling : Fewer and fewer systems should be impacted by any business requirement.
- Single source and modelling for business entities.
- Homogeneous development experience and consensus on design and development practices.
- Well defined owners of different technical components. No shared responsibility.
- Increasing team autonomy (aka reduction in inter-owner dependencies).
- Agility and reduced time to market for business features.
Bird’s Eye View
Here’s a quick look at Myntra’s technical ecosystem and where our logistics systems are to fit in it.
- The green boxes are business facing products.
- Under them, we have a stack of tools to serve an increasing degree of abstraction. Each layer is a platform in itself, compliant to all platform building guidelines.
- While there is a sense of increasing abstraction from hardware management to business capabilities, this is not really a “stack” since all layers of the “stack” are independently available for use at any level. For example, service discovery can be used by the CEP system as well as the last mile shipment service.
The non-logistics pieces are included only for context. Each of them merits a deep discussion all by themselves, but we have a platform to build. So onward!
Separating product from platform
- We identify 2 constructs : platform services and product services
- Platform services are business agnostic capabilities.
- Product service is any specific experience or workflow built on top of one or more platform services.
- Platform services are not aware of how they are being used in the business context.
- They provide standard read/write constructs and guaranteed SLAs.
- They don’t understand the content of workflows beyond their trigger conditions.
- Products deliver business value by building optimized end-user constructs like API gateways for data aggregation, UI for manual interaction, and workflows for process automation.
- They may themselves be made up of multiple components.
- They may extend platform entity data models by storing local versions of that data and/or storing additional data against it.
If we do this right, our software should look like this. Notice the shrinkage of core and increase in size of logic and API layer.
- Small entities with minimal essential states and business validations.
- Ability to register more states and validations for different use cases, thereby adding extensibility to the entity model.
- Ability to externally configure workflows to stitch actions across multiple core entities.
- Handling of infra concerns like rate limiting, authentication etc.
The business logic layer would increasingly drop out from within core services onto external workflow management systems.
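The idea of a small core with registrable validations can be sketched roughly as follows. This is only an illustration under stated assumptions: `Shipment`, `ShipmentCore`, `register_validation`, and the `store_delivery` use case are hypothetical names, not part of any actual Myntra service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Shipment:
    # Hypothetical minimal entity: only the essentials live in the core.
    shipment_id: str
    weight_kg: float
    attributes: Dict[str, str] = field(default_factory=dict)


# A validation returns an error message, or None when the entity is acceptable.
Validation = Callable[[Shipment], Optional[str]]


class ShipmentCore:
    """Hypothetical platform core: one essential check plus registered extensions."""

    def __init__(self) -> None:
        self._extra: Dict[str, List[Validation]] = {}

    def register_validation(self, use_case: str, check: Validation) -> None:
        # Products add behaviour here without modifying the core class.
        self._extra.setdefault(use_case, []).append(check)

    def validate(self, shipment: Shipment, use_case: str) -> List[str]:
        errors: List[str] = []
        # Core validation: applies to every use case.
        if shipment.weight_kg <= 0:
            errors.append("weight must be positive")
        # Extension validations: apply only to the use case they were registered for.
        for check in self._extra.get(use_case, []):
            message = check(shipment)
            if message is not None:
                errors.append(message)
        return errors


def store_requires_store_id(shipment: Shipment) -> Optional[str]:
    # Assumed product-specific rule, used purely for illustration.
    if "store_id" not in shipment.attributes:
        return "store_id is required for store deliveries"
    return None


core = ShipmentCore()
core.register_validation("store_delivery", store_requires_store_id)

parcel = Shipment("S1", 1.2)
home_errors = core.validate(parcel, "home_delivery")
store_errors = core.validate(parcel, "store_delivery")
```

The same entity behaves differently per use case, yet the only code that changed is outside the core.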
Interactions between platform and products
- Since platform services are unaware of the products built on top of them, they should ideally not have code to invoke those components directly.
- Platform services can drive cross-service interactions by triggering workflows configured by the product services.
- Platform services broadcast all their events (business events) over some messaging medium. Other platform services or products can hook into these events to build functionality in a decoupled manner.
- While the above is ideal, as a compromise, products can configure the platform service workflow mapping so as to call any other single API (instead of always calling the workflow service API). This can be done for several reasons: the product service may want to encapsulate the workflow logic in its own code, or the business logic involved may not require a workflow (a multi-step process) and can be achieved by a simpler piece of code.
- Components built for the experience layer may be pushed down into the platform layer as they evolve to serve more use cases in a generic manner.
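The decoupling described in this list can be illustrated with a small in-memory sketch. The `EventBus` class stands in for whatever messaging medium is actually used, and `TripService` and the `trip.completed` event name are hypothetical; none of these come from the original systems.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for the messaging medium (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        # Deliver to every subscriber; the publisher never learns who they are.
        for handler in list(self._handlers[event_type]):
            handler(event)


class TripService:
    """Hypothetical platform service: it knows the bus, not the products."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def complete_trip(self, trip_id: str) -> None:
        # Core entity logic would run here, then the business event is broadcast.
        self._bus.publish("trip.completed", {"trip_id": trip_id})


bus = EventBus()
notified: List[str] = []

# A product hooks into the event without the platform service changing at all.
bus.subscribe("trip.completed", lambda event: notified.append(event["trip_id"]))

TripService(bus).complete_trip("T42")
```

Adding a second product means adding a second `subscribe` call; `TripService` itself stays untouched.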
Discussing a platform service
The main concern in building a platform service is how to make it extensible without modifying it repeatedly, or at the very least to be able to modify it with little danger of impacting existing functionality. We adopt a state machine and workflow driven approach to this.
- A platform service exposes an API which has some core functionality.
- On top of that, it can trigger a custom workflow defined by its users.
Every platform service is composed of these parts.
- Rate limiting
- Call routing to correct component.
Some of these concerns may be outsourced into an external API gateway, thereby reducing the complexity on the service itself.
Entity core business logic
This logic is uniform across all tenants, clients, etc. and pertains to maintaining the sanctity of that entity. We only work with the service’s own entity here.
Configurable state machine
- All writes are validated via an extensible state machine. Every entity MUST define a base set of states which can be extended but never reduced. This also defines the set of actions supported on the entity. e.g. Trip service defines CREATED→STARTED→COMPLETED as the core set of transitions that must happen in that order. The Myntra trips team might want to define an extra state to make this CREATED→PENDING-START→STARTED→COMPLETED. Both flows will be configured; the extended flow is invoked for Myntra trip updates and the core flow for Store trip updates.
- Writes that don’t involve a state transition cannot be validated by platform service.
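A minimal sketch of such a state machine follows, assuming that a flow is an ordered list of states and that "extended but never reduced" means every base state must still appear, in the same order. `TripStateMachine`, `register_flow`, and the tenant keys are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Base states from the Trip example; every registered flow must keep all of them.
BASE_TRIP_STATES: Tuple[str, ...] = ("CREATED", "STARTED", "COMPLETED")


def is_subsequence(base: Iterable[str], extended: Iterable[str]) -> bool:
    """True when every base state appears in `extended`, in the same order."""
    remaining = iter(extended)
    # `state in remaining` consumes the iterator, which enforces the ordering.
    return all(state in remaining for state in base)


class TripStateMachine:
    """Hypothetical configurable state machine with per-tenant flows."""

    def __init__(self, base: Sequence[str] = BASE_TRIP_STATES) -> None:
        self._base = tuple(base)
        self._flows: Dict[str, Tuple[str, ...]] = {"default": self._base}

    def register_flow(self, tenant: str, states: Sequence[str]) -> None:
        if len(set(states)) != len(states):
            raise ValueError("states in a flow must be unique")
        if not is_subsequence(self._base, states):
            raise ValueError("a flow must keep every base state, in order")
        self._flows[tenant] = tuple(states)

    def can_transition(self, tenant: str, current: str, target: str) -> bool:
        # Tenants without their own flow fall back to the base flow.
        flow = self._flows.get(tenant, self._flows["default"])
        if current not in flow or target not in flow:
            return False
        # Only a move to the immediately following state is allowed.
        return flow.index(target) == flow.index(current) + 1


machine = TripStateMachine()
machine.register_flow(
    "myntra", ["CREATED", "PENDING-START", "STARTED", "COMPLETED"]
)
```

With this configuration, a store trip may move from CREATED straight to STARTED, while a Myntra trip must pass through PENDING-START first. An attempt to register a flow that drops STARTED is rejected.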
External workflow trigger
- An external API can be invoked to trigger additional workflows as configured for some combination of entity attributes.
- It only calls the given external end-point with the input given to the service and output emitted by the service. The responsibility of interpreting this data and enriching it further rests with the called API.
- Since the triggered workflow may call other services which in turn have their own workflows defined, a single event in one service can create a workflow fanout.
- The platform service owner is responsible for guaranteed invocation of the configured workflow but not for the contents of the workflow. Development and operational ownership of the workflow rests with the team which created it and mapped it in the platform service. This separation of concerns must be honoured EVEN IF BOTH PARTIES ARE THE SAME PERSON.
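The trigger described above might look something like the sketch below. It assumes a mapping from attribute combinations to endpoints, and it uses plain Python callables in place of real HTTP calls. `WorkflowTrigger`, `add_mapping`, and `fire` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]
# An endpoint receives the service input and the service output, nothing more.
Endpoint = Callable[[Payload, Payload], None]


class WorkflowTrigger:
    """Hypothetical trigger: matches entity attributes, then calls mapped endpoints."""

    def __init__(self) -> None:
        self._mappings: List[Tuple[Payload, Endpoint]] = []

    def add_mapping(self, match: Payload, endpoint: Endpoint) -> None:
        # Product teams register which attribute combination fires which endpoint.
        self._mappings.append((match, endpoint))

    def fire(self, entity: Payload, request: Payload, response: Payload) -> int:
        """Call every endpoint whose match attributes all equal the entity's values."""
        called = 0
        for match, endpoint in self._mappings:
            if all(entity.get(key) == value for key, value in match.items()):
                # Input and output are passed through untouched; interpreting
                # and enriching them is the called endpoint's responsibility.
                endpoint(request, response)
                called += 1
        return called


calls: List[Tuple[Payload, Payload]] = []
trigger = WorkflowTrigger()
trigger.add_mapping(
    {"tenant": "myntra", "status": "COMPLETED"},
    lambda request, response: calls.append((request, response)),
)

fired = trigger.fire(
    {"tenant": "myntra", "status": "COMPLETED", "trip_id": "T1"},
    {"action": "complete"},
    {"status": "COMPLETED"},
)
```

The sketch omits retries and failure handling, which a real trigger would need in order to meet the guaranteed-invocation responsibility.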
This module broadcasts every successful event via a messaging medium. The publication must be guaranteed if an event has happened.
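The text does not say how the publication guarantee is achieved. One commonly used technique is the transactional outbox pattern, sketched here with an in-memory SQLite database. The table names, `complete_trip`, and `relay` are assumptions made for this example, and the resulting delivery is at-least-once rather than exactly-once.

```python
import json
import sqlite3
from typing import Callable, Dict

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trips (trip_id TEXT PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, sent INTEGER DEFAULT 0)"
)


def complete_trip(trip_id: str) -> None:
    # One transaction: the state change and its event commit or roll back together.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO trips VALUES (?, ?)", (trip_id, "COMPLETED")
        )
        event = {"type": "trip.completed", "trip_id": trip_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish: Callable[[Dict[str, str]], None]) -> int:
    """Publish unsent events; a row is marked sent only after publish succeeds."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        # If publish raises, the row stays unsent and is retried on the next run.
        publish(json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the event row is written in the same transaction as the entity update, an event cannot be lost once the write has committed. Subscribers may see duplicates after a retry, so they should be idempotent.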
Building a product experience
- Product building effort should be centred around solving business problems.
- We want to build efficient workflows (UI/backend) using platform components as building blocks.
- New components and entities relevant to this product should be built outside of the platform.
- All architecture and design practices, such as scalability, micro-services, security, and maintainability, apply independently to product building just as they apply to any software development.
- Teams building services on top of the platform integrate with platform services via the APIs, workflow service, and event bus.
- These APIs may be pass-throughs to platform service, thin wrappers on platform services (minor changes to API structure and language), or heavy duty APIs which compose multiple platform services.
- These services may directly use the workflow service and other parts of the “lower platform” offerings for their own purposes, i.e. not everything must go via the logistics platform layer.
- The APIs exposed here need not be generic. The focus should be on building effective/optimized API for specific use-cases.
- UIs can be built directly on the platform services. However, it might be difficult to bridge the gap between the user experience and platform since the latter does not do any cross entity aggregation, nor build any APIs for specific product use cases.
- Typically the UI team would build a backing API layer to take care of data aggregation and massaging.
- The backing API would be built as per the API experience guidelines mentioned above.
- Most product changes are expected to impact only the UI and the backing API layer. Platform components should only be impacted in extreme scenarios.
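A backing API of the kind described above might look roughly like this. The platform clients are in-memory stand-ins for real service calls, and every name here (`ShipmentPlatform`, `TripPlatform`, `LastMileBackingApi`, `delivery_card`) is hypothetical.

```python
from typing import Dict


class ShipmentPlatform:
    """Stand-in for a platform service; real calls would go over the network."""

    def __init__(self, shipments: Dict[str, dict]) -> None:
        self._shipments = shipments

    def get_shipment(self, shipment_id: str) -> dict:
        return self._shipments[shipment_id]


class TripPlatform:
    """Second stand-in platform service, keyed by shipment for simplicity."""

    def __init__(self, trips_by_shipment: Dict[str, dict]) -> None:
        self._trips = trips_by_shipment

    def get_trip_for_shipment(self, shipment_id: str) -> dict:
        return self._trips[shipment_id]


class LastMileBackingApi:
    """Product-side API: aggregates platform data for one specific screen."""

    def __init__(self, shipments: ShipmentPlatform, trips: TripPlatform) -> None:
        self._shipments = shipments
        self._trips = trips
        # Product-local data stored against a platform entity.
        self._rider_notes: Dict[str, str] = {}

    def add_rider_note(self, shipment_id: str, note: str) -> None:
        self._rider_notes[shipment_id] = note

    def delivery_card(self, shipment_id: str) -> dict:
        # Cross-entity aggregation happens here, not in the platform services.
        shipment = self._shipments.get_shipment(shipment_id)
        trip = self._trips.get_trip_for_shipment(shipment_id)
        return {
            "shipment_id": shipment_id,
            "address": shipment["address"],
            "trip_status": trip["status"],
            "rider_note": self._rider_notes.get(shipment_id, ""),
        }


api = LastMileBackingApi(
    ShipmentPlatform({"S1": {"address": "12 MG Road"}}),
    TripPlatform({"S1": {"status": "STARTED"}}),
)
api.add_rider_note("S1", "Call before delivery")
card = api.delivery_card("S1")
```

The response shape is deliberately specific to one screen. If the screen changes, only this class changes, and the platform services stay as they are.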
The human perspective
Let’s look at how we are doing on our metrics so far. Since we have discussed only tech guidelines, let’s look at technical metrics.
1. Components are increasingly isolated by rate of change. Some change often and some perhaps not at all
Looking good. Platform services seem to change rarely. The product layer changes often without impacting the platform (except in terms of scalability etc).
2. Business decoupling : Fewer and fewer systems should be impacted by any business requirement
Changes to business UX or workflows impact only the product layer, and even there the change is isolated by good design. UI changes typically only impact the UI layer and the backing API’s read layer. Operational workflow changes are dealt with mostly by reshuffling of defined technical workflows.
3. Single source and modelling for business entities
The universally acknowledged entities and their data models are all located in the platform. Meta data about them may be scattered across products, but it is only of local relevance.
4. Homogeneous development experience and consensus on design and development practices
Since the base data model is uniform and so is the tooling to build workflows, state machines etc, an effective guideline is already in place about how to build new systems. e.g. Shipments look roughly the same in every system, and can be manipulated in a consistent manner.
The constraints enforced on product-platform interactions and an increasingly deep uniformity in development tools set a design and implementation standard, any deviation from which is easily detected. This is either corrected or explicitly acknowledged as a genuine requirement. If it is a genuine requirement, it might be a good candidate for assimilation into the platform.
Organizing teams for platform architecture
Now that we are able to visualize decoupled software, we need a team structure which can leverage this to achieve agility and scale. An org structure explicitly designed for this is essential, since developer discipline to do the right thing is hard to enforce across a large team. We can’t rely on activist developers to keep a large company on the straight and narrow.
Amazon’s two-pizza team is a good working model to achieve this.
- Create small teams with ownership of very specific technical or business problems.
- One team officially does not care about other teams.
- A team will not be disturbed randomly for other stuff simply because they don’t have “enough work” in this sprint. It is the responsibility of the team to push the boundary on their particular charter.
- All teams MUST comply with the strict guidelines around how to write software (product-platform separation etc) which prevent technical anarchy.
- We also mandate that all internal, re-usable software built by a team should be pushed into the platform if possible. Otherwise, the team must demonstrate that the software isn’t re-usable by definition.
A logistics example
Instead of having a single big LMS team which spans across the entire business and technical domain, we create 5 teams with narrow but deep responsibilities.
Logistics platform team 1
- Master Bag
- Shipment Tracking
Logistics platform team 2
- Network (Hub, lanes, and mappings)
- Courier contracts (including HLP, store etc) and handover configurations
- Geo Data
Logistics platform team 3
- Courier Integrations and manifestation
- Inbound API Gateway
- Outbound client integrations
- Capacity Engine
- Promise Engine
- LMS inbound API gateway
- LOVE (Screens till DC handover)
- Hub App
- Shipment, master bag, and container workflows
MT LMS last mile team (Product)
- HLP and store handshake workflows.
- LOVE last mile screens
The idea is to decouple teams as much as we decouple systems to move ever faster.
A lot of these principles are being tried out in Myntra right now and are the stuff of hot debates all across our tech teams. We hope to establish a framework for platform architecture thinking which will stand the test of time and guide us on this exciting journey.