Our Award-Winning John Lewis Digital Platform

Rob Hornby
John Lewis Partnership Software Engineering
11 min read · Nov 11, 2019

We are Rob Hornby @rob.hornby (Platform Lead) and Alex Moss @alex.moss (Platform Architect), John Lewis Partners working on the JL Digital Platform.

Today, we’re writing about our recent success in the DevOps Industry Awards, where we were crowned victors on the night for the Best DevOps Cloud Project!

Our Platform As A Product

At John Lewis & Partners, we concentrate on providing exceptional service and standards across our retail stores and online. With ever-changing consumer needs and behaviours, we made it a strategic objective to move to cloud so we could create new product offerings faster than ever before.

The JL Digital Platform is the product that helps teams deliver to that goal. It has already produced profound results. It’s enabled us not only to keep up with a fast-moving online retail landscape, but also to put ourselves in a position to provide the differentiating services that matter to our customers. It’s allowed us to innovate quickly, deliver new offerings and continue to provide best-in-class services for our engineering teams.

The time it takes us to create, enhance or innovate around online services has gone down dramatically. This gives us more time as an organisation to focus on our customers — ensuring that our commitment to them continues to be at the heart of our organisation as the retail space evolves.

Building On Solid Foundations

https://unsplash.com/photos/9jPJrfLTBi0

Three years ago, as the retail sector entered a period of severe economic turbulence, we realised we wouldn’t be able to deliver product offerings to the customer with our existing technology landscape. We were releasing once a month through a monolithic off-the-shelf E-Commerce application hosted in our own data centres.

In 2017, we established a new strategic objective to re-platform to a cloud-based microservice architecture, enabling us to deliver digital propositions more quickly to market.

We’ve written about this first year in cloud before — a Kubernetes-based bespoke platform on top of Google Cloud Platform (GCP) was born — which then became known as the John Lewis Digital Platform (JLDP).

This work served as the foundation for what was to follow. In the year since, we have continued to build upon this platform, continually evolving it to meet the varied needs of an increasing number of teams and business services, who quickly saw the benefits of the platform we had created.

Mapping The Journey

https://unsplash.com/photos/J3JMyXWQHXU

We knew we needed to be faster to market and were aware of what slowed us down. The platform initially evolved organically, through a small cross-functional team experimenting with new technology. From the outset we established it as a continuously evolving product, rather than a project with a fixed lifespan, always keeping an eye on our overarching vision:

JLDP will empower teams through a frictionless & stable state-of-the-art platform, so that they can quickly deliver innovative & high-quality services for our customers.

We had three primary goals for the platform:

  1. Increasing the Pace of Change. Traditional infrastructure took three months to procure and provision and even then suffered from inconsistencies. It had a complex Path To Production — typically requiring four weeks to release on top of the associated prioritisation and development effort. We had to do better. We felt that we could achieve this through adoption of Continuous Delivery and Operability practices, driving automation and making as much as possible self-service.
  2. Resilient & Secure Platform. We knew we couldn’t do this without also maintaining the high standards for reliability and security that our customers expect. www.johnlewis.com is critical to JL & Partners, particularly at the peak periods typical for general merchandise retailers.
  3. Developing the Skills of our Partners. The changes introduced by the platform would also represent an opportunity for our own people to grow their skills and capabilities. Instead of wrangling a commercial off-the-shelf application, our Partners would be building new services from scratch. We wanted to make sure we were arming them with modern tools and techniques, putting these directly in the hands of our engineers and wider community for the benefit of everyone. We partnered with Google, and with other organisations that share a focus on learning, to develop key skills.

Establishing The Pathway

https://unsplash.com/photos/C0koz3G1I4I

JLDP is a differentiating product built by John Lewis & Partners for John Lewis & Partners, rather than a one-off commodity infrastructure project.

The team works on a quarterly basis, using OKRs to map out capabilities, developing them with lean and kanban techniques, and silently launching new features to product teams with zero downtime. There is a focus on solutions underpinned by open source technology.

After delivering the initial migration of the frontend for johnlewis.com, demand for JLDP quickly grew, and the JLDP team built a fully automated, self-service pathway for product teams in response. We refer to it as an opinionated “Paved Road” (borrowing the concept from Netflix’s blog post), based on Continuous Delivery and Operability principles. Product teams receive capabilities such as deployment pipelines, “Four Golden Signals” dashboards, self-service capacity management, and Google Cloud service provisioning out of the box. Each of those capabilities is a blend of open source tooling and custom code, creating enabling constraints that encourage product teams to frequently launch updates to customers.
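To make the “Four Golden Signals” concrete, here is a minimal sketch of how latency, traffic, errors and saturation might be summarised for one time window. It is an illustration only: the `Request` record, the `golden_signals` function, the choice of a 95th percentile, and the use of CPU as the saturation measure are all assumptions, not a description of the actual JLDP dashboards.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    status_code: int


def golden_signals(requests: List[Request], window_seconds: float,
                   cpu_used_cores: float, cpu_limit_cores: float) -> Dict[str, float]:
    """Summarise latency, traffic, errors and saturation for one time window."""
    if not requests:
        raise ValueError("need at least one request")
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, using integer arithmetic for the ceiling.
    rank = (95 * len(durations) + 99) // 100
    server_errors = sum(1 for r in requests if r.status_code >= 500)
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": server_errors / len(requests),
        # Assumption: CPU usage against its limit stands in for saturation.
        "saturation": cpu_used_cores / cpu_limit_cores,
    }
```

A dashboard would typically plot each of these values over successive windows, so a team can see at a glance whether a service is slow, busy, failing or close to its limits.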

We’ve supported the platform with self-serve documentation which outlines, step by step, how teams should build and operate on the platform. This documentation is continually evolving as we launch new features and remove complexity for teams on our platform. Its key success is that it is of practical use to teams, rather than shelf material produced to overcome a stage gate. Reducing this documentation through automated processes is celebrated!

JLDP is built on top of open-source technology and managed commodity services in Google Cloud wherever possible. For example, software as a service tooling is used for platform capabilities such as version control and deployment pipelines. We use services like Google Kubernetes Engine and GCP’s managed databases as the raw platform building blocks, then assemble and configure them using automation tools like Terraform and GitLab CI in ways that meet the needs of our product teams, and substantially reduce our development costs at the same time.

One of the more significant steps we took this year was introducing our own Custom Resource Definition to our platform along with its own Controller — known as the Microservice Manager. Instead of declaring the usual Kubernetes primitives, teams are encouraged to instead define a Microservice which is our curated view of workloads running on JLDP. This is powerful, as it allows us to bake a whole load of great things in — such as resiliency and security configuration, telemetry tools, and management of secrets.
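The idea of a curated resource that expands into the usual Kubernetes primitives can be sketched roughly as follows. This is a simplified illustration, not the real Microservice Manager: the `Microservice` fields, the `render_manifests` function, the `/health` probe path and the specific defaults are all hypothetical. Only the Deployment and Service manifest fields are standard Kubernetes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Microservice:
    """Hypothetical, much-simplified stand-in for a curated Microservice resource."""
    name: str
    image: str
    port: int = 8080
    replicas: int = 2


def render_manifests(ms: Microservice) -> List[Dict[str, Any]]:
    """Expand one Microservice into a Deployment and a Service with defaults baked in."""
    labels = {"app": ms.name}
    container = {
        "name": ms.name,
        "image": ms.image,
        "ports": [{"containerPort": ms.port}],
        # Illustrative defaults the platform applies so that teams do not have to.
        "securityContext": {"runAsNonRoot": True},
        "readinessProbe": {"httpGet": {"path": "/health", "port": ms.port}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": ms.name, "labels": labels},
        "spec": {
            "replicas": ms.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": ms.name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": ms.port}]},
    }
    return [deployment, service]
```

Calling `render_manifests(Microservice(name="search", image="example/search:1.0"))` yields both manifests from four fields of input. In a real controller this expansion would happen inside the cluster in response to changes to the custom resource, which is what lets a platform team change the defaults in one place.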

Our Platform Engineers look after this, freeing up Software Engineers to spend more time working on new features for the website, and less on how to do things in Kubernetes or GCP. If you’re interested in learning more, we’ve written about the CRD in a lot more detail, including the process that led us to it.

On The Paved Road, Can The Flowers Still Grow?

https://unsplash.com/photos/upaJhH2bd8Y

Establishing an opinionated platform has not been without its challenges.

The perception of removing an engineer’s agency has to be handled with care. We feel this is one of the things that makes the concept of a paved road so impactful — you don’t have to stay on it! But, by providing a smoother, easier path — and in particular making sure that it stays that way — you make it compelling to use. The onus is then on us as the team looking after it to continually keep it up to date with new features, exposing the new capabilities or tools out there for engineers to use, so that it continues to be compelling for our Product Teams.

Just as our Product Teams want to build compelling experiences for our customers on johnlewis.com, the JLDP Team want to build a compelling platform for our engineers to use.

What this means in practice is frequent new capabilities launching through the Microservice CRD (such as automatic configuration of Google Cloud Endpoints), frequent new Operability features launched through our Paved Road Pipeline (such as out-of-the-box Availability Alerts) and frequent experimentation with new technology offerings (the latest examples are GCP’s Workload Identity, and PagerDuty).

We’ve also deliberately chosen to challenge some of the organisational norms, in an effort to make significant changes particularly with the goal of increasing delivery velocity. We would not have been able to do this without great support from the “Digital” area of the organisation, with key senior stakeholders empowering the team and recognising that this was a product, not a project with an end date.

We deliberately co-located the JLDP team with many of the Product Teams using the platform — those close working relationships were particularly important in the early days of experimentation and getting fast feedback.

The “tenant” teams have also had to adapt, as we deliberately offered fewer environments for testing & development to reduce large sprawling end-to-end testing phases. We’ve also pushed the responsibility for build & deployment to teams, as well as the continued operational health of their product. We’ve supported them through blueprints for pipelines and automatic telemetry and alerting. Teams have taken to these changes with relish.

The largest challenges have been when interfacing with wider organisational constraints, which often exist for good reasons when the delivery approach is different from ours. Examples include the security engagement process and operational handovers. In the world of JLDP, there is a much larger number of much smaller services, the delivery velocity is much higher and operational accountability sits with the Product Teams. These traditional processes have therefore needed to adapt. We’ve handled this by demonstrating alternative ways of achieving the right outcomes.

A good example of this is working with our security teams to achieve just the right amount of engagement through a streamlined process, simplifying a 300+ point checklist through our consistent and repeatable security patterns offered by the platform. We saw this reduce our overall time to first customer by 13 days, in addition to the reduction in effort for people across the various teams involved.

The JLDP team were also one of the earliest ambassadors for the “build it, run it” cultural shift, championing this as a better way of working, even for our business-critical services. This involved addressing some of our organisational models around being on-call, and moving to more modern communication mediums like Slack to manage our support, with a team “front door” for those needing assistance. The various patterns we offer for Continuous Delivery and Operability also reduce friction, as they allow our Product Teams to manage their own services autonomously. This approach has led to only 3 out-of-hours support calls for the platform in the last 18 months, none of which were for a customer-impacting incident.

So, Did We Reach Our Destination?

https://unsplash.com/photos/c-8JMx17iwQ

The success of our product has now made it a key ongoing outcome for our wider JL & Partners Reinvention strategy.

How did we prove this? By continually measuring our progress.

To support this, we built our own JLDP Service Catalogue which visualises key metrics on Continuous Delivery and Operability. We have measured key Continuous Delivery indicators such as the time it takes a service to have its first live customers, as well as the frequency, lead time and throughput of their deployments. We’ve recently extended this to cover key Operability indicators such as Availability, and are putting tooling in place to allow us to track things like incident rates and recovery times. This has provided key insights for the teams running on our platform and enabled further conversations to help reduce lead time.
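As a rough illustration of the Continuous Delivery indicators described above, the sketch below computes deployment frequency and median lead time from a list of deployment records. The `Deployment` record and both function names are hypothetical, and lead time is assumed here to mean the time from commit to live deployment. The Service Catalogue may define it differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime
    deployed_at: datetime


def deployment_frequency(deployments: List[Deployment], period_days: int) -> float:
    """Average number of deployments per day over the reporting period."""
    return len(deployments) / period_days


def median_lead_time(deployments: List[Deployment]) -> timedelta:
    """Median time from commit to deployment (assumed definition of lead time)."""
    seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    return timedelta(seconds=median(seconds))
```

The median is used rather than the mean so that a single slow release does not dominate the figure, which is a design choice for this sketch rather than something stated in the original.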

Platform Availability

We reduced the time to access “infrastructure” from months to hours, and reduced the additional effort teams needed to build telemetry and Continuous Delivery pipelines. Average live-to-customer timescales are approximately 90 days from enablement on the platform, and this is continuing to drop as we remove constraints.

Search Service Page with Lead Time metrics

As a result, teams have been able to increase their pace of change. We have gone from 10 deployments a year to nearly 5,000, without increasing operational incident volumes. Where we have had incidents, rollback time has reduced from hours to minutes.

The success of JLDP has meant it has evolved over time to support a more diverse mix of workloads. It has empowered Product Teams to create compelling, innovative offerings, building on the wider reinvention strategy of the organisation towards our own products and services, such as Appointment Booking. This service offers in-store beauty experiences bookable online for our customers, at a low level of risk and capital investment, and is a good example of using the new platform to execute our new vision.

Our choice to embrace cloud and open source technologies has helped us understand and manage our costs. Knowing what it costs us to scale, and having visibility of this when we do, helps us make much more informed choices about the value of a new service or experiment we want to run.

We were pretty happy with our award

We continue to learn from our implementation of the platform, because we treat it as a product. We continue to re-platform further components of our existing commerce monolith, as well as launching new propositions for our customers. We believe that our approach of listening to our teams, measuring our outcomes, then using this insight to develop new capabilities and improve the operation of this business-critical platform will continue to be the best way for us to meet the needs of the services running on our platform and our organisation as a whole.

The success of our cloud adoption is meeting the needs of our customers and providing value to John Lewis & Partners.

Rob Hornby
John Lewis Partnership Software Engineering

Lead Engineer within our Technical Profession & Platform Product Lead for John Lewis with a background in retail technologies, software testing and platforms.