How we support a 24 by 7 Retail business

--

We recently put this together internally to explain to anyone joining the organisation what we expect around in hours and out of hours support here at the John Lewis Partnership. It seemed too good not to share, so here goes.

Overview

John Lewis and Waitrose are retailers with a significant online presence, where trade is often at its peak in the early or late evening. Our stores also operate over the weekend and through public holidays. As part of our engineering direction we have moved away from a separate operational run team to a modern approach where teams build and run their own services. We believe this leads to better delivery and operational performance.

This requires Partners to provide support in and out of hours for some of our critical services, especially during peak periods. We are a retail business and value supporting our fellow Partners working in stores or other locations, and that work rarely fits within office hours of 9–5. With the right skills anyone in the team can undertake this activity. It is paid, and we aim to make it low impact by using modern tools and ways of working.

What this means in practice

Don’t worry: this isn’t contractual, we don’t have lots of incidents requiring callout, and we understand that people have different life circumstances that might make this difficult to do.

  • John Lewis Digital operates a formalised “build it, run it” capability, with critical and essential services requiring out of hours support
  • Waitrose Digital is experimenting with this approach: teams manage their services during the day, backed by an out of hours operational team and a best effort callout approach
  • Projects and Programmes typically operate periods of “hypercare”, where there is an expectation during launches to be on-call or to support the go-live of new applications or products
  • We support our Shops through formalised callout for our Retail shops team
  • We have periods of peak trade for Christmas, Sale launches and Black Friday, where we ask teams to provide “eyes on” support

In all cases we work with the teams involved to ensure support is set up correctly and is a good experience

  • All teams will have at least 4 people on the rota, so if you are asked to support it is typically 1 week in 4, possibly less
  • Where a team can’t fulfil a rota due to size we look to Domains where multiple teams support multiple services
  • We pay people extra for being on call, at a higher rate for public holidays and weekends. You can also choose to take time in lieu for any period where you are called out
  • Eyes on support for sale launches tends to be between 6 and 9pm, when trade is busiest
  • You are supported and not alone, with informal backup support within teams. For major incidents we also have Incident Commanders (not as scary as it sounds) who support an Engineer on call with wider stakeholder communication and can get anything you need to close the incident
  • We use tools such as PagerDuty to manage callout, routing the right alerts to the right people quickly
  • There will be runbooks for most typical issues and we run game days and chaos events to support learning
  • We normally expect people to be online within 20–30 minutes when called out
  • This does mean you’ll need access to a phone and a computer when on call

You don’t go on call from day 1. We ease you in, and it’s down to you to be confident you can support the application you are working on

  • Out of hours is part of supporting your service, but so is in hours. Before going on call you will have become familiar with running your service by responding to in-hours requests
  • Teams typically run tabletop, game and/or chaos days to allow you to check your skills in a controlled environment
  • You have to be confident in going on call. As already mentioned, team game days and chaos events are all part of a good team and of learning
  • Your Engineering people manager can ensure you are developing any skills required and can broker any conversations where it might not be right for you to be on support
  • Incidents are typically change related and are resolved during the day. We usually make changes during the day, even complex ones, using feature toggles and high quality automation. We have extremely low change failure rates and deploy on demand multiple times a day, so incidents don’t happen often

In summary

  • We are a retail business with shop hours outside of office hours, so some periods of out of hours working are likely. You get paid for this
  • You can talk to your team and People manager if you can’t go on-call for any reason; we’ll understand
  • You’ll be trained in being on-call and have a modern working environment which makes responding to an incident as easy as it can be
  • You won’t be alone; there is strong support from a wider team
  • You’ll be learning a great new skill and one that is increasingly in demand

--

Rob Hornby
John Lewis Partnership Software Engineering

Lead Engineer within our Technical Profession & Platform Product Lead for John Lewis with a background in retail technologies, software testing and platforms.