The Waitrose.com Journey — Part Two

Peter O'Shaughnessy
Waitrose & Partners Digital
7 min read · Dec 9, 2020

Snapshot testing? (A polaroid photo of a beach scene). Photo by Nazym Jumadilova on Unsplash.

Snapshot Tests, Functional Tests and GraphQL

In Part One, I shared the origins of waitrose.com, why we ended up needing to migrate to a new platform, and the first 5 lessons I learned along the way. In this second and final part, I’ll share the other lessons and reveal how the story ends (for now)!

I mentioned Jest and snapshot testing briefly in Part One.

Lesson Learned 6: Snapshot Testing has been worth adopting too

Snapshot testing means storing the rendered output of your components in automatically generated snapshots, so that each time your unit tests run, the test runner can verify that the output still matches.
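To make the idea concrete, here is a minimal sketch of the mechanism in TypeScript. It is a simplification and not how Jest implements snapshots: `SnapshotStore`, `matchSnapshot` and `renderGreeting` are hypothetical names, and an in-memory map stands in for the snapshot files that Jest writes to disk.

```typescript
// A simplified, in-memory illustration of snapshot testing.
// Jest stores snapshots in files; here a Map stands in for them.
type SnapshotResult = "written" | "matched" | "mismatch";

class SnapshotStore {
  private snapshots = new Map<string, string>();

  // First run for a given test name: record the output.
  // Later runs: compare the new output against the stored one.
  matchSnapshot(testName: string, output: string): SnapshotResult {
    const stored = this.snapshots.get(testName);
    if (stored === undefined) {
      this.snapshots.set(testName, output);
      return "written";
    }
    return stored === output ? "matched" : "mismatch";
  }
}

// Hypothetical "component": returns the markup it would render.
function renderGreeting(name: string): string {
  return `<p class="greeting">Hello, ${name}!</p>`;
}

const store = new SnapshotStore();
const firstRun = store.matchSnapshot("greeting", renderGreeting("Sam"));
const secondRun = store.matchSnapshot("greeting", renderGreeting("Sam"));
const afterChange = store.matchSnapshot("greeting", renderGreeting("Alex"));

console.log(firstRun, secondRun, afterChange); // written matched mismatch
```

In Jest itself, the comparison is a single assertion, `expect(tree).toMatchSnapshot()`, which is where the saving in hand-written checks comes from.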

We did debate introducing it at first, as there was a question about whether it might make us a bit lazy and make our unit tests less self-explanatory. But from what I’ve seen, I don’t think those fears have been borne out, and it’s saved us from writing a lot of boring test code that manually checks individual pieces of rendered output.

Functional tests

As for functional tests, we’ve been using a library called Codecept since before I joined. Codecept gives a nice clean syntax for writing functional tests and it handles things like waits and timeouts by default, which helps to avoid some of the usual flakiness of automated, browser-based testing.

Our colleagues over in John Lewis are using Cypress, but I understand we in Waitrose had been using Puppeteer directly before, and we were looking for more of an abstraction layer over the top of it, rather than something built from the ground up. Codecept works with Puppeteer, WebDriver and other browser controllers too.

By all reports, Codecept has been way better than using Puppeteer directly. Although we have still had some odd, flaky issues from time to time…

Pup-peteer? (A dog puppet on strings). Photo by pixpoetry on Unsplash.

Lesson Learned 7: Common causes of flaky functional tests

The most common types of issues I’ve seen causing flaky tests have been:

1. When the selector we use to pick an element to interact with actually matches multiple elements, so the test isn’t interacting with the one we thought!

2. When tests have passed or failed at different times depending on window size or scroll position. This has tended to happen when the elements the test is trying to interact with are hidden underneath modals or our sticky header.

When these things happen, the best thing is usually to fire up the test in the non-headless, GUI mode (with Codecept that’s done by passing show:true in the Puppeteer config) and put some pauses in, so you can see what’s happening. Switching on the debug/verbose logging has also proved useful for us.
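As a rough sketch of what that configuration might look like, here is a Codecept config file with the Puppeteer helper set to `show: true`. Only `show: true` comes from the text above; the test path, URL, project name and window size are placeholder assumptions, not our real settings.

```typescript
// codecept.conf.ts (sketch). Everything except `show: true` is a placeholder.
export const config = {
  tests: "./tests/*_test.ts", // hypothetical location of the test files
  output: "./output",
  helpers: {
    Puppeteer: {
      url: "http://localhost:3000", // placeholder base URL
      show: true, // open a visible browser window instead of running headless
      windowSize: "1280x900", // pin the window size so runs are comparable
    },
  },
  name: "functional-tests", // placeholder project name
};
```

With a visible browser, calling Codecept's `pause()` inside a scenario stops the run at that step so you can inspect the page, and running with the `--debug` or `--verbose` flag prints each step as it executes.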

Now this is the good kind of flaky. I wouldn’t mind testing these. (Flaky pastry baked goods). Photo by Markus Spiske on Unsplash.

As we wrote more and more functional tests though, the time taken to run them got longer and longer, until it was taking over half an hour in Jenkins…

Lesson Learned 8: Parallelisation has saved so much time

…So we parallelised it, by automatically separating the tests into a number of chunks — and using Codecept’s “Run Multiple” command.

Originally we split it into 4 chunks running concurrently in Jenkins, and the time taken went down from about half an hour to about 10–12 minutes.
We’re now splitting into 6 concurrent tasks and the time they take to complete overall is about 18 minutes.

If we hadn’t introduced parallelisation, I’m not sure exactly how long it would be taking, but it could now be up to about an hour and a half!
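The chunking step can be sketched roughly as follows. This is an illustration only: `splitIntoChunks` and the file names are hypothetical, and the round-robin assignment is an assumption rather than a description of how our build or Codecept actually divides the tests.

```typescript
// Split a list of test files into `chunkCount` groups of near-equal size.
// Round-robin assignment is an assumption made for illustration.
function splitIntoChunks<T>(items: T[], chunkCount: number): T[][] {
  if (!Number.isInteger(chunkCount) || chunkCount < 1) {
    throw new RangeError("chunkCount must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < chunkCount; i++) {
    chunks.push([]);
  }
  items.forEach((item, index) => {
    chunks[index % chunkCount].push(item);
  });
  return chunks;
}

// Hypothetical test file names.
const testFiles = [
  "basket_test.ts",
  "checkout_test.ts",
  "login_test.ts",
  "orders_test.ts",
  "search_test.ts",
  "slots_test.ts",
  "trolley_test.ts",
];

// Four chunks, each of which could run as its own concurrent job.
const chunks = splitIntoChunks(testFiles, 4);
chunks.forEach((chunk, i) => console.log(`chunk ${i + 1}: ${chunk.join(", ")}`));
```

Splitting by file count is the simplest option. If some files take far longer than others, balancing chunks by past run time would probably even out the jobs better. Codecept's `run-multiple` can also do the splitting itself through a `chunks` setting in the `multiple` section of its config.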

Parallel lines. This bicycle could speed me up too. (A silhouetted bicycle at sunset with horizontal wire lines). Photo by Stesson Bezuidenhout on Unsplash.

GraphQL

Another technology we introduced over the course of the project was GraphQL. The primary reason to introduce it was to use it as a switching layer between our legacy APIs and our shiny new microservices — to give the clients (both the website and our apps) a stable API, abstracting ourselves away from the system change underneath. But it gave us some more lessons to learn…
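The switching idea can be sketched in a few lines of TypeScript. This is a toy model, not our actual Graph code: no GraphQL library is involved, every name (`Trolley`, `legacyTrolleyApi`, `newTrolleyService`, `makeTrolleyResolver`) is hypothetical, and the back-ends are synchronous stubs where real ones would be network calls.

```typescript
// A resolver-style function that hides which back-end serves the data.
// All names are hypothetical; no GraphQL library is used here.
interface Trolley {
  id: string;
  itemCount: number;
}

type TrolleySource = (customerId: string) => Trolley;

// Stand-ins for a legacy API and a new microservice.
const legacyTrolleyApi: TrolleySource = (customerId) => ({
  id: `legacy-${customerId}`,
  itemCount: 3,
});

const newTrolleyService: TrolleySource = (customerId) => ({
  id: `new-${customerId}`,
  itemCount: 3,
});

// The "switch": clients call the resolver and never see this flag.
function makeTrolleyResolver(useNewService: boolean): TrolleySource {
  return useNewService ? newTrolleyService : legacyTrolleyApi;
}

const beforeMigration = makeTrolleyResolver(false)("c42");
const afterMigration = makeTrolleyResolver(true)("c42");

// Same shape either way, so client code does not need to change.
console.log(beforeMigration.id, afterMigration.id); // legacy-c42 new-c42
```

Because both sources return the same `Trolley` shape, the website and apps keep calling one stable API while the system underneath is swapped over.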

Lesson Learned 9: Launching core infrastructure probably does warrant a dedicated team

An earlier lesson learned about the Design System was that it was possible to develop one without a dedicated team. But for Graph, I think we all wished that we had formed an actual team around it, to get it into production.

I think the main difference is that it’s OK to progress a Design System incrementally. But with a new piece of core infrastructure, it has to be properly complete and production-ready before you launch it.

And another lesson I learned from our experience using Graph was…

Lesson Learned 10: Let the new tech bed down before expanding its use!

I believe now that I was probably guilty of applying our GraphQL hammer to one too many nails, too early on.

Even before Graph was properly in production we started using it in our Checkout team as an aggregation layer, to avoid some more complex logic — making and combining multiple API calls on the front-end, and in the Apps. It is working well for that, but it did slow our development down quite a lot for a while, and it makes our whole development setup more complex.
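As a hedged sketch of what an aggregation layer does, the resolver below makes two back-end calls concurrently and returns one combined object, so the client makes a single request. The functions, types and data (`fetchSlot`, `fetchOrder`, `resolveCheckoutSummary`) are invented for illustration and are not our real Checkout services.

```typescript
// Hypothetical back-end calls that a client would otherwise make itself.
interface Slot {
  date: string;
  window: string;
  reservedFor: string;
}

interface Order {
  id: string;
  total: number;
}

interface CheckoutSummary {
  orderId: string;
  total: number;
  slot: string;
}

async function fetchSlot(customerId: string): Promise<Slot> {
  return { date: "2020-08-10", window: "18:00-19:00", reservedFor: customerId };
}

async function fetchOrder(customerId: string): Promise<Order> {
  return { id: `order-${customerId}`, total: 42.5 };
}

// Aggregating resolver: runs both calls concurrently and combines the results.
async function resolveCheckoutSummary(customerId: string): Promise<CheckoutSummary> {
  const [slot, order] = await Promise.all([
    fetchSlot(customerId),
    fetchOrder(customerId),
  ]);
  return {
    orderId: order.id,
    total: order.total,
    slot: `${slot.date} ${slot.window}`,
  };
}

resolveCheckoutSummary("c42").then((summary) => console.log(summary));
```

The combining logic lives in one place instead of being repeated in the website and each app. The trade-off is the one described above: an extra layer to build, run and debug.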

So in retrospect it would have been better to get Graph doing one thing well first — its original purpose of switching — before we started applying it to other problems…

Golden hammer? (A brass gavel). Photo by walknboston (CC BY 2.0)

Those were 10 lessons I learned along the journey. Now let’s get back to the story…

What happened next?

In early 2019, it was confirmed that Waitrose and Ocado were going their separate ways. Investing in our own online shopping website suddenly looked even wiser. We had until September 2020 to get ready, to hopefully coax some of those customers over to waitrose.com. We really needed to complete this Samosa migration before then, so we had a stable platform ready in time.

And then when we were just a few months away, a certain pandemic cropped up. Lockdown One struck and we had a massive surge in visits to waitrose.com. It was a difficult time for many of our customers and colleagues, as I’m sure it has been for many of you too… But there are silver linings, as they say: it helped us identify many back-end performance bottlenecks and scaling issues!

After we battled through fixing those, we still had a lot of testing to do…
In the Checkout team, we had been building out the new front-end in parallel with the new back-end services. So we had been doing all our development using mock services and mock data. When we were finally able to start hooking up to the real back-end services and real data, we discovered a lot of defects. (Bonus Lesson Learned 11: we’re trying harder to work more iteratively and collaboratively with the back-end and deliver more frequent vertical slices — that should be easier now!)

We drank lots of coffee and within our Checkout team, we fixed 60 defects in 3 weeks.

Samosa night

And so, finally on the 10th of August 2020, it was Samosa night… Time to migrate Checkout, Orders and Slots.

Over 30 of us were up overnight to run through the implementation plan for the switchover, which included migrating over half a million active trolley records.

And it went… as smoothly as could be expected! We had a few minor data issues, but no show-stoppers and nothing we weren’t able to “fix forwards” in a short timeframe.

Cue celebrations — in a suitably socially-distanced, virtual way of course!

Samosa night! (A wooden tray with samosas). Photo by kabir cheema on Unsplash

The End (ish)

So now we have all the major functionality of the website migrated over to React, Node and cloud services.

But our replatforming journey still continues, because we still need to migrate Recipes and some other bits and pieces. We also know that we still have a lot of improvements to make — around performance, accessibility and more.

However, we’ve come to the end of a significant chapter of the waitrose.com journey. Our platform migration has helped us to scale up for a big increase in demand and we should now be in a better place to be able to respond more quickly to our customers’ expectations and feedback. We’ve also learned lots along the way!
