Peak Preparations: A story of Tech in the run-up to Christmas…

Kavitha Rajendran
Published in We are Team NOTHS
6 min read · Mar 4, 2019

It’s probably best to start with a bit of background about what “peak” means to our teams here at notonthehighstreet. The run-up to Christmas is our biggest and busiest time of the year. It involves months of planning and an enormous amount of cross-functional effort, so from a technical perspective there is a lot of pressure. We’re an e-commerce business, so we need to make sure the website stays up and running and performs as well as it can.

We have had some fairly major technical issues in the past, which had a big effect on how we performed during peak. A couple of years ago on Black Friday, our ad appeared during an X Factor ad break, which caused a huge increase in traffic to the site. Our infrastructure, however, wasn’t scaled to the right levels, so we experienced a short outage. We needed to avoid things like this happening again, and ultimately we wanted the rest of the business to have more confidence in us as a team, so there was a lot of work to be done from our side.

In July 2018, our acting CTO asked whether I would be happy to lead the preparations and manage the tech requirements for peak, in addition to my existing role. I had never done anything like this before. Considering what had happened in previous years, there was an enormous amount of pressure: I knew deep down that the success or failure of peak now rested on my shoulders. A lot of things could go wrong, but I also knew what a fantastic opportunity this would be. Not only would I learn a great deal myself, but I would also get to collaborate with people from all across the business, people I otherwise might not have had the opportunity to work with. My first priority was to build a great tech peak team, so I approached people in every team to see who would be interested in contributing.

With the tech peak team in place, we started by looking at what had happened previously and then collectively drafted the overall architectural diagram for NOTHS and presented it to the entire Tech team.

Overall NOTHS architecture diagram

We then drafted a month-by-month plan, created a dedicated Slack channel, set up weekly stand-ups (which became daily stand-ups closer to peak) and got the channels of communication going.

…So, what were the highlights?

For me the best thing was the sense of cross-team collaboration. I attended daily Christmas stand-ups representing Tech, alongside representatives from other parts of the business, to discuss and collectively plan for peak. We also had an ‘Emergency Christmas WhatsApp group’, a group message chain set up to raise any issues across the business during peak. What did I learn? It was great to see what really goes on behind the scenes at Christmas to ensure that peak trading is a success. Seeing first hand how each team contributes and how they all work together was really brilliant.

Another thing that worked well was the communication we had with the rest of the Tech team, giving them full transparency on how things were going and what needed to happen. I made sure Tech had visibility of the daily forecast for expected traffic, TTV, number of unique users, checkouts and so on, so we were able to plan accordingly. I also made sure the team were aware of all the key event dates, such as TV sponsorships, pop-up shop dates and any key marketing activity that could cause a spike in traffic. This meant we were able to scale our infrastructure accordingly. I also kept the team looped in on our performance relative to target and to the previous year, so we were able to track everything day by day.

We started the tech preparations by reviewing all the issues identified during peak 2017. We also performed an impact and risk analysis of all the Tech services for peak, and worked out what measures needed to be taken to ensure the stability of those services.

One of the key aspects of peak prep is load testing, where you test a system’s performance under simulated real-life load. We placed a lot of emphasis on load testing, which is something I want to highlight because it made a huge difference. In previous years, the preparation for peak was done so late that the teams never really had a chance to fix any bottlenecks in time. Last year, though, we took the time to fix all the bottlenecks we found, which had a hugely positive impact on how things went in peak.

As an example, one of the bottlenecks we encountered was throughput capping out at 160k rpm (requests per minute). The load balancing algorithm used by Nginx distributed traffic unevenly across application containers. The overloaded containers became unhealthy (slow to respond) and were taken out of rotation, which caused all Nginx instances to reload their configuration. Once out of rotation, the containers would recover and be added back, which caused all Nginx instances to reload their config yet again. We fixed this by upgrading Nginx and using the new “power of two” load balancing algorithm, which achieved a more even spread across containers.
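To give a feel for why the “power of two” approach spreads load more evenly, here is a minimal simulation sketch in Python. It is not our production setup: the post does not say which algorithm Nginx was using before, so uniform random selection stands in as an assumed baseline, and the function names, container count and request count are all made up for illustration. It also counts requests cumulatively rather than tracking in-flight connections, which is a simplification. In open-source Nginx, the general idea is available through the `random two` upstream directive.

```python
import random


def pick_random(load, rng):
    """Assumed baseline: send the request to one container chosen at random."""
    return rng.randrange(len(load))


def pick_power_of_two(load, rng):
    """Power of two choices: sample two distinct containers, use the less loaded one."""
    a, b = rng.sample(range(len(load)), 2)
    return a if load[a] <= load[b] else b


def simulate(num_containers, num_requests, pick, seed=0):
    """Assign requests one at a time and return the per-container request counts."""
    rng = random.Random(seed)
    load = [0] * num_containers
    for _ in range(num_requests):
        load[pick(load, rng)] += 1
    return load


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the difference visible.
    for name, pick in [("random", pick_random), ("power of two", pick_power_of_two)]:
        load = simulate(num_containers=100, num_requests=10_000, pick=pick, seed=1)
        print(f"{name}: busiest={max(load)} quietest={min(load)}")
```

The point of comparing just two candidates is that the balancer avoids the worst choice almost every time without needing a global view of every container, so no single container drifts far above the average.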

One of the load test results

Another bottleneck worth highlighting is the legacy user login issue that was identified during load testing. The user identity team worked really hard to fix it, and managed to reduce the time it took for a legacy user to log in from 20 seconds to under 500 milliseconds in time for peak.

Our goal for peak was to be able to handle 10% more traffic than the peak 2017 profile. After running over 70 load tests, the results showed us that we were ready to handle the load. We even managed to get our infrastructure to handle double the traffic experienced in peak 2017.
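The arithmetic behind a target like this is simple, and it can be sketched as a small check. The figures below are purely hypothetical placeholders, not our real traffic numbers, and the function names are invented for this example. Integer maths is used for the percentage so that a result exactly on the target is not lost to floating-point rounding.

```python
def headroom(baseline_peak_rpm, sustained_rpm):
    """How many times the baseline peak the load-tested system sustained."""
    return sustained_rpm / baseline_peak_rpm


def meets_target(baseline_peak_rpm, sustained_rpm, target_increase_pct=10):
    """True if the sustained rate covers the baseline plus the target percentage."""
    return sustained_rpm * 100 >= baseline_peak_rpm * (100 + target_increase_pct)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    baseline = 100_000   # assumed requests per minute at the previous peak
    sustained = 200_000  # assumed requests per minute sustained in a load test
    print("headroom:", headroom(baseline, sustained))
    print("meets +10% target:", meets_target(baseline, sustained))
```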

Training was another important part of our preparations, to ensure everyone was up to speed and knew what had to be done. We created runbooks for every service so people on call knew what to do when dealing with issues. We also spent time reviewing and setting up alerts for every single service, as well as creating monitoring dashboards for all the key services and customer journeys we wanted to monitor close to peak. We created a peak calendar to get visibility of engineers’ availability and to manage holiday expectations. The best thing about my peak team was that every member took ownership of a different aspect of the work involved and completed it with the utmost sincerity.

And how did it all go?

It is very important to have a good team that shares the same vision and works towards it. I have to say that every member of the peak team was incredibly motivated, worked really well with the others and was ready to do whatever it took to achieve the goal.

We were ready by October last year. We scaled up our infrastructure at the end of November, kept a close eye on what was going on and then scaled down after Christmas. I can proudly say that we had no technical issues that in any way caused a loss of TTV, and that peak 2018 was a huge success. It was a triumph for the tech team, working so closely with the rest of the business, and I really hope there are many peaks like it to come!

The Christmas Stand-Up cross-functional team in black on Black Friday!

Kavitha Rajendran
We are Team NOTHS

I provide hands-on technical leadership to help improve the checkout and fulfillment experience for notonthehighstreet.com customers.