How the Boozt platform team prepares for peak seasons

Published in Boozt Tech · 5 min read · Jan 24, 2022

Written by Marco Rosello, Web Development Director

Boozt is mainly a fashion e-commerce company, which means we experience major shopping seasons. These peak periods are incredibly important to us because many new and returning customers visit our site. One of our biggest peak seasons is Black Friday, which is no longer just a day but lasts a week. During peaks, we see an extraordinary amount of incoming traffic to our webshop and sales are high, so things get busy and it can be stressful. However, this is also the most fun and rewarding period!

Planning and preparation are vital

Planning and preparation are crucial in order for us to be ahead of the curve and still be able to deliver a great customer experience. For us at Boozt, this means that preparation starts well in advance.

In this blog post, I will go over how we at Boozt prepare for some of our busiest seasons and share some tips on how to prepare for peaks.

Scalability is critical

Scalability is the ability to handle a growing number of customers. Being able to scale web services and capacity is incredibly important during times like Black Friday, Cyber Monday, and holiday periods.

Performance (speed) and scalability are not the same thing. A performant system usually has a better chance of scaling, but there will still be limiting factors that need to be addressed. That is why load testing is crucial: it lets you locate bottlenecks and fix them right away. Read about how we do load testing at Boozt here.

Here are some examples of what we did:

  • We used browser caching (local storage) to reduce HTTP calls
  • We centralized our Redis cache to increase the cache hit rate and reduce database load
  • We increased the size of our session Redis instance
  • We optimized database queries, especially reads
  • We changed the database schema to improve write performance
  • We added kill switches for major but non-critical features so that we could turn them off if we got overwhelmed
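As an illustration of the last point, here is a minimal sketch of a kill switch. It is not our actual implementation: the `KillSwitches` class, the `"recommendations"` feature name and the `render_product_page` function are all hypothetical, and the flags live in memory here, whereas a real setup would more likely keep them in a shared store so that every server sees a change at once.

```python
from typing import Callable, Dict, List


class KillSwitches:
    """Hypothetical in-memory registry of feature kill switches."""

    def __init__(self) -> None:
        self._disabled: Dict[str, bool] = {}

    def disable(self, feature: str) -> None:
        self._disabled[feature] = True

    def enable(self, feature: str) -> None:
        self._disabled[feature] = False

    def is_enabled(self, feature: str) -> bool:
        # Features are on unless someone has explicitly switched them off.
        return not self._disabled.get(feature, False)


def render_product_page(
    product_id: int,
    switches: KillSwitches,
    fetch_recommendations: Callable[[int], List[str]],
) -> dict:
    """Build a page; the non-critical part is skipped when switched off."""
    page = {"product_id": product_id, "recommendations": []}
    if switches.is_enabled("recommendations"):
        # The expensive call is only made while the feature is enabled.
        page["recommendations"] = fetch_recommendations(product_id)
    return page
```

The point of the design is that the core page still renders when the switch is off, and the expensive call is never made, which is what relieves the backend under load.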

Caching sounds simple, but as Phil Karlton once said:

“There are only two hard things in Computer Science: cache invalidation and naming things.” (Phil Karlton)

Ensuring reliability

Scaling is only part of the preparation. Another very important aspect of a campaign like Black Friday is reliability: making sure that the systems can endure the high volume of incoming traffic.

We have many partners for shipping, payment, etc. We also have microservices that can potentially go down. Every time we connect to a new service over the network, we need a failover plan. We assume every system will go down at some point, and we try to make sure this doesn’t affect our customer experience. If a shipping integration fails, we simply disable it; we always have some home delivery options available because we do not rely on the integrations when taking the order. For payments, we have failovers: if one fails, we fail over automatically to the other. We have replicas for our database, so if the master has an issue, we can fail over to one of the replicas. Considering that anything can go wrong, having a plan for each case helps us be prepared to act if needed.
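The automatic payment failover described above can be sketched roughly as follows. This is only an illustration under assumptions: `PaymentError`, `charge_with_failover` and the idea of modelling each provider as a plain callable are hypothetical and do not reflect any real payment provider’s API.

```python
from typing import Callable, List, Sequence


class PaymentError(Exception):
    """Hypothetical error raised when a provider cannot take the payment."""


def charge_with_failover(
    providers: Sequence[Callable[[int], str]], amount_cents: int
) -> str:
    """Try each provider in order and return the first successful receipt."""
    errors: List[PaymentError] = []
    for provider in providers:
        try:
            return provider(amount_cents)
        except PaymentError as exc:
            # Remember the failure and move on to the next provider.
            errors.append(exc)
    # Only reached when every provider has failed.
    raise PaymentError(f"all {len(providers)} providers failed: {errors}")
```

A real integration also has to make sure that a payment which timed out on the first provider was not actually captured before retrying on the second; this sketch leaves that out.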

Power to the people

In order to ensure that we would be ready and that we would get feedback from the team, I decided to facilitate a premortem meeting. I asked everyone involved in the Black Friday preparation to write what they thought could go wrong and then address it ahead of time.

Everyone prepared a document before the meeting, and then we went around the table, sharing our concerns and suggestions. Based on these, we created an action plan and tracked our progress against it.

Here is a short sample:

  • Write disaster recovery plan/runbook
  • Improve replication lag
  • Handle stock service downtime gracefully
  • Automate the disabling of non-critical background jobs
  • Have a manual payment failover in case the automatic one fails

Chaos engineering

One aspect of the action plan involved chaos engineering, where you experiment on a system in order to build confidence in its ability to withstand disturbances. One of the systems we rely on is the stock service, which keeps track of the current stock so that we do not oversell. Read our blog post on how we developed our new stock service here.

To make sure that the website would keep working if the stock service went down, we turned off the stock service. Our hypothesis was that the website would not be affected, but we were proven wrong. The webshop relied heavily on the stock service and the timeout was too long, so latency grew and the site slowed down. Our internal CMS was also affected: product pages showed an error. However, a very high uptime is not as critical for the CMS, so we decided not to take any action there and instead to inform our internal users if it occurs. For the webshop, we reduced the timeout to the minimum necessary. We repeated the experiment after the changes and verified that the website was no longer affected. There was a slight increase in latency, but it was totally acceptable.
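To illustrate why a short timeout matters, here is a minimal sketch of calling a slow dependency with a strict time budget and a fallback. The function name `stock_with_timeout`, the 0.2 second default and the choice of returning `None` for "stock unknown" are assumptions made for the example, not our production values.

```python
import concurrent.futures
from typing import Callable, Optional

# Shared pool so a slow call does not block the request thread itself.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def stock_with_timeout(
    fetch_stock: Callable[[int], int],
    product_id: int,
    timeout_s: float = 0.2,
    fallback: Optional[int] = None,
) -> Optional[int]:
    """Return the stock level, or `fallback` if the call is slow or fails."""
    future = _executor.submit(fetch_stock, product_id)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running in the background; cancel() only
        # helps if the call has not started yet. We stop waiting either way.
        future.cancel()
        return fallback
    except Exception:
        # Any other failure of the dependency is also treated as "unknown".
        return fallback
```

With a long timeout, every page view waits for the full timeout while the dependency is down; with a short one, the page degrades quickly and the rest of the site keeps its normal latency.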

It is very important to run these experiments in advance. If this had happened during a peak, we would have had to go through a code review and a deployment to change the configuration, which would have been incredibly time-consuming and would have affected our customers greatly.

Concluding thoughts

This was not our first time preparing for Black Friday. We have learned from previous mistakes, and it shows in how we prepare for peak seasons. The outcome of this year’s Black Friday was really good.

Black Friday dashboard

Our CTO’s message explains better than I could how proud I am to be a part of this team!

Even though the last Black Friday was successful, we always want to get better, so here is a short list of improvements to make:

  • Base load test metrics on requests that hit our backend (disregarding cached ones) rather than on users
  • Prepare continuously: run load tests throughout the year, not only in the last 2 months before Black Friday
  • Add the load test to the CI pipeline to ensure it keeps working and doesn’t break due to webshop changes
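The first improvement could be measured along these lines. This is a sketch under assumptions: the `RequestRecord` type and its `cache_hit` flag are hypothetical stand-ins for whatever the load testing tool or access logs actually record.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RequestRecord:
    """One request seen during the load test (hypothetical log shape)."""
    path: str
    cache_hit: bool


def backend_requests_per_second(
    records: Iterable[RequestRecord], window_s: float
) -> float:
    """Rate of requests that actually reached the backend in the window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Requests served from cache never reach the backend, so skip them.
    backend_count = sum(1 for r in records if not r.cache_hit)
    return backend_count / window_s
```

Unlike a count of simulated users, this number does not change when the cache hit rate changes for reasons unrelated to backend capacity, so it is easier to compare between test runs.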

If you enjoyed this article, and want to read more great stories from the Boozt platform team, be sure to subscribe to the Boozt Tech publication!

Or perhaps you’re interested in joining our platform team? Then check out our careers page.
