How to prepare your company for a sales-season — from a technological
A guide for your technology team to prepare for and endure Black Friday, Cyber Monday, Christmas, or any other intense online commerce season.
By William Mendes, Infrastructure Engineer — Performance
The thrill of the end-of-year sales-seasons has begun, a time of tremendous demand from consumers. Not only are customers anticipating great discounts, but they are also expecting top-notch online experiences.
Meanwhile, technology teams in e-commerce companies all around the world race against the clock to ensure that all the time and energy spent during the year will have been used well.
Without a clear plan, or without enough time to anticipate and prepare for demand, this period can cost technology teams their sleep, and instead of being a sales oasis, it can become a desert empty of consumers.
The purpose of this text, my dear reader, is to summarize in a cohesive way my last 7 years of experience working with Application Performance for large e-commerce companies, helping you avoid tense days during this period and make your profits intense instead!
Before you start thinking about the coming customers, understand how your technology works.
How do your systems interact with each other?
How do your users interact with your application? Can you replicate this interaction?
What points of attention do you need to have immediately regarding your application?
How to do it? With metrics
Regardless of the technology you use, aim to understand which metrics are most important within your stack, how to collect them, and what they really mean about your platform while in use.
Once you have defined the set of metrics to collect, instrument your servers, containers, and apps, and redeploy them with metrics collection enabled. An APM (Application Performance Monitoring) tool can help.
Let your real users navigate the application while these metrics are collected. If strictly necessary, you can disable collection after a few days and analyze only the data gathered up to that point.
With these metrics you (your technical team) should be able to clearly answer the previous questions, and, once those questions are answered, it’s time to go on the attack.
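As an illustration of turning raw measurements into answers, here is a minimal sketch that summarizes collected request samples into latency percentiles and an error rate. The `Sample` structure, the choice of p50/p95, and the sample values are assumptions made for this example; your APM tool will usually compute these for you.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request: latency in milliseconds and HTTP status code."""
    latency_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce raw samples to the few numbers a team usually discusses first."""
    latencies = [s.latency_ms for s in samples]
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "count": len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "error_rate": errors / len(samples),
    }


# Made-up samples, for illustration only.
samples = [
    Sample(120, 200),
    Sample(95, 200),
    Sample(310, 200),
    Sample(2050, 503),
    Sample(140, 200),
]
print(summarize(samples))
```

Percentiles are used here instead of averages because a single slow request (like the 2050 ms one above) is exactly what an average tends to hide.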
Know thy customer
Now that you know about your application, stack, and your points of degradation, pay attention to how your users interact with your platform and try to reproduce it, in a robotic and dynamic way, but in a controlled manner.
Choose a Performance Testing Tool that fits your scenario and lets you replicate exactly how your customers interact with your technology, respecting mainly:
- Navigation steps;
- Interaction/waiting time between requests;
- Parameterization/Dynamism in requests.
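The three points above can be sketched independently of any specific tool. The example below builds the request plan of one virtual user: ordered navigation steps, a randomized waiting time before each request, and parameters that change per session. The journey, paths, and parameter names are hypothetical placeholders, and no request is actually sent; a real Performance Testing Tool would execute such a plan.

```python
import random

# Hypothetical journey: step names and paths are placeholders, not a real site map.
JOURNEY = [
    ("home", "/"),
    ("search", "/search?q={term}"),
    ("product", "/products/{product_id}"),
    ("add_to_cart", "/cart/add/{product_id}"),
]


def build_session(rng, terms, product_ids, think_time_s=(2.0, 8.0)):
    """Return the ordered requests of one virtual user, each with its own wait time."""
    # Parameterization: each session picks its own search term and product.
    term = rng.choice(terms)
    product_id = rng.choice(product_ids)
    session = []
    for name, template in JOURNEY:  # navigation steps, in order
        path = template.format(term=term, product_id=product_id)
        wait = rng.uniform(*think_time_s)  # waiting time between requests
        session.append({"step": name, "path": path, "wait_s": round(wait, 2)})
    return session


rng = random.Random(2020)  # seeded so a test run can be reproduced
for request in build_session(rng, ["coat", "sneakers"], [101, 202, 303]):
    print(request)
```

Seeding the random generator keeps the test dynamic across users while still letting you replay the exact same run when you need to compare results.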
With the tool chosen and the tests set up, work out how many requests per minute correspond to the number of users you expect and how that load will be distributed over time, so you can replicate it when you execute the tests.
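One common way to convert users into requests per minute is Little's Law, which relates concurrent users, throughput, and the time each user spends per request (response time plus waiting time). This is a technique brought in for illustration, not one the steps above prescribe, and the numbers below are made up.

```python
def requests_per_minute(concurrent_users, avg_response_s, avg_think_s):
    """Little's Law rearranged: throughput = users / time per request cycle."""
    cycle_s = avg_response_s + avg_think_s
    if cycle_s <= 0:
        raise ValueError("cycle time must be positive")
    return concurrent_users / cycle_s * 60


# Illustrative numbers only: 6,000 concurrent users, 1 s responses, 5 s waiting time.
print(requests_per_minute(6000, 1.0, 5.0))  # 60000.0
```

Note that the result depends heavily on the waiting time: halving it roughly doubles the load the same number of users generates, which is why measuring real user behavior comes first.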
Run several test batteries. Always start with a low load and gradually increase, to understand and be able to analyze the results individually, always adjusting your tests to extract the metrics needed for your business.
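A gradual ramp like this can be planned as a list of load levels before configuring any tool. The sketch below assumes evenly spaced steps and a fixed hold time per step; both are simplifications, and the function and parameter names are hypothetical.

```python
def load_steps(start_rpm, target_rpm, steps, hold_minutes):
    """Evenly spaced load levels from start_rpm up to target_rpm, inclusive."""
    if steps < 2 or start_rpm >= target_rpm:
        raise ValueError("need at least 2 steps and start_rpm < target_rpm")
    increment = (target_rpm - start_rpm) / (steps - 1)
    return [
        {"step": i + 1, "rpm": round(start_rpm + i * increment), "hold_min": hold_minutes}
        for i in range(steps)
    ]


# Illustrative plan: four levels from 1,000 to 10,000 requests per minute.
for step in load_steps(start_rpm=1000, target_rpm=10000, steps=4, hold_minutes=10):
    print(step)
```

Holding each level for a few minutes gives you a separate, stable set of results per step, which makes it easier to see at which load the first degradation appears.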
As for the test environment, the closer it is to your users’ real conditions, the better. If possible, and if your team has the technical maturity to do so, run the tests directly in Production, starting at a time with the least possible impact on your users.
With all tests done, it’s time to get the team together and understand what happened, how it happened, and why it happened.
What slowness points were found?
Which findings are symptoms, and what is the cause behind each symptom?
Is it possible to draw up short, medium, and long term plans with what was found?
The above questions will be your guides after the tests, and with them, you should be able to prioritize corrections to what has been found, be it just a hot-fix or a change in stack architecture. Be open to what is found: do not discard any finding before drawing up the short, medium, and long term plans, and above all, reproduce the findings in new test batteries.
Do not try to solve everything at once; think about which solutions will bring the biggest improvement for the expected user load. Sometimes adding a new, bigger application server gives less return than an in-memory cache, so you should know your stack well before analyzing the test results.
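To make the in-memory cache idea concrete, here is a minimal sketch of a cache with a time-to-live in front of a slow lookup. The `TTLCache` class and `load_product` function are hypothetical stand-ins written for this example; in practice you would likely use a cache your stack already provides.

```python
import time


class TTLCache:
    """Tiny in-memory cache: entries expire ttl_s seconds after being stored."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: the backend is not called
        value = loader(key)  # miss or expired: call the slow backend once
        self._store[key] = (value, now + self.ttl_s)
        return value


backend_calls = []


def load_product(product_id):
    # Stand-in for a slow database or service call (hypothetical).
    backend_calls.append(product_id)
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TTLCache(ttl_s=30)
for _ in range(1000):
    cache.get_or_load(42, load_product)
print(len(backend_calls))  # 1
```

A thousand reads of the same product reach the backend once. Whether this beats a bigger server depends on how repetitive your real traffic is, which is what the metrics and tests are meant to tell you.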
Replicate it and repeat it
With the change plan in place, repeat the load tests after each change, so that at each step you know whether you are closer to or further from the goal.
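A simple way to keep track of this is to compare each new run against the previous one and against the goal. The sketch below uses p95 latency as the goal metric; the metric, labels, and numbers are assumptions for illustration only.

```python
def compare_runs(baseline, candidate, goal_p95_ms):
    """Report how a change moved p95 latency relative to the baseline and the goal."""
    delta = candidate["p95_ms"] - baseline["p95_ms"]
    return {
        "p95_delta_ms": delta,
        "improved": delta < 0,
        "meets_goal": candidate["p95_ms"] <= goal_p95_ms,
        "gap_to_goal_ms": max(0, candidate["p95_ms"] - goal_p95_ms),
    }


# Made-up results from two test runs at the same load level.
baseline = {"label": "before change", "p95_ms": 1800}
candidate = {"label": "after change", "p95_ms": 950}
print(compare_runs(baseline, candidate, goal_p95_ms=800))
```

In this made-up case the change helped (p95 dropped by 850 ms) but the goal is still 150 ms away, so the next item in the plan is still needed.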
Understand that there will be no silver bullet or a solution that will solve all your problems right away, but try to get as close as possible to the goal you have for that particular season.
More important than planning for success is defining the strategies you will use if things get out of hand. Inform those who need to be on alert, prepare extra servers and previous versions of your applications, and learn how to deploy them and what impact doing so will have on users at that time.
These are the basic steps towards tranquillity, but being basic does not mean they are the only ones. Keep in mind that this process must be repeated in and out of the sales seasons, and that with each change in the platform the steps must be reordered and the questions asked and answered again.
Bear in mind that there is no way to guess the number of tests you should run. For example, at FARFETCH we usually run more than 100 Performance Tests specifically for Sale-Season preparation, which is why your entire team should be prepared to understand and work on the test results.
The sooner you start preparing and the more continuous the process, the fewer surprises, and the more chances the season will be a success for you and your company.
Don’t wait for the next season to start. If the current season is just around the corner, start small and get ready to have the process fine-tuned for next time.
Originally published at https://www.farfetchtechblog.com on November 26, 2020.