How to Cyber Monday

Javier Cervantes
Published in Sawyer Effect
Nov 26, 2018

The weekend from #BlackFriday to #CyberMonday is usually a good opportunity to buy things at discounted prices. But if you work in e-commerce (as we do at Sawyer Effect), it's also when digital stores are put to the test, and if things go wrong the losses can be significant.

If you are in charge of development and/or operations, you need to be ready for the spike in users and purchases to avoid any major outage or problem. In this article, I'll share some of the lessons we have learned that can help you get through it successfully.

Plan ahead

Every change to the system carries the risk of unforeseen consequences, so plan the improvements and adjustments you'll need for the holidays early enough to have them ready by the start of November.

If you need to make changes closer to the event, keep them as small as possible so they can be easily reverted if necessary.
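
One common way to keep a late change small and reversible (an option we're suggesting here, not necessarily what we used) is to put it behind a feature flag, so reverting means flipping a setting rather than redeploying. The sketch below assumes flags are read from environment variables; the `FLAG_` prefix, `flag_enabled`, `checkout_flow` and the `new_checkout_step` flag are all hypothetical names.

```python
import os

# Values treated as "on"; anything else (or an unset variable) means "off".
_TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, default: bool = False) -> bool:
    """Read the hypothetical FLAG_<NAME> environment variable as a boolean."""
    raw = os.environ.get(f"FLAG_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def checkout_flow() -> str:
    # The late change lives behind the flag; switching the flag off restores
    # the previous behavior without reverting any code.
    if flag_enabled("new_checkout_step"):
        return "new"
    return "legacy"
```

In production you would more likely read flags from a configuration service so they can change without a restart; environment variables just keep the sketch self-contained.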

Start your preparation with a full technical assessment covering performance tuning, error logging and any other improvements, so they are done with enough time to release and test them.

Weakest Link

Identify all your integration points and map how each one could affect you if it fails. Some of the most common integration points are the payment gateway, the order management system and marketing tools. If you don't have a Logical Architecture Diagram already, consider creating one.
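
Mapping integration points doesn't need special tooling to get started. As a sketch, assuming you record for each dependency whether checkout needs it and whether a degraded path exists, you can list the ones whose failure would stop purchases outright. The `IntegrationPoint` type, its fields and the inventory below are hypothetical examples, not a real system map.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IntegrationPoint:
    name: str
    needed_for_checkout: bool  # does checkout break if this dependency is down?
    has_fallback: bool         # is there a degraded path (manual entry, queueing, etc.)?


# Hypothetical inventory; replace with your own systems.
INTEGRATIONS = [
    IntegrationPoint("payment gateway", needed_for_checkout=True, has_fallback=False),
    IntegrationPoint("order management system", needed_for_checkout=True, has_fallback=True),
    IntegrationPoint("marketing tools", needed_for_checkout=False, has_fallback=False),
    IntegrationPoint("address autocomplete", needed_for_checkout=True, has_fallback=True),
]


def critical_without_fallback(points: List[IntegrationPoint]) -> List[str]:
    """Names of dependencies whose failure would block purchases with no way around it."""
    return [p.name for p in points if p.needed_for_checkout and not p.has_fallback]
```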

Reach out to third parties to coordinate a plan and make sure they are ready. It's also a good opportunity to review your implementation and find what could be improved. The most common things to review are slow response times, timeouts and graceful error handling.
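
Since timeouts are one of the first things to review, here is a minimal sketch of putting a hard time budget on a third-party call and turning any failure into a single exception type the caller can handle. `call_with_timeout` and `ThirdPartyUnavailable` are hypothetical names, and the 2-second default is an arbitrary placeholder, not a recommendation.

```python
import concurrent.futures
import time

# Shared, long-lived pool for outbound 3rd-party calls. Using it instead of a
# `with` block means we don't wait for an overrunning call once we give up on it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


class ThirdPartyUnavailable(Exception):
    """Raised when a 3rd-party call errors out or exceeds its time budget."""


def call_with_timeout(fn, *args, timeout_s: float = 2.0, **kwargs):
    """Run fn(*args, **kwargs), giving up after timeout_s seconds."""
    start = time.monotonic()
    future = _POOL.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        # We stop waiting here; the worker thread itself keeps running until fn returns.
        raise ThirdPartyUnavailable(f"{fn.__name__} timed out after {timeout_s}s") from exc
    except Exception as exc:
        elapsed = time.monotonic() - start
        raise ThirdPartyUnavailable(f"{fn.__name__} failed after {elapsed:.2f}s: {exc}") from exc
```

For plain HTTP calls, passing `timeout=` to `urllib.request.urlopen` (or the equivalent option in your HTTP client) is simpler; a wrapper like this is mainly useful around SDKs that don't expose a timeout.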

Finally, test different failure scenarios and try to isolate the failures. For example, we recently noticed that when the Google Auto-Complete API didn't have the correct permissions, it prevented customers from completing checkout. We adjusted the implementation so that the worst case is now having to enter the address manually.
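
The autocomplete fix follows a general pattern: treat optional helpers as best-effort, so their failure degrades the experience instead of blocking checkout. A minimal sketch, where `suggest_addresses` stands in for whatever client you use to query an autocomplete service; it is a hypothetical callable, not the Google API's actual interface.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("checkout.address")


def address_suggestions(query: str, suggest_addresses: Callable[[str], List[str]]) -> dict:
    """Best-effort autocomplete that never blocks checkout.

    `suggest_addresses` is a hypothetical client callable; any exception it raises
    (missing permissions, quota, network) is contained here.
    """
    try:
        return {"mode": "autocomplete", "suggestions": suggest_addresses(query)}
    except Exception:
        # Keep the failure visible in the logs, but fall back to manual entry.
        logger.warning("address autocomplete unavailable", exc_info=True)
        return {"mode": "manual", "suggestions": []}
```

The point of the design is that the front end only has to handle two modes; a permissions or quota problem shows up in the logs instead of at the payment step.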

Cache Strategy

Check that the response times of your main pages (for example Home, Search, Product Pages and Category Pages) are fast.
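
To make "fast" measurable, you can time repeated requests to your key pages and look at percentiles rather than a single sample. A rough sketch using only the Python standard library; `measure_latency` and `fetch` are hypothetical helpers, requests run one after another (so this is a baseline check, not a load test), and the `fetch_fn` parameter exists so the function can be exercised without network access.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict


def fetch(url: str) -> None:
    # Read the full body so the timing includes the transfer, not just the headers.
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def measure_latency(url: str, runs: int = 20,
                    fetch_fn: Callable[[str], None] = fetch) -> Dict[str, float]:
    """Time `runs` sequential requests and report p50/p95/max in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch_fn(url)
        samples.append((time.perf_counter() - start) * 1000)
    # 99 cut points: index 49 is the median, index 94 the 95th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "max_ms": max(samples)}
```

You might run it as `measure_latency("https://www.example.com/search?q=boots")` against each of the pages listed above, from a machine outside your own network, and compare the numbers against a target you agree on with your team.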

Make sure existing cache expiration times are long enough that cached entries get reused a meaningful number of times (and increase them if possible).
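
To see why longer expiration times matter, here is a minimal in-process TTL cache. Your real cache is more likely a CDN or a dedicated cache server, so `TTLCache` is a hypothetical class that only illustrates the idea: with a longer `ttl_s`, each entry is served more times before it has to be recomputed, and the hit ratio makes that visible.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny cache where each entry expires `ttl_s` seconds after it is stored."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: rebuild the value and store it with a fresh expiry.
        self.misses += 1
        value = compute()
        self._entries[key] = (now + self.ttl_s, value)
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Watching a hit-ratio metric like this one (or your cache's equivalent) climb after a deploy is also one way to tell when the cache has had time to build up.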

Do not clear the cache close to anticipated traffic spikes, and allow enough time for it to build up naturally before the event starts.

Troubleshooting

Make sure you have monitoring and observability tools in place to understand the health of your system and to be able to track down errors.

For example, make system logs accessible through a log aggregation service, and review logging levels and messages for the most important workflows.
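
Log aggregation services can usually do more with structured lines than with free text. As a sketch, assuming your aggregator accepts one JSON object per line (as many do, but check yours), Python's standard `logging` module can emit that with a small custom formatter. `JsonFormatter`, `checkout_logger` and the `context` attribute passed through `extra` are conventions made up for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line for a log aggregator to parse."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed as extra={"context": {...}} (a convention assumed here).
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def checkout_logger() -> logging.Logger:
    logger = logging.getLogger("checkout")
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # review this level for each critical workflow
    return logger
```

A call such as `checkout_logger().info("payment authorized", extra={"context": {"order_id": "A123"}})` then produces one searchable line with `order_id` as its own field.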

Create an on-call schedule with a clear escalation process, and make sure everyone has the contact information they need to report any problem.
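
An escalation process is easier to follow under pressure when it is written down unambiguously. As a sketch, assuming a simple time-based policy, it can be expressed as data; the step timings, roles and the `who_to_notify` helper are hypothetical placeholders, not a recommended policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes since the alert fired without acknowledgement
    contact: str


# Hypothetical policy; replace roles and timings with your own.
POLICY = [
    EscalationStep(0, "primary on-call engineer"),
    EscalationStep(15, "secondary on-call engineer"),
    EscalationStep(30, "engineering lead"),
]


def who_to_notify(minutes_unacknowledged: int,
                  policy: List[EscalationStep] = POLICY) -> List[str]:
    """Everyone who should have been contacted by now, in escalation order."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [step.contact for step in ordered if minutes_unacknowledged >= step.after_minutes]
```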

Learn and repeat

Once you are in the clear, take some time to review what could have been done better, and use that to drive improvements so you are even more prepared next time.

It would be great to hear from you: please share your thoughts and experiences of what has (or hasn't) worked for you in the comments.
