3 years as a freelancer and one new lesson
First of all, many thanks to all of you who sent me your congratulations.
I had many new projects during the last year, learned new stuff and enjoyed helping my clients.
I’d like to tell you about a new experience I had recently: my customer didn’t pay me. “You optimized our site, and it crashed.” To tell the truth, I still have no idea what to do about that. I don’t think I’ll waste a few days suing them. But I definitely learned something from this story.
The story started at the end of April, when Causematch’s CEO asked me to prepare their site for an upcoming traffic increase. Causematch uses a pretty standard stack: WordPress (Nginx/PHP/MySQL) hosted on a dedicated Linux server with reasonable specs. We had just about two weeks before the upcoming ad campaign, and the site was live with no staging environment, so I decided to optimize the existing configuration instead of refactoring.
I started by running a load test with Blazemeter against the existing site, analyzed the bottlenecks and got to work. Most of the problems were related to the database, such as table locks and table scans. So I migrated the MyISAM tables to InnoDB, added the necessary indexes, asked the developers to rewrite some queries and tweaked the MySQL config for performance. Pretty common stuff. I also enabled a CDN, asked the developers to optimize the images and so on.
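The MyISAM-to-InnoDB step can be sketched roughly as follows. This is a minimal illustration, not the script used on the project: the function name `innodb_migration_statements` is hypothetical, and the table names are just the default WordPress `wp_` tables, assumed for the example. It only builds the SQL text; running it against a live database (ideally after a backup) is left out.

```python
def innodb_migration_statements(table_names):
    """Build one ALTER TABLE statement per table to switch its engine to InnoDB."""
    statements = []
    for name in table_names:
        # Backtick-quote the identifier; double any embedded backtick to escape it.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE=InnoDB;")
    return statements


# Hypothetical input: two default WordPress tables (assumed, not from the project).
for statement in innodb_migration_statements(["wp_posts", "wp_postmeta"]):
    print(statement)
```

On a live site with no staging environment, statements like these would normally be run one table at a time during low traffic, since each one rebuilds the table.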
After we finished the site optimization, a stress test demonstrated at least a threefold improvement in the number of user transactions the site could handle. So far so good. We had about 2–3 days before the ad campaign. Of course, I asked all team members to freeze site development, i.e. to avoid any modifications of code, configuration and so on.
Here we come to the big day. And it’s starting: monitoring alerts, calls from the CEO. Yes, the site went down as soon as we started receiving user traffic. Relatively low traffic, much lower than what we simulated during our load test. After sitting up all evening and most of the night, I understood that two things had happened after I asked to freeze the system:
- They modified the user flow in a way that caused about 5 times more load per user transaction (i.e. per donation).
- They upgraded the WordPress version, and that, for some weird reason, degraded system performance further, by a factor of about 2.
Bottom line: on D-day we had a system able to serve about 10 times fewer users than during the tests!
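The arithmetic behind that bottom line, as a small sketch. It assumes the two regressions multiply independently, which is a simplification, and the function and parameter names are made up for the illustration. The inputs are the rough factors from the story: at least 3 times more capacity after the optimization, then 5 times more load per donation and a 2 times slowdown from the upgrade.

```python
def relative_capacity(optimization_gain, load_per_transaction_factor, upgrade_slowdown_factor):
    """Capacity relative to the original, unoptimized site (1.0 = same as before)."""
    # Assumption: both regressions scale capacity down multiplicatively.
    slowdown = load_per_transaction_factor * upgrade_slowdown_factor
    return optimization_gain / slowdown


tested = relative_capacity(3, 1, 1)   # state during the stress test
launch = relative_capacity(3, 5, 2)   # state on the day of the campaign

print("capacity drop versus the tested system:", tested / launch)
print("capacity versus the original site:", launch)
```

Under these assumptions the launch-day system is not only about 10 times weaker than the tested one, it is also weaker than the original site from before any optimization.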
So the lesson is simple — I have to improve my soft skills and be more assertive in my communication.