Do not blindly trust benchmarks, analyse them
In Ukraine we have a proverb: “Trust, but verify.” That is one of the reasons I tend to be sceptical when someone makes a performance-related statement that I cannot easily verify.
Unfortunately, most people tend to trust such statements when they come from respected people in the community, because they do not expect those people to distribute unverified information.
The Famous Benchmark Story
As we all know, there was a huge tsunami in the Magento community this February after MageCore Inc released a performance benchmark stating that Magento 2.0 requires 7x more resources than Magento 1.x to operate properly in production.
Unfortunately, they did not release the Gatling simulations together with their benchmark results, which made it incredibly hard to check whether their conclusions were legitimate. What also felt very strange to me is that no one else in the community asked them to publish the simulations and setup settings so that the benchmark could be properly verified.
It took me a month and a half to get the full reproducible environment they were using published. Then it took me another three months to finalise proper scenarios, as it is hard to dedicate a full week to fixing a benchmark when you work as an independent consultant.
I must admit that I appreciate the effort the MageCore Inc team has put into this load test. It is a huge amount of work, but there are things in it that lead to wrong conclusions.
So, what did I find after these three months? It is quite an interesting read, so grab a bucket of popcorn and enjoy!
Concern #1 Apples to Apples
The first thing you expect from a benchmark that compares two systems is that they run on identical databases, right?
I expected this from the MageCore Inc benchmark, but the Magento 1 and Magento 2 databases compared in their article use completely different data sets. Yes, there are 25,000 products in each database and the same number of categories, but the data sets themselves are entirely different.
Here is a summary of the differences between the M1 and M2 databases, which you can easily verify yourself. Just clone their repository and run the commands from the following list.
Different Products
Simply run the following commands on their load-test data set:
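The exact commands did not survive in this copy of the article. As a stand-in, here is a minimal Python sketch of the idea: pull the SKU-like tokens out of each SQL dump and compare the sets. The dump contents, file handling, and SKU pattern below are assumptions for illustration, not the original commands.

```python
# Hypothetical sketch: compare the product SKUs found in two SQL dumps.
# The inline dump strings and the SKU pattern are assumptions, not the
# exact commands from the original article.
import re

def extract_skus(dump_text):
    """Collect every quoted sample-data-style SKU token from a dump."""
    return set(re.findall(r"'(24-[A-Z0-9]+)'", dump_text))

# Tiny inline stand-ins for the real m1/m2 dump files:
m1_dump = "INSERT INTO catalog_product_entity VALUES (1,'24-MB01'),(2,'24-MB02');"
m2_dump = "INSERT INTO catalog_product_entity VALUES (1,'24-MB01'),(2,'24-WG080');"

m1, m2 = extract_skus(m1_dump), extract_skus(m2_dump)
print(sorted(m1 - m2))  # SKUs present only in the Magento 1 dump
print(sorted(m2 - m1))  # SKUs present only in the Magento 2 dump
```

Running the same idea against the real dumps (for example with `grep` and `comm`) shows the two product sets barely overlap.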
As you can see, the product sets are completely different. This is not a one-to-one comparison. They should have used data-migration-tool to transfer the data from Magento 1.x to Magento 2.x.
Products per Category
Import both Magento databases into MySQL and run the following query to see the difference:
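The query itself is missing from this copy. As a sketch of the kind of count it performs, here is the same idea expressed with Python's built-in sqlite3 in place of MySQL. The table and column names follow Magento's `catalog_category_product` schema, but the sample rows are invented:

```python
# Sketch of the per-database count, using sqlite3 instead of MySQL.
# Table/column names follow Magento's catalog_category_product schema;
# the sample rows are invented for illustration.
import sqlite3

def count_assigned_products(rows):
    """COUNT(DISTINCT product_id) over catalog_category_product."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE catalog_category_product (category_id INT, product_id INT)")
    db.executemany("INSERT INTO catalog_category_product VALUES (?, ?)", rows)
    (count,) = db.execute(
        "SELECT COUNT(DISTINCT product_id) FROM catalog_category_product"
    ).fetchone()
    return count

# Invented sample data: the M2 set has more products assigned than the M1 set.
m1_rows = [(3, 1), (3, 2), (4, 2)]
m2_rows = [(3, 1), (3, 2), (4, 3), (5, 4)]
print(count_assigned_products(m1_rows), count_assigned_products(m2_rows))
```

Running the equivalent `COUNT(DISTINCT product_id)` query against the two real databases is what exposes the 8,144-product gap described below.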
As you can see, Magento 1 has 8,144 fewer products assigned to categories than Magento 2. On average, the Magento 2 database also has 1,000 more products assigned to anchor categories than Magento 1, and 100 more to regular sub-categories. See the numbers for yourself:
This leads to the conclusion that the numbers in their original load test do not correspond to a fair benchmark at all. It merely tests whether Magento 2 with a larger database is slower than Magento 1 with a smaller one, and, as a result, it is.
Concern #2 Nature of Data-Set
The database used in the MageCore Inc benchmark is not the database of an average merchant in the wild. Click here for a JSON report of the Magento 1.x database I produced while creating a proper data set for a fair Magento 1.x to Magento 2.x benchmark.
Configurable vs Simple Products
If you take a close look, you will find that all simple products visible in categories are also assigned to configurable products. There are 19,068 simple products assigned to configurable products, and 13,322 of them are assigned to at least one category. That does not correspond to real merchant databases, where content duplication is a great concern. It would also create usability issues for buyers trying to find the right product.
Ideally, you should not show simple products that are assigned to configurable ones in your categories at all. In my experience, the majority of retail merchants who use configurable or grouped products use these product types as their main sales channel. For instance, a fashion retail store with around 20k simple SKUs may have only 5k configurable products visible in the catalog, without showing the simple products for each size. That is the kind of data set that should have been used, but it wasn't.
I created a sample database in which only 10% of the simple products assigned to configurable ones remain visible. You can see the data report of the new database here: https://github.com/IvanChepurnyi/load-test-magento1-bootstrap/blob/master/db/data.json
Product Positions
This is a very interesting issue. If you take a look at their testing simulation, you will notice that they always add a configurable and a simple product together to the shopping cart. But that is the wrong case: the database is organised in such a way that you do not see configurable products in a category unless you are on the very last page, and in the test they only ever visit the first category page.
Ideally, they should have been adding only simple products in this scenario, as no customer would ever visit the last page of a category that has more than 1,000 items with 12 items shown per page. It would take a visitor ages to get there.
So why does this happen?
- All category positions of products are set to 0
- Simple products have primary keys in the range 1 to 20,000
- Configurable products have primary keys in the range 20,001 to 25,000
- So MySQL will always show simple products first
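The effect of the four points above can be reproduced in a few lines: when every position is 0, ordering effectively falls back to the primary key, so the low-ID simple products always fill the first pages. A Python sketch with the article's ID ranges scaled down for illustration:

```python
# Sketch: when all category positions are 0, sorting by (position, entity_id)
# degenerates to sorting by entity_id alone, so low-ID simples fill page one.
# The ID split mirrors the article (simples first, configurables last),
# scaled down here for illustration.
simples = [{"id": i, "type": "simple", "position": 0} for i in range(1, 21)]
configurables = [{"id": i, "type": "configurable", "position": 0} for i in range(21, 26)]

catalog = sorted(simples + configurables, key=lambda p: (p["position"], p["id"]))

PAGE_SIZE = 12
first_page = catalog[:PAGE_SIZE]
print([p["type"] for p in first_page])  # only "simple" entries on page one
```

With 20,000 simples ahead of 5,000 configurables, a visitor paging 12 items at a time would need to reach the final pages before a single configurable product appears.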
I created a sample database with sparser positions based on their data set, and it produced different benchmark results for Magento 1.9.2.4 than the version with only simple products on the category page. You can find the report for this data set here: https://github.com/IvanChepurnyi/load-test-magento1-bootstrap/blob/master/db/data-large.json
Concern #3 Magento 1.x Oro_Ajax
When I tried to run the first load check in April, the simulations failed because of missing /ajax/ controller handles. It turned out they had added a custom module that implements the entire shopping cart process on Magento 1.x. This module gave Magento 1.x an unfair advantage over Magento 2, as it was developed to reduce the number of requests triggered by adding a product to the cart in Magento 1.
For a fair benchmark, modifications to the standard Magento 1.x behaviour should be minimal. You should follow the same routines as a default installation.
Apart from changing behaviour, this module disables plenty of functionality in the original Phoenix Varnish cache module that they used as a base. They also disabled some core functionality to improve database read performance (the reports module on the product page).
Of course, all of this results in better throughput for Magento 1.x over Magento 2.x.
Here is a proper Magento 1.x setup that is fair, with minimal customisations to the default behaviour, that you can try yourself:
It contains a very simple project setup with the EcomDev_Varnish module and a single local.xml that moves the cart and customer blocks into AJAX callbacks. Those are updated only when the shopping cart changes, which is very similar to the Magento 2.x Varnish implementation.
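To give an idea of what such a layout update involves, here is a heavily simplified local.xml sketch. It uses only standard Magento 1 layout constructs (`<default>` handle, `<remove>`), and the block name shown is the stock mini-cart sidebar; the real file ships with the setup linked above and will differ:

```xml
<!-- Hypothetical local.xml sketch, not the actual file from the setup.
     The idea: take session-specific blocks (such as the mini-cart) out of
     the cached page so Varnish can cache the rest, and render them through
     an AJAX callback that refreshes only when the cart changes. -->
<layout version="0.1.0">
    <default>
        <!-- Remove the session-bound sidebar cart from the cacheable page;
             the EcomDev_Varnish module then serves it via an AJAX hole punch. -->
        <remove name="cart_sidebar"/>
    </default>
</layout>
```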
Concern #4 The Gatling Simulation
Apart from the database and code differences, there are also issues with the way the simulations are configured. Neither the Magento 1.x nor the Magento 2.x simulation reflects customer behaviour given the database provided for the benchmark. Below is an overview of the general issues I encountered.
Shopping Cart Behaviour
In default Magento 2.x, the main focus is to bring the customer to checkout as quickly as possible and complete all necessary steps. As a customer, you need very good eyesight to notice the “View Shopping Cart” button in the mini-cart block. Also, you never get redirected to the shopping cart after adding a product to it.
In default Magento 1.x you also do not intentionally click the “View Shopping Cart” link; instead, the default behaviour redirects you to the shopping cart after adding a product.
But in the simulations, absolutely every customer visits the shopping cart page, not only in the abandonment case but also on the way to the checkout page.
The difference in customer behaviour between the two systems should be reflected in the simulations. They should not be copied one to one.
A valid abandonment simulation for Magento 1 should be:
- Visit a category page
- Visit a product page
- Add it to cart
- Get redirected to shopping cart page
- Reload mini-cart and messages blocks via AJAX call
A valid abandonment simulation for Magento 2 should be:
- Visit a category page
- Visit a product page
- Add it to cart
- Reload mini-cart and messages blocks via AJAX call
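The two flows above differ by exactly one full page view, which matters at load-test scale. Modeled as plain request lists (the URL paths are illustrative placeholders, not copied from the actual Gatling simulations):

```python
# Sketch: the two abandonment flows as ordered request lists.
# Paths are illustrative placeholders, not taken from the real Gatling code.
m1_flow = [
    "GET /category",             # visit a category page
    "GET /product",              # visit a product page
    "POST /checkout/cart/add",   # add the product to the cart
    "GET /checkout/cart",        # Magento 1 redirects here by default
    "GET /ajax/blocks",          # reload mini-cart and messages via AJAX
]
m2_flow = [
    "GET /category",             # visit a category page
    "GET /product",              # visit a product page
    "POST /checkout/cart/add",   # add the product to the cart
    "GET /customer/section/load",  # reload mini-cart and messages via AJAX
]

# Magento 1 serves one extra full page per abandonment iteration:
extra_pages = len(m1_flow) - len(m2_flow)
print(extra_pages)  # 1
```

Copying the Magento 1 flow one-to-one into the Magento 2 simulation charges Magento 2 for a cart page view its default behaviour never triggers.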
There is also an issue with adding a second product: in both Magento 1 and Magento 2 it gets added without visiting any category page.
Configurable Products Done Wrong
As I said earlier, the simulated customer behaviour does not reflect the actual catalog structure. In both Magento 1 and Magento 2 they load configurable product pages as often as simple product pages. But, as mentioned in the database section, a customer would have to be a real detective to find even one configurable product in the catalog.
You must also consider the changed behaviour of Magento 2.x configurable product pages. Magento 2 switches the main configurable image based on the selected simple product. As a configurable product has quite a lot of simple products assigned to it, the first visit to the page triggers a resize of all possible image combinations. This is a once-in-a-lifetime process for a product. (Magento 2.x also has a console command for resizing those images in batch before serving them to customers, but it is broken at the moment.)
You cannot compare the performance of Magento 1.x and Magento 2.x without taking this particular behaviour into account, and if you are creating a load test simulation, you cannot leave it unnoticed.
Later on I am going to publish a proper comparison both with and without a pre-warmed image cache.
Additional AJAX calls in Magento 2.x scenario
In the Magento 2 simulation, they start every visitor session with a “loadSections” AJAX call. I tested every version from 2.0.4 to 2.0.7, as well as 2.1.0, and none of them makes this call on session start. Moreover, Magento only issues these AJAX calls when the customer does something (the shopping cart changed, a customer session started or ended, etc.).
This call also did not specify which dynamic blocks to retrieve, so Magento retrieved all of them, which is a very expensive operation in itself. The usual Magento 2.x behaviour is to load only the necessary blocks, such as cart and messages.
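To illustrate the difference, the two requests differ only in the `sections` query parameter: empty means "return every section", while a comma-separated list limits the response to the named blocks. A small sketch (the endpoint path is Magento 2's real customer-data URL; the helper function is mine):

```python
# Sketch: Magento 2's customer-data endpoint returns every section when the
# `sections` parameter is empty, and only the named ones when it is set.
# The helper below is a hypothetical illustration, not Magento code.
from urllib.parse import urlencode

ENDPOINT = "/customer/section/load"

def section_load_url(sections=()):
    """Build the AJAX URL; an empty `sections` tuple means "load everything"."""
    return ENDPOINT + "?" + urlencode({"sections": ",".join(sections)})

# What the benchmark simulation effectively asked for (all sections):
print(section_load_url())  # /customer/section/load?sections=
# What a real storefront asks for after a cart change:
print(section_load_url(("cart", "messages")))
```

Requesting every section on every session start multiplies the cost of an operation Magento normally performs rarely and narrowly.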
Difference in Randomness of Scenarios
This one is very concerning, as there is a direct difference in the probability of a particular scenario being chosen by an available Gatling worker.
In the Magento 2.x simulation they use the following weights for the random switcher:
- 40 abandoned carts
- 25 browsing catalog
- 25 browsing layered catalog
- 10 full checkout
But in the Magento 1.x simulation they use different numbers for the catalog scenarios:
- 40 abandoned carts
- 20 browsing catalog
- 20 browsing layered catalog
- 10 full checkout
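The Magento 1.x weights sum to 90 rather than 100, so if the weights are treated as a normalized distribution (as the article's conclusion implies), the same weight of 40 buys a larger share. A quick check:

```python
# Sketch: convert the random-switch weights into probabilities, assuming the
# switch normalizes them over their sum (an assumption about the harness).
def probabilities(weights):
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

m2 = probabilities({"abandoned_cart": 40, "browse": 25, "browse_layered": 25, "checkout": 10})
m1 = probabilities({"abandoned_cart": 40, "browse": 20, "browse_layered": 20, "checkout": 10})

print(round(m2["abandoned_cart"], 3), round(m1["abandoned_cart"], 3))  # 0.4 0.444
print(round(m2["checkout"], 3), round(m1["checkout"], 3))              # 0.1 0.111
```

Under that assumption, Magento 1.x runs roughly 44.4% abandoned carts and 11.1% checkouts against Magento 2.x's 40% and 10%.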
This increases the probability of abandoned carts and checkouts being chosen in Magento 1.x.
Check their scenarios yourself below:
Conclusions
It is possible that Magento 2 is not faster at this point in time, but it is also not 7x more expensive to host than Magento 1. These are two different systems, and each of them needs to be properly analysed before you create any performance benchmark scenario.
Fair Benchmark Coming
Thanks to Byte Internet, a hosting provider in the Netherlands, for supporting me with hosting resources for the load test. They provided me with their Hypernode GO BIG L plan and have been eagerly awaiting the results for all these three months.
Soon I will publish my load test results on their blog at https://www.byte.nl/blog/, so make sure to keep an eye on it.
Meanwhile, you can try it yourself with my bootstrap scripts and Gatling scenarios, which I adapted from the original benchmark: