StarRocks vs. ClickHouse: The Quest for Analytical Database Performance
In late 2022, ClickHouse released its open-source performance benchmark project, ClickBench. This benchmarking tool quickly generated a lot of attention and discussion. Data warehouse vendors and data infrastructure engineers rushed to the site to check out who ranked as the ‘fastest’ analytical database.
We applaud ClickHouse for not only providing such a helpful tool, but also for supporting a culture of healthy competition in the community by vetting and accepting results submitted by other groups. The team behind StarRocks had the pleasure of collaborating with the ClickHouse team to discuss test results, and this experience was nothing short of enlightening.
A Tale of Two Databases: Who Is the Fastest Analytical Database?
If you’ve been keeping an eye on ClickBench’s latest results, you’ve probably come across some entertaining discussions related to the products competing for the number one spot on the chart. ClickHouse could have easily turned ClickBench into a vendor-biased marketing tool, but to their immense credit, the ClickHouse team displayed great sportsmanship by accepting results from other projects in the space. This included StarRocks, which ran up near the top of the chart immediately. In fact, StarRocks briefly held the number one spot on its first day on ClickBench.
We could end the story there, but ClickHouse turned around with some impressive results from their next release that put them back on top. Just like any great athlete, ClickHouse wasn’t going down without a fight.
But the StarRocks community is not so easily discouraged. Only a few months later, StarRocks reclaimed the top spot with its latest release.
This race is reminiscent of famous sports rivalries: Ronaldo and Messi, Federer and Nadal, and even the Lakers and Celtics. Okay, maybe that’s a little dramatic, but the StarRocks community continues to enjoy its intense, but friendly, competition with ClickHouse.
While ClickBench is an excellent indicator of performance for certain scenarios, and StarRocks and ClickHouse are basically neck and neck in that race, we believe there is much more to a great analytical database than what is covered by ClickBench alone.
Going Beyond ClickBench: Properly Evaluating High-Performance Analytical Databases
Sticking with the sports analogy, ranking in ClickBench is just one of the many competitions (query performance) under a larger category (top analytics databases), like the 100-meter freestyle in swimming. To be seen as the best, StarRocks needs to compete and win in different competitions, not just one. Because analytical workloads in real life vary drastically from customer to customer, we need to support all scenarios well.
One great athlete comes to mind in this example: Michael Phelps. He didn’t win Olympic gold in just one event; he won across freestyle, butterfly, and individual medley races at multiple distances.
That’s the level of success the StarRocks community aspires to. At StarRocks, we believe there are other scenarios not covered in ClickBench that matter in real life. So we have also been publishing test results against other important test sets such as TPC-H and SSB.
Factors we believe should be highlighted for proper evaluation are:
- Query performance on joined tables without de-normalization: This is a critical feature to simplify the analytics data pipeline and improve timeliness. ClickBench only focuses on de-normalized table query performance.
- Scalability to handle growing data volumes: Modern analytics architectures need to be distributed and scalable. ClickBench is great for single-node configurations, but how easy is it to add or remove a server from the distributed platform? This is important to know.
- High concurrency queries: More and more use cases require support for large numbers of concurrent queries. ClickBench only tests a single query session, so we need to investigate the performance of hundreds or thousands of concurrent queries.
- Ingestion speed: This is another area ClickBench doesn’t cover. While processing queries is critical, it is also important to handle high-speed data ingestion in real time.
This isn’t an exhaustive list of factors that should be tested, but adopting them would make ClickBench more valuable for evaluation purposes.
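To make the concurrency factor concrete, here is a minimal sketch of how such a test could be structured. It is not part of ClickBench or of any StarRocks tooling: `run_query` is a hypothetical stand-in that only sleeps, and the concurrency levels and query counts are arbitrary assumptions. A real test would replace `run_query` with a call through an actual database client.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id: int) -> None:
    """Hypothetical stand-in for one database query; swap in a real client call."""
    time.sleep(0.01)  # simulate roughly 10 ms of query time


def timed_call(query_id: int) -> float:
    """Return the wall-clock latency of a single query, in seconds."""
    start = time.perf_counter()
    run_query(query_id)
    return time.perf_counter() - start


def measure(concurrency: int, total_queries: int) -> dict:
    """Issue total_queries queries using `concurrency` worker threads."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_queries)))
    elapsed = time.perf_counter() - wall_start

    # Nearest-rank 95th percentile, computed with integer arithmetic.
    n = len(latencies)
    p95_index = (95 * n + 99) // 100 - 1
    return {
        "concurrency": concurrency,
        "queries": n,
        "qps": n / elapsed,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
    }


if __name__ == "__main__":
    # Assumed concurrency levels; pick ones that match your workload.
    for level in (1, 10, 100):
        print(measure(level, total_queries=200))
```

Comparing throughput and tail latency as the concurrency level rises gives a rough picture of how a system behaves beyond the single-session case.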
Driving Greater Analytical Database Performance Through Open Source
The success of ClickBench is a testament to the power of open source, both in how it brings together developer communities like StarRocks to take on new challenges, and in its fostering of open competition and innovation between projects. On a similar note, earlier this month, the StarRocks project was donated to the Linux Foundation, and we are sure the project will grow even faster in its new home.
Hats off to ClickHouse. It’s an honor to compete with them. It pushes StarRocks to be a better project.
What do you think of StarRocks’ latest achievement on ClickBench? Join the StarRocks Slack and share your thoughts.
Built on the open-source database StarRocks, CelerData delivers performance 3 to 5 times faster than other solutions on the market while requiring only one-third of the hardware resources, and it can reduce your operating costs by up to 80%.
As an open-source solution, StarRocks can be downloaded for free from starrocks.io and used in production environments without any performance or capacity limit. We also encourage database developers and users to join the hundreds of contributors worldwide on our GitHub repository and participate in discussions in our Slack channel.
But if you are looking for enterprise-standard features, dedicated support, or just want these benefits on the cloud, we recommend checking out our StarRocks-powered CelerData Enterprise and CelerData Cloud products.
We encourage you to take a look at these solutions to see which is right for your business. If you have any questions, please reach out to one of our engineers.