A philosophy of building high-quality TiDB

siddontang

Building a high-quality database is a monumental challenge. Even databases with decades of maturity and a reputation for stability, such as Oracle, PostgreSQL, MySQL, and SQLite, still encounter bugs. So how can TiDB gain the trust of customers? How do we ensure that the data stored in TiDB is correct, consistent, and safe? These are not easy tasks, but at TiDB we are driven by the pursuit of excellence and innovation. Quality assurance is not just a goal for us; it is the foundation upon which we build every aspect of TiDB.

The following outlines the philosophies that guide our approach to building TiDB. These principles are grounded in our real-world experiences at PingCAP, and while they may not be universal, they reflect our continuous journey toward improvement.

Disclaimer:

  • These philosophies are based on our unique journey with TiDB and may not apply universally.
  • Not everyone may agree with them, and we are continuously iterating and evolving our approach.
  • Despite our rigorous philosophy, we cannot guarantee that TiDB is bug-free.

Phil — High-quality emerges from real-world battlefield, not just in-house testing

We believe that true software quality cannot be fully realized within the confines of a lab. In-house testing is essential, but it’s the unpredictable, varied scenarios encountered in the real world that truly test the limits of a product. With every new customer that adopts TiDB, we encounter new environments and use cases, each presenting its own challenges.

  • More customers = more diversified scenarios: Every customer brings a new scenario, helping us expose bugs and edge cases we might never have encountered otherwise.
  • Diversified scenarios push product boundaries: These real-world use cases help stretch TiDB beyond its initial scope, revealing hidden weaknesses.
  • Fixing these issues enhances quality: Addressing these challenges strengthens the database, improving its robustness.
  • Better quality attracts even more customers: As TiDB becomes stronger and more reliable, its reputation grows, attracting more customers who then help refine it further.

In short, quality is an ongoing battle fought in the real world, and each new customer helps make TiDB better.

Phil — The 80/20 Rule holds true

One of the most significant insights we’ve gained from building TiDB is that the 80/20 rule applies just as much to database development as it does to many other fields.

  • 20% of customers (major players) contribute 80% of the issues: In our experience, a small subset of customers, often those with the most demanding and complex use cases, accounts for the majority of the bugs we encounter.
  • Focusing on these key scenarios ensures higher quality: By dedicating our resources to addressing these high-impact use cases, we can improve TiDB for everyone. The lessons learned from these scenarios apply broadly, helping to prevent similar issues from cropping up across other customer environments.

By focusing our efforts where they matter most, we’re able to deliver a higher-quality product without compromising our ability to innovate.
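
One way to check whether an issue tracker really shows this 80/20 pattern is to count issues per customer and see how few customers are needed to cover 80% of the total. The following is a minimal Python sketch under stated assumptions, not PingCAP's actual tooling: the function name `top_contributors`, the customer labels, and the issue counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_contributors(issue_customers, share=0.8):
    """Return (customers, covered_fraction): the smallest group of customers,
    taken from the largest issue count downward, whose issues cover at least
    `share` of all issues."""
    counts = Counter(issue_customers)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    selected = []
    covered = 0
    for customer, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append(customer)
        covered += n
    return selected, covered / total


# Hypothetical data: one list entry per reported issue, labeled by customer.
issues = (
    ["A"] * 50 + ["B"] * 30 + ["C"] * 5 + ["D"] * 4 + ["E"] * 3
    + ["F"] * 2 + ["G"] * 2 + ["H"] * 2 + ["I"] * 1 + ["J"] * 1
)

customers, fraction = top_contributors(issues)
customer_share = len(customers) / len(set(issues))
print(customers, fraction, customer_share)
```

In this made-up data set, two of ten customers account for 80 of 100 issues. Real data will rarely split this cleanly, so the useful output is the ranked list of customers whose scenarios deserve the most attention.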

Phil — More features, more bugs

There’s no denying that new features are essential for staying competitive in today’s fast-paced tech landscape. Each new feature gives TiDB a competitive edge and attracts more customers. But with innovation comes complexity, and with complexity, bugs are inevitable.

  • New features enhance competitive advantages: They keep TiDB at the cutting edge, helping us meet the evolving demands of the market.
  • R&D on new features can squeeze bandwidth for quality improvements: A focus on innovation sometimes leaves fewer resources for fixing bugs or improving existing features.
  • More features can introduce more bugs: Each new feature adds complexity, and complexity is the enemy of stability. Unfixed bugs can accumulate over time, slowly eroding overall quality.
  • Quality improvements come from balance: At PingCAP, we’ve found that striking the right balance between innovation and stability is key. In TiDB 8.1, for example, we limited the number of new features to give our team the space to improve overall product quality.

This careful balancing act allows us to maintain a high-quality standard without stifling innovation.

Phil — Close known bugs in time to reduce the possibility of defects

We’ve come to see bugs not just as issues to be fixed but as crucial indicators of the health of our product. The faster we can close bugs, the stronger our releases will be.

  • Bugs are a lagging indicator for the current release: The bugs we find after a release give us insight into the stability of that version.
  • But they’re a leading indicator for the next release: Fixing bugs quickly is critical because it prevents those issues from carrying over into future releases, where they can cause even bigger problems.
  • Bug convergence is key: As we fix bugs, we observe a convergence: fewer new bugs are introduced, and the product becomes more stable over time. Tracking bug convergence is one of the most important indicators of TiDB’s overall quality.

By fixing bugs quickly, we reduce the likelihood of defects and ensure that each new release of TiDB is more stable than the last.
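
Bug convergence can be tracked with very simple arithmetic over per-period bug counts. The sketch below is one possible way to do it in Python and is an assumption on our part, not a description of the metric PingCAP uses: the helper names `open_backlog` and `is_converging`, the three-period comparison window, and the weekly numbers are all hypothetical.

```python
from itertools import accumulate


def open_backlog(opened, closed):
    """Open bugs at the end of each period:
    cumulative opened minus cumulative closed."""
    return [o - c for o, c in zip(accumulate(opened), accumulate(closed))]


def is_converging(opened, window=3):
    """True if fewer new bugs were reported in the last `window` periods
    than in the `window` periods before them."""
    if len(opened) < 2 * window:
        raise ValueError("need at least 2 * window periods of data")
    recent = opened[-window:]
    previous = opened[-2 * window:-window]
    return sum(recent) < sum(previous)


# Hypothetical weekly counts for one release branch.
opened_per_week = [12, 15, 11, 8, 6, 4]
closed_per_week = [5, 10, 12, 10, 8, 6]

print(open_backlog(opened_per_week, closed_per_week))
print(is_converging(opened_per_week))
```

Here the backlog peaks in the second week and then shrinks, and the number of newly reported bugs in the last three weeks is lower than in the three weeks before. A rising count of new bugs would make `is_converging` return `False`, which is the signal to slow down and investigate.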

Phil — Higher quality bar, better quality

Over the course of TiDB’s development, we’ve raised our standards for what constitutes a blocking bug. In TiDB 8.1, our criteria for blocking bugs are stricter than ever before.

This higher bar ensures that TiDB 8.1 is ready to handle even the most demanding use cases.

Phil — Test before going live

No matter how confident we are in TiDB’s quality, we never assume that there won’t be issues when it’s deployed in production. Testing in real-world environments is essential.

  • Internal testing leads to convergence: In mid-2023, we worked with customers to conduct thorough internal testing before going live. This led to the resolution of numerous issues and showed a clear trend of bug convergence.
  • Going to production without testing causes non-convergence: At another customer, new systems and scenarios were continuously introduced directly into production, which constantly pushed the limits of our product’s capabilities.

Any production change is a very serious matter, and no matter how confident we are in the quality of our products, we need to respect production and be fully prepared when it comes time to make changes.

Conclusion

Building a high-quality distributed database isn’t just a goal — it’s a journey.

At PingCAP, we’ve developed a philosophy that prioritizes real-world testing, focuses on the most critical use cases, balances innovation with stability, and continuously refines our approach to bug fixing and quality assurance. TiDB 8.1 reflects these values, offering a product that is both cutting-edge and reliable.

As we look to the future, we remain committed to raising the bar for what a database can achieve — working hand-in-hand with our customers to build the next generation of data solutions.
