Navigating the MySQL Wire-Compatible Database Landscape — Overview

Li Shen
12 min readMay 24, 2023

--

MySQL has been around since the mid-1990s and has become one of the most popular open-source relational database management systems (RDBMS) in the world. It has a strong community and ecosystem, and many companies have built applications and services based on MySQL.

In the past decades, MySQL has evolved a ton, and there are now several variants. Additionally, there are several popular databases built on the MySQL codebase or compatible with the MySQL wire protocol.

The evolution of databases has been shaped by the ever-changing requirements of modern applications and the advancements in hardware technology. Over the years, databases have undergone significant transformations to meet the demands of increasingly complex and data-intensive applications.

In this blog, we will explore the landscape of MySQL wire-compatible databases — or what we’ll term the MySQL family. We will categorize the options in the market and introduce the popular ones. In this way, we will reveal insights into the current market as well as future trends.

MySQL-Derived Databases

In this section, we’ll walk through the most popular category of MySQL wire-compatible databases while providing insights into the many variants built on top of MySQL.

MySQL

MySQL is an open-source RDBMS created by Michael “Monty” Widenius and David Axmark in 1995. It was originally released under the name “MySQL”, but it is now owned by Oracle Corporation.

MySQL has a large community of developers and users, and it is used by many web applications, including Facebook, Twitter, and YouTube. The large adoption is the best proof of its maturity and ease of use. It is also why enormous companies build a large number of third-party tools for MySQL.

MySQL is still evolving with a focus on modern application requirements and improved performance. We can see the trend and direction thru the new features.

  • Reliability

MySQL 5.7.17 introduces the Group Replication feature, also known as MySQL Group Replication (MGR), which provides a native high-availability and fault-tolerant solution. MGR enables synchronous replication, allowing multiple MySQL instances to work together as a group, ensuring data consistency and automatic failover. This innovation simplifies the configuration and management of highly available database systems and enhances the reliability of MySQL deployments.

  • Advanced Analytics

In response to the increasing demand for advanced analytics, MySQL has embraced innovations that enable sophisticated data analysis within the database itself. Features like window functions, Common Table Expressions (CTEs), and extended SQL syntax support empower developers to perform complex calculations, generate reports, and gain deeper insights directly within MySQL. MySQL also introduces parallel query execution, which improves the performance of queries involving large datasets. By utilizing multiple threads to execute a single query in parallel, MySQL 8 can leverage the computing power of modern multi-core processors. Parallel query execution significantly reduces query response times, especially for resource-intensive operations like full table scans or complex joins. This trend towards integrating analytics capabilities within the database reduces the need for data movement and external tools, streamlining the analytical workflow and improving overall performance.

  • Operation efficiency

MySQL 8 introduces Instant DDL (Data Definition Language), which allows certain data definition changes to be performed online without blocking concurrent operations. With Instant DDL, you can alter table structures, add or modify columns, and perform other schema changes without locking the entire table. This enhances database availability, reduces downtime, and improves the overall efficiency of schema modifications. Invisible Index provides administrators with more control over index management. By marking an index as invisible, administrators can observe the query performance without actually removing the index. This feature enables administrators to evaluate the impact of an index on query execution without affecting the application’s behavior. It provides greater flexibility in optimizing database performance and reduces the risk of inadvertently removing essential indexes.

MariaDB

MariaDB is a community-developed fork of MySQL that was created in 2009 after concerns following the acquisition of MySQL by Oracle Corporation.

MariaDB is compatible with MySQL, and it introduces several enhancements and new features that provide added functionality and performance improvements. It includes features such as thread pooling, parallel replication, and improved query optimizer, which contribute to better scalability and higher performance. MariaDB also supports additional storage engines like XtraDB (an enhanced version of InnoDB), Aria, and ColumnStore, offering more choices to users based on their specific requirements.

MariaDB takes pride in its strong community-driven development model. The MariaDB community actively contributes to the development, testing, and improvement of the database system. This vibrant community fosters innovation and ensures the continuous evolution of MariaDB.

Besides the database kernel, MariaDB also offers Xpand (a distributed solution) and ColumnStore (for fast analytics) to meet the changing needs of enterprises.

Percona Server for MySQL

Percona Server for MySQL is an open-source, enhanced version of MySQL that aims to provide better performance, scalability, and advanced features compared to the standard MySQL distribution. It is developed and maintained by Percona, a leading company in the MySQL ecosystem, known for its expertise in database performance optimization and consulting services. Percona Server provides the following benefit:

  • Performance Improvements: Percona Server for MySQL incorporates several performance enhancements to deliver better query execution times and improved throughput. It includes a number of optimizations, such as enhanced thread pooling, improved buffer pool management, and advanced query execution plans. These optimizations help to minimize lock contention, reduce I/O operations, and improve overall database performance.
  • Advanced Monitoring and Diagnostics: Percona Server for MySQL offers extensive monitoring and diagnostics capabilities to help administrators identify and troubleshoot performance issues. It includes the Percona Monitoring and Management (PMM) tool, which provides a comprehensive set of monitoring and analysis features. With PMM, administrators can monitor database performance metrics, identify bottlenecks, and analyze query execution plans to optimize database performance.
  • More comprehensive tooling: Percona XtraBackup enables online, non-blocking backups of MySQL databases, allowing for consistent backups without impacting database performance. Additionally, Percona Toolkit, a set of command-line tools, helps with database management tasks such as data recovery, schema changes, and performance optimization.

Amazon Aurora MySQL

Amazon Aurora MySQL is a cloud-based MySQL-compatible database service provided by Amazon Web Services (AWS). It is designed to be highly scalable and available, and it includes some advanced features such as automatic failover and read scaling.

AWS Aurora is the first cloud-native relational database. It innovates at the storage layer to create a database built for the cloud that can support modern workloads without sacrificing performance. In Amazon Aurora, the log is the database. Database instances write redo log records to the distributed storage layer, and the storage takes care of constructing page images from log records on demand. Database instances never have to flush dirty pages, because the storage layer always knows what pages look like.

This improves several performance and reliability aspects of the database. Write performance is greatly improved due to the elimination of write amplification and the use of a scale-out storage fleet. Amazon Aurora MySQL demonstrates 5x write IOPS on the SysBench benchmark compared to Amazon RDS for MySQL running on similar hardware. Database crash recovery time is cut down dramatically because a database instance no longer has to perform a redo log stream replay.

The Limitations of MySQL-Derived Databases

As discussed, MySQL is a wonderful database with a strong feature set and an active user community. However, if you’re building modern applications with extreme requirements for scale, performance, and data integrity, MySQL has many limitations, including:

  • Scalability: MySQL runs only on a single machine. That means we can only scale it by using more powerful computers. This approach is expensive and not very scalable. As a result, the number of IO operations per second (IOPS) a disk supports can quickly become a bottleneck for the system.
  • Reliability: While a traditional DBMS employs techniques such as changelogs to make it more reliable, it isn’t. For example, if the disk corrupts, then the DBMS may lose all the data.
  • Availability: If the machine crashes, the DBMS won’t be able to serve user requests until the machine is fixed and the DBMS recovery completes.
  • Innovation: After the acquisition by Oracle, developers were concerned about the future of MySQL. No one knows if Oracle will keep investing in MySQL and adding new features.

Sharding Solutions

In modern applications, the demand for database scalability and performance is constantly increasing. As the amount of data grows, traditional single-node MySQL deployments may struggle to keep up. Sharding, the practice of horizontally partitioning data across multiple database instances, has emerged as a popular solution to address these scalability challenges.

To simplify the sharding process and provide additional functionality, several MySQL sharding middleware has gained significant popularity. This section will explore three prominent MySQL sharding middleware: Vitess, ProxySQL, and MariaDB Xpand.

Vitess

Vitess (https://vitess.io/) is an open-source sharding middleware developed by YouTube and is now part of the Cloud Native Computing Foundation (CNCF). It was originally created to address YouTube’s scalability needs and was later released to the public.

Vitess extends MySQL to support sharding, providing transparent routing of queries and transactions across a sharded MySQL infrastructure. Some key features of Vitess include:

  • Sharding: Vitess automatically partitions data across multiple MySQL instances, allowing for linear scalability as the dataset grows.
  • Query Routing: Vitess intelligently routes queries to the appropriate shard, ensuring that data is accessed from the correct MySQL instance.
  • Connection Pooling: Vitess manages a connection pool to efficiently handle connections to MySQL instances, reducing overhead and improving performance.
  • Horizontal Scaling: With Vitess, you can add or remove shards dynamically without affecting application functionality, enabling seamless horizontal scaling.
  • Resharding: Vitess simplifies the process of resharding, which involves redistributing data across shards, ensuring minimal downtime and data loss.

ProxySQL

ProxySQL (https://proxysql.com/) is a high-performance, open-source MySQL proxy that sits between application servers and MySQL database servers. While not strictly a sharding middleware, ProxySQL plays a crucial role in sharding architectures by providing advanced traffic routing, load balancing, and query management capabilities.

Key features of ProxySQL include:

  • Query Routing: ProxySQL routes queries based on predefined rules, allowing you to distribute queries across different MySQL instances or shards.
  • Load Balancing: ProxySQL evenly distributes incoming connections and queries among the available MySQL servers, optimizing resource utilization and improving performance.
  • Connection Pooling: ProxySQL manages a connection pool, reducing the overhead of establishing new connections to MySQL servers and improving scalability.
  • Query Caching: ProxySQL can cache frequently executed queries, eliminating the need to access the MySQL server for every identical query and improving response times.
  • Query Analysis and Rewriting: ProxySQL allows you to analyze and modify incoming queries, enabling query optimization and customization.

MariaDB Xpand (formerly ClustrixDB)

MariaDB Xpand is a distributed SQL database built for scalability and high availability. It is part of the MariaDB Platform and is designed to address the limitations of traditional single-node MySQL deployments. MariaDB Xpand offers the following key features:

  • Distributed Architecture: MariaDB Xpand automatically distributes data across multiple nodes, providing linear scalability and fault tolerance.
  • Transparent Sharding: MariaDB Xpand shards data across nodes, automatically routing queries to the appropriate shards and managing data distribution.
  • SQL Compatibility: MariaDB Xpand supports standard SQL syntax, allowing developers to leverage their existing SQL skills and applications without major modifications.
  • Elastic Scaling: With MariaDB Xpand, you can easily add or remove nodes to accommodate changing workload demands, ensuring scalability and performance.
  • ACID Compliance: MariaDB Xpand ensures the Atomicity, Consistency, Isolation, and Durability (ACID) properties of transactions across distributed nodes, maintaining data integrity.

The Darkside of Sharding

While sharding can provide significant scalability and performance benefits, it also introduces some challenges and drawbacks for developers. Here are some common drawbacks:

  • Increased Complexity for Application Development: Sharding adds complexity to the database architecture. It requires careful planning and design to partition data effectively across multiple shards. Developers need to handle routing logic, data distribution, and shard management. Besides the normal application logic, developers also need to handle failures, retries, and error handling related to distributed queries. These can increase the complexity of the application codebase.
  • Data Consistency: Maintaining data consistency across shards can be challenging in a sharded environment. Transactions involving multiple shards become more complex, and ensuring atomicity and isolation across shards requires careful coordination. Maintaining data integrity and consistency becomes more critical as the number of shards increases.
  • Join and Cross-Shard Queries: Sharding complicates queries that involve joins or aggregations across multiple shards. These queries require coordination between shards, which can result in increased latency, reduced performance, or faulty implementation. Developers need to carefully design the data model and query patterns to minimize cross-shard operations.
  • High Operational Cost: Managing and maintaining a sharded environment requires additional operational effort. Adding or removing shards, rebalancing data, executing DDLs, and performing maintenance tasks like backups and upgrades become more complex and time-consuming compared to a single-node database.
  • Limited Elasticity: While sharding enables horizontal scalability, adding or removing shards may not be as dynamic or elastic as scaling a single-node database vertically. Sharding typically requires careful planning and coordination, and the process of adding or removing shards may involve downtime or data migration.

Developers need to carefully consider the complexity, data consistency, query patterns, shard key selection, maintenance efforts, and application development implications when deciding to implement a sharded database architecture. It is important to weigh the benefits against the drawbacks and evaluate alternative solutions like scale-up approaches or distributed databases before adopting sharding.

Distributed Databases

As the demands for scalability, performance, and availability of MySQL databases continue to increase, traditional single-node deployments may face limitations. Distributed MySQL databases have emerged as a powerful solution to address these challenges.

With the rise of the internet and the explosion of data generated by web applications, new challenges emerged that pushed databases to evolve further. The need for scalability and high availability gave rise to distributed databases, which allowed data to be partitioned and spread across multiple nodes. This distributed architecture not only enabled horizontal scalability but also fault tolerance and enhanced computing capability.

SingleStore

SingleStore, formerly known as MemSQL, is a distributed, in-memory, relational database that provides real-time analytics and transactional capabilities. It combines the scalability and performance benefits of distributed systems with the familiar SQL interface of MySQL. Key features of SingleStore include:

  • Distributed Architecture: SingleStore utilizes a shared-nothing architecture, distributing data across multiple nodes in a cluster. This enables horizontal scalability, allowing applications to handle growing workloads and datasets.
  • In-Memory Processing: SingleStore leverages in-memory storage to accelerate query processing and analytics. It stores data in memory for fast access, while also providing disk-based persistence for durability.
  • Hybrid Workloads: SingleStore supports both transactional and analytical workloads, allowing real-time analytics on live operational data. This eliminates the need for separate systems for transaction processing and analytics, simplifying the architecture.
  • High Availability and Fault Tolerance: SingleStore provides built-in replication and automatic data distribution across nodes for high availability. It can survive node failures without impacting application availability or data integrity.

TiDB

TiDB is an open-source, distributed SQL database developed by PingCAP. It is designed to provide horizontal scalability, strong consistency, and high availability. TiDB supports the MySQL wire protocol, making it compatible with the MySQL ecosystem. Key features of TiDB include:

  • Horizontal Scalability: TiDB allows linear scalability by adding or removing nodes to the cluster. As the dataset and workload grow, developers can easily scale the system to handle increased demand.
  • High Availability: TiDB ensures high availability through data replication across multiple nodes. If a node fails, the system automatically fails over to a healthy replica, minimizing downtime and maintaining data availability.
  • Distributed SQL: TiDB enables developers to write complex SQL queries that seamlessly span multiple nodes in the cluster. The SQL layer of TiDB intelligently routes queries to the appropriate nodes, allowing parallel execution and efficient query processing.
  • Cloud-Native Capabilities: TiDB is designed with cloud-native principles in mind. It can be deployed on popular cloud platforms and takes advantage of cloud services for elasticity, scalability, and ease of management.

Conclusion

Modern applications, characterized by their need for scalability, performance, and data integrity, have been a driving force behind the evolution of databases. As applications grow in size and complexity, traditional database models have faced challenges in scalability and handling massive volumes of data. To address these limitations, new database paradigms have emerged, such as distributed databases, NoSQL databases, and cloud-native databases.

On another side, modern hardware and cloud infrastructure changes have also played a pivotal role in shaping database evolution. The relentless advancements in computer hardware, including faster processors, larger memory capacities, and high-speed storage technologies have enabled databases to handle more significant workloads and process data more efficiently. These improvements have opened doors for innovations in database design and optimization.

The MySQL landscape has continuously evolved to address the challenges of data storage, scalability, performance, and flexibility. As technology continues to advance, databases will continue to evolve, adapting to the ever-changing needs of the digital age.

--

--

Li Shen

Author of TiDB, Focus on Modern Infrastructure Software, Opinions are my own