How Database Scaling Meets The Needs Of High Value Real-Time Transaction Businesses
Cataloging, analyzing and storing new data at the speed of light is one of the biggest opportunities database companies are working on today. It’s also a significant threat to business in a number of ways.
Imagine a massive library — one with actual books, and librarians and millions of shelves in a row. Now imagine you need to run a calculation with information on page 3 of a book on mice, and page 576 of a book on agricultural growing seasons. You’d send the librarians off running to collect the resources you need, flip through the books and complete your analysis. But what happens when this library doesn’t stop growing?
Scaling Database Trends
While scaling is certainly not a new problem, it’s becoming a more pressing one. As Big Data becomes both available and necessary to smaller businesses, the need to write and analyze transactions in real time grows. The demand for low-latency databases has never been greater. Gary Orenstein, CMO at MemSQL, illustrates the problem for TheNewStack: “End customers are needing the ability to do transactions and analytics simultaneously. They need to be able to run analytics on those transactions. They need to be able to query the data up to the last click… that has been impossible before.”
For example, ecommerce giants like Amazon require interactions with inventory, shopping cart and price data sets across a huge number of suppliers and shoppers simultaneously. Days like Prime Day mean an enormous number of transactions need to be carried out quickly to ensure shoppers aren’t dropped.
Often the sticking point for companies looking to scale comes when their databases become too big for a single machine. This basic scaling dilemma will affect any database that keeps growing. How do you keep track of data across multiple systems? The number of operational transactions and the overall complexity increase dramatically as you partition your database. Inevitably this leads to slower analysis and performance.
There are a number of ways to resolve the problems of a growing database, loosely grouped into two approaches: sharding and non-sharding.
Scaling with Shards
Sharding is hardly the new kid on the block when it comes to database scaling. MongoDB, one of the biggest players in databases, has relied heavily on sharding to improve performance on massive databases. Let’s return to the library metaphor, except this time we split it into separate buildings based on data type and data region. Immediately this improves our ability to search and locate more efficiently.
Smart sharding partitions your database into smaller, more manageable databases, each with its own CPU, memory and disk. Each of these small databases enjoys improved performance. While you lose the ability to access data in one central location, you gain the ability to scale past what one machine can handle, and that’s attractive to business.
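To make the idea concrete, here is a minimal Python sketch of how a key might be routed to a shard. It is an illustration under assumptions, not how any particular database does it: the names shard_for and ShardedStore are hypothetical, and each in-memory dict simply stands in for a separate database server with its own CPU, memory and disk.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for a separate database server."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # Every write touches exactly one shard, chosen by the key's hash.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # Reads follow the same routing rule, so no other shard is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("cart:1001", {"items": 3})
store.put("cart:1002", {"items": 1})
print(store.get("cart:1001"))
```

Because each read or write touches a single shard, the work spreads across machines. The trade-off is that no single shard sees the whole data set.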
To Shard Or Not To Shard?
However, sharding isn’t always the best option anymore. By partitioning your data, you limit the scope of any transaction. Seth Proctor, CTO of NuoDB, explains that as you begin creating multiple data sets, you “lose global consistency and have disjointed services to manage, and storage points to maintain and backup.” Proctor concludes that “you definitely cannot run transactionally consistent analysis on the entire data set.” In today’s Big Data world, this is simply not acceptable anymore.
It can also be very complicated to manage sharding. You need to have a plan early to know how many shards you need, and how you’ll be partitioning your database. While there are solutions like Twitter’s Twemproxy and Compose for MongoDB, teams can often be forced into making architectural decisions before they are comfortable.
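One reason the shard count has to be planned early can be shown with a short Python sketch. It assumes the simplest possible routing rule, a hash of the key modulo the number of shards, which is only one of several schemes in use. The helper names shard_for and moved_fraction are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key with a stable hash modulo the shard count (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: list, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = [f"order:{i}" for i in range(1000)]
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys change shards when going from 4 to 5 shards")
```

Under this rule a key stays put only when its hash leaves the same remainder for both shard counts, so with a uniform hash roughly four keys in five would have to be migrated when a fifth shard is added. That data movement is the kind of cost that pushes teams to commit to a partitioning plan up front.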
Consequently, many database providers are moving away from the traditional sharding route and using smarter methods to achieve higher read/write speeds.
Clustrix slices data intelligently across clusters, with multiple copies stored for high availability. Using independent indexes allows slices to work as peers instead of separate databases, reducing both development complexity and latency. It’s like filing books away in sensible, topical groups, with duplicate copies in other relevant groups.
NuoDB uses a distributed architecture combined with in-memory peers to access and analyze data across multiple nodes using caches without increasing latency. As more power is required, more in-memory peers can be added, so resources are managed effectively. It’s like using a computer to track down the book you need, instead of relying on a librarian reading a catalogue.
To shard or not to shard? It’s not a question with a definitive answer yet, and companies are constantly coming up with more creative alternatives to sharding.
Reduce The Need For Scaling
If you’re considering a move to a different scalable database, you know it’s going to be a resource-intensive project. Besides the obvious infrastructure investment, there will be code rewrites and process updates, slowing down development while you migrate. In fact, the amount of time you’ve put into your traditional database might make moving altogether unfeasible.
In order to make the most of your existing solution, it’s also important to reduce the need for scaling overall. Strategies to manage transaction volume are often low-risk, low-resource projects that can be managed by a different team.
For example, SlashDB offers an API tool that turns SQL databases into HTTP resources. This allows teams to access their data in other formats such as JSON and XML. Focusing on API scalability will ensure that your traditional database sees a full and healthy life.
In other words, don’t just work to scale your database; reduce the need for scaling overall.
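As a rough illustration of what exposing a SQL table as an HTTP-style resource can look like, the Python sketch below resolves a path such as /products/2 to a row and returns it as JSON. It uses only the standard library's sqlite3 and json modules and is not SlashDB's API. The get_resource function, the ALLOWED_TABLES set and the products table are all hypothetical.

```python
import json
import sqlite3

# Only tables listed here may be exposed; this also keeps the table name
# out of reach of arbitrary user input (hypothetical allow-list).
ALLOWED_TABLES = {"products"}


def get_resource(conn: sqlite3.Connection, path: str):
    """Resolve a path like '/products/2' to a JSON string, or None if absent."""
    table, raw_id = path.strip("/").split("/")
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown resource: {table}")
    row = conn.execute(
        f"SELECT * FROM {table} WHERE id = ?", (int(raw_id),)
    ).fetchone()
    return json.dumps(dict(row)) if row is not None else None


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become mapping-like, so dict(row) works
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 'widget', 9.5)")
conn.execute("INSERT INTO products VALUES (2, 'gadget', 19.0)")

print(get_resource(conn, "/products/2"))
```

Once data is addressable this way, responses can be cached or served to other teams without every consumer opening its own connection to the database.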
Further reading: Five Ways To Scale Your API Without Touching Your Code
The Future Of Your Databases
Not every part of your business will feel the strain of database scaling equally. You might be able to use one solution for data that needs real-time analysis, and another for longer-term storage. Planning for long-term growth will help you balance those needs and choose good solutions early in your product’s lifetime. Don’t just plan for today’s growth, or you’ll find yourself stuck in the future!
Originally published at blog.100tb.com.