NoSQL migration: how will you know when to make a move?
By Patrick Callaghan, Vice President Enablement, DataStax
The spend on cloud services shows no sign of slowing down. Forrester has predicted that the revenue from public cloud infrastructure, platforms, and applications will continue to grow over the next few years up to 2022, reaching $411 billion worldwide. This spend is based on the promise that cloud can help you speed up your services, deliver more effective customer service and better support your aims as a business, compared to what you can achieve on your own.
Behind this is a big change in how companies think about their applications and their approach to designing software. Rather than building these services as traditional applications, companies are now thinking about how to use cloud services first. Whether you are at a new company that has a blank sheet of paper to work with, or an existing business that has taken cloud on over time, this “cloud-native” approach to IT can involve a significant mindset change.
As part of this, you have to change how you think about data. The sheer volume of data that you will have coming in means that you have to consider how to capture, store and manage it at scale. Equally, you have to address your requirements for availability, as customers expect any service to be operational around the clock with no downtime. Traditional relational databases, which typically grow by scaling up a single server, often struggle to meet these needs.
Moving to NoSQL
Instead, non-relational or NoSQL databases are a perfect fit for this use case. From their initial development to meet the needs of large internet companies, the NoSQL database category has expanded to fit the requirements of a larger number of enterprises and businesses that now face the same kinds of problems. However, there are a number of questions that you’ll have to answer if you want to make the move from a relational to a non-relational database, or to implement a NoSQL database for the first time.
For applications that have to run for a global audience — and today, many more companies can have customers around the world — having them distributed globally makes it easier to meet customer experience demands. Similarly, any database that will support this kind of distributed application will also have to be distributed. Keeping your data close to your application can reduce latency and ensure performance.
More application developers are using cloud-native services and software containers today to deliver the software they create. Containers provide a great way to scale particular services up and down based on demand levels, and tools like Kubernetes have emerged to make orchestrating these container-based applications easier. However, the back-end infrastructure side is often not considered here. For NoSQL databases, Kubernetes operators are now available that can make it easier to scale up the number of database nodes alongside any increase in demand for the application.
Planning ahead around data
Alongside this, CIOs are concerned about being locked into specific providers. This fear is a historic one — in the past, companies were effectively forced to stick with applications, operating systems or providers due to the cost of moving away. The cloud was supposed to be a way to break that dependency on internal IT, yet it also represents its own kind of lock-in. If you design and build your applications on elements that are only available on one cloud provider, then moving away requires a reworking of the application. This can rapidly become too expensive and too risky.
Planning ahead around data is one way to prevent lock-in. Rather than depending on any one cloud provider, running across multiple locations in a multi-cloud model, or across a hybrid mix of internal and external IT services, can ensure that you remain in control of your decisions. However, this does mean choosing a database that can run across multiple locations and service providers independently and in the same way.
This planning process can help with availability and resilience concerns too. If you aren’t dependent on any one location or service, then your application should be able to survive the loss of any one of those services. This is about more than any single node or server going down: your service should be able to carry on running despite the loss of an entire data centre or cloud service provider. The ubiquity of services like AWS or Microsoft Azure is great for businesses, but your availability planning should take their reliability into account, not depend on it.
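The resilience argument can be made concrete with some simple arithmetic. The sketch below is illustrative only: it assumes that each location fails independently of the others, which real outages do not always respect, and the availability figures are made-up inputs rather than numbers published by any provider. The function name combined_availability is hypothetical.

```python
from typing import Iterable


def combined_availability(availabilities: Iterable[float]) -> float:
    """Chance that at least one location is up.

    Assumption: locations fail independently of each other.
    Each input is a fraction between 0 and 1 (0.99 means 99%).
    """
    probability_all_down = 1.0
    for availability in availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # Multiply together the chance that each location is down.
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


# Illustrative inputs: a single location at 99%, then two such locations.
single = combined_availability([0.99])
double = combined_availability([0.99, 0.99])
print(f"one location:  {single:.4%}")
print(f"two locations: {double:.4%}")
```

Under these assumptions, a second independent location takes the service from roughly 99% to roughly 99.99% availability, provided the database can serve requests from whichever location remains.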
Being data independent means that it is simpler to migrate over time. Rather than taking a snapshot of a dataset and then having to pause your application to complete the cutover, you can copy that dataset over in the background and then synchronise the old and the new instances. Once the new dataset is in place and up to date, the old version can be decommissioned. For users, there is no pause or gap in service that they can perceive.
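The copy-then-synchronise approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any particular database or migration tool works: the Store class is a hypothetical in-memory stand-in for a database, and the migrate function and its writes_during_copy parameter are invented here to simulate traffic that arrives while the background copy is running.

```python
from typing import Dict, List, Tuple


class Store:
    """Hypothetical stand-in for a database: an in-memory key/value map."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}


def migrate(old: Store, new: Store,
            writes_during_copy: List[Tuple[str, str]]) -> Store:
    """Copy old into new without pausing writes, then return the store to use."""
    # Phase 1: take a point-in-time view of the old store for the background copy.
    snapshot = dict(old.rows)

    # The application keeps writing to the old store while the copy runs.
    # Each write is also recorded in a change log so it can be replayed later.
    change_log: List[Tuple[str, str]] = []
    for key, value in writes_during_copy:
        old.rows[key] = value
        change_log.append((key, value))

    # The background copy lands in the new store.
    new.rows.update(snapshot)

    # Phase 2: synchronise by replaying the writes the snapshot missed.
    for key, value in change_log:
        new.rows[key] = value

    # Phase 3: only cut over once both stores hold the same data.
    if new.rows != old.rows:
        raise RuntimeError("stores have diverged; do not cut over")
    return new


old_store = Store("old")
old_store.rows = {"customer:1": "Ada", "customer:2": "Grace"}
new_store = Store("new")

# Simulated traffic during the copy: one update and one new record.
active = migrate(old_store, new_store,
                 [("customer:2", "Grace H."), ("customer:3", "Edsger")])
print(active.name, active.rows)
```

In a real system the change log would typically be replaced by replication or dual writes handled by the database itself, but the order of steps is the same: copy in the background, catch up, verify, and only then decommission the old instance.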
What database is right for you?
There are multiple types of database and different ways to deliver them. Which one is right for you will depend on how much you value availability and service levels compared to management responsibility and overheads. For some use cases, simple implementations that can be supported by developers alone will be enough; for others, partition tolerance and availability will be prime concerns.
The key considerations should be how you will structure your data, how much downtime you can tolerate, and how many customers you will have to support over time. For applications with high numbers of concurrent users and no ability to accept downtime, Apache Cassandra can be an excellent option. It supports running across multiple locations and services independently, and it is available as a managed service, as a standalone implementation, or alongside other services like analytics and search.
Organisations of all kinds have to support more data and more performance demands. As applications get more distributed and dispersed across the world, the infrastructure side is getting more complex. Database services like Apache Cassandra are essential to support these new cloud-native applications, so they can deliver what customers expect.