Pushing Bits: Everything Under the Rug of Data Replication
One of the most challenging technical bits of designing a disaster recovery plan is moving the data. Unfortunately, it is all too easy (both mentally and verbally) to brush over the myriad complexities with just one word… “replication”.
Don’t get me wrong, there are a lot of important facets to getting DR right, from strategy to playbooks, but if you are going to have a useful and reliable recovery, your data moving strategy needs to be sound.
Don’t let your boat hit the mass of the iceberg under the surface; consider these five areas as part of your replication strategy.
Network Topology — While it seems obvious that you need to know where the plumbing goes, one nuance that is often overlooked is the distinction between the replication channel and the control channel.
- The control channel often does not need much bandwidth, but it does need administrative access to your SAN controller or replication software, which often requires OS or hypervisor admin privileges. This highly privileged access crosses departmental and functional boundaries, so take care to ensure that access to these control networks is very secure and that role-based access control is set up with only the permissions needed to manage the replication.
- The replication channel is where the heavy work is done. All the bits are going to flow over this channel, so it is critical that the replication channel be as simple and robust as it can be. In many cases the control channel can be interrupted without impacting data replication, but if the replication channel breaks, it can force resending or even re-checksumming of large sections of data. The replication channel will be carrying large, encrypted data blocks, which are easy to route but expensive to inspect and virtually impossible to compress. Consider performing security and optimization measures before the data is transmitted to get the most out of your bandwidth and your routing and switching gear. Also, verify that the network components on this channel can handle the full bandwidth and packet sizes of your replicated data.
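The effect of packet size on usable bandwidth can be sketched with some quick arithmetic. The header sizes below are illustrative assumptions (20 bytes IPv4 + 20 bytes TCP, plus a nominal IPsec tunnel-mode cost), not measurements from any particular stack:

```python
# Rough sketch: how fixed per-packet header overhead eats into the
# bandwidth available for actual replication payload.
# Header sizes are assumptions; real numbers vary by configuration.

TCP_IP_HEADERS = 40   # bytes: 20 IPv4 + 20 TCP (no options)
VPN_OVERHEAD = 57     # bytes: assumed IPsec ESP tunnel-mode cost

def goodput_fraction(payload_bytes: int) -> float:
    """Fraction of link bandwidth left for replication data."""
    total_on_wire = payload_bytes + TCP_IP_HEADERS + VPN_OVERHEAD
    return payload_bytes / total_on_wire

# Larger payloads amortize the fixed header cost over more data.
for payload in (256, 536, 1360, 8900):
    print(f"{payload:5d}-byte payload -> {goodput_fraction(payload):.1%} goodput")
```

Whatever the exact header sizes on your network, the shape of the curve is the same: small packets spend a large share of the pipe on headers, which is why tuning replication toward fewer, larger packets pays off.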
Security — The fact that data is being lifted and sent out of the datacenter is probably the reason the entire security budget exists. To keep it safe during this process, take a close look at the security policy that applies to the data being replicated and ensure that the way it is transmitted, landed, and handled is in line with your internal policy. Many security concerns can be addressed with a combination of VPN encryption in transit and encrypted storage at the target.
Rate of Change — How much data are you going to be moving? The answer is a multifaceted variable driven by the rate of change itself, data compression, encryption, and caching, to name a few factors. One way to get a rough estimate of the amount of data you will be moving is to look at the size of the nightly backups for the workloads you are protecting. This won’t capture data that changes state multiple times in a day, but it will get you started.
Bandwidth — Once you know roughly how much data you’ll be moving, calculating the amount of bandwidth needed is a math problem. A rough rule of thumb is that 1 Megabit per second moves about 10GB in 24 hours. So, with 1TB of daily change, you’d need to start thinking about more than a 100Mb pipe. But you don’t want to size the pipe that tightly. You need to be able to handle resyncs and change spikes, and you can really only count on 60%-70% of the pipe being usable; TCP and VPN overhead can eat up the rest, depending on your replication tools and packet sizes. You get better pipe utilization if your replication is tuned to send fewer, larger packets instead of lots of small ones.
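The sizing math above can be captured in a few lines. The 65% usable fraction and the 1.5x spike headroom below are illustrative assumptions drawn from the rules of thumb in this section; tune them to your own environment:

```python
# Quick bandwidth sizing sketch based on the rules of thumb above:
# 1 Mbps moves roughly 10 GB in 24 hours, and only ~60-70% of the
# pipe is usable once TCP and VPN overhead are accounted for.

def required_mbps(daily_change_gb: float,
                  usable_fraction: float = 0.65,
                  spike_headroom: float = 1.5) -> float:
    """Estimate the link size (Mbps) needed for a daily change set.

    daily_change_gb: data changed per day (e.g. from nightly backup sizes)
    usable_fraction: share of raw bandwidth actually usable (assumed 65%)
    spike_headroom:  multiplier for resyncs and change spikes (assumed 1.5x)
    """
    raw_mbps = daily_change_gb / 10   # 1 Mbps moves ~10 GB per day
    return raw_mbps / usable_fraction * spike_headroom

# 1 TB (1000 GB) of daily change works out to ~100 Mbps raw,
# but considerably more once overhead and headroom are applied.
print(f"{required_mbps(1000):.0f} Mbps")
```

Running this for 1TB of daily change lands well above the naive 100Mb figure, which is exactly the point: the raw rule of thumb is a floor, not a target.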
Transaction Consistency — Transaction consistency isn’t just about making sure the database comes up clean. If your app server has queued transactions that the DB server has already processed, it could end up running them again if the two servers are not in sync. Make sure to choose replication technology that understands how to keep multiple servers replicating in lock-step.
While designing your Disaster Recovery plans, it’s easy to sweep a lot of details under the rug of the word “replication”. Don’t underestimate the impact the topology, security, rate of change, bandwidth and transaction consistency needs can have on the initial and ongoing success of your strategy, and plan to take time for a deeper look into these details.
— Ben Miller is Product Solution Director at Bluelock, an industry-leading Disaster Recovery-as-a-Service provider for complex environments and sensitive data. An expert in cloud computing solutions with a background in technical consulting, managed IT services, and disaster recovery, Miller has more than 20 years of experience in solving cloud, virtualization, and enterprise system IT problems across industries. Miller holds an MS in Instructional Systems Technology from Indiana University Bloomington. Connect with Ben on Twitter at @vmFoo.