What RPO really means

During any disaster recovery (DR) negotiation, planning, or design, you constantly work with SLA terms such as RTO and RPO. They are common, but they are also extremely important notions that can simplify the search for the right method or tool to meet your DR requirements. These abbreviations stand for Recovery Time Objective and Recovery Point Objective, which you of course already know without my boring explanations. However, I have found that far from every specialist who takes part in DR negotiations understands what RPO actually means and how it works in practice.

In short, RPO is measured in time and indicates the portion of data you can tolerate losing if the main (protected) dataset fails. Most people think that if we set the RPO to, for example, 1 hour, the replication job will be scheduled to run once every hour. THIS IS NOT TRUE AT ALL! In reality, we will have more than one replication job per hour, provided, of course, that there is something to replicate, i.e. the dataset changes during that period. You may ask: WHY? OK, let me explain with an example.
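The arithmetic behind this claim is short enough to sketch before the worked example. The Python below is a minimal model under simplifying assumptions of my own: jobs start at a fixed interval, each job captures the data at its start time, every transfer takes the same fixed time, and jobs land at the recovery site in the order they started. The function names are hypothetical and do not come from any replication product.

```python
import math

def worst_case_point_age(interval_min: float, transfer_min: float) -> float:
    """Oldest the newest recovery point gets under a fixed-interval schedule.

    Job k captures the data at k * interval and lands at k * interval + transfer.
    Just before job k + 1 lands, the newest usable point is still job k's capture,
    so its age is (k + 1) * interval + transfer - k * interval = interval + transfer.
    """
    return interval_min + transfer_min

def max_job_interval(rpo_min: float, transfer_min: float) -> float:
    """Longest gap between job starts that keeps the worst-case age within the RPO."""
    if transfer_min >= rpo_min:
        raise ValueError("transfer time must be shorter than the RPO")
    return rpo_min - transfer_min

def jobs_per_hour(rpo_min: float, transfer_min: float) -> int:
    """Minimum number of job starts per hour implied by max_job_interval."""
    return math.ceil(60 / max_job_interval(rpo_min, transfer_min))

print(worst_case_point_age(60, 20))  # 80: an hourly schedule overshoots a 60-minute RPO
print(jobs_per_hour(60, 20))         # 2: at least one job every 40 minutes
```

With hourly jobs and an illustrative 20-minute transfer, the newest recovery point can be 80 minutes old just before the next job lands, so holding a 1-hour RPO requires starting a job at least every 40 minutes.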

The RPO is set to 1 hour. Our dataset, i.e. a collection of data blocks (or any other granular objects), changes all the time. Not fast, but these changes are very important to us, and our business will close if we cannot recover a state of the data that is less than 1 hour old. In every 1-hour window! For every data block! See? No? OK, read on.

Here is an illustration. Suppose a given data block was committed at 10:00 AM, and the replication job took 50 minutes to transfer this point to the recovery site. Thus, at 10:50 we have a recovery point reflecting the state as of 10:00, and it has only 10 minutes left before it becomes stale and no longer matches our RPO. So we must immediately run another replication job, at lightning speed, to transfer all changes made to that block during the last 50 minutes to the recovery site within the next 10 minutes. And no longer! If the second replication job takes longer than 10 minutes, for example 40 minutes, we will have an RPO violation between 11:00 and 11:30, because the first recovery point goes stale at 11:00 and the second recovery point arrives only at 11:30.
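This timeline can be checked mechanically. The Python sketch below uses hypothetical helper names of my own: it describes each replication job as a pair of times (when its data state was captured, and when that state became available at the recovery site) and reports every interval during which the newest available recovery point is older than the RPO. Times are minutes since midnight, and the observation window is assumed to run from the first job's completion up to `end_min`.

```python
def to_min(hhmm: str) -> int:
    """'10:50' -> minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def to_hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

def rpo_violations(jobs, rpo_min, end_min):
    """Return (start, end) intervals where the newest recovery point is older than rpo_min.

    jobs: (captured_min, landed_min) pairs, one per replication job.
    """
    events = sorted(jobs, key=lambda job: job[1])  # process jobs in landing order
    violations = []
    newest_point = None
    for i, (captured, landed) in enumerate(events):
        newest_point = captured if newest_point is None else max(newest_point, captured)
        # This recovery point is the newest one until the next job lands (or the window ends).
        seg_end = events[i + 1][1] if i + 1 < len(events) else end_min
        start = max(newest_point + rpo_min, landed)  # when it goes stale within this segment
        if start < seg_end:
            if violations and violations[-1][1] == start:
                violations[-1] = (violations[-1][0], seg_end)  # merge adjacent intervals
            else:
                violations.append((start, seg_end))
    return violations

# The example from the text: the second job takes 40 minutes instead of 10.
jobs = [(to_min("10:00"), to_min("10:50")), (to_min("10:50"), to_min("11:30"))]
for start, stop in rpo_violations(jobs, 60, to_min("11:30")):
    print(f"RPO violated from {to_hhmm(start)} to {to_hhmm(stop)}")  # 11:00 to 11:30
```

If the second job lands by 11:00 instead, for example `(to_min("10:50"), to_min("11:00"))`, the function returns an empty list.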

RPO timeline

So why do I need to know how often replication actually runs, if the replication algorithm handles this for me, you may ask? The answer, as usual, lies in preparing your design properly, and in this case in planning network saturation so that the channel bandwidth between the protected and recovery sites fully meets the replication requirements.
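The example above translates directly into a bandwidth requirement. The Python sketch below assumes, purely for illustration, that 3 GB of the dataset changes during the 50 minutes of the first transfer; that figure and the function names are hypothetical. It also ignores protocol overhead, compression, and deduplication, which change the real number in either direction.

```python
def catch_up_window_s(rpo_min: float, previous_transfer_min: float) -> float:
    """Seconds left to land the next recovery point before the previous one goes stale."""
    remaining = rpo_min - previous_transfer_min
    if remaining <= 0:
        raise ValueError("no time is left to land the next recovery point")
    return remaining * 60

def required_mbps(changed_bytes: float, window_s: float) -> float:
    """Bandwidth in Mbit/s needed to move changed_bytes within window_s seconds."""
    return changed_bytes * 8 / window_s / 1_000_000

changed = 3e9  # hypothetical: 3 GB changed during the 50-minute first transfer
average = required_mbps(changed, 50 * 60)                 # rate at which the changes accrued
peak = required_mbps(changed, catch_up_window_s(60, 50))  # must land in the remaining 10 minutes
print(f"average change rate: {average:.0f} Mbit/s, required burst: {peak:.0f} Mbit/s")
# average change rate: 8 Mbit/s, required burst: 40 Mbit/s
```

In this illustration the link must sustain five times the average change rate during the catch-up window, which is exactly the kind of peak that network saturation planning has to account for.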

BTW, ask your vendor's representative to explain exactly how their replication tool handles RPO in your environment.

Follow Igor Nemylostyvyi on Twitter