Aug 31, 2017

A Beginner’s Guide to RPO and RTO

HorizonIQ

Data Centers | Disaster Recovery

I’m constantly reminding my wife to hit ‘Ctrl + S’ in Microsoft Word as she’s writing papers or lesson plans for her classroom. She’s a first-grade teacher, and I love her dearly. But why do I harp on this subject? Because too many times she’s been a page or more into her writing and the application will lock up or crash, or her computer will shut down. Oops… All that work has to be redone. She has to restart her PC, load up the Word app, and get started again, losing 10 minutes for the restart process and another 20-30 minutes just to regain her thoughts on where she left off.

Why am I telling you about my wife’s questionable habits when it comes to saving her work at reasonable intervals? Well, it’s because I work in information technology, specifically within the managed cloud hosting and infrastructure space. The scenario I describe above is a simple way to understand some commonly misunderstood terms—RPO and RTO. That is, ‘Recovery Point Objective’ and ‘Recovery Time Objective’, respectively.

What is your Recovery Point Objective?

Imagine you’re writing a document in Microsoft Word. After each short paragraph, you hit ‘Ctrl+S’, the shortcut for saving your work. You can also mouse up to the floppy disk icon and click, but nobody has time for that. Either way, each time you save, you are effectively establishing a recovery point for that particular document.

In other words, if you hit Ctrl+S at 1:43 PM and your PC crashes at 1:57 PM, the few paragraphs you typed in between are gone; you will only be able to recover your work up to 1:43 PM. So when you restart your computer, boot up Microsoft Word, and reopen your document, you will see the last thing you typed up to 1:43 PM. That’s your recovery point. Your Recovery Point Objective is the most work you are willing to lose between saves. Personally, I set an RPO of a few sentences: after writing 2-3 sentences, I hit Ctrl+S.
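To make the arithmetic concrete, here is a minimal Python sketch of the 1:43 PM save and 1:57 PM crash described above. The function names `work_lost` and `meets_rpo`, the calendar date, and the 5-minute objective are assumptions made up for illustration; they are not part of any backup product.

```python
from datetime import datetime, timedelta


def work_lost(last_save: datetime, failure: datetime) -> timedelta:
    """Everything typed after the last save is lost when the failure hits."""
    return failure - last_save


def meets_rpo(last_save: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the lost work stays within the Recovery Point Objective."""
    return work_lost(last_save, failure) <= rpo


# The example from the text: saved at 1:43 PM, crashed at 1:57 PM.
# The date itself is arbitrary.
last_save = datetime(2017, 8, 31, 13, 43)
crash = datetime(2017, 8, 31, 13, 57)

lost = work_lost(last_save, crash)
print(lost)  # 0:14:00

# Hypothetical objective: lose no more than 5 minutes of writing.
print(meets_rpo(last_save, crash, timedelta(minutes=5)))  # False
```

With a 5-minute objective, a 14-minute gap between saves misses the target, so the fix is to save (or back up) more often.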

What is your Recovery Time Objective?

We covered Recovery Point Objective, so now let’s look at the other three-letter term: RTO, or Recovery Time Objective. Again using the Word document scenario above, we’ll focus on how long it takes to get back to work once you’ve experienced a failure on your computer or within the application you’re working in. In my wife’s case, it’s not the end of the world if it takes her 30 minutes to get back to work. That time goes to restarting the computer, booting up the application, and getting her thoughts focused again on the work at hand: finishing the document. Frustrating and annoying, yes, but no major impact if it happens.

Imagine, though, if she were working under an incredibly tight timeline and every minute was critical to her success. That 30 minutes could make or break her completing an assignment on time and submitting it to her boss. This more serious situation calls for a continuity plan that shrinks that 30 minutes down to, say, 5 minutes. If you don’t have the right capabilities or tools in place, however, this may be challenging to solve. The tolerance for ‘time not at work’ in this example is the equivalent of your Recovery Time Objective. Can it be an hour, 30 minutes, 2 minutes? That’s for you to decide based on the criticality of the work you are doing and the time constraints you are working under.
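The same idea can be sketched in a few lines of Python: add up the steps it takes to get back to work and compare the total to the objective. The step names and the 10-minute and 20-minute durations are assumptions based on the Word example (about 10 minutes to restart, then roughly 20 minutes to refocus), and `total_recovery_time` and `meets_rto` are hypothetical helper names.

```python
from datetime import timedelta
from typing import Mapping


def total_recovery_time(steps: Mapping[str, timedelta]) -> timedelta:
    """Add up every step between the failure and being productive again."""
    return sum(steps.values(), timedelta())


def meets_rto(steps: Mapping[str, timedelta], rto: timedelta) -> bool:
    """True if the full recovery fits inside the Recovery Time Objective."""
    return total_recovery_time(steps) <= rto


# Illustrative durations only, loosely based on the Word example.
recovery_steps = {
    "restart PC and reopen Word": timedelta(minutes=10),
    "regain train of thought": timedelta(minutes=20),
}

print(total_recovery_time(recovery_steps))  # 0:30:00
print(meets_rto(recovery_steps, timedelta(hours=1)))    # True
print(meets_rto(recovery_steps, timedelta(minutes=5)))  # False
```

The 30-minute recovery is fine against a relaxed one-hour objective but fails a 5-minute one, which is exactly when better tools or a continuity plan become necessary.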

How do RPO and RTO impact my business?

Hopefully, you now have a better understanding of the terminology and concept of Recovery Point and Recovery Time Objectives. Let’s take it a step further and draw a direct parallel to the businesses we work in.

Backups

You probably already have a good understanding of what backups are, but as a refresher, I’ll give a quick explanation. Simply put, backup refers to the copying of physical or virtual files or databases to a secondary location or site for preservation in case of equipment failure or other catastrophe. Using the Word document analogy from earlier in the article, a backup would be a second copy of the document saved on another PC, in a Google Doc, or somewhere similar. It’s someplace else you can go to get the document should you lose it completely from your primary source.

For businesses, backups could consist of a wide variety of data: customer records, billing information and financial records, patient data, web files, intellectual property such as application code, and so on. In the unfortunate event of a system or application failure, you never want to be in the position of not having a backup plan in place and ready to execute when the time calls for it.

There are a number of ways to achieve a sound backup strategy in the B2B world. We promote Veeam’s 3-2-1 rule, which consists of three parts:

Have at least three copies of your data.
Store the copies on two different media.
Keep one backup copy offsite.

At a basic level, it’s important to think about how you are currently backing up your data and what gaps you may need to fill to meet the 3-2-1 rule above.
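One way to look for those gaps is to write down every copy of a data set and check it against the three parts of the rule. The sketch below is only an illustration: the `BackupCopy` class, its fields, and the example inventory are hypothetical, and it assumes the production copy counts as one of the three copies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # stored away from the primary site?


def satisfies_3_2_1(copies: Sequence[BackupCopy]) -> bool:
    """Check an inventory of copies against the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                   # at least three copies
    enough_media = len({c.media for c in copies}) >= 2  # two different media
    has_offsite = any(c.offsite for c in copies)        # one copy offsite
    return enough_copies and enough_media and has_offsite


# Hypothetical inventory; the production copy is counted as copy one.
copies = [
    BackupCopy("production server", media="disk", offsite=False),
    BackupCopy("local backup appliance", media="tape", offsite=False),
    BackupCopy("cloud repository", media="cloud", offsite=True),
]

print(satisfies_3_2_1(copies))      # True
print(satisfies_3_2_1(copies[:2]))  # False: two copies, none offsite
```

Any part of the check that fails points directly at a gap to fill, such as adding an offsite copy.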

Disaster Recovery

The ultimate in business continuity planning is having a thoughtful Disaster Recovery plan in place. It gives you an answer when stakeholders ask: “What happens in the worst-case scenario?” That scenario could be a natural disaster, catastrophic hardware failure, malicious activity from an employee, a software bug or failure, and so on. How long will it take your business to be operational again when disaster strikes? Do you currently have an answer to that question? If not, then read on.

With backups, the good news is that your data is retrievable. It’s somewhere you can go get it, and it’s in good shape to get you to an operational state again, at some point. But just because you have backups doesn’t mean that you have the capabilities or delivery mechanisms in place to actually do anything with them. Let’s use a catastrophic hardware failure as an example.

Your servers were so old that parts failed and you simply can’t get them to work again. In this scenario, you would need secondary servers to connect to your backup system in order to retrieve that data and begin allowing access to your users again. Now, if you don’t already have those secondary servers on hand, it may take you weeks, if not months, just to get the hardware to your site. Then you still have to do all the setup in order for them to run and function the way you need them to. You could be inoperable for months. There are not many businesses, if any, that can afford to be in that situation.

A much better scenario is one where you have a plan in place that would enable you to quickly restore data and services and maintain operations with little to no downtime. Having a disaster recovery solution ready to go at the time of disaster keeps the risk of downtime to a minimum. Think about the question we asked earlier. “What happens in the worst-case scenario?”

What if you could answer that by saying, “Our worst-case scenario is mitigated by our ability to replicate our critical data and applications to our secondary site. In the event of a disaster, our maximum downtime would only be 1 hour because we have the systems and technology in place to seamlessly fail over and continue operations”?

Now that’s an answer your boss would love to hear!

If you want to learn more about how you can have the right answers to these important business and technology questions, contact us and one of our specialists would be delighted to help.