Feb 19, 2019

Business Continuity and Disaster Recovery Basics: Making a Plan

Paul Painter, Director, Solutions Engineering

A friend of mine lives on a high prairie ranch surrounded by highly flammable Gambel oak.

Besides trimming the brush to create a defensive perimeter around his house, he has a bag prepared with important, difficult-to-reproduce documents, such as birth certificates and passports. In the event of a wildfire, his plan is to grab the bag, his cell phone and the backup drive for his computer, then evacuate in his car. He’s willing to let everything else be “toast,” becoming part of his claim to the insurance company.

It’s impossible to clear all valuable items from his home, so he has selectively planned to save what is the most important. Whether we’re talking about your personal belongings or your enterprise’s most valuable assets, having a plan like this is the first critical step toward an effective disaster recovery posture.

Disaster Recovery Planning: Determining What’s Essential

A big part of business continuity and disaster recovery planning is just deciding what to recover. Certainly, it’s “simpler” to just plan to recover all operations for all applications, regardless of any individual application’s level of importance to business continuity. But providing resources—financial or otherwise—to recover everything is not feasible for every organization, nor is it necessarily a wise use of limited time, money and staff.

Some compliance standards dictate what must be protected, and organizations may be audited to prove that the data is in fact protected. This can make it easy to determine what to protect, since in many ways, compliance regulations already decide for you.

Most businesses, however, just need to restore operations in order to remain viable after a disaster. In this way, what’s saved is usually a direct function of what makes the company money. Let’s walk through a couple of hypothetical examples.

Business Continuity Case No. 1

A COO originally placed priority on preserving his organization’s accounting systems; he wanted to ensure that his business was still able to bill for services even after a disaster.

However, he rethought his strategy to focus on revenue-generating systems: He opted to replicate operational systems that supported customers—the systems that actually contributed to his company’s bottom line. Accounting systems were still next up on the list of priorities, but the COO opted to use cloud backups, with a plan to rebuild systems within a month of disaster—quick enough to make the next invoicing period.

In other words, he prioritized making money with the next priority awarded to billing systems, while still protecting the critical data in those systems.

Business Continuity Case No. 2

An organization’s human resource records were preserved in offsite cloud backups, but it didn’t have a concrete plan for restoring them after a disaster event. At the time, HR systems were still run on local servers. (Most HR support systems have since moved to hosted applications, so this is less of an issue today.)

Regardless, a disaster-recovery-as-a-service (DRaaS) offering would quickly solve these issues by providing straightforward redundancy for critical systems.

Related to HR were the servers that fed training content to employees, which were relegated to the “toast” category. This doesn’t mean you lose them forever though: Using a service like cloud backups will preserve the training data until a plan can be formulated to restore the servers.

But the important part of the decision-making process is determining what needs to be online right away and what can be restored at a later time. I always ask prospective customers about their priorities.

As a third party, I find it useful to try to understand the decision-making process and whether there might be biases or gaps. But there should always be internal stakeholders asking the same questions before disaster strikes.

Recovery operations can be chaotic enough without having to triage applications’ importance on the fly. Having a prioritized list of applications ready to go is invaluable.
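
As a rough illustration, a prioritized list like this can be kept as simple structured data. The sketch below is only one possible shape: the tier names, the `App` fields and the example applications are assumptions loosely modeled on the two cases above, not a standard.

```python
from dataclasses import dataclass

# Recovery tiers, loosely modeled on the two cases above. The tier names
# and the example applications are illustrative assumptions.
TIER_ORDER = {"replicate": 0, "backup_and_rebuild": 1, "restore_later": 2}


@dataclass(frozen=True)
class App:
    name: str
    tier: str                # one of the keys in TIER_ORDER
    generates_revenue: bool


def recovery_order(apps):
    """Sort by tier; within a tier, revenue-generating systems come first."""
    return sorted(
        apps,
        key=lambda a: (TIER_ORDER[a.tier], not a.generates_revenue, a.name),
    )


apps = [
    App("training-content", "restore_later", False),
    App("accounting", "backup_and_rebuild", False),
    App("customer-operations", "replicate", True),
    App("hr-records", "backup_and_rebuild", False),
]

for rank, app in enumerate(recovery_order(apps), start=1):
    print(rank, app.name, app.tier)
```

Keeping the list in a form that can be sorted and printed makes it easy to review with stakeholders and to hand to IT staff as a checklist.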

How Should I Use Risk Impact Analysis to Make the Most of Limited Resources?

Beyond the monetary cost of deciding what to save, there is the cost in effort and staff time. As poet Robert Burns wrote in 1786, the “best laid plans of mice and men often go astray.”

Still, it’s best to plan out as much as possible and triage applications ahead of a disaster. This way, your IT staff already know where to spend effort and what to put on the back burner when disaster strikes.

A risk impact analysis helps you think through the probability of disaster and how much any given event would impact your business, allowing you to find a balance between your tolerance for risk and what is actually likely to happen.

For example, a server being corrupted during patch operations is high-probability but might only have a medium impact on customers, unless it’s an extended outage. On the other hand, a long-duration regional power outage affecting the entire eastern interconnection grid is far less likely but would have an enormous impact.

Risk impact analyses usually include:

  • potential threats
  • probability of threat occurring
  • human impact
  • property impact
  • business impact

The spreadsheet used for a risk assessment can be made more elaborate by adding up the three impact scores (each rated 1-3) and then multiplying the sum by the probability (also 1-3) to produce an overall score (here, the max score is 27). See below for an example of a fictitious firm located in St. Louis, Missouri:

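Threat                              | Probability | Human Impact | Property Impact | Business Impact | Total Score
------------------------------------|-------------|--------------|-----------------|-----------------|------------
Key server failure                  | 3           | 1            | 1               | 3               | 15
Hurricane striking St. Louis        | 1           | 3            | 3               | 3               | 9
Earthquake striking St. Louis       | 3           | 3            | 3               | 3               | 27
Loss of manufacturing space to fire | 2           | 3            | 3               | 3               | 18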

In the above example, a key server failure is something nearly guaranteed to happen and would have a severe impact on the business. A hurricane striking St. Louis is highly unlikely and garners a low overall score.

However, because St. Louis is near the New Madrid Fault, it’s at risk of a catastrophic earthquake affecting personnel, property and the business and thus scores very high.

To show how much this can vary from location to location, the following is based on another fictitious firm in Cheyenne, Wyoming:

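Threat                              | Probability | Human Impact | Property Impact | Business Impact | Total Score
------------------------------------|-------------|--------------|-----------------|-----------------|------------
Key server failure                  | 3           | 1            | 1               | 3               | 15
Hurricane striking Cheyenne         | 1           | 3            | 3               | 3               | 9
Earthquake striking Cheyenne        | 1           | 3            | 3               | 3               | 9
Loss of manufacturing space to fire | 2           | 3            | 3               | 3               | 18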

The key server failure keeps the same score, and Cheyenne is probably as unlikely as St. Louis to experience a hurricane. However, Cheyenne has a less than 5 percent chance of experiencing any seismic activity[1] and therefore has a very low probability of an earthquake. This business may thus want to prioritize not only mitigating server failures but also planning for fire within the manufacturing space.
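
The scoring described above (sum the three impact scores, then multiply by probability) is easy to reproduce outside a spreadsheet. The following is a minimal sketch using the Cheyenne figures from the table; the function and variable names are my own and purely illustrative.

```python
def total_score(probability, human, prop, business):
    """Sum the three impact scores (1-3 each), then multiply by probability (1-3)."""
    return (human + prop + business) * probability


# (threat, probability, human, property, business), copied from the Cheyenne table.
cheyenne = [
    ("Key server failure", 3, 1, 1, 3),
    ("Hurricane striking Cheyenne", 1, 3, 3, 3),
    ("Earthquake striking Cheyenne", 1, 3, 3, 3),
    ("Loss of manufacturing space to fire", 2, 3, 3, 3),
]

# Rank threats from highest to lowest overall score.
ranked = sorted(cheyenne, key=lambda row: total_score(*row[1:]), reverse=True)
for threat, *scores in ranked:
    print(f"{total_score(*scores):>2}  {threat}")
```

Sorting by score puts fire (18) and key server failure (15) at the top, which matches the priorities suggested for this firm.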

How to Build Your Business Continuity and Disaster Recovery Plan: Exercises and Templates

I’m a big fan of checklists because I think they’re easier to follow than wordy, step-by-step instructions. (I also buy Atul Gawande’s The Checklist Manifesto for all of my direct reports.) And my friend agrees: If a wildfire is threatening his ranch, he can go down his list quickly and get out.

Similarly, creating a simple five- to 10-page plan that is fast, readable, testable and executable will be far more useful than a complex or highly detailed plan that is hard to follow. And once it’s been written, it should be tested, whether formally or informally, and certainly before it’s needed.

You can get started by downloading our Business Impact Analysis Template.

I also recommend defining a communications plan. This might include:

  • A phone tree of key business leaders
  • A preplanned location to meet at time of disaster (e.g., at another business, a hotel conference room, the CEO’s house, etc.)
  • Audio and/or web bridge information for internal discussions
  • An external plan for how to communicate to customers (e.g., using a Twitter feed, a status website with up-to-the-minute information, etc.)

The value of a thoughtful business continuity and disaster recovery plan cannot be overstated. Just as you can’t clear the house of all your belongings in the middle of a wildfire, you can’t save every system when catastrophe strikes, so it’s important to determine in advance which systems and resources must be saved.

[1] James C. Case, Rachel N. Toner, and Robert Kirkwood, Wyoming State Geological Survey, Basic Seismological Characterization for Laramie County, Wyoming, September 2002.

Jul 12, 2018

Business Continuity Options for Colocation

INAP

Backup and Disaster Recovery services are probably unique among IT services, in that you have them in the hopes of never needing them. But in an age of sophisticated hackers, increasingly destructive natural disasters and the ever-present risk of human error, the question is not one of if but when you will need business continuity services.

We’ve written about taking a multiplatform infrastructure approach to colocation and cloud and how it’s often the best fit when designing your IT infrastructure from an application-first perspective. But even if you decide to take an all-colo approach to your production deployment, using cloud-based Business Continuity services is still an option you should consider for protecting your critical workloads and data, especially when you can simply add them on to your existing services without having to go to another service provider.

Where to start?

As with any Business Continuity project, you need to first determine your recovery goals. Ask: Do my applications need a zero-downtime solution, or can we tolerate several hours of downtime? Establishing a baseline Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for critical business systems will create a solid framework that will guide your decisions.
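
To make these two targets concrete, here is a small sketch that checks a backup schedule against an RPO and an estimated restore time against an RTO. It assumes periodic backups, where the worst-case data loss is roughly one backup interval, and it ignores the time a backup takes to complete; all of the figures are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss with periodic backups is roughly one full interval."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """A recovery method fits only if restoring takes no longer than the RTO."""
    return estimated_restore_time <= rto


# Hypothetical targets for one critical system.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

# Nightly backups can lose up to a day of data, so they miss a one-hour RPO.
print(meets_rpo(timedelta(hours=24), rpo))     # False
# Replication every 15 minutes stays within it.
print(meets_rpo(timedelta(minutes=15), rpo))   # True
# A restore estimated at two hours fits a four-hour RTO.
print(meets_rto(timedelta(hours=2), rto))      # True
```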

Once recovery goals are established, you then need to look at your production workloads. Are you running your critical systems on physical hardware or virtualized infrastructure? Does your recovery solution support physical, virtual or multiplatform environments? Here, a trusted service provider can also come in handy to make sure you can have a business continuity solution that’s flexible enough for your infrastructure and your needs.

Physical

Protecting workloads on physical infrastructure can be challenging, since backup and recovery solutions can only focus on protection at the operating system level, which is bound to the specific underlying hardware. This means that copying the OS to different hardware can cause problems. Should a custom backup and disaster recovery solution be needed, look for a service provider that has additional capabilities to build out application-specific recovery options. This might take the form of colocating identical storage arrays for array-based replication by using the data replication feature built into many mainstream storage appliances. Or it might be building out custom bare metal infrastructure for Active-Active replication of application data.

Virtual

The benefits of virtualization are obvious for your applications: Instead of having just one application per server, you can run several guest operating systems and a handful of applications with the same physical hardware. In this way, virtualization offers unprecedented ability to scale and distribute workloads across your infrastructure.

These benefits extend to backups and disaster recovery as well because virtualization allows critical VM data to be restored or replicated to another location completely independently of the underlying hardware. In addition to VM backups powered by companies like Veeam, R1Soft and Commvault, there are other disaster recovery options that your provider may offer as a service. INAP offers Standby DRaaS and Dedicated DRaaS, which protect your critical VM data either in a pay-as-you-go standby state or with dedicated cloud resources for organizations with the strictest business continuity needs.

Multiplatform Infrastructure

In an ideal world, all workloads would fit in one bucket or the other. But for most organizations, a multiplatform approach will be the best way to achieve the operational and financial goals of a business continuity implementation. For example, a company might have virtualized most of its critical infrastructure but still have a legacy inventory system that needs to stay on physical servers because of technology limitations. A service provider like INAP can provide the virtual and physical infrastructure, as well as the management services needed, all packaged into a multiplatform Disaster Recovery environment. This is why it’s important to work with a service provider that has a range of options and expertise across infrastructure solutions, whether colocation, bare metal or private cloud deployments.

Mar 27, 2013

The business impact of downtime

INAP

When hearing the phrases “disaster recovery” or “disaster preparedness,” many people immediately think of how they can prevent or mitigate the downtime caused by an extreme weather event – such as a hurricane, earthquake, flood or tornado.

But the reality is that these phenomena are rare, and the much more common causes of downtime are events like power outages, IT failures, and human error. Gartner projects that “through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues.”

Even major global brands are not immune to man-made disruptions, as seen recently in these well-publicized outages: Facebook went down for 2 hours last May and saw its stock price drop nearly 6% the next day; the entire GoDaddy hosting network failed last September, causing millions of sites and emails to stop working; and, of course, the infamous Netflix outage last Christmas Eve which set off a firestorm of angry tweets from subscribers.

While a disruption to your business may not receive quite the same media coverage as the above, the impact can still be significant and, in some cases, disastrous.

For example, ask yourself the following:

  • If your website or payment processing system went down, how much revenue would you lose?
  • How much revenue would be lost if your employees couldn’t work at their full capacity because a critical system was unavailable?
  • What would be the impact if your company was unable to comply with a regulatory audit because of system unavailability or data loss?
  • What if a system outage meant you couldn’t meet service-level agreements (SLAs) to your customers or partners?
  • How would your company’s reputation be affected if your critical IT systems were unavailable for more than a few hours?

Once you start quantifying the potential financial impact of downtime, it should be pretty clear why having a disaster recovery plan is so important.
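
As a starting point, the first two questions above can be turned into simple arithmetic. The sketch below is a rough estimate only: the formula, the parameter names and every figure in the example are assumptions for illustration, and it leaves out harder-to-quantify costs such as SLA penalties, audit failures and reputational damage.

```python
def downtime_cost(hours, revenue_per_hour, affected_employees,
                  loaded_hourly_rate, productivity_loss):
    """Estimate lost revenue plus lost productivity for a single outage.

    productivity_loss is the fraction of work that cannot get done
    (0.0 to 1.0) while the system is unavailable.
    """
    lost_revenue = hours * revenue_per_hour
    lost_productivity = (
        hours * affected_employees * loaded_hourly_rate * productivity_loss
    )
    return lost_revenue + lost_productivity


# Hypothetical four-hour outage of a payment system.
cost = downtime_cost(
    hours=4,
    revenue_per_hour=10_000,
    affected_employees=50,
    loaded_hourly_rate=60,
    productivity_loss=0.5,
)
print(f"Estimated cost: ${cost:,.0f}")
```

With these made-up inputs, lost revenue is $40,000 and lost productivity is $6,000, for an estimate of $46,000.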

For a deeper dive on the impact of downtime and guidance on disaster recovery delivery models, attend our April 25th Disaster Recovery webcast featuring guest presenter Rachel Dines, Senior Analyst, Forrester Research, Inc.

Mar 14, 2013

Disaster preparedness: recovery vs. prevention

Ansley Kilgore

Many IT and operations professionals focus on establishing processes and procedures to get systems back up and running after a disruption, but it’s also important to have the right IT Infrastructure in place before disaster strikes. As part of your 2013 IT strategic planning, disaster recovery and prevention capabilities should always be one of the factors you evaluate.

Fortunately, it’s easier to mitigate disruptions when you have the right foundation for your infrastructure. Data centers, colocation services and cloud hosting are designed with business continuity in mind, plus you get the added benefit of improving internet performance. Let’s look at three elements of disaster-resistant design and infrastructure that can help you prepare for, and sometimes prevent, disruptions.

Redundant power circuits
Internap colocation facilities help reduce the likelihood of power outages by providing a second circuit path. Having redundancy options and backup power systems in place can help prevent a disruption before it begins. When evaluating providers, make sure their redundant network devices don’t connect to the same patch panel, Uninterruptible Power Supply (UPS) system, breaker or other infrastructure.
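
One simple way to run that check is to list the components each redundant path depends on and look for overlap. The sketch below assumes you can obtain such an inventory from the provider; the component names are invented for illustration.

```python
def shared_components(path_a, path_b):
    """Return the components that both redundant paths depend on."""
    return set(path_a) & set(path_b)


# Hypothetical inventory for two supposedly independent power paths.
primary = ["breaker-1", "ups-a", "patch-panel-3"]
secondary = ["breaker-2", "ups-a", "patch-panel-7"]

overlap = shared_components(primary, secondary)
if overlap:
    print("Shared single points of failure:", sorted(overlap))
else:
    print("No shared components found.")
```

Here both paths rely on `ups-a`, so a failure of that one UPS would take down both circuits despite the apparent redundancy.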

Routing Control
Whether you experience a major outage due to a natural disaster, or are simply having internet performance problems in your local area, our Managed Internet Route Optimizer™ (MIRO) will dynamically seek out the fastest route for optimal internet speed. This results in minimal impact on your business operations, even if your main internet provider goes down.

State-of-the-art fire prevention
Our data centers are equipped with the most advanced fire detection and control technology. We also have strict rules in place to prevent dangerous situations such as power surges from becoming a larger problem.

Don’t overlook the importance of disaster recovery during your 2013 IT strategic planning – put the right preventative measures in place before something unexpected happens. If you’re making decisions on new technologies or services that affect your infrastructure, be sure to evaluate their disaster recovery capabilities. Building the right IT foundation can help you prevent disruptions and avoid lost revenue, waning customer confidence and costly maintenance. The ability to recover your data and maintain business continuity after a disaster is critical to the success of your business.

At Internap, we go to great lengths to mitigate disasters. To learn more about maintaining business continuity, check out our ebook, Data Center Disaster Preparedness: Six Assurances You Should Look for in a Data Center Provider.
