The Strategic Guide to Disaster Recovery and DRaaS
Achieving IT Resiliency Through Disaster Recovery Planning
A disaster can strike anywhere at any time. When it comes to your IT systems, disasters include storms, floods, and earthquakes, of course, but also human error and cyber-attacks. Any of these can significantly affect operations, so the survival of your business depends on your disaster recovery plan.
Disaster recovery (DR) is the process of returning an organization’s mission-critical IT systems to a functional state. A well-tested disaster recovery plan lets an organization continue serving its customers, partners, and employees during unplanned downtime, and can greatly reduce data loss.
Disaster Recovery as a Service (DRaaS) is an increasingly popular option for handling disaster recovery. The DRaaS market is expected to post a compound annual growth rate (CAGR) of close to 36% between 2018 and 2022. DRaaS uses orchestration technologies to automate replication and recovery for better protection and manageability.
Fortunately, with cloud computing and DRaaS, disaster recovery becomes cost-effective for organizations of any size. A good DRaaS partner helps clients evaluate their downtime risks and system dependencies and design a disaster recovery program before disaster strikes.
In this guide, you’ll learn about disaster recovery and why you need a DR plan, as well as how to write a disaster recovery plan. You’ll also learn the meaning of certain key terms, such as RPO and RTO, and discover tips on how to choose a DRaaS provider. Let’s get started.
Businesses are investing more in disaster recovery and business continuity because of the need to keep services available 24×7, meet compliance requirements, and mitigate downtime costs. (Source: Forrester’s Always-On, Always-Available Digital Enterprise)
A disaster recovery plan focuses on IT systems and is just one component of a business continuity (BC) plan. A business continuity plan covers how the business will protect its employees, minimize losses, and continue to serve its customers under adverse conditions. In comparison, a disaster recovery plan only covers how the business will protect the IT systems that the business depends upon.
4 questions to answer in a business continuity plan
Start with four fundamental questions:
- How much downtime can your business afford?
- What applications does your business need to function?
- How current does the data need to be for each application?
- How long can the business function without each application?
Even if you are not in tornado alley or a flood zone, your organization needs to prepare for the hazards it is likely to face.
Business leaders are becoming less confident in their abilities to recover from a disaster. Almost one in five (18%) said they “had concerns” about or were “not confident at all” in their disaster recovery plan. (Source: TalkingPoint | A TierPoint Blog)
Reason #1: Downtime costs are high
Downtime is a dirty word for CIOs. When applications and data are inaccessible, employees can’t do their jobs, transactions don’t go through, and business revenue comes to a halt.
Downtime also affects your customers and partners, who expect your IT systems to be always available; any downtime can drive them to a competitor. Complying with government regulations also puts pressure on IT to maintain an always-on, always-secure infrastructure.
The costs of downtime are incurred in many ways, including claims by customers and partners, regulatory penalties and legal expenses, and executives’ time. Intangible costs can include loss of customer goodwill and company reputation.
Disaster recovery planning and testing can reduce your risk of downtime.
Reason #2: Cyber-threats are on the rise
Cybercrime is the fastest-growing cause of data center outages, and cybercrime-related outages have doubled since 2016. According to Ponemon’s 2019 Cost of a Data Breach report, the average global data breach costs $3.92 million. In the United States, it’s an average of $8.19 million per breach.
While an unprepared organization may face weeks of downtime from a hurricane, cyber-threats such as ransomware attacks can cause significant downtime as well. A ransomware attack is designed to gain access to an organization’s data and files and encrypt them. The data can’t be decrypted without a private key, which is usually stored on the attacker’s server and withheld until the ransom is paid.
DRaaS supports timely patch management to improve cybersecurity and enable faster recovery of damaged files.
Reason #3: A previous disaster or threat caught your organization by surprise
IT professionals who have experienced a downtime event say their top recovery challenges include:
- Business-side expectations that didn’t match actual IT capabilities
- Insufficient testing of the disaster recovery plan
- Lack of staff
- Out-of-date or inadequate plans
- Lack of communication between IT and business
Testing a disaster recovery plan annually keeps it up to date and helps prove it will work when the business needs it.
Reason #4: You need to stay compliant
HIPAA, PCI DSS, and other regulations aren’t suspended when a disaster strikes. If your business is regulated or working with confidential customer information, you need to maintain compliance, even when things get messy. This includes records retention rules, which have been part of regulatory compliance since before computers were commonplace in business.
A disaster recovery plan can help your business stay compliant when things go wrong.
Every business is unique, so to be effective, your disaster recovery plan needs to be closely aligned with the needs of your business.
Start building an effective disaster recovery plan by acquiring a deep understanding of how the continuity of your business depends on your IT environments.
To write your disaster recovery plan, start by analyzing your business and understanding how a disruption will affect it. Identify which systems, applications, and data are needed and prioritize how vital each is. These insights will inform the replication frequency and retention time for backups that you will set.
Make a DR plan that overcomes data center outages large and small.
Disasters are not only natural events; cyber-attacks and human error also cause significant IT downtime that can harm business continuity. How will the business respond to, and recover from, a natural disaster, cyber-threat, or rogue employee?
In many cases, DRaaS uses the same tools as cloud migration, so having these tools in place can increase the agility of your business and IT. Consider the business’s long-term goals for the cloud in your disaster recovery plan.
The 3 components of a disaster recovery plan
Your plan for disaster recovery should prepare you to:
1. Prevent a disaster, whether man-made or natural, from affecting your IT systems
2. Keep your IT systems and applications running, or restore them quickly when downtime strikes
3. Preserve and protect your business’s mission-critical data
A 10-point disaster recovery plan checklist
To build a better disaster recovery plan:
- Build your disaster recovery plan on business continuity
- Understand your dependencies
- Tier your applications to get the most important applications recovered first
- Understand the impact of data change rates on replication bandwidth
- Set requirements for recovery environments, including service level agreements (SLAs)
- Choose your replication methods based on recovery point objectives (RPOs) and recovery time objectives (RTOs)
- Identify internal resources and application experts
- Test your disaster recovery plan regularly
- Derive short-term return on investment (ROI) from your DRaaS environment
- Derive long-term ROI from your disaster recovery environment
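The checklist item on data change rates comes down to arithmetic: each day's changed data has to cross the network within the time you allow for replication. The sketch below is a rough estimate under stated assumptions: decimal gigabytes, a 20% overhead factor that is a hypothetical placeholder rather than a vendor figure, and a change rate you would measure from your own systems. The function name is illustrative.

```python
def required_bandwidth_mbps(changed_gb_per_day: float,
                            replication_window_hours: float = 24.0,
                            overhead: float = 1.2) -> float:
    """Estimate the sustained bandwidth needed to replicate one day's changed data.

    changed_gb_per_day: data changed per day, in decimal gigabytes (10**9 bytes).
    replication_window_hours: hours available to move that change (24 = continuous).
    overhead: hypothetical multiplier for protocol overhead and retransmissions.
    """
    bits_to_send = changed_gb_per_day * 1e9 * 8
    seconds_available = replication_window_hours * 3600
    # bits per second -> megabits per second, then pad for overhead
    return bits_to_send / seconds_available / 1e6 * overhead
```

For example, 500 GB of changed data per day, replicated continuously over 24 hours, works out to roughly 56 Mbps; squeezing the same change into an 8-hour overnight window needs about 167 Mbps. Real change rates are bursty, so size for peaks rather than only the daily average.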
Set recovery point objectives (RPOs) and recovery time objectives (RTOs) by individual application to ensure the most important applications are recovered first.
Requirements for minimizing data loss and downtime will differ by industry and type of business. A bank will have different requirements than a wholesaler, for example. Recovery point objectives (RPOs) and recovery time objectives (RTOs) tie the disaster recovery plan to the real needs of each business.
Recovery point objectives (RPOs) identify acceptable data loss
Recovery time objectives (RTOs) identify acceptable downtime
A recovery point objective (RPO) is the point in the past from which you want data recovered, measured in minutes, hours, or even days. For example, an RPO of 15 minutes means the business can afford to lose at most 15 minutes’ worth of data.
A recovery time objective (RTO) is the maximum downtime allowed between a disruptive event and a return to operational status: how long an organization can go before it needs its servers back up. RTO gets to the heart of disaster recovery.
A short RPO, such as 15 minutes, indicates very little data loss is acceptable; a longer RPO, such as 24 hours, indicates less critical time frames for preserving data.
In traditional disaster recovery, it’s common to set (or at least grudgingly accept) an RTO of 24 to 48 hours, or even longer, which can be achieved with an offsite data center or tape backups. When disaster hits, however, recovering with these methods is slow.
An RPO largely determines the frequency of data replication required in a disaster recovery plan. RPO also factors into trade-offs between budget and tolerable data loss.
RTO largely determines what technologies will be used to recover systems and data, and in what order.
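Because the RPO drives replication frequency, one way to sanity-check a schedule is to compute the worst-case age of the newest complete copy. This is a minimal sketch assuming snapshot-style periodic replication in which each cycle finishes copying before the next one starts; the function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_lag: timedelta) -> timedelta:
    """Worst-case data loss under periodic replication.

    If the primary fails just before a replication cycle finishes copying, the
    newest complete copy is from the previous cycle: up to one full interval
    plus the time the in-flight cycle spent transferring.
    """
    return interval + transfer_lag


def meets_rpo(interval: timedelta, transfer_lag: timedelta, rpo: timedelta) -> bool:
    """True if the replication schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, transfer_lag) <= rpo


# A nightly job (24 h interval, 2 h to copy) cannot meet a 15-minute RPO;
# replicating every 5 minutes with a 2-minute lag (7 minutes worst case) can.
print(meets_rpo(timedelta(hours=24), timedelta(hours=2), timedelta(minutes=15)))
print(meets_rpo(timedelta(minutes=5), timedelta(minutes=2), timedelta(minutes=15)))
```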
The cloud-based automation behind disaster recovery as a service (DRaaS) allows for a much shorter RTO.
With a DRaaS solution such as cloud-to-cloud recovery services, it can take as little as 15 minutes to get back up and running. Because there’s no need to spin up new hardware and servers and restore from backups, DRaaS can compress traditional disaster recovery processes from days to minutes.
Application tiering minimizes downtime in disaster recovery
In a good disaster recovery plan, you will prioritize applications to ensure the most important ones return to service first. In application tiering, you make strategic decisions about which applications and data are the most urgent. Not only do different businesses have different RPOs and RTOs, but different applications within the same business will too.
Customer-facing applications demand a shorter RPO and a lower RTO, because data loss and downtime can have a severe impact on the business.
Internal or administrative applications that aren’t mission-critical may be able to withstand a higher level of data loss and more downtime.
With tiering, applications are grouped by their RPOs and RTOs, allowing them to be prioritized for disaster recovery.
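Tiering can be expressed as a small lookup: each tier is a recovery setup designed to deliver a given RPO and RTO, and each application goes into the least aggressive (and usually least expensive) tier that still meets both of its objectives. The tier boundaries and application names below are hypothetical examples, not recommended values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class App:
    name: str
    rpo_minutes: int  # maximum tolerable data loss
    rto_minutes: int  # maximum tolerable downtime


# Hypothetical tiers: (tier number, RPO it delivers, RTO it delivers), strictest first.
TIERS = [
    (1, 15, 60),
    (2, 240, 480),
    (3, 1440, 2880),
]


def assign_tier(app: App) -> int:
    """Pick the least aggressive tier whose RPO and RTO are both within the app's limits."""
    for tier, tier_rpo, tier_rto in reversed(TIERS):
        if tier_rpo <= app.rpo_minutes and tier_rto <= app.rto_minutes:
            return tier
    # Stricter than tier 1 (for example, zero data loss): needs a dedicated design.
    raise ValueError(f"{app.name} needs tighter objectives than any defined tier")


def group_by_tier(apps: list[App]) -> dict[int, list[str]]:
    """Map each tier number to the names of the applications assigned to it."""
    groups = defaultdict(list)
    for app in apps:
        groups[assign_tier(app)].append(app.name)
    return dict(sorted(groups.items()))
```

Choosing the least aggressive tier that still qualifies avoids paying for protection an application doesn't need, while never placing it in a tier that recovers too slowly or loses too much data.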
Setting an RTO too long or an RPO too high can put the organization at unacceptable levels of risk. Conversely, setting RPO and RTO too aggressively increases costs and ties up capital.
Application tiering prioritizes applications for recovery based on two factors: business importance and dependencies on other systems and applications. Mission-critical applications are near the top, but they depend upon the availability of network services and other systems that need to be recovered first. Those other systems are therefore an even higher priority for recovery.
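The dependency rule above behaves like a topological sort: nothing starts until everything it depends on is up, and business priority only breaks ties among systems that are ready at the same time. This sketch uses Python's standard-library `graphlib` (Python 3.9+); the dependency map, system names, and tier numbers are hypothetical, and real orchestration tools express the same idea in their own configuration.

```python
from graphlib import TopologicalSorter


def recovery_waves(depends_on: dict[str, set[str]], tier: dict[str, int]) -> list[list[str]]:
    """Group systems into recovery waves that respect dependencies.

    Every system in a wave depends only on systems in earlier waves, so a wave
    can start once the previous one is up. Within a wave, lower tier numbers
    (higher business priority) come first; systems without a tier, such as
    shared infrastructure, default to 0.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda name: (tier.get(name, 0), name))
        waves.append(ready)
        sorter.done(*ready)
    return waves


depends_on = {
    "storefront": {"database", "active_directory"},
    "erp": {"database", "active_directory"},
    "intranet_wiki": {"active_directory"},
    "database": {"network", "dns"},
    "active_directory": {"network", "dns"},
    "dns": {"network"},
}
tier = {"storefront": 1, "erp": 2, "intranet_wiki": 3}
for wave in recovery_waves(depends_on, tier):
    print(wave)
```

Even though the storefront has the highest business priority, network, DNS, directory, and database services land in earlier waves because the storefront cannot run without them.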
There are many reasons to embrace DRaaS, but the one overriding motivation is decreased downtime.
Disaster recovery as a service (DRaaS) is the replication of data and the hosting of physical or virtual servers by an expert third-party service provider. That provider can deliver quicker and more complete disaster recovery, protecting the organization by maintaining business continuity.
Compare DRaaS to traditional DR
DRaaS offers significantly faster recovery times than traditional shared-storage DR or expensive secondary data centers, which were once the standard solution for large corporations. As a result, it is becoming the first choice among disaster recovery options, and more organizations are expected to use DRaaS than traditional recovery services.
While traditional DR solutions can protect data and enable companies to recover after a disaster, traditional disaster recovery technologies don’t do it quickly – and can result in substantial data loss. In traditional DR, the communication between the primary production environment and secondary site typically happens on a set schedule, often after business hours, which results in a 24-hour recovery point.
With DRaaS, the primary and secondary environments can stay in near-constant contact, with available bandwidth as the only major constraint. As a result, DRaaS can get applications back up and running in minutes.
DRaaS can also help organizations comply with data storage requirements, ease migration to the cloud or between clouds, and enhance security and patch management.
The 3 types of replication
DRaaS allows organizations to replicate primary sites to the cloud, from which data, servers, and applications can be restored as needed in the event of a disaster. As data changes at the primary site, it is replicated to a recovery site.
Many organizations use a combination of replication technologies to address application tiers with different recovery requirements. Each of the three key types of replication technologies has its own pros and cons.
Synchronous replication offers the shortest recovery point objective
Data is written to multiple sites at the same time, so the data remains current among sites. Synchronous replication is more expensive than other types. It is also extremely sensitive to latency, so it requires the sites to be close to each other.
Asynchronous replication is the most popular replication technology
Data is written to the primary storage array first and copied to replication targets in real time or at scheduled intervals. Asynchronous replication requires less bandwidth, is less expensive, and works over larger distances.
Backup services limit data loss but do not enable recovery
Data is archived and stored, and backups are used to recover non-critical data. Backup services include cloud backup services and other online backup services, remote file backup, and local tape or disk backup. Backup as a service (BaaS) offers the best protection from data loss.
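The practical difference between synchronous and asynchronous replication is when a write is acknowledged. The toy model below keeps both sites in memory to show the consequence: synchronous writes are on both sites before the application hears "ack", while asynchronous writes queue up, and anything still queued when the primary fails is lost. It is an illustration only; the class and method names are hypothetical, and real replication happens at the storage, hypervisor, or database layer.

```python
from collections import deque


class SyncReplication:
    """A write is acknowledged only after both sites hold it."""

    def __init__(self):
        self.primary, self.replica = {}, {}

    def write(self, key, value):
        self.primary[key] = value
        # In a real system this is a network round trip to the remote site, and
        # the application waits for it on every write; that is why latency and
        # distance matter so much for synchronous replication.
        self.replica[key] = value
        return "ack"


class AsyncReplication:
    """A write is acknowledged once the primary holds it; copying happens later."""

    def __init__(self):
        self.primary, self.replica = {}, {}
        self.pending = deque()  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))
        return "ack"

    def ship(self, limit=None):
        """Copy queued writes to the replica in order (continuously or on a schedule)."""
        count = len(self.pending) if limit is None else min(limit, len(self.pending))
        for _ in range(count):
            key, value = self.pending.popleft()
            self.replica[key] = value

    def lost_if_primary_fails(self):
        """Keys of writes still waiting to be copied; these are lost if the primary fails now."""
        return [key for key, _ in self.pending]
```

How often `ship` runs, continuously or on a schedule, is what sets the achievable RPO for asynchronous replication.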
DRaaS uses automation and orchestration for failover and failback
Automation and orchestration replace manual steps, which simplifies the complex tasks of recovery, carries them out much faster, and reduces the risk of mistakes.
Automated failover and failback can dramatically speed recovery
DRaaS allows for automated and almost instantaneous failover to one or more clouds. When the primary site fails, control is automatically switched to the cloud site with the replicated data.
Later, when the outage is resolved, DRaaS lets the organization return control to the primary site through a process called failback that ensures data stays current.
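Failover and failback can be modeled as a small state machine. This sketch is a simplification built on assumptions: it fails over only after several consecutive failed health checks (a hypothetical threshold of 3, to avoid switching on a single blip), and it allows failback only when the primary is healthy again and every change made at the recovery site has been replicated back, which is what keeps data current. The class and parameter names are hypothetical; real DRaaS platforms also redirect DNS or network traffic, which is only noted in a comment here.

```python
class FailoverController:
    """Tracks which site is active: 'primary' or the 'cloud' recovery site."""

    def __init__(self, failure_threshold: int = 3):
        self.active = "primary"
        self.failure_threshold = failure_threshold  # hypothetical default
        self.consecutive_failures = 0

    def record_health_check(self, primary_ok: bool) -> str:
        """Feed one health-check result; fail over after too many misses in a row."""
        if self.active != "primary":
            return self.active
        if primary_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # In practice: redirect users (DNS, load balancer) to the recovery site.
                self.active = "cloud"
        return self.active

    def failback(self, primary_ok: bool, changes_pending: int) -> bool:
        """Return to the primary only once it is healthy and fully resynchronized."""
        if self.active == "cloud" and primary_ok and changes_pending == 0:
            self.active = "primary"
            self.consecutive_failures = 0
            return True
        return False
```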
Orchestration minimizes manual processes for a faster, safer recovery
Failover and failback are complicated. Orchestration of DR processes makes failover and failback much easier and faster. Specifically, DRaaS solutions use orchestration to ensure that priority applications and virtual machines (VMs) are brought up in the proper order and with the correct settings.
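An orchestrated recovery is essentially a runbook executed by software: start each boot group in order, apply the recovery-site settings, verify health, and stop rather than carry on if a group fails. The sketch below shows that control flow under assumptions; `Step`, `run_runbook`, the settings keys, and the IP addresses are all hypothetical, and the start and health-check functions are passed in so the logic can be exercised without real VMs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    vm: str
    boot_group: int  # lower groups start first
    settings: dict = field(default_factory=dict)  # e.g. recovery-site IP address


def run_runbook(
    steps: list[Step],
    start_vm: Callable[[str, dict], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Start VMs group by group; halt if any VM in a group fails its health check."""
    started = []
    for group in sorted({step.boot_group for step in steps}):
        members = sorted((s for s in steps if s.boot_group == group), key=lambda s: s.vm)
        for step in members:
            start_vm(step.vm, step.settings)
            started.append(step.vm)
        unhealthy = [s.vm for s in members if not is_healthy(s.vm)]
        if unhealthy:
            # Stop here: later groups depend on this one, so starting them would
            # only create more failures to untangle.
            raise RuntimeError(f"boot group {group} failed health checks: {unhealthy}")
    return started


steps = [
    Step("web01", 3, {"ip": "10.20.0.10"}),
    Step("db01", 2, {"ip": "10.20.0.5"}),
    Step("dc01", 1, {"ip": "10.20.0.2"}),
]
```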
Cloud backup services will protect your data, which makes backup-as-a-service (BaaS) ideal for data retention and compliance. But backup does not provide fast recovery.
An offsite backup plan, including cloud backup services such as backup-as-a-service (BaaS), preserves your data, but it doesn’t constitute a disaster recovery plan.
Securing data with offsite backups is good for many purposes, but it’s not good for recovery. If you have backed up your data offsite, you can retrieve it as of the last recovery point. But your backup only captured data; you’ll need the applications, too.
Offsite backups typically don’t back up the applications needed to access the data – or any of the systems those applications depend upon.
You can’t use the backup data until you restore the applications, because the applications are what give you access to the data. Cloud backup services do not provide automation or orchestration for application recovery.
In DRaaS, the data is saved, the applications are replicated and recovered, and recovery is orchestrated with automation to get your data back online quickly: in minutes, compared to the hours or days it can take with backups.
It can be easy to move from cloud backup services to DRaaS. Once you’ve identified your mission-critical applications and databases (such as ERP, CRM, and Active Directory), you’re ready to find a DRaaS partner. The benefits of DRaaS could be yours within weeks.
The success of your DR plan depends upon choosing the right disaster recovery partner.
Here are five recommendations to get you started:
Determine whether the DRaaS provider has the expertise you need in regulatory compliance, security services, and hybrid cloud deployments, as well as experience with the platforms and applications you use.
Certifications can tell you a great deal about which platforms the vendor is qualified to support.
Determine how long the DRaaS provider has been in business. More specifically, evaluate its experience in delivering disaster recovery and DRaaS solutions.
A DRaaS provider that has a menu of multiple solutions can generally deliver a more flexible, customizable disaster recovery solution.
Look for a provider that can meet your business needs for RPO, RTO, service level agreements, and data center locations.
Synchronous replication costs a lot more than asynchronous replication. To manage costs, make sure your RPOs and RTOs reflect true business needs.
Ensure the provider offers solutions that allow the testing of DRaaS systems with minimal to no disruption to your business operations.
Test your disaster recovery plan at least once a year.
Consider other DR questions your organization will face in a disaster scenario. For example, if workers can’t get to the office or data center, where will they work? Ideally, your disaster recovery provider should be able to help you.
A DRaaS provider that offers multiple disaster recovery solutions may offer other relevant services too, such as business continuity workspace services.
Take a fresh look annually at the providers you rely on to help you meet your recovery objectives. Ask the right questions to choose a new DRaaS provider or evaluate the strengths of your current disaster recovery vendor.