The data loss and disruption caused by unplanned data center downtime can be costly. Businesses need a disaster recovery plan that minimizes application and data losses. But obsessing over downtime can drive up costs, too, because it can lead you to invest in a more robust solution than you need.
To create the most effective disaster recovery and business continuity plan for your organization, you need to assess the needs of each of your workloads separately using two critical metrics: recovery time objective (RTO) and recovery point objective (RPO).
- Recovery time objective (RTO) – The acceptable amount of time from failure to restoration of business systems and services after a disaster.
- Recovery point objective (RPO) – The maximum amount of data loss, usually expressed as a window of time, that the business deems acceptable following a disaster or failure.
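To make the two definitions concrete, here is a minimal Python sketch of how per-workload targets could be recorded and checked against a real incident or a recovery test. The class name, function name, and sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Hypothetical per-workload targets; the names are illustrative only."""
    workload: str
    rto: timedelta  # acceptable time from failure to restoration
    rpo: timedelta  # maximum acceptable window of lost data


def meets_objectives(target: RecoveryObjectives,
                     downtime: timedelta,
                     data_loss_window: timedelta) -> bool:
    """Return True if an incident (or a recovery test) stayed within both targets."""
    return downtime <= target.rto and data_loss_window <= target.rpo


# Sample values are invented for illustration.
payroll = RecoveryObjectives("payroll", rto=timedelta(hours=4), rpo=timedelta(minutes=15))

print(meets_objectives(payroll, timedelta(hours=3), timedelta(minutes=10)))  # True
print(meets_objectives(payroll, timedelta(hours=3), timedelta(hours=1)))     # False
```

The second check fails even though the system was restored in time, because an hour of lost data exceeds the 15-minute RPO. The two targets are independent, and a workload has to meet both.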
Here are three questions you should ask when defining RTOs and RPOs for each of your workloads:
Q1: How mission-critical is this workload?
The answer to this question largely defines your RTO (and, to a lesser extent, your RPO), so we’re going to put a lot of emphasis on this one before moving on to a couple of additional questions you should ask.
We can divide the criticality of applications into four tiers:
We tend to use the term “mission-critical” loosely, so to help you set your disaster recovery objectives effectively, we need to lock down what we mean. In this discussion, mission-critical applies only to those applications and systems that the organization requires in order to continue to do business. If they’re down, the business is down.
For an organization doing the majority of its business online, this might be the customer-facing website as well as any order entry or inventory systems that allow customers to continue to place orders. Obviously, these systems require the lowest RTOs and RPOs and are candidates for high-availability, synchronous replication. With the technologies available today, some of these solutions can bring recovery times down to near zero.
Critical workloads are vital: you don’t want to lose any of their data, but the business can keep functioning for at least a little while without them. Financial systems, human resources, and payroll often fall into this category. Asynchronous replication, where data is written to the primary storage array first and then copied to the failover systems, may be adequate. There will be some data loss due to the replication delay, but it can be measured in minutes.
Important workloads may also be vital, but they are one step further removed from mission-critical. You might decide, for example, that you can do without your marketing execution systems for several days without suffering any serious side effects. These systems can be handled with the least expensive disaster recovery options, such as asynchronous replication or a physical backup system like tape.
Finally, you may have an archive of old data that you need to keep for one reason or another, though it doesn’t need to be accessed frequently (or even at all) in the course of doing business. Again, the least expensive solutions can be used to back up these systems. Just remember that the media used in physical backups can degrade over time, so if you absolutely need to keep these files in an archive, you may want to back up your backups from time to time.
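The four tiers described above can be summarized as a simple lookup. This is only a sketch in Python: the strategies are taken from the tier descriptions, the `ARCHIVE` label is our assumption for the fourth tier, and the function name is hypothetical. Treat the mapping as a starting point for discussion, not a fixed rule.

```python
from enum import Enum


class Tier(Enum):
    MISSION_CRITICAL = 1
    CRITICAL = 2
    IMPORTANT = 3
    ARCHIVE = 4  # assumed label for the old-data tier


# Starting points drawn from the tier descriptions; not a vendor recommendation.
STARTING_POINT = {
    Tier.MISSION_CRITICAL: "high-availability, synchronous replication",
    Tier.CRITICAL: "asynchronous replication",
    Tier.IMPORTANT: "asynchronous replication or physical backup (e.g. tape)",
    Tier.ARCHIVE: "physical backup, refreshed periodically",
}


def starting_point(tier: Tier) -> str:
    """Look up the least expensive option the tier description suggests."""
    return STARTING_POINT[tier]


for tier in Tier:
    print(f"{tier.name.lower():>16}: {starting_point(tier)}")
```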
Q2: How easily can the data be reconstructed?
The answer to this question can help you refine both your RTOs and RPOs for each workload. Let’s say, for example, that you classified something simple, like the systems your field service personnel use to log their time and call details, as important but not critical or mission-critical.
Your technicians can continue to make customer calls and log their time and the results of the calls on the paper forms you used before. Once their systems are back online, a data entry clerk can enter the details from the call sheets into the system. In your initial assessment, you put this workload in the “important” tier and marked it as a candidate for tape backup.
You may want to rethink that. If your backups run every night, you could lose as much as a day’s worth of data. Depending on the business you’re in, that data might not be so easy to reconstruct. For example, if a technician makes six or seven calls a day, it could be extremely difficult for them to recall what happened on every call, the time spent, and what the results were.
This could have a significant negative impact on customer service.
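A quick back-of-the-envelope calculation shows how much data is at stake in the field service example. The Python sketch below assumes records arrive evenly through the day and that the failure hits just before the next backup runs. The team size of 20 is a hypothetical figure, and the function name is ours.

```python
from datetime import timedelta


def worst_case_records_lost(backup_interval: timedelta,
                            records_per_day: float) -> float:
    """Upper bound on records created since the last backup.

    Assumes records arrive evenly and the failure occurs just before
    the next backup would have run.
    """
    days = backup_interval / timedelta(days=1)  # interval expressed in days
    return days * records_per_day


technicians = 20          # hypothetical team size
calls_per_technician = 7  # upper end of "six or seven calls a day"
daily_records = technicians * calls_per_technician

print(worst_case_records_lost(timedelta(days=1), daily_records))   # 140.0
print(worst_case_records_lost(timedelta(hours=6), daily_records))  # 35.0
```

Under these assumptions, a nightly backup leaves up to 140 call records to rebuild from memory, while backing up every six hours cuts that to 35. Running the numbers like this helps you decide whether a tighter RPO is worth the extra cost.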
Q3: What compliance requirements govern this workload?
The answer to this final question can affect both RTO and RPO. For instance, HIPAA requires healthcare providers (and their Business Associates) to have a disaster recovery plan that covers any systems that contain ePHI to ensure availability and data protection.
Though the regulations stop short of dictating RTOs and RPOs, if you’re in healthcare, the nature of the data you collect may require lower targets than in other industries. If a natural disaster strikes your area, faster recovery and less data loss for your business operations mean you’ll be better able to meet the healthcare needs of your community.
Find the best disaster recovery plan for you
When we work with clients on their disaster recovery strategies, setting RTOs and RPOs per workload is a vital early step. It ensures that their disaster recovery plan meets their downtime goals without blowing their budget. Whether it’s using Backup as a Service or cloud backup, we can help find the best solution for your business. Contact us to learn more about building a disaster recovery plan.