If 2020 has taught us anything, it’s that we need to expect the unexpected. Although the pandemic has presented new challenges, the ongoing threat of cyberattacks continues to be a concern for businesses. The threats, whether man-made or natural, are growing, and businesses need to ensure their data, applications and workloads are protected. The best way to do that is to review your disaster recovery plan and test it regularly.
What is a Disaster Recovery Plan?
A disaster recovery plan is often a component of a larger business continuity plan. It adds structure to the company’s response to various types of disaster scenarios as they relate to IT systems and infrastructure. Having a plan allows you to respond and recover quickly when everyone is running around in panic mode.
Disaster recovery planning falls squarely within IT’s domain, since IT is responsible for keeping the business’s information systems up and running. The metrics used when planning are primarily recovery time and recovery point. Disaster recovery plan success is all about how well you meet those targets when the chips (and your systems) are down.
Recovery Time Objective (RTO) – The targeted length of time from failure to restoration of business systems and services after a disaster.
Recovery Point Objective (RPO) – The maximum amount of data loss the business deems acceptable following a disaster or failure.
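To make the two metrics concrete, here is a minimal sketch of how a team might score a test or a real incident against its targets. It assumes the RPO is expressed as a time window (a common convention, since data loss is usually measured by how far back the last good backup is). The names RecoveryObjectives and evaluate_recovery, and the four-hour and one-hour targets, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the targets a business has set."""
    rto: timedelta  # maximum acceptable time from failure to restoration
    rpo: timedelta  # maximum acceptable data loss, expressed as a time window


def evaluate_recovery(objectives, last_good_backup, failure_time, restored_time):
    """Compare one incident (or test run) against the RTO and RPO."""
    downtime = restored_time - failure_time
    # Anything written after the last good backup is lost in the restore.
    data_loss_window = failure_time - last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Illustrative values only: a 4-hour RTO and a 1-hour RPO.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_recovery(
    objectives,
    last_good_backup=datetime(2021, 1, 15, 8, 30),
    failure_time=datetime(2021, 1, 15, 9, 15),
    restored_time=datetime(2021, 1, 15, 12, 0),
)
print(result["rto_met"], result["rpo_met"])
```

In this example the systems were down for 2 hours 45 minutes and 45 minutes of data was lost, so both targets are met. Tightening the RPO to 30 minutes would turn the second check into a miss.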
In contrast to disaster recovery planning, the business continuity plan often involves decision-makers from other departments, including legal and public relations. The business continuity plan will answer questions such as: What is the company’s communication strategy to customers and the market in the event of a data breach? How do I continue business operations after a disaster event? How do I communicate with employees after a disaster?
Of course, there will be overlap, so it’s vital for IT and decision-makers from other functional groups to create a disaster recovery and business continuity plan that is aligned on all fronts.
Testing your disaster recovery plan
As I mentioned earlier in this post, recovery point and recovery time are the primary metrics used to measure disaster recovery success. As you evaluate your 2021 disaster recovery plan, you should take another look at the RTOs and RPOs you set. So many businesses experienced at least a short-term shutdown in 2020. These organizations should have a better feel for their real cost of downtime. My guess is that a lot of business leaders will decide to tighten up their RTOs and RPOs.
For more tips on setting RTOs and RPOs, read our Strategic Guide to Disaster Recovery Planning.
You should also test your disaster recovery plan regularly, more often for mission-critical workloads, to ensure everything works as expected and your RTOs and RPOs are achievable. Judging from the number of horror stories I’ve heard from businesses whose backups didn’t work when needed, this seems to be a step that is regularly skipped. Not a good idea.
Of course, not every disaster recovery test needs to be a live simulation of a disaster. You can think of disaster recovery testing on three levels:
Level one: Plan the work, then work the plan
One of the easiest types of testing is the walkthrough of your disaster recovery procedures. But just because it’s easy doesn’t mean it’s less essential than the other types of testing.
A procedural walkthrough is particularly important for processes like backups that, for many businesses, are at least semi-manual. In the walkthrough, the team evaluates the defined process, including such elements as how often different workloads are backed up, who is responsible, and where these backups are stored. The evaluation should also consider what happens in the event of a disaster, e.g., who will be responsible for procuring the backups from the offsite facility and how the backups will be restored.
Now comes the most crucial step in your walkthrough evaluation. The team needs to honestly assess how well the company is adhering to the defined processes. Daily backups may help you meet your RTOs and RPOs for certain workloads, but not if these “daily backups” are only performed once a week. If you’ve contracted with a vendor for offsite storage of your physical backup media, but then store your backups in a utility closet . . . well, you get the idea.
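One way to check adherence is to compare the timestamps of the backups that actually exist against the schedule the plan calls for. The sketch below is only an illustration: it assumes you can export a list of backup completion times from your backup tool, and the function name find_backup_gaps and the 26-hour tolerance are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_backup_gaps(
    timestamps: List[datetime], max_interval: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return each pair of consecutive backups that are further apart
    than the interval the plan allows."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later))
    return gaps


# Hypothetical export: a "daily" backup that skipped a week.
completed = [
    datetime(2021, 1, 4, 2, 0),
    datetime(2021, 1, 5, 2, 0),
    datetime(2021, 1, 12, 2, 0),
]
# Allow a little slack over 24 hours for jobs that run long.
gaps = find_backup_gaps(completed, max_interval=timedelta(hours=26))
for earlier, later in gaps:
    print(f"No backup between {earlier} and {later}")
```

Here the check flags the stretch from January 5 to January 12, which is exactly the kind of gap a walkthrough should surface before a disaster does.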
Level two: We’re experiencing technical difficulties. Stand by.
While level one focused on how well the plan is being executed, level two looks at the technical functionality of each element. Both physical and cloud backups need to be restored regularly, at least in a test environment, to ensure restored systems are functional.
Replication of primary sites to a failover site in the cloud can shorten recovery time, but these systems also need to be tested to ensure they work as planned. Mission-critical systems, like ERP systems with their numerous add-ons, can be more temperamental than others. It’s not enough to make sure the applications and data are “all there.” Functional tests need to be run to verify the recovered systems still perform the way they are supposed to.
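As a rough illustration of what a functional test run against a restored system could look like, the sketch below runs a set of named checks and records which ones pass. Everything here is an assumption made for the example: run_smoke_checks is a hypothetical helper, and the two checks stand in for whatever your own applications need to prove (a login works, a report runs, an add-on responds).

```python
from typing import Callable, Dict


def run_smoke_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named check against the restored system.
    A check that raises an exception is recorded as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Placeholder checks; real ones would call the restored application.
def login_page_loads() -> bool:
    return True


def monthly_report_runs() -> bool:
    raise RuntimeError("report service unreachable")


results = run_smoke_checks(
    {
        "login_page_loads": login_page_loads,
        "monthly_report_runs": monthly_report_runs,
    }
)
failed = [name for name, passed in results.items() if not passed]
print("Failed checks:", failed)
```

Treating an exception as a failure, rather than letting it stop the run, means one broken add-on does not hide problems in the checks that come after it.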
Level three: All hands on deck
The final level is an all-hands-on-deck “fire drill” that simulates a real disaster. While level three disaster recovery testing does not need to be done as often as level one and level two, there are lessons to be learned from seeing how your plan works when tested by employees who don’t sit in IT.
Expert help for Disaster Recovery planning
Regular disaster recovery testing can take time and resources. That’s why so many businesses never get around to it. But with 2020 being what it’s been, are you ready to take your chances with 2021? A qualified managed service provider like TierPoint can help you develop a disaster recovery plan that meets your recovery goals. We can also perform regular disaster recovery testing to ensure your plan works as designed. Learn more by contacting us today.