
Disaster Recovery Planning

Disaster Recovery Planning is the process of creating a document that details the steps your business will take to recover from a catastrophic event.[1]

Arguably, the most critical parts of disaster recovery planning are the disaster recovery strategy and the detailed recovery plan. This is where you codify the concrete steps needed to get up and running again after an unplanned outage. Both are based on a risk assessment and a business impact analysis of your organization, and on your understanding of which systems are most critical to the business and what it takes to restore them within an acceptable time frame. In short, the crux of disaster recovery planning is a detailed recovery plan built on a disaster recovery strategy tailored to your organization’s unique risk profile.[2]

Formulating a detailed recovery plan is the main aim of the entire IT disaster recovery (DR) planning project. It is in these plans that you will set out the detailed steps needed to recover your IT systems to a state in which they can support the business after a disaster.[3]
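To make "detailed steps" concrete, a recovery plan can be checked as an ordered list of restore steps. The sketch below is a minimal illustration, not a prescribed format: it assumes systems are restored one after another, that each system has an estimated restore duration and a recovery time objective (RTO, the maximum acceptable time it can be offline), and that times are measured from the start of recovery. The class, function, and system names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    system: str           # hypothetical system name
    restore_hours: float  # estimated time to restore this system
    rto_hours: float      # maximum acceptable downtime for this system


def check_plan(steps):
    """Restore systems strictly in the listed order and report, for each,
    when it comes back online and whether that is within its RTO."""
    elapsed = 0.0
    results = []
    for step in steps:
        elapsed += step.restore_hours
        results.append((step.system, elapsed, elapsed <= step.rto_hours))
    return results


# Hypothetical plan: infrastructure first, then data, then applications.
plan = [
    RecoveryStep("network", 1.0, 4.0),
    RecoveryStep("database", 2.0, 4.0),
    RecoveryStep("order-app", 1.5, 4.0),
    RecoveryStep("reporting", 3.0, 24.0),
]

for system, done_at, ok in check_plan(plan):
    print(f"{system}: online after {done_at:.1f} h, within RTO: {ok}")
```

With these made-up numbers, "order-app" comes back after 4.5 hours and misses its 4-hour RTO, which is the kind of gap a detailed plan should expose before a disaster, for example by restoring in parallel or reordering steps.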


Basics of Disaster Recovery Planning (Figure 1)[4]

As a subset of business continuity planning, disaster recovery planning begins with a business impact analysis. The idea behind this analysis is to work out two key metrics:

  • A recovery time objective (RTO), which is the maximum acceptable length of time that your application can be offline. This value is usually defined as part of a larger service level agreement (SLA).
  • A recovery point objective (RPO), which is the maximum acceptable length of time during which data might be lost from your application due to a major incident. This metric will vary based on the ways that the data is used; for example, frequently modified user data could have an RPO of just a few minutes, whereas less critical, infrequently modified data could have an RPO of several hours. Note that this metric describes the length of time only; it does not address the amount or quality of the data lost.
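The two definitions above can be turned into a simple check for a single application. This is a sketch under stated assumptions: backups (or snapshots) are taken at a fixed interval, a copy only counts once it has reached safe off-site storage after a transfer lag, and the restore time is a known estimate. Under those assumptions the worst-case data loss window is one full backup interval plus the transfer lag. All function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, transfer_lag: timedelta) -> timedelta:
    # If the failure hits just before the newest backup reaches off-site
    # storage, the last usable copy is one full interval older, so the
    # loss window is the interval plus the transfer lag.
    return backup_interval + transfer_lag


def meets_objectives(backup_interval, transfer_lag, restore_time, rpo, rto):
    loss = worst_case_data_loss(backup_interval, transfer_lag)
    return {
        "worst_case_loss": loss,
        "rpo_met": loss <= rpo,          # data-loss window within RPO?
        "rto_met": restore_time <= rto,  # downtime within RTO?
    }


# Frequently modified user data: RPO of a few minutes.
user_data = meets_objectives(
    backup_interval=timedelta(minutes=1),
    transfer_lag=timedelta(minutes=1),
    restore_time=timedelta(hours=2),
    rpo=timedelta(minutes=5),
    rto=timedelta(hours=4),
)

# Less critical data with nightly backups and an RPO of several hours.
archive = meets_objectives(
    backup_interval=timedelta(hours=24),
    transfer_lag=timedelta(hours=1),
    restore_time=timedelta(hours=8),
    rpo=timedelta(hours=6),
    rto=timedelta(hours=12),
)

print(user_data)
print(archive)
```

Here the nightly schedule cannot satisfy a 6-hour RPO no matter how fast the restore is, because the RPO depends on how often recoverable copies are made, while the RTO depends on how quickly they can be restored.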

Taken together, these metrics have a roughly asymptotic impact on your bottom line: typically, the smaller your RTO and RPO values, the more your application will cost to run, with costs climbing steeply as the values approach zero:


Figure 1. Ratio of cost to RTO/RPO (source: Google)


Because smaller RTO and RPO values often come with an increase in complexity, the associated administrative overhead follows a similar curve. A high-availability application might require you to manage distribution across two physically separated data centers, handle replication, and more.
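The cost curve in Figure 1 implies a practical rule: choose the cheapest recovery strategy whose achievable RTO and RPO still meet the business targets. The sketch below illustrates that selection. The three strategy tiers, their achievable RTO/RPO values, and their relative costs are hypothetical placeholders, not benchmarks; a real plan would take them from vendor quotes and testing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_hours: float      # achievable recovery time
    rpo_hours: float      # achievable worst-case data-loss window
    relative_cost: float  # hypothetical cost units


# Hypothetical tiers: tighter objectives cost more.
STRATEGIES = [
    Strategy("backup and restore", 24.0, 24.0, 1.0),
    Strategy("warm standby", 4.0, 1.0, 5.0),
    Strategy("hot replication across two sites", 0.25, 0.05, 20.0),
]


def cheapest_strategy(rto_target, rpo_target, strategies=STRATEGIES) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both targets, or None."""
    candidates = [
        s for s in strategies
        if s.rto_hours <= rto_target and s.rpo_hours <= rpo_target
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


for rto, rpo in [(48, 24), (6, 2), (1, 0.5), (0.1, 0.01)]:
    choice = cheapest_strategy(rto, rpo)
    print(rto, rpo, choice.name if choice else "no listed strategy meets targets")
```

As the targets shrink, the selection moves to more expensive tiers, and for very aggressive targets no listed option qualifies, which mirrors the steep end of the curve.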


Objectives of Disaster Recovery Planning (DR Planning)[5]

The primary objective of disaster recovery planning is to protect the organization in the event that all or part of its operations and/or computer services are rendered unusable. Preparedness is the key. The planning process should minimize the disruption of operations and ensure some level of organizational stability and an orderly recovery after a disaster. Other objectives of disaster recovery planning include:

  • Providing a sense of security
  • Minimizing risk of delays
  • Guaranteeing the reliability of standby systems
  • Providing a standard for testing the plan
  • Minimizing decision-making during a disaster


Stages of Disaster Recovery Planning (Figure 2)[6]

The four stages of Disaster Recovery Planning are illustrated in Figure 2 below:


Figure 2. Stages of Disaster Recovery Planning (source: secure24)


Disaster Recovery Planning Methodology (Figure 3)[7]

DR planning methodologies are often branded by specific consulting practices and presented as complex, convoluted processes known only to a few privileged practitioners. In fact, the DR planning methodology is a straightforward application of common sense that follows a pragmatic project plan similar to the systems development lifecycle methodology.


Figure 3. Disaster Recovery Planning Methodology (source: Toigo Partners International LLC)

As shown in Figure 3, the initial DR planning project involves 10 tasks, which can be grouped into three phases:

  • Data Collection and Risk Assessment: the tasks in this phase include project initiation, data collection, and the completion of a preliminary risk analysis.
  • Design: the phase in which the capabilities to actually recover from a disaster are created.
  • Implementation: the phase in which the selected recovery strategies are tested, and the test results feed back into the planning process.


Key Roles & Responsibilities For The Disaster Recovery Planning Team[8]

Your disaster recovery planning team should consist of the following:

  • Management Steering Committee: Executive team members who oversee the process are involved at a high level, which means they may not technically need a seat at the table, but they should be in the room. They play an important role in approving budgets and policies, setting strategic direction, and overcoming roadblocks or intradepartmental issues. These individuals might be part of an existing business continuity oversight committee or form a separate disaster recovery steering committee, depending on the organization.
  • Disaster Recovery Coordinator: The disaster recovery coordinator is an individual from IT who manages the overall recovery in the event of an actual disruption and is typically also a member of the emergency management team. The coordinator is responsible for setting recovery plans into motion among the team and coordinating those efforts as they progress. They also help resolve problems encountered along the way and remove roadblocks that slow the process down.
  • Business Continuity: Business continuity and disaster recovery go hand-in-hand. The business continuity “expert,” so to speak, fulfills two important roles on the team:
    • To ensure that IT recovery plans align with business needs. Business needs are determined by a Business Impact Analysis (BIA), which is completed before disaster recovery planning is set in motion. If you haven’t done a formal BIA, an informal one will do in the short term, so you can move forward with the DR process. But it’s important to realize that a BIA in some form is critical to DR; without it, you have no clear goals, and your efforts will undoubtedly fall short of meeting real recovery needs. The business continuity representative bridges the gap between business and IT to ensure that critical business needs will be met through IT recovery plans and that any gaps in alignment are addressed.
    • To ensure that the necessary components of business continuity are present in the disaster recovery plans. While IT brings technology expertise, IT participants may not be well-versed in basic business continuity essentials involving emergency or crisis management, such as how to report information during an event, contact lists for key personnel, and vendor information. These components pave the way for a smooth and effective recovery process.
  • IT Infrastructure: Because their areas of expertise cover the building blocks of the organization’s technology, these team members do the lion’s share of the actual recovery work. Each infrastructure representative is responsible for identifying strategies and solutions that will recover critical operations in their area of expertise, implementing them, and testing them to ensure they work. The strategies they design must meet the requirements of critical business units as outlined in the BIA. You’ll want three individuals from IT infrastructure on the team, one from each of the following areas:
    • Servers - Almost all technology runs on some type of server. This person should be intimately familiar with the server and operating system infrastructure and the backup or replication technologies needed to meet the recovery needs. With the increased use of virtual machines, the differences between physical and virtual environments, and their implications for recovery, must be understood and addressed.
    • Storage - Data protection or replication is a critical recovery component. It is now often the major component of the recovery strategy and capability. In most organizations, the storage used in the processing environment is not completely local to the servers (whether physical servers or servers running the virtual environment).
    • Database administration - Databases house the data that applications depend on. This is an architecture unto itself; databases may be shared across applications or run on individual or shared servers. Depending on the organizational structure, the database admin may be part of the infrastructure or application team. Whatever the organizational setup, database recovery requires participation from database administration, including an assessment of the impact the data protection strategy may have on the data.
  • Networks/Telecom:
    • Network - Nothing works without the network that connects firewalls, servers, storage, and other components. This person should be intimately familiar with your organization’s network infrastructure and be able to take charge of recovery strategies related to it.
    • Telecom - Disruptions often affect voice communication infrastructure, making it difficult for employees to communicate inside the organization, as well as with external business partners and customers.
  • IT Applications: Depending on IT infrastructure recovery plans and the extent of the actual disruption, the individual(s) responsible for applications may play a greater or lesser role in recovery. But they need to understand, based on how the infrastructure team proposes to restore the environment, what additional application tasks may be needed, such as changes to application configuration and settings, verifying data consistency, or restoring application integrations. They should work closely with the infrastructure representatives to identify recovery steps and design an appropriate plan that meets the needs of critical business units.
  • Advisors From Critical Business Units: Though not a necessity, it can be useful to invite representatives from critical business units (those who participated in the BIA) to advise on disaster recovery planning efforts as needed. Rather than presenting the team’s plan to business units as a done deal, it’s helpful to discuss the plan earlier in the process to gather input. How will business processes be impacted by your proposed recovery plans? Is your plan feasible, or will it require the business units to create additional workarounds? DR teams sometimes propose alternate recovery methods that affect the requirements stated by a business unit, so input from the business units is a must. For example: “We can recover this in four hours, but if you can wait six hours, we can save $500,000. Will that work?” This approach helps integrate IT and business and boosts the likelihood of the plan’s overall success.


Future of Disaster Recovery Planning[9]

In determining the components of a disaster recovery plan, businesses typically need to make tough compromises, trading off the level of recovery (maximum amount of downtime and data loss) against cost. A relatively new form of technology, server virtualization, is beginning to gain popularity as a viable and cost-effective means of achieving highly available, redundant systems. Server virtualization allows companies to consolidate multiple server functions on one host server, lowering the total cost of operation and making effective use of emerging hardware advancements.

At first glance, server virtualization may appear risky and counterproductive when trying to achieve a highly available, redundant IT infrastructure. After all, it increases the risk of multiple server failures by housing numerous server services on a single host server. But with the combination of hardware advancements and software ingenuity, companies can use server virtualization as a practical and effective means to achieve disaster recovery. If a natural disaster or power outage impacts a company's primary facility, a host server in a separate location, connected to a SAN that serves as the replication target for the virtual servers, can be brought online quickly and with little effort. As software advances improve virtual server performance and hardware becomes cheaper and higher in capacity, a robust, full-featured disaster recovery plan will become attainable for more organizations.


See Also

Disaster Recovery Plan (DRP)
Business Continuity
Business Continuity Plan (BCP)
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)


References


Further Reading