Feature Article
January 2002 


Protect Your Business-Critical Communications With Disaster Recovery

BY RYAN O'NEIL AND MICHAEL CAVE

In the wake of a most unexpected and devastating attack on companies in the United States, many companies have shifted much of their focus to disaster recovery. There is one question many companies are asking today: “How can we protect our critical communications and our communications processes from disasters of any kind?” Although you may have seen articles explaining disaster recovery for the general environment, we would like to offer some ideas and some guidance on how to achieve a disaster recovery system for your communications.

In this article, we present a disaster recovery “Top Ten” list specifically for critical communication systems.

Once you have had a chance to review the Top Ten, you’ll have a fair idea of how to order your priorities in a “disaster-proofing” project. The next step, of course, is to translate these priorities into action -- or, more accurately, a sequence of actions, a process. A disaster-proofing project moves through four stages: Discovery, Trial, Implementation, and Evaluation. In each of the first three stages, you will manage two phases: the application phase (where you will evaluate network and application connectivity) and the communications phase (where you will determine the implications for your critical business communications). The last stage consists of an immediate post-implementation evaluation phase and a recurring maintenance phase. Each stage and its constituent phases are discussed in turn, chronologically, to help you form an impression of the time and effort required (Table 1).

The steps we will outline will utilize an application-to-communications analogy of implementation. By application, we mean the physical and logical layers that implement the business process in an enterprise; by communications, we mean the associative layers that connect the business process to external applications, people, and processes. Each requires the other to complete a total system, and each is equally important to the operation of the entire enterprise. Our focus will be on the communications aspect; however, a reliable recovery plan requires analysis of critical applications that drive an enterprise’s communication requirements.

DISCOVERY STAGE
Application Phase
This phase focuses on the critical applications in your business process, but begins with a short review of the communications schematics obtained in step three of the Top Ten list. The schematics should highlight all critical applications and the required infrastructure. Once this review is accomplished, critical application elements, like hardware sizing and network bandwidth, will be much easier to estimate.

This is the perfect time to consider a location for your disaster system. Many companies choose to house their disaster systems at distant locations within their organization, while others choose to co-locate these systems at data centers owned and operated by other companies. Each option brings economic and functional advantages and disadvantages. The most critical part of this phase is that by identifying critical applications, you ensure that critical communications are accounted for and that connectivity between those applications and the communications tools, be it standard telephone lines, digital lines, or network bandwidth, is sufficient.

The goal of this phase is for the initial design to account for as much as possible. Granted, in many instances, particularly in large, complex environments, this task can be very difficult. However, going back to fill in spaces is not only time-consuming, but costly as well.

After completing the network and application design of the new disaster recovery environment, it is always recommended to engage a third party to evaluate what has been completed. Many consulting firms and outsourcers provide these services on a contractual basis and prove to be helpful in protecting the investment at each stage of development. This step could also highlight shortfalls in the current design that would have gone unnoticed.

Communications Phase
Once discovery of critical applications is complete, continue by discovering your critical communications. This phase takes the design completed in the application phase and adds critical communications tools to the mix. Critical communications tools could include unified communications, fax, e-mail, voice, text messaging, alerts, wireless communications, computer telephony, IVR, and others.

Your critical communications should have been successfully identified in step four of your Top Ten list. These communications elements are the ones that can bring your business to a halt faster than a set of vented disk brakes on a racecar if they are not taken seriously. In this phase, you must also include connectors or “links” between applications and the assets (PBX/CO, third party) that deliver these communications. Unlike the connections designed in the application phase, these connectors are created by the communications system specifically for communications.

An example of this type of connection is when an application (application link between critical business applications and communication server) sends a critical alert notification to a systems administrator during a “system down” situation. The communications connection makes it possible for notifications to be sent to the administrator regardless of location, and they can be sent, for example, as a voice mail to a cell phone, as a text message to a pager, or as a fax to a fax machine (utilizing a communications link between communications server and PBX/CO).
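The alert path described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's API: the `Alert` class, the `dispatch` function, and the channel names are all hypothetical, and the senders simply record what they would have delivered where a real system would hand off to the communications server and PBX/CO.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    source_app: str   # application that raised the alert
    message: str      # text to deliver to the administrator


Sender = Callable[[Alert, str], None]


def make_recorder(channel: str, log: List[str]) -> Sender:
    """Build a hypothetical sender that only records the delivery."""
    def send(alert: Alert, address: str) -> None:
        log.append(f"{channel} -> {address}: [{alert.source_app}] {alert.message}")
    return send


def dispatch(alert: Alert, contacts: Dict[str, str],
             senders: Dict[str, Sender]) -> List[str]:
    """Send the alert on every contact channel that has a sender.

    Returns the channels actually used, in the order of `contacts`.
    """
    used: List[str] = []
    for channel, address in contacts.items():
        sender = senders.get(channel)
        if sender is None:
            continue  # no communications link configured for this channel
        sender(alert, address)
        used.append(channel)
    return used
```

A channel that appears in the administrator's contact list but has no sender is skipped, which is exactly the kind of gap the discovery stage is meant to surface.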

Since these types of communications rely on the applications they connect with, they are critical in disaster situations and should be assigned due importance during this stage. During the last efforts in this phase, we would also recommend having a third-party specialist with expertise in communications tools and infrastructure review the design of the communications portion of your disaster recovery system. As in the application discovery phase, third-party experts can also identify the shortfalls or missed communications opportunities at this point in development.

TRIAL STAGE
Application Phase

The application phase of the trial stage involves establishing trial environments, identifying parameters, and executing realistic trial situations. Establishing the environment will be the most critical part of a successful trial. During this process, you must create a candidate data set and establish trial subjects. You must also isolate a part of the existing infrastructure to successfully “shut down” the existing systems.

From the application side, you should create a candidate data set on the existing system and duplicate it on the disaster system. You must also establish a set of subjects within the organization during the trial period, preferably on the disaster team, who can receive the test communications as they are generated from the data set. During this process, limitations of the system, such as insufficient bandwidth or configuration issues, may be identified and corrected. A score sheet can be generated during each revision to track the progress of the trial phase.
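One possible shape for such a score sheet is sketched below in Python. The `ScoreSheet` class and the check names used with it are hypothetical; the article does not prescribe a format, so treat this as one way to record pass/fail results per revision and see what remains open.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScoreSheet:
    checks: List[str]                                   # items evaluated in every revision
    revisions: List[Dict[str, bool]] = field(default_factory=list)

    def record(self, results: Dict[str, bool]) -> float:
        """Store one revision's results and return its pass rate (0.0 to 1.0)."""
        missing = set(self.checks) - set(results)
        if missing:
            raise ValueError(f"no result recorded for: {sorted(missing)}")
        self.revisions.append(dict(results))
        return sum(results[c] for c in self.checks) / len(self.checks)

    def still_failing(self) -> List[str]:
        """Checks that failed in the most recent revision."""
        if not self.revisions:
            return list(self.checks)
        latest = self.revisions[-1]
        return [c for c in self.checks if not latest[c]]
```

Comparing the pass rate from one revision to the next gives a simple measure of whether corrections to bandwidth or configuration are actually closing gaps.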

Communications Phase
The communications phase of the trial stage also involves establishing a trial environment and focuses on ensuring critical communications are linked and available during the trial. Since we are discussing communications with disaster recovery requirements, we must be most cautious in this phase of trials. Many times, business applications can be working error-free, while communications are at a standstill. If these critical communications falter, no application or network can prevent interruption of the business process.

To execute this phase properly, you must first allocate communications resources to the trial environment. We recommend establishing a separate trial system. (You can imagine how running tests on your production servers may arouse anxiety.) It will emulate the true response of the system during a disaster better than if you were to isolate a piece of your current environment. It also protects your existing critical communications during the testing process.

After your test environment is configured, you should test the communications tools separately. Have your application send requests as fax, e-mail, voice mail, text messages, and alerts separately from each other. To ensure that you evaluate just the fax component of the system, for instance, you must isolate the fax test from all other communications during the trial. This will allow you to evaluate each tool and ensure that the required infrastructure is in place for each during periods of high activity. For example, an application may send 2,000 faxes per day, but the faxes may all be sent at 2:00 am. The system must be able to support the full fax workload, regardless of the other communications that are taking place simultaneously. Just as you did in the application phase of the trial stage, you should track progress after changes and modifications in this phase with a score sheet.
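The 2,000-fax example shows why peak load matters more than the daily average. The Python sketch below works through the arithmetic; the 1.5 minutes per fax and the two-hour delivery window are assumptions chosen purely for illustration, and `lines_needed` is a hypothetical helper.

```python
import math


def lines_needed(fax_count: int, minutes_per_fax: float, window_minutes: float) -> int:
    """Minimum number of fax lines that can clear `fax_count` faxes within the window."""
    if fax_count < 0 or minutes_per_fax <= 0 or window_minutes <= 0:
        raise ValueError("counts and durations must be positive")
    return math.ceil(fax_count * minutes_per_fax / window_minutes)


FAXES_PER_DAY = 2000          # figure from the example in the text
MINUTES_PER_FAX = 1.5         # assumption: average line time per fax
BURST_WINDOW_MINUTES = 120    # assumption: the 2:00 am batch must clear in two hours

# Sizing for the overnight burst versus sizing for the 24-hour average.
burst_lines = lines_needed(FAXES_PER_DAY, MINUTES_PER_FAX, BURST_WINDOW_MINUTES)
average_lines = lines_needed(FAXES_PER_DAY, MINUTES_PER_FAX, 24 * 60)
```

Under these assumed figures the overnight burst needs 25 lines, while averaging the same volume over 24 hours suggests only 3. A disaster system sized on the average would fall far behind at 2:00 am.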

IMPLEMENTATION STAGE
Application Phase

Now that the trial stage is complete and all required modifications and adjustments have been made, we can discuss the process of bringing the systems together in a production environment, beginning with the application phase of the implementation stage. Since you proved the reliability of your systems during the trials, this stage will most likely be quite simple and uneventful.

Keep in mind that the geographic location of your disaster site could prolong the time you are offline. If you chose a distant data center for your disaster recovery site, scheduling and organizing resources will surely add some excitement to the implementation process. The focus of this phase of implementation is the applications that link to your critical communications. You should have confirmed that all applications are configured and connected properly during the application phase of the trial stage, allowing you to avoid possible slowdowns in this phase of implementation. This will also allow you to check all application software links on the communication server in the beginning of the communications phase of the implementation stage.

Communications Phase
In this phase, you bring the critical communications online by connecting them to the production applications, IP network, and PBX/CO. You can realize efficiencies here by housing all critical communications on a single server or system. It would be beneficial to perform this phase during a downtime in the regular production schedule, so there is less risk of delaying communications from the applications. (Yes, such consideration means more late-night hours and pots of coffee, but aren’t we used to that by now?)

We would also suggest modifying the delivery schedule of the applications to send batches rather than execute real-time delivery, so that you can define a time window to accommodate implementation, as well as ensure that high levels of communications traffic may be supported without issue. Once the application links are ready, you should see that data is being exchanged over the IP network and that the PBX/CO is delivering fax, voice, and wireless traffic. You have successfully implemented a disaster recovery system for your critical communications. Congratulations!
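The batch-delivery suggestion above might look like the following Python sketch. The `release_batches` function, its parameters, and the message names are hypothetical: nothing is released while the implementation window is open, and afterwards the queued messages go out in fixed-size batches.

```python
from datetime import datetime
from typing import List, Sequence


def release_batches(queued: Sequence[str], now: datetime,
                    window_start: datetime, window_end: datetime,
                    batch_size: int) -> List[List[str]]:
    """Return the batches that may be sent at `now`.

    While the implementation window [window_start, window_end) is open,
    everything stays queued and an empty list is returned.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if window_start <= now < window_end:
        return []
    return [list(queued[i:i + batch_size])
            for i in range(0, len(queued), batch_size)]
```

The batch size is the lever for confirming that high traffic levels are supported: start small, then raise it once the links have been observed carrying traffic.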

EVALUATION STAGE
Post-Implementation Phase

Now that you have been through the exercises, it is time to determine the results of your efforts. About two months after the final implementation is complete, we recommend having the disaster system development team complete an internal process survey. This survey should review the steps that were taken during the discovery, trial, and implementation stages of developing the disaster recovery system. It should also survey the team members to determine where the process works best, and where it needs improvement.

Upon completion of the survey, the information you gain from your experiences can be shared easily with management for justification (probably the most important part of any large project), or for others interested in preparing their systems for a disaster. The other important part of a post-implementation evaluation is the development of a “wish list” for the future. As we all know, these systems are never truly “complete,” and there will always be upgrades, modifications, and additions.

In addition, your team surely will have been able to learn more about the capabilities of the communications system and its link to applications. By having your team provide a communications wish list, you can include these items in already-scheduled maintenance procedures. This wish list will also act as a guide for evaluating new vendors if there is ever a need to do so. By completing the survey and the wish list, you will be able to aid others in the process and prepare expectations for future actions.

Recurring Evaluation
Thankfully, the discovery, trial, and implementation stages are complete, as well as the post-implementation phase of the evaluation stage. You have successfully developed and implemented a disaster recovery system for your critical communications. Of course, maintenance measures must still be put in place to ensure that your disaster system stays ready for action.

We recommend performing regular maintenance on the system every six to nine months, or whenever major engineering changes (within applications or communications) are introduced into your environment. Regular maintenance includes, but is not limited to, testing the fail-over procedures, adding/removing business applications, and adding/removing communications tools. Testing the fail-over procedures can be accomplished on an application-by-application basis with most systems. The process involves shutting down the link to the primary application. When this is done, you can verify that the communications are transferred properly to the disaster site.

With some systems, you can shut down each link separately and test not only applications, but also each voice and fax line separately. This helps identify any load issues that might arise during a disaster. As your system grows, the maintenance process will also allow you to schedule the addition of other data sources, like a CRM application, e-commerce application, or enterprise database. New critical communications tools, like secure e-mail delivery or wireless messaging, can also be added during this time. Maintenance is key to providing a disaster recovery system that is not only reliable, but also sufficiently flexible to grow with the needs of your critical communications.
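The link-by-link fail-over test described above can be expressed as a simple loop. In this Python sketch the `Link` class is a hypothetical stand-in for an application, voice, or fax link; a real test would call whatever controls your systems actually expose.

```python
from typing import Dict, Iterable


class Link:
    """Hypothetical stand-in for one application, voice, or fax link."""

    def __init__(self, name: str, failover_works: bool = True) -> None:
        self.name = name
        self.primary_up = True
        self._failover_works = failover_works

    def shut_down_primary(self) -> None:
        self.primary_up = False

    def restore_primary(self) -> None:
        self.primary_up = True

    def served_by_disaster_site(self) -> bool:
        # Traffic reaches the disaster site only if the primary is down
        # and the fail-over path is functioning.
        return (not self.primary_up) and self._failover_works


def run_failover_test(links: Iterable[Link]) -> Dict[str, bool]:
    """Shut down each link in turn and record whether the disaster site took over."""
    results: Dict[str, bool] = {}
    for link in links:
        link.shut_down_primary()
        try:
            results[link.name] = link.served_by_disaster_site()
        finally:
            link.restore_primary()   # always bring the primary back
    return results
```

Testing one link at a time keeps the rest of production running and pinpoints which link, if any, fails to transfer.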

SUMMARY
This article is only a suggestion of general issues that might be used as a starting point for a complete project. This should not be used as a complete roadmap, but merely as a set of ideas for consideration. Each enterprise has unique requirements. Consultants specializing in the area of communications disaster recovery and/or internal staff with extensive knowledge of the communication processes should be involved in creating the comprehensive disaster recovery project plan specific to your enterprise’s communication requirements.

Ryan O’Neil is the marketing manager and Michael Cave is the manager of consulting services for TOPCALL Corporation. TOPCALL (www.topcall.com) is a market and technology leader in the end-to-end delivery of unified communications solutions. TOPCALL focuses on the development and deployment of communications solutions that enhance business processes. The company offers world-class communication products, global support services, and on-site consultation.



Top Ten List

  1. Review and create a list of current business-critical processes requiring communications and associated applications.
  2. Review and create a list of groups involved in mission-critical business communications (customers, suppliers, partners, etc.).
  3. Draw a schematic of the communications process, identifying applications, communication facilities, links between applications and communications, and links between communications and target groups. Evaluate each application and process and order by level of significance to daily operations (1 low – 10 high).
  4. Identify communications tools used by the critical applications or in the business process identified earlier (fax, e-mail, voice mail, IVR, EDI, Telex, text messaging/SMS, postal scanning solutions, etc.). Rank each communication tool by level of importance for continuing critical business processes (1 low – 10 high).
  5. Evaluate all critical communication solutions for fault-tolerance and disaster recovery capabilities.
  6. Ask “What do I want to protect?” and “What processes must the business keep running?” Stay focused on the truly critical communications to streamline this effort.
  7. Identify the requirements for ensuring critical communications processes are fault-tolerant or able to recover during and after a disaster.
  8. Establish a budget for disaster recovery development and then implement the plan.
  9. Determine the location of your disaster recovery site -- your alternative site can be next door, in the next town, or on the other side of the world if reliable network capacity is available.
  10. Test systems in simulations regularly, and keep documentation on the design of your systems up to date.
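The 1 (low) to 10 (high) rankings called for in steps 3 and 4 can be kept in something as simple as the Python sketch below. The `rank_by_importance` function and the sample scores are hypothetical; the only rules taken from the list are the 1 to 10 scale and ordering by significance.

```python
from typing import Dict, List, Tuple


def rank_by_importance(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order items from most to least critical; ties are broken alphabetically."""
    for name, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name}: score {score} is outside the 1-10 scale")
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

The items at the top of the resulting order are the ones step 6 asks you to stay focused on.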



Table 1. The goal in each stage and phase of a disaster-proofing project.

Stage            Phase                  Goal
Discovery        Application            Establish baseline for requirements of critical applications
Discovery        Communications         Establish baseline for communications requirements
Trial            Application            Assessment of disaster environment for applications
Trial            Communications         Assessment of disaster environment for communications
Implementation   Application            Establishment of disaster-ready environment for applications
Implementation   Communications         Establishment of disaster-ready environment for communications
Evaluation       Post-implementation    Re-assessment of disaster recovery infrastructure, implementation, and process
Evaluation       Recurring              Semi-annual metering of disaster recovery plan