Protect Your Business-Critical Communications With Disaster Recovery
BY RYAN O'NEIL AND MICHAEL CAVE
In the wake of a most unexpected and devastating attack on companies in
the United States, many companies have shifted much of their focus to
disaster recovery. There is one question many companies are asking today:
“How can we protect our critical communications and our communications
processes from disasters of any kind?” Although you may have seen
articles explaining disaster recovery for the general environment, we
would like to offer some ideas and some guidance on how to achieve a
disaster recovery system for your communications.
In this article, we present a disaster recovery “Top
Ten” list specifically for critical communication systems.
Once you have had a chance to review the Top Ten, you’ll have a fair
idea of how to order your priorities in a “disaster-proofing” project.
The next step, of course, is to translate these priorities into action --
or, more accurately, a sequence of actions, a process. Moving actively
through a disaster-proofing project, you will implement the following
stages: Discovery; Trial; Implementation; and Evaluation. In each of the
first three stages, you will manage two phases, the application phase
(where you will evaluate network and application connectivity) and the
communications phase (where you will determine the implications for your
critical business communications). In the last stage, evaluations will
include an immediate post-implementation evaluation phase
and a maintenance phase. Each stage, and its constituent phases, will be
discussed in turn, chronologically, to help you form an impression of the
time and effort required (Table 1).
The steps we will outline will utilize an application-to-communications
analogy of implementation. By application, we mean the physical and
logical layers that implement the business process in an enterprise; by
communications, we mean the associative layers that connect the business
process to external applications, people, and processes. Each requires the
other to complete a total system, and each is equally important to the
operation of the entire enterprise. Our focus will be on the
communications aspect; however, a reliable recovery plan requires analysis
of critical applications that drive an enterprise’s communication
requirements.
DISCOVERY STAGE
Application Phase
This phase focuses on the critical applications in your business process,
but begins with a short review of the communications schematics obtained
in step three of the Top Ten list. The schematics should highlight all
critical applications and the required infrastructure. Once this review is
accomplished, critical application elements, like hardware sizing and
network bandwidth, will be much easier to estimate.
This is the perfect time to consider a location for your disaster
system. Many companies choose to house their disaster systems at distant
locations within their organization, while others choose to co-locate
these systems at data centers owned and operated by other companies. Each
option brings economic and functional advantages and disadvantages. The
most critical part of this phase is identifying critical applications:
doing so ensures that critical communications are accounted for and that
connectivity between those applications and the communications tools, be
they standard telephone lines, digital lines, or network bandwidth, is
sufficient.
The goal of this phase is for the initial design to account for as much
as possible. Granted, in many instances, particularly in large, complex
environments, this task can be very difficult. However, going back to fill
in spaces is not only time-consuming, but costly as well.
After completing the network and application design of the new disaster
recovery environment, we recommend engaging a third party to
evaluate what has been completed. Many consulting firms and outsourcers
provide these services on a contractual basis and prove to be helpful in
protecting the investment at each stage of development. This step could
also highlight shortfalls in the current design that would otherwise have
gone unnoticed.
Communications Phase
Once discovery of critical applications is complete, you continue
development by discovering critical communications. This phase takes the
design completed in the application phase (Phase I) and adds critical
communications tools to the mix. Critical communications tools could
include unified communications,
fax, e-mail, voice, text messaging, alerts, wireless communications,
computer telephony, IVR, and others.
Your critical communications should have been successfully identified
in step 4 of your Top Ten list. These communications elements are the ones
that can bring your business to a halt faster than a set of vented disk
brakes on a racecar if they are not taken seriously. In this phase, you
must include connectors or “links” between applications and the assets
(PBX/CO, third party) that deliver these communications as well. Unlike
the connections designed in Phase I (the application phase), these
connectors are created by the communications tools specifically for
communications.
An example of this type of connection is when an application
(application link between critical business applications and communication
server) sends a critical alert notification to a systems administrator
during a “system down” situation. The communications connection makes
it possible for notifications to be sent to the administrator regardless
of location, and they can be sent, for example, as a voice mail to a cell
phone, as a text message to a pager, or as a fax to a fax machine
(utilizing a communications link between communications server and
PBX/CO).
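The alert example above can be illustrated with a minimal Python sketch.
It is only an illustration under stated assumptions: the channel names,
the sender functions, and the contact numbers are hypothetical, and a real
system would hand each delivery to the communications server and PBX/CO
rather than return a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    source_app: str  # critical business application that raised the alert
    message: str


# Hypothetical senders: stand-ins for the link between the communications
# server and the PBX/CO. Each returns a description of the delivery.
def send_voice_mail(number: str, alert: Alert) -> str:
    return f"voice mail to {number}: [{alert.source_app}] {alert.message}"


def send_pager_text(number: str, alert: Alert) -> str:
    return f"text to pager {number}: [{alert.source_app}] {alert.message}"


def send_fax(number: str, alert: Alert) -> str:
    return f"fax to {number}: [{alert.source_app}] {alert.message}"


# Assumed channel names; extend with whatever tools your design includes.
CHANNELS: Dict[str, Callable[[str, Alert], str]] = {
    "voice": send_voice_mail,
    "pager": send_pager_text,
    "fax": send_fax,
}


def notify(alert: Alert, contacts: List[Tuple[str, str]]) -> List[str]:
    """Send one alert over every (channel, address) pair for a recipient."""
    deliveries = []
    for channel, address in contacts:
        sender = CHANNELS.get(channel)
        if sender is None:
            raise ValueError(f"no communications link for channel {channel!r}")
        deliveries.append(sender(address, alert))
    return deliveries
```

Raising an error for an unknown channel is a deliberate choice in this
sketch: a missing link is exactly the kind of gap the discovery stage is
meant to surface.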
Since these types of communications rely on the applications they
connect with, they are critical in disaster situations and should be
assigned due importance during this stage. Toward the end of this
phase, we would also recommend having a third-party specialist with
expertise in communications tools and infrastructure review the design of
the communications portion of your disaster recovery system. As in the
application discovery phase, third-party experts can also identify the
shortfalls or missed communications opportunities at this point in
development.
TRIAL STAGE
Application Phase
The application phase of the trial stage involves establishing trial
environments, identifying parameters, and executing realistic trial
situations. Establishing the environment will be the most critical part of
a successful trial. During this process, you must create a candidate data
set and establish trial subjects. You must also isolate a part of the
existing infrastructure to successfully “shut down” the existing
systems.
From the application side, you should create a candidate data set on
the existing system and duplicate it on the disaster system. You must also
establish a set of subjects within the organization during the trial
period, preferably on the disaster team, who can receive the test
communications as they are generated from the data set. During this
process, limitations of the system, such as insufficient bandwidth or
configuration issues, may be identified and corrected. A score sheet can
be generated during each revision to track the progress of the trial
phase.
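One way to keep such a score sheet is sketched below in Python. The
structure, the check names, and the pass/fail scoring are assumptions made
for illustration; any format that lets you compare one revision with the
next will do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrialRevision:
    """Score sheet for one revision of the trial environment."""
    label: str
    results: Dict[str, bool] = field(default_factory=dict)  # check -> passed?

    def pass_rate(self) -> float:
        """Fraction of checks that passed (0.0 if nothing was recorded)."""
        if not self.results:
            return 0.0
        return sum(self.results.values()) / len(self.results)

    def failures(self) -> List[str]:
        """Names of the checks that still need correction, sorted."""
        return sorted(name for name, ok in self.results.items() if not ok)


def progress(revisions: List[TrialRevision]) -> List[Tuple[str, float]]:
    """Pass rate per revision, in order, to show the trend across the trial."""
    return [(rev.label, rev.pass_rate()) for rev in revisions]
```

For example, a first revision might record a failed "bandwidth" check and
a passed "data set duplicated" check; after the bandwidth is corrected,
the next revision records both as passed, and progress() shows the
improvement.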
Communications Phase
The communications phase of the trial stage also involves establishing a
trial environment and focuses on ensuring critical communications are
linked and available during the trial. Since we are discussing
communications with disaster recovery requirements, we must be most
cautious in this phase of trials. Many times, business applications can be
working error-free, while communications are at a standstill. If these
critical communications falter, no application or network can prevent
interruption of the business process.
To execute this phase properly, you must first allocate communications
resources to the trial environment. We recommend establishing a separate
trial system. (You can imagine how running tests on your production
servers may arouse anxiety.) It will emulate the true response of the
system during a disaster better than if you were to isolate a piece of
your current environment. It also protects your existing critical
communications during the testing process.
After your test environment is configured, you should test the
communications tools separately. Have your application send requests as
fax, e-mail, voice mail, text messages, and alerts separately from each
other. To ensure that you evaluate just the fax component of the system,
for example, you must isolate the fax test from all other communications
during the trial. This will allow you to evaluate each tool and ensure
that there is the required infrastructure for each during periods of high
activity. For example, an application may send 2,000 faxes per day, but
the faxes may all be sent at 2:00 am. The system must be able to support
the full fax workload, regardless of the other communications that are
taking place simultaneously. Just as you did in the application phase of
the trial stage, you should track the progress after changes and
modifications in this phase with a score sheet as well.
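The fax example shows why daily totals can mislead: what matters is the
peak. A rough Python sketch of the arithmetic follows. The transmission
time per fax and the length of the delivery window are assumptions you
would replace with figures measured in your own trial; only the 2,000-fax
batch comes from the example above.

```python
import math


def fax_lines_needed(faxes_in_batch: int,
                     minutes_per_fax: float,
                     window_minutes: float) -> int:
    """Estimate how many fax lines a batch needs to finish within a window.

    Assumes every line is busy for the whole window and ignores retries,
    busy signals, and other traffic sharing the same lines.
    """
    if minutes_per_fax <= 0 or window_minutes <= 0:
        raise ValueError("durations must be positive")
    total_line_minutes = faxes_in_batch * minutes_per_fax
    return math.ceil(total_line_minutes / window_minutes)


# Assumed figures: 1.5 minutes per fax, batch must clear within 60 minutes.
lines = fax_lines_needed(2000, minutes_per_fax=1.5, window_minutes=60)
print(lines)  # 50
```

Spread evenly over 24 hours, the same 2,000 faxes would need only about
three lines under these assumptions, which is why each tool should be
tested against its busiest period rather than its daily average.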
IMPLEMENTATION STAGE
Application Phase
Now that the phases of the trial stage are complete and all required
modifications and adjustments have been accomplished before going online,
we can begin to discuss the process of bringing the systems together in a
production environment for the application phase of the implementation
stage. Since you proved the reliability of your systems during the trials,
this stage will most likely be quite simple and uneventful.
Keep in mind that the geographic location of your disaster site could
prolong the time you are offline. If you chose a distant data center for
your disaster recovery site, scheduling and organizing resources will
surely add some excitement to the implementation process. The focus of
this phase of implementation is the applications that link to your
critical communications. You should have confirmed that all applications
are configured and connected properly during the application phase of the
trial stage, allowing you to avoid possible slowdowns in this phase of
implementation. This will also allow you to check all application software
links on the communication server at the beginning of the communications
phase of the implementation stage.
Communications Phase
In this phase, you will be bringing the critical communications online by
connecting them to the production applications, IP network, and PBX/CO. In
this phase, there are efficiencies that can be realized by housing all
critical communications on a single server or system. It would be
beneficial to perform this phase during a downtime in the regular
production schedule, so there is less risk of delaying communications from
the applications. (Yes, such consideration means more late-night hours and
pots of coffee, but aren’t we used to that by now?)
We would also suggest modifying the delivery schedule of the
applications to send batches rather than execute real-time delivery, so
that you can define a time window to accommodate implementation, as well
as ensure that high levels of communications traffic may be supported
without issue. Once the application links are ready, you should see that
data is being exchanged over the IP network and that the PBX/CO is
delivering fax, voice, and wireless traffic. You have successfully
implemented a disaster recovery system for your critical communications.
Congratulations!
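The batching suggestion above can be sketched in a few lines of Python.
This is a simplified illustration, not a description of any particular
product: the class name, the cutover window, and the in-memory queue are
all hypothetical, and a production system would persist queued messages
rather than hold them in memory.

```python
from datetime import datetime
from typing import List


class BatchedDelivery:
    """Hold outgoing messages during a cutover window, then release a batch."""

    def __init__(self, cutover_start: datetime, cutover_end: datetime):
        if cutover_end <= cutover_start:
            raise ValueError("cutover window must end after it starts")
        self.cutover_start = cutover_start
        self.cutover_end = cutover_end
        self.pending: List[str] = []

    def in_cutover(self, now: datetime) -> bool:
        """True while implementation work is under way."""
        return self.cutover_start <= now < self.cutover_end

    def submit(self, message: str) -> None:
        """Applications queue messages instead of delivering in real time."""
        self.pending.append(message)

    def release(self, now: datetime) -> List[str]:
        """Return the queued batch once the cutover window has passed."""
        if self.in_cutover(now):
            return []
        batch, self.pending = self.pending, []
        return batch
```

Releasing everything at once after the window also gives you a controlled
burst of traffic, which is a convenient check that high levels of
communications traffic are supported.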
EVALUATION STAGE
Post-Implementation Phase
Now that you have been through the exercises, it is time to determine the
results of your efforts. About two months after the final implementation
is complete, we recommend having the disaster system development team
complete an internal process survey. This survey should review the steps
that were taken during the discovery, trial, and implementation stages of
developing the disaster recovery system. It should also survey the team
members to determine where the process works best, and where it needs
improvement.
Upon completion of the survey, the information you gain from your
experiences can be shared easily with management for justification
(probably the most important part of any large project), or with others
interested in preparing their systems for a disaster. The other important
part of a post-implementation evaluation is the development of a “wish
list” for the future. As we all know, these systems are never truly “complete,”
and there will always be upgrades, modifications, and additions.
In addition, your team will surely have learned more about
the capabilities of the communications system and its link to
applications. By having your team provide a communications wish list, you
can include these items in already-scheduled maintenance procedures. This
wish list will also act as a guide for evaluating new vendors if there is
ever a need to do so. By completing the survey and the wish list, you will
be able to aid others in the process and prepare expectations for future
actions.
Recurring Evaluation
The discovery, trial, and implementation stages are now complete, as is
the post-implementation phase of evaluation. You have
successfully developed and implemented a disaster recovery system for your
critical communications. Of course, maintenance measures must still be
put in place to ensure that your disaster system stays ready for
action.
We recommend performing regular maintenance on the system every
six to nine months, or whenever major engineering changes (within
applications and communications) are introduced into your environment. Regular
maintenance includes, but is not limited to, testing the fail-over
procedures, adding/removing business applications, and adding/removing
communications tools. Testing the fail-over procedures can be accomplished
on an application-by-application basis with most systems. The process
involves shutting down the link to the primary application. When this is
done, you can verify that the communications are transferred properly to
the disaster site.
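The application-by-application test described above might be scripted
along the following lines. This Python sketch assumes you can supply three
functions of your own (the names here are hypothetical): one that shuts
down the link to a primary application, one that restores it, and one that
confirms a test communication arrived by way of the disaster site.

```python
from typing import Callable, Dict, Iterable


def run_failover_checks(
    applications: Iterable[str],
    shut_down_primary_link: Callable[[str], None],
    restore_primary_link: Callable[[str], None],
    delivered_via_disaster_site: Callable[[str], bool],
) -> Dict[str, bool]:
    """Fail over one application at a time and record whether it worked."""
    results: Dict[str, bool] = {}
    for app in applications:
        shut_down_primary_link(app)
        try:
            # With the primary link down, communications should be
            # transferred to the disaster site.
            results[app] = delivered_via_disaster_site(app)
        finally:
            # Always restore the primary link, even if the check raises.
            restore_primary_link(app)
    return results
```

Testing one application at a time keeps the rest of production untouched
and makes it clear which link is responsible when a check fails.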
With some systems, you can shut down each link separately and test not
only applications, but also each voice and fax line separately. This helps
identify any load issues that might arise during a disaster. As your
system grows, the maintenance process will also allow you to schedule the
addition of other data sources, like a CRM application, e-commerce
application, or enterprise database. New critical communications tools,
like secure e-mail delivery or wireless messaging, can also be added
during this time. Maintenance is key to providing a disaster recovery
system that is not only reliable, but also sufficiently flexible to grow
with the needs of your critical communications.
SUMMARY
This article only suggests general issues that might serve as
a starting point for a complete project. It should not be used as a
complete roadmap, but merely as a set of ideas for consideration. Each
enterprise has unique requirements. Consultants specializing in the area
of communications disaster recovery and/or internal staff with extensive
knowledge of the communication processes should be involved in creating
the comprehensive disaster recovery project plan specific to your
enterprise’s communication requirements.
Ryan O’Neil is the marketing manager and Michael Cave is the
manager of consulting services for TOPCALL Corporation. TOPCALL (www.topcall.com)
is a market and technology leader in the end-to-end delivery of unified
communications solutions. TOPCALL focuses on the development and
deployment of communications solutions that enhance business processes.
The company offers world-class communication products, global support
services, and on-site consultation.
[ Return To The January 2002 Table Of Contents ]