FleetBoston Financial recently tackled its disaster recovery time. Our Datamation reporter looks at how the company improved its system without breaking the bank.
When FleetBoston Financial evaluated its disaster recovery responsiveness, the company received a shock.
IT administrators there realized that it would take at least two full days to recover their data and systems after a serious disaster. By applying EMC Symmetrix in conjunction with SunGard electronic vaulting services, FleetBoston shrank its disaster recovery window for critical systems from 48 hours to less than an hour.
''We have achieved a recovery window of less than one hour on critical systems and four to eight hours for a complete recovery,'' says Lari Sue Taylor, senior vice president of technology at FleetBoston Financial.
FleetBoston Financial is a financial services company with assets of $196 billion, and over 18 million individual, corporate, and institutional customers. Products and services are available through a variety of channels, including 1,460 stores and more than 3,400 ATMs from Maine to Pennsylvania, its HomeLink online banking, and telephone banking. Fleet focuses mainly on small business and commercial banking in the Northeast U.S. market.
The bank now is in the process of merging with Bank of America.
A few years ago, when its ATM network had expanded significantly and online banking started to take off, management realized that disaster recovery needed a complete rethink. They used two key metrics in evaluating disaster recovery or business continuity planning:
Recovery Time Objective (RTO) is the maximum length of time that a business process can be unavailable. This is measured in terms of time elapsed from the beginning of a disaster until the systems are operating again.
Recovery Point Objective (RPO) is how much work in progress can be lost. If all work must be recovered, then the business must align its disaster recovery actions to achieving zero RPO. Some businesses, however, may elect to have an RPO of one day, for example, on the understanding that if they lost one day's transactions, they could recreate them by interviewing sales staff etc.
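The two metrics can be made concrete with a small calculation. The following is a minimal sketch, not anything FleetBoston is reported to have used: the Incident record, the function names, and the timestamps are all hypothetical and chosen only to illustrate how an outage is checked against an RTO and an RPO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one outage (illustrative only)."""
    disaster_start: datetime    # when the disaster began
    service_restored: datetime  # when systems were operating again
    last_recoverable: datetime  # newest point in time whose data survived


def recovery_time(incident: Incident) -> timedelta:
    """Time elapsed from the start of the disaster until systems run again."""
    return incident.service_restored - incident.disaster_start


def data_loss_window(incident: Incident) -> timedelta:
    """Span of work in progress that could not be recovered (never negative)."""
    lost = incident.disaster_start - incident.last_recoverable
    return max(lost, timedelta(0))


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> bool:
    """True only if both the recovery time and the data loss are within target."""
    return recovery_time(incident) <= rto and data_loss_window(incident) <= rpo


# Made-up example: systems back after 18 hours, last 2 hours of work lost.
incident = Incident(
    disaster_start=datetime(2000, 1, 10, 9, 0),
    service_restored=datetime(2000, 1, 11, 3, 0),
    last_recoverable=datetime(2000, 1, 10, 7, 0),
)

print(recovery_time(incident))
print(data_loss_window(incident))
print(meets_objectives(incident, timedelta(hours=24), timedelta(days=1)))
print(meets_objectives(incident, timedelta(hours=24), timedelta(0)))
```

In this made-up case the outage satisfies a 24-hour RTO with a one-day RPO, but fails as soon as the RPO is tightened to zero, because two hours of work could not be recovered.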
FleetBoston chose what at that time was regarded as an aggressive RTO of 24 hours and an RPO of zero.
''We would suffer significant business impact from transaction loss,'' says Taylor, ''so we had no choice but to opt for the zero RPO.''
In support of this, she cites a Gartner Group study that found 93 percent of companies that experience a major data loss go out of business within five years.
Improving Recovery Time
FleetBoston was already using EMC Symmetrix and tape libraries for daily and weekly backups. During testing of its disaster recovery responsiveness, however, the company discovered it would take 24 hours just to restore data from tape to disk. And it was only possible to meet the 24-hour time frame if everything went smoothly.
The company, therefore, searched for an improved approach to disaster recovery.
Administrators selected Electronic Vaulting Services by SunGard, in conjunction with an EMC product called Symmetrix Remote Data Facility (SRDF). SRDF combines storage hardware and software. It lets users copy data to a remote, secure location without requiring any IT downtime. In the event that backup data needs to be retrieved, SRDF can recover hundreds of terabytes of information within hours, according to EMC.
Before the purchase, FleetBoston's auditors voiced concerns that the approach was too bleeding-edge and lacked proven results in the real world. The company therefore interviewed early adopters to uncover any problems. This brought several issues to the surface, with distance limitations and channel extension the top concerns.
A multiplexer channel provides the physical connection that allows input and output devices to communicate with the computer. It typically requires devices or their control units to be within 200 to 400 feet of the mainframe computer. Channel extension technology removes that limit, extending the computer's multiplexer channel to anywhere in the world.
''Our primary data center was 120 miles away from our remote recovery center so channel extension was necessary,'' says Taylor.
She evaluated channel extension products from Computerm and InRange before choosing Computerm Adaptive Copy. The use of Computerm and Symmetrix, though, meant that the company would have to use an asynchronous mode of data transfer between one site and another. That means there would be a delay of a few seconds between transactions being processed in the main data center and their being transferred to the remote disaster recovery site.
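The effect of that delay on data at risk can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not a description of how the Computerm or EMC products work internally: the function name, the fixed lag, and the commit times are all hypothetical.

```python
def unreplicated(commit_times, failure_time, lag_seconds):
    """Return the commit times (in seconds) of transactions that were
    processed at the primary site before it failed but had not yet
    reached the remote site.

    Assumes every transaction arrives at the remote site exactly
    lag_seconds after it is committed at the primary (a simplification).
    """
    return [
        t for t in commit_times
        if t <= failure_time and t + lag_seconds > failure_time
    ]


# Hypothetical workload: one transaction per second, primary fails at t = 9.5 s.
commits = list(range(10))  # commit times 0, 1, ..., 9

# Asynchronous transfer with an assumed 3-second lag.
print(unreplicated(commits, failure_time=9.5, lag_seconds=3))

# Zero lag, standing in for synchronous transfer: nothing is left behind.
print(unreplicated(commits, failure_time=9.5, lag_seconds=0))
```

With the assumed three-second lag, the transactions committed at seconds 7, 8, and 9 are still in flight when the primary site fails. That in-flight window is the exposure an asynchronous design accepts in exchange for spanning long distances.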
The combined EMC/SunGard/Computerm architecture adopted by FleetBoston was successfully implemented.
From 48 hours or more, the RTO came down to less than one hour for critical systems. During one major emergency when all systems were down at one data center, the remote site took over seamlessly. According to Taylor, this one event paid for the technology immediately since it prevented large-scale revenue loss.
While FleetBoston administrators are happy with the current disaster recovery functionality, the system is still evolving. One major issue is whether the company should continue to replicate all mainframe data, or whether it can apply Information Lifecycle Management techniques to minimize the amount of data that has to be transmitted during backups and during system recovery. This also would free up bandwidth for more productive uses.
The company also is looking to mirror its data more frequently. It currently mirrors all of its data every two hours. Taylor is investigating ways to shorten the interval between mirrors without significantly increasing costs.