The Mainframe's Role in Disaster Recovery Planning

Friday Sep 6th 2002 by Christian Traue

When it comes to disaster recovery, no piece of the IT infrastructure gets more attention than the 'big iron'. This being true, it makes sense to make your mainframe the center of your DR strategy.

We have all heard the statistics: when disaster strikes, 30% of the companies affected go straight out of business and another 29% close shop within the next two years. With today's geographical distribution of companies, remote locations and, accordingly, remotely located business-critical data, disastrous events and their impact are not limited to corporate headquarters. Disaster can happen anywhere, and even if the affected systems are physically far from the headquarters data center, the overall business can be affected in the worst possible way.

"But we back up all our servers," you may say --- and that's good. However, it may not be good enough.

My theory is that you need to consolidate backup data at the safest, most secure point in the enterprise. The S/390 mainframe, where available in a corporate IT infrastructure, is that point. Backing up all open systems data to this machine allows for sophisticated, efficient recovery strategies. It concentrates all business data at the one location that is most likely to go back on-line first, even in worst-case scenarios.

It's like back in school, where this one kid in your class had a big brother who would always be there in times of trouble. It is the same with the mainframe --- it will likely be the last system to go down and the first one to go on-line again.

The corporate data center and its mainframe typically have the most elaborate disaster recovery plans, tried and proven. Backing up open systems data to that platform brings all open systems under the mainframe data center's DR plan.

Business continuity planning is more than IT's plan for disaster recovery, and certainly more than a schedule for nightly backups. Business processes rely on applications and data, and open systems applications on departmental servers are essential to the business. Many backup/recovery solutions create discrete islands of data, one per department or, even worse, one per server.

In disaster situations, those islands of data, dozens, if not hundreds or thousands of them, all need to be managed. Then, decisions must be made on what to recover first - which systems to rebuild and which to let go.

If your company runs an S/390 mainframe, consider making it the hub for all enterprise backup --- and the central point for managing all recovery. Here's how it works. Day-to-day, all open systems data from applications, file servers and, potentially, workstations is backed up to the mainframe. From there, according to set policy, all or a subset of it is replicated to a disaster recovery site.
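The "all or a subset, according to set policy" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BackupSet` record, the `kind` labels, the `critical` flag and the `select_for_replication` function are hypothetical names invented for this sketch, not part of any S/390 or backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSet:
    """One backup stored on the mainframe hub (hypothetical record)."""
    source: str      # server or workstation the data came from
    kind: str        # "application", "file_server" or "workstation"
    critical: bool   # business-critical according to policy


def select_for_replication(backup_sets, replicate_kinds,
                           always_replicate_critical=True):
    """Return the backup sets that policy sends on to the DR site.

    A set is replicated if its kind is listed in the policy, or if it
    is flagged critical and the policy says critical data always goes.
    """
    return [
        b for b in backup_sets
        if b.kind in replicate_kinds
        or (always_replicate_critical and b.critical)
    ]
```

Replicating everything is simply the case where `replicate_kinds` lists every kind; a narrower policy trades replication bandwidth and DR-site storage against how much can be rebuilt after a switch-over.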

In a disaster situation, mainframe operations are switched over to the secondary site, and the open systems backup data goes along with them. At the secondary site, all open systems data is now available and can be recovered according to policy. The number of people involved is kept to a minimum thanks to centralized management and automation. This matches the volume of data a storage administrator can manage: roughly 750 GB per administrator on open systems versus around seven to ten TB per administrator on S/390, about an order of magnitude more (averages according to sources like Horison Information Strategies).
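To make the staffing difference concrete, here is a small worked calculation in Python. The per-administrator figures are the averages quoted above; the 30 TB total is a made-up example value, and 1 TB is taken as 1,000 GB for simplicity.

```python
import math

# Averages quoted in the article (attributed to Horison Information Strategies)
OPEN_SYSTEMS_GB_PER_ADMIN = 750
MAINFRAME_GB_PER_ADMIN_LOW = 7_000    # 7 TB, assuming 1 TB = 1,000 GB
MAINFRAME_GB_PER_ADMIN_HIGH = 10_000  # 10 TB


def admins_needed(total_gb, gb_per_admin):
    """Administrators required to manage total_gb, rounded up."""
    return math.ceil(total_gb / gb_per_admin)


total_gb = 30_000  # hypothetical enterprise with 30 TB of data

print(admins_needed(total_gb, OPEN_SYSTEMS_GB_PER_ADMIN))    # open systems
print(admins_needed(total_gb, MAINFRAME_GB_PER_ADMIN_LOW))   # S/390, low end
print(admins_needed(total_gb, MAINFRAME_GB_PER_ADMIN_HIGH))  # S/390, high end
```

For this hypothetical 30 TB, the figures work out to 40 administrators on open systems against three to five on the mainframe.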

For clarity, let's look at an example. Consider an insurance company with a sprawling campus at their corporate headquarters. In addition to the data center with its S/390 mainframe, the company has departments with their own 'little' server rooms. Each department with a server infrastructure of its own relies on the applications and data on those servers. At the corporate data center, a number of the most critical departmental servers are co-located; however, not all departmental systems can be kept there.

At this insurance company, disaster recovery policies and procedures are in place, both at the data center level and for the departments. Backup systems are also in place, again at both the department and data center levels. Depending on the value that specific data has to the business, disk mirroring is used to protect some data.

In the real world, the 'pretty picture' ends here and things become messy. Even in the data center we often find more than one open systems backup/recovery solution. Typically, additional backup/recovery products can also be found in the departmental server rooms. In some places, we even find more than one backup/recovery product in the same department, and in many places the number of different backup/recovery products in use is close to the number of departmental server rooms.

Now consider a disastrous event in such an environment. The data center will have a hot site, either internally or with an external service provider. Some of the departments will also have such a service set up for some of their servers.

But if, for example, the sprinkler goes off in the server room of the marketing department (the server room was set up for an interim period, but that was two years ago), systems and data will get lost. And if a nearby river floods more than the expected area (see Europe this summer of 2002), all systems and data are at risk.

If the data center is prepared for such an event, the switch-over to the disaster recovery site happens quickly and mainframe operations resume with very little delay. The situation is different for the open systems servers, though. Those highly critical systems that were mirrored to the departmental hot site will go on-line fairly quickly, too. The others won't. Also, consider the number of people and the coordination effort required to achieve even this level of recovery.

On the mainframe side, a staff of two can handle disaster recovery for all applications and data. Recovering the many different open systems quickly, by contrast, requires a much higher ratio of administrators to systems.

If the open systems data had been backed up to the mainframe, it would have been available at the data center disaster recovery site and servers could be rebuilt there, quickly, managed by a small staff, and according to set policies which adequately reflect the business relevance of specific data and applications.
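Recovering "according to set policies which adequately reflect the business relevance" amounts to an ordered rebuild list. The following Python sketch shows one way to express that. The `Server` record, the numeric `business_priority` scale (1 = most critical) and the tie-break on data volume are assumptions made for illustration, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """An open systems server whose backup sits on the mainframe (hypothetical)."""
    name: str
    business_priority: int  # 1 = most critical; assumed policy scale
    data_gb: int            # volume of backup data to restore


def recovery_order(servers):
    """Order servers for rebuild at the DR site.

    Most business-critical first; among equals, the smaller restore
    goes first so that more systems are back on-line sooner.
    """
    return sorted(servers, key=lambda s: (s.business_priority, s.data_gb))
```

Because every backup lives on one system, a single list like this can cover all departments, instead of each server room deciding on its own what to rebuild first.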

Such a solution is coordinated and managed by default because all recovery is based on a single system.

The Meta Group describes the difference between data center and open systems backup like this: "...mainframe data centers have backup procedures down to a science, with elaborate policies, procedures, storage sites, [and] proven backup tools..." whereas "...outside mainframe environments, backup processes remain spotty, with highly critical servers being the only universally guaranteed assets." (Source: Meta Group market study, "Disaster Recovery and Business Continuity Planning: Key to Corporate Survival")

So, when you're faced with consolidating backup and recovery for an enterprise, include the mainframe in the solution. Like that kid's big brother, the big iron will be there when you need it most.

About the Author: Christian Traue is the Director of Product Management for Tantia Technologies.
