Data centralization has brought banks the advantages of lean operation and management, but it has also concentrated information system risks. As a result, business growth places increasing pressure on banks to keep their information systems running without interruption. It is therefore essential to strengthen the backup system for disaster recovery and to maintain continuous business operations, both of which are expected to strongly affect competitiveness in the banking market.
Developing a backup system for disaster recovery is a complicated and systematic project. It should draw on domestic and international best practices, follow established standards and specifications, and take into account the characteristics of the organization and the particulars of its information systems. Developed countries have accumulated extensive experience in the backup and recovery of information systems after disasters and have established long-tested standard systems, such as the Professional Practices for Business Continuity Planners issued by DRI International (DRII) and the COBIT Management Guidelines issued by the Information Systems Audit and Control Association (ISACA). China has also issued relevant standards and specifications, including the Information Security Technology - Disaster Recovery Specifications for Information Systems (GB/T 20988-2007) and the Information Security Technology - Risk Assessment Specification for Information Security (GB/T 20984-2007).
In order to regulate and guide the disaster recovery of information systems in the banking industry, effectively prevent information system risks, and protect the rights and interests of customers, the PBC developed and published the Management Specification of Information System Disaster Recovery for Banks (JR/T 0044-2008) (hereinafter the “JR/T 0044”) based on advanced international and national standards. It is a financial industry standard tailored to the information systems, business processes, organizational structure and regulatory requirements of the banking industry. The JR/T 0044 has become an important guide for the development of backup systems for disaster recovery in China’s banking industry.
II. Overview of the Standard
The JR/T 0044 systematically describes and regulates disaster recovery in seven major parts: Organization, Disaster Recovery Requirements, Disaster Recovery Strategy, Backup System for Disaster Recovery, Operation and Maintenance of Disaster Recovery Center, Disaster Recovery Plan, and Emergency Response and Disaster Recovery.
(I) Structure and responsibility of organization. According to the JR/T 0044, the organization should have three departments: decision making, management and execution. They are responsible, respectively, for deciding major issues, for management and coordination, and for execution.
(II) Disaster recovery requirements analysis. The JR/T 0044 describes in detail the methods and steps for analyzing the requirements of disaster recovery.
(III) Formulation of disaster recovery strategies. The JR/T 0044 specifies the methods for developing disaster recovery strategies and their major contents: determining the disaster recovery capability level of key functions, the disaster recovery distribution model, and how to obtain and guarantee backup resources for disasters, all based on the principle of balancing costs and risks.
(IV) Implementation of the backup system for disaster recovery. The JR/T 0044 makes clear the basic requirements for constructing the disaster recovery infrastructure and developing the backup system for disaster recovery.
(V) Operation and maintenance of disaster recovery center. The JR/T 0044 specifies the operation and management rules for disaster recovery centers and the main contents of operation and management.
(VI) Disaster recovery plan. The JR/T 0044 introduces in detail the main contents, preparation principles and preparation process of the disaster recovery plan.
(VII) Emergency response and disaster recovery. The JR/T 0044 describes the emergency measures to take and the important steps to follow when an emergency or disaster affects the information system.
III. Application of the JR/T 0044 at ABC
ABC has attached great importance to the development of its backup system for disaster recovery and has treated the system as a major project among its IT applications. Following the JR/T 0044 and considering its actual situation, ABC planned and designed a comprehensive framework comprising several disaster recovery projects. By sequencing tasks appropriately and advancing the system development steadily, ABC has achieved desirable results, effectively improved its risk control capabilities, and raised the confidence of shareholders, customers and business partners.
(I) Establishing the Backup and Rescue Test Center and defining the work of disaster recovery management
The backup system for disaster recovery is vast and complicated. Its development requires the collaboration of multiple departments and takes a long time to complete, so it is desirable to establish a dedicated department to lead the work. To promote the system development, ABC set up the Disaster Recovery and Test Center in 2008, under which the Disaster Recovery Management Department was formed to plan and manage the project. According to the JR/T 0044, a disaster is “an emergency that occurs due to man-made or natural reasons, causes material breakdown or failure of the information system or material data loss of the information system, and leads to operation interruption or unacceptable underperformed service for a certain period of time”. From the viewpoint of emergency response and disaster recovery (introduced in Appendix A of the JR/T 0044), disaster recovery management is the management of backup and disaster recovery of the information system in case of extreme disasters. Backup and emergency pre-warning management are conducted on the basis of the routine operational management system. When the information system breaks down because of an emergency, the decision-making level needs to determine whether the emergency should be identified as a disaster, by assessing whether the time required for system recovery exceeds the Recovery Time Objective (RTO: the duration of time within which the information system must be restored after a disaster), and whether to declare it a disaster. The disaster recovery process or the emergency response process is then activated accordingly. Based on analysis and study, ABC has defined the conceptual relationships between disaster recovery management, emergency management, routine operational management, IT continuity management and business continuity management, as illustrated in the figure above. It is noteworthy that disaster recovery management applies to physical disasters only. Logical disasters fall under emergency management.
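The declaration rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Incident` record, its field names and the `choose_process` function are hypothetical and do not come from the JR/T 0044, and in practice the decision rests with the decision-making level rather than with an automatic rule.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative only."""
    system: str
    is_physical: bool              # True for physical damage, False for a logical fault
    estimated_recovery: timedelta  # estimated time needed to restore the system


def choose_process(incident: Incident, rto: timedelta) -> str:
    """Return the process to activate for an information system breakdown."""
    # Logical disasters are handled under emergency management.
    if not incident.is_physical:
        return "emergency_response"
    # A physical emergency is treated as a disaster only when the
    # estimated recovery time exceeds the RTO.
    if incident.estimated_recovery > rto:
        return "disaster_recovery"
    return "emergency_response"
```

The sketch keeps the two tests separate (physical versus logical, then recovery time versus RTO) so that each condition in the text maps to one branch.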
(II) Planning the system development and setting out a roadmap
Based on the framework and arrangement of its information systems, ABC has planned the development of the backup system for disaster recovery. The work will be conducted “at three levels and through five projects”. The three levels are the Head Office, tier-1 bank branches and tier-2 bank branches in the information framework. The five projects are Phase 1 and Phase 2 of the remote disaster recovery project sponsored by the Head Office Data Center, the local disaster recovery project sponsored by the Head Office Data Center, the disaster recovery project carried out by tier-1 bank branches, and the disaster recovery project undertaken by tier-2 bank branches. Phase 1 and Phase 2 of the remote disaster recovery project are based on the existing resources in Beijing and will also draw on the Beijing Data Center, which is now under construction. ABC plans to complete the basic backup system for disaster recovery through the five projects, and to keep maintaining and improving it in response to feedback from drills and tests.
(III) Designing a framework and developing the backup system for disaster recovery
According to the JR/T 0044, a disaster recovery plan is “the action plan prepared in advance for the organization, process and resources required for the recovery of an information system under a disaster, which is used to guide relevant personnel how to restore key business functions supported by the information system in light of the preset recovery objective”. Based on this definition, ABC has designed a framework for the system development that covers the disaster recovery strategies, the organization, the disaster recovery plan and the management rules.
1. Analyzing business impact and developing disaster recovery strategies
As required by the JR/T 0044 to “determine the recovery priority of the information system by analyzing the business functions and impacts of the business interruption and considering the dependence between all information systems”, ABC has sorted out all of its information systems. First, ABC analyzed the locations of all the information systems, the relationships between them, the businesses they support, and the impact of business interruption on them. Second, ABC ranked the recovery demands of all information systems and determined their recovery priorities. Third, ABC defined the scope of information systems covered by disaster recovery and their RTOs, based on the resources currently available.
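One way to combine business impact with inter-system dependence when ranking recovery priorities is a priority-aware topological ordering. The Python sketch below is an assumption for illustration, not the method prescribed by the JR/T 0044 or used by ABC: the function `recovery_order`, the numeric impact scores and the rule that a prerequisite inherits the highest impact of the systems relying on it are all hypothetical.

```python
import heapq


def _schedule(systems, prereqs, dependents, key):
    """Kahn's algorithm; among ready systems, the smallest key goes first."""
    remaining = {s: len(prereqs[s]) for s in systems}
    ready = [(key(s), s) for s in systems if remaining[s] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, system = heapq.heappop(ready)
        order.append(system)
        for d in dependents[system]:
            remaining[d] -= 1
            if remaining[d] == 0:
                heapq.heappush(ready, (key(d), d))
    if len(order) != len(systems):
        raise ValueError("circular dependency between systems")
    return order


def recovery_order(impact, depends_on):
    """Order systems for recovery.

    impact: system name -> business impact score (higher means more critical).
    depends_on: system name -> systems that must be running first.
    Every system named in depends_on is assumed to appear in impact.
    """
    systems = list(impact)
    prereqs = {s: set(depends_on.get(s, ())) for s in systems}
    dependents = {s: [] for s in systems}
    for s in systems:
        for p in prereqs[s]:
            dependents[p].append(s)

    # Pass 1: any valid dependency order (also detects cycles).
    base = _schedule(systems, prereqs, dependents, key=lambda s: 0)

    # A prerequisite inherits the highest impact of anything relying on it.
    effective = {}
    for s in reversed(base):
        effective[s] = max([impact[s]] + [effective[d] for d in dependents[s]])

    # Pass 2: respect dependencies, highest effective impact first.
    return _schedule(systems, prereqs, dependents, key=lambda s: -effective[s])
```

With the hypothetical inputs `impact = {"network": 1, "core_banking": 10, "reporting": 5}` and `depends_on = {"core_banking": {"network"}}`, the low-impact network is scheduled first because the high-impact core system cannot run without it.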
ABC adopts the “two locations and three centers” mode at the Head Office level. The remote disaster recovery center (DRC) shares computer room infrastructure, network, systems and supporting operation facilities with the software development and testing sector, in line with the principle of balancing costs and risks advocated by the JR/T 0044. At tier-1 and tier-2 bank branches, ABC adopts the mode of “one backup for multiple hosts”.
2. Establishing the disaster recovery command system and specifying responsibilities
In accordance with the requirements of the JR/T 0044 on the disaster recovery organization, ABC has established a disaster recovery command system with three levels (decision making, management and execution) within its existing organizational structure. The decision makers are senior managers, who authorize activation of the disaster recovery plan and decide major issues. The management level consists of the heads of the business, technology and logistics departments, who work under the leadership of the decision makers on resource coordination and work arrangement. The executors are staff from the business, technology and logistics departments, together with people from relevant external institutions, who work under the leadership of the management level to carry out the recovery and maintain operations afterwards. Information is expected to flow promptly between the three levels and among staff at each level through disaster recovery work mechanisms, such as command/reporting, coordination, liaison and logistics mechanisms, so that the recovery process runs smoothly.
3. Developing disaster recovery plan and organizing exercises
ABC first designed a framework for the disaster recovery pre-arranged plan, which covers the main elements described in the JR/T 0044, such as “organization, process and resources”. Next, ABC designed the document template for the pre-arranged plan and compiled the Disaster Recovery Plan of Information Systems for Data Center. The pre-arranged plan comprises three parts. The first part introduces the establishment of the backup system for disaster recovery, including strategy formulation, establishment of the organization, and resource allocation. The second part describes the disaster recovery workflow, which shows how the DRC responds to and recovers from a disaster in order to reconnect and operate the information system. The third part consists of appendices, which hold confidential information (such as the technical configuration of the system and network, and operation manuals), frequently changing content (such as the summary of tasks and the list of contacts) and working document templates (such as report and work order formats). The biggest advantage of this structure is that the first two parts, which form the main body of the plan, are relatively stable and require little modification, so the plan is easy to maintain and manage. It also means that lower-level branches can apply the framework to their own plans almost directly and only need to work out the appendices according to their own situations.
While building up the backup system for disaster recovery, ABC drafted the corresponding disaster recovery plan and then organized disaster recovery drills. The JR/T 0044 states that “The drills aim to test the completeness, usability, clarity, effectiveness and compatibility of disaster recovery plans so that the gaps will be covered to improve the institution’s execution capability”. A completed disaster recovery plan is therefore not necessarily a sound one. Drills of different forms and scopes are needed to test whether the plan works well and effectively. ABC has made detailed drill plans in line with the provisions of the JR/T 0044 on the objectives, forms, levels, implementation and assessment of drills, and on revision of the plan. The drill plans are built on drill plan templates, which cover (1) preparations, such as drill organization and management and technical equipment; (2) the objectives and orientations of virtual drills, mock drills and real-case drills; (3) implementation plans, including the drill workflow script, expected results, drill risk control and drill assessment standards; and (4) the drill implementation process, such as how to control and record the drill, collect problems and draw conclusions.
4. Formulating disaster recovery management rules and maintaining the backup system for disaster recovery
The JR/T 0044 dedicates a chapter to “operation, maintenance and management of the disaster recovery center” and a section to “management of the disaster recovery plan”. Furthermore, the JR/T 0044 gives priority to the disaster recovery management rules, which are expected to guarantee the sustainable development and effective operation of the disaster-recovery backup system. ABC plans to issue the Implementation Rules for Disaster Recovery Management at the Data Center for the Head Office and the Implementation Rules for Disaster Recovery Management at Branches for its local branches, both consistent with the higher-level Emergency Response Plan for Material Emergencies and Implementation Rules for Emergency Management for Information Systems. The disaster recovery management rules mainly cover plan management (such as problem management, change management, version management and release management), test management, drill management, disaster recovery project management, training management, supervision management, operation and maintenance management of the backup system for disaster recovery, and disaster response and action management.
IV. Achievements and Social Benefits of the Development of ABC Disaster-Recovery Backup System
Guided by the JR/T 0044, ABC has made remarkable achievements in the development of its backup system for disaster recovery. In “the remote disaster recovery project Phase 1”, (1) the disaster recovery strategy and disaster recovery plan have been completed; (2) core business data can be transmitted between Shanghai and Beijing through real-time asynchronous remote backup; and (3) the remote disaster-recovery backup system and network have been established, meeting the requirements of level-5 disaster recovery. If a disaster occurs at the Shanghai Data Center, ABC can quickly restore the transaction channels of counters, ATMs and self-service terminals as well as the 95599 hotline, so the disaster would have little impact on ABC’s fundamental businesses, including passbook business, bank card business, asset management business and allied bank business. In October 2010, ABC completed its first remote disaster recovery drill on a few important information systems at the Head Office Data Center. The drill was comprehensive and business-oriented and integrated multiple technologies. This indicates that ABC is becoming able to recover its core business quickly in case of a disaster. In addition, the ABC Beijing Data Center is under construction. It is designed as a 20,000-square-meter computer cluster that will host the Head Office’s remote backup system for disaster recovery and the management information system. The Shanghai Local Disaster Recovery Center is currently under design.
The “disaster recovery projects at tier-2 branches” have been piloted in selected ABC branches, which have developed disaster recovery strategies, disaster recovery plans and templates for the response plans. In August 2010, the Head Office, Zhejiang Provincial Branch and Shaoxing Municipal Branch conducted a joint disaster recovery drill on information systems at tier-2 branches and subordinate outlets. In this drill, all cash and non-cash counters and ATMs were switched to the disaster recovery line and servers and conducted business on them for one day, signifying a breakthrough in ABC’s development of the backup system for disaster recovery.
ABC has established and is improving its backup system for disaster recovery, which constitutes an important part of comprehensive risk management. The system not only protects the business data and business continuity, but also builds up confidence of shareholders, customers and partners. What ABC has been doing significantly increases ABC’s market competitiveness and provides a solid foundation to realize its “3510” plan and build itself into “an outstanding and large listed bank”.
ABC has gained extensive experience in the development of its backup system for disaster recovery through study, research and application of the JR/T 0044.
(I) Focusing on orientation of the disaster recovery management
The senior management should (1) pay attention to and support the development of the backup system for disaster recovery, (2) define the orientation of disaster recovery management, (3) differentiate disaster recovery management from emergency management, IT continuity management and business continuity management, and (4) connect disaster recovery management with routine operational management. An organizational structure for disaster recovery can then be established, improved and merged with the emergency management organizational system. In addition, the development of the backup system for disaster recovery involves the whole bank and requires the active participation of many departments. Therefore, to manage the disaster-recovery backup system smoothly, the work orientation should be clarified and the investment in human and other resources should be secured, so that a core working group of key staff from each department can be set up to direct and coordinate the relevant departments in the initial system development.
(II) Focusing on the business impact analysis and business recovery plan
The objective of business impact analysis is to reveal the needs of business recovery. Its results are the basis for the disaster recovery strategy, which in turn has a decisive impact on the preparation of the disaster recovery plan and drills. However, since business departments may apply different standards, perspectives and measurements to business impact analysis, it is necessary to choose an appropriate department to lead the analysis. The business recovery plan is also very important. Technology and business are two inseparable lines during disaster recovery. Technological recovery alone is inadequate, and a recovery plan from the perspective of business management must be developed to ensure that business operation is restored.
(III) Focusing on continuous drills on disaster recovery
The disaster recovery plan should be effective, complete, usable, clear and compatible. The best way to test these features is to conduct repeated disaster recovery drills. Drills help to test the disaster recovery capacity of a backup system, improve the disaster recovery process, correct defects in the system, and form a disaster recovery team with high professional morale and expertise. In accordance with the JR/T 0044, ABC divided drills into three types: the virtual drill, the mock drill and the real drill. Following the principle of “from simple to complex, from virtual to real, from one aspect to the whole system”, ABC gradually escalated the level of its drills and carried them out continuously.
(IV) Focusing on whole-process management of the disaster-recovery backup system
In order to ensure the sustainable development of the backup system for disaster recovery, maintain the long-term effectiveness and feasibility of the disaster recovery plan, and continuously improve the skills of the disaster recovery staff in switching and operation, an effective whole-process management and maintenance mechanism should be set up to cover the system development and ongoing operation. Such a mechanism should be incorporated into the initial development and routine management of the backup system for disaster recovery, and implemented in accordance with the disaster recovery management rules.