FDRDRP



FDRDRP (Disaster Recovery Program) is a utility for optimizing full-volume recovery from ABR volume backups (full-volume and incremental backups). It is intended for use at disaster recovery sites, for volumes where full-volume recovery is more appropriate, such as system volumes and high-priority production volumes that must be recovered quickly.

FDRDRP recovers the complete image of each DASD volume like a jigsaw puzzle. Different pieces of the backup of each volume (the ABR incremental and full-volume backups) may be read at various times, with restore activity for other volumes interspersed, but FDRDRP assembles the backups in their proper places on DASD to reconstruct the image of the original DASD volume. Even though the backup tapes contain backups for many DASD volumes, the tapes are mounted a minimum number of times.

ABR volume recovery

ABR full-volume recovery starts by reading the most recent ABR cycle (incremental backup) for a given DASD volume, then reads the next most recent, and so on, until it reads the most recent full-volume backup (cycle 00) of that DASD volume and completes the recovery. Since these backups are probably on separate tapes, the restore usually mounts and unloads a backup tape for each cycle to be read. For more information on volume backups, generations, cycles, and full-volume recovery, see Overview-of-FDRABR-Volume-Backups.

Although an ABR full-volume recovery step may request the restore of many DASD volumes, a normal ABR full-volume recovery restores one DASD volume at a time. Since a given backup tape may contain the backups of many DASD volumes, the same tapes may be unloaded and remounted repeatedly, taking considerable time and overwhelming operators and an Automated Tape Library (ATL).

Even if ABR recoveries are run in separate jobs, allowing the restores to be run in parallel, the same tape mounts occur and the restore jobs may contend for the same backup tape volumes, so the situation is no better. Multiple restore jobs are useful only when each uses a different set of input tapes.

This is particularly a problem on modern high-capacity tapes such as the IBM TS1130, since they can hold the backups of many DASD volumes. High-capacity tapes use fewer tape volumes to hold your backups, but those volumes must be mounted repeatedly.

However, even on lower-capacity tapes, such as 3490E, tapes may need to be remounted many times during the restore of the requested DASD volumes.

The FDRDRP solution

FDRDRP processes multiple full-volume recovery tasks in parallel. It manages usage of the backup tapes required for those restores, so that each backup tape is mounted a minimum number of times, usually one mount per tape volume. This greatly reduces the elapsed time required to recover the volumes and eliminates most of the extra tape mounts required by ABR.

The nature of tapes is that only one task can use a given tape volume at a time. FDRDRP manages the use of tapes by this process (slightly simplified):

  • A recovery subtask is started for each volume specified by a SELECT statement in the FDRDRP input. However, the DASD volumes are sorted by the tape volume serial and file sequence number required for the first backup each reads, so that the subtasks read the backup files on a tape in physical order with minimal positioning.
  • A restore subtask dynamically allocates and mounts the first backup tape it requires.
  • If additional restore subtasks require other backup files on the same tape, they wait on the owning subtask to release it. As a subtask finishes with a tape volume, it passes the tape to a waiting subtask without dismounting the tape. If no other subtask is waiting for that tape, the tape is no longer needed and is de-allocated and dismounted (unloaded). If more than 512 DASD volumes are requested, FDRDRP restores them in groups of 512 volumes at a time.
  • If the subtask that just passed a tape volume needs another tape volume, it allocates and mounts it (or waits if another subtask happens to be using that volume). If the restore is complete (after reading cycle 00, the full-volume backup), the subtask terminates.
  • When a restore subtask requires a backup tape that is not currently in use by another subtask, it allocates and mounts the tape. However, the MAXTAPES= operand on the RESTORE statement limits the total number of tape drives in use, allowing you to specify the number of drives you are devoting to the FDRDRP restore step. If a subtask requires a backup tape that is not currently in use but MAXTAPES drives are already in use by this FDRDRP step, the subtask waits until the count of active tape drives decreases.
  • Once all restore subtasks have completed, the FDRDRP step terminates.
  • If you are running multiple FDRDRP jobs, each restoring a different set of DASD volumes, but two such jobs happen to require the same tape volume at the same time, FDRDRP passes the volume from one job to the other without dismounting it.
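
The steps above can be illustrated with a small Python simulation. This is only a toy model under stated assumptions, not FDRDRP's actual implementation: the names Subtask, simulate, and plan are hypothetical, time is counted in whole units, each backup file has an assumed positive read duration, and the 512-volume grouping and multi-job tape passing are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    volume: str
    backups: list          # [(tape, file_no, time_units)], newest cycle first
    pos: int = 0           # index of the backup file currently needed
    remaining: int = 0     # time units left on the file being read (0 = idle)

    @property
    def done(self):
        return self.pos >= len(self.backups)

    @property
    def needed_tape(self):
        return self.backups[self.pos][0]


def simulate(plan, maxtapes):
    """Return (tape_mounts, elapsed_time_units) for the toy model."""
    if maxtapes < 1:
        raise ValueError("maxtapes must be at least 1")
    # Start subtasks ordered by tape serial and file number of their first backup.
    tasks = sorted((Subtask(v, list(b)) for v, b in plan.items()),
                   key=lambda t: t.backups[0][:2])
    mounted = set()        # tapes currently on a drive
    in_use = {}            # tape -> subtask reading it
    mounts = elapsed = 0

    while not all(t.done for t in tasks):
        # 1. Idle subtasks try to obtain the tape they need.
        for t in tasks:
            if t.done or t.remaining:
                continue
            tape = t.needed_tape
            if tape in in_use:
                continue                       # wait for the owning subtask
            if tape not in mounted:
                if len(mounted) >= maxtapes:
                    continue                   # wait for a free drive
                mounted.add(tape)
                mounts += 1
            in_use[tape] = t
            t.remaining = t.backups[t.pos][2]

        # 2. One time unit of restore activity.
        for tape, t in list(in_use.items()):
            t.remaining -= 1
            if t.remaining == 0:               # file finished: release the tape
                t.pos += 1
                del in_use[tape]
        elapsed += 1

        # 3. Keep a released tape mounted only if an idle subtask needs it next.
        wanted = {t.needed_tape for t in tasks if not t.done and not t.remaining}
        mounted = {tape for tape in mounted if tape in in_use or tape in wanted}

    return mounts, elapsed


# Illustrative layout: three daily tapes, each holding one file per volume.
# The full-volume backup (last entry) is assumed to take two time units.
plan = {
    "PROD01": [("333333", 1, 1), ("222222", 1, 1), ("111111", 1, 2)],
    "PROD02": [("333333", 2, 1), ("222222", 2, 1), ("111111", 2, 2)],
    "PROD03": [("333333", 3, 1), ("222222", 3, 1), ("111111", 3, 2)],
}
print(simulate(plan, maxtapes=3))   # (3, 8)
print(simulate(plan, maxtapes=1))   # (3, 12)
```

In this model each tape is mounted once regardless of the drive limit; lowering maxtapes only lengthens the elapsed time because fewer subtasks can read concurrently.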
Warning

Before you do a full-volume restore, make sure that the target volume is offline to all systems other than the system where the restore is to be run. If you do not, the other systems may access the original VTOC of the restored volume and access or delete the wrong data.

FDRDRP operation example

Here is a simple example to show how FDRDRP operates. Three DASD volumes have been backed up, once by a full-volume backup and twice by incremental backups. The off-site COPY2 backup tapes look like this:

Tape Volume:   333333                    222222                    111111
File 1:        FDRABR.VPROD01.C2012202   FDRABR.VPROD01.C2012201   FDRABR.VPROD01.C2012200
File 2:        FDRABR.VPROD02.C2001702   FDRABR.VPROD02.C2001701   FDRABR.VPROD02.C2001700
File 3:        FDRABR.VPROD03.C2000302   FDRABR.VPROD03.C2000301   FDRABR.VPROD03.C2000300

Each day’s backups are all contained on one tape. The typical sequence of FDRDRP operation is:

DASD Volume:  PROD01                                  PROD02                         PROD03
Time T:       Mount tape 333333, restore from file 1  Wait for volume 333333         Wait for volume 333333
T+1:          Mount tape 222222, restore from file 1  Restore from file 2 on 333333  (waiting continues)
T+2:          Mount tape 111111, restore from file 1  Restore from file 2 on 222222  Restore from file 3 on 333333 and dismount
              (full-volume backup)
T+3:          (restore continues)                     Wait for volume 111111         Restore from file 3 on 222222 and dismount
T+4:          Volume Restore completed.               Restore from file 2 on 111111  Wait for volume 111111
                                                      (full-volume backup)
T+5:                                                  (restore continues)            (waiting continues)
T+6:                                                  Volume Restore completed.      Restore from file 3 on 111111 and dismount
                                                                                     (full-volume backup)
T+7:                                                                                 (restore continues)
T+8:                                                                                 Volume Restore completed.

An ABR full-volume recovery of these same three DASD volumes would mount each tape three times (nine mounts) and would take time to position to the required file. FDRDRP mounts each tape only once (three mounts) and eliminates positioning delays, resulting in a typical elapsed time saving of over 80%.
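
The mount counts quoted above can be checked with a short Python sketch. It assumes a single drive that keeps the last tape mounted until a different one is requested; the function name serial_mounts and the layout dictionary are hypothetical, and the sketch says nothing about elapsed time.

```python
def serial_mounts(layout, volumes):
    """Count mounts when volumes are restored one at a time, newest cycle first."""
    mounts = 0
    current = None                      # tape on the drive, if any
    for vol in volumes:
        for tape in layout[vol]:        # tapes in read order for this volume
            if tape != current:
                mounts += 1
                current = tape
    return mounts


# Tapes each volume needs, most recent incremental first, full-volume last.
layout = {
    "PROD01": ["333333", "222222", "111111"],
    "PROD02": ["333333", "222222", "111111"],
    "PROD03": ["333333", "222222", "111111"],
}

serial = serial_mounts(layout, ["PROD01", "PROD02", "PROD03"])
single = len({tape for tapes in layout.values() for tape in tapes})
print(serial, single)   # 9 3
```

The second figure is simply the number of distinct tapes, which is the best case when every tape is mounted exactly once.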

The entries beginning "Wait for volume" show where each task is waiting for a tape to become available. At time T+2, three tape drives are in use and all three restore subtasks are actively restoring data (this can be limited by the MAXTAPES= operand). At times T+1 and T+3, two tapes are in use and two subtasks are restoring data.

This example is simple. In a real restore, where each day’s backups may be on a variety of tapes or on multi-volume tape sets, the sequence is more complicated. There may even be conditions where FDRDRP must release a tape and remount it later. However, no matter how complex the restores, FDRDRP maximizes the number of concurrent restores (subject to MAXTAPES=) and minimizes the number of tape mounts, greatly reducing restore elapsed time compared to ABR full-volume recoveries from incremental backups.

FDRDRP considerations

Each FDRDRP restore subtask must read the backup tapes in the normal order used by ABR full-volume recovery, reading the most recent incremental backup first, then the next most recent, and so on, until the full-volume backup is read.

FDRDRP works efficiently when all the backups for a given day for each DASD volume being restored are on one tape volume or multi-volume set. This is the pattern shown in the example above. Each input tape is mounted only once (it may be necessary to mount a tape twice if a backup file crosses from one tape volume to another).

This is compatible with the way that most ABR users run their volume backups. A typical installation runs ABR volume backups daily, selecting all the volumes to be backed up. Scratch tapes are used for output and ABR automatically stacks backup files on tape, so the output tape (or multi-volume tape set) contains all the backups created on that day. If you use multiple TAPEx DD statements in the ABR step or run multiple ABR backup jobs, multiple tapes or tape sets are created containing backups for a subset of your DASD volumes but they still contain only backups created on that day. In most cases, FDRDRP is able to restore all the volumes while mounting each backup tape only once.

FDRDRP works efficiently even if you use the ABR LASTAPE feature to add backup files from multiple days onto an existing tape or tape set.

However, if your backups are not so neatly ordered, perhaps because you do not back up every DASD volume every day or because some backups failed, the restore is not so simple. For example, if the first (latest) cycle required by one subtask is on the same tape as the third cycle required by another subtask, the second subtask may not be ready to read that tape when the first subtask is finished with it. In that case, the tape may be dismounted and remounted later. This is unusual and does not occur for most FDRDRP users. Dismounting and remounting may also occur if you are restoring a large number of DASD volumes in one FDRDRP step.
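
As a rough way to spot this situation ahead of time, the following Python sketch flags tapes that different volumes first reach at different points in their read order. It is only a heuristic under stated assumptions: the function name, the plan dictionary, and the tape serials are hypothetical, and how you obtain the per-volume tape lists is outside its scope.

```python
def tapes_reached_out_of_step(plan):
    """Return tapes that volumes first need at different read positions.

    plan maps each DASD volume to the tapes it reads, newest cycle first.
    """
    first_depth = {}   # tape -> set of positions at which volumes first need it
    for tapes in plan.values():
        seen = set()
        for depth, tape in enumerate(tapes):
            if tape not in seen:
                seen.add(tape)
                first_depth.setdefault(tape, set()).add(depth)
    return sorted(t for t, d in first_depth.items() if len(d) > 1)


# VOLA needs tape 000003 for its latest cycle; VOLB needs it for its third.
plan = {
    "VOLA": ["000003", "000002", "000001"],
    "VOLB": ["000005", "000004", "000003"],
}
print(tapes_reached_out_of_step(plan))   # ['000003']
```

A tape in the result is a candidate for being dismounted and remounted later; an empty result means every volume reaches each shared tape at the same step.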

Important

If you invoke concurrent ABR backups by multiple TAPEx DD statements in the ABR JCL, the choice of which DASD volumes go to which TAPEx DD statement is dynamic and may vary from day to day depending on the amount of data backed up from each volume. This complicates the FDRDRP restore process and may slow it down and/or require more tape mounts.

FDRDRP executes more efficiently if you specify MAXTAPES= equal to the number of tape drives you have available at the disaster site (or the number of drives you are willing to allocate to this FDRDRP step). If it is set to a value larger than the number of drives, extra overhead is incurred while the FDRDRP subtasks contend for the available tapes.

Important

Some customers run their full-volume ABR backups with MAXFILE=1 so that each DASD volume starts its backup as file 1 on a fresh scratch tape. This improves ABR restore efficiency when restores are run in parallel with ABR jobs. With FDRDRP, MAXFILE=1 is not recommended. You can use the default of MAXFILE=255 or even specify a larger value, in order to fill tape volumes to capacity, and still get a great deal of restore parallelism with FDRDRP.

Warning

FDRDRP does many dynamic allocations for the tapes it needs. Dynamic allocation of tape is affected if there is an unsatisfied allocation recovery message on the console, such as:

  IEF244I job RESTORE - UNABLE TO ALLOCATE 1 UNIT(S)
  IEF877E job NEEDS 1 UNIT(S)
  98 IEF238D job - REPLY DEVICE NAME,'WAIT' OR 'CANCEL'.

Until the operator replies to this message, no further allocations of that type of tape can be satisfied, which probably causes FDRDRP to eventually wait until the reply is made. This is true even if the allocation recovery is for another non-FDR job. If the operator replies WAIT, message IEF433D must also be satisfied, replying HOLD or NOHOLD.

FDRDRP may occasionally issue a console UNLOAD (U) command for tapes it has mounted. These UNLOAD commands may also appear in the job log of the FDRDRP job. This is normal. If a restore task is done with a tape and the next restore task needing that tape is not yet ready to accept it within a short period of time, it is unloaded so that the tape drive is not tied up unnecessarily. An UNLOAD usually means that the tape is remounted during another part of the restore process.

Unless your tape management and security databases have been recovered to a point after the creation of these ABR backups at your home site, you are likely to get tape management or security errors as ABR tries to open the tapes. We recommend that you disable tape management and security checking during the FDRDRP restores. If you cannot disable tape management, specify EXPDT=98000 on the RESTORE statement; for most tape management systems that bypasses tape management checking at OPEN time.

Important

The use of EXPDT=98000 may be limited by your security system or by options in the tape management system itself.

If you have auto-switchable (A/S) tapes on the system where you run FDRDRP (such as a disaster recovery site starter system), you may want to modify PARMLIB member ALLOCxx to add the statement: VERIFY_VOL POLICY(NO).

This avoids an unnecessary verification of the tape volume every time it is passed from one restore subtask to another.

Testing FDRDRP

A full-blown test of FDRDRP normally must be done at the disaster recovery site, where you can restore all of the DASD volumes required.

However, you probably want to verify that FDRDRP works before you devote the resources to that full-blown test.

You can execute a limited test of FDRDRP at your home site (or on an LPAR) if you have a number of DASD volumes, such as unused volumes or scratch (temporary) volumes, which you can dedicate as target volumes for the FDRDRP restore test. You use FDRDRP to restore a subset of your production volumes to those temporary target volumes. Although FDRDRP works correctly with as little as one tape drive, it is a more realistic test if you can dedicate several tape drives to the test.

See the example "Test FDRDRP Example" in FDRABR-Volume-Backups.

 
