Recovering IAM Data Sets

Recovery

As with any data set containing critical data, it is important to have established data set recovery procedures in place. This includes having proper data set backups and a way to restore any data that may be lost due to a hardware or software failure. Most software packages that provide logging and recovery for VSAM files can also be used with IAM data sets. To assist in the recovery process, IAM provides the utility program IAMRECVR. IAMRECVR is not intended to take the place of established data set recovery procedures, but it may reduce the time required to recover a data set, or avoid the need for such procedures in some circumstances. IAMRECVR can only retrieve data that physically exists on the DASD device in a readable form. Data that is inaccessible due to media or other failures is not recoverable with this utility. Likewise, data that has been overwritten or was never written out to the storage media cannot be recovered by IAMRECVR.

IAMRECVR

Should you suspect any type of problem with an IAM file, whether a physical problem (such as I/O errors) or a software failure, IAMRECVR can be used to aid in diagnosing the problem and recovering the data set. IAMRECVR offers a set of services that includes the following commands:

DIAGNOSE - Reads the entire data set to validate general data set integrity.
PRINT - Prints selected portions of the IAM data set to provide diagnostic information.
RECOVER - Reads all the data in the IAM data set, and copies the readable data to a sequential output data set that can be used to reload the file. Records with duplicate keys are optionally written to a separate sequential data set.
APPLY - Copies data from the duplicate log file into the newly loaded recovered data set.

IAMRECVR has knowledge of the underlying structure of an IAM data set, and reads IAM files without using the IAM access method. IAMRECVR utilizes high-performance I/O, reading up to an entire cylinder per physical I/O. Using the information about each particular data set, which is usually retrieved from the data set itself, IAMRECVR extracts the data records from the file. While the entire data set is read, data blocks containing the index structure are not processed.

This section explains how to validate data set integrity, how to collect any diagnostic information that may be requested, and how to recover the data set from various error situations through the use of IAMRECVR and various other utilities. Complete information on all of the functions and operands available with IAMRECVR is provided in IAMRECVR – IAM Dataset Recovery, in the System Analysis Utilities section. The majority of data set integrity problems are the result of improper data set sharing. In those circumstances, some data is quite likely to be lost because it has been overwritten, and it cannot be retrieved because it is no longer there. Complete data recovery in those situations will require the use of additional recovery procedures beyond those provided by IAM.

There are two general categories of failures. The first category consists of errors encountered while processing the data set. For example, a program may receive an IAMW12 message indicating a data decompression error, an IAMW37 I/O error message, or an IAMW17 error message indicating concurrent updates. Alternatively, the application may produce unexpected errors or results, which can include unexpected error codes from IAM while processing the data set. These types of errors may indicate a problem with the data set itself. However, they more frequently arise from software failures such as inadvertent storage corruption, and such errors frequently do not corrupt the file. For these situations, one would normally start by running an IAMRECVR DIAGNOSE operation to validate general data set integrity, which may then be followed by a recovery or a reorganization of the data set. If you suspect that IAM is not returning all the records that are expected, it is critical to run an IAMRECVR RECOVER on the data set before reorganizing the file or otherwise copying the data. Depending on the nature of the problem, IAMRECVR may be able to retrieve records that cannot be retrieved by normal IAM processing, which relies on the index structure of the data set.

The second category of failure is where IAM is unable to open the data set. Generally, an IAM error message will be displayed indicating the cause of the failure to open. If the problem is other than environmental, such as insufficient storage, then a data set recovery will be necessary. Generally, this type of error is accompanied by an IAMW79, or by an IAMW01 with an IAMW37 error message. These messages contain information about the point of failure during the open, so it is critical to save them for diagnostic purposes. For this category of errors, it is best to start by obtaining initial diagnostic information, and then proceed directly to the IAMRECVR RECOVER command to copy the data records out of the file.

IAMRECVR will provide a report on any errors it finds and will indicate how many records it actually read. This record count can be compared with the count from other sources to determine whether all the records have been retrieved. In some circumstances, IAMRECVR may successfully recover an IAM data set that cannot be opened and yet report no detectable errors. This is because IAMRECVR does not use the index structure of the data set to read the data, so it may not detect the error condition.

DIAGNOSE

When errors are encountered while processing an IAM data set, the first step is generally to validate the general data set integrity with the IAMRECVR DIAGNOSE command. This is most useful when you suspect that there might be a problem with an IAM data set, for example, if a job has received an IAMW12 data decompression error, an IAMW17 indicating concurrent updating, or an I/O error message while processing the IAM data set. DIAGNOSE is also useful after a system failure, when you just want to verify the integrity of the IAM data set.

DIAGNOSE will read the entire IAM data set, validating basic file integrity. It will verify the following:

All data blocks can be physically read.
Records are in ascending key sequence within each data block.
All data blocks have valid structure.
All compressed records can be decompressed.
Records are in ascending key sequence within the prime area of the data set.
No duplicate records exist within a data block.
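
The key-sequence checks in this list can be illustrated with a small model. The sketch below is not how IAMRECVR is implemented. It assumes a hypothetical in-memory layout in which the prime area is a list of blocks and each block is a list of record keys, and the function name check_prime_area is invented for illustration. The physical read, block structure, and decompression checks are omitted because they depend on the actual file format.

```python
def check_prime_area(blocks):
    """Return (block_number, problem) pairs for a hypothetical prime area.

    `blocks` is a list of blocks; each block is a list of record keys.
    This layout is an assumption made for illustration only.
    """
    problems = []
    highest_so_far = None  # highest key seen in any earlier block
    for number, keys in enumerate(blocks):
        # No duplicate records within a data block.
        if len(set(keys)) != len(keys):
            problems.append((number, "duplicate key in block"))
        # Records in ascending key sequence within each data block.
        if any(a > b for a, b in zip(keys, keys[1:])):
            problems.append((number, "keys out of sequence in block"))
        # Records in ascending key sequence across the prime area.
        if keys and highest_so_far is not None and min(keys) <= highest_so_far:
            problems.append((number, "block out of sequence in prime area"))
        if keys:
            block_high = max(keys)
            if highest_so_far is None or block_high > highest_so_far:
                highest_so_far = block_high
    return problems
```

An empty result means the model found no sequence problems; each entry otherwise names the block number and the check that failed.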

DIAGNOSE will provide information on any errors it detects, along with a count of the number of records it was able to read from the data set. The output will also include a report with basic information about the data set that is quite similar to the IAMPRINT LISTCAT report. This information will help you identify if there is actually a problem within the data set itself, and if so, how much of the data will be recoverable.

Example A: IAMRECVR DIAGNOSE

The example below demonstrates how to run an IAMRECVR DIAGNOSE. The job requires control card input on SYSIN that specifies the DIAGNOSE command, a DD statement named DISKIN that allocates the IAM data set to be diagnosed, and a SYSPRINT DD card for the printed output.

Example of an IAMRECVR DIAGNOSE (EX1087A)

//DIAGNOSE EXEC PGM=IAMRECVR,REGION=0M
//SYSPRINT DD   SYSOUT=*
//DISKIN   DD   DSN=my.iam.dataset,DISP=SHR
//SYSIN    DD   *
DIAGNOSE
/*

The results of the DIAGNOSE process will provide information on the status of the data set, and how many of the data records can be recovered. If problems were found, or difficulties are continuing with the data set, then recovery must be performed. Based on the information provided by the DIAGNOSE run, you can then determine what the best recovery method will be for this particular data set. If it looks like IAMRECVR can retrieve all the data records, or there are no other alternative recovery procedures available, then the IAMRECVR RECOVER command can be used to create a sequential output file containing the data.

Example B: Multiple DIAGNOSE

IAMRECVR can also DIAGNOSE multiple data sets in a single execution. This is achieved by coding a separate DIAGNOSE command for each data set, each naming a different DD with the FROMDDNAME keyword (which appears as FROMDD in the following example).

Example of Diagnosing multiple files (EX1087B)

//MULTDIAG EXEC PGM=IAMRECVR,REGION=0M
//SYSPRINT DD   SYSOUT=*
//FILE1    DD   DISP=SHR,DSN=my.iam.file1
//FILE2    DD   DISP=SHR,DSN=my.iam.file2
//FILE3    DD   DISP=SHR,DSN=my.iam.file3
//SYSIN    DD   *
DIAGNOSE FROMDD=FILE1
DIAGNOSE FROMDD=FILE2
DIAGNOSE FROMDD=FILE3
/*

Obtaining Diagnostic Information

One of the important considerations in recovering an IAM data set is that the corrupted data set may be needed for problem determination and resolution. If at all possible, it is best to save the existing problem data set where it is, and recover into a new data set. If that is not possible, then the next best choice is to back up the data set with FDR, or a comparable software product, such as DFSMSdss from IBM. Refer to the section on backing up IAM data sets for information and examples of how to obtain a backup copy with one of those products. Other types of backup copies may not preserve the exact image of the data set, which can make problem determination impossible. An IDCAMS REPRO copy, or the output data set from the IAMRECVR RECOVER command, will not be adequate for problem determination.

Example C: PRINT IDPINQ

Equally important is to obtain some diagnostic information. If you call for assistance due to a potentially damaged file, you will be asked to obtain diagnostic information. The IAMRECVR PRINT command is used to help obtain some of this information. It will be quite helpful to run the job below prior to calling for technical assistance, as this is generally the starting point for diagnosis. This job uses the PRINT command of IAMRECVR to print the blocks within the data set that describe the characteristics and physical structure of the file. A LISTCAT ALL is also performed, as that will print additional information on the volumes on which the data set resides.

PRINT IDPINQ Example (EX1087C)

//PRINTIDP EXEC PGM=IAMRECVR,REGION=4096K
//SYSPRINT DD   SYSOUT=*
//DISKIN   DD   DSN=my.iam.dataset,DISP=SHR
//SYSIN    DD   *
PRINT    IDPINQ
/*
//LISTCAT EXEC PGM=IDCAMS
//SYSPRINT DD   SYSOUT=*
//IAMPRINT DD   SYSOUT=*
//SYSIN    DD   *
  LISTCAT ENT(my.iam.dataset) ALL
/*

Example D: Printing Blocks

With information from the above PRINT or DIAGNOSE command, you may also be asked to print selected blocks from the IAM data set. The IAM Technical Support representative will inform you if this is needed, and provide you with the block ranges to be printed. This is accomplished with a different form of the PRINT command. In this case, the starting block number is specified with the FBLK= (from block) keyword, and the number of blocks to print is specified with the MAXBLKS= keyword. You can specify multiple PRINT commands in the same execution of IAMRECVR, each specifying a different range of blocks to be printed. The example below shows how this is done.

Example of printing out blocks of an IAM Data Set (EX1087D)

//PRINTBLK EXEC PGM=IAMRECVR,REGION=4096K
//SYSPRINT DD   SYSOUT=*
//DISKIN   DD   DSN=my.iam.dataset,DISP=SHR
//SYSIN    DD   *
PRINT FBLK=65,MAXBLKS=4
PRINT FBLK=100,MAXBLKS=1
/*

RECOVER

To perform the recovery, the IAMRECVR RECOVER command is used to obtain a sequential copy of the data. If there are records in the Extended Overflow (or Independent Overflow for Compatible format files), the sequential output file must be sorted. The RECOVER command will invoke the sort product installed at your installation. The JCL you provide must specify whatever DD cards are needed for the sort. Frequently this requires specification of sort work space, with three or more SORTWK0x DD cards. This sort work space must be adequate to handle the amount of data contained within the data set. The recover step is followed by a step that defines a new cluster and reloads the data with IDCAMS REPRO. The reload step also includes renaming the data sets.

Spanned Records

To perform a recovery on files with spanned records that exceed 32K, there are additional considerations. The step executing the IAMRECVR RECOVER command requires an additional DD statement, SPANOUT. This DD statement defines a file on tape or disk that will contain those records that are actually spanned, that is, records too large to fit within a single block. After the data set is reloaded from the TAPEOUT file, an additional step is required for the spanned records: run the IAMRECVR APPLY command with the SPANNED keyword. That will update the recovered file with the spanned records from the SPANOUT file. Further information on using these keywords is in IAMRECVR – IAM Dataset Recovery.

Example E: Basic Recover

The example below performs both the RECOVER step and the REPRO step. First, the RECOVER command of IAMRECVR is executed to create a sequential file containing all of the data records from the IAM data set. If that process is successful, then IDCAMS is executed. With IDCAMS, a new data set is defined, using the original data set as a model, and then loaded with the recovered data. This is followed by renames of the data sets. Note the use of the IDCAMS IF and CANCEL commands, which preserve the original data set in case a failure occurs during the IDCAMS processing.

For the execution of the IAMRECVR program, the DISKIN DD specifies the IAM data set, and the TAPEOUT DD specifies the new sequential data set. The SYSPRINT and SYSIN DD cards are required by IAMRECVR. The SORTWK0x and SYSOUT DD statements are provided for the SORT.

For the execution of the IDCAMS step, the SYSPRINT and SYSIN DD are required. The sequential file created by the RECOVER process in the prior step is included with a DD name of INFILE. The IAMINFO DD is optional, but recommended to obtain the run time report for the file load.

Note

There is a DD statement, OLDIAMDS, for the original data set that is otherwise not referenced. This is done to hold the ENQ on the original data set name until the recovery process is complete. Proper caution should be used in constructing the job stream to make sure that the data is preserved.

Basic Data Set Recovery Example (EX1087E)

//RECOVER EXEC PGM=IAMRECVR,REGION=4096K
//SYSPRINT DD   SYSOUT=*
//SYSOUT   DD   SYSOUT=*
//DISKIN   DD   DISP=OLD,DSNAME=my.iam.dataset
//TAPEOUT DD   DSN=my.seq.dataset,DISP=(,CATLG),
//    UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK01 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK02 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK03 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SYSIN    DD   *
RECOVER
/*
//LOADNEW EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD   SYSOUT=*
//IAMINFO DD   SYSOUT=*
//INFILE   DD   DSN=my.seq.dataset,DISP=OLD
//OLDIAMDS DD   DSN=my.iam.dataset,DISP=OLD
//SYSIN    DD   *
  DELETE my.newiam.dataset
  DELETE my.newiam.dataset NOSCRATCH
  SET MAXCC=0
  DEFINE CLUSTER(NAME(my.newiam.dataset) -
OWNER($IAM) -
MODEL(my.iam.dataset))
  IF MAXCC NE 0 THEN CANCEL
  REPRO INFILE(INFILE) ODS(my.newiam.dataset)
  IF MAXCC NE 0 THEN CANCEL
  ALTER my.iam.dataset NEWNAME(my.badiam.dataset)
  IF MAXCC NE 0 THEN CANCEL
  ALTER my.newiam.dataset NEWNAME(my.iam.dataset)
  LISTCAT ENT(my.iam.dataset) ALL
  LISTCAT ENT(my.badiam.dataset) ALL
/*

Duplicate Keys

One circumstance that may occur is that records with duplicate keys are discovered by the RECOVER process after the sort has been done, while the output sequential data set is being written. This circumstance does not necessarily represent a data integrity problem with the file.

When a record is updated, the length of the record may be changed either by the application program itself, or by IAM if the updated record compresses differently. If the update increases the record length, the record may no longer fit within the block in which it currently resides, so IAM moves it to Overflow. Without Variable Overflow, because the maximum length is reserved for a record, once it is moved to an overflow block it will stay in that block. With Variable Overflow, the record may need to be moved to a different overflow block. IAM first writes out the updated record within the block it was moved to, and then writes out the original block with the old record deleted. If a failure occurs that prevents proper closing of the data set, the second write might not be completed, which leaves the record in both blocks.

Failures that may result in this condition include z/OS failures resulting in an IPL without proper application shutdown, use of the z/OS FORCE command to cancel an updating job, other types of address space failures, and power outages. Files that were opened for update during such a failure should be reorganized or recovered as soon as possible afterward. Unfortunately, such failures also prevent the file statistics from being updated, so the statistics, particularly the actual record count, may not be accurate.

Other possibilities for duplicate keys include sharing the IAM data set for concurrent updating or software failures that caused storage corruption. For these types of duplicates, you may need to examine which of the duplicate records you want to have in the recovered data set. This can be accomplished by editing the LOG data set that is created by IAMRECVR, and then running the APPLY step.

During normal IAM processing, the first duplicate record condition is not a problem as long as the record is not deleted. This is because with Enhanced format files, the record will always be moved to a higher relative block than it existed before the update. Hence, the valid record will always be the record in the highest block.

Note

For Compatible format files, the situation is reversed because the Overflow area is at the physical beginning of the data set.
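
The interrupted two-write sequence, and the reason the copy in the highest block is the valid one for Enhanced format files, can be sketched with a toy model. This is an illustration only, not IAM's actual block layout or logic: blocks are modeled as Python dictionaries in ascending relative block order, and the names move_record and resolve_highest_block_wins are hypothetical.

```python
def move_record(blocks, key, new_value, source, target, fail_before_second_write=False):
    """Move an updated record from block `source` to block `target`.

    Follows the write order described above: the target block is written
    first, then the source block is rewritten without the old record.
    """
    blocks[target][key] = new_value      # first write: updated record in new block
    if fail_before_second_write:
        return                           # simulated failure: old copy is never removed
    del blocks[source][key]              # second write: old record deleted


def resolve_highest_block_wins(blocks):
    """Scan blocks in ascending order so that a copy in a higher block wins.

    Models the Enhanced format rule only; per the note above, the
    situation is reversed for Compatible format files.
    """
    resolved = {}
    for block in blocks:                 # ascending relative block number
        resolved.update(block)           # later (higher) block replaces earlier copy
    return resolved
```

With a simulated failure between the two writes, the key is present in both blocks, and scanning in ascending block order returns the updated value.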

For a recovery using IAMRECVR, a different procedure than the basic one shown in the preceding example must be used.

Example F: Recover with Duplicate keys

The first change to the original example is that the SORT must be told to pass records with equal keys back in the same order that they were passed to the SORT. This is done with the EQUALS option for DFSORT and SYNCSORT. For DFSORT, this option is specified by control card input using the DFSPARM DD. For SYNCSORT, a $ORTPARM DD card is used. In the example, both are included. The next change, for Enhanced format files, is to specify a LOG data set and to indicate that records with duplicate keys are to be logged by specifying DUP=LOG on the RECOVER command. The first record of any specific key value will always be written to the normal sequential output data set. Any subsequent records with the same key will be written to the LOG data set. The normal reload of the data set is then done. After that, for Enhanced format files only, the records in the LOG data set are copied into the IAM data set with an IAMRECVR APPLY statement. You could also use IDCAMS with the REPRO REPLACE statement; see Example G for using REPRO REPLACE instead of the IAMRECVR APPLY. The advantage of using the IAMRECVR APPLY is that it prints the keys of the records that are being replaced by the apply operation.

Example of Recovering Data set With Duplicate Keys (EX1087F)

//RECOVER EXEC PGM=IAMRECVR,REGION=4096K
//SYSPRINT DD   SYSOUT=*
//SYSOUT   DD   SYSOUT=*
//DISKIN   DD   DISP=OLD,DSNAME=my.iam.dataset
//TAPEOUT DD   DSN=my.seq.dataset,DISP=(,CATLG),
//    UNIT=SYSDA,SPACE=(CYL,(20,10))
//LOG      DD   DSN=my.duprec.dataset,DISP=(,CATLG),
//    UNIT=SYSDA,SPACE=(CYL,(2,1))
//SORTWK01 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK02 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK03 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//$ORTPARM DD   *
EQUALS
/*
//DFSPARM DD   *
EQUALS
/*
//SYSIN    DD   *
  RECOVER DUP=LOG
/*
//LOADNEW EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD   SYSOUT=*
//IAMINFO DD   SYSOUT=*
//INFILE   DD   DSN=my.seq.dataset,DISP=OLD
//SYSIN    DD   *
  DELETE my.iam.dataset
  IF MAXCC NE 0 THEN CANCEL
  DEFINE CLUSTER(NAME(my.iam.dataset) -
OWNER($IAM) -
VOL(myvol) CYL(20 10) -
RECORDSIZE(300 1000) -
KEYS(16 0) -
FREESPACE(10 10) -
SHAREOPTIONS(2 3))
  IF MAXCC NE 0 THEN CANCEL
  REPRO INFILE(INFILE) ODS(my.iam.dataset)
  LISTCAT ENT(my.iam.dataset) ALL
/*
//APPLY    EXEC PGM=IAMRECVR,REGION=0M
//SYSPRINT DD   SYSOUT=*
//IAMINFO DD   SYSOUT=*
//LOG      DD   DSN=my.duprec.dataset,DISP=OLD
//VSAMOUT DD   DSN=my.iam.dataset,DISP=OLD
//SYSIN    DD   *
APPLY OUT=VSAM
/*
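
The data flow of Example F can be modeled in a few lines to show why the EQUALS option matters. The sketch below is a simplified model under stated assumptions, not the behavior of IAMRECVR, the sort product, or IDCAMS: records are (key, value) pairs, they are assumed to be read in ascending block order, and the function names are hypothetical. Python's sorted() is stable, so it stands in for a sort run with EQUALS.

```python
def recover_with_dup_log(records):
    """Split records into the sequential output and the duplicate log.

    `records` is a list of (key, value) pairs in the order they were read.
    A stable sort keeps equal keys in that order, as EQUALS requests.
    """
    ordered = sorted(records, key=lambda record: record[0])
    tapeout, log = [], []
    previous_key = object()              # sentinel that equals no real key
    for key, value in ordered:
        if key != previous_key:
            tapeout.append((key, value)) # first record of a key: normal output
        else:
            log.append((key, value))     # later records of the same key: LOG
        previous_key = key
    return tapeout, log


def reload_and_apply(tapeout, log):
    """Reload from the sequential output, then apply the log with replace."""
    dataset = dict(tapeout)              # models the reload of the data set
    for key, value in log:
        dataset[key] = value             # models APPLY: logged copy replaces loaded one
    return dataset
```

Because the sort preserves input order for equal keys, the copy read last (from the higher block in this model) lands in the log and replaces the earlier copy during the apply. Without a stable sort, which copy ends up in the log would be unpredictable.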

Example G: Recover with REUSE

In this next example, rather than being deleted and redefined, the data set is reloaded in place with a REPRO REUSE. If that is successful, then the duplicates, if any, are copied into the IAM data set with another REPRO, this time with REPLACE.

Example of recover with REPRO (EX1087G)

//RECOVER EXEC PGM=IAMRECVR,REGION=4096K
//SYSPRINT DD   SYSOUT=*
//SYSOUT   DD   SYSOUT=*
//DISKIN   DD   DISP=OLD,DSNAME=my.iam.dataset
//TAPEOUT DD   DSN=my.seq.dataset,DISP=(,CATLG),
//    UNIT=SYSDA,SPACE=(CYL,(20,10))
//LOG      DD   DSN=my.duprec.dataset,DISP=(,CATLG),
//    UNIT=SYSDA,SPACE=(CYL,(2,1))
//SORTWK01 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK02 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//SORTWK03 DD   UNIT=SYSDA,SPACE=(CYL,(20,10))
//$ORTPARM DD   *
EQUALS
/*
//DFSPARM DD   *
EQUALS
/*
//SYSIN    DD   *
  RECOVER DUP=LOG
/*
//LOADNEW EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD   SYSOUT=*
//IAMINFO DD   SYSOUT=*
//IAMFILE DD   DSN=my.iam.dataset,DISP=OLD
//INFILE   DD   DSN=my.seq.dataset,DISP=OLD
//DUPFILE DD   DSN=my.duprec.dataset,DISP=OLD
//SYSIN    DD   *
  REPRO INFILE(INFILE) OUTFILE(IAMFILE) REUSE
  IF MAXCC NE 0 THEN CANCEL
  REPRO INFILE(DUPFILE) OUTFILE(IAMFILE) REPLACE
  LISTCAT ENT(my.iam.dataset) ALL
/*