Using hardware compression

IAM supports a hardware data compression option that uses the IBM Hardware Compression instruction. Hardware compression is requested when an IAM file is defined or loaded by specifying DATACOMP=HARDWARE, HW, or HWE on an IAM CREATE Override. IAM offers a dynamic dictionary build function, which is invoked automatically when a file is loaded with hardware compression requested, unless a specific customized dictionary is requested. Specifying HWE uses an enhanced dictionary build process that expands the number of data patterns examined and may result in better compression.

IAM software compression

The IAM software compression technique uses a highly optimized proprietary software algorithm to compress data by eliminating strings containing repetitive byte values. Additionally, the IAM technique has very low cost for strings of data that cannot be compressed. This technique, along with IAM making the fullest possible use of the capacity of each track, provides significant space savings for many data sets. Also, because this technique requires no dictionary, it reduces virtual storage and DASD storage requirements. While other compression techniques can provide greater compression, such techniques generally also come with a high price tag in terms of CPU consumption.
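
To make the idea of eliminating strings of repeated byte values concrete, the following Python sketch shows a generic run-length encoding. It is not the IAM algorithm, which is proprietary. The function names, the token layout, and the 4-byte run threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple, Union

# A token is either ("lit", raw_bytes) or ("run", byte_value, repeat_count).
Token = Union[Tuple[str, bytes], Tuple[str, int, int]]

MIN_RUN = 4  # assumed threshold: shorter runs are cheaper to keep as literals


def rle_encode(data: bytes) -> List[Token]:
    """Replace runs of a repeated byte with a single ("run", value, count) token."""
    tokens: List[Token] = []
    literal = bytearray()
    i = 0
    while i < len(data):
        # Find the end of the run of identical bytes starting at position i.
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run_len = j - i
        if run_len >= MIN_RUN:
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal.clear()
            tokens.append(("run", data[i], run_len))
        else:
            literal.extend(data[i:j])
        i = j
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def rle_decode(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if token[0] == "run":
            out.extend(bytes([token[1]]) * token[2])
        else:
            out.extend(token[1])
    return bytes(out)
```

In this sketch, data with no long runs comes back as a single literal token, which mirrors the point above that incompressible strings should cost very little to process.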

Hardware compression

The IBM Hardware Compression algorithm relies on compression and corresponding decompression dictionaries. These dictionaries allow for compression and expansion of repeating data patterns within the records. Such an algorithm presents a couple of difficulties. First, for optimal compression, the data must be previewed to find the repeating data patterns within the data set. The most frequently observed of those patterns are then converted into compression and decompression dictionaries. Second, the dictionaries must be stored safely, because if the decompression dictionary is lost, the data cannot be decompressed and becomes useless. An alternative method of building or selecting a dictionary is to scan the first few records as they are being loaded, and either select a generic dictionary that appears to offer the best compression or build one based on the initial data. Such dictionaries can provide a decent amount of compression, although not the most compression possible for any given file.
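
The preview step described above (scan the data, count repeating patterns, keep the most frequent ones) can be sketched in a few lines of Python. This is only a conceptual illustration, not the dictionary format or the build algorithm used by the IBM hardware compression facility. The function name, the fixed pattern length, and the default entry limit are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def build_pattern_dictionary(
    records: Iterable[bytes],
    pattern_len: int = 4,      # assumed fixed pattern length, for simplicity
    max_entries: int = 4096,   # assumed limit on the number of patterns kept
) -> List[bytes]:
    """Return the most frequently repeated fixed-length byte patterns."""
    counts: Counter = Counter()
    for record in records:
        # Count every substring of pattern_len bytes in the record.
        for start in range(len(record) - pattern_len + 1):
            counts[record[start:start + pattern_len]] += 1
    # Keep only patterns that actually repeat, most frequent first.
    return [p for p, n in counts.most_common(max_entries) if n > 1]
```

Running this over only the first few records, rather than the whole file, corresponds to the alternative method described above: cheaper to build, but less closely fitted to the data.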

IAM use of hardware compression

IAM provides support for hardware compression, which can be selected as the default compression technique via the IAM Global Options table. The IAM software compression algorithm is still fully supported, and remains the default compression technique as the IAM Global Options are shipped. The hardware compression option can be selected either as the default compression technique or on an individual file basis by specification on an IAM CREATE Override. When IAM uses hardware compression, IAM will either dynamically build a dictionary based upon the data contents of the initially loaded records, or use a compression dictionary selected by the user via the DICT= IAM CREATE Override. In either case, IAM will store the dictionary within the data set itself when the data set is loaded.

The objective of the IAM dynamic dictionary build function is to provide users with an easy method to utilize the hardware compression functions to achieve beneficial data compression with minimal overhead in creating compression dictionaries. There is a second dynamic dictionary build process that is specified by the override HWE. This enhancement enables IAM to gather a larger set of patterns to consider when building the dictionary, and may result in greater compression. Users who require the best possible compression will in most circumstances achieve that only by the use of a customized hardware dictionary, which can be built as per the instructions provided below.

Creating a compression dictionary

Building a compression dictionary for your files consists of the following steps:

1. Create the control statements required for executing the IBM REXX EXEC that will read the data, and generate the dictionary. Review the information in ‘SYS1.SAMPLIB(CSRBDICT)’, which includes detailed instructions on using the exec.
2. Create a sequential data set containing the data you want to build a dictionary for, or a representative subset of the data.
3. Execute the CSRBDICT REXX EXEC, with the sequential data set you created previously, and with the control information you’ve decided on.
4. Assemble and link the dictionary generated by the CSRBDICT REXX EXEC into load module format.

Using CSRBDICT

Using the CSRBDICT REXX exec can be a rather intimidating task, as there are many parameters that can be specified. Determining the best settings for many of the parameters may require multiple executions of CSRBDICT, varying the parameters each time and then reviewing the results. The exec can run for a long time and use a lot of CPU time to come up with a dictionary. It is easiest to start with a very basic execution and test the resulting dictionary. If you are satisfied with the amount of compression you obtain, then go with that. For many files, this basic approach can yield excellent results. If you are looking for more compression or attempting to create a dictionary for multiple data sets, then you can get more involved with varying parameter settings and providing a more detailed layout of your data records to CSRBDICT.

To get you started, a basic set of control card input is provided in IAMSAMP. The member name is BDICTEX1 and is shown below. This example uses the basic IBM recommended parameters and provides two field statements. The first field statement describes the data up to and including the key, which CSRBDICT will ignore, because IAM will not attempt to compress that data. The second field statement is for the rest of the data in the record, which will actually be subject to compression. All you need to do is alter the starting position on the second field card to indicate the position of the first byte after the key within the data file for which the dictionary is being built. When selecting the values to use in the CSRBDICT “spec” file, for the “dicts” field you must specify either “AF ASM” or “AFD ASM” so that CSRBDICT generates a file containing an assembler language representation of the dictionaries.

Example input to CSRBDICT REXX exec.

**The following is an example for building a 4k entry dictionary
**just using a basic pattern scan. The first field card indicates to skip
**the data that is up to and including the key. The second field card is
**for the rest of the data in the record.
**results maxnodes maxlevels msglevel stepping prperiod dicts
r       40000    64        3        f 7 2 7 1000     afd asm
**colaps opt treedisp treehex treenode dupccs
aam    opt x        h       n        x
**FLD col type dcenmen              INT intspec
FLD 1   ns
FLD 15 sa
FLD end

The next step is to obtain a representative sample of the data contained within the file for which you are creating the dictionary. If the file is relatively small, say under 50,000 records, you can probably use the entire file. However, if it is larger or if CSRBDICT is taking too long to run, you can either take a sample of the records in the file or revise the “stepping” value in the BDICTEX1 member so that the entire input file will not be used. For example, to reduce the amount of data scanned to a little less than half (in this case 3/7), change the stepping values from “f 7 2 7” to “f 3 2 7”.
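
If you prefer to take a sample of the records yourself, one simple approach is to keep a fixed number of records out of every group. The Python sketch below illustrates that idea only. The function name and the keep and group parameters are hypothetical, and the sketch does not model how CSRBDICT interprets its stepping values. The sampled records would still need to be written to a sequential data set before running CSRBDICT.

```python
from typing import Iterable, Iterator


def sample_records(records: Iterable[bytes], keep: int, group: int) -> Iterator[bytes]:
    """Yield the first `keep` records out of every `group` consecutive records."""
    if not 0 < keep <= group:
        raise ValueError("keep must be between 1 and group")
    for index, record in enumerate(records):
        # The position within the current group decides whether the record is kept.
        if index % group < keep:
            yield record
```

Keeping records from every group, rather than only the start of the file, helps the sample stay representative when the data patterns change across the file.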

For execution parameters, you must also specify format-1 sibling descriptors, which is done by specifying the value 1 for the “sdfmt” field. For the dictionary size, try starting with 4K entries. In our limited testing, we had the best overall compression results with a maximum of 4K entries. Depending on your data patterns, you may find that a larger or smaller dictionary size yields better results. The CSRBDICT process can take a long time to run, so be patient. Shown below is an example of the command to execute the CSRBDICT REXX exec.

Example of executing CSRBDICT REXX exec

ex 'sys1.samplib(csrbdict)' '4 1 eb "my.test.data" ("IAM.ICL(BDICTEX1)"'

The CSRBDICT REXX exec can also be executed from a batch job. Shown below is an example of JCL to do just that:

Example of executing CSRBDICT in a batch job

  //TSOTMP   EXEC PGM=IKJEFT01,DYNAMNBR=60,TIME=120
  //SYSTSPRT DD   SYSOUT=*
  //SYSTSIN DD   *
    PREFIX myuid
    EX 'SYS1.SAMPLIB(CSRBDICT)' '4 1 EB TEST.DATA (IAM.ICL(BDICTEX1)'

Assemble the compression dictionary

After CSRBDICT runs successfully, there will be several output files. Two of them are of primary interest to the IAM dictionary build process, and they have the suffixes ACDICTs1 and AEDICTs1, where “s” indicates the dictionary size: 1, 2, 4, or 8 for that number of K entries, or H for 512 entries. These two data sets will be assembled and linked into a load module that IAM can use as the compression and expansion (decompression) dictionary.

IAMSAMP contains an example, HWDASM, of the JCL to assemble and link the dictionaries for use by IAM. The first step, ASMACD, assembles the compression dictionary. Change the name of the SYSIN data set to the name that CSRBDICT created for your file; it will have the suffix ACDICTs1. The second step, ASMAED, assembles the expansion dictionary. Change the SYSIN data set to the name that CSRBDICT created for the expansion dictionary for your file; it will have the suffix AEDICTs1. The third and final step, LKED, links the two dictionaries together into a load module that IAM can use. The first four characters of the dictionary name must be ‘IAMD’, and you can choose the last four characters. Make sure that the characters you choose do not conflict with any existing load module names or other dictionaries.

Tip

Set the first of the four characters you choose to the “s” value of H, 1, 2, 4, or 8, and then select three other alphanumeric characters. It is recommended that you place the dictionary in a load module library other than the IAM library, so that it is not lost when a new level or version of IAM is installed.
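
The naming conventions described in this section can be summarized with a small Python sketch. The helper below is hypothetical and is not part of IAM or CSRBDICT. It assumes that "K entries" means multiples of 1024 and that the three characters you choose are uppercase letters or digits.

```python
import re

# Size code ("s") for each supported dictionary size, as listed above.
# Assumption: 1K entries = 1024 entries.
SIZE_CODES = {512: "H", 1024: "1", 2048: "2", 4096: "4", 8192: "8"}


def dictionary_names(entries: int, tag: str) -> dict:
    """Derive the data set suffixes and load module name for a dictionary.

    entries: number of dictionary entries (512, 1024, 2048, 4096, or 8192)
    tag:     three alphanumeric characters of your choice
    """
    if entries not in SIZE_CODES:
        raise ValueError(f"unsupported dictionary size: {entries}")
    if not re.fullmatch(r"[A-Z0-9]{3}", tag):
        raise ValueError("tag must be three uppercase alphanumeric characters")
    code = SIZE_CODES[entries]
    dict_id = code + tag  # the four characters that follow 'IAMD'
    return {
        "compression_suffix": f"ACDICT{code}1",  # input to the ASMACD step
        "expansion_suffix": f"AEDICT{code}1",    # input to the ASMAED step
        "load_module": "IAMD" + dict_id,         # name given to the linked module
        "dict_id": dict_id,                      # the last four characters
    }
```

For a 4K-entry dictionary with the characters ABC, this yields the suffixes ACDICT41 and AEDICT41 and the load module name IAMD4ABC.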

Using the compression dictionary

To use the dictionary that you just created, define an IAM file using an IAM Override card specifying hardware compression, and a dictionary name with the last four characters that you selected for the dictionary name. For example, if you chose 4ABC in the prior step, use the following override:

//IAMOVRID DD *
  CREATE DD=&ALLDD,DATACOMP=HW,DICT=4ABC
/*

You can now load your file using your compression dictionary. After the file is loaded, if you perform a LISTCAT, the IAMPRINT report should indicate that the data set is hardware compressed, with a dictionary name of 4ABC, and that the dictionary is stored in the IAM data set.

After seeing the amount of space used, you may want to try changing some of the parameters for CSRBDICT to see if you can obtain better compression. If so, change the parameters and rerun the build process, making sure that you have noted your prior parameters and the results. Once you are happy with the results, you can save your final resulting dictionary.