IAM versus VSAM Remarks
VSAM Observations
The primary focus of the benchmark runs was on using IAMDRCC and IAMSRORG. A few runs were also done using real VSAM data sets, for comparison with both programs. We noticed some performance-related differences between VSAM and IAM that we thought were worth mentioning.
Reorg with AIX
VSAM clusters that have alternate indexes are required to be defined with the NOREUSE attribute. When a reorganization of the base cluster is performed, this forces users to delete, define, and rebuild all of the alternate indexes, which increases the time that the data set is unavailable for processing. To be compatible with VSAM, IAM does mark the base clusters as non-reusable. However, reorganization programs such as IAMDRCC, IAMSRORG, and FDRREORG are allowed to bypass the non-reusable attribute and reorganize base clusters that have alternate indexes. This makes IAM files available much sooner by avoiding unnecessary rebuilding of alternate indexes.
Insert Performance
The IAM record-based overflow provides better performance for heavy insert applications than the VSAM CI/CA split technique. For example, when inserting 17.5 million records to set up the test file for the overflow, we noticed:
- VSAM CPU time was 98 minutes, versus 17 minutes for IAM.
- VSAM elapsed time was 5 hours, versus 1.1 hours for IAM.
- IAM used 83% less CPU time and ran in 78% less elapsed time. VSAM ran roughly 4.5 times as long as IAM.
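The percentages quoted for the insert test follow directly from the measured times. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only and are not part of any IAM or VSAM tooling.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Figures quoted in the text for the 17.5 million record insert run.
vsam_cpu_min, iam_cpu_min = 98.0, 17.0
vsam_elapsed_hr, iam_elapsed_hr = 5.0, 1.1

cpu_saving = percent_reduction(vsam_cpu_min, iam_cpu_min)
elapsed_saving = percent_reduction(vsam_elapsed_hr, iam_elapsed_hr)
elapsed_ratio = vsam_elapsed_hr / iam_elapsed_hr  # VSAM time as a multiple of IAM time

print(f"CPU time reduction:     {cpu_saving:.0f}%")
print(f"Elapsed time reduction: {elapsed_saving:.0f}%")
print(f"VSAM / IAM elapsed:     {elapsed_ratio:.1f}x")
```

Because the elapsed times are quoted to one decimal place at most, the ratio should be read as approximate.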
Unused DASD Space
VSAM KSDS clusters can end up with portions of the cluster that become unusable space because of the nature of the VSAM index structure. When the index control interval cannot hold all of the entries needed to index a control area, VSAM continues writing in the next control area, leaving the rest of the current control area unused. No messages are produced when this occurs, although an informational error code is returned on the write with a return code of 0. It is uncommon for programs to check the error code when the return code is 0; many applications do not even check for a file-full error code, which actually has a non-zero return code. This is effectively a silent problem that users may have without being aware of it. It is primarily detected through analysis of the information in the IDCAMS LISTCAT output. In many circumstances, unless you are actively looking for this problem, you are not going to know that it exists.
An example is our benchmark test case, where we initially let VSAM choose the index control interval size, and it chose 2048. When examining the LISTCAT output, we noticed that the amount of free space appeared to be higher than expected, considering that the file was allocated with free space values of 0. Based on calculations from the LISTCAT report after the file load, 43% of the used area of the data component was free space (that is, unused control intervals). Based on the High Used RBA, 692,145 tracks were in use in the data component, of which 296,178 were not actually being used. That amounts to 14.6 gigabytes of DASD space out of the 34 gigabytes listed as used.
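The track counts and the gigabyte figures can be related with a little arithmetic. The Python sketch below is an illustration only: it assumes 49,152 usable bytes per track (twelve 4 KiB blocks on a 3390 track), a value that is not stated in the text but happens to reproduce the quoted figures. The function name is hypothetical.

```python
# ASSUMPTION (not from the benchmark text): 12 blocks of 4096 bytes per track.
BYTES_PER_TRACK = 12 * 4096


def unused_space(used_tracks: int, idle_tracks: int,
                 bytes_per_track: int = BYTES_PER_TRACK):
    """Return (idle fraction, idle GB, total used GB) for a data component.

    `used_tracks` is the track count up to the High Used RBA;
    `idle_tracks` is the part of that area holding no data.
    Gigabytes are decimal (10**9 bytes).
    """
    fraction = idle_tracks / used_tracks
    idle_gb = idle_tracks * bytes_per_track / 1e9
    used_gb = used_tracks * bytes_per_track / 1e9
    return fraction, idle_gb, used_gb


# Track counts quoted in the text.
fraction, idle_gb, used_gb = unused_space(692_145, 296_178)

print(f"Idle share of used area: {fraction:.1%}")
print(f"Idle space:              {idle_gb:.1f} GB")
print(f"Space listed as used:    {used_gb:.1f} GB")
```

With a different data control interval size or device geometry the bytes-per-track value would change, so the constant should be adjusted before applying this to another data set.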
Upon discovering that problem, we defined the file with a larger index CI size of 4096 and regained use of that unused area. The allocated size of the file was reduced by 43%, from 49,770 cylinders to 28,413 cylinders, with approximately 2,000 cylinders of available space at the end of the data set waiting to accommodate subsequently inserted data records. A basic reorganization does not resolve this issue; some changes to the file definition parameters are required.
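As a rough cross-check, the space occupied after the redefinition can be compared with the space that was genuinely in use before it. The Python sketch below assumes 15 tracks per cylinder (3390 geometry), which the text does not state, and treats the "approximately 2,000 cylinders" of trailing free space as exactly 2,000. All names are illustrative.

```python
TRACKS_PER_CYLINDER = 15  # ASSUMPTION: 3390 device geometry, not stated in the text


def allocation_change(before_cyl: int, after_cyl: int):
    """Return (cylinders saved, percentage saved) for a reallocation."""
    saved = before_cyl - after_cyl
    return saved, saved / before_cyl * 100.0


# Allocation before and after redefining with an index CI size of 4096.
saved_cyl, saved_pct = allocation_change(49_770, 28_413)

# Cylinders that actually held data before the change:
# tracks up to the High Used RBA minus the tracks found to be idle.
data_cyl_before = (692_145 - 296_178) / TRACKS_PER_CYLINDER

# Cylinders occupied after the change: allocation minus trailing free space.
data_cyl_after = 28_413 - 2_000

print(f"Allocation reduced by {saved_cyl:,} cylinders ({saved_pct:.0f}%)")
print(f"Data-bearing cylinders before: {data_cyl_before:,.0f}")
print(f"Occupied cylinders after:      {data_cyl_after:,}")
```

Under these assumptions the two occupied-space estimates agree to within a few tens of cylinders, which is consistent with the redefined file holding the same data without the idle control intervals.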
The point is that this is another case of manual tuning that is needed to keep VSAM running and using space efficiently. One also needs to know that this can and does occur, and to be on the lookout for it. With IAM this does not occur, so this potentially time-consuming VSAM task is unnecessary with the IAM product.
Need to Reorganize
For VSAM, the primary factor in deciding to reorganize is to reduce the amount of space being used by the data set. While CA Reclaim has reduced the need for reorganizations of some files that experience high record delete activity, many data sets tend to grow continually as additional data records are inserted. As this process continues, CA splits occur, increasing the size of the data set. Ideally, the inserts would occur such that all of the space made available by the CA splits would be used. However, many times that is not the case. So unless enough records are deleted to make the CA eligible for reclaim, some of that extra space may end up not being utilized unless the file is reorganized. Another reason to reorganize is a change in some file characteristic, or the need to change some parameter to better utilize space.
In the benchmark test case, reductions in both DASD space used and DASD space allocated were realized as a result of the reorganization, so there was a benefit to performing the reorganization.
For IAM data sets, the primary factor for reorganization is the size of the record-based overflow index. The disk space used for overflow is used very efficiently, particularly with the record-based overflow area, with space becoming available immediately as records are deleted. So space reduction is seldom the reason to do a reorganization. The issue is that an increase in overflow utilization results in increased virtual storage usage, and may also impact performance. This was definitely the case in the benchmark test case, with the large virtual storage requirements for the overflow index. A reorganization was therefore needed so that all of the records in overflow would be moved into the prime area, with its more efficient block-level index. An IAM file can experience a small space increase with a reorganization due to the growth of the prime index area stored in the data set. The IAMDRCC and IAMSRORG programs were developed to provide faster reorganization times while reducing data set unavailability during the reorganization.
There is also the Prime Related Overflow feature of IAM, which offers an overflow area indexed by block rather than by record and is managed somewhat similarly to a VSAM CI split. This reduces the size of the overflow index, thereby decreasing the frequency of reorganizations and, again, minimizing the time that the data set is unavailable.
IAM Dynamic Reorg and IAMSRORG are another step in enhancing the availability of your IAM data.