ANALYZE
LOADPLUS estimates the size for the following data sets:
- Load (SORTOUT)
- Work (SYSUT1)
- Sort work (SORTWK)
- Discard (SYSDISC)
- Error (SYSERR)
- Image copy (BMCCPY, BMCCPZ, BMCRCY, and BMCRCZ)
You must either enable automated file size estimation (AUTOENUMROWS=YES or ENUMROWS AUTO), or specify a numeric value for the ENUMROWS option to allow ANALYZE processing to estimate the required space. If you specify ANALYZE PAUSE or ANALYZE ONLY without an ENUMROWS specification, LOADPLUS terminates. For more information, see ENUMROWS.
LOADPLUS provides estimates in both kilobytes and cylinders by device type. LOADPLUS calculates the data set size based on two values: cardinality (the number of rows) and average row length. To obtain the cardinality, LOADPLUS either samples or scans (depending on what you specify) one index per participating table. LOADPLUS determines the average row length from information in the Db2 catalog. The following table provides additional information.
LOADPLUS scans or samples all data sets of a multi-data-set index. LOADPLUS multitasks its analysis, scanning, or sampling of one data set per task.
How ANALYZE options determine cardinality and average row length estimation
ANALYZE keywords | How LOADPLUS determines cardinality | How LOADPLUS determines average row length |
---|---|---|
(ANALYZE not specified) | If you enable dynamic allocation, LOADPLUS defaults to ANALYZE and issues the message BMC51424I. If you specify LOAD RESUME YES and ENUMROWS AUTO is in effect, LOADPLUS defaults to ANALYZE. Otherwise, LOADPLUS does not perform analysis. | |
ANALYZE ANALYZE PAUSE ANALYZE ONLY | If running LOAD RESUME YES INDEX BUILD (with the exceptions noted in this topic), LOADPLUS decides whether to sample or scan the index leaf pages. Otherwise, LOADPLUS does not determine cardinality but uses the number of new rows to determine work data set size. | LOADPLUS obtains this information from the Db2 catalog, basing it on half the length of any VARCHAR columns in the table. LOADPLUS does not consider compression in the calculations. |
ANALYZE SAMPLE ANALYZE PAUSE SAMPLE ANALYZE ONLY SAMPLE | If running LOAD RESUME YES INDEX BUILD (with the exceptions noted in this topic), LOADPLUS samples the index space. Otherwise, LOADPLUS does not determine cardinality but uses the number of new rows to determine work data set size. | LOADPLUS obtains this information from the Db2 catalog, basing it on half the length of any VARCHAR columns in the table. LOADPLUS does not consider compression in the calculations. |
ANALYZE SCAN ANALYZE PAUSE SCAN ANALYZE ONLY SCAN | If running LOAD RESUME YES INDEX BUILD (with the exceptions noted in this topic), LOADPLUS scans the most appropriate index. Otherwise, LOADPLUS does not determine cardinality but uses the number of new rows to determine work data set size. | LOADPLUS obtains this information from the Db2 catalog, basing it on half the length of any VARCHAR columns in the table. LOADPLUS does not consider compression in the calculations. |
Restriction
This option is not valid when LOADPLUS invokes DSNUTILB. LOADPLUS issues message BMC50178E and terminates.
Additional considerations
Note the following considerations when specifying an ANALYZE option:
- LOADPLUS determines cardinality only when running LOAD RESUME YES INDEX BUILD (because the indexes will be built by using existing keys). LOADPLUS does not determine cardinality in the following cases:
- If you specify LOAD RESUME NO or LOAD REPLACE
- If you specify LOAD RESUME YES INTO TABLE tableName REPLACE
- If you specify LOAD RESUME YES INDEX UPDATE
- If you specify LOAD RESUME YES PART n REPLACE and no nonpartitioned indexes are participating in the load
- If you are running an SQLAPPLY load
- For partitions or tables that are not part of the load, unless they have keys in a nonpartitioned index that the load will build
- During the ANALYZE phase, LOADPLUS does not stop the index space and associated table space that are participating in the load. See ANALYZE-phase for performance considerations when using this option.
- When either of the following conditions exists, LOADPLUS determines whether to use sampling or scanning. LOADPLUS performs a full analysis and continues processing, using the information from the ANALYZE phase to dynamically allocate work files.
- You specify ANALYZE without a value.
- You do not specify ANALYZE, but you do enable dynamic data set allocation.
PAUSE
If you specify ANALYZE PAUSE, LOADPLUS generates a report and ends the processing after the ANALYZE phase completes.
You can use the output of the ANALYZE phase to specify the number and allocations of the work data sets. LOADPLUS leaves the utility ID active so that you can restart the load at the next phase by specifying the RESTART or RESTART(PHASE) parameter on your EXEC statement. You do not need to change the ANALYZE option on the LOAD command. When restarting an ANALYZE PAUSE job, LOADPLUS ignores the PAUSE specification. For an example of a job that uses this option, see Example 8 — LOAD REPLACE with ANALYZE PAUSE.
ONLY
If you specify ANALYZE ONLY, LOADPLUS generates a report and terminates after the ANALYZE phase. You cannot restart the load job. However, you can use the output of the ANALYZE phase to specify the number and allocations of the work data sets for subsequent load jobs.
SAMPLE
SAMPLE tells LOADPLUS to determine the estimated cardinality by reading a subset of pages from the most appropriate index. LOADPLUS determines the most appropriate index based on key length and the number of data sets in the index.
SCAN
SCAN tells LOADPLUS to determine the exact cardinality by reading every leaf page in the most appropriate index. LOADPLUS determines the most appropriate index based on key length and the number of data sets in the index.