Performance scenarios
Each scenario in this section opens with a hypothetical performance problem, and then moves through a succession of views until the source of the problem is pinpointed.
Note that these scenarios illustrate only the most common path through CMF MONITOR Online. Depending on your level of expertise, you might choose a different, more sophisticated problem-solving methodology.
Scenario 1: Why did NITEBAT finish so late?
The job NITEBAT finished well past its scheduled completion time last night.
As a result, activity in several areas of the company has been delayed. It is your job to figure out why this delay happened and, more importantly, to prevent it from happening again.
NITEBAT was supposed to finish at 1:20 A.M. Your first step is to look at the system as it existed at 1:20 A.M. and begin gathering clues.
To get the NITEBAT information
On the COMMAND line, type the TIME command for window 1 (using the format mm/dd/yyyy):
TIME 11/10/YYYY 01:20:00
Until you specify otherwise, all views displayed in window 1 automatically retrieve data from the historical database for the interval between 1:15 and 1:30 A.M. (the interval containing 1:20).
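The way a requested time maps to a recorded interval can be sketched as follows. This is a minimal illustration only, assuming fixed 15-minute intervals aligned to the hour (consistent with the 1:15 to 1:30 A.M. interval above); the function name `containing_interval` and the year 2024 are hypothetical placeholders, not part of CMF MONITOR Online.

```python
from datetime import datetime, timedelta

def containing_interval(ts, minutes=15):
    """Return (start, end) of the fixed-length interval that contains ts.

    Assumes intervals are aligned to midnight and all have the same length.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    length = minutes * 60
    start = midnight + timedelta(seconds=(elapsed // length) * length)
    return start, start + timedelta(minutes=minutes)

# 2024 is an arbitrary stand-in for the YYYY placeholder in the TIME command.
start, end = containing_interval(datetime(2024, 11, 10, 1, 20, 0))
print(start.time(), end.time())  # 01:15:00 01:30:00
```

If your site records at a different interval length, pass that length as `minutes`.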
You know for certain that NITEBAT experienced considerable delay last night, but you want to determine whether any other workloads were delayed.
On the COMMAND line, type WDELAY. The WDELAY view is displayed, as shown in the following figure.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> PAGE
CURR WIN ===> 1 ALT WIN ===>
>H1 =WDELAY============SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D===34
C Workload T #AS Total Delay% %Dly %Dly %Dly %Dly %Dly %Dly %Dly
- -------- - --- 0....50...100 CPU Dev Stor ENQ SRM Subs Idle
BATJOBS B 4 83.01 *********** 2.5 4.2 1.3 75.0
PGRP0030 P 21 22.76 *** 21.6 1.2 0.1 1.6
ALLSTC S 84 16.19 ** 13.3 3.0 0.1 0.4
ALLWKLDS C 90 1.11 0.7 0.4
ALLBAT B 1
TEMPCMP C 8
JCBATCH B
ALLOMVS O
PAYBAT B 1
PAYROLL C 6
PAYTSO T 5
PGRP0041 P
PGRP0000 P 30
Scanning the Total Delay% column, you discover that no workload was as critically delayed as BATJOBS, the workload containing NITEBAT. BATJOBS spent 83 percent of the interval waiting for one or more resources. Of that 83 percent, 75 percentage points were due to enqueue contention. How much of this delay was experienced by NITEBAT in particular? Were other jobs in BATJOBS affected by enqueue delay as well?
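The scan of the Total Delay% column amounts to ranking workloads by that value. The sketch below does this with the four nonblank Total Delay% values copied from the WDELAY figure; the dictionary and the helper `rank_by_delay` are illustrative assumptions, not a product interface.

```python
# Total Delay% values copied from the WDELAY figure (rows with a value only).
total_delay = {
    "BATJOBS": 83.01,
    "PGRP0030": 22.76,
    "ALLSTC": 16.19,
    "ALLWKLDS": 1.11,
}

def rank_by_delay(delays):
    """Return (workload, delay) pairs sorted from most to least delayed."""
    return sorted(delays.items(), key=lambda item: item[1], reverse=True)

ranked = rank_by_delay(total_delay)
worst_name, worst_pct = ranked[0]
print(worst_name, worst_pct)  # BATJOBS 83.01
```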
To answer these questions, you can type JDELAY on the COMMAND line, or you can rely on CMF MONITOR Online predefined hyperlinks to anticipate your information needs. You decide to rely on the predefined hyperlinks.
Position your cursor in the %Dly ENQ field for BATJOBS and press Enter. CMF MONITOR Online hyperlinks to the JDENQ view, as shown in the following figure, where you can identify the enqueue resource causing the delay and find out why NITEBAT spent so much time contending for it.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> PAGE
CURR WIN ===> 1 ALT WIN ===>
>H1 =JDENQ=============SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D====4
C Waiting JES Job T SrvClass %Delay %Delay Wait Major Minor RName EN
- Job----- Number - -------- This Enq All Enq Want QName--- -------------- St
MV50CAST STC07172 S SLOW 1.22 Excl SYSZTIOT = / En
MTADOM01 STC07219 S STCNRM 3.26 Excl SYSDSN LGS1.CNTL En
LGS11Q1 STC07177 S SLOW 3.26 Excl SYSDSN LGS1.CNTL En
NITEBAT STC07093 S STCNRM 100.0 Excl SYSDSN SYS.MCS.MCS En
The Waiting Job column tells you that NITEBAT is waiting for the logical enqueue resource that is identified by the major name SYSDSN, indicating that the resource is a data set. The minor name, SYS.MCS.MCS, is the name of the data set itself. If you scrolled to the right by using PF11, you would see from the Owning Job column that a job called DDBBKUP currently owns the resource.
To find out more about this job, position your cursor under Minor RName and press Enter to display the JUENQ view, as shown in the following figure.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> PAGE
CURR WIN ===> 1 ALT WIN ===>
>H1 =JDENQ====JUENQ====SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D====1
C Owning JES Job T SrvClass %Use Ownr Major Minor RName ENQ Waiting
- Job----- Number - -------- ENQ Has- QName--- -------------- Status Job----
DDBBKUP STC07093 S STCNRM 97.2 Excl SYSDSN SYS.MCS.MCS Ended NITEBAT
There is the problem. DDBBKUP has been assigned exclusive (Excl) use of this enqueue resource and held it for 97 percent of the 1:15 to 1:30 A.M. interval. All other jobs, including NITEBAT, are locked out of this resource until DDBBKUP completes execution.
Now that you know what caused last night’s delay, you are in a position to ensure that it does not happen again. One solution is to reschedule DDBBKUP so that it runs after NITEBAT has completed (although your site might prefer an alternative method).
Scenario 2: Is the problem on another system?
As you survey the system, you notice from the WRT view that the workload TSO1 experienced an extremely high response time during performance period 3: a full 17.43 seconds. Performance period 3 is typically characterized by both heavy computations and heavy I/O. Which one is responsible for the TSO1 delay?
To check if the problem is with another system
To begin your investigation, type WDELAY on the COMMAND line to display an overview of all workload delays, as shown in the following figure.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> CSR
CURR WIN ===> 1 ALT WIN ===>
>W1 =WDELAY============SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D===55
C Workload T #AS Total Delay% %Dly %Dly %Dly %Dly %Dly %Dly %Dly
- -------- - --- 0....50...100 CPU Dev Stor ENQ SRM Subs Idle
TSO1 T 3 75.01 ********** 14.77 60.3
BATCH W 2 34.84 **** 3.68 30.37 0.79
BATNRM S 2 34.84 **** 3.68 30.37 0.79
STCNRM S 67 2.99 0.04 0.02 2.93 85.74
STC W 71 2.82 0.03 0.02 2.77 80.91
ALLWKLDS C 194 1.32 0.07 0.31 0.01 0.93 66.80
ALLSTC S 162 1.14 0.04 0.01 1.10 62.84
SYSSTC S 73 0.04 0.04 44.06
SYSTEM S 18 0.04 0.02 0.02 70.26
TSONRM S 28 0.04 0.03 0.01 98.11
TSO W 28 0.04 0.03 0.01 98.11
ALLTSO T 28 0.04 0.03 0.01 97.97
SYSTEM W 91 0.03 0.03 0.00 49.19
CICST1 S 0.00
CICSNRM S 0.00
APPCHOT S 0.00
CICSHOT S 0.00
RMF W 0.00
IMSNRM S 0.00
As you can see, workload TSO1 has been delayed for 75 percent of the current interval, and about 60 percentage points of that delay were due to some type of device. How widespread is the problem? Were all of the address spaces in TSO1 delayed?
To find out, open another window by using the VS (vertical split) command.
- On the COMMAND line, type VS, but do not press Enter yet.
- Position your cursor at the %Dly CPU field.
- Press Enter.
- In the CURR WIN field, type 1.
- In the ALT WIN field, type 2.
Hyperlink on the Total Delay% column for workload TSO1. The JDELAY view is displayed in window 2, as shown in the following figure.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> PAGE
CURR WIN ===> 2 ALT WIN ===>
>W1 -WDELAY------------SYSE-----*---- >W2 =JDELAY============SYSE=====*========
C Workload T #AS Total Delay% | C Jobname JES Job T SrvClass Step
- -------- - --- 0....50...100 | - -------- Number - -------- Data
TSO1 T 3 75.01 ********** | USER1 JOB05805 T PGRP0002 NO 93
BATCH W 2 34.84 **** | LGS12 JOB05365 T PGRP0002 NO 12
BATNRM S 2 34.84 **** | DSF1 JOB05809 T PGRP0001 NO
STCNRM S 67 2.99 |
STC W 71 2.82 |
ALLWKLDS C 194 1.32 |
ALLSTC S 162 1.14 |
SYSSTC S 73 0.04 |
SYSTEM S 18 0.04 |
TSONRM S 28 0.04 |
TSO W 28 0.04 |
ALLTSO T 28 0.04 |
SYSTEM W 91 0.03 |
CICST1 S 0.00 |
CICSNRM S 0.00 |
APPCHOT S 0.00 |
CICSHOT S 0.00 |
RMF W 0.00 |
IMSNRM S 0.00 |
JDELAY reports the delays experienced by each job in TSO1. As you can see, the job USER1 has been delayed 93.29 percent of the current interval, 92.93 percent of which was spent waiting for a device.
To find out which device is responsible, open another window by using the HS (horizontal split) command:
- On the COMMAND line, type HS and position your cursor about halfway down the screen; press Enter.
- In the CURR WIN field, type 2.
- Press PF11 to scroll to the right to see the JDELAY %Dly DEV field.
- In the ALT WIN field, type 3 to direct the forthcoming view to the new window.
Position the cursor on the JDELAY %Dly DEV field; press Enter. The JDDEV view is displayed in window 3, as shown in the following figure.
DDMMMYYYY HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
COMMAND ===> SCROLL ===> PAGE
CURR WIN ===> 2 ALT WIN ===>
>W1 -WDELAY------------SYSE-----*----- >W2 =JDELAY============SYSE=====*========
C Workload T #AS Total Delay% | C Jobname %Dly %Dly %Dly %Dly %D
- -------- - --- 0....50...100 | - -------- CPU DEV Stor ENQ S
TSO1 T 3 75.01 ********** | USER1 3.85 92.93
BATCH W 2 34.84 **** | LGS12 1.28
BATNRM S 2 34.84 **** | DSF1 1.28
STCNRM S 67 2.99 |
STC W 71 2.82 |
ALLWKLDS C 194 1.32 |
ALLSTC S 162 1.14 |
SYSSTC S 73 0.04 |
SYSTEM S 18 0.04 |
TSONRM S 28 0.04 |
TSO W 28 0.04 |
ALLTSO T 28 0.04 |
SYSTEM W 91 0.03 |
CICST1 S 0.00 |
>W3 -JDDEV-------------SYSE-----*----- |
C Jobname T SrvClass %Dly %Delay %Dly |
- -------- - -------- DASD Volser Tape |
USER1 S SYSTEM 92.93 92.93 |
|
JDDEV displays information about jobs delayed because of contention for one or more devices during the interval. In this case, USER1 was delayed by DASD devices for 92.93 percent of the interval.
Using the CMF MONITOR hyperlinks that are available on most fields, you can explore any problem to the desired degree of depth. For example, you can hyperlink from VOLSER SYSR2C to DEVINFO, which shows detailed information about the device specified; from there, you can hyperlink to other fields.
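The drill-down paths used in the two scenarios can be summarized as a small lookup table. The sketch below records only the hyperlinks named in this section; it is not the product's full link map, and the names `HYPERLINKS` and `drill_down` are hypothetical.

```python
# Hyperlinks named in the two scenarios: (view, field) -> target view.
# This is a summary of this section only, not a complete list of hyperlinks.
HYPERLINKS = {
    ("WDELAY", "%Dly ENQ"): "JDENQ",
    ("JDENQ", "Minor RName"): "JUENQ",
    ("WDELAY", "Total Delay%"): "JDELAY",
    ("JDELAY", "%Dly DEV"): "JDDEV",
    ("JDDEV", "Volser"): "DEVINFO",
}

def drill_down(start_view, fields):
    """Follow the given fields in order and return the list of views visited."""
    path = [start_view]
    for field in fields:
        path.append(HYPERLINKS[(path[-1], field)])
    return path

# Scenario 1: enqueue delay.
print(drill_down("WDELAY", ["%Dly ENQ", "Minor RName"]))
# ['WDELAY', 'JDENQ', 'JUENQ']

# Scenario 2: device delay.
print(drill_down("WDELAY", ["Total Delay%", "%Dly DEV", "Volser"]))
# ['WDELAY', 'JDELAY', 'JDDEV', 'DEVINFO']
```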
You can also use the CONtext command, with an SSI context name if your site has defined one, to see if there is contention for your device on another system. For information about using CONtext and SSI, see Using CMF MONITOR Online on multiple systems.