Scenario 1: Why did NITEBAT finish so late?


The job NITEBAT finished well past its scheduled completion time last night.

As a result, activity in several areas of the company has been delayed. It is your job to figure out why this delay happened and, more importantly, to prevent it from happening again.

NITEBAT was supposed to finish at 1:20 A.M. Your first step is to look at the system as it existed at 1:20 A.M. and begin gathering clues.

To get the NITEBAT information

  1. On the COMMAND line, type the TIME command for window 1 (using the format mm/dd/yyyy): TIME 11/10/YYYY 01:20:00

    Until you specify otherwise, all views displayed in window 1 automatically retrieve data from the historical database for the interval between 1:15 and 1:30 A.M. (the interval containing 1:20).
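The way a requested time maps to its containing interval can be sketched in a few lines of Python. This is only an illustration, not the product's logic: it assumes 15-minute recording intervals aligned to the top of the hour (as the 1:15 to 1:30 A.M. example implies), the function name containing_interval is hypothetical, and the year is a placeholder because the source gives the date only as 11/10/YYYY.

```python
from datetime import datetime, timedelta

# Assumption: 15-minute intervals aligned to the top of the hour,
# as implied by the 1:15 to 1:30 A.M. interval in the text.
INTERVAL_MINUTES = 15


def containing_interval(ts, minutes=INTERVAL_MINUTES):
    """Return (start, end) of the aligned interval that contains ts.

    Only meaningful when `minutes` divides 60 evenly.
    """
    start = ts.replace(
        minute=ts.minute - ts.minute % minutes, second=0, microsecond=0
    )
    return start, start + timedelta(minutes=minutes)


# Placeholder year: the source leaves it unspecified (11/10/YYYY).
requested = datetime(2024, 11, 10, 1, 20, 0)
start, end = containing_interval(requested)
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))  # 01:15 to 01:30
```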

    You know for certain that NITEBAT experienced considerable delay last night, but you want to determine whether any other workloads were delayed.

  2. On the COMMAND line, type WDELAY. The WDELAY view is displayed, as shown in Figure 1.

    Figure 1. WDELAY view

    DDMMMYYYY  HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
    COMMAND  ===>                                                 SCROLL ===> PAGE
    CURR WIN ===> 1        ALT WIN ===>                                            
    >H1 =WDELAY============SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D===34
     C Workload T #AS        Total Delay%   %Dly  %Dly  %Dly  %Dly  %Dly  %Dly  %Dly
     - -------- - ---        0....50...100   CPU   Dev  Stor   ENQ   SRM  Subs  Idle
       BATJOBS  B   4  83.01 ***********     2.5   4.2   1.3  75.0                  
       PGRP0030 P  21  22.76 ***            21.6   1.2   0.1   1.6                  
       ALLSTC   S  84  16.19 **             13.3   3.0   0.1   0.4                  
       ALLWKLDS C  90   1.11                       0.7         0.4                  
       ALLBAT   B   1                                                               
       TEMPCMP  C   8                                                               
       JCBATCH  B                                                                   
       ALLOMVS  O                                                                   
       PAYBAT   B   1                                                               
       PAYROLL  C   6                                                               
       PAYTSO   T   5                                                               
       PGRP0041 P                                                                   
       PGRP0000 P  30

    Scanning the Total Delay% column, you discover that no workload was as critically delayed as BATJOBS, the workload containing NITEBAT. BATJOBS spent 83 percent of the interval waiting for one or more resources, and enqueue contention alone accounts for 75 percent of the interval, by far the largest share of that delay. How much of this delay was experienced by NITEBAT in particular? Were other jobs in BATJOBS affected by enqueue delay as well?

    To answer these questions, you can type JDELAY on the COMMAND line, or you can rely on CMF MONITOR Online predefined hyperlinks to anticipate your information needs. You decide to rely on the predefined hyperlinks.
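The BATJOBS figures in Figure 1 can be cross-checked with a little arithmetic. The sketch below uses only the values shown in the BATJOBS row; it assumes that each %Dly column is a percentage of the interval and that the columns add up to Total Delay%, which the numbers in the figure suggest but the text does not state.

```python
# Values read from the BATJOBS row of the WDELAY view in Figure 1.
total_delay = 83.01
components = {"CPU": 2.5, "Dev": 4.2, "Stor": 1.3, "ENQ": 75.0}

# Assumption: the %Dly columns are percentages of the interval and
# together make up Total Delay%.
component_sum = sum(components.values())

# The cause with the largest delay percentage.
dominant = max(components, key=components.get)

# Enqueue delay expressed as a share of the workload's total delay.
enq_share = components["ENQ"] / total_delay * 100

print(f"sum of components: {component_sum:.1f}")
print(f"dominant cause: {dominant}")
print(f"ENQ share of total delay: {enq_share:.1f}%")
```

Under these assumptions the components sum to about 83, matching Total Delay%, and enqueue contention makes up roughly nine tenths of the delay, which is why the ENQ column is the one worth following.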

  3. Position your cursor in the %Dly ENQ field for BATJOBS and press Enter. CMF MONITOR Online hyperlinks to the JDENQ view, as shown in Figure 2, where you can identify the enqueue resource causing the delay and find out why NITEBAT spent so much time contending for it.

    Figure 2. JDENQ view

    DDMMMYYYY  HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
    COMMAND  ===>                                                 SCROLL ===> PAGE
    CURR WIN ===> 1        ALT WIN ===>                                            
    >H1 =JDENQ=============SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D====4
    C Waiting  JES Job  T SrvClass   %Delay  %Delay Wait Major    Minor RName    EN
    - Job----- Number   - -------- This Enq All Enq Want QName--- -------------- St
      MV50CAST STC07172 S SLOW                 1.22 Excl SYSZTIOT    = /         En
      MTADOM01 STC07219 S STCNRM               3.26 Excl SYSDSN   LGS1.CNTL      En
      LGS11Q1  STC07177 S SLOW                 3.26 Excl SYSDSN   LGS1.CNTL      En
      NITEBAT  STC07093 S STCNRM              100.0 Excl SYSDSN   SYS.MCS.MCS    En

    The Waiting Job column tells you that NITEBAT is waiting for the logical enqueue resource that is identified by the major name SYSDSN, indicating that the resource is a data set. The minor name, SYS.MCS.MCS, is the name of the data set itself. If you scroll to the right by using PF11, the Owning Job column shows that a job called DDBBKUP currently owns the resource.

  4. To find out more about this job, position your cursor under Minor RName and press Enter to display the JUENQ view, as shown in Figure 3.

    Figure 3. JUENQ view

    DDMMMYYYY  HH:MM:SS ------ MainView WINDOW INTERFACE (Vv.r.mm) ----------------
    COMMAND  ===>                                                 SCROLL ===> PAGE
    CURR WIN ===> 1        ALT WIN ===>                                            
    >H1 =JDENQ====JUENQ====SYSE=====*========DDMMMYYYY==HH:MM:SS====CMF======D====1
    C Owning   JES Job  T SrvClass %Use Ownr Major    Minor RName    ENQ    Waiting
    - Job----- Number   - --------  ENQ Has- QName--- -------------- Status Job----
      DDBBKUP  STC07093 S STCNRM   97.2 Excl SYSDSN   SYS.MCS.MCS    Ended  NITEBAT

    There is the problem. DDBBKUP has been assigned exclusive (Excl) use of this enqueue resource and held it for 97 percent of the 1:15 to 1:30 A.M. interval. All other jobs, including NITEBAT, are restricted from this resource until DDBBKUP completes execution.
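To put the 97.2 percent figure from Figure 3 in concrete terms, the short sketch below converts it into time within the 15-minute interval. It assumes that %Use ENQ is the fraction of the interval during which DDBBKUP held the resource, as the paragraph above describes it.

```python
from datetime import timedelta

interval = timedelta(minutes=15)  # the 1:15 to 1:30 A.M. interval
use_pct = 97.2                    # %Use ENQ for DDBBKUP in Figure 3

# Assumption: %Use ENQ is the share of the interval the resource was held.
held = interval * (use_pct / 100)
free = interval - held

print(f"held for about {held.total_seconds():.0f} s")
print(f"free for about {free.total_seconds():.0f} s")
```

On that reading, the data set was available for well under a minute of the interval, which is consistent with NITEBAT showing 100 percent enqueue delay in Figure 2.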

    Now that you know what caused last night’s delay, you are in a position to ensure that it does not happen again. One solution is to reschedule DDBBKUP so that it runs after NITEBAT has been completed (although your site might prefer an alternative method).

 


BMC AMI Ops Monitor for CMF 6.3