Defining timeouts for jobs


By defining timeout values for jobs or parts of jobs, you can avoid or resolve issues that occur if a job encounters an unresponsive or unlicensed server. If you do not use this procedure to specify a timeout for a job or job part, a slow-running job can absorb Application Server resources for an extended amount of time, and you may be unable to determine whether a job is hung.

Considerations for setting timeout values

Review the following considerations prior to working with timeout values:

  • You define a timeout period for a job by assigning a value to a job's JOB_TIMEOUT property that specifies a maximum period of time, in minutes, for the job to complete. If the job exceeds this maximum, the system automatically cancels the job.
  • NEW IN 21.02 (For advanced deploy jobs) You can set the DEPLOY_JOB_STAGING_TIMEOUT property to specify how long, in minutes, the staging phase of a job can run before the job is canceled. By default, the value of this property is equal to the value of the JOB_TIMEOUT property. If the payloads to be copied need additional time because of their size, set a higher value for the DEPLOY_JOB_STAGING_TIMEOUT property.
  • You define a timeout period for a job part by assigning a value to a job's JOB_PART_TIMEOUT property that specifies a maximum period of time, in minutes, for each job part in the job to complete. If completion of a job part exceeds this maximum, the system automatically cancels that job part along with all other job parts in the same job that are running on the same server. The rest of the job continues.
  • Canceling all job parts on the same server prevents situations where multiple job parts must time out serially on the same unresponsive server. If necessary, you can override this capability on a global basis so only a single job part times out while all other job parts continue to execute. You can also set a value for how long the canceling of a job part should take.
  • To determine an appropriate value for job-level and job part timeouts, you must consider many factors, such as the load on a machine and the contents of each job part. You may want to test by performing multiple iterations on a job to determine appropriate timeout values. For example, if you perform some tests and determine that the processing of a job part never requires more than two minutes, you might set the job part timeout to be five minutes.
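
The sizing approach in the last consideration can be sketched in a few lines of Python. This is only an illustration, not part of the product: the function name suggest_timeout_minutes and the default safety factor of 2.5 are assumptions, chosen so that a slowest observed run of two minutes yields the five-minute timeout used in the example above.

```python
import math


def suggest_timeout_minutes(observed_minutes, safety_factor=2.5):
    """Suggest a timeout in whole minutes from measured run times.

    Hypothetical helper: takes the slowest observed run, multiplies it by a
    safety factor, and rounds up, because the timeout properties are
    specified in minutes.
    """
    if not observed_minutes:
        raise ValueError("need at least one observed run time")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return math.ceil(max(observed_minutes) * safety_factor)


# Run times (in minutes) of one job part across several test iterations.
# These numbers are made up for illustration.
test_runs = [1.2, 1.7, 2.0, 1.5]
print(suggest_timeout_minutes(test_runs))
```

Choose the safety factor based on how much the load on the target machines varies. A larger factor reduces false cancellations but lets a hung job part hold resources longer.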

To define timeouts for a job

  1. Log in to the TrueSight Server Automation console.
  2. Navigate to a job under the Jobs folder and select the job.
  3. In the Properties view, expand the Extended node to display the list of the job's extended properties.
  4. Define timeout properties by doing any of the following:
    • To add a job-level timeout, click the cell in the Value column for the JOB_TIMEOUT property. Enter a maximum period of time (in minutes) to elapse before the job is automatically canceled.

      Note

      When applying job-level timeouts, be aware of the following issues:

      • Because Deploy Undo jobs cannot be scheduled, job-level timeouts do not apply to Deploy Undo jobs.
      • When a Batch Job runs, the job-level timeout (the JOB_TIMEOUT setting) defined in the Batch Job is applied to the entire run time of all member jobs that are grouped together in the Batch Job. If a member job is set with a longer timeout, then whenever the job is initiated by the Batch Job, the job timeout defined in the Batch Job (the shorter timeout) overrides the job timeout defined in the member job (the longer timeout).
        When a member job times out, the behavior of the Batch Job depends on the execution setting of the member jobs, as follows:
        • Member jobs are set to execute sequentially: If a member job is canceled because of a timeout, the Batch Job executes the next job if the overall execution time does not exceed the JOB_TIMEOUT setting for the Batch Job.
        • Member jobs are set to execute in parallel: If a member job is canceled because of a timeout, the Batch Job continues to execute until all the member jobs are completed.
    • To add a job part timeout, click the cell in the Value column for the JOB_PART_TIMEOUT property. Enter a maximum period of time (in minutes) that should elapse before a job part is canceled.

      Note

      When applying job part timeouts, be aware of the following issues:

      • When one job part times out, all other job parts in the same job that act on the same server are also canceled. This prevents situations where multiple parts in a job must time out serially on the same unresponsive server. However, you can override this default behavior by setting the PropagateWorkItemTimeout property to false through the Application Server Administration console (the blasadmin utility), as described in To set time-out behavior for work item threads.
      • If cancellation of a job part exceeds the maximum period of time defined, the entire job is canceled. This prevents situations where the cancellation itself does not perform as expected and the act of canceling hangs the job. To specify a maximum amount of time for canceling a job part, modify the MaxTimeForCancelToFinish property through the Application Server Administration console (the blasadmin utility), as described in To specify a maximum time for canceling a job part.
      • For a Batch Job, the job part timeout is not relevant at the Batch Job level. Only the job part timeouts defined in the member jobs are taken into account for their respective job parts. The job timeouts (JOB_TIMEOUT settings) for the Batch Job and for the member jobs still apply.
      • For Deploy Jobs, the effect of the JOB_PART_TIMEOUT property depends on the number of asynchronous BLExec tasks performed by the job during the commit or simulate phase on a particular server. A Deploy Job might involve more than one asynchronous task on a target server, and the job part timeout is reset for each asynchronous task. As a result, the effective job part timeout might be higher than the value that you set through the JOB_PART_TIMEOUT property. This can occur in the following typical scenarios:
        • When a reboot is configured to occur after deploying each BLPackage item, as the deployment of each BLPackage item involves a separate asynchronous task.
        • When a UNIX target server is rebooted into single-user mode and the connection with the server is temporarily unavailable.
      • For a Deploy Job, the job part timeout does not handle the case of a target server that does not come up after a reboot. Therefore, in addition to setting the JOB_PART_TIMEOUT property, ensure that you also set a value for the JOB_TIMEOUT property or for an Agent Connection timeout.
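
The interaction between Batch Job timeouts, member job timeouts, and per-task resets in Deploy Jobs can be hard to reason about. The following Python sketch is a simplified model of the behavior described in the notes above, not product code. The function names, the tuple layout for member jobs, and the assumption that a member job is cut off as soon as the Batch Job's remaining time runs out are all illustrative.

```python
def run_sequential_batch(batch_timeout, members):
    """Model a Batch Job whose member jobs execute sequentially.

    batch_timeout: JOB_TIMEOUT of the Batch Job, in minutes.
    members: list of (name, member_job_timeout, actual_runtime) in minutes.
    Returns a list of (name, outcome) pairs.
    """
    elapsed = 0
    results = []
    for name, member_timeout, runtime in members:
        remaining = batch_timeout - elapsed
        if remaining <= 0:
            # The Batch Job's own timeout is already used up.
            results.append((name, "not started"))
            continue
        # The shorter of the two limits governs this member job.
        limit = min(member_timeout, remaining)
        if runtime > limit:
            elapsed += limit
            results.append((name, "canceled"))
        else:
            elapsed += runtime
            results.append((name, "completed"))
    return results


def worst_case_job_part_minutes(job_part_timeout, async_tasks):
    """Upper bound on a Deploy Job part's run time on one server.

    Assumes the job part timeout restarts for each asynchronous task, so the
    worst case is every task running just under the limit.
    """
    return job_part_timeout * async_tasks


# Batch Job with a 30-minute timeout and three member jobs (made-up values).
print(run_sequential_batch(30, [("A", 10, 12), ("B", 60, 15), ("C", 60, 20)]))

# A 10-minute JOB_PART_TIMEOUT with three asynchronous tasks on one server.
print(worst_case_job_part_minutes(10, 3))
```

In this model, job A is canceled by its own 10-minute timeout, job B completes, and job C is canceled because only 5 minutes of the Batch Job's 30-minute timeout remain, even though C's own timeout is 60 minutes.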

To define timeouts for a job globally

  1. Select Configuration > Property Dictionary View to open the Property Dictionary.
  2. Select Built-in Property Classes > Job.
  3. Navigate to a job and select the job. This displays that job's properties in the Properties view or displays the Properties panel in a wizard.
  4. In the properties list, click in the Value column for the JOB_TIMEOUT or JOB_PART_TIMEOUT property and enter a value. For more information, see Setting values for system object properties.
  5. Click OK.

 
