Monitoring reconciliation jobs using the Fail safe feature
The Fail safe feature in the reconciliation engine monitors the progress of all running jobs (scheduled, continuous, and non-continuous). When it detects a job that is not responding, it automatically restarts that job.
For non-continuous jobs, this feature stops the current run and starts a new run. For continuous jobs, it stops the current run and lets the next run start after the continuous interval of that job has elapsed.
For example, if the job idle time is configured as 60 minutes, the Fail safe feature checks all jobs every 60 minutes. If it detects a job that has not responded (has not processed a CI) for more than 60 minutes, it restarts that job.
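The decision described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the reconciliation engine's actual implementation: the Job class, the failsafe_action function, and the returned action names are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical stand-in for a running reconciliation job."""
    name: str
    continuous: bool
    last_ci_processed: datetime  # time at which the job last processed a CI


def failsafe_action(job: Job, now: datetime, idle_minutes: int = 60) -> str:
    """Return the action a Fail safe style check would take for one job.

    A job counts as non-responding when it has not processed a CI for
    more than idle_minutes.
    """
    idle = now - job.last_ci_processed
    if idle <= timedelta(minutes=idle_minutes):
        return "none"  # job is still making progress
    if job.continuous:
        # Stop the current run; the next run starts after the job's
        # continuous interval has elapsed.
        return "stop_and_wait_for_next_interval"
    # Non-continuous job: stop the current run and start a new run.
    return "stop_and_start_new_run"
```

In this sketch, a job that last processed a CI 61 minutes ago is treated as non-responding under a 60-minute idle time, while a job that is exactly 60 minutes idle is left alone, matching the "more than 60 minutes" wording above.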
This feature logs all traces in the arrecond.log file. Usually, this log file resides in the installation directory, for example, \Program Files\BMC Software\AtriumCore\Logs. If you have configured the log file to reside in a different directory, look for it there.
Consider the following points when working with the Fail safe feature:
- The time interval for which a job can remain idle is configurable.
- The configured time interval is in minutes.
- The default job idle time is 60 minutes.
- The job idle time can be set to a value from 60 to 720 minutes.
To configure the above settings, use the configuration options provided in Server Configuration.
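The constraints listed above can be expressed as a small validation helper. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes that both ends of the 60 to 720 minute range are allowed values.

```python
DEFAULT_IDLE_MINUTES = 60   # documented default job idle time
MIN_IDLE_MINUTES = 60       # documented lower bound
MAX_IDLE_MINUTES = 720      # documented upper bound (assumed inclusive)


def resolve_job_idle_time(value=None):
    """Return the job idle time in minutes, applying the default and range check.

    Hypothetical helper; the product itself is configured through
    Server Configuration, not through this function.
    """
    if value is None:
        return DEFAULT_IDLE_MINUTES
    if not MIN_IDLE_MINUTES <= value <= MAX_IDLE_MINUTES:
        raise ValueError(
            f"job idle time must be between {MIN_IDLE_MINUTES} and "
            f"{MAX_IDLE_MINUTES} minutes, got {value}"
        )
    return value
```

For instance, calling resolve_job_idle_time() with no argument yields the default of 60, while a value such as 30 or 1000 is rejected.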