Normalization failover for scheduled batch jobs

This topic describes the normalization failover mechanism for scheduled batch jobs. A server group consists of two or more Normalization Engines that share the same database. In a server group environment, all the Normalization Engines have identical installations. For example, every Normalization Engine has Remedy AR System and BMC CMDB Server installed.

Normalization Engine ranking

You can view the Normalization Engine ranking in the Remedy AR System Service Failover Ranking form. The ranks in this form are populated when you set the rank for Service Failover in the AR System Server Group Operation Ranking form. The Normalization Engines are assigned Rank 1, Rank 2, and so on, where Rank 1 is the highest.

The failover mechanism in a server group environment applies only to scheduled batch jobs. Jobs are always scheduled on the highest-ranked Normalization Engine. If that Normalization Engine fails, the normalization jobs on it are cancelled and rescheduled on the next highest-ranked Normalization Engine. When the highest-ranked Normalization Engine recovers, the normalization jobs are rescheduled on it, and the normalization jobs that are running on the other Normalization Engines are cancelled.
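The selection, failover, and failback rules above can be sketched as a small model. This is a minimal Python illustration, assuming the server group is an in-memory list of engines with a rank and an up/down flag; the `Engine` class and the `select_engine` and `reassign_jobs` functions are hypothetical and are not part of any Normalization Engine or Remedy AR System API.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical model of a Normalization Engine in a server group."""
    name: str
    rank: int            # 1 is the highest rank
    is_up: bool = True


def select_engine(engines):
    """Return the highest-ranked engine that is up, or None if all are down."""
    available = [e for e in engines if e.is_up]
    if not available:
        return None
    # A lower rank number means a higher rank, so take the minimum.
    return min(available, key=lambda e: e.rank)


def reassign_jobs(engines, current_owner):
    """Decide which engine owns the scheduled batch jobs after a status change.

    Returns (new_owner, cancelled_on): the engine that should now own the
    jobs, and the engine whose jobs are cancelled (None if nothing moves).
    """
    new_owner = select_engine(engines)
    if new_owner is current_owner:
        return new_owner, None
    return new_owner, current_owner


# Walk through a failure and a recovery in a three-engine group.
group = [Engine("NE1", 1), Engine("NE2", 2), Engine("NE3", 3)]
owner = select_engine(group)
print(owner.name)                                # NE1

group[0].is_up = False                           # NE1 fails
owner, cancelled = reassign_jobs(group, owner)
print(owner.name, cancelled.name)                # NE2 NE1

group[0].is_up = True                            # NE1 recovers
owner, cancelled = reassign_jobs(group, owner)
print(owner.name, cancelled.name)                # NE1 NE2
```

Taking the minimum rank number over the engines that are up covers both failover (the next engine in rank order takes over) and failback (Rank 1 wins again as soon as it is back).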

Normalization Engine failover scenario 1

Consider a scenario where you configure normalization in a server group environment and one of the Normalization Engines stops functioning. The functioning Normalization Engine that has the highest rank at that time picks up the normalization job.

The following image shows the Normalization Engine failover mechanism in a server group with three Normalization Engines. If all the Normalization Engines are working, the Normalization Engine that has the highest rank runs the jobs.

Normalization Engine failover scenario 2

Consider a scenario where you configure normalization in a server group environment and the two highest-ranked Normalization Engines are down. The normalization jobs are picked up by the third-ranked Normalization Engine, which is now the highest-ranked Normalization Engine that is still functioning.

[Image: Normalization Engine failover, scenario 2]

You can schedule a job on any Normalization Engine in the server group, but the Normalization Engine that has the highest rank in the group at that time runs the job. For example, in a server group of Normalization Engines 1, 2, and 3, if you schedule a job on Normalization Engine 3 but Normalization Engine 1 has the highest rank, Normalization Engine 1 runs the job.

[Image: Normalization Engine failover scenario]
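The rule that the engine you schedule on does not decide which engine runs the job can be shown with a short function. This is a hypothetical sketch, assuming engine names and ranks are available as a plain dictionary and the functioning engines as a set; `resolve_runner` is an illustrative name, not a product API.

```python
def resolve_runner(scheduled_on, ranks, up):
    """Return the name of the engine that actually runs a job.

    scheduled_on: engine the job was scheduled on. It is accepted only to
        show that it does not affect the result.
    ranks: mapping of engine name -> rank, where 1 is the highest rank.
    up: set of engine names that are currently functioning.
    """
    candidates = [name for name in ranks if name in up]
    if not candidates:
        raise RuntimeError("no Normalization Engine in the group is available")
    # The highest-ranked functioning engine runs the job.
    return min(candidates, key=ranks.get)


ranks = {"NE1": 1, "NE2": 2, "NE3": 3}
print(resolve_runner("NE3", ranks, up={"NE1", "NE2", "NE3"}))  # NE1
print(resolve_runner("NE3", ranks, up={"NE3"}))                # NE3
```

The second call corresponds to scenario 2: with the two highest-ranked engines down, the third-ranked engine runs the job.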

Job schedule in a server group environment

Consider a server group of Normalization Engines 1, 2, and 3 whose ranks are 1, 2, and 3 respectively. Assume that you scheduled a batch job on the highest-ranked engine, Normalization Engine 1, to start at 12:00 AM on March 1. If Normalization Engine 1 goes down at 12:20 AM, Normalization Engine 2 automatically picks up the batch job at its next scheduled run. That is, at 12:00 AM on March 2, the batch job runs on Normalization Engine 2.

Between the failure of Normalization Engine 1 on March 1 and the next scheduled run on March 2, you can also start the job manually on Normalization Engine 2, the second highest-ranked Normalization Engine.

In the current implementation, there is no failover support for running jobs. If a Normalization Engine goes down, the jobs that it was running are not restarted on any other Normalization Engine. Scheduled jobs run again at their next scheduled time.
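The March 1 example can be worked through in code. This minimal sketch assumes a daily schedule that starts at 12:00 AM on March 1 (the year is arbitrary) and models the group as a dictionary of ranks; `next_run_after` is a hypothetical helper, not a product API. The run that Normalization Engine 1 had in progress is simply dropped, because running jobs are not failed over; only the next scheduled run moves to Normalization Engine 2.

```python
from datetime import datetime, timedelta


def next_run_after(first_run, interval, moment):
    """Return the first scheduled run strictly after `moment`."""
    if moment < first_run:
        return first_run
    # timedelta // timedelta gives the number of whole intervals elapsed.
    periods = (moment - first_run) // interval + 1
    return first_run + periods * interval


first_run = datetime(2025, 3, 1, 0, 0)      # 12:00 AM on March 1 (assumed year)
interval = timedelta(days=1)                # assumed daily schedule
failure = datetime(2025, 3, 1, 0, 20)       # NE1 goes down at 12:20 AM

# The run in progress on NE1 is not restarted; the job waits for its next slot.
resumes = next_run_after(first_run, interval, failure)

ranks = {"NE1": 1, "NE2": 2, "NE3": 3}
up = {"NE2", "NE3"}                         # NE1 is still down
runner = min((name for name in ranks if name in up), key=ranks.get)

print(resumes, runner)                      # 2025-03-02 00:00:00 NE2
```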

Troubleshooting aborted jobs

In the current implementation, only one job can process a given data set at a time. If a Normalization Engine goes down while it is running jobs, the active Normalization Engine releases the orphaned locks that the failed Normalization Engine did not release. If the orphaned locks for a data set are not released, no other job can process that data set. In this case, open the NE:ExclusiveLock form and delete the entry whose lockable field contains DATASET:<dataset-name>, where <dataset-name> is the name of the locked data set. For example, DATASET:BMC.SCCM.
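The one-job-per-data-set rule and the manual cleanup can be modeled as follows. This is a hypothetical in-memory sketch: the dictionary stands in for the NE:ExclusiveLock form, and the idea that each entry records which engine holds the lock is an assumption made for illustration. Only the DATASET:<dataset-name> format of the lockable value comes from this topic.

```python
def lock_key(dataset_name):
    """Build the lockable value for a data set, for example DATASET:BMC.SCCM."""
    return f"DATASET:{dataset_name}"


def try_acquire(locks, dataset_name, engine):
    """Take the exclusive lock for a data set; return False if it is held."""
    key = lock_key(dataset_name)
    if key in locks:
        return False
    locks[key] = engine
    return True


def release_orphans(locks, failed_engine):
    """Automatic path: release every lock held by an engine that went down."""
    for key in [k for k, owner in locks.items() if owner == failed_engine]:
        del locks[key]


def delete_lock_entry(locks, dataset_name):
    """Manual fallback: remove the entry for one data set, if present."""
    locks.pop(lock_key(dataset_name), None)


locks = {}  # lockable value -> engine holding the lock (assumed model)
try_acquire(locks, "BMC.SCCM", "NE1")
print(try_acquire(locks, "BMC.SCCM", "NE2"))   # False: the data set is locked

# NE1 goes down and its lock is not released automatically (orphaned).
delete_lock_entry(locks, "BMC.SCCM")           # the manual fix described above
print(try_acquire(locks, "BMC.SCCM", "NE2"))   # True
```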