tw_cluster_control

The tw_cluster_control utility enables you to perform the following operations:

Review the status of all the machines across the cluster.
Stop the services across the cluster.
Restart the services across the cluster.
Remove all failed machines from a cluster.
Revert a cluster member into a standalone machine.
Unlock the system when it is locked due to a cluster manager operation failure.
Change the coordinator when the UI is inaccessible.

To use the utility, run the following command from the $TIDEWAY/bin/ directory on a member of the cluster that you need to control:

tw_cluster_control [options]

where options are any of the options described in the following table and the common command line options described in Using command line utilities.

Command Line Option	Description
`--become-coordinator`	Make this machine the coordinator.
`--cluster-start-services`	Start the services on all machines across the cluster. This option does not restart the services; to restart them, use `--cluster-stop-services` and then `--cluster-start-services`.
`--cluster-stop-message=MSG`	Message giving the reason for stopping the services across the cluster. Used in conjunction with `--cluster-stop-services`.
`--cluster-stop-services`	Stop the services on all machines across the cluster. This option prompts for the password of the system user.
`--fix-interrupted`	Unlock the system when it is locked due to a cluster manager operation failure.
`--force`	Do not ask for confirmation for any of the options.
`--replace-vm-uuid`	Replace a cluster member's VM UUID if it has changed and is preventing the cluster from starting. Use this option if the cluster does not start and tw_svc_cluster_manager.log contains the following critical message: "Clustered machine: VM UUID has changed Replace expected VM UUID by running: tw_cluster_control --replace-vm-uuid". See Troubleshooting clusters for more information.
`--remove-broken`	Remove all failed machines from a cluster. You should use this if you are unable to forcibly remove one or more failed machines using the UI. This option prompts for the password of the system user.
`--revert-to-standalone`	Revert the local failed cluster member into a standalone machine. You should only use this after removing a failed machine from the cluster using the `--remove-broken` option. This option prompts for the password of the system user.
`--show-members`	Show the status of all the machines across the cluster.
`--show-pending`	Show any pending changes in the cluster.

User examples

The following examples show how to review the cluster status, stop and restart the services across the cluster, and run troubleshooting operations when cluster members are inaccessible or locked by BMC Discovery.

Review the status of the cluster members

You can request information about the current status of all machines in the cluster using the following command:

$TIDEWAY/bin/tw_cluster_control --show-members

The following are examples of the cluster status details.

Cluster status example for a totally healthy cluster

This example contains the status information for a cluster where all members operate without failures and there are no connectivity issues.

 
Cluster UUID : d5933b313a3ef13dfe647f00000104a7 
Cluster Name : ADDMCluster 
Cluster Alias : 
Number of Members : 3 

UUID : d5933b313a3ef13de8027f00000104a7 
Name : ADDMCluster-01 
Address : 10.49.16.61 
Health : MEMBER_HEALTH_OK 
State : MEMBER_STATE_NORMAL 
Coordinator : Yes 
Last Contact : Thu Nov 28 10:28:20 2013 
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 
Processors : 1 
Memory : 384M 
Swap : 8192M 
Free Space : /usr 4955M/8701M (44%) 

UUID : 5de1a3313a3f03a67d627f00000104a8 
Name : ADDMCluster-02 
Address : 10.49.17.64 
Health : MEMBER_HEALTH_OK 
State : MEMBER_STATE_NORMAL 
Coordinator : No 
Last Contact : Thu Nov 28 10:28:20 2013 
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 
Processors : 1 
Memory : 384M 
Swap : 8192M 
Free Space : /usr 5168M/8701M (41%) 

UUID : 2029c0313a3ef7b8a1fc7f00000104a5 
Name : ADDMCluster-03 
Address : 10.49.17.67 
Health : MEMBER_HEALTH_OK 
State : MEMBER_STATE_NORMAL 
Coordinator : No 
Last Contact : Thu Nov 28 10:28:20 2013 
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 
Processors : 1 
Memory : 384M 
Swap : 8192M 
Free Space : /usr 5170M/8701M (41%)

Cluster status example with errors

This example shows the cluster health check results when only the cluster coordinator is operating normally and the other members of the cluster are down and inaccessible.

  
UUID : d5933b313a3ef13de8027f00000104a7 
Name : ADDMCluster-01 
Address : 10.49.16.61 
Health : MEMBER_HEALTH_OK 
State : MEMBER_STATE_NORMAL 
Coordinator : Yes 
Last Contact : Thu Nov 28 10:24:46 2013 
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 
Processors : 1 
Memory : 384M 
Swap : 8192M 
Free Space : /usr 4955M/8701M (44%) 

UUID : 5de1a3313a3f03a67d627f00000104a8 
Name : ADDMCluster-02 
Address : 10.49.17.64 
Health : MEMBER_HEALTH_ERROR Communication failure 
State : MEMBER_STATE_NORMAL 
Coordinator : No 
Last Contact : None 
CPU Type : 
Processors : 0 
Memory : 0M 
Swap : 0M 

UUID : 2029c0313a3ef7b8a1fc7f00000104a5 
Name : ADDMCluster-03 
Address : 10.49.17.67 
Health : MEMBER_HEALTH_ERROR Communication failure 
State : MEMBER_STATE_NORMAL 
Coordinator : No 
Last Contact : None 
CPU Type : 
Processors : 0 
Memory : 0M 
Swap : 0M
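
The status output lends itself to a quick scripted health check. The following is a minimal sketch, not part of the product: it assumes the `Key : Value` layout shown in the examples above, and the function name `unhealthy_members` is hypothetical. It reads `--show-members` output on standard input and prints every member whose Health value is anything other than MEMBER_HEALTH_OK.

```shell
#!/bin/sh
# Sketch only: list cluster members whose Health is not MEMBER_HEALTH_OK.
# Input: tw_cluster_control --show-members output on stdin.
# Assumes each field is printed as "Key : Value", as in the examples above.
unhealthy_members() {
    awk '
        {
            # Split on the first colon only, so values such as
            # "Thu Nov 28 10:28:20 2013" stay in one piece.
            pos = index($0, ":")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        # Remember the most recent member name ("Cluster Name" does not match).
        key == "Name" { name = val }
        # Report the member when its health is anything but OK.
        key == "Health" && val != "MEMBER_HEALTH_OK" { print name ": " val }
    '
}
```

For example, piping the output of `$TIDEWAY/bin/tw_cluster_control --show-members` into `unhealthy_members` for the cluster with errors shown above would list ADDMCluster-02 and ADDMCluster-03 with their Communication failure status. For the healthy cluster it prints nothing.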

Restarting the services across the cluster

$TIDEWAY/bin/tw_cluster_control --cluster-stop-services 
Password:
Stopping services across the cluster: 'User initiated shutdown'
    Stopping Application Server service:                   [  OK  ]
    Stopping Reports service:                              [  OK  ]
    ...
    Stopping Security service:                             [  OK  ]
Services stopped
$TIDEWAY/bin/tw_cluster_control --cluster-start-services
Starting services across the cluster
    Starting Security service:                             [  OK  ]
    Starting Model service:                                [  OK  ]
    ...                         
    Updating baseline:                                     [  OK  ]
Services started
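
Because the utility has no single restart option, the stop and start steps can be wrapped in a small helper. The following is a sketch under stated assumptions, not a supported tool: the function name `restart_cluster_services`, the `TW_CLUSTER_CONTROL` override variable, and the default message text are all hypothetical, and the sketch assumes that the utility returns a non-zero exit status when an operation fails. The stop step still prompts for the password of the system user, so the helper is intended for interactive use.

```shell
#!/bin/sh
# Sketch only: restart = stop, then start (there is no single restart option).
# TW_CLUSTER_CONTROL is a hypothetical override for the utility path;
# by default the utility in $TIDEWAY/bin is used.
restart_cluster_services() {
    ctl="${TW_CLUSTER_CONTROL:-$TIDEWAY/bin/tw_cluster_control}"
    # Optional first argument: reason recorded via --cluster-stop-message.
    msg="${1:-Planned restart}"

    # Assumption: a non-zero exit status means the stop did not succeed.
    # In that case, do not attempt to start the services.
    "$ctl" --cluster-stop-services --cluster-stop-message="$msg" || {
        echo "Stop failed; not starting services" >&2
        return 1
    }
    "$ctl" --cluster-start-services
}
```

Calling `restart_cluster_services "Maintenance window"` would then run the two commands in order, using the given text as the stop message.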

Stopping the services across the cluster

$TIDEWAY/bin/tw_cluster_control --cluster-stop-services --cluster-stop-message="Machine is not responding"

Unlock the system when it is locked due to a cluster manager operation failure

Some cluster management operations might acquire the system lock. If the operation is interrupted while the system is in a locked state, you might need to run the following command to unlock it:

$TIDEWAY/bin/tw_cluster_control --fix-interrupted

Running the command unlocks only the machine that was affected by the interrupted operation. The interactive command line tool informs you if any additional intervention is required before you can run the tw_cluster_control operations again for that machine.

Further examples

For further examples of using tw_cluster_control to troubleshoot cluster problems, see Troubleshooting clusters.