tw_cluster_control

The tw_cluster_control utility enables you to perform the following operations:

Review the status of all the machines across the cluster.
Stop the services across the cluster.
Restart the services across the cluster.
Remove all failed machines from a cluster.
Revert a cluster member to a standalone machine.
Unlock the system when it is locked due to a cluster manager operation failure.
Change the coordinator when the UI is inaccessible.

To use the utility, run the following command from the $TIDEWAY/bin/ directory on a member of the cluster that you need to control:

tw_cluster_control [options]

where options are any of the options described in the following table, together with the common command line options described in Using command line utilities.

User examples

The following examples show how to review the cluster status, stop and restart the services across the cluster, and run troubleshooting operations when cluster members are inaccessible or are locked by BMC Discovery.

Review the status of the cluster members

You can request information about the current status of all machines in the cluster using the following command:

$TIDEWAY/bin/tw_cluster_control --show-members

Following are examples of the cluster status details.

Cluster status example for a fully healthy cluster

This example contains the status information for a cluster where all members operate without failures and there are no connectivity issues.

Cluster UUID : d5933b313a3ef13dfe647f00000104a7
Cluster Name : ADDMCluster
Cluster Alias :
Number of Members : 3

UUID : d5933b313a3ef13de8027f00000104a7
Name : ADDMCluster-01
Address : 10.49.16.61
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL
Coordinator : Yes
Last Contact : Thu Nov 28 10:28:20 2013
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
Processors : 1
Memory : 384M
Swap : 8192M
Free Space : /usr 4955M/8701M (44%)

UUID : 5de1a3313a3f03a67d627f00000104a8
Name : ADDMCluster-02
Address : 10.49.17.64
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL
Coordinator : No
Last Contact : Thu Nov 28 10:28:20 2013
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
Processors : 1
Memory : 384M
Swap : 8192M
Free Space : /usr 5168M/8701M (41%)

UUID : 2029c0313a3ef7b8a1fc7f00000104a5
Name : ADDMCluster-03
Address : 10.49.17.67
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL
Coordinator : No
Last Contact : Thu Nov 28 10:28:20 2013
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
Processors : 1
Memory : 384M
Swap : 8192M
Free Space : /usr 5170M/8701M (41%)
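
If you want to check this kind of output from a script, the following Python sketch is one possible approach. It assumes that the output keeps the "Key : Value" layout and the blank-line separation between blocks shown above, which is not guaranteed across versions. The function names parse_blocks and cluster_is_healthy are hypothetical, and SAMPLE is an abbreviated copy of the example with most fields removed.

```python
def parse_blocks(text):
    """Split 'Key : Value' output into one dict per blank-line-separated block."""
    blocks, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the block collected so far.
            if current:
                blocks.append(current)
                current = {}
            continue
        # Split at the first colon only, so timestamps in values stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


def cluster_is_healthy(text):
    """True when the member count matches and every member reports OK/NORMAL."""
    blocks = parse_blocks(text)
    header = next((b for b in blocks if "Cluster UUID" in b), None)
    if header is None:
        return False
    members = [b for b in blocks if "UUID" in b]
    return (
        int(header["Number of Members"]) == len(members)
        and all(m.get("Health") == "MEMBER_HEALTH_OK" for m in members)
        and all(m.get("State") == "MEMBER_STATE_NORMAL" for m in members)
    )


# Abbreviated copy of the healthy-cluster example (assumed layout).
SAMPLE = """\
Cluster UUID : d5933b313a3ef13dfe647f00000104a7
Cluster Name : ADDMCluster
Cluster Alias :
Number of Members : 3

UUID : d5933b313a3ef13de8027f00000104a7
Name : ADDMCluster-01
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL

UUID : 5de1a3313a3f03a67d627f00000104a8
Name : ADDMCluster-02
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL

UUID : 2029c0313a3ef7b8a1fc7f00000104a5
Name : ADDMCluster-03
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL
"""

print(cluster_is_healthy(SAMPLE))
```

In practice you would pass the captured output of the --show-members command to cluster_is_healthy instead of the SAMPLE string.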

Cluster status example with errors

This example shows the cluster health check results when only the cluster coordinator is operating normally and the other members of the cluster are down and inaccessible.

UUID : d5933b313a3ef13de8027f00000104a7
Name : ADDMCluster-01
Address : 10.49.16.61
Health : MEMBER_HEALTH_OK
State : MEMBER_STATE_NORMAL
Coordinator : Yes
Last Contact : Thu Nov 28 10:24:46 2013
CPU Type : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
Processors : 1
Memory : 384M
Swap : 8192M
Free Space : /usr 4955M/8701M (44%)

UUID : 5de1a3313a3f03a67d627f00000104a8
Name : ADDMCluster-02
Address : 10.49.17.64
Health : MEMBER_HEALTH_ERROR Communication failure
State : MEMBER_STATE_NORMAL
Coordinator : No
Last Contact : None
CPU Type :
Processors : 0
Memory : 0M
Swap : 0M

UUID : 2029c0313a3ef7b8a1fc7f00000104a5
Name : ADDMCluster-03
Address : 10.49.17.67
Health : MEMBER_HEALTH_ERROR Communication failure
State : MEMBER_STATE_NORMAL
Coordinator : No
Last Contact : None
CPU Type :
Processors : 0
Memory : 0M
Swap : 0M
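
To pick out the failed members from output like the example above, you could use a sketch such as the following. It assumes that the Health value starts with a status token (for example, MEMBER_HEALTH_ERROR) optionally followed by a reason, as in the example, and that member blocks are separated by blank lines. The function name unhealthy_members is hypothetical, and SAMPLE is an abbreviated copy of the example.

```python
def unhealthy_members(text):
    """Return (name, address, reason) for each member whose Health is not OK."""
    problems = []
    member = {}
    # The extra "" makes sure the last block is processed as well.
    for line in text.splitlines() + [""]:
        if line.strip():
            key, _, value = line.partition(":")
            member[key.strip()] = value.strip()
            continue
        health = member.get("Health", "")
        if health and health != "MEMBER_HEALTH_OK":
            # Anything after the status token is treated as the reason.
            _, _, reason = health.partition(" ")
            problems.append((member.get("Name"), member.get("Address"), reason))
        member = {}
    return problems


# Abbreviated copy of the example with errors (assumed layout).
SAMPLE = """\
UUID : d5933b313a3ef13de8027f00000104a7
Name : ADDMCluster-01
Address : 10.49.16.61
Health : MEMBER_HEALTH_OK

UUID : 5de1a3313a3f03a67d627f00000104a8
Name : ADDMCluster-02
Address : 10.49.17.64
Health : MEMBER_HEALTH_ERROR Communication failure

UUID : 2029c0313a3ef7b8a1fc7f00000104a5
Name : ADDMCluster-03
Address : 10.49.17.67
Health : MEMBER_HEALTH_ERROR Communication failure
"""

for name, address, reason in unhealthy_members(SAMPLE):
    print(name, address, reason)
```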

Restarting the services across the cluster

$TIDEWAY/bin/tw_cluster_control --cluster-stop-services
Password:
Stopping services across the cluster: 'User initiated shutdown'
    Stopping Application Server service:                   [ OK ]
    Stopping Reports service:                              [ OK ]
    ...
    Stopping Security service:                             [ OK ]
Services stopped
$TIDEWAY/bin/tw_cluster_control --cluster-start-services
Starting services across the cluster
    Starting Security service:                             [ OK ]
    Starting Model service:                                [ OK ]
    ...
    Updating baseline:                                     [ OK ]
Services started

Stopping the services across the cluster

$TIDEWAY/bin/tw_cluster_control --cluster-stop-services --cluster-stop-message="Machine is not responding"
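
The stop and start commands shown in the two sections above could be sequenced from a script, as in the following Python sketch. It uses only the options that appear in the examples. The function names, the tideway_dir parameter, and the injectable run argument are hypothetical, and the sketch assumes that the utility exits with status 0 on success, which this page does not confirm. Because the commands are passed as argument lists rather than through a shell, the stop message needs no extra quoting.

```python
import os
import subprocess


def restart_commands(tideway_dir, message=None):
    """Build the stop and start command lines as argument lists."""
    tool = os.path.join(tideway_dir, "bin", "tw_cluster_control")
    stop = [tool, "--cluster-stop-services"]
    if message:
        # Passed as a single argument, so spaces in the message are safe.
        stop.append("--cluster-stop-message=" + message)
    start = [tool, "--cluster-start-services"]
    return stop, start


def restart_cluster_services(tideway_dir, message=None, run=subprocess.call):
    """Stop the services and, only if that succeeds, start them again.

    Assumption: an exit status of 0 means the operation succeeded.
    The default runner inherits the terminal, so the Password prompt
    shown in the transcript can still be answered interactively.
    """
    stop, start = restart_commands(tideway_dir, message)
    if run(stop) != 0:
        return False
    return run(start) == 0
```

For example, restart_cluster_services(os.environ["TIDEWAY"], "Machine is not responding") would run the stop command with that message and then the start command. Passing a different run callable lets you try the sequencing without touching a real cluster.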

Unlock the system when it is locked due to a cluster manager operation failure

Some cluster management operations might acquire the system lock. If the operation is interrupted while the system is in a locked state, you might need to run the following command to unlock it:

$TIDEWAY/bin/tw_cluster_control --fix-interrupted

Running the command unlocks only the machine that was affected by the interrupted operation. The interactive command line tool informs you if any additional intervention is required before you can run the tw_cluster_control operations again for that machine.

Further examples

For further examples of using tw_cluster_control to troubleshoot cluster problems, see Troubleshooting clusters.