This section describes clusters and how to set up and use a clustered BMC Discovery system. A cluster consists of two or more coordinated BMC Discovery machines, one of which is in control of the group and is referred to as the coordinator. You can add new machines to a cluster or remove machines from a cluster without interrupting normal operations. Clusters can be configured with fault tolerance, meaning that if a machine fails, it can be removed and replaced without any consequent loss of data and without interrupting normal operation.
Do not change the CMDB synchronization configuration at the same time as you change the cluster configuration.
- Changing the CMDB sync configuration means adding or removing connections, and starting or stopping a resync.
- Changing the cluster configuration means adding or removing members, moving the coordinator, or changing fault tolerance.
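The rule above can be pictured as a simple guard. The following Python sketch is illustrative only: the operation names are hypothetical labels for the changes listed in the two bullets, not part of any BMC Discovery API, and the product itself may not enforce the rule in this way.

```python
# Hypothetical operation names, grouped as the two bullets above describe them.
CMDB_SYNC_CHANGES = {"add_connection", "remove_connection", "start_resync", "stop_resync"}
CLUSTER_CHANGES = {"add_member", "remove_member", "move_coordinator", "change_fault_tolerance"}


def kind_of(operation: str) -> str:
    """Return which configuration area an operation belongs to."""
    if operation in CMDB_SYNC_CHANGES:
        return "cmdb_sync"
    if operation in CLUSTER_CHANGES:
        return "cluster"
    raise ValueError(f"unknown operation: {operation}")


def may_start(operation: str, in_progress: list) -> bool:
    """Allow an operation only if no change of the other kind is in progress."""
    kind = kind_of(operation)
    return all(kind_of(other) == kind for other in in_progress)
```

For example, `may_start("add_member", ["start_resync"])` returns `False`, because a resync is a CMDB sync change and adding a member is a cluster change.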
BMC Discovery 11.1 patch 3
BMC Discovery version 11.1 patch 3 resolves some important defects. As a result of those fixes, an appliance running 11.1 patch 3 cannot join a cluster running 11.1 patch 2 or earlier.
In normal circumstances, all cluster members have the same version. You can upgrade a cluster running any version from 10.1 onwards to 11.1 patch 3.
There is one obscure circumstance in which you may require a new appliance running 11.1 patch 2. If you have a fault tolerant cluster running 11.1 patch 2 (or an earlier patch) and a member has failed, then your cluster is in a degraded state. In that state, you must replace the failed member before you can upgrade the cluster. You cannot replace the failed member with a new appliance running 11.1 patch 3, so you must use an appliance running 11.1 patch 2. Once the cluster has recovered from the failure, it is possible to upgrade the whole cluster to 11.1 patch 3 as usual.
If you are in this situation, and you do not have an image of 11.1 patch 2 with which to replace the failed cluster member, please contact Customer Support. They will provide a suitable download.
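The ordering described above (replace the failed member first, then upgrade) can be sketched as a small planning helper. This is a minimal illustration, not BMC tooling: the `Cluster` class, the function names, and the tuple encoding of versions are all assumptions made for the example, and `can_join` models only the simplified rule that a new appliance must run the same version as the cluster.

```python
from dataclasses import dataclass

TARGET = (11, 1, 3)  # 11.1 patch 3, encoded as (major, minor, patch)


@dataclass
class Cluster:
    version: tuple           # version shared by all members
    fault_tolerant: bool
    has_failed_member: bool


def fmt(version: tuple) -> str:
    """Render (11, 1, 2) as '11.1 patch 2'."""
    return f"{version[0]}.{version[1]} patch {version[2]}"


def can_join(appliance_version: tuple, cluster_version: tuple) -> bool:
    """Simplified rule (assumption): an appliance must match the cluster version."""
    return appliance_version == cluster_version


def upgrade_steps(cluster: Cluster) -> list:
    """List the steps needed to bring a cluster to 11.1 patch 3."""
    steps = []
    if cluster.fault_tolerant and cluster.has_failed_member:
        # Degraded cluster: the failed member must be replaced before upgrading,
        # using an appliance that runs the cluster's current version.
        steps.append(f"replace failed member with an appliance running {fmt(cluster.version)}")
    if cluster.version < TARGET:
        steps.append(f"upgrade cluster to {fmt(TARGET)}")
    return steps
```

Calling `upgrade_steps(Cluster((11, 1, 2), True, True))` yields the replacement step followed by the upgrade step, which mirrors the sequence in the text.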
Cluster management and configuration are undertaken from the Cluster Management page of any member of a cluster.
The following sections provide information about cluster management:
Accessing the Cluster Management page
- From the main menu, click the Administration Settings icon.
- In the Appliance section, click Cluster Management.
This example Cluster Management page is from the coordinator's UI and shows a cluster of three machines, where all machines are operating normally.
The cluster management tasks that you can undertake from this page are described in the following topics:
- Creating a cluster
- Adding a machine to an existing cluster
- Changing the machine that is the coordinator
- Changing the address of a machine
- Removing a machine from a cluster
- Reverting a cluster member into a standalone machine
A cluster is not removed; it stops being a cluster when the last member is removed from the coordinator.
Information displayed on the Cluster Management page
This example Cluster Management page shows multiple pending changes. The machine that is leaving the cluster is displayed in Pending Changes as it is waiting to leave. It is also displayed in Current Members, because it has not yet left the cluster and will only do so when the changes are committed. When the machine is removed, there will be too few members in the cluster to maintain fault tolerance. Consequently, one of the pending changes is disabling fault tolerance.
You can add multiple changes to the Pending Changes section and commit them simultaneously. Many cluster changes require a rebalance when committed individually, so committing multiple changes simultaneously avoids the need for multiple rebalances.
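The benefit of batching can be shown with a small model. The sketch below is only an illustration of the reasoning: the `Change` class and the `needs_rebalance` flags are assumptions made for the example, and which changes actually trigger a rebalance depends on the product.

```python
from dataclasses import dataclass


@dataclass
class Change:
    description: str
    needs_rebalance: bool  # illustrative flag, not taken from the product


def rebalances_if_committed_individually(changes) -> int:
    """Each change that needs a rebalance triggers its own rebalance."""
    return sum(1 for change in changes if change.needs_rebalance)


def rebalances_if_committed_together(changes) -> int:
    """A single commit triggers at most one rebalance for the whole batch."""
    return 1 if any(change.needs_rebalance for change in changes) else 0


pending = [
    Change("remove member3", needs_rebalance=True),
    Change("add member4", needs_rebalance=True),
    Change("add member5", needs_rebalance=True),
]
```

With these example flags, committing the three changes one at a time would cost three rebalances, whereas committing them together costs one.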
The Cluster Management page consists of the following basic sections:
- A cluster control section with the following buttons:
  - Shutdown Cluster—Shuts down all machines in the cluster.
  - Reboot Cluster—Reboots all machines in the cluster.
  - Restart Cluster Services—Restarts the services on all machines in the cluster.
  - Enable Maintenance Mode—Places the cluster into maintenance mode.
  These buttons duplicate the cluster control functionality available on the Control page.
- An overview information section with the following items:
  - Name—The name of the cluster you are viewing.
  - Alias—An alias for the cluster, primarily intended for use with load balancers. For example, a cluster has members called member1, member2 and member3. DNS is configured to resolve the name cluster100 to a load balancer. The load balancer is configured to share requests to cluster100 with the following hosts: member1, member2 and member3. In this case, the cluster alias is cluster100.
  - Summary—The status of the cluster, whether the cluster is operating normally, or whether any tasks such as rebalancing are in progress, and a measure of the task's progress.
  - Coordinator—Whether the current machine is the coordinator or not; if not, a link is provided to the Cluster Management page on the coordinator UI.
  - Fault Tolerance—Whether fault tolerance is enabled or not and a button to enable or disable it depending on the current setting.
- The Current Members panel, which contains rows with detailed information about the machine or machines currently in the cluster, and a mass actions drop-down that enables you to remove any selected non-coordinator machine. The mass actions drop-down is disabled when the cluster is rebalancing. Each row includes the following information about a cluster member:
  - Type—Whether the machine is a coordinator or a member.
  - Volume—The amount of free disk space available in the /usr partition.
  - Health—Whether or not there are any issues with the machine.
  - Activity—Whether or not the machine is operating normally.
  - Last Contact—When the machine last responded to communication from the coordinator.
  Each row contains an Actions drop-down list, enabling you to perform the following actions:
  - Change Address—Changes the IP address or hostname used to communicate with the machine. You can use this if a machine is assigned a new IP address, or you need to communicate using its hostname.
  - Ping—Pings the machine to ensure that it can be contacted. On a successful ping, the information is refreshed.
  - Remove—Removes the non-coordinator machine from the cluster. On selecting this, an entry row is placed in the Pending Changes panel. Removing a healthy machine from the cluster resets that machine to the default configuration.
  - Make coordinator—Makes this machine the coordinator. See Changing the machine that is the coordinator for more information.
  - Restart Services—Restarts the services on this machine. This option is only available when fault tolerance is enabled.
  - Reboot—Reboots this machine. This option is only available when fault tolerance is enabled.
  - Shutdown—Shuts down this machine. This option is only available when fault tolerance is enabled.
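The availability rules described for the Actions drop-down list and the mass actions drop-down can be summarised in a short sketch. The function names are hypothetical and the code is not the product's implementation. Hiding Make coordinator on the coordinator's own row is an assumption made here; the text states only that Remove applies to non-coordinator machines.

```python
def row_actions(is_coordinator: bool, fault_tolerance_enabled: bool) -> list:
    """Return the actions offered for one row in Current Members."""
    actions = ["Change Address", "Ping"]
    if not is_coordinator:
        actions.append("Remove")            # only non-coordinator machines can be removed
        actions.append("Make coordinator")  # assumption: not offered on the coordinator itself
    if fault_tolerance_enabled:
        # Per-machine service control is only available with fault tolerance enabled.
        actions += ["Restart Services", "Reboot", "Shutdown"]
    return actions


def mass_actions_enabled(cluster_is_rebalancing: bool) -> bool:
    """The mass actions drop-down is disabled while the cluster is rebalancing."""
    return not cluster_is_rebalancing
```

For a non-coordinator member in a cluster without fault tolerance, `row_actions(False, False)` returns Change Address, Ping, Remove and Make coordinator only.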
Additional sections are displayed as appropriate (for example, the Previous Members pane is only displayed when a machine has been removed from the cluster):
- Results banner—Contains the result of the most recent operation.
- Pending cluster changes list—Includes buttons to commit or discard changes.
- Pending Changes pane—Contains rows with detailed information about the machine or machines in the pending changes list. Each row contains an Actions drop-down list, enabling you to remove the row from the panel.
- Previous Members pane—Contains rows with detailed information about the machine or machines that have previously been members of the cluster. Each row contains an Actions drop-down list, enabling you to remove the row from the panel.