This section describes clusters and how to set up and use a clustered BMC Atrium Discovery system. A cluster consists of two or more coordinated BMC Atrium Discovery machines, one of which is in control of the group and is referred to as the coordinator. You can add new machines to a cluster or remove machines from a cluster without interrupting normal operation. Clusters can be configured with fault tolerance, meaning that if a machine fails, it can be removed and replaced without any loss of data and without interrupting normal operation.
Do not change the CMDB synchronization configuration at the same time as changing the cluster configuration.
- Changing the CMDB sync configuration means adding connections, removing connections, or starting or stopping a resync.
- Changing the cluster configuration means adding members, removing members, moving the coordinator, or changing fault tolerance.
Any cluster management and configuration is undertaken from the Cluster Management page of any of the members of a cluster. To access the Cluster Management page:
On the Administration tab, click Cluster Management in the Appliance section.
This example Cluster Management page is from the coordinator's UI and shows a cluster of three machines where all machines are operating normally.
The cluster management tasks that you can undertake from this page are described in the following topics:
- Creating a cluster
- Adding a machine to an existing cluster
- Changing the machine that is the coordinator
- Changing the address of a machine
- Removing a machine from a cluster
- Reverting a cluster member into a standalone machine
A cluster is not removed; it stops being a cluster when the last member is removed from the coordinator.
Information displayed on the Cluster Management page
This example Cluster Management page shows multiple pending changes. The machine that is leaving the cluster is displayed in Pending Changes as it is waiting to leave, and it is also displayed in Current Members as it has not yet left the cluster. It will only do so when the changes are committed. When the machine is removed, there will be too few members in the cluster to maintain fault tolerance. Consequently, one of the pending changes is disabling fault tolerance.
You can add multiple changes to the Pending Changes section and commit them all at once. Many cluster changes require a rebalance when committed individually, so committing multiple changes at once avoids the need for multiple rebalances.
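The batching behavior described above can be illustrated with a small model. This is only a sketch and not the product's API: the `ClusterModel` class, its method names, the action names, and the `REBALANCING_ACTIONS` set are all hypothetical, and the rule that one commit triggers at most one rebalance is a simplifying assumption taken from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical change types that are assumed to require a rebalance.
REBALANCING_ACTIONS = {"add_member", "remove_member", "set_fault_tolerance"}


@dataclass
class ClusterModel:
    """Toy model of the Pending Changes workflow (not the real product API)."""
    members: list
    fault_tolerance: bool = True
    pending: list = field(default_factory=list)
    rebalances: int = 0

    def queue(self, action, value):
        # Queued changes have no effect until they are committed.
        self.pending.append((action, value))

    def discard(self):
        # Corresponds to discarding the pending changes.
        self.pending.clear()

    def commit(self):
        # Apply every pending change, then rebalance at most once.
        needs_rebalance = False
        for action, value in self.pending:
            if action == "add_member":
                self.members.append(value)
            elif action == "remove_member":
                self.members.remove(value)
            elif action == "set_fault_tolerance":
                self.fault_tolerance = value
            else:
                raise ValueError(f"unknown action: {action}")
            needs_rebalance = needs_rebalance or action in REBALANCING_ACTIONS
        self.pending.clear()
        if needs_rebalance:
            self.rebalances += 1


cluster = ClusterModel(members=["member1", "member2", "member3"])
cluster.queue("remove_member", "member3")
cluster.queue("set_fault_tolerance", False)

# The leaving machine is still a current member until the commit.
still_listed_before_commit = "member3" in cluster.members

cluster.commit()
```

In this model, the two queued changes are applied together and `rebalances` is incremented once, whereas committing each change separately would increment it twice.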
The Cluster Management page consists of the following basic sections:
- A cluster control section
- Shutdown Cluster button — shuts down all machines in the cluster.
- Reboot Cluster button — reboots all of the machines in the cluster.
- Restart Cluster Services button — restarts the services on all machines in the cluster.
- Enable Maintenance Mode button — places the cluster into maintenance mode.
These buttons duplicate the cluster control functionality available on the Control page.
- An overview information section
- Name — the name of the cluster you are viewing.
- Alias — an alias for the cluster, primarily intended for use with load balancers. For example, a cluster has members called member1, member2 and member3. DNS is configured to resolve the name cluster100 to a load balancer. The load balancer is configured to share requests to cluster100 with the following hosts: member1, member2 and member3. In this case, the cluster alias is cluster100.
- Summary — the status of the cluster, whether the cluster is operating normally, or whether any tasks such as rebalancing are in progress, and a measure of the task's progress.
- Coordinator — whether the current machine is the coordinator or not; if not, a link is provided to the Cluster Management page on the coordinator UI.
- Fault Tolerance — whether fault tolerance is enabled or not and a button to enable or disable it depending on the current setting.
- The Current Members panel, which contains rows with detailed information on the machine or machines currently in the cluster and a mass actions drop down which enables you to remove any selected non-coordinator machine. The mass actions drop down is disabled when the cluster is rebalancing. The information on each cluster member includes:
- Type — Whether the machine is a coordinator or a member.
- Volume — The amount of free disk space available in the /usr partition.
- Health — Whether or not there are any issues with the machine.
- Activity — Whether or not the machine is operating normally.
- Last Contact — When the machine last responded to communication from the coordinator.
Each row contains an Actions drop down list enabling you to:
- Change Address — Changes the IP address or hostname used to communicate with the machine. You can use this if a machine is assigned a new IP address, or if you need to communicate using its hostname.
- Ping — Pings the machine to ensure that it can be contacted. On a successful ping, the information is refreshed.
- Remove — Remove the non-coordinator machine from the cluster. On selecting this, an entry row is placed in the Pending Changes panel. Removing a healthy machine from the cluster will reset that machine to the default configuration.
- Make coordinator — Makes this machine the coordinator. See Changing the machine that is the coordinator for more information.
- Restart Services — Restarts the services on this machine. This option is only available when fault tolerance is enabled.
- Reboot — Reboots this machine. This option is only available when fault tolerance is enabled.
- Shutdown — Shuts down this machine. This option is only available when fault tolerance is enabled.
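The availability rules for the per-row actions listed above can be summarized as a small function. This is a sketch for illustration only: the function names and parameters are hypothetical, and restricting Make coordinator to non-coordinator machines is an assumption rather than something stated above.

```python
def available_actions(is_coordinator, fault_tolerance_enabled):
    """Return the Actions entries assumed to be offered for one row."""
    actions = ["Change Address", "Ping"]
    if not is_coordinator:
        # Remove applies to non-coordinator machines only.
        actions.append("Remove")
        # Assumption: promoting the current coordinator is not offered.
        actions.append("Make coordinator")
    if fault_tolerance_enabled:
        # These three are only available when fault tolerance is enabled.
        actions += ["Restart Services", "Reboot", "Shutdown"]
    return actions


def mass_actions_enabled(is_rebalancing):
    """The mass actions drop down is disabled while the cluster rebalances."""
    return not is_rebalancing


member_actions = available_actions(is_coordinator=False,
                                   fault_tolerance_enabled=True)
coordinator_actions = available_actions(is_coordinator=True,
                                        fault_tolerance_enabled=False)
```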
Additional sections are displayed as appropriate. For example, the Previous Members panel is only displayed when a machine has been removed from the cluster:
- A results banner with the result of the most recent operation.
- A There are pending cluster changes list which includes buttons to Commit or Discard changes.
- A Pending Changes panel which contains rows with detailed information on the machine or machines in the pending changes list. Each row contains an Actions drop down list enabling you to remove the row from the panel.
- A Previous Members panel which contains rows with detailed information on the machine or machines that have previously been members of the cluster. Each row contains an Actions drop down list enabling you to remove the row from the panel.