App Visibility Manager high-availability deployment
With App Visibility Manager, you can configure a high-availability (HA) environment to minimize downtime while you monitor your mission-critical applications. HA is a redundancy mechanism that eliminates a single point of failure by automatically switching to a standby node if the active node fails or is temporarily shut down. App Visibility Manager supports out-of-the-box HA, which synchronizes the App Visibility server components. The server components require a load balancer to manage connections to the active and standby nodes.
App Visibility Manager HA components
A high-availability deployment of App Visibility Manager consists of pairs of server components with identical configurations. HA applies to the App Visibility portal and collector, where each component has an active node and a standby node. Only one node in each pair can be active at a time. If the active node shuts down or otherwise becomes unavailable, the standby node takes over the active role, and all processes run on that node. When you restore the previously active node, it becomes the standby node.
The App Visibility HA components automatically manage synchronization between the active and standby nodes, and they automatically detect failover situations. When your third-party load balancer sends data to a component, only the active node receives the data, and it then synchronizes the content with the standby node.
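To make the active/standby roles concrete, the following is a minimal Python model of one failover pair. It is an illustrative sketch, not App Visibility code: the `Node` and `FailoverPair` classes, the host names, and the assumption that replication happens synchronously on every write (and that a restored node copies the data it missed) are all hypothetical. It shows only the behavior described above: only the active node accepts data, the standby takes over when the active node fails, and a restored node rejoins as standby.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server in a failover pair (hypothetical model, not product code)."""
    name: str
    up: bool = True
    data: list = field(default_factory=list)


class FailoverPair:
    """Models active/standby behavior; the sync mechanism is an assumption."""

    def __init__(self, first: Node, second: Node) -> None:
        # The first node to join the cluster starts as the active node.
        self.active = first
        self.standby = second

    def receive(self, record: str) -> None:
        # Only the active node accepts incoming data ...
        self.active.data.append(record)
        # ... and then synchronizes it to the standby node, if reachable.
        if self.standby.up:
            self.standby.data.append(record)

    def fail_active(self) -> None:
        # The active node goes down; the standby node takes over the active role.
        self.active.up = False
        self.active, self.standby = self.standby, self.active

    def restore(self, node: Node) -> None:
        # A restored node rejoins as standby; it does not reclaim the active role.
        if node is not self.standby:
            raise ValueError("only the failed (now standby) node can be restored")
        node.up = True
        # Assumption: the restored node catches up on data it missed while down.
        node.data = list(self.active.data)


a, b = Node("avm-portal-1"), Node("avm-portal-2")  # hypothetical host names
pair = FailoverPair(a, b)
pair.receive("event-1")
pair.fail_active()
pair.receive("event-2")
pair.restore(a)
print(pair.active.name, pair.standby.name)  # avm-portal-2 avm-portal-1
```

After the restore, the original node holds the same data as its partner but stays in the standby role, matching the "restored node becomes the standby node" rule.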
When a component becomes inaccessible, HA minimizes downtime for the following processes:
- Portal: data analysis, configuration management, and communication with the Presentation Server
- Collectors: data collection and data storage for monitored applications, end-user data, and synthetic transactions
This topic does not address HA for the proxy, which does not require active and standby nodes.
An App Visibility Manager HA deployment comprises the following systems, each on a separate computer:
- Active node
- Standby node
- Third-party load balancer
The following diagram shows more than one load balancer, but you can configure a single load balancer to manage failover for all of the components. All communication with an App Visibility server component goes through that component's load balancer. The diagram shows only one App Visibility collector cluster, but your system can have many.
HA deployment for App Visibility Manager
In an HA deployment of App Visibility Manager, the load balancer runs on a separate computer and redirects requests to the active node. In this way, the load balancer provides a single point of access to the App Visibility server components.
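The routing decision the load balancer makes can be sketched in a few lines of Python. This is a hypothetical illustration: `pick_target` and the `is_active` check stand in for whatever health check your third-party load balancer is configured to use, and the host names are invented. Clients always address the load balancer, which forwards each request to whichever node currently reports itself as active.

```python
from typing import Callable, Optional, Sequence


def pick_target(nodes: Sequence[str], is_active: Callable[[str], bool]) -> Optional[str]:
    """Return the node that should receive traffic, or None if no node is active.

    `is_active` is a placeholder for the load balancer's health check;
    it is an assumption, not an App Visibility API.
    """
    for node in nodes:
        if is_active(node):
            return node
    return None


# Hypothetical state: the first collector is down, so its pair is now active.
status = {"collector-a.example.com": False, "collector-b.example.com": True}
target = pick_target(list(status), lambda n: status[n])
print(target)  # collector-b.example.com
```

Because the selection happens in the load balancer, the agents and the portal never need to know which physical node is active.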
When you install the App Visibility server, you can enable high availability for the App Visibility portal, App Visibility collector, or both.
To deploy an HA component, install the component on two computers to create a failover cluster, which is a pair of servers that work together to maintain high availability. When you install the component on the first computer, enable HA and provide details for its alternate node (the second computer). Then run the installation utility on the second computer, enable HA, and provide details for its alternate node (that is, the first computer). The two computers must have the same hardware and operating system configuration.
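The pairing requirements above are symmetric: both installations enable HA, each names the other as its alternate node, and both computers match in hardware and operating system. The Python sketch below checks those three conditions. The `InstallSettings` fields, host names, and hardware/OS labels are hypothetical and do not correspond to actual installer fields; the sketch only illustrates the rules stated in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstallSettings:
    """Settings entered for one node during installation (hypothetical fields)."""
    host: str
    ha_enabled: bool
    alternate_host: str
    hardware: str  # hypothetical hardware profile label
    os: str        # hypothetical operating system label


def check_pair(first: InstallSettings, second: InstallSettings) -> list:
    """Return the problems that would prevent two nodes from forming a pair."""
    problems = []
    if not (first.ha_enabled and second.ha_enabled):
        problems.append("HA must be enabled on both nodes")
    # Each node must name the other as its alternate node.
    if first.alternate_host != second.host or second.alternate_host != first.host:
        problems.append("each node must reference the other as its alternate node")
    # Both computers need the same hardware and operating system configuration.
    if (first.hardware, first.os) != (second.hardware, second.os):
        problems.append("hardware and OS configuration must match")
    return problems


n1 = InstallSettings("avm-1", True, "avm-2", "8cpu-32gb", "linux-x86_64")
n2 = InstallSettings("avm-2", True, "avm-1", "8cpu-32gb", "linux-x86_64")
print(check_pair(n1, n2))  # []
```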
The first server that joins the failover cluster is the active node, and its pair becomes the standby node.
When you install the App Visibility collector, provide details of a third-party load balancer for the collector pair. The load balancer information is required to establish communication between the App Visibility agents and the collector.
Switching from the active node to the standby node
The following situations cause failover for an App Visibility server component:
- Active portal or collector service is down. The standby node regularly checks the active node, and as soon as the active node goes down, the standby node becomes the new active node.
- Active portal or collector database is unresponsive for approximately five minutes. After this time, the active portal or collector service is stopped, and the standby node becomes the new active node.
The failover process might take a few minutes before data collection returns to normal. Data from those few minutes might be inaccurate or incomplete.
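The two failover triggers can be expressed as a single decision that the standby node evaluates on each periodic check. The following Python function is a hypothetical sketch: `ActiveNodeStatus`, `should_fail_over`, and the exact 300-second threshold are assumptions (this topic says only "approximately five minutes"), and it does not describe how App Visibility Manager actually implements detection.

```python
from dataclasses import dataclass
from typing import Optional

# The topic says "approximately five minutes"; the exact value is an assumption.
DB_UNRESPONSIVE_LIMIT_S = 5 * 60


@dataclass
class ActiveNodeStatus:
    """What the standby node observes on one check (hypothetical model)."""
    service_up: bool
    db_unresponsive_since: Optional[float]  # epoch seconds, or None if the DB responds


def should_fail_over(status: ActiveNodeStatus, now: float) -> bool:
    """Decide whether the standby node should take over the active role."""
    # Trigger 1: the active portal or collector service is down.
    if not status.service_up:
        return True
    # Trigger 2: the active database has been unresponsive for about five minutes;
    # the active service is then stopped and the standby node takes over.
    if status.db_unresponsive_since is not None:
        return now - status.db_unresponsive_since >= DB_UNRESPONSIVE_LIMIT_S
    return False


print(should_fail_over(ActiveNodeStatus(False, None), now=1000.0))  # True
print(should_fail_over(ActiveNodeStatus(True, 900.0), now=1000.0))  # False
print(should_fail_over(ActiveNodeStatus(True, 600.0), now=1000.0))  # True
```

A service outage triggers failover immediately, while a database problem is tolerated until the threshold passes, which is one reason a few minutes of data might be incomplete.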