Health Dashboard


This section describes the Health Dashboard and its components.

Viewing the Dashboard

The homepage of the Health Dashboard provides summary information about the health of your environment. Only administrators have access to this dashboard.

Note

The values shown on the dashboard represent the information that was available at the last data collection. The values may have changed since that time.

Colors indicate the health of the environment. Red indicates that health or performance is not as expected (and may be critical), and that you may need to take immediate action.

This page provides the following information about your environment:

Area

Description

Grid Summary

This area displays the host servers in your environment and provides summary information about the hosts.

Note

This data includes the hosts and peers that are listed in the gridhosts file that you configured after the installation (see Configuring-your-environment-to-use-TrueSight-Orchestration-dashboards) and that are running in the environment. If a peer is not listed in the gridhosts file and is not running in the environment, it is not listed in this area.

Grid Environment

This area reports on the state of the databases configured in your environment, as well as the repository and authentication servers.

Adapters

This area displays information about the adapters in your environment. Fault indicates that the adapter is not running as expected, and you should investigate the problem. No adapter should be in a fault state.

For information about administering adapters, see Managing-adapters.

Jobs

This area displays information about the workflow jobs in your environment. It reports any jobs that have been running for over an hour, which is considered outside the expected time range. Fewer than one percent of your total jobs should take this long to run.

Some organizations have intentionally implemented long-running workflows. If you have implemented these, you do not need to be concerned if the dashboard indicates a larger number of long-running workflows.
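
The one-hour and one-percent guidelines described above can be illustrated with a short calculation. The following Python sketch is not part of the product; the function names, constants, and input shape (a list of job start times) are hypothetical and only show how the share of long-running jobs might be checked against the guideline.

```python
from datetime import datetime, timedelta

# Thresholds taken from the text above; the names are hypothetical.
LONG_RUNNING_AFTER = timedelta(hours=1)
EXPECTED_MAX_SHARE = 0.01  # fewer than one percent of all jobs


def long_running_share(start_times, now, total_jobs):
    """Return the fraction of total_jobs that has been running for over an hour."""
    if total_jobs <= 0:
        return 0.0
    long_running = sum(
        1 for started in start_times if now - started > LONG_RUNNING_AFTER
    )
    return long_running / total_jobs


def within_expected_range(share):
    """Return True when the share of long-running jobs is below one percent."""
    return share < EXPECTED_MAX_SHARE


# Example: 400 jobs in total, two still running, one of them for two hours.
now = datetime(2024, 1, 1, 12, 0)
running_since = [now - timedelta(minutes=5), now - timedelta(hours=2)]
share = long_running_share(running_since, now, total_jobs=400)
print(within_expected_range(share))
```

If your organization intentionally runs long workflows, a check like this would need a higher threshold than the one assumed here.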

For information about developing workflows, see Developing-workflows. For information about administering processes and schedules, see Managing-processes-and-schedules.

Grid Statistics

This area provides the following three main statistics. In each case the Current value should be 0. If it is a different number, you should investigate the problem and take action.

The statistics displayed in this area are the values that were available the last time the data was collected. The values may have changed since that time.

  • Elections indicates that the current master component has changed or that there is a new master, which signals a change in the grid (when Current is greater than 0).
    This value is the number of times an election was called on the peer because of exceptional conditions. It represents all election types, including elections induced by missed heartbeats and by multiple masters.
    For more information about administering components, see Grid-components-and-their-status.
  • Link Failures indicates that there was a link failure between servers, including intermittent link failures (when Current is greater than 0).
    The value is the number of link failures over the last specified time period. The time period is configurable and the default time period is 60 minutes.
  • Peer Disconnects indicates that a server is not running (when Current is greater than 0).
    The value is the number of peer disconnects over the last specified time period. The time period is configurable and the default time period is 60 minutes.
    A peer disconnect is recorded when a component:
    • Recognizes a link failure.
    • Receives a shutdown message from a peer.
    • Receives a bind message for which a reconnect is needed.
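
The Link Failures and Peer Disconnects values are described above as counts over a configurable time period that defaults to 60 minutes. The following Python sketch only illustrates that idea; it is not the product's implementation, and the function names and the list-of-timestamps input are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Default period taken from the text above; the name is hypothetical.
DEFAULT_WINDOW = timedelta(minutes=60)


def current_value(event_times, now, window=DEFAULT_WINDOW):
    """Count the events that occurred within the last `window` before `now`."""
    return sum(1 for t in event_times if timedelta(0) <= now - t <= window)


def needs_investigation(current):
    """The text above expects Current to be 0; any other value needs attention."""
    return current != 0


# Example: one disconnect 10 minutes ago, another 90 minutes ago.
now = datetime(2024, 1, 1, 12, 0)
disconnects = [now - timedelta(minutes=10), now - timedelta(minutes=90)]
current = current_value(disconnects, now)
print(current, needs_investigation(current))
```

Only the event from 10 minutes ago falls inside the default period, so the older event no longer contributes to the count.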

For information about administering servers, see Managing-grids and Managing-peers.


 
