Detect problems using the automated self health monitoring capability

Mix Technologies is a large enterprise in the silicon industry. It has the following deployment:

  • 5000 servers in the IT infrastructure
  • 1500 servers in a virtual environment using VMware
  • 500 servers in a public cloud environment

Mix Technologies monitors its network devices through SNMP events and also uses deep-dive network topology tools. The rest of the application infrastructure and the servers are monitored with application performance monitoring and traditional monitoring tools. Help desk personnel and application owners are also responsible for monitoring and managing the servers in the private cloud.

Roles required

Many user roles are involved in the deployment, operation, and management of Infrastructure Management. Your company might use the roles as described below, consolidate them into fewer roles, or divide them into roles with more granular responsibilities. It might also use different titles for these roles.

The following role is required to complete this use case:

  • Roger - Distributed Service Operations User

Roger handles the following responsibilities:

  • Maintaining the ongoing performance and availability of production systems with a focus on server infrastructure
  • Performing administrative functions on servers and monitoring tools
  • Monitoring performance and resolving availability, performance, and capacity problems

Solving a data collection problem

Suppose one of the two Integration Services in an Integration Service cluster goes down. As a result, a Knowledge Module (KM) loaded on that Integration Service cannot collect data, so no data is sent to the BMC TrueSight Infrastructure Management Server. Consequently, no event is generated if a problem occurs.

Roger needs to be notified immediately of any data collection problem because he is accountable for internal availability SLAs. Without monitoring these SLAs, he cannot create compliance reports.

Roger can use the self health monitoring feature to solve this problem.

Whenever a KM has a problem collecting data, or the connection between the Integration Service and the BMC TrueSight Infrastructure Management Server is down, Infrastructure Management automatically generates an event. The event identifies the Integration Service system on which the KM is installed and from which the data collection problem originated. To solve the problem, Roger can:

Related topics

Viewing the health of the Infrastructure Management components from the operator console
