Quick start for observability and monitoring with BMC Helix AIOps
BMC Helix AIOps uses AI/ML to enhance IT operations. With BMC Helix AIOps, you can build service models within minutes from prebuilt templates, visualize the data collected from BMC and third-party sources, and quickly start monitoring service health. Powered by AI/ML and BMC HelixGPT, it identifies the root cause of problems faster, recommends the best actions, and reduces both event noise and the time needed to resolve issues.
Watch the following video (2:04) to understand how to quickly start observability and monitoring with BMC Helix AIOps:
Scenario
The IT Administrator at Apex Global creates the Ticketing App service, and the NOC operator continuously observes it to ensure that the service health is reliable.
This Ticketing App service relies on the Kubernetes infrastructure. The Kubernetes infrastructure starts with a ticketing namespace and includes several microservices, such as admin, notification, and booking services. The overall health of the service relies on the stability and performance of the Kubernetes infrastructure service.
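To start observing and monitoring the Ticketing App, the following roles and responsibilities apply:
- IT administrator (tenant administrator permissions): Collects data, creates the service model, and creates automation policies.
- NOC operator (operator permissions): Reviews service health, investigates root causes, and runs remediation actions.
- Decision maker: Uses dashboards to visualize service health and make data-driven decisions.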
Process overview
To quickly start monitoring the Ticketing App with BMC Helix AIOps, use this simplified five-step workflow.
1. Collect data for observability and monitoring
The IT administrators with tenant administrator permissions configure connectors to collect observability data from different BMC solutions and third-party sources. These connectors are configured in BMC Helix Intelligent Integrations for ingesting data to support the complete observability and monitoring features across BMC Helix solutions.
For more information, see Ingest third-party events and topology data in the BMC Helix Intelligent Integrations documentation.
2. Organize, reconcile, and correlate data to create a service model
The IT administrators with tenant administrator permissions use the collected data to better organize ITOps activities by reconciling topologies, building service models with out-of-the-box blueprints, and correlating events.
- Import and enable default blueprints:
- Import the default blueprints from the Manage Service Blueprints page by using the Import BMC Default Blueprints option.
- Enable the required blueprints from the list of imported BMC default blueprints.
For example, enable the Default Blueprint for Kubernetes Infrastructure, which includes Kubernetes Namespace, v2 in the description.
For more information, see Using-out-of-the-box-blueprints.
- Create the Ticketing App service by using the Kubernetes blueprint:
- Create the Ticketing App service by using the Create Service option from the Services page.
- Specify the required service definitions and add the Default Blueprint for Kubernetes Infrastructure with the ticketing namespace as the dynamic composition.
- Preview the blueprint configuration to confirm the structural validity and save it.
For more information, see Defining-a-service.
- Configure health indicators with alarm policies:
- Identify and define the health indicator metrics to be monitored; for example, monitor the performance of the following metrics:
- CPU > Length of run queue
- CPU > Time waiting for I/O
- Memory > Percent of free memory
- Set up alarm policies for the selected metrics in BMC Helix Operations Management and enable them.
Alerts are triggered based on the thresholds and conditions defined in these policies.
For more information, see Adding-health-indicators, and Configuring alarm policies in the BMC Helix Operations Management documentation.
- (Optional) Configure additional definitions:
- Add event rules.
For more information, see Adding-event-rules.
- Add balancing profiles.
For more information, see Adding-balancing-profiles.
- Customize the health score and status.
For more information, see Customizing-health-score-and-health-status.
- (Optional) Repeat the above steps to add additional child services as required.
- Save and review the Ticketing App service.
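The way alarm policies turn metric values into alerts can be illustrated with a minimal sketch. It is not BMC Helix Operations Management code: the `AlarmPolicy` class, the `evaluate` function, the threshold values, and the severity labels are all hypothetical and chosen only for illustration. Only the three metric names come from the steps above.

```python
from dataclasses import dataclass
import operator


# Hypothetical structure: alarm policies are defined in the product UI;
# this class only illustrates the threshold-and-condition idea.
@dataclass(frozen=True)
class AlarmPolicy:
    metric: str       # metric name as listed in the steps above
    comparison: str   # ">" or "<"
    threshold: float  # assumed example value, not a BMC default
    severity: str     # assumed label, not a product constant


_COMPARATORS = {">": operator.gt, "<": operator.lt}


def evaluate(policies, sample):
    """Return (metric, severity) for every policy whose condition is met."""
    alerts = []
    for policy in policies:
        value = sample.get(policy.metric)
        if value is None:
            continue  # no data collected for this metric
        if _COMPARATORS[policy.comparison](value, policy.threshold):
            alerts.append((policy.metric, policy.severity))
    return alerts


POLICIES = [
    AlarmPolicy("CPU > Length of run queue", ">", 10.0, "MAJOR"),
    AlarmPolicy("CPU > Time waiting for I/O", ">", 20.0, "MINOR"),
    AlarmPolicy("Memory > Percent of free memory", "<", 10.0, "CRITICAL"),
]

# One made-up sample of collected metric values.
sample = {
    "CPU > Length of run queue": 14.0,
    "CPU > Time waiting for I/O": 5.0,
    "Memory > Percent of free memory": 8.0,
}

alerts = evaluate(POLICIES, sample)
print(alerts)
```

With these assumed thresholds, the run queue and free memory policies fire and the I/O wait policy does not. Note that the free memory policy uses a "less than" condition, because low free memory is the unhealthy direction for that metric.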
3. Review and understand the service health
The NOC operators with operator permissions analyze service health to understand service issues, identify root causes, and take actions based on causal events and situations.
- Navigate to the Services page and search for the Ticketing App service to assess health impact.
For more information, see Monitoring service health.
- Open the Ticketing App service, and from the service details page, investigate root causes and impact information.
- View the Analyze Root Cause section to understand the problem's root cause and take action.
For more information, see Performing causal analysis of impacted services.
- Analyze the impact patterns of health indicators from the View Health Indicators section.
For more information, see Monitoring service health indicators.
- Review and investigate the top correlated situations from the Analyze Situations section.
For more information, see Analyzing situations for a service.
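To make the dependency between the Ticketing App and its Kubernetes infrastructure concrete, the following sketch rolls health up from child services to the parent. It assumes a simple "worst status wins" rule, which is an illustrative simplification and not how BMC Helix AIOps computes health scores. The `Service` class, the status labels, and the example statuses are hypothetical; only the service and microservice names come from the scenario.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed ordering from healthiest to least healthy; the labels are
# illustrative and not taken from the product.
SEVERITY_ORDER = ["OK", "MINOR", "MAJOR", "CRITICAL"]


@dataclass
class Service:
    name: str
    own_status: str = "OK"
    children: List["Service"] = field(default_factory=list)

    def rolled_up_status(self) -> str:
        """Worst status among this service and all of its descendants."""
        statuses = [self.own_status]
        statuses += [child.rolled_up_status() for child in self.children]
        return max(statuses, key=SEVERITY_ORDER.index)


# Scenario from this topic, with made-up statuses for the microservices.
infra = Service(
    "Kubernetes infrastructure (ticketing namespace)",
    children=[
        Service("admin"),
        Service("notification", own_status="MAJOR"),
        Service("booking", own_status="MINOR"),
    ],
)
ticketing_app = Service("Ticketing App", children=[infra])

print(ticketing_app.rolled_up_status())  # MAJOR
```

In this sketch, a degraded notification microservice is enough to mark the whole Ticketing App as impacted, which is why the operator drills down from the service to its child services when investigating.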
4. Remediate issues and reduce event noise
The IT administrators with tenant administrator permissions create automation policies in BMC Helix Intelligent Automation that contain actions for remediating open service issues. The NOC operators run these actions against the open service issues in BMC Helix AIOps to fix them.
- Based on your requirements, create one or more automation policies in BMC Helix Intelligent Automation, and enable them.
For more information, see Creating-automation-policies.
- Run the automation in the Analyze Root Cause section to remediate the impact and check the policy run status.
For more information, see Running an existing automation.
5. Visualize the service health and make informed decisions
Decision makers use BMC Helix Dashboards to visualize the health of business services and make informed, data-driven decisions. These dashboards offer a consolidated view of the health of all services in your environment.
- View the Service Dashboard.
The decision makers can use the out-of-the-box Service Dashboard in BMC Helix Dashboards to visualize and analyze service health quickly. For more information, see Service Dashboard in the BMC Helix Dashboards documentation.
- (Optional) Create custom dashboards.
In addition, to track various KPIs and view metrics selected as part of a service, you can create custom dashboards. For more information, see Creating and customizing dashboards in the BMC Helix Dashboards documentation.
You are now prepared to observe, monitor, and remediate health issues in your Ticketing App service, ensuring improved performance, reliability, and stability of the service.