Quick start for observability and monitoring with BMC Helix AIOps


BMC Helix AIOps uses AI/ML to enhance IT operations. With BMC Helix AIOps, you can build service models within minutes by using prebuilt templates, visualize the data collected from BMC and third-party sources, and quickly start monitoring service health. Powered by AI/ML and BMC HelixGPT, it identifies the root cause of problems faster, provides best action recommendations, reduces event noise, and shortens the time needed to resolve issues.

Watch the YouTube video (2:04) to understand how to quickly start observability and monitoring with BMC Helix AIOps.

Scenario

The IT administrator at Apex Global creates the Ticketing App service, and the NOC operator continuously observes it to ensure that the service remains healthy.

This Ticketing App service relies on the Kubernetes infrastructure. The Kubernetes infrastructure starts with a ticketing namespace and includes several microservices, such as admin, notification, and booking services. The overall health of the service relies on the stability and performance of the Kubernetes infrastructure service.
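
The dependency described above can be pictured as a small tree. The following Python sketch is illustrative only: the ServiceNode class, the health state names, and the worst-child roll-up rule are assumptions made for this example and do not represent how BMC Helix AIOps models services or computes health scores.

```python
from dataclasses import dataclass, field
from typing import List

# Health states ordered from best to worst. The names are hypothetical,
# not the statuses that BMC Helix AIOps reports.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class ServiceNode:
    name: str
    own_health: str = "ok"
    children: List["ServiceNode"] = field(default_factory=list)

    def rolled_up_health(self) -> str:
        """Return the worst health found in this node or any descendant."""
        states = [self.own_health]
        states += [child.rolled_up_health() for child in self.children]
        return max(states, key=SEVERITY.__getitem__)


def build_ticketing_app() -> ServiceNode:
    """Build the scenario tree: app -> infrastructure -> namespace -> microservices."""
    microservices = [ServiceNode(n) for n in ("admin", "notification", "booking")]
    namespace = ServiceNode("ticketing namespace", children=microservices)
    infra = ServiceNode("Kubernetes infrastructure", children=[namespace])
    return ServiceNode("Ticketing App", children=[infra])


if __name__ == "__main__":
    app = build_ticketing_app()
    print(app.rolled_up_health())  # ok

    # Degrade one microservice; the problem propagates up to the app.
    booking = app.children[0].children[0].children[2]
    booking.own_health = "critical"
    print(app.rolled_up_health())  # critical
```

In this simplified view, a single unhealthy microservice is enough to change the health of the whole Ticketing App, which is why the scenario treats the Kubernetes infrastructure as the foundation of the service.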

To start observing and monitoring the Ticketing App, the following roles and responsibilities apply:

BMC Helix AIOPs_Persona_Workflow.png

Process overview

To quickly start monitoring the Ticketing App with BMC Helix AIOps, use this simplified five-step workflow.

qsg_workflow_steps.png

 

1. Collect data for observability and monitoring

The IT administrators with tenant administrator permissions configure connectors to collect observability data from different BMC solutions and third-party sources. These connectors are configured in BMC Helix Intelligent Integrations for ingesting data to support the complete observability and monitoring features across BMC Helix solutions.

For more information, see Ingest third-party events and topology data in the BMC Helix Intelligent Integrations documentation.

bhii_topic_image.gif

 

2. Organize, reconcile, and correlate data to create a service model

The IT administrators with tenant administrator permissions use the collected data to better organize ITOps activities by reconciling topologies, building service models with out-of-the-box blueprints, and correlating events. 

  1. Import and enable default blueprints: 
    1. Import the default blueprints from the Manage Service Blueprints page by using the Import BMC Default Blueprints option.
    2. Enable the required blueprints from the list of imported BMC default blueprints.
      For example, enable the Default Blueprint for Kubernetes Infrastructure, which includes "Kubernetes Namespace, v2" in its description.
      For more information, see Using out-of-the-box blueprints.
  2. Create the Ticketing App service by using the Kubernetes blueprint:
    1. Create the Ticketing App service by using the Create Service option from the Services page.
    2. Specify the required service definitions and add the Default Blueprint for Kubernetes Infrastructure with the ticketing namespace as the dynamic composition.
    3. Preview the blueprint configuration to confirm the structural validity and save it.
      For more information, see Defining a service.
  3. Configure health indicators with alarm policies:
    1. Identify and define the health indicator metrics to be monitored; for example, monitor the performance of the following metrics:
      • CPU > Length of run queue
      • CPU > Time waiting for I/O
      • Memory > Percent of free memory
    2. Set up alarm policies for the selected metrics in BMC Helix Operations Management and enable them.
      The alerts are triggered based on the defined thresholds and conditions in these policies.
      For more information, see Adding health indicators, and Configuring alarm policies in the BMC Helix Operations Management documentation.

  4. (Optional) Configure additional definitions:
    1. Add event rules.
      For more information, see Adding event rules.
    2. Add balancing profiles.
      For more information, see Adding balancing profiles.
    3. Customize the health score and status.
      For more information, see Customizing health score and health status.
  5. (Optional) Repeat the preceding steps to add more child services as required.
  6. Save and review the Ticketing App service.

    create_service_model.gif
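
To make the idea of threshold-based alarm policies from step 3 more concrete, here is a minimal Python sketch. It is not the BMC Helix Operations Management policy format or API: the AlarmPolicy class, the evaluate function, the severity labels, and every threshold value are hypothetical and chosen only for illustration. Only the three metric names come from the example above.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Comparison operators that a policy in this sketch may use.
COMPARATORS = {">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class AlarmPolicy:
    metric: str       # metric name, as listed in the example above
    comparator: str   # ">" or "<"
    threshold: float  # hypothetical value; tune for your environment
    severity: str     # hypothetical severity label


# Threshold values and severities are made up for illustration only.
POLICIES = [
    AlarmPolicy("CPU > Length of run queue", ">", 10.0, "major"),
    AlarmPolicy("CPU > Time waiting for I/O", ">", 30.0, "minor"),
    AlarmPolicy("Memory > Percent of free memory", "<", 10.0, "critical"),
]


def evaluate(sample: Dict[str, float],
             policies: List[AlarmPolicy] = POLICIES) -> List[Tuple[str, str]]:
    """Return (metric, severity) for every policy whose condition the sample meets."""
    alerts = []
    for policy in policies:
        value = sample.get(policy.metric)
        if value is None:
            continue  # no data collected for this metric
        if COMPARATORS[policy.comparator](value, policy.threshold):
            alerts.append((policy.metric, policy.severity))
    return alerts


if __name__ == "__main__":
    sample = {
        "CPU > Length of run queue": 12.0,
        "CPU > Time waiting for I/O": 5.0,
        "Memory > Percent of free memory": 8.0,
    }
    for metric, severity in evaluate(sample):
        print(f"{severity}: {metric}")
```

Note that the memory policy uses a "less than" condition because low free memory is the problem state, whereas the CPU metrics raise an alert when they exceed their thresholds.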

 

3. Review and understand the service health

The NOC operators with operator permissions analyze service health to understand service issues, identify root causes, and take actions based on causal events and situations.

  1. Navigate to the Services page and search for the Ticketing App service to assess health impact.
    For more information, see Monitoring service health.
  2. Open the Ticketing App service, and from the service details page, investigate root causes and impact information.
  3. View the Analyze Root Cause section to understand the problem's root cause and take action.
    For more information, see Performing causal analysis of impacted services.
  4. Analyze the impact patterns of health indicators from the View Health Indicators section.
    For more information, see Monitoring service health indicators.
  5. Review and investigate the top correlated situations from the Analyze Situations section.
    For more information, see Analyzing situations for a service.

    Best Action Recommendations

    If you have configured BMC HelixGPT in your BMC Helix IT Operations Management environment, the Best Action Recommendations are displayed as part of the Situation analysis. To implement BMC HelixGPT, contact BMC Support.

    review.gif

 

4. Remediate issues and reduce event noise

The IT administrators with tenant administrator permissions create automation policies in BMC Helix Intelligent Automation that contain actions for remediating open service issues. The NOC operators run these actions against the open service issues in BMC Helix AIOps to fix them. 

  1. Based on your requirements, create one or more automation policies in BMC Helix Intelligent Automation, and enable them.
    For more information, see Creating automation policies.
  2. Run the automation from the Analyze Root Cause section to remediate the impact, and then check the policy run status.

    Important

    If the automation does not exist, request the creation of an automation policy.

    For more information, see Running an existing automation.
    create_run_automation_policies.gif
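
The division of work in this step (administrators define policies, operators run them against open issues) can be sketched in a few lines of Python. This is a conceptual illustration only and does not use any BMC Helix Intelligent Automation API: the AutomationPolicy class, the run_remediation function, the event fields, and the "restart-pod" policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class AutomationPolicy:
    name: str
    enabled: bool
    matches: Callable[[Event], bool]  # selects the events the policy applies to
    action: Callable[[Event], str]    # remediation step; returns a run status


def run_remediation(event: Event, policies: List[AutomationPolicy]) -> str:
    """Run the first enabled policy that matches the event and report its status."""
    for policy in policies:
        if policy.enabled and policy.matches(event):
            return f"{policy.name}: {policy.action(event)}"
    # Mirrors the note above: without a policy, the operator asks for one.
    return "no matching policy: request the creation of an automation policy"


if __name__ == "__main__":
    # Hypothetical policy created by an administrator.
    restart_pod = AutomationPolicy(
        name="restart-pod",
        enabled=True,
        matches=lambda e: e.get("type") == "pod_crash",
        action=lambda e: "succeeded",  # placeholder for a real remediation
    )
    print(run_remediation({"type": "pod_crash", "service": "booking"}, [restart_pod]))
    print(run_remediation({"type": "disk_full", "service": "admin"}, [restart_pod]))
```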

 

 

5. Visualize the service health and make informed decisions

Decision makers use BMC Helix Dashboards to visualize the health of business services and make informed, data-driven decisions. These dashboards offer a consolidated view of the health of all services in your environment.

  1. View the Service Dashboard.
    The decision makers can use the out-of-the-box Service Dashboard in BMC Helix Dashboards to visualize and analyze service health quickly. For more information, see Service Dashboard in the BMC Helix Dashboards documentation.

  2. (Optional) Create custom dashboards.
    To track KPIs and view the metrics selected as part of a service, you can also create custom dashboards. For more information, see Creating and customizing dashboards in the BMC Helix Dashboards documentation.

service_dashboard.png

You are now prepared to observe, monitor, and remediate health issues in your Ticketing App service, ensuring improved performance, reliability, and stability of the service.

 

 
