Integrating with Amazon Managed Service for Prometheus to collect events


Amazon Managed Service for Prometheus is a Prometheus-compatible service that monitors and provides alerts on containerized applications and infrastructure at scale.

Configure a connection with Amazon Managed Service for Prometheus (AMP) to view AMP alert data in BMC Helix applications.

You can view the collected data in various BMC Helix applications and derive the following benefits: 

BMC Helix Operations Management
  • Type of data collected or viewed: Events (AMP alerts are received as events)
  • Benefits: Use a centralized event view to monitor and manage events, perform event operations, and filter events. Identify actionable events from a large volume of event data by processing events.
  • For more information, see Monitoring events and reducing event noise.

BMC Helix AIOps
  • Type of data collected or viewed: Situations (created from events)
  • Benefits: Improve the mean time to resolve (MTTR) based on the situation-driven workflow. Lower the mean time to detect or discover (MTTD) and the time required for investigating tickets.
  • For more information, see Monitoring situations.

BMC Helix Intelligent Integrations requirements

  • Choose a BMC Helix Intelligent Integrations deployment mode and review the corresponding port requirements. For information about the deployment modes and port requirements, see Deployment scenarios.
  • Based on the deployment mode, use the BMC Helix Intelligent Integrations SaaS deployment or the BMC Helix Intelligent Integrations on-premises gateway or both. For more information about the gateway, see Deploying the BMC Helix Intelligent Integrations on-premises gateway.
  • Make sure that the on-premises gateway can reach AMP on the required port (default is 443).
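
Before you create the connection, you can confirm the port requirement from the gateway host with a plain TCP connection test. The following Python sketch is illustrative only: `can_reach` is a hypothetical helper, not part of BMC Helix Intelligent Integrations, and a successful result shows only that a TCP connection can be opened. It does not validate TLS, the AWS credentials, or the AMP workspace.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (socket.gaierror), refused connections, and timeouts are all OSError subclasses.
        return False
```

For example, call can_reach with the AMP Alert Manager host name that you plan to enter in Task 1 and port 443. The test opens a direct connection, so it does not reflect a proxy configuration; if the gateway reaches AMP through a proxy server, test the proxy host and port instead.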

Task 1: To configure the connection with Amazon Managed Service for Prometheus to collect events

  1. To access BMC Helix Intelligent Integrations, perform one of the following steps depending on your deployment mode:
    • BMC Helix Intelligent Integrations SaaS: Log on to BMC Helix Portal, and click the BMC Helix Intelligent Integrations tile.
    • BMC Helix Intelligent Integrations on-premises gateway: Access BMC Helix Intelligent Integrations at https://hostName:portNumber/swpui
  2. On the CONNECTORS tab, click the add icon on the SOURCES panel.
  3. Click the AMS for Prometheus Alert Manager tile.
  4. Specify the following details for the source connection:
    1. Specify a unique instance name.

      Best practice
      We recommend that you specify the instance name in the following format:

      <sourceType>_<sourceControllerServerName>{_<InstanceQualifier>}

      The instance qualifier helps you distinguish multiple instances configured from the same source server. For example, you can name your instances AMP_Host_PROD, AMP_Host_TEST, and so on.

    2. Specify the AMP Alert Manager host name.
    3. Specify the AMP HTTP or HTTPS port number, depending on the connection protocol (the default port number is 443).
    4. Specify the access key ID associated with your AWS account.
    5. Specify the secret access key associated with your AWS account.
    6. Specify an AWS region.
    7. Specify the AMP workspace that you created to store and query the data.
    8. Click Proxy and specify whether you want to configure a proxy server.
      If yes, specify the host name and port number (default value is 8888).
  5. Click VALIDATE AND CREATE.
    The specified connection details are validated, and the corresponding source connection is created in the Source Connection list.
  6. Select the source connection that you created from the list if it is not selected already.

    Important

    The destination host connection is created and configured automatically for each tenant when the source connection is created.

  7. Select the data types for which you want to collect data.
  8. Configure the collectors for the selected data types by clicking the respective data type in the Collectors section.
    Collection parameters for the selected data types:

    Parameter name

    Description

    Collection Schedule

    Select one of the following options to specify the data collection frequency:

    • Duration: When you select this option, data is collected at a fixed interval. Specify the interval in minutes, hours, or days.
      Default: 5 minutes
      Example:
      Collection Schedule is set to 5 mins.
      Current time is 00:30.

      If you run the collector just after 00:30, data is collected every 5 mins, first at 00:30 and next at 00:35, and so on.
    • Cron schedule: When you select this option, data is collected on the schedule defined by a cron expression.
      A cron expression is a string consisting of five subexpressions (fields) that describe individual details of the schedule.
      These fields, separated by blank spaces, can contain any of the allowed values with various combinations of the allowed characters for that field.
      Default: */5 * * * * (runs every 5 minutes)

      Format:
      Minutes Hours (24-hour format) Day of Month Month Day of Week

      Example:
      If you specify 10 15 3 7 *, data is collected at 15:10 hours on the third day of July (July 3) every year.

    For more information about how this parameter affects data collection, see Data collection schedule.

    Data Collection Window

    Specify the historical time period (in minutes) from the current time for which the data should be collected.

    Default: 5 minutes

    Example:

    Collection Schedule is set to 5 mins.
    Data Collection Window is set to 5 mins.
    Current time is 00:30.

    If you run the collector just after 00:30, data is collected first at 00:30 for the interval 00:25 - 00:30, and next at 00:35 for the interval 00:30 - 00:35, and so on.

    For more information about how this parameter affects data collection, see Data collection window.

    Data Latency

    Specify the time (in minutes) by which the data time window should be shifted back on the timeline.

    This parameter is useful in delayed data availability situations.

    Default: 0 minutes

    Example:

    Collection Schedule is set to 5 mins.
    Data Collection Window is set to 10 mins.
    Data Latency is set to 2 mins.
    Current time is 00:30.

    If you run the collector just after 00:30, data is collected first at 00:30 for the interval 00:18 to 00:28 and next at 00:35 for the interval 00:23 to 00:33, and so on.

    For more information about how this parameter affects data collection, see Data latency.

    Status

    Select one event status from the list.

    State

    Select all or a subset of states from the list:

    • pending
    • firing
    • inactive
    • resolved
    • unprocessed (new in 24.2.02)
    • active (new in 24.2.02)
    • suppressed (new in 24.2.02)

    Note: To collect alerts with the states added in version 24.2.02, configure a new connector instance. If you upgrade to version 24.4 from a version earlier than 24.2.02, existing instances are not automatically updated with the additional states.

  9. To create the required collector streams for the selected data types, click CREATE COLLECTORS.
  10. Configure the distributors for the selected data types by clicking the respective data type in the Distributors section.

    Distribution parameters for the selected data types:

    Parameter name

    Description

    Max Batching Size

    Specify the maximum number of data items to send in a single POST request to the destination API.
    The batch size depends on the destination’s ability to buffer the incoming data.

    Default: 250

    Max Batching Delay

    Specify the maximum time (in seconds) to wait before building and processing a batch.

    Default: 3 seconds 

    Base Retry Delay

    Specify the initial time (in seconds) to wait before retrying to build and process a batch.
    The waiting time increases in the following sequence: n^1, n^2, n^3, and so on, where n is the Base Retry Delay value in seconds.

    Default: 2 seconds

    Example: Base Retry Delay is set to 2 seconds. Retry is performed after 2, 4, 8, 16, ... seconds.

    Max Intra-Retry Delay

    Specify the maximum delay (in seconds) between two retries. The retry delay stops increasing when it reaches this value.

    Default: 60 seconds

    Example: Max Intra-Retry Delay is set to 60 seconds.
    Base Retry Delay is set to 2 seconds. Retries are performed 2, 4, 8, 16, 32, 60, 60, ... seconds later.

    Max Retry Duration

    Specify the total time for retrying a delivery. For REST destinations, a delivery is a batch of data items in one POST request.

    Default: 5 minutes

    Example: Max Retry Duration is set to 8 hours.
    Base Retry Delay is set to 2 seconds. Retries are attempted after delays of 2, 4, 8, 16, 32, 64, 128, ... seconds (subject to the Max Intra-Retry Delay limit) until the total retry duration reaches 8 hours. After that, no further attempts are made to retry the delivery. This setting assumes that any outage or other issue with the destination tool is resolved within the Max Retry Duration.

    Attributes To Be Dropped When Updating Events

    Specify the event attributes that you do not want to be updated in BMC Helix Operations Management when events are updated.

    For example, if you do not want an event's severity, source address, source category, and subcategory to be updated in BMC Helix Operations Management, you need to specify those attributes in a comma-separated format: severity,source_address,source_category,source_subcategory.

    Important:
    You can obtain the event attribute names in BMC Helix Operations Management by exporting any event data in JSON, BAROC, XML, or CSV format. The exported file contains all attributes of the event data, from which you can identify the attributes to be dropped.

    Events Filters

    Event filters help remove unwanted data and send only the required events to BMC Helix applications. The data is filtered by the regular expressions (regex) that you specify for the host, message, and detailed message, and only the matching events are sent to BMC Helix applications.

    Host Regex

    Specify the regex for the host name. Events for the hosts whose names match the regex are sent to BMC Helix applications.
    Examples:
    • To send data for the host name /inventory/pricing, specify the regex as ^/inventory/pricing$.
    • To filter out data whose host name contains the string inventory, specify the regex as ^(?!.*inventory).*.
    • To send the data for the hosts whose names start with the string inventory, specify the regex as ^inventory.*.
    Important:
    If you use multiple regexes, make sure that they do not conflict.
    For example, do not enter .*(inventory).* and ^(?!.*inventory).* together. The former regex sends events for the hosts whose names contain the string inventory, while the latter regex sends events for the hosts whose names do not contain the string inventory.
    Message Regex

    Specify the regex for the event message. Events whose messages match the regex are sent to BMC Helix applications.
    Examples:
    • To send events whose messages contain the string HRV alert, specify the regex as .*HRV alert.*.
    • To filter out the events whose message contains the string HRV alert, specify the regex as ^(?!.*HRV alert).*.
    • To send events whose message starts with the string HRV alert, specify the regex as ^(HRV alert).*.
    Important:
    If you use multiple regexes, make sure that they do not conflict.
    For example, do not enter .*HRV alert.* and ^(?!.*HRV alert).* together. The former regex sends events whose message contains the string HRV alert, while the latter regex sends events whose message does not contain the string HRV alert.
    Detailed Message Regex

    Specify the regex for the detailed message. Events whose detailed messages match the regex are sent to BMC Helix applications.
    Examples:
    • To send events whose detailed message contains the string ci_display_name: easyTravel-k8s, specify the regex as .*ci_display_name: easyTravel-k8s.*.
    • To filter out the events whose detailed message contains the string ci_display_name: easyTravel-k8s, specify the regex as ^(?!.*ci_display_name: easyTravel-k8s).*.
    • To send the events whose detailed message starts with the string ci_display_name: easyTravel-k8s, specify the regex as ^(ci_display_name: easyTravel-k8s).*.
    Important:
    If you use multiple regexes, make sure that they do not conflict.
    For example, do not enter .*ci_display_name: easyTravel-k8s.* and ^(?!.*ci_display_name: easyTravel-k8s).* together. The former regex sends events whose detailed message contains the string ci_display_name: easyTravel-k8s, while the latter regex sends events whose detailed message does not contain the string ci_display_name: easyTravel-k8s.
  11. To create the required distributor streams for the selected data types, click CREATE DISTRIBUTORS.
  12. Click one of the following buttons:
    • SAVE STREAM: Click if you plan to edit the connection details before starting the stream for data collection. After you save the stream, the connector that you just created is listed in the SOURCES panel. Move the slider to the right to start the data stream.
    • SAVE AND START STREAM: Click to save the connection details and start the data collection immediately.
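
To check what a cron expression for the Cron schedule option in step 8 means before you save it, you can evaluate the expression against a specific time. The following Python sketch is a minimal matcher for the standard five-field format (minute, hour, day of month, month, day of week) that supports *, */n, single values, ranges, and comma-separated lists. It assumes standard Vixie-cron semantics, including the rule that a time matches when either day field matches if both day fields are restricted. The exact cron dialect that BMC Helix Intelligent Integrations accepts is not stated here, so treat the sketch as an approximation; cron_matches and parse_field are hypothetical names.

```python
from datetime import datetime

# Allowed (lowest, highest) values for minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field (*, */n, a, a-b, a-b/n, or a comma-separated list) into allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if the five-field cron expression fires at the minute given by when."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = (
        parse_field(field, low, high) for field, (low, high) in zip(fields, FIELD_RANGES)
    )
    if 7 in dow:
        dow.add(0)  # 0 and 7 both mean Sunday
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok, dow_ok = when.day in dom, cron_weekday in dow
    # Vixie-cron rule: if both day fields are restricted (do not start with *),
    # a day matches when either field matches; otherwise both must match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return when.minute in minute and when.hour in hour and when.month in month and day_ok


if __name__ == "__main__":
    # 10 15 3 7 * fires at 15:10 on July 3 only, not on every third day of July.
    print(cron_matches("10 15 3 7 *", datetime(2025, 7, 3, 15, 10)))  # True
    print(cron_matches("10 15 3 7 *", datetime(2025, 7, 6, 15, 10)))  # False
    # The default */5 * * * * fires on every minute that is a multiple of 5.
    print(cron_matches("*/5 * * * *", datetime(2025, 1, 1, 0, 35)))  # True
```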

Important
For a data stream, the Run Latency (max/avg), Items (Avg per Run), and Last Run Status columns on the Streams page might show the status as No Runs during the data collection process. After completion of the process, these columns are updated with an appropriate status.
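
The three retry parameters in step 10 interact: Base Retry Delay sets how fast the waiting time grows, Max Intra-Retry Delay caps each individual wait, and Max Retry Duration caps the total waiting time for one delivery. The following Python sketch models that interaction as the parameter descriptions state it (delays of n^1, n^2, n^3, ... seconds). It is an assumption-based illustration, not BMC's implementation; in particular, whether a retry whose delay would end exactly at the Max Retry Duration limit is still attempted is not documented, and retry_delays is a hypothetical name.

```python
def retry_delays(base_s: int, max_intra_s: int, max_duration_s: int) -> list[int]:
    """Delays (in seconds) before each retry of one delivery.

    Assumed model, read from the parameter descriptions: the k-th delay is base_s**k,
    capped at max_intra_s, and retrying stops once the next delay would push the
    cumulative waiting time past max_duration_s.
    """
    if base_s < 1 or max_intra_s < 1:
        raise ValueError("delays must be at least 1 second")
    delays: list[int] = []
    total = 0
    uncapped = base_s  # base_s**1 for the first retry
    while True:
        delay = min(uncapped, max_intra_s)
        if total + delay > max_duration_s:
            break
        delays.append(delay)
        total += delay
        if uncapped < max_intra_s:
            uncapped *= base_s  # base_s**2, base_s**3, ... until the cap is reached
    return delays


if __name__ == "__main__":
    # Default values: Base Retry Delay 2 s, Max Intra-Retry Delay 60 s, Max Retry Duration 5 min.
    print(retry_delays(2, 60, 5 * 60))  # [2, 4, 8, 16, 32, 60, 60, 60]
```

With the default values, this model allows eight retries and 242 seconds of total waiting before the 5-minute limit is reached. For a base of 2, the n^k reading and conventional doubling (2, 4, 8, ...) give the same sequence; they differ for other base values, such as 3.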

For more information about data streams, see Managing data streams.
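
The Important notes for the event filters in step 10 warn against conflicting regexes. The following Python sketch shows why, using the host examples: the two regexes accept complementary sets of host names, so combining them either forwards nothing or forwards everything. The regex dialect, whether a regex must match the whole field or only part of it, and how multiple regexes are combined are not stated here. The sketch assumes Python re syntax with whole-value matching and evaluates both possible combinations; passes and forwarded are hypothetical names.

```python
import re

# Conflicting pair from the Host Regex description.
CONTAINS_INVENTORY = re.compile(r".*(inventory).*")
NOT_CONTAINS_INVENTORY = re.compile(r"^(?!.*inventory).*")


def passes(pattern: re.Pattern, value: str) -> bool:
    """Assumption: a field passes a filter when the regex matches the entire value."""
    return pattern.fullmatch(value) is not None


def forwarded(hosts: list[str], patterns: list[re.Pattern], combine) -> list[str]:
    """Hosts that pass when the results of several regexes are combined with all() or any()."""
    return [host for host in hosts if combine(passes(p, host) for p in patterns)]


if __name__ == "__main__":
    hosts = ["inventory-db01", "/inventory/pricing", "web01"]
    both = [CONTAINS_INVENTORY, NOT_CONTAINS_INVENTORY]
    print(forwarded(hosts, both, all))  # [] : no host satisfies both regexes
    print(forwarded(hosts, both, any))  # all three hosts: nothing is filtered out
```

The same check works for message and detailed message filters. Note that in Python, . does not match a line break unless re.DOTALL is set, which matters if a detailed message spans several lines.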

Task 2: To verify the connection

From BMC Helix Intelligent Integrations, on the SOURCES panel, confirm that the data streams for the integration you created are running. Data streaming is indicated by moving colored arrows.

A moving dark blue arrow indicates that the event stream is running. Event data is pushed according to the configured Collection Schedule interval.
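
Each run covers the time range set by the Data Collection Window and Data Latency parameters in step 8 of Task 1. To predict which range a given run collects, the following Python sketch applies the rule that the examples in that step follow: the range ends Data Latency minutes before the run time and is Data Collection Window minutes long. It models only the Duration option of Collection Schedule, it is not BMC code, and run_times, collection_interval, and describe are hypothetical names.

```python
from datetime import datetime, timedelta


def run_times(first_run: datetime, schedule_min: int, count: int) -> list[datetime]:
    """Run times for the Duration option: one run every schedule_min minutes."""
    return [first_run + timedelta(minutes=schedule_min * i) for i in range(count)]


def collection_interval(run: datetime, window_min: int, latency_min: int = 0) -> tuple[datetime, datetime]:
    """Time range queried at run: window_min minutes long, ending latency_min minutes before run."""
    end = run - timedelta(minutes=latency_min)
    return end - timedelta(minutes=window_min), end


def describe(run: datetime, window_min: int, latency_min: int = 0) -> str:
    """Human-readable summary of the range collected by one run."""
    start, end = collection_interval(run, window_min, latency_min)
    return f"run at {run:%H:%M} collects {start:%H:%M} to {end:%H:%M}"


if __name__ == "__main__":
    first = datetime(2025, 1, 1, 0, 30)  # "Current time is 00:30" in the examples
    # Data Latency example: schedule 5 mins, window 10 mins, latency 2 mins.
    for run in run_times(first, schedule_min=5, count=2):
        print(describe(run, window_min=10, latency_min=2))
    # run at 00:30 collects 00:18 to 00:28
    # run at 00:35 collects 00:23 to 00:33
```

In this model, consecutive ranges overlap by the window minus the schedule interval (5 minutes in the Data Latency example) and leave gaps when the window is shorter than the schedule interval; with equal values, as in the Data Collection Window example, the ranges are contiguous.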

Task 3: To view data in BMC Helix applications

View data collected from Prometheus in multiple BMC Helix applications.

To view events in BMC Helix Operations Management

  1. In BMC Helix Operations Management, select Monitoring and then select Events.
  2. Filter the events by the PrometheusAlert class.

    Important

    If an event does not include the host name of the source from which it was received, the Host column on the Events page shows the name of the computer where Prometheus is installed.

Incoming events from AMP are processed in BMC Helix Operations Management through a set of deduplication rules to determine whether the incoming event is a duplicate event or a new event. For more information, see Event deduplication, suppression, and closure for reducing event noise.

For information about events, see Monitoring and managing events.

To view situations in BMC Helix AIOps 

Before you view situations in BMC Helix AIOps, create a Business Service model in BMC Helix Discovery. For information about creating models, see Managing models.

In BMC Helix AIOps, go to the Overview page to view the situations created from the event data received from AMP.

BMC Helix Intelligent Integrations 26.1