Integrating with Datadog

As a tenant administrator, you need to monitor the connected systems so that you can quickly identify and resolve issues. The BMC Helix Intelligent Integrations Datadog connector collects events and the maximum values of system metrics from Datadog.

You can view the collected data in various BMC Helix applications and derive the following benefits:

BMC Helix application | Type of data collected or viewed | Benefits

BMC Helix Operations Management

Events 

Use a centralized event view to monitor, filter, and manage events, and perform event operations in one place. 

Process events to help identify actionable events quickly from a large volume of data.

For more information, see Monitoring events and reducing event noise.

BMC Helix Operations Management

Metrics

Use alarm and variate policies to detect anomalies and eliminate false positives for more accurate results while monitoring the health of your system.

For more information, see Detecting anomalies by using static and dynamic thresholds.

BMC Helix AIOps

Situations (created from events)

Improve the mean time to resolve (MTTR) based on the situation-driven workflow.

Lower the mean time to detect or discover (MTTD) and the time required for investigating tickets.

For more information, see Monitoring situations.

BMC Helix Dashboards

Events

Create dashboards to get a consolidated view of data collected from third-party products across your environment.

For more information, see Creating custom dashboards.


As a tenant administrator, perform the following steps to configure a connection with Datadog, verify the connection, and view the collected events and metrics data in various BMC Helix applications.

Supported versions

This connector supports Datadog API version v1 for data collection.

Planning for the connection

Datadog prerequisites

Create a Datadog user with the standard role and obtain the API token and application key for that user.

The standard role has the following permissions by default:

  • APM Read
  • CI Visibility Read
  • Dashboards Read
  • Incidents Read
  • Monitors Read
  • Notebooks Read
  • RUM Apps Read
  • SLOs Read

For more information, see the following sections in the Datadog documentation:

  • Datadog roles and permissions
  • API and Application Keys
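
After you obtain the API token and application key, you might want to check them before entering them in the connector. The following minimal Python sketch only builds the request. It assumes Datadog's public v1 key-validation endpoint (GET /api/v1/validate) and the DD-API-KEY and DD-APPLICATION-KEY request headers. The site api.datadoghq.com, the function name, and the placeholder keys are illustrative assumptions, and the connector itself might authenticate differently.

```python
import urllib.request

# Assumption: the US1 Datadog site. Other Datadog sites use different host names.
DEFAULT_HOST = "api.datadoghq.com"


def build_validate_request(api_key, app_key, host=DEFAULT_HOST, port=443):
    """Build (but do not send) a GET request for Datadog's v1 key-validation endpoint."""
    url = f"https://{host}:{port}/api/v1/validate"
    headers = {
        "DD-API-KEY": api_key,          # the API token referred to in this topic
        "DD-APPLICATION-KEY": app_key,  # the application key of the same user
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = build_validate_request("<api-key>", "<application-key>")
    print(request.get_method(), request.full_url)
    # To send the request: urllib.request.urlopen(request, timeout=10)
```

The validation endpoint checks only the API key. The application key header is included here because the connector asks for both values.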

BMC Helix Intelligent Integrations prerequisites

  • Depending on the location (SaaS, on-premises) of the third-party product, choose one or more BMC Helix Intelligent Integrations deployment modes and review the corresponding port requirements. For information about various deployment modes and port requirements, see Deployment scenarios.
  • Based on the deployment modes, use the BMC Helix Intelligent Integrations SaaS deployment or the BMC Helix Intelligent Integrations on-premises gateway or both. For more information about the gateway, see Deploying the BMC Helix Intelligent Integrations on-premises gateway.

  • The on-premises gateway must be able to reach the third-party product on the required port (default is 443).

In the preceding list, third-party product refers to Datadog.
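
One way to confirm the reachability requirement from the computer that hosts the on-premises gateway is a plain TCP connection test. The following Python sketch uses only the standard library. The function name is hypothetical, and the host that you pass should be the Datadog host name that you plan to enter in the connector.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (the host name is an assumption, replace it with your Datadog host):
# can_reach("api.datadoghq.com", 443)
```

A successful TCP connection shows only that the port is open. It does not validate the credentials or the HTTPS certificate.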

Configuring the connection with Datadog

  1. Access BMC Helix Intelligent Integrations:

    • BMC Helix Intelligent Integrations SaaS – Log on to BMC Helix Portal, and click Launch on BMC Helix Intelligent Integrations.
    • BMC Helix Intelligent Integrations on-premises gateway – Use one of the following URLs to access BMC Helix Intelligent Integrations:
      • http://<hostName>:<portNumber>/swpui
      • https://<hostName>:<portNumber>/swpui
  2. On the CONNECTORS tab, click in the SOURCES panel.
  3. Click the Datadog tile.
  4. Specify the following details for the source connection:
    1. Specify the Datadog host name and port number (default port number is 443).
    2. Specify the Datadog HTTP or HTTPS port number depending on the connection protocol.
    3. Select the HTTPS option to use an HTTPS connection to the Datadog host.
    4. Specify the API version V1 token and the application key.
      For information about creating the API token and application key, see Datadog prerequisites.

  5. Click VALIDATE AND CREATE.
    The specified connection details are validated and the corresponding source connection is created in the Source Connection list.
  6. Select the source connection that you created from the list if it is not selected already.

    Important

    The destination host connection is created and configured automatically for each tenant when the source connection is created.

  7. Ensure that the options for the data types for which you want to collect data are selected.
  8. Configure the collectors for the selected data types by clicking the respective data type in the Collectors section. Specify the parameters for the selected data type, as explained in the following table:

    Parameter name and description | Data type: Datadog Events | Data type: Datadog Metrics


    Collection Schedule

    Specify the data collection frequency using one of the following methods:

    • Constantly by specifying the schedule in minutes, hours, or days using the Duration option
      Default: 5 minutes

      Example:
      Collection Schedule is set to 5 mins.
      Current time is 00:30.

      If you run the collector just after 00:30, data is collected every 5 mins, first at 00:30 and next at 00:35, and so on.

    • Periodically by specifying the schedule through a cron expression using the Cron schedule option 

      A cron expression is a string consisting of five subexpressions (fields) that describe individual details of the schedule.  These fields, separated by white spaces, can contain any of the allowed values with various combinations of the allowed characters for that field.
      Default: */5 * * * * (evaluates to 5 minutes)

      Format:

      Minutes Hours (24-hour format) Day of Month Month Day of Week

      Example:

      If you specify 10 15 3 7 *, data is collected at 15:10 hours on the third day of July.

    For more information about how this parameter affects data collection, see Data collection schedule.

    Applies to: Datadog Events (yes), Datadog Metrics (yes)

    Data Collection Window

    Specify the historical time period (in minutes) from the current time for which the data should be collected.

    Default: 5 minutes

    Example:

    Collection Schedule is set to 5 mins.
    Data Collection Window is set to 5 mins.
    Current time is 00:30.

    If you run the collector just after 00:30:

    • For events, the data is collected first at 00:30 for the interval, 00:25 - 00:30, and next at 00:35 for the interval, 00:30 - 00:35, and so on.
    • For metrics, only the maximum value within the specified data collection window is collected instead of all data points. For example, if you are collecting system idle time and the data collection window is set to 5 minutes, you get only one metric data point, that is, the maximum value of the system idle time within the 5-minute collection window. By reducing the data collection window, you can collect more metric data points.

    For more information about how this parameter affects data collection, see Data collection window.

    Applies to: Datadog Events (yes), Datadog Metrics (yes)

    Data Latency

    Specify the time (in minutes) by which the data time window should be shifted back on the timeline.
    This parameter is useful in delayed data availability situations.

    Default: 0 minutes

    Example:

    Collection Schedule is set to 5 mins.
    Data Collection Window is set to 10 mins.
    Data Latency is set to 2 mins.
    Current time is 00:30.

    If you run the collector just after 00:30, data is collected first at 00:30 for the interval 00:18 to 00:28, and next at 00:35 for the interval 00:23 to 00:33, and so on.

    For more information about how this parameter affects data collection, see Data latency.

    Applies to: Datadog Events (yes), Datadog Metrics (yes)

    Metrics

    Select one or more metrics from the list. The connector does not collect metrics data if you do not select any option from the list. The list shows only the metrics whose names begin with the term 'system', for example, system.cpu.idle and system.cpu.user.

    The system metrics are grouped as per the following table:


    Space aggregation grouping by (max only) | Metric name
    Host and Device
    system.net.bytes_rcvd
    system.net.bytes_sent
    system.net.packets_in.count
    system.net.packets_in.drop
    system.net.packets_in.error
    system.net.packets_out.count
    system.net.packets_out.drop
    system.net.packets_out.error
    system.net.tcp.en_drops
    system.net.tcp.en_overflows
    Host and Device name
    system.disk.free
    system.disk.in_use
    system.disk.read_time
    system.disk.read_time_pct
    system.disk.total
    system.disk.used
    system.disk.write_time
    system.disk.write_time_pct
    system.io.avg_q_sz
    system.io.avg_rq_sz
    system.io.await
    system.io.r_await
    system.io.r_s
    system.io.rkb_s
    system.io.rrqm_s
    system.io.svctm
    system.io.util
    system.io.w_await
    system.io.w_s
    system.io.wkb_s
    system.io.wrqm_s
    Host

    system.cpu.context_switches
    system.cpu.guest
    system.cpu.idle
    system.cpu.interrupt
    system.cpu.iowait
    system.cpu.num_cores
    system.cpu.stolen
    system.cpu.system
    system.cpu.user
    system.fs.file_handles.allocated
    system.fs.file_handles.allocated_unused
    system.fs.file_handles.in_use
    system.fs.file_handles.max
    system.fs.file_handles.used
    system.fs.inodes.free
    system.fs.inodes.in_use
    system.fs.inodes.total
    system.fs.inodes.used
    system.load.1
    system.load.15
    system.load.5
    system.load.norm.1
    system.load.norm.15
    system.load.norm.5
    system.mem.buffered
    system.mem.cached
    system.mem.commit_limit
    system.mem.committed
    system.mem.committed_as
    system.mem.free
    system.mem.nonpaged
    system.mem.page_tables
    system.mem.paged
    system.mem.pagefile.free
    system.mem.pagefile.pct_free
    system.mem.pagefile.total
    system.mem.pagefile.used
    system.mem.pct_usable
    system.mem.shared
    system.mem.slab
    system.mem.slab_reclaimable
    system.mem.total
    system.mem.usable
    system.mem.used
    system.proc.count
    system.proc.queue_length
    system.swap.cached
    system.swap.free
    system.swap.pct_free
    system.swap.swap_in
    system.swap.swap_out
    system.swap.total
    system.swap.used
    system.uptime
    system.io.block_in
    system.io.block_out
    system.net.conntrack.count
    system.net.conntrack.expect_max
    system.net.conntrack.max
    system.net.conntrack.tcp_max_retrans
    system.net.conntrack.tcp_timeout_max_retrans
    system.net.iface.mtu
    system.net.iface.num_rx_queues
    system.net.iface.num_tx_queues
    system.net.iface.tx_queue_len
    system.net.ip.forwarded_datagrams
    system.net.ip.fragmentation_creates
    system.net.ip.fragmentation_fails
    system.net.ip.fragmentation_oks
    system.net.ip.in_addr_errors
    system.net.ip.in_csum_errors
    system.net.ip.in_delivers
    system.net.ip.in_discards
    system.net.ip.in_header_errors
    system.net.ip.in_no_routes
    system.net.ip.in_receives
    system.net.ip.in_truncated_pkts
    system.net.ip.in_unknown_protos
    system.net.ip.out_discards
    system.net.ip.out_no_routes
    system.net.ip.out_requests
    system.net.ip.reassembly_fails
    system.net.ip.reassembly_oks
    system.net.ip.reassembly_overlaps
    system.net.ip.reassembly_requests
    system.net.ip.reassembly_timeouts
    system.net.ip.reverse_path_filter
    system.net.tcp.abort_on_timeout
    system.net.tcp.active_opens
    system.net.tcp.attempt_fails
    system.net.tcp.backlog_drops
    system.net.tcp.current_established
    system.net.tcp.established_resets
    system.net.tcp.failed_retransmits
    system.net.tcp.from_zero_window
    system.net.tcp.in_csum_errors
    system.net.tcp.in_errors
    system.net.tcp.in_segs
    system.net.tcp.out_resets
    system.net.tcp.out_segs
    system.net.tcp.passive_opens
    system.net.tcp.paws_connection_drops
    system.net.tcp.paws_established_drops
    system.net.tcp.prune_called
    system.net.tcp.prune_ofo_called
    system.net.tcp.prune_rcv_drops
    system.net.tcp.retrans_segs
    system.net.tcp.syn_cookies_failed
    system.net.tcp.syn_cookies_recv
    system.net.tcp.syn_cookies_sent
    system.net.tcp.syn_retrans
    system.net.tcp.to_zero_window
    system.net.tcp.tw_reused
    system.net.udp.in_csum_errors
    system.net.udp.in_datagrams
    system.net.udp.in_errors
    system.net.udp.no_ports
    system.net.udp.out_datagrams
    system.net.udp.rcv_buf_errors
    system.net.udp.snd_buf_errors

    Any other system metrics are also grouped by host.


    Applies to: Datadog Events (no), Datadog Metrics (yes)

    Host Filter

    Specify a host filter for collecting the metrics data. The connector does not collect metrics data if you do not specify a host filter. For more information about the host filtering syntax, see the Datadog documentation on Advanced Filtering.

    Applies to: Datadog Events (no), Datadog Metrics (yes)

  9. Click CREATE COLLECTORS to create the required collector streams for the selected data types.

  10. Configure the distributors for the selected data types by clicking the respective data type in the Distributors section. Specify the parameters for the selected data type, as explained in the following table:

    Parameter name and description

    Max Batching Size

    Specify the maximum number of data items to send in a single POST request to the destination API.
    The batch size depends on the destination’s ability to buffer the incoming data.

    Default: 250

    Max Batching Delay

    Specify the maximum time (in seconds) to wait before building a batch and processing.

    Default: 3 seconds 

    Base Retry Delay

    Specify the initial time (in seconds) for which to wait before retrying to build a batch and processing.
    The waiting time increases in the following sequence: n^1, n^2, n^3, and so on, where n indicates the number of seconds.

    Default: 2 seconds

    Example:

    Base Retry Delay is set to 2 seconds.

    Retry is performed after 2, 4, 8, 16, ... seconds.

    Max Intra-Retry Delay

    Specify the maximum limit for the base retry delay. 

    Default: 60 seconds

    Example:

    Max Intra-Retry Delay is set to 60 seconds.
    Base Retry Delay is set to 2 seconds.

    Retries are performed after 2, 4, 8, 16, and 32 seconds, and then every 60 seconds, because the delay between retries cannot exceed the Max Intra-Retry Delay.

    Max Retry Duration

    Specify the total time for retrying a delivery. For REST destinations, a delivery is a batch of data items in one POST request. 

    Default: 5 minutes

    Example:

    Max Retry Duration is set to 8 hours.
    Base Retry Delay is set to 2 seconds.

    Requests are sent for 2+4+8+16+32+64+128... until 8 hours in total duration is reached. After that, no subsequent attempts are made to retry the delivery.

    The assumption here is that if there is an outage or other issue with the destination tool, recovery should take less than the value of the Max Retry Duration parameter to be completed.


  11. Click CREATE DISTRIBUTORS to create the required distributor streams for the selected data types.
  12. Click one of the following buttons:
    • SAVE STREAM: Click this button if you want to edit the integration details before creating the instance. After you save the stream, the connector that you just created is listed in the SOURCES panel. Move the slider to the right to start the data stream.
    • SAVE AND START STREAM: Click this button if you want to save the integration details and start receiving data immediately.

    For more information about the data streams, see Starting or stopping data streams.
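
The timing parameters described in the collector and distributor tables of this procedure can be illustrated with a short Python sketch. It is not the connector's implementation, and the function names are hypothetical. The first function reproduces the Data Collection Window and Data Latency examples. The second function assumes that the retry delay doubles from the Base Retry Delay, is capped at the Max Intra-Retry Delay, and stops when the summed delays would exceed the Max Retry Duration.

```python
from datetime import datetime, timedelta


def collection_windows(first_run, schedule_min, window_min, latency_min=0, runs=2):
    """Return (run_time, window_start, window_end) for the first `runs` collections."""
    result = []
    for i in range(runs):
        run_time = first_run + timedelta(minutes=i * schedule_min)
        # Data Latency shifts the end of the window back from the run time.
        window_end = run_time - timedelta(minutes=latency_min)
        # Data Collection Window is the length of the interval that is queried.
        window_start = window_end - timedelta(minutes=window_min)
        result.append((run_time, window_start, window_end))
    return result


def retry_delays(base_delay_s, max_intra_retry_s, max_duration_s):
    """Return the waits (in seconds) between retries under the assumptions stated above."""
    delays, total, delay = [], 0, base_delay_s
    while total + delay <= max_duration_s:
        delays.append(delay)
        total += delay
        # Assumption: the delay doubles each time and never exceeds the cap.
        delay = min(delay * 2, max_intra_retry_s)
    return delays


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 30)  # arbitrary date, run time 00:30
    for run, begin, end in collection_windows(start, 5, 10, 2):
        print(f"run {run:%H:%M}: {begin:%H:%M} to {end:%H:%M}")
    print(retry_delays(2, 60, 300))
```

With the values from the Data Latency example (schedule 5 mins, window 10 mins, latency 2 mins, first run at 00:30), the first function yields the intervals 00:18 to 00:28 and 00:23 to 00:33.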


Verifying the connection

From BMC Helix Intelligent Integrations, on the SOURCES panel, confirm that the data streams for the connection you created are running. Data streaming is indicated by moving colored arrows.

  • A moving blue arrow indicates that the event stream is running. Event data is pushed according to the configured Collection Schedule interval.
  • A moving red arrow indicates that the metrics stream is running. Metric data is pushed according to the configured Collection Schedule interval.

Viewing data in BMC Helix applications

View data collected from Datadog in multiple BMC Helix applications.

To view events in BMC Helix Operations Management

  1. In BMC Helix Operations Management, select Monitoring > Events.
  2. Filter the events by the DatadogEvent class.

Incoming events from Datadog (except from an unknown host) are processed in BMC Helix Operations Management through a set of deduplication rules to determine whether the incoming event is a duplicate event or a new event. For more information, see Event deduplication and suppression for reducing event noise.

For more information about events, see Monitoring and managing events.

To view metrics in BMC Helix Operations Management

  1. In BMC Helix Operations Management, select Monitoring > Devices.
  2. Click the link for the required device.
  3. On the Monitors tab, click the required monitor.
    The Performance Overview tab shows the metrics graph. For information about metrics, see Viewing collected data.

Viewing Situations in BMC Helix AIOps

Before you view situations in BMC Helix AIOps, complete the following prerequisites:

  1. Ensure that CIs are present in BMC Helix Discovery or BMC Helix AIOps for the events that are being collected from Datadog.
  2. Create a Business Service model in one of the following applications:
    • BMC Helix Discovery. For more information, see Managing models.
    • BMC Helix AIOps. For more information, see Managing models.
  3. Perform one of the following tasks:
    • To view ML-based situations, enable the AIOps Situations feature in BMC Helix AIOps. For more information, see Enabling the AIOps features.
    • To view policy-based situations, create a correlation policy in BMC Helix Operations Management. For more information, see Creating and enabling event policies.

To view Situations

  1. In BMC Helix AIOps, go to the Situations page.
    This page shows the Situations created from the events that are ingested into BMC Helix Operations Management.
  2. Click the required Situation to view the messages contained in the Situation and other details such as priority and severity of the message.
    For information about Situations, see Monitoring situations.

Mapping of event attributes between Datadog and BMC Helix Operations Management

The following table shows the mapping between Datadog and BMC Helix Operations Management:

Event attribute | Datadog | BMC Helix Operations Management
Event severity (indicated by event status in Datadog) | Error | Critical
Event severity | Warn | Minor
Event severity | Info | Info
Event severity | Ok | Ok
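
The severity mapping in the preceding table can be expressed as a simple lookup. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and because the table does not say how other Datadog statuses are handled, the sketch raises an error instead of guessing.

```python
# Datadog event status -> BMC Helix Operations Management severity (from the table).
DATADOG_TO_HELIX_SEVERITY = {
    "error": "Critical",
    "warn": "Minor",
    "info": "Info",
    "ok": "Ok",
}


def to_helix_severity(datadog_status):
    """Translate a Datadog event status to the mapped severity (case-insensitive)."""
    key = datadog_status.strip().lower()
    if key not in DATADOG_TO_HELIX_SEVERITY:
        # No mapping is documented for this status, so do not invent one.
        raise ValueError(f"No documented mapping for Datadog status {datadog_status!r}")
    return DATADOG_TO_HELIX_SEVERITY[key]


if __name__ == "__main__":
    for status in ("Error", "Warn", "Info", "Ok"):
        print(status, "->", to_helix_severity(status))
```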

Mapping of metrics attributes between Datadog and BMC Helix Operations Management

The following table shows the mapping between Datadog and BMC Helix Operations Management:

Datadog attribute | Datadog example | BMC Helix Operations Management attribute | BMC Helix Operations Management example
Metric Name | system.disk.read_time_pct | Metric | system.disk.read_time_pct(Percent)
Unit | Percent | Metric | system.disk.read_time_pct(Percent)
Display Name | disk | Monitor Type | DATADOG_disk
Device Name | dm-0 | Monitor Name | disk_dm-0
Host Name | host:vl-aus-dsmw-10 | Associated Device | vl-aus-dsmw-10
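
The examples in the preceding table suggest a naming pattern, sketched below in Python. The pattern is inferred from this single example row set and is an assumption, not a documented rule. The function name and the handling of the host: prefix are hypothetical.

```python
def map_metric_attributes(metric_name, unit, display_name, device_name, host_name):
    """Derive the BMC Helix Operations Management attributes from Datadog attributes.

    The rules are inferred from the documented example only.
    """
    prefix = "host:"
    # Assumption: the Datadog host tag prefix is dropped for the associated device.
    device = host_name[len(prefix):] if host_name.startswith(prefix) else host_name
    return {
        "Metric": f"{metric_name}({unit})",
        "Monitor Type": f"DATADOG_{display_name}",
        "Monitor Name": f"{display_name}_{device_name}",
        "Associated Device": device,
    }


if __name__ == "__main__":
    mapped = map_metric_attributes(
        "system.disk.read_time_pct", "Percent", "disk", "dm-0", "host:vl-aus-dsmw-10"
    )
    for name, value in mapped.items():
        print(f"{name}: {value}")
```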