Moviri Integrator for BMC Helix Continuous Optimization - Ambari


“Moviri Integrator for BMC Helix Continuous Optimization – Ambari” allows extracting data from Hadoop deployments through the open-source component Ambari. Relevant capacity metrics are loaded into BMC Helix Continuous Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.

The integration supports the extraction of both performance and configuration data across different Hadoop components and can be configured through parameters that control entity filtering and many other settings. Furthermore, the connector can replicate relationships and logical dependencies among entities such as clusters, resource pools, services and nodes.

The documentation is targeted at BMC Helix Continuous Optimization administrators in charge of configuring and monitoring the integration between BMC Helix Continuous Optimization and Ambari.

Requirements

Supported versions of data source software

  • Supported Hadoop distribution: Hortonworks Data Platform (HDP), versions 2.2 and 2.5.1
  • Supported Ambari: versions 1.6 to 2.5.1, REST API v1

Although not officially supported, the integration is expected to work on custom deployments of Hadoop components that leverage Ambari as their management and monitoring service.

Supported configurations of data source software

Moviri – Ambari Extractor requires that Ambari continuously and correctly monitors the various entities supported by the integration (full list available below). Failure to meet this requirement will result in gaps in data coverage.

Datasource Check and Configuration

Preparing to connect to the data source software

The connector included in "Moviri Integrator for BMC Helix Continuous Optimization – Ambari" uses the Ambari REST API v1 to communicate with Ambari. The API is always enabled and no additional configuration is required.
Please note that only the GET method is used by the connector, preventing any accidental change to the environments.
The connector requires a regular Ambari user (admin privileges not required) with read-only permissions on all the clusters that should be accessed.
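Before configuring the ETL, it can be useful to verify that the Ambari REST API responds to read-only requests with the intended user. The sketch below builds the GET URL and Basic Auth header such a request would use; the helper names are hypothetical, while the endpoint path follows the standard Ambari REST API v1.

```python
import base64

# Hypothetical helpers illustrating the read-only access pattern described above.
# Only GET requests are involved, so no change can be made to the environment.

def ambari_url(host, port, path):
    """Build an Ambari REST API v1 URL, e.g. for listing clusters."""
    return f"http://{host}:{port}/api/v1/{path.lstrip('/')}"

def basic_auth_header(user, password):
    """Ambari uses HTTP Basic authentication for API access."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# GET this URL with the header above to list the clusters visible to the user:
url = ambari_url("ambari.example.com", 8080, "clusters")
```

An equivalent check from the command line is `curl -u user:password http://<ambari-host>:8080/api/v1/clusters`, which should return a JSON document listing the clusters the user can read.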

Connector configuration attributes

The following table shows the connector-specific properties; all the other generic properties are documented here.

 

| Property Name | Value Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| Ambari Connection | | | | |
| Ambari Hostname | String | Yes | | Ambari server hostname |
| Spark Hostname | String | Yes | | Spark server hostname |
| Ambari Port | Number | Yes | 8080 | Ambari connection port |
| Spark Port | Number | Yes | 18080 | Spark connection port |
| User | String | Yes | | Username |
| Password | String | Yes | | Password |
| Connection Timeout | Number | No | 20 | Connection timeout in seconds |
| Data Selection | | | | |
| Import nodes | Boolean | Yes | true | Import data at node level |
| Import pools | Boolean | Yes | true | Import data at pool level |
| HDFS data import | Boolean | Yes | true | Import data about the HDFS service |
| YARN data import | Boolean | Yes | true | Import data about the YARN service |
| HBASE data import | Boolean | Yes | true | Import data about the HBASE service |
| SPARK data import | Boolean | Yes | true | Import data about the SPARK service |
| Cluster regexp whitelist, semicolon separated | String | No | | List of clusters to be imported, semicolon separated. Regexp is supported. |
| Cluster regexp blacklist, semicolon separated | String | No | | List of clusters not to be imported, semicolon separated. Regexp is supported. This setting overrides the whitelist in case of conflict. |
| Host regexp whitelist, semicolon separated | String | No | | List of hosts to be imported, semicolon separated. Regexp is supported. Setting this field disables aggregation at cluster level. |
| Host regexp blacklist, semicolon separated | String | No | | List of hosts not to be imported, semicolon separated. Regexp is supported. This setting overrides the whitelist in case of conflict. Setting this field disables aggregation at cluster level. |
| Maximum pool exploration depth | Number | No | | A limit to the exploration of nested pools. |
| Substitute any dot char in pool names with this char | Char | No | - | Because the dot is a special character for the Loader component, replacing it is suggested. |
| Time Interval Settings | | | | |
| Maximum days to extract per execution | Number | No | 7 | Each ETL run will not extract more than the specified number of days. |
| Date limit not to extract beyond (YYYY-MM-DD HH24:MI:SS) | Date | No | | Maximum date to be considered while extracting data. |

The following image shows the list of options in the ETL configuration menu, including the advanced entries.

ambariconf.png
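The whitelist/blacklist properties follow the semantics described in the table above: patterns are semicolon-separated regular expressions, and the blacklist overrides the whitelist in case of conflict. The connector's actual matching code is not published; the sketch below only illustrates those rules (the function name and the use of `re.search` are assumptions).

```python
import re

def entity_allowed(name, whitelist="", blacklist=""):
    """Apply semicolon-separated regexp white/blacklists to an entity name."""
    def matches(patterns):
        return any(re.search(p, name) for p in patterns.split(";") if p)
    if blacklist and matches(blacklist):
        return False          # blacklist overrides whitelist on conflict
    if whitelist:
        return matches(whitelist)
    return True               # no whitelist configured: import everything

# e.g. import all "prod-*" clusters except those ending in "-backup"
entity_allowed("prod-cluster-1", whitelist="prod.*", blacklist=".*-backup")
```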

Supported entities

The following entities are supported:

  • Hadoop Cluster
  • Hadoop Resource Pool
  • Hadoop Node

In addition to standard system performance metrics, data related to the following Hadoop specific services is gathered:

  • HDFS
  • SPARK
  • YARN
  • HBASE

Hierarchy

The connector replicates relationships and logical dependencies among these entities. In particular, all available Clusters are attached to the root of the hierarchy, and each Cluster contains its own Nodes, Resource Pools and Services.

HadoopClusterInCOTree.jpg
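The replicated hierarchy can be pictured as a simple tree: clusters under the root, with nodes and resource pools under each cluster. The sketch below is an illustrative data structure only, not the connector's internal model.

```python
# Illustrative only: the shape of the hierarchy the connector replicates.

def build_hierarchy(clusters):
    """Attach each cluster to the root; nodes and pools hang off their cluster."""
    root = {"name": "Hadoop", "children": []}
    for c in clusters:
        root["children"].append({
            "name": c["name"],
            "children": [{"name": n, "type": "Node"} for n in c.get("nodes", [])]
                      + [{"name": p, "type": "Resource Pool"} for p in c.get("pools", [])],
        })
    return root

tree = build_hierarchy([{"name": "cluster1",
                         "nodes": ["node01", "node02"],
                         "pools": ["root.default"]}])
```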

Service data is available within the metrics of the entities above, according to the following table.

 

| | HDFS | YARN | HBASE | SPARK |
| --- | --- | --- | --- | --- |
| Cluster | X | X | X | X |
| Pool | | X | | |
| Node | X | | | |

Troubleshooting

For ETL troubleshooting, refer to official BMC documentation available here.

Known issues

| Issue | Resolution |
| --- | --- |
| ETL runs fine but data is partially or totally missing | Data is probably missing in the data source. Check from the Ambari web frontend whether the data is available; if it is not, the screen shown below appears. In that case, consider enabling data collection. capture_001_20150806_124339.png |