Moviri - Ambari Extractor

“Moviri Integrator for BMC Helix Continuous Optimization – Ambari” allows extracting data from Hadoop deployments through the open-source component Ambari. Relevant capacity metrics are loaded into BMC Helix Continuous Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.

The integration supports the extraction of both performance and configuration data across different Hadoop components and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector is able to replicate relationships and logical dependencies among entities such as clusters, resource pools, services and nodes.

The documentation is targeted at BMC Helix Continuous Optimization administrators in charge of configuring and monitoring the integration between BMC Helix Continuous Optimization and Ambari.

Requirements
Installation
Datasource Check and Configuration
Supported entities
Hierarchy

Moviri Integrator for BMC Helix Continuous Optimization - Ambari is compatible with BMC Helix Continuous Optimization 19.11 and onward.

Requirements

Supported versions of data source software

Supported Hadoop distribution is Hortonworks Data Platform (HDP): version 2.2
Supported Ambari: version 1.6 to 3.1.4

The integration, though not officially supported, is expected to work on custom deployments of Hadoop components that leverage Ambari as their management and monitoring service.

Supported configurations of data source software

Moviri – Ambari Extractor requires that Ambari continuously and correctly monitors the entities supported by the integration (the full list is available below). Failing to meet this requirement will cause gaps in data coverage.

Installation

Downloading the additional package

The integration is made available in the form of an additional component, which you can download from the BMC electronic distribution site (EPD) or retrieve from your content media.

Installing the additional package

To install the connector in the form of a BMC Helix Continuous Optimization additional package, refer to the Performing system maintenance tasks instructions.

Datasource Check and Configuration

Preparing to connect to the data source software

The connector included in "Moviri Integrator for BMC Helix Continuous Optimization – Ambari" uses the Ambari REST API to communicate with Ambari. The API is always enabled and no additional configuration is required.

The connector first retrieves the cluster version through the REST API and then uses that version to access the cluster.

Please note that the connector uses only the GET method, which prevents any accidental change to the environments.
The connector requires a regular Ambari user (admin privileges are not required) with read-only permissions on all the clusters that should be accessed.
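
As a rough illustration of this read-only access pattern, the sketch below builds a GET request with basic authentication and reads cluster names and versions from the response. It is not the connector's code. It assumes plain HTTP, the Ambari REST endpoint /api/v1/clusters and the response fields Clusters.cluster_name and Clusters.version; the function names, the host and the credentials are hypothetical.

```python
import base64
import json
import urllib.request


def build_get_request(host, path, user, password, port=8080):
    """Build a read-only (GET) request for the Ambari REST API.

    Assumption: plain HTTP and basic authentication; adapt the scheme
    and port to the actual deployment.
    """
    url = f"http://{host}:{port}/api/v1{path}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def cluster_versions(payload):
    """Map each cluster name to its version from a /clusters response body.

    Assumption: items carry a "Clusters" object with "cluster_name"
    and "version" fields.
    """
    return {
        item["Clusters"]["cluster_name"]: item["Clusters"].get("version")
        for item in payload.get("items", [])
    }


def fetch_cluster_versions(host, user, password, port=8080, timeout=20):
    """Issue the GET request and return {cluster_name: version}."""
    request = build_get_request(host, "/clusters", user, password, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return cluster_versions(json.load(response))
```

Calling fetch_cluster_versions("ambari.example.com", "reader", "secret") against a reachable Ambari server is a quick way to check that the read-only user can see the expected clusters before configuring the ETL.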

Connector configuration attributes

The following table lists the connector-specific properties; all other generic properties are documented here.

Property Name	Value Type	Required?	Default	Description
Ambari Connection
Ambari Hostname	String	Yes		Ambari server hostname
Spark Hostname	String	Yes		Spark server hostname
Ambari Port	Number	Yes	8080	Ambari connection port
Spark Port	Number	Yes	18080	Spark connection port
User	String	Yes		Username
Password	String	Yes		Password
Connection Timeout	Number	No	20	Connection timeout in seconds
Data Selection
Import nodes	Boolean	Yes	true	Import data at node level
Import pools	Boolean	Yes	true	Import data at pool level
HDFS data import	Boolean	Yes	true	Import data about HDFS service
YARN data import	Boolean	Yes	true	Import data about YARN service
HBASE data import	Boolean	Yes	true	Import data about HBASE service
SPARK data import	Boolean	Yes	true	Import data about SPARK service
Cluster regexp whitelist, semicolon separated	String	No		List of clusters to be imported, semicolon separated. Regexp is supported.
Cluster regexp blacklist, semicolon separated	String	No		List of clusters not to be imported, semicolon separated. Regexp is supported. This setting overrides whitelist in case of conflict.
Host regexp whitelist, semicolon separated	String	No		List of hosts to be imported, semicolon separated. Regexp is supported. Setting this field disables aggregation at cluster level.
Host regexp blacklist, semicolon separated	String	No		List of hosts not to be imported, semicolon separated. Regexp is supported. This setting overrides whitelist in case of conflict. Setting this field disables aggregation at cluster level.
Maximum pool exploration depth	Number	No		A limit to the exploration of nested pools.
Substitute any dot char in pools names with this char	Char	No	-	The dot is a special character for the Loader component, so replacing it is recommended.
Time Interval Settings
Maximum days to extract for execution	Number	No	7	Each ETL run will not extract more than the specified number of days.
Date limit not to extract beyond (YYYY-MM-DD HH24:MI:SS)	Date	No		Maximum date to be considered while extracting data.
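
The sketch below shows one possible reading of the data selection and time interval settings in the table above. It is an illustration, not the connector's implementation. The function names are hypothetical, and it assumes that each regexp must match the whole name, that an empty whitelist selects everything, and that each run starts from the last extracted timestamp.

```python
import re
from datetime import datetime, timedelta


def split_patterns(value):
    """Split a semicolon-separated setting into non-empty regexp strings."""
    return [p.strip() for p in (value or "").split(";") if p.strip()]


def is_selected(name, whitelist="", blacklist=""):
    """Return True if a cluster or host name should be imported."""
    allow = split_patterns(whitelist)
    deny = split_patterns(blacklist)
    # The blacklist overrides the whitelist in case of conflict.
    if any(re.fullmatch(p, name) for p in deny):
        return False
    # Assumption: an empty whitelist means "import everything".
    return not allow or any(re.fullmatch(p, name) for p in allow)


def sanitize_pool_name(name, replacement="-"):
    """Replace dots in a pool name with the configured character."""
    return name.replace(".", replacement)


def extraction_window(last_extracted, now, max_days=7, date_limit=None):
    """Return (start, end) for one run, capped by max_days and date_limit."""
    end = min(now, last_extracted + timedelta(days=max_days))
    if date_limit is not None:
        end = min(end, date_limit)
    # Never return an end that precedes the start.
    return last_extracted, max(end, last_extracted)
```

With a whitelist of "prod-.*;test" and a blacklist of "prod-0[12]", a host named prod-01 is rejected while prod-03 is accepted. With the default of 7 days, a run that starts from 2024-01-01 extracts data up to 2024-01-08 at most, even if it executes later.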

The following image shows the list of options in the ETL configuration menu, including the advanced entries.

Supported entities

The following entities are supported:

Hadoop Cluster
Hadoop Resource Pool
Hadoop Node

In addition to standard system performance metrics, data related to the following Hadoop specific services is gathered:

HDFS
SPARK
YARN
HBASE

Hierarchy

The connector is able to replicate relationships and logical dependencies among these entities. In particular, all the available Clusters are attached to the root of the hierarchy and each Cluster contains its own Nodes, Resource Managers and Services.

Service data is available among the metrics of the above entities, as shown in the following table.

	HDFS	YARN	HBASE	SPARK
Cluster	X	X	X	X
Pool		X
Node	X
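
The table above can also be read as a simple lookup from entity type to the services whose metrics appear on it. The sketch below encodes that lookup and intersects it with the services enabled in the ETL configuration. The names SERVICE_METRICS and services_for are hypothetical and are not part of the connector.

```python
# Services whose metrics are available on each entity type,
# transcribed from the table above.
SERVICE_METRICS = {
    "Cluster": {"HDFS", "YARN", "HBASE", "SPARK"},
    "Pool": {"YARN"},
    "Node": {"HDFS"},
}


def services_for(entity_type, enabled_services):
    """Return the enabled services that contribute metrics to an entity type."""
    available = SERVICE_METRICS.get(entity_type, set())
    return sorted(available & set(enabled_services))
```

For example, if only HDFS and YARN data import are enabled, a Pool entity can only carry YARN metrics and a Node entity only HDFS metrics.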

Known issues

Issue	Resolution
ETL runs fine but data is partially or totally missing	The data is probably missing in the data source. Check in the Ambari web frontend whether the data is available; if it is not, the following image is shown. In that case, consider enabling data collection.