Cloudera

“Moviri Integrator for TrueSight Capacity Optimization – Cloudera” is an additional component of the BMC TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, Cloudera's Hadoop distribution composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into BMC TrueSight Capacity Optimization, which provides advanced analytics over the extracted data through an interactive dashboard, the Hadoop View.

The integration supports the extraction of both performance and configuration data across different components of CDH and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector can replicate relationships and logical dependencies among entities such as clusters, resource pools, services and nodes.

This documentation is targeted at BMC TrueSight Capacity Optimization administrators who are in charge of configuring and monitoring the integration between BMC TrueSight Capacity Optimization and Cloudera.

Requirements
Installation
Datasource Check and Configuration
Supported entities
Hierarchy
Troubleshooting

Requirements

Supported versions of data source software

Supported Cloudera Data Hub and Cloudera Manager versions: 5.1 to 5.8
The integration supports both Cloudera Manager bundled in Cloudera Enterprise and Cloudera Express products
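The "Warn if version is unsupported" connector option suggests a comparison against the supported range above. The following is a minimal sketch of such a check, assuming the version is available as a dotted string (for example "5.8.2") and that only the major.minor part matters; the helper names are hypothetical and not part of the connector.

```python
from typing import Tuple

# Hypothetical bounds mirroring the "5.1 to 5.8" range stated above.
SUPPORTED_MIN = (5, 1)
SUPPORTED_MAX = (5, 8)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '5.8.2'."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version format: {version!r}")
    return int(parts[0]), int(parts[1])


def is_supported(version: str) -> bool:
    """True if the major.minor part falls within the supported range (inclusive)."""
    return SUPPORTED_MIN <= parse_major_minor(version) <= SUPPORTED_MAX


for v in ["5.0.6", "5.1.0", "5.8.2", "5.9.0"]:
    status = "supported" if is_supported(v) else "unsupported (warn)"
    print(f"{v}: {status}")
```

Tuple comparison keeps the range check to a single expression; patch levels are ignored because the supported range is expressed in minor versions.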

Supported configurations of data source software

The Moviri – Cloudera Extractor requires that Cloudera Manager continuously and correctly monitors the entities supported by the integration (the full list is available below). Any failure to meet this requirement results in gaps in data coverage.
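One way to spot coverage gaps after an import is to look for consecutive samples spaced further apart than the configured data granularity (10 minutes by default). The sketch below assumes the sample timestamps of one metric are already available as Python datetimes; the function name and sample data are hypothetical and do not come from the connector.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_gaps(timestamps: List[datetime],
              granularity: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) pairs of consecutive samples spaced more than one granularity apart."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > granularity:
            gaps.append((prev, curr))
    return gaps


start = datetime(2017, 1, 1, 0, 0)
step = timedelta(minutes=10)
# Samples at 00:00, 00:10 and 00:20, then nothing until 01:00.
samples = [start + i * step for i in range(3)] + [start + timedelta(hours=1)]
for prev, curr in find_gaps(samples, step):
    print(f"no samples between {prev} and {curr}")
```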

Installation

Downloading the additional package

The ETL module is made available as an additional component, which you can download from the BMC electronic distribution site (EPD) or retrieve from your content media.

Installing the additional package

To install the connector as a TrueSight Capacity Optimization additional package, refer to the Performing system maintenance tasks instructions.

Datasource Check and Configuration

Preparing to connect to the data source software

The connector included in "Moviri Integrator for TrueSight Capacity Optimization – Cloudera" uses the Cloudera Java API v6 to communicate with Cloudera Manager. The API is always enabled, so no additional configuration is required.
Note that the connector only performs read operations, preventing any accidental change to the environment.
The connector requires a read-only user with permissions on all the clusters to be accessed.
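The Cloudera Java API is a client for the Cloudera Manager REST API, so the connection settings listed below map onto plain HTTP requests. As a rough illustration only, the following sketch builds (without sending) a read-only GET request for the cluster list under the `/api/v6` prefix, using HTTP Basic authentication for the read-only user. The host name and credentials are hypothetical placeholders, and this is not the connector's own code.

```python
import base64
import urllib.request


def build_clusters_request(host: str, port: int, user: str, password: str,
                           use_tls: bool = False) -> urllib.request.Request:
    """Build (but do not send) a read-only GET request listing clusters via API v6."""
    scheme = "https" if use_tls else "http"
    url = f"{scheme}://{host}:{port}/api/v6/clusters"
    # HTTP Basic authentication with the read-only user's credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical host and credentials; 7180 is the default Cloudera Port.
req = build_clusters_request("cm.example.com", 7180, "bco_reader", "secret")
print(req.get_method(), req.full_url)
# Sending it would look like: urllib.request.urlopen(req, timeout=20)
```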

Connector configuration attributes

The following table lists the properties specific to the connector; all other generic properties are documented here.

Property Name	Value Type	Required?	Default	Description
Cloudera Manager Connection
Hostname	String	Yes		Cloudera server hostname
Cloudera Port	Number	Yes	7180	Cloudera connection port
Spark Port	Number	Yes	18080	Spark connection port
User	String	Yes		Username
Password	String	Yes		Password
Connection Timeout	Number	No	20	Connection timeout in seconds
Use Encryption (TLS)	Boolean	Yes	false	Use encryption
Ignore certificate validation	Boolean	Yes	false	Ignore validation of TLS certificate
Ignore common name validation	Boolean	Yes	false	Ignore validation of TLS common name
Warn if version is unsupported	Boolean	Yes	false	Warn in the event the Cloudera Manager version is unsupported
Data Selection
Data Granularity	Multiple	Yes	10 minutes	Granularity of data to be imported
Import nodes	Boolean	Yes	true	Import data at node level
Import pools	Boolean	Yes	true	Import data at pool level
Import hbase	Boolean	Yes	true	Import data about HBASE service
Import spark	Boolean	Yes	true	Import data about Spark service
Substitute any dot char in pools names with this char	Char	No	-	The dot is a special character for the Loader component, so replacing it is recommended
Time Interval Settings
Default Last Counter (YYYY-MM-DD HH24:MI:SS Z)	Date	Yes		Default last counter value
Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone)	String	No		Timezone to which all imported samples are relocated
Limit extraction to date (YYYY-MM-DD HH24:MI:SS)	Date	No		Maximum date to be considered while extracting data
Max days to import in a single run (0 for no limit)	Number	No		Maximum days to collect in a single ETL run
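The Time Interval Settings together bound what a single run imports. The sketch below shows one plausible way they could combine, assuming each run starts at the last counter and stops at whichever comes first among the current time, "Limit extraction to date" and last counter plus "Max days to import in a single run" (0 meaning no limit). It also assumes the "Z" in the last counter format is a numeric UTC offset such as +0000. The function is hypothetical, not the connector's implementation.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def extraction_window(last_counter: datetime, now: datetime,
                      limit_date: Optional[datetime] = None,
                      max_days: int = 0) -> Tuple[datetime, datetime]:
    """Compute the (start, end) interval for one ETL run (assumed semantics)."""
    end = now
    if limit_date is not None:
        end = min(end, limit_date)
    if max_days > 0:  # 0 means no per-run limit
        end = min(end, last_counter + timedelta(days=max_days))
    if end < start_guard(last_counter):
        end = last_counter  # nothing left to import
    return last_counter, end


def start_guard(last_counter: datetime) -> datetime:
    """The window never ends before it starts."""
    return last_counter


# "Default Last Counter" pattern YYYY-MM-DD HH24:MI:SS Z; %z accepts offsets like +0000.
fmt = "%Y-%m-%d %H:%M:%S %z"
last = datetime.strptime("2017-03-01 00:00:00 +0000", fmt)
now = datetime.strptime("2017-03-20 12:00:00 +0000", fmt)
start, end = extraction_window(last, now, max_days=7)
print(start, "->", end)
```

All datetimes passed in must be either all timezone-aware or all naive, since Python refuses to compare the two kinds.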

The following image shows the options in the ETL configuration menu, including the advanced entries.

Supported entities

The following entities are supported:

Hadoop Cluster
Hadoop Resource Pool
Hadoop Node

In addition to standard system performance metrics, data related to the following Hadoop-specific services is gathered:

HDFS
SPARK
YARN
HBASE
MAP REDUCE

Hierarchy

The connector can replicate relationships and logical dependencies among these entities. In particular, all available clusters are attached to the root of the hierarchy, and each cluster contains its own nodes and pools.
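The hierarchy described above (root, then clusters, then nodes and pools) can be pictured as a simple tree. This minimal sketch builds one from a hypothetical cluster inventory and also applies the "Substitute any dot char in pools names" setting with its default "-" to pool names such as root.default. The data classes, function names and sample names are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    name: str
    kind: str  # "root", "cluster", "node" or "pool"
    children: List["Entity"] = field(default_factory=list)


def sanitize_pool_name(name: str, replacement: str = "-") -> str:
    """Replace dots, which the Loader treats as special, in a pool name."""
    return name.replace(".", replacement)


def build_hierarchy(clusters: Dict[str, Dict[str, List[str]]]) -> Entity:
    """clusters maps a cluster name to {"nodes": [...], "pools": [...]}."""
    root = Entity("root", "root")
    for cluster_name, members in clusters.items():
        cluster = Entity(cluster_name, "cluster")
        cluster.children += [Entity(n, "node") for n in members.get("nodes", [])]
        cluster.children += [Entity(sanitize_pool_name(p), "pool")
                             for p in members.get("pools", [])]
        root.children.append(cluster)
    return root


def show(entity: Entity, depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    print("  " * depth + f"{entity.kind}: {entity.name}")
    for child in entity.children:
        show(child, depth + 1)


tree = build_hierarchy({
    "cluster1": {"nodes": ["node01", "node02"],
                 "pools": ["root.default", "root.users.etl"]},
})
show(tree)
```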

Service data is exposed through the metrics of the entities listed above, as shown in the following table.

Entity	HDFS	YARN	HBASE	MAP REDUCE	SPARK
Cluster	X	X	X	X	X
Pool		X			X
Node	X
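The table above can be transcribed into a lookup structure, which is handy for checking at which level a given service's metrics should appear. The sketch also assumes, without confirmation from the connector, that turning off "Import hbase" or "Import spark" removes that service from every level; all names are hypothetical.

```python
from typing import Dict, List, Set

# Transcribed from the table above: entity type -> services with metrics at that level.
SERVICES_BY_ENTITY: Dict[str, Set[str]] = {
    "Cluster": {"HDFS", "YARN", "HBASE", "MAP REDUCE", "SPARK"},
    "Pool": {"YARN", "SPARK"},
    "Node": {"HDFS"},
}


def entities_for_service(service: str) -> List[str]:
    """Return the entity types where data for the given service is available."""
    return [entity for entity, services in SERVICES_BY_ENTITY.items()
            if service in services]


def effective_services(import_hbase: bool = True,
                       import_spark: bool = True) -> Dict[str, Set[str]]:
    """Drop services disabled by the import flags (assumed behavior)."""
    disabled = set()
    if not import_hbase:
        disabled.add("HBASE")
    if not import_spark:
        disabled.add("SPARK")
    return {entity: services - disabled
            for entity, services in SERVICES_BY_ENTITY.items()}


print(entities_for_service("SPARK"))
print(sorted(effective_services(import_spark=False)["Pool"]))
```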

Troubleshooting

For ETL troubleshooting, refer to the official BMC documentation available here.