Cloudera


“Moviri Integrator for TrueSight Capacity Optimization – Cloudera” is an additional component of the BMC TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, Cloudera's Hadoop distribution, which is composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into BMC TrueSight Capacity Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.

The integration supports the extraction of both performance and configuration data across the different components of CDH, and can be configured through parameters for entity filtering and many other settings. Furthermore, the connector is able to replicate relationships and logical dependencies among entities such as clusters, resource pools, services and nodes.

This documentation is intended for BMC TrueSight Capacity Optimization administrators in charge of configuring and monitoring the integration between BMC TrueSight Capacity Optimization and Cloudera.

Requirements

Supported versions of data source software

  • Supported Cloudera Data Hub and Cloudera Manager versions: 5.1 to 5.8
  • The integration supports Cloudera Manager as bundled in both the Cloudera Enterprise and Cloudera Express products

Supported configurations of data source software

The Moviri – Cloudera Extractor requires that Cloudera Manager continuously and correctly monitors the entities supported by the integration (see the full list below). If this requirement is not met, data coverage will be incomplete.

Installation

Downloading the additional package

The ETL module is made available as an additional component, which you can download from the BMC electronic distribution site (EPD) or retrieve from your content media.

Installing the additional package

To install the connector as a TrueSight Capacity Optimization additional package, refer to the instructions in Performing system maintenance tasks.

 

Datasource Check and Configuration

Preparing to connect to the data source software

The connector included in "Moviri Integrator for TrueSight Capacity Optimization – Cloudera" uses the Cloudera Java API v6 to communicate with Cloudera Manager. This API is always enabled and no additional configuration is required.
Note that the connector performs only read operations, which prevents any accidental change to the environment.
The connector requires a read-only user with permissions on all the clusters that should be accessed.
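
As a rough illustration of what a read-only call to Cloudera Manager looks like, the following minimal sketch builds (without sending) a GET request for the cluster list. It is not the connector's implementation: the connector uses the Java API, while this sketch assumes the equivalent REST resource `/api/v6/clusters` with HTTP Basic authentication. The host name, user name, password and the function name `build_clusters_request` are hypothetical.

```python
import base64
import urllib.request


def build_clusters_request(host, user, password, port=7180, use_tls=False):
    """Build (but do not send) a GET request for the Cloudera Manager cluster list."""
    scheme = "https" if use_tls else "http"
    # Assumption: API v6 is exposed under /api/v6 on the Cloudera Manager port.
    url = f"{scheme}://{host}:{port}/api/v6/clusters"
    # HTTP Basic authentication with the read-only user's credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_clusters_request("cm.example.com", "readonly_user", "secret")
print(request.get_method(), request.full_url)
```

Only a GET request is built here, which matches the requirement that the user needs no more than read-only permissions on the clusters.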

Connector configuration attributes

The following table shows the connector-specific properties; all other generic properties are documented here.

Cloudera Manager Connection

| Property Name | Value Type | Required? | Default | Description |
|---|---|---|---|---|
| Hostname | String | Yes | | Cloudera server hostname |
| Cloudera Port | Number | Yes | 7180 | Cloudera connection port |
| Spark Port | Number | Yes | 18080 | Spark connection port |
| User | String | Yes | | Username |
| Password | String | Yes | | Password |
| Connection Timeout | Number | No | 20 | Connection timeout in seconds |
| Use Encryption (TLS) | Boolean | Yes | false | Use encryption |
| Ignore certificate validation | Boolean | Yes | false | Ignore validation of TLS certificate |
| Ignore common name validation | Boolean | Yes | false | Ignore validation of TLS common name |
| Warn if version is unsupported | Boolean | Yes | false | Warn in the event the Cloudera Manager version is unsupported |

Data Selection

| Property Name | Value Type | Required? | Default | Description |
|---|---|---|---|---|
| Data Granularity | Multiple | Yes | 10 minutes | Granularity of data to be imported |
| Import nodes | Boolean | Yes | true | Import data at node level |
| Import pools | Boolean | Yes | true | Import data at pool level |
| Import hbase | Boolean | Yes | true | Import data about the HBASE service |
| Import spark | Boolean | Yes | true | Import data about the Spark service |
| Substitute any dot char in pools names with this char | Char | No | - | The dot is a special character for the Loader component, so replacing it is recommended |

Time Interval Settings

| Property Name | Value Type | Required? | Default | Description |
|---|---|---|---|---|
| Default Last Counter (YYYY-MM-DD HH24:MI:SS Z) | Date | Yes | | Default last counter value |
| Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone) | String | No | | Timezone to which imported samples are relocated |
| Limit extraction to date (YYYY-MM-DD HH24:MI:SS) | Date | No | | Maximum date to be considered while extracting data |
| Max days to import in a single run (0 for no limit) | Number | No | | Maximum days to collect in a single ETL run |
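
To illustrate how the Time Interval Settings could interact, the following minimal sketch computes the time window covered by a single run. The semantics are an assumption, not documented connector behavior: the window starts at the last counter and ends at the earliest of the current time, the "Limit extraction to date" value and the last counter plus "Max days to import in a single run". The function name `extraction_window` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def extraction_window(last_counter, now, max_days=0, limit_date=None):
    """Return the (start, end) interval for one run under the assumed semantics."""
    end = now
    if limit_date is not None:
        # Never extract data beyond the configured limit date.
        end = min(end, limit_date)
    if max_days > 0:
        # 0 means no limit; otherwise cap the window length.
        end = min(end, last_counter + timedelta(days=max_days))
    # If the last counter is already past the end, the window is empty.
    end = max(end, last_counter)
    return last_counter, end


last = datetime(2016, 9, 1, tzinfo=timezone.utc)
now = datetime(2016, 9, 20, tzinfo=timezone.utc)
start, end = extraction_window(last, now, max_days=7)
print(start.isoformat(), end.isoformat())
```

Under these assumptions, a connector that is far behind catches up in bounded steps across successive runs instead of requesting the whole backlog at once.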

The following image shows the list of options in the ETL configuration menu, including the advanced entries.

Cloudera.PNG
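
Among the Data Selection options, the dot substitution in pool names can be sketched as a plain character replacement. This is only an illustration of the idea: the function name `substitute_dots` is hypothetical, and the pool name `root.default` is an example value, not taken from a real environment.

```python
def substitute_dots(pool_name, replacement="-"):
    """Replace every dot in a pool name with a single substitute character."""
    if len(replacement) != 1 or replacement == ".":
        raise ValueError("replacement must be a single character other than '.'")
    return pool_name.replace(".", replacement)


print(substitute_dots("root.default"))
```

With the default substitute character, a nested pool name such as `root.default` would be loaded as `root-default`.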

Supported entities

The following entities are supported:

  • Hadoop Cluster
  • Hadoop Resource Pool
  • Hadoop Node

In addition to standard system performance metrics, data related to the following Hadoop specific services is gathered:

  • HDFS
  • SPARK
  • YARN
  • HBASE
  • MAP REDUCE

Hierarchy

The connector is able to replicate relationships and logical dependencies among these entities. In particular, all available clusters are attached to the root of the hierarchy, and each cluster contains its own nodes and pools.

HadoopClusterInCOTree.jpg
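
The hierarchy described above can be pictured as a small tree. The following sketch models it with plain data classes and lists the path of each entity from the root; the class names, the `root` label and the entity names are hypothetical and do not reflect the connector's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    nodes: list = field(default_factory=list)
    pools: list = field(default_factory=list)


@dataclass
class Hierarchy:
    clusters: list = field(default_factory=list)

    def paths(self):
        """Yield one path per entity, starting from the root of the hierarchy."""
        for cluster in self.clusters:
            # Every cluster is attached directly to the root.
            yield ("root", cluster.name)
            # Nodes and pools belong to their own cluster.
            for node in cluster.nodes:
                yield ("root", cluster.name, node)
            for pool in cluster.pools:
                yield ("root", cluster.name, pool)


tree = Hierarchy([Cluster("cluster1", nodes=["node1", "node2"], pools=["pool1"])])
for path in tree.paths():
    print(" / ".join(path))
```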

Service data is available among the metrics of the entities above, as shown in the following table.

| Entity | HDFS | YARN | HBASE | MAP REDUCE | SPARK |
|---|---|---|---|---|---|
| Cluster | X | X | X | X | X |
| Pool | | X | | | X |
| Node | X | | | | |
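
For scripted checks, the service coverage table can be encoded as a simple lookup. This is a minimal sketch that restates the table; the names `SERVICE_COVERAGE` and `has_service_metrics` are hypothetical.

```python
# Services whose data is available for each entity type, as listed in the table.
SERVICE_COVERAGE = {
    "Cluster": {"HDFS", "YARN", "HBASE", "MAP REDUCE", "SPARK"},
    "Pool": {"YARN", "SPARK"},
    "Node": {"HDFS"},
}


def has_service_metrics(entity_type, service):
    """Return True if data for the service is available on the entity type."""
    return service in SERVICE_COVERAGE.get(entity_type, set())


print(has_service_metrics("Pool", "YARN"))
print(has_service_metrics("Node", "SPARK"))
```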

Troubleshooting

For ETL troubleshooting, refer to the official BMC documentation available here.


 
