Cloudera
“Moviri Integrator for TrueSight Capacity Optimization – Cloudera” is an additional component of the BMC TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, Cloudera's Hadoop distribution, which is composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into BMC TrueSight Capacity Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.
The integration supports the extraction of both performance and configuration data across different components of CDH and can be configured through parameters that control entity filtering and many other settings. Furthermore, the connector is able to replicate relationships and logical dependencies among entities such as clusters, resource pools, services and nodes.
The documentation is targeted at BMC TrueSight Capacity Optimization administrators, in charge of configuring and monitoring the integration between BMC TrueSight Capacity Optimization and Cloudera.
- Requirements
- Installation
- Datasource Check and Configuration
- Supported entities
- Hierarchy
- Troubleshooting
Requirements
Supported versions of data source software
- Supported Cloudera Data Hub and Cloudera Manager versions: 5.1 to 5.8
- The integration supports Cloudera Manager as bundled in both the Cloudera Enterprise and Cloudera Express products
Supported configurations of data source software
Moviri – Cloudera Extractor requires that Cloudera Manager continuously and correctly monitor the entities supported by the integration (the full list is available below). Any gap in this monitoring causes a corresponding gap in data coverage.
Installation
Downloading the additional package
The ETL module is made available as an additional component, which you can download from the BMC electronic distribution site (EPD) or retrieve from your content media.
Installing the additional package
To install the connector as a TrueSight Capacity Optimization additional package, refer to the instructions in Performing system maintenance tasks.
Datasource Check and Configuration
Preparing to connect to the data source software
The connector included in "Moviri Integrator for TrueSight Capacity Optimization – Cloudera" uses the Cloudera Java API v6 to communicate with Cloudera Manager. The API is always enabled and no additional configuration is required.
Note that the connector issues only SELECT statements, which prevents any accidental change to the environment.
The connector requires a read-only user with permissions on all the clusters that should be accessed.
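As an illustration of the connection requirements, the following Python sketch builds requests against the Cloudera Manager REST API so that connectivity and credentials can be checked by hand. It assumes that the REST endpoint matching the API version named above (/api/v6) is reachable on the Cloudera Manager port; the connector itself uses the Java API, so this is not its implementation. The host name, user name, password and the query string are hypothetical placeholders.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode


def base_url(hostname: str, port: int = 7180, use_tls: bool = False) -> str:
    """Build the API root from the host, port and TLS settings."""
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{hostname}:{port}/api/v6"


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic authentication header for the read-only user."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def timeseries_url(root: str, tsquery: str) -> str:
    """URL of a time-series request; the SELECT statement is URL-encoded."""
    return f"{root}/timeseries?{urlencode({'query': tsquery})}"


def fetch_json(url: str, headers: dict, timeout: int = 20):
    """Issue a GET request and decode the JSON response (needs network access)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical values, for illustration only.
root = base_url("cm.example.com")
headers = basic_auth_header("ro_user", "secret")
query_url = timeseries_url(
    root, "SELECT cpu_percent_across_hosts WHERE category = CLUSTER"
)
```

Calling fetch_json(root + "/clusters", headers) against a reachable Cloudera Manager should return the clusters visible to the read-only user, which is a quick way to confirm that the account has the required permissions.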
Connector configuration attributes
The following table shows the properties specific to the connector; all other generic properties are documented here.
Property Name | Value Type | Required? | Default | Description |
Cloudera Manager Connection | ||||
Hostname | String | Yes | | Cloudera server hostname |
Cloudera Port | Number | Yes | 7180 | Cloudera connection port |
Spark Port | Number | Yes | 18080 | Spark connection port |
User | String | Yes | | Username |
Password | String | Yes | | Password |
Connection Timeout | Number | No | 20 | Connection timeout in seconds |
Use Encryption (TLS) | Boolean | Yes | false | Use encryption |
Ignore certificate validation | Boolean | Yes | false | Ignore validation of TLS certificate |
Ignore common name validation | Boolean | Yes | false | Ignore validation of TLS common name |
Warn if version is unsupported | Boolean | Yes | false | Warn in the event the Cloudera Manager version is unsupported |
Data Selection | ||||
Data Granularity | Multiple | Yes | 10 minutes | Granularity of data to be imported |
Import nodes | Boolean | Yes | true | Import data at node level |
Import pools | Boolean | Yes | true | Import data at pool level |
Import hbase | Boolean | Yes | true | Import data about HBASE service |
Import spark | Boolean | Yes | true | Import data about Spark service |
Substitute any dot char in pools names with this char | Char | No | - | The dot is a special character for the Loader component, so replacing it is recommended |
Time Interval Settings | ||||
Default Last Counter (YYYY-MM-DD HH24:MI:SS Z) | Date | Yes | | Default last counter value |
Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone) | String | No | | Time zone to which imported samples are relocated |
Limit extraction to date (YYYY-MM-DD HH24:MI:SS) | Date | No | | Maximum date to be considered while extracting data |
Max days to import in a single run (0 for no limit) | Number | No | | Maximum days to collect in a single ETL run |
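To make the interplay of the time interval settings and the pool name substitution more concrete, here is a minimal Python sketch. It is an interpretation of the property descriptions, not the connector's actual logic: it assumes that a run covers the period from the last counter up to the earliest of the current time, the "Limit extraction to date" value and the "Max days to import in a single run" cap, that the "Z" in the last counter format is a numeric UTC offset such as +0000, and that the "-" shown in the Default column is the replacement character. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumed Python equivalent of "YYYY-MM-DD HH24:MI:SS Z".
LAST_COUNTER_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def extraction_window(last_counter: datetime, now: datetime,
                      limit_date: Optional[datetime] = None,
                      max_days: int = 0) -> Tuple[datetime, datetime]:
    """Return (start, end) for one run; all datetimes must be timezone-aware."""
    end = now
    if limit_date is not None:
        end = min(end, limit_date)          # never read past the limit date
    if max_days > 0:                        # 0 means no limit
        end = min(end, last_counter + timedelta(days=max_days))
    return last_counter, max(end, last_counter)


def sanitize_pool_name(name: str, substitute: str = "-") -> str:
    """Replace every dot in a pool name with the configured character."""
    return name.replace(".", substitute)


last_counter = datetime.strptime("2016-09-01 00:00:00 +0000", LAST_COUNTER_FORMAT)
now = datetime(2016, 9, 20, tzinfo=timezone.utc)
start, end = extraction_window(last_counter, now, max_days=7)
```

With these sample values the run is capped at seven days after the last counter, and a pool named root.default would be loaded as root-default.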
The following image shows the list of options in the ETL configuration menu, including the advanced entries.
Supported entities
The following entities are supported:
- Hadoop Cluster
- Hadoop Resource Pool
- Hadoop Node
In addition to standard system performance metrics, data related to the following Hadoop specific services is gathered:
- HDFS
- SPARK
- YARN
- HBASE
- MAP REDUCE
Hierarchy
The connector is able to replicate relationships and logical dependencies among these entities. In particular, all the available clusters are attached to the root of the hierarchy, and each cluster contains its own nodes and pools.
Service data is available among the metrics of the entities above, according to the following table.
Entity | HDFS | YARN | HBASE | MAP REDUCE | SPARK |
Cluster | X | X | X | X | X |
Pool | | X | | | X |
Node | X | | | | |
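The hierarchy and the service coverage described in this section can be pictured as a small data structure. The Python sketch below is only an illustration: the class, the function names and the sample entity names are hypothetical, and the service sets reflect one reading of the table above (all five services at cluster level, YARN and SPARK at pool level, HDFS at node level).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Services whose metrics are available for each entity type,
# as read from the table above (an interpretation, not an API).
SERVICES_BY_ENTITY: Dict[str, Set[str]] = {
    "cluster": {"HDFS", "YARN", "HBASE", "MAP REDUCE", "SPARK"},
    "pool": {"YARN", "SPARK"},
    "node": {"HDFS"},
}


@dataclass
class Cluster:
    """A cluster attached to the root, holding its own nodes and pools."""
    name: str
    nodes: List[str] = field(default_factory=list)
    pools: List[str] = field(default_factory=list)


def build_hierarchy(clusters: List[Cluster]) -> Dict[str, Dict[str, List[str]]]:
    """Return the root of the hierarchy: one entry per cluster."""
    return {c.name: {"nodes": list(c.nodes), "pools": list(c.pools)}
            for c in clusters}


def services_for(entity_type: str) -> Set[str]:
    """Services with metrics available for the given entity type."""
    return SERVICES_BY_ENTITY[entity_type]


# Hypothetical sample data.
tree = build_hierarchy([Cluster("prod", nodes=["node01", "node02"],
                                pools=["root-default"])])
```

Keeping the coverage in a lookup table makes it easy to check, for a given entity type, which service metrics to expect in the Hadoop View.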
Troubleshooting
For ETL troubleshooting, refer to the official BMC documentation available here.