Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST
Moviri Integrator for TrueSight Capacity Optimization – Cloudera REST is an additional component of the TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, a Cloudera Hadoop distribution composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into TrueSight Capacity Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.
The integration supports the extraction of both performance and configuration data across different components of CDH and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector can replicate relationships and logical dependencies among entities such as clusters, resource pools, services, and nodes.
This documentation is targeted at TrueSight Capacity Optimization administrators who are in charge of configuring and monitoring the integration between TrueSight Capacity Optimization and Cloudera.
Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST is compatible with TrueSight Capacity Optimization 11.3 and later.
Collecting data by using the Cloudera REST ETL
To collect data by using the Cloudera REST ETL, do the following tasks:
Steps | Details |
---|---|
Check that the Cloudera Manager version is supported. | Supported Cloudera Data Hub and Cloudera Manager versions: 7.0.3+ |
Create an administrator account for Cloudera API access. | |
Verify that the user credentials have the required access. | |
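Before configuring the ETL, it can help to confirm that the administrator account can reach the Cloudera Manager REST API at all. The sketch below builds an authenticated request against the Manager's `/api/version` endpoint, which returns the API version string. The hostname, port, and credentials are placeholders, and this check is an illustration rather than a step the connector itself performs.

```python
# Minimal sketch: verify that the administrator account created above can
# reach the Cloudera Manager REST API. Host, port, user, and password are
# placeholder values; replace them with your own.
import base64
import urllib.request

def build_version_request(protocol, host, port, user, password):
    """Build an authenticated request for Cloudera Manager's /api/version."""
    url = f"{protocol}://{host}:{port}/api/version"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req

def check_access(req, timeout=10):
    """Perform the call; returns the API version string (for example 'v41')."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()

# Build (but do not send) a request for a hypothetical Manager instance.
req = build_version_request("https", "cm.example.com", 7183, "tsco_api", "secret")
```

If `check_access(req)` succeeds and returns a version string, the credentials and network path are good; an HTTP 401 at this point indicates a credentials problem rather than an ETL configuration problem.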
A. Configuring the basic properties
Some of the basic properties display default values. You can modify these values if required.
To configure the basic properties:
- In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
- On the ETL tasks page, click Add > Add ETL. The Add ETL page displays the configuration properties. You must configure properties in the following tabs: Run configuration, Entity catalog, and Cloudera REST - Settings.
- On the Run Configuration tab, select Moviri - Cloudera REST Extractor from the ETL Module list. The name of the ETL is displayed in the ETL task name field. You can edit this field to customize the name.
- Click the Entity catalog tab, and select one of the following options:
  - Shared Entity Catalog: From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
  - Private Entity Catalog: Select this option if this is the only ETL that extracts data from the Cloudera REST resources.
- Click the Cloudera REST - Settings tab, and configure the following properties:
Property | Description |
---|---|
Cloudera Protocol (HTTP/HTTPS) | The protocol of the Cloudera Manager instance (HTTP/HTTPS) |
Cloudera Hostname | The hostname of the Cloudera Manager instance |
Cloudera Port | The port that the Cloudera Manager instance is running on. |
Spark Hostname | If Spark is used, the hostname of the Spark History Server. |
Spark Port | If Spark is used, the port of the Spark History Server. |
User | Username of the administrator account created in the pre-configuration steps. |
Password | Password of the administrator account created in the pre-configuration steps. |
Import nodes | Import data at the node level. |
Import pools | Import data at the pool level. |
Import HBase | Import data about the HBase service. |
Import Spark | Import data about the Spark service. |
Import HDFS usage report | Import data about HDFS usage by user (requires cluster admin permission). |
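The connection settings above identify a Cloudera Manager instance, and a connector of this kind would typically start by enumerating clusters through the Manager REST API. The sketch below builds the `/clusters` URL and parses the cluster name and display name from a response body. The API version `v41` is illustrative (check your instance's `/api/version`), and the sample payload is invented for demonstration.

```python
# Minimal sketch, assuming the Cloudera Manager /clusters endpoint and an
# illustrative API version "v41": list cluster names and display names.
import json

BASE = "{protocol}://{host}:{port}/api/v41"

def clusters_url(protocol, host, port):
    """Build the URL for the Manager's cluster listing endpoint."""
    return BASE.format(protocol=protocol, host=host, port=port) + "/clusters"

def parse_cluster_names(payload):
    """Extract (name, displayName) pairs from a /clusters response body."""
    doc = json.loads(payload)
    return [(c["name"], c.get("displayName", c["name"])) for c in doc["items"]]

# Invented sample response, for illustration only.
sample = '{"items": [{"name": "cluster1", "displayName": "Prod Hadoop"}]}'
```

The distinction between `name` and `displayName` matters later: the advanced lookup options in this ETL let you choose which of the two is used as the internal lookup name.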
6. On the same tab, configure the following Data Selection properties:
Property | Description |
---|---|
Data Granularity * | Granularity of the data to be imported. Supported granularities: raw, 10 minute, 1 hour, 6 hour, and 1 day. |
Raw data aggregation | The duration in minutes over which to roll up the gathered raw data; the default is 5 minutes. |
Import nodes | Import data at the node level. |
Import pools | Import data at the pool level. |
Import HBase | Import data about the HBase service. |
Import Spark | Import data about the Spark service. |
Import HDFS usage report | Import data about HDFS usage by user (requires cluster admin permission). |
*Data Granularity: When you choose 1 day granularity, Cloudera REST aggregates data for each day based on the UTC time zone. For best accuracy, consider relocating the data to UTC so that it aligns with the aggregation resolution.
*Data Granularity: The Cloudera REST API has a default data retention period for each resolution.
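The granularity options above correspond to the rollup levels exposed by the Cloudera Manager timeseries endpoint (`/api/vN/timeseries`, with its `desiredRollup` and `mustUseDesiredRollup` parameters). The sketch below maps the ETL's granularity labels onto those rollup values and builds a query URL; the mapping, base URL, and tsquery are illustrative, not taken from the connector itself.

```python
# Minimal sketch, assuming the Cloudera Manager timeseries endpoint:
# map the ETL's Data Granularity setting onto the API's rollup levels
# and build a timeseries query URL. The tsquery and base URL are
# placeholders.
from urllib.parse import urlencode

ROLLUPS = {
    "raw": "RAW",
    "10 minute": "TEN_MINUTELY",
    "1 hour": "HOURLY",
    "6 hour": "SIX_HOURLY",
    "1 day": "DAILY",
}

def timeseries_query(base, tsquery, start, end, granularity):
    """Build a /timeseries URL requesting a specific rollup level."""
    params = urlencode({
        "query": tsquery,
        "from": start,
        "to": end,
        "desiredRollup": ROLLUPS[granularity],
        "mustUseDesiredRollup": "true",
    })
    return f"{base}/timeseries?{params}"

url = timeseries_query(
    "https://cm.example.com:7183/api/v41",
    "select cpu_percent where category = HOST",
    "2024-01-01T00:00:00", "2024-01-02T00:00:00", "1 hour")
```

Because each rollup level has its own retention period on the Cloudera side, requesting fine-grained data (raw or 10 minute) far in the past may return nothing even when daily data for the same window is still available.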
7. On the same tab, configure the following Time Interval properties:

Property | Description |
---|---|
Default Last Counter (YYYY-MM-DD HH24:MI:SS Z) | Default last counter value. The time zone is optional; if omitted, the ETL engine time zone is used. |
Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone) | Advanced - Time zone to which to relocate any imported sample |
Max extraction period (hours), default is 24 hours (1 day) | Max extraction period in hours, default is 24 hours |
Lag hour to current time, default is 1 hour | Lag hour to the current time |
Max days to import in a single run (0 for no limit) | Maximum days to collect in a single ETL run |
Use cluster displayname for lookup instead of cluster name (default) | Advanced - Use the cluster display name as the internal lookup name. This is useful to avoid system overwrites in TrueSight Capacity Optimization when different Cloudera clusters have the same cluster name and the lookup is shared between their ETLs. |
Add cluster name to components (cluster:component) as entity name | Advanced - Add the cluster name as a prefix to all component entity names; useful when there are multiple clusters |
The following image shows the list of options in the ETL configuration menu, with advanced properties.
8. (Optional) Override the default values of the properties, as described in the following section.
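The Time Interval properties interact: each run starts from the last counter and ends no later than the current time minus the lag, clipped to the maximum extraction period. The sketch below is a paraphrase of those semantics as described in the table, not the connector's actual implementation, which may differ in details such as how the max-days limit is applied.

```python
# Minimal sketch of how the Time Interval properties could interact:
# extraction starts at the last counter and ends no later than
# (now - lag hours), clipped to the max extraction period. The max days
# per run setting would cap repeated windows in the same way. Property
# semantics are paraphrased from the documentation; exact connector
# behavior may differ.
from datetime import datetime, timedelta

def extraction_window(last_counter, now, lag_hours=1, max_hours=24):
    """Return (start, end) for the next ETL run, or None if nothing new."""
    end_limit = now - timedelta(hours=lag_hours)
    end = min(last_counter + timedelta(hours=max_hours), end_limit)
    if end <= last_counter:
        return None  # the lag window has not elapsed yet
    return last_counter, end

# Two days behind: the window is clipped to the 24-hour max period.
start, end = extraction_window(
    datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 3, 12, 0))
```

With the defaults shown, a backlog larger than 24 hours is drained one window per run, which is why the max-days-per-run property exists as a further cap on catch-up extractions.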
(Optional) B. Configuring the advanced properties
You can configure the advanced properties to change the way the ETL works or to collect additional metrics.
To configure the advanced properties:
- On the Add ETL page, click Advanced.
- Configure the following properties:
- Click Save. The ETL tasks page shows the details of the newly configured Cloudera REST ETL.
After you configure the ETL, you can run it to collect data. You can run the ETL in the following modes:
A. Simulation mode: Only validates the connection to the data source; does not collect data. Use this mode when you run the ETL for the first time or after you make any changes to the ETL configuration.
B. Production mode: Collects data from the data source.
A. Running the ETL in the simulation mode
To run the ETL in the simulation mode:
- In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
- On the ETL tasks page, click the ETL. The ETL details are displayed.
- In the Run configurations table, click Edit to modify the ETL configuration settings.
- On the Run configuration tab, ensure that the Execute in simulation mode option is set to Yes, and click Save.
- Click Run active configuration. A confirmation message about the ETL run job submission is displayed.
- On the ETL tasks page, check the ETL run status in the Last exit column.
- OK: Indicates that the ETL ran without any error. You are ready to run the ETL in the production mode.
- If the ETL run status is Warning, Error, or Failed:
- On the ETL tasks page, click in the last column of the ETL name row.
- Check the log and reconfigure the ETL if required.
- Run the ETL again.
- Repeat these steps until the ETL run status changes to OK.
B. Running the ETL in the production mode
You can run the ETL manually when required or schedule it to run at a specified time.
Running the ETL manually
- On the ETL tasks page, click the ETL. The ETL details are displayed.
- In the Run configurations table, click Edit to modify the ETL configuration settings. The Edit run configuration page is displayed.
- On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
- To run the ETL immediately, click Run active configuration. A confirmation message about the ETL run job submission is displayed.
When the ETL is run, it collects data from the source and transfers it to the database.
Scheduling the ETL run
By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.
To configure the ETL run schedule:
- On the ETL tasks page, click the ETL, and click Edit Task. The ETL details are displayed.
- On the Edit task page, do the following, and click Save:
- Specify a unique name and description for the ETL task.
- In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
- Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
- Select the task group and the scheduler to which you want to assign the ETL task.
- Click Schedule. A message confirming the scheduling job submission is displayed.
When the ETL runs as scheduled, it collects data from the source and transfers it to the database.
Verify that the ETL ran successfully and check whether the Cloudera Manager data is refreshed in the Workspace.
To verify whether the ETL ran successfully:
- In the console, click Administration > ETL and System Tasks > ETL tasks.
- In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.
- In the console, click Workspace.
- Expand (Domain name) > Systems > (Cluster name)
- In the left pane, verify that the hierarchy displays the new and updated Cloudera Nodes, Resource Managers, and Services.
- Click a Cloudera REST entity, and click the Metrics tab in the right pane.
- Check if the Last Activity column in the Configuration metrics and Performance metrics tables displays the current date.
Cloudera REST Workspace Entity | Details |
---|---|
Entities | |
Hierarchy | |
Configuration and Performance metrics mapping | |