Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST
Moviri Integrator for TrueSight Capacity Optimization – Cloudera REST is an additional component of the TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, a Cloudera Hadoop distribution composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into TrueSight Capacity Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.
The integration supports the extraction of both performance and configuration data across different components of CDH and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector can replicate relationships and logical dependencies among entities such as clusters, resource pools, services, and nodes.
This documentation is targeted at TrueSight Capacity Optimization administrators who are in charge of configuring and monitoring the integration between TrueSight Capacity Optimization and Cloudera.
Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST is compatible with TrueSight Capacity Optimization 11.3 and later.
Collecting data by using the Cloudera REST ETL
To collect data by using the Cloudera REST ETL, do the following tasks:
Steps | Details |
---|---|
Check that the Cloudera Manager version is supported. | Supported Cloudera Data Hub and Cloudera Manager versions: 7.0.3+ |
Create an administrator account for Cloudera API access. | |
Verify that the user credentials have the required access. | |
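Before configuring the ETL, it can help to confirm that the administrator account can reach the Cloudera Manager REST API at all. The sketch below builds an authenticated request against the Manager's `/api/version` endpoint, which returns the API version string. The hostname, port, and credentials are placeholders, and this check is an illustration rather than a step the connector itself performs.

```python
# Minimal sketch: verify that the administrator account created above can
# reach the Cloudera Manager REST API. Host, port, user, and password are
# placeholder values; replace them with your own.
import base64
import urllib.request

def build_version_request(protocol, host, port, user, password):
    """Build an authenticated request for Cloudera Manager's /api/version."""
    url = f"{protocol}://{host}:{port}/api/version"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req

def check_access(req, timeout=10):
    """Perform the call; returns the API version string (for example 'v41')."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()

# Build (but do not send) a request for a hypothetical Manager instance.
req = build_version_request("https", "cm.example.com", 7183, "tsco_api", "secret")
```

If `check_access(req)` succeeds and returns a version string, the credentials and network path are good; an HTTP 401 at this point indicates a credentials problem rather than an ETL configuration problem.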
A. Configuring the basic properties
Some of the basic properties display default values. You can modify these values if required.
To configure the basic properties:
- In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
- On the ETL tasks page, click Add > Add ETL. The Add ETL page displays the configuration properties. You must configure properties in the following tabs: Run configuration, Entity catalog, and Cloudera REST - Settings.
- On the Run Configuration tab, select Moviri - Cloudera REST Extractor from the ETL Module list. The name of the ETL is displayed in the ETL task name field. You can edit this field to customize the name.
- Click the Entity catalog tab, and select one of the following options:
  - Shared Entity Catalog: From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
  - Private Entity Catalog: Select this option if this is the only ETL that extracts data from the Cloudera REST resources.
- Click the Cloudera REST - Settings tab, and configure the following properties:
Property | Description |
---|---|
Cloudera Protocol (HTTP/HTTPS) | The protocol of the Cloudera Manager instance (HTTP/HTTPS) |
Cloudera Hostname | The hostname of the Cloudera Manager instance |
Cloudera Port | The port that the Cloudera Manager instance is running on. |
Spark Hostname | If Spark is used, the hostname of the Spark History Server. |
Spark Port | If Spark is used, the port of the Spark History Server. |
User | Username of the administrator account created in the pre-configuration steps. |
Password | Password of the administrator account created in the pre-configuration steps. |
Import nodes | Import data at the node level. |
Import pools | Import data at the pool level. |
Import HBase | Import data about the HBase service. |
Import Spark | Import data about the Spark service. |
Import HDFS usage report | Import data about HDFS usage by user (requires cluster admin permission). |
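The connection settings above identify a Cloudera Manager instance, and a connector of this kind would typically start by enumerating clusters through the Manager REST API. The sketch below builds the `/clusters` URL and parses the cluster name and display name from a response body. The API version `v41` is illustrative (check your instance's `/api/version`), and the sample payload is invented for demonstration.

```python
# Minimal sketch, assuming the Cloudera Manager /clusters endpoint and an
# illustrative API version "v41": list cluster names and display names.
import json

BASE = "{protocol}://{host}:{port}/api/v41"

def clusters_url(protocol, host, port):
    """Build the URL for the Manager's cluster listing endpoint."""
    return BASE.format(protocol=protocol, host=host, port=port) + "/clusters"

def parse_cluster_names(payload):
    """Extract (name, displayName) pairs from a /clusters response body."""
    doc = json.loads(payload)
    return [(c["name"], c.get("displayName", c["name"])) for c in doc["items"]]

# Invented sample response, for illustration only.
sample = '{"items": [{"name": "cluster1", "displayName": "Prod Hadoop"}]}'
```

The distinction between `name` and `displayName` matters later: the advanced lookup options in this ETL let you choose which of the two is used as the internal lookup name.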
6. On the same tab, configure the following Data Selection properties:
Property | Description |
---|---|
Data Granularity * | Granularity of the data to be imported. Supported granularities: raw, 10 minute, 1 hour, 6 hour, and 1 day. |
Raw data aggregation | The duration in minutes over which to roll up the gathered raw data; the default is 5 minutes. |
Import nodes | Import data at the node level. |
Import pools | Import data at the pool level. |
Import HBase | Import data about the HBase service. |
Import Spark | Import data about the Spark service. |
Import HDFS usage report | Import data about HDFS usage by user (requires cluster admin permission). |
*Data Granularity: When you choose 1 day granularity, Cloudera REST aggregates data for each day based on the UTC time zone. For best accuracy, consider relocating the data to UTC so that it aligns with the aggregation resolution.
*Data Granularity: The Cloudera REST API has a default data retention period for each resolution.
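The granularity options above correspond to the rollup levels exposed by the Cloudera Manager timeseries endpoint (`/api/vN/timeseries`, with its `desiredRollup` and `mustUseDesiredRollup` parameters). The sketch below maps the ETL's granularity labels onto those rollup values and builds a query URL; the mapping, base URL, and tsquery are illustrative, not taken from the connector itself.

```python
# Minimal sketch, assuming the Cloudera Manager timeseries endpoint:
# map the ETL's Data Granularity setting onto the API's rollup levels
# and build a timeseries query URL. The tsquery and base URL are
# placeholders.
from urllib.parse import urlencode

ROLLUPS = {
    "raw": "RAW",
    "10 minute": "TEN_MINUTELY",
    "1 hour": "HOURLY",
    "6 hour": "SIX_HOURLY",
    "1 day": "DAILY",
}

def timeseries_query(base, tsquery, start, end, granularity):
    """Build a /timeseries URL requesting a specific rollup level."""
    params = urlencode({
        "query": tsquery,
        "from": start,
        "to": end,
        "desiredRollup": ROLLUPS[granularity],
        "mustUseDesiredRollup": "true",
    })
    return f"{base}/timeseries?{params}"

url = timeseries_query(
    "https://cm.example.com:7183/api/v41",
    "select cpu_percent where category = HOST",
    "2024-01-01T00:00:00", "2024-01-02T00:00:00", "1 hour")
```

Because each rollup level has its own retention period on the Cloudera side, requesting fine-grained data (raw or 10 minute) far in the past may return nothing even when daily data for the same window is still available.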
7. On the same tab, configure the following Time Interval properties:

Property | Description |
---|---|
Default Last Counter (YYYY-MM-DD HH24:MI:SS Z) | Default last counter value. The time zone is optional; if omitted, the ETL engine time zone is used. |
Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone) | Advanced - Time zone to which to relocate any imported sample |
Max extraction period (hours), default is 24 hours (1 day) | Max extraction period in hours, default is 24 hours |
Lag hour to current time, default is 1 hour | Lag hour to the current time |
Max days to import in a single run (0 for no limit) | Maximum days to collect in a single ETL run |
Use cluster displayname for lookup instead of cluster name (default) | Advanced - Use the cluster display name as the internal lookup name. This is useful to avoid system overwrites in TrueSight Capacity Optimization when different Cloudera clusters have the same cluster name and the lookup is shared between their ETLs. |
Add cluster name to components (cluster:component) as entity name | Advanced - Add the cluster name as a prefix to all component entity names; useful when there are multiple clusters |
The following image shows the list of options in the ETL configuration menu, with advanced properties.
8. (Optional) Override the default values of the properties, as described in the following section.
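The Time Interval properties interact: each run starts from the last counter and ends no later than the current time minus the lag, clipped to the maximum extraction period. The sketch below is a paraphrase of those semantics as described in the table, not the connector's actual implementation, which may differ in details such as how the max-days limit is applied.

```python
# Minimal sketch of how the Time Interval properties could interact:
# extraction starts at the last counter and ends no later than
# (now - lag hours), clipped to the max extraction period. The max days
# per run setting would cap repeated windows in the same way. Property
# semantics are paraphrased from the documentation; exact connector
# behavior may differ.
from datetime import datetime, timedelta

def extraction_window(last_counter, now, lag_hours=1, max_hours=24):
    """Return (start, end) for the next ETL run, or None if nothing new."""
    end_limit = now - timedelta(hours=lag_hours)
    end = min(last_counter + timedelta(hours=max_hours), end_limit)
    if end <= last_counter:
        return None  # the lag window has not elapsed yet
    return last_counter, end

# Two days behind: the window is clipped to the 24-hour max period.
start, end = extraction_window(
    datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 3, 12, 0))
```

With the defaults shown, a backlog larger than 24 hours is drained one window per run, which is why the max-days-per-run property exists as a further cap on catch-up extractions.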
(Optional) B. Configuring the advanced properties
You can configure the advanced properties to change the way the ETL works or to collect additional metrics.
To configure the advanced properties:
- On the Add ETL page, click Advanced.
- Configure the following properties:
- Click Save. The ETL tasks page shows the details of the newly configured Cloudera REST ETL.
After you configure the ETL, you can run it to collect data. You can run the ETL in the following modes:
A. Simulation mode: Only validates the connection to the data source; does not collect data. Use this mode when you run the ETL for the first time or after you make any changes to the ETL configuration.
B. Production mode: Collects data from the data source.
A. Running the ETL in the simulation mode
To run the ETL in the simulation mode:
- In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
- On the ETL tasks page, click the ETL. The ETL details are displayed.
- In the Run configurations table, click Edit to modify the ETL configuration settings.
- On the Run configuration tab, ensure that the Execute in simulation mode option is set to Yes, and click Save.
- Click Run active configuration. A confirmation message about the ETL run job submission is displayed.
- On the ETL tasks page, check the ETL run status in the Last exit column.
- OK: Indicates that the ETL ran without any error. You are ready to run the ETL in the production mode.
- If the ETL run status is Warning, Error, or Failed:
- On the ETL tasks page, click in the last column of the ETL name row.
- Check the log and reconfigure the ETL if required.
- Run the ETL again.
- Repeat these steps until the ETL run status changes to OK.
B. Running the ETL in the production mode
You can run the ETL manually when required or schedule it to run at a specified time.
Running the ETL manually
- On the ETL tasks page, click the ETL. The ETL details are displayed.
- In the Run configurations table, click Edit to modify the ETL configuration settings. The Edit run configuration page is displayed.
- On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
- To run the ETL immediately, click Run active configuration. A confirmation message about the ETL run job submission is displayed.
When the ETL is run, it collects data from the source and transfers it to the database.
Scheduling the ETL run
By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.
To configure the ETL run schedule:
- On the ETL tasks page, click the ETL, and click Edit Task. The ETL details are displayed.
- On the Edit task page, do the following, and click Save:
- Specify a unique name and description for the ETL task.
- In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
- Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
- Select the task group and the scheduler to which you want to assign the ETL task.
- Click Schedule. A message confirming the scheduling job submission is displayed.
When the ETL runs as scheduled, it collects data from the source and transfers it to the database.
Verify that the ETL ran successfully and check whether the Cloudera Manager data is refreshed in the Workspace.
To verify whether the ETL ran successfully:
- In the console, click Administration > ETL and System Tasks > ETL tasks.
- In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.
- In the console, click Workspace.
- Expand (Domain name) > Systems > (Cluster name)
- In the left pane, verify that the hierarchy displays the new and updated Cloudera Nodes, Resource Managers, and Services.
- Click a Cloudera REST entity, and click the Metrics tab in the right pane.
- Check if the Last Activity column in the Configuration metrics and Performance metrics tables displays the current date.
Cloudera REST Workspace Entity | Details |
---|---|
Entities | |
Hierarchy | |
Configuration and Performance metrics mapping | |