Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST

Moviri Integrator for TrueSight Capacity Optimization – Cloudera REST is an additional component of the TrueSight Capacity Optimization product. It extracts data from Cloudera Enterprise, a Cloudera Hadoop distribution composed of CDH (Cloudera Data Hub) and Cloudera Manager. Relevant capacity metrics are loaded into TrueSight Capacity Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Hadoop View.

The integration supports the extraction of both performance and configuration data across different components of CDH and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector is able to replicate relationships and logical dependencies among entities such as clusters, resource pools, services, and nodes.

The documentation is targeted at TrueSight Capacity Optimization administrators, in charge of configuring and monitoring the integration between TrueSight Capacity Optimization and Cloudera.


Moviri Integrator for TrueSight Capacity Optimization - Cloudera REST is compatible with TrueSight Capacity Optimization 11.3 and onward.


Collecting data by using the Cloudera REST ETL

To collect data by using the Cloudera REST ETL, do the following tasks:

I. Complete the pre-configuration tasks.

II. Configure the ETL.

III. Run the Cloudera REST ETL.

IV. Verify the Cloudera data.

Step I. Complete the pre-configuration tasks

1. Check that the Cloudera Manager version is supported.
   Supported Cloudera Data Hub and Cloudera Manager versions: 7.0.3 and later.

2. Create an administrator account for Cloudera API access.
   To generate credentials, follow these steps:
     1. Log in to the Cloudera Manager console.
     2. Click the gear icon and select Users & Groups.
     3. Click the New user button.
     4. Enter a username and password, and under the Permissions tab, select Admin User.

3. Verify that the user credentials have access.

Execute the following command from a Linux console:

curl --user <username>:<password> <https/http>://<host>:<port>/api/version

An output similar to the following should be obtained:

v44

Now run the following command from a Linux console, using the API version received from the first command:

curl --user <username>:<password> <https/http>://<host>:<port>/api/<version>/clusters

An output similar to the following should be obtained:

{
  "items" : [ {
    "name" : "Cluster 1",
    "displayName" : "cluster",
    "fullVersion" : "7.1.7",
    "maintenanceMode" : false,
    "maintenanceOwners" : [ ],
    "clusterUrl" : "http://ip-172-31-23-180.ec2.internal:7180/cmf/clusterRedirect/Cluster+1",
    "hostsUrl" : "http://ip-172-31-23-180.ec2.internal:7180/cmf/clusterRedirect/Cluster+1/hosts",
    "entityStatus" : "GOOD_HEALTH",
    "uuid" : "88cf4b3e-5898-4e3d-bf41-d491791745b9",
    "clusterType" : "BASE_CLUSTER",
    "tags" : [ ]
  } ]
}
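The two curl checks above can also be scripted. The following is a minimal sketch (standard library only; the base URL, user, and password are placeholders you must replace with your own values) that discovers the API version and lists the clusters visible to the account:

```python
import base64
import json
import urllib.request

def parse_cluster_names(payload):
    # Pull the cluster names out of an /api/<version>/clusters response body.
    return [cluster["name"] for cluster in payload.get("items", [])]

def check_cloudera_access(base_url, user, password):
    # base_url is e.g. "http://cm-host.example.com:7180" (placeholder value).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def get(path):
        request = urllib.request.Request(
            base_url + path, headers={"Authorization": "Basic " + token}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode()

    version = get("/api/version").strip()  # e.g. "v44"
    clusters = json.loads(get(f"/api/{version}/clusters"))
    return version, parse_cluster_names(clusters)
```

If the calls fail with an HTTP 401 error, re-check the username and password created in the pre-configuration step.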

Step II. Configure the ETL

A. Configuring the basic properties

Some of the basic properties display default values. You can modify these values if required.

To configure the basic properties:

  1. In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click Add > Add ETL. The Add ETL page displays the configuration properties. You must configure properties in the following tabs: Run configuration, Entity catalog, and Cloudera REST - Settings.

  3. On the Run Configuration tab, select Moviri - Cloudera REST Extractor from the ETL Module list. The name of the ETL is displayed in the ETL task name field. You can edit this field to customize the name.

  4. Click the Entity catalog tab, and select one of the following options:
    • Shared Entity Catalog:

      • From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
    • Private Entity Catalog: Select if this is the only ETL that extracts data from the Cloudera REST resources.
  5. Click the Cloudera REST - Settings tab, and configure the following properties:

Cloudera Protocol (HTTP/HTTPS): The protocol of the Cloudera Manager instance (HTTP or HTTPS).
Cloudera Hostname: The hostname of the Cloudera Manager instance.
Cloudera Port: The port that the Cloudera Manager instance is running on.
Spark Hostname: If Spark is being used, the hostname of the Spark History Server.
Spark Port: If Spark is being used, the port of the Spark History Server.
User: Username of the administrator account created in the pre-configuration steps.
Password: Password of the administrator account created in the pre-configuration steps.

6. On the same tab, configure the following Data Selection properties:

Data Granularity *: Granularity of the data to be imported. Supported granularities: raw, 10 minutes, 1 hour, 6 hours, and 1 day.
Raw data aggregation: The duration in minutes used to roll up the gathered raw data. The default is 5 minutes.
Import nodes: Import data at the node level.
Import pools: Import data at the pool level.
Import hbase: Import data about the HBASE service.
Import spark: Import data about the Spark service.
Import HDFS usage report: Import data about HDFS usage by user (requires cluster administrator permission).

*Data Granularity: When choosing the 1 day granularity, Cloudera REST aggregates data for each day based on the UTC time zone. For best accuracy, consider relocating the data to UTC to align with the aggregation resolution.

*Data Granularity: The Cloudera REST API has a default data retention period for each resolution.
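To see why relocation matters for the 1 day granularity, consider a sample collected late in the evening in a non-UTC zone: it belongs to the next UTC calendar day. A small illustration (the date, time, and zone are arbitrary examples):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A sample collected at 22:00 on June 1 in New York falls on June 2 in UTC,
# so a UTC-based daily rollup assigns it to the next day.
local_sample = datetime(2023, 6, 1, 22, 0, tzinfo=ZoneInfo("America/New_York"))
utc_sample = local_sample.astimezone(ZoneInfo("UTC"))
print(local_sample.date(), "->", utc_sample.date())  # 2023-06-01 -> 2023-06-02
```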

7. On the same tab, configure the following Time Interval properties:

Default Last Counter (YYYY-MM-DD HH24:MI:SS Z): Default last counter value. The time zone is optional; if omitted, the ETL engine time zone is used.
Relocate data to timezone (e.g. America/New_York, leave empty to use BCO timezone): Advanced. The time zone to which any imported sample is relocated.
Max extraction period (hours), default is 24 hours (1 day): The maximum extraction period in hours. The default is 24 hours.
Lag hour to current time, default is 1 hour: The lag, in hours, from the current time.
Max days to import in a single run (0 for no limit): The maximum number of days to collect in a single ETL run.
Use cluster displayname for lookup instead of cluster name (default): Advanced. Use the cluster display name as the internal lookup name. Useful to avoid overwriting systems in TSCO when different Cloudera clusters have the same cluster name and the lookup is shared between their ETLs.
Add cluster name to components (cluster:component) as entity name: Advanced. Add the cluster name as a prefix to all component entity names. Useful when importing multiple clusters.
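The Default Last Counter format can be parsed as shown in this sketch (the function name and the UTC fallback are illustrative, not part of the product); it demonstrates how an omitted time zone falls back to a default zone:

```python
from datetime import datetime, timezone

def parse_last_counter(value, engine_tz=timezone.utc):
    # The trailing time zone ("Z" in YYYY-MM-DD HH24:MI:SS Z) is optional;
    # when it is missing, fall back to the engine time zone (UTC here as an example).
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z")
    except ValueError:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=engine_tz)
```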


8. (Optional) Override the default values of the properties:

Module description: A short description of the ETL module.

Execute in simulation mode: By default, the ETL execution in simulation mode is selected to validate connectivity with the data source, and to ensure that the ETL does not have any configuration issues. In the simulation mode, the ETL does not load data into the database. This option is useful when you want to test a new ETL task. To run the ETL in the production mode, select No. BMC recommends that you run the ETL in the simulation mode after ETL configuration and then run it in the production mode.

Associate new entities to: Specify the domain to which you want to add the entities created by the ETL. Select one of the following options:

  • Existing domain: This option is selected by default. Select an existing domain from the Domain list.
  • New domain: Select a parent domain, and specify a name for your new domain.

By default, a new domain with the same ETL name is created for each ETL.

Task group: Select a task group to classify the ETL.

Running on scheduler: Select one of the following schedulers for running the ETL:
  • Primary Scheduler: Runs on the Application Server.
  • Generic Scheduler: Runs on a separate computer.
  • Remote: Runs on remote computers.

Maximum execution time before warning: Indicates the number of hours, minutes, or days for which the ETL must run before generating warnings or alerts, if any.

Frequency: Select one of the following frequencies to run the ETL:

  • Predefined: This is the default selection. Select a daily, weekly, or monthly frequency, and then select a time to start the ETL run accordingly.
  • Custom: Specify a custom frequency, select an appropriate unit of time, and then specify a day and a time to start the ETL run.

(Optional) B. Configuring the advanced properties

You can configure the advanced properties to change the way the ETL works or to collect additional metrics.

To configure the advanced properties:

  1. On the Add ETL page, click Advanced.
  2. Configure the following properties:

    Run configuration name: Specify the name that you want to assign to this ETL task configuration. The default configuration name is displayed. You can use this name to differentiate between the run configuration settings of ETL tasks.
    Deploy status: Select the deploy status for the ETL task. For example, you can initially select Test and change it to Production after verifying that the ETL run results are as expected.
    Log level: Specify the level of detail that you want to include in the ETL log file. Select one of the following options:
    • 1 - Light: Select to add the bare minimum activity logs to the log file.
    • 5 - Medium: Select to add the medium-detailed activity logs to the log file.
    • 10 - Verbose: Select to add detailed activity logs to the log file.
    Use log level 5 as a general practice. You can select log level 10 for debugging and troubleshooting purposes.

    Datasets: Specify the datasets that you want to add to the ETL run configuration. The ETL collects data of metrics that are associated with these datasets.
    1. Click Edit.
    2. Select one (click) or more (shift+click) datasets from the Available datasets list and click >> to move them to the Selected datasets list.
    3. Click Apply.
    The ETL collects data of metrics associated with the datasets that are available in the Selected datasets list.

    Metric profile selection: Select the metric profile that the ETL must use. The ETL collects data for the group of metrics that is defined by the selected metric profile.
    • Use Global metric profile: This is selected by default. All the out-of-the-box ETLs use this profile.
    • Select a custom metric profile: Select the custom profile that you want to use from the Custom metric profile list. This list displays all the custom profiles that you have created.
    For more information about metric profiles, see Adding and managing metric profiles.
    Levels up to: Specify the metric level that defines the number of metrics that can be imported into the database. The load on the database increases or decreases depending on the selected metric level.

    Empty dataset behavior: Specify the action for the loader if it encounters an empty dataset:
    • Warn: Generate a warning about loading an empty dataset.
    • Ignore: Ignore the empty dataset and continue parsing.
    ETL log file name: The name of the file that contains the ETL run log. The default value is: %BASE/log/%AYEAR%AMONTH%ADAY%AHOUR%MINUTE%TASKID
    Maximum number of rows for CSV output: A numeric value to limit the size of the output files.
    CSV loader output file name: The name of the file that is generated by the CSV loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID
    Capacity Optimization loader output file name: The name of the file that is generated by the TrueSight Capacity Optimization loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID
    Remove domain suffix from datasource name (Only for systems): Select True to remove the domain from the data source name. For example, server.domain.com will be saved as server. The default selection is False.
    Leave domain suffix to system name (Only for systems): Select True to keep the domain in the system name. For example, server.domain.com will be saved as is. The default selection is False.
    Skip entity creation (Only for ETL tasks sharing lookup with other tasks): Select True if you do not want this ETL to create an entity and want to discard data from its data source for entities not found in TrueSight Capacity Optimization. It uses one of the other ETLs that share a lookup to create a new entity. The default selection is False.

    Hour mask: Specify a value to run the task only during particular hours within a day. For example, 0 – 23 or 1, 3, 5 – 12.
    Day of week mask: Select the days so that the task can be run only on the selected days of the week. To avoid setting this filter, do not select any option for this field.
    Day of month mask: Specify a value to run the task only on the selected days of a month. For example, 5, 9, 18, 27 – 31.
    Apply mask validation: Select False to temporarily turn off the mask validation without removing any values. The default selection is True.
    Execute after time: Specify a value in the hours:minutes format (for example, 05:00 or 16:00) to wait before the task is run. The task run begins only after the specified time has elapsed.
    Enqueueable: Specify whether you want to ignore the next run command or run it after the current task. Select one of the following options:
    • False: Ignores the next run command when a particular task is already running. This is the default selection.
    • True: Starts the next run command immediately after the current running task is completed.

  3. Click Save.

    The ETL tasks page shows the details of the newly configured Cloudera REST ETL:
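The Hour mask syntax described in the scheduling properties (for example, 0 – 23 or 1, 3, 5 – 12) combines single hours and inclusive ranges. A hypothetical helper that expands such an expression (the function is illustrative, not part of the product):

```python
def expand_hour_mask(mask):
    # Accepts entries like "1, 3, 5 - 12": single hours and inclusive ranges,
    # separated by commas. Both "-" and the en dash used in the docs are handled.
    hours = set()
    for part in mask.replace("\u2013", "-").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(p) for p in part.split("-"))
            hours.update(range(low, high + 1))
        else:
            hours.add(int(part))
    return sorted(hours)
```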

Step III. Run the ETL

After you configure the ETL, you can run it to collect data. You can run the ETL in the following modes:

A. Simulation mode: Only validates connection to the data source, does not collect data. Use this mode when you want to run the ETL for the first time or after you make any changes to the ETL configuration.

B. Production mode: Collects data from the data source.

A. Running the ETL in the simulation mode

To run the ETL in the simulation mode:

  1. In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click the ETL. The ETL details are displayed.



  3. In the Run configurations table, click Edit to modify the ETL configuration settings.
  4. On the Run configuration tab, ensure that the Execute in simulation mode option is set to Yes, and click Save.
  5. Click Run active configuration. A confirmation message about the ETL run job submission is displayed.
  6. On the ETL tasks page, check the ETL run status in the Last exit column.
    OK: Indicates that the ETL ran without any error. You are ready to run the ETL in the production mode.
  7. If the ETL run status is Warning, Error, or Failed:
    1. On the ETL tasks page, click the icon in the last column of the ETL name row.
    2. Check the log and reconfigure the ETL if required.
    3. Run the ETL again.
    4. Repeat these steps until the ETL run status changes to OK.

B. Running the ETL in the production mode

You can run the ETL manually when required or schedule it to run at a specified time.

Running the ETL manually

  1. On the ETL tasks page, click the ETL. The ETL details are displayed.
  2. In the Run configurations table, click Edit to modify the ETL configuration settings. The Edit run configuration page is displayed.
  3. On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
  4. To run the ETL immediately, click Run active configuration. A confirmation message about the ETL run job submission is displayed.
    When the ETL is run, it collects data from the source and transfers it to the database.

Scheduling the ETL run

By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.

To configure the ETL run schedule:

  1. On the ETL tasks page, click the ETL, and click Edit Task. The ETL details are displayed.

  2. On the Edit task page, do the following, and click Save:

    • Specify a unique name and description for the ETL task.
    • In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
    • Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
    • Select the task group and the scheduler to which you want to assign the ETL task.
  3. Click Schedule. A message confirming the scheduling job submission is displayed.
    When the ETL runs as scheduled, it collects data from the source and transfers it to the database.

Step IV. Verify data collection

Verify that the ETL ran successfully and check whether the Cloudera Manager data is refreshed in the Workspace.

To verify whether the ETL ran successfully:

  1. In the console, click Administration > ETL & System Tasks > ETL tasks.
  2. In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.
To verify that the Cloudera Manager data is refreshed:

  1. In the console, click Workspace.
  2. Expand (Domain name) > Systems > (Cluster name).
  3. In the left pane, verify that the hierarchy displays the new and updated Cloudera Nodes, Resource Managers, and Services.
  4. Click a Cloudera REST entity, and click the Metrics tab in the right pane.
  5. Check if the Last Activity column in the Configuration metrics and Performance metrics tables displays the current date.


Cloudera REST Workspace Entity Details

Entities

TSCO Entity | Cloudera Entity
Hadoop Cluster | Cluster
Hadoop Resource Pool | Resource Pool
Hadoop Node | Host

Services

Supported Cloudera services are HDFS, YARN, HBASE, MAP REDUCE, and SPARK.


Hierarchy

The connector is able to replicate relationships and logical dependencies among these entities. In particular, all the available applications are imported with their tiers and machines. Additional hosts and services that are not part of a specific application are imported in a separate domain tree.


Configuration and Performance metrics mapping

Cloudera entity | TSCO entity | PERF/CONF | Cloudera metric | TSCO metric | Conversion factor
Cluster | Hadoop Cluster | CONF | clusterName | ALIAS_NAME |
Cluster | Hadoop Cluster | CONF | total_cores_across_hosts | CPU_NUM |
Cluster | Hadoop Cluster | CONF | version + fullversion | DESCRIPTION |
Cluster | Hadoop Cluster | CONF | "Cloudera" | HADOOP_DISTRIBUTION |
Cluster | Hadoop Cluster | CONF | total_swap_total_across_hosts | SWAP_SPACE_TOT |
Cluster | Hadoop Cluster | CONF | total_capacity_used_across_filesystems/total_capacity_across_filesystems | TOTAL_FS_UTIL |
Cluster | Hadoop Cluster | CONF | total_physical_memory_total_across_hosts | TOTAL_REAL_MEM |
Cluster | Hadoop Cluster | PERF | total_load_1_across_hosts | CPU_RUN_QUEUE |
Cluster | Hadoop Cluster | PERF | (total_cpu_system_rate_across_hosts+total_cpu_user_rate_across_hosts)/total_cores_across_hosts | CPU_UTIL |
Cluster | Hadoop Cluster | PERF | total_cpu_system_rate_across_hosts/total_cores_across_hosts | CPU_UTIL_SYSTEM |
Cluster | Hadoop Cluster | PERF | total_cpu_user_rate_across_hosts/total_cores_across_hosts | CPU_UTIL_USER |
Cluster | Hadoop Cluster | PERF | total_cpu_iowait_rate_across_hosts/total_cores_across_hosts | CPU_UTIL_WAIO |
Cluster | Hadoop Cluster | PERF | total_physical_memory_cached_across_hosts | MEM_CACHED |
Cluster | Hadoop Cluster | PERF | total_swap_out_rate_across_hosts | MEM_SWAP_OUT_RATE |
Cluster | Hadoop Cluster | PERF | total_physical_memory_used_across_hosts | MEM_USED |
Cluster | Hadoop Cluster | PERF | total_physical_memory_used_across_hosts/total_physical_memory_total_across_hosts | MEM_UTIL |
Cluster | Hadoop Cluster | PERF | total_physical_memory_cached_across_hosts/total_physical_memory_total_across_hosts | MEM_UTIL_CACHED |
Cluster | Hadoop Cluster | PERF | total_bytes_receive_rate_across_network_interfaces+total_bytes_transmit_rate_across_network_interfaces | NET_BYTE_RATE |
Cluster | Hadoop Cluster | PERF | total_swap_free_across_hosts | SWAP_SPACE_FREE |
Cluster | Hadoop Cluster | PERF | total_swap_used_across_hosts | SWAP_SPACE_USED |
Cluster | Hadoop Cluster | PERF | total_swap_used_across_hosts/swap_total_across_hosts | SWAP_SPACE_UTIL |
Cluster | Hadoop Cluster | PERF | total_capacity_across_filesystems+(-1*total_capacity_used_across_filesystems) | TOTAL_FS_FREE |
Cluster | Hadoop Cluster | PERF | total_capacity_across_filesystems | TOTAL_FS_SIZE |
Cluster | Hadoop Cluster | PERF | total_capacity_used_across_filesystems | TOTAL_FS_USED |
HBASE | Hadoop HBASE Service | PERF | total_compaction_queue_size_across_regionservers | COMPACTION_QUEUE_SIZE |
HBASE | Hadoop HBASE Service | PERF | total_events_critical_rate_across_regionservers | CRIT_EVENT_RATE |
HBASE | Hadoop HBASE Service | PERF | total_requests_rate_across_regionservers | DISK_IO_RATE |
HBASE | Hadoop HBASE Service | PERF | total_read_requests_rate_across_regionservers | DISK_IO_READ_RATE |
HBASE | Hadoop HBASE Service | PERF | total_write_requests_rate_across_regionservers | DISK_IO_WRITE_RATE |
HBASE | Hadoop HBASE Service | PERF | total_read_bytes_rate_across_regionservers | DISK_READ_RATE |
HBASE | Hadoop HBASE Service | PERF | total_write_bytes_rate_across_regionservers+total_read_bytes_rate_across_regionservers | DISK_TRANSFER_RATE |
HBASE | Hadoop HBASE Service | PERF | total_write_bytes_rate_across_regionservers | DISK_WRITE_RATE |
HBASE | Hadoop HBASE Service | PERF | total_jvm_heap_committed_mb_across_regionservers | HEAPMEM_COMMITTED | 1024*1024
HBASE | Hadoop HBASE Service | PERF | total_jvm_max_memory_mb_across_regionservers | HEAPMEM_MAX | 1024*1024
HBASE | Hadoop HBASE Service | PERF | total_jvm_heap_used_mb_across_regionservers | HEAPMEM_USED | 1024*1024
HBASE | Hadoop HBASE Service | PERF | total_jvm_heap_used_mb_across_regionservers/total_jvm_max_memory_mb_across_regionservers | HEAPMEM_UTIL |
HBASE | Hadoop HBASE Service | PERF | total_jvm_non_heap_committed_mb_across_regionservers | NONHEAPMEM_COMMITTED | 1024*1024
HBASE | Hadoop HBASE Service | PERF | total_jvm_non_heap_used_mb_across_regionservers | NONHEAPMEM_USED | 1024*1024
HBASE | Hadoop HBASE Service | PERF | total_stores_across_regionservers | STORE_COUNT |
HBASE | Hadoop HBASE Service | PERF | total_storefiles_across_regionservers | STOREFILE_COUNT |
HBASE | Hadoop HBASE Service | PERF | total_storefile_index_size_across_regionservers | STOREFILE_IDX_SIZE |
HBASE | Hadoop HBASE Service | PERF | total_storefiles_size_across_regionservers | STOREFILE_SIZE |
HDFS | Hadoop HDFS Resource Manager | PERF | <HDFS usage report> | BYUSER_HDFS_FILE_COUNT |
HDFS | Hadoop HDFS Resource Manager | PERF | <HDFS usage report> | BYUSER_HDFS_TOTAL_FILE_SIZE |
HDFS | Hadoop HDFS Resource Manager | PERF | total_bytes_read_rate_across_datanodes | DISK_READ_RATE |
HDFS | Hadoop HDFS Resource Manager | PERF | total_bytes_written_rate_across_datanodes | DISK_WRITE_RATE |
HDFS | Hadoop HDFS Resource Manager | PERF | files_total | HDFS_FILES_COUNT |
HDFS | Hadoop HDFS Resource Manager | PERF | dfs_capacity | HDFS_TOTAL_SIZE |
HDFS | Hadoop HDFS Resource Manager | PERF | dfs_capacity_used | HDFS_USED_SIZE |
Cluster | Hadoop Node | CONF | clusterName | CLUSTER_NAME |
Cluster | Hadoop Node | CONF | "Cloudera" | HADOOP_DISTRIBUTION |
HOST_JVM | Hadoop Node | CONF | jvm_max_memory_mb | BYVM_HEAPMEM_MAX | 1024*1024
HOST_PERF | Hadoop Node | CONF | cores | CPU_NUM |
HOST_PERF | Hadoop Node | CONF | swap_total | SWAP_SPACE_TOT |
HOST_PERF | Hadoop Node | CONF | total_capacity_across_filesystems | TOTAL_FS_SIZE |
HOST_PERF | Hadoop Node | CONF | physical_memory_total | TOTAL_REAL_MEM |
HOST_PERF | Hadoop Node | CONF | maintenanceMode | MAINTENANCE_MODE |
HOST_PERF | Hadoop Node | CONF | ipAddress | NET_IP_ADDRESS |
HOST_PERF | Hadoop Node | CONF | services | HADOOP_COMPONENTS |
HOST_HDFS | Hadoop Node | PERF | files_total | HDFS_FILES_COUNT |
HOST_HDFS | Hadoop Node | PERF | dfs_capacity | HDFS_TOTAL_SIZE |
HOST_HDFS | Hadoop Node | PERF | dfs_capacity_used | HDFS_USED_SIZE |
HOST_JVM | Hadoop Node | PERF | jvm_gc_rate | BYVM_GC_EVENTS_RATE |
HOST_JVM | Hadoop Node | PERF | jvm_gc_time_ms_rate | BYVM_GC_TIME_PCT | 0.001
HOST_JVM | Hadoop Node | PERF | jvm_heap_committed_mb | BYVM_HEAPMEM_COMMITTED | 1024*1024
HOST_JVM | Hadoop Node | PERF | jvm_heap_used_mb | BYVM_HEAPMEM_USED | 1024*1024
HOST_PERF | Hadoop Node | PERF | load_1 | CPU_RUN_QUEUE |
HOST_PERF | Hadoop Node | PERF | (cpu_system_rate+cpu_user_rate)/getHostFact(numCores,1) | CPU_UTIL |
HOST_PERF | Hadoop Node | PERF | cpu_idle_rate/getHostFact(numCores,1) | CPU_UTIL_IDLE |
HOST_PERF | Hadoop Node | PERF | cpu_system_rate/getHostFact(numCores,1) | CPU_UTIL_SYSTEM |
HOST_PERF | Hadoop Node | PERF | cpu_user_rate/getHostFact(numCores,1) | CPU_UTIL_USER |
HOST_PERF | Hadoop Node | PERF | cpu_iowait_rate/getHostFact(numCores,1) | CPU_UTIL_WAIO |
HOST_PERF | Hadoop Node | PERF | total_read_bytes_rate_across_disks | DISK_READ_RATE |
HOST_PERF | Hadoop Node | PERF | total_write_bytes_rate_across_disks | DISK_WRITE_RATE |
HOST_PERF | Hadoop Node | PERF | physical_memory_cached | MEM_CACHED |
HOST_PERF | Hadoop Node | PERF | physical_memory_total+(-1*physical_memory_used) | MEM_FREE |
HOST_PERF | Hadoop Node | PERF | swap_out_rate | MEM_SWAP_OUT_RATE |
HOST_PERF | Hadoop Node | PERF | physical_memory_used | MEM_USED |
HOST_PERF | Hadoop Node | PERF | physical_memory_used/physical_memory_total | MEM_UTIL |
HOST_PERF | Hadoop Node | PERF | physical_memory_cached/physical_memory_total | MEM_UTIL_CACHED |
HOST_PERF | Hadoop Node | PERF | total_bytes_receive_rate_across_network_interfaces+total_bytes_transmit_rate_across_network_interfaces | NET_BYTE_RATE |
HOST_PERF | Hadoop Node | PERF | total_bytes_receive_rate_across_network_interfaces | NET_IN_BYTE_RATE |
HOST_PERF | Hadoop Node | PERF | total_bytes_transmit_rate_across_network_interfaces | NET_OUT_BYTE_RATE |
HOST_PERF | Hadoop Node | PERF | swap_free | SWAP_SPACE_FREE |
HOST_PERF | Hadoop Node | PERF | swap_used | SWAP_SPACE_USED |
HOST_PERF | Hadoop Node | PERF | swap_used/swap_total | SWAP_SPACE_UTIL |
HOST_PERF | Hadoop Node | PERF | total_capacity_across_filesystems+(-1*total_capacity_used_across_filesystems) | TOTAL_FS_FREE |
HOST_PERF | Hadoop Node | PERF | total_capacity_used_across_filesystems | TOTAL_FS_USED |
HOST_PERF | Hadoop Node | PERF | total_capacity_used_across_filesystems/total_capacity_across_filesystems | TOTAL_FS_UTIL |
YARN_POOL | Hadoop Resource Pool (YARN) | CONF | allocated_vcores_cumulative+available_vcores | CPU_NUM |
YARN_POOL | Hadoop Resource Pool (YARN) | CONF | allocated_memory_mb_cumulative+available_memory_mb | TOTAL_REAL_MEM |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | cpu_milliseconds | BYAPP_CPU_TIME | 0.001
YARN_APP | Hadoop Resource Pool (YARN) | PERF | application_duration | BYAPP_DURATION |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | file_bytes_read | BYAPP_FILE_BYTES_READ |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | file_bytes_written | BYAPP_FILE_BYTES_WRITE |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | hdfs_bytes_read | BYAPP_HDFS_BYTES_READ |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | hdfs_bytes_written | BYAPP_HDFS_BYTES_WRITE |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | map_input_bytes | BYAPP_MAP_IN_BYTES |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | map_input_records | BYAPP_MAP_IN_RECORDS |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | map_output_bytes | BYAPP_MAP_OUT_BYTES |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | map_output_records | BYAPP_MAP_OUT_RECORDS |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | reduce_input_records | BYAPP_RED_IN_RECORDS |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | reduce_output_records | BYAPP_RED_OUT_RECORDS |
YARN_APP | Hadoop Resource Pool (YARN) | PERF | cpu_milliseconds | CPU_TIME | 0.001
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_completed_rate | APP_COMPLETION_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_failed_rate | APP_FAILED_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_file_bytes_read_rate | APP_FILE_READ_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_file_bytes_written_rate | APP_FILE_WRITE_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_hdfs_bytes_read_rate | APP_HDFS_BYTES_READ |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_hdfs_bytes_written_rate | APP_HDFS_BYTES_WRITE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_killed_rate | APP_KILLED_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_maps_rate | APP_MAP_LAUNCH_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_pending | APP_PENDING |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | yarn_application_reduces_rate | APP_RED_LAUNCH_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_running | APP_RUNNING |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | apps_submitted_rate | APP_SUBMITTED_RATE |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | allocated_memory_mb | MEM_USED |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | allocated_vcores | VCORES_USED |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | reserved_vcores | VCORES_RESERVED |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | pending_vcores | VCORES_PENDING |
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | reserved_memory_mb | MEM_RESERVED | 1024*1024
YARN_POOL | Hadoop Resource Pool (YARN) | PERF | pending_memory_mb | MEM_PENDING | 1024*1024
YARN | Hadoop YARN Resource Manager | CONF | total_allocated_vcores_across_yarn_pools+total_available_vcores_across_yarn_pools | CPU_NUM |
YARN | Hadoop YARN Resource Manager | CONF | total_allocated_memory_mb_across_yarn_pools+total_available_memory_mb_across_yarn_pools | TOTAL_REAL_MEM | 1024*1024
Spark | Hadoop YARN Resource Manager | PERF | totalCores | APP_SPARK_CORES |
Spark | Hadoop YARN Resource Manager | PERF | diskUsed | APP_SPARK_DISK_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalGCTime | APP_SPARK_GC_TIME |
Spark | Hadoop YARN Resource Manager | PERF | totalInputBytes | APP_SPARK_INPUT_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | maxMemory | APP_SPARK_MEM_TOTAL_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | memoryUsed | APP_SPARK_MEM_USED_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | rddBlocks | APP_SPARK_RDD_BLOCKS |
Spark | Hadoop YARN Resource Manager | PERF | totalShuffleRead | APP_SPARK_SHUFFLE_READ_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalShuffleWrite | APP_SPARK_SHUFFLE_WRITE_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalDuration | APP_SPARK_TASK_TIME |
Spark | Hadoop YARN Resource Manager | PERF | completedTasks | APP_SPARK_TASKS_COMPLETED |
Spark | Hadoop YARN Resource Manager | PERF | failedTasks | APP_SPARK_TASKS_FAILED |
Spark | Hadoop YARN Resource Manager | PERF | totalTasks | APP_SPARK_TASKS_TOTAL |
Spark | Hadoop YARN Resource Manager | PERF | totalCores | BYAPP_SPARK_CORES |
Spark | Hadoop YARN Resource Manager | PERF | diskUsed | BYAPP_SPARK_DISK_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalGCTime | BYAPP_SPARK_GC_TIME |
Spark | Hadoop YARN Resource Manager | PERF | totalInputBytes | BYAPP_SPARK_INPUT_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | maxMemory | BYAPP_SPARK_MEM_TOTAL_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | memoryUsed | BYAPP_SPARK_MEM_USED_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | rddBlocks | BYAPP_SPARK_RDD_BLOCKS |
Spark | Hadoop YARN Resource Manager | PERF | totalShuffleRead | BYAPP_SPARK_SHUFFLE_READ_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalShuffleWrite | BYAPP_SPARK_SHUFFLE_WRITE_BYTES |
Spark | Hadoop YARN Resource Manager | PERF | totalDuration | BYAPP_SPARK_TASK_TIME |
Spark | Hadoop YARN Resource Manager | PERF | completedTasks | BYAPP_SPARK_TASKS_COMPLETED |
Spark | Hadoop YARN Resource Manager | PERF | failedTasks | BYAPP_SPARK_TASKS_FAILED |
Spark | Hadoop YARN Resource Manager | PERF | totalTasks | BYAPP_SPARK_TASKS_TOTAL |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_file_bytes_read_rate | APP_FILE_READ_RATE |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_file_bytes_written_rate | APP_FILE_WRITE_RATE |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_hdfs_bytes_read_rate | APP_HDFS_BYTES_READ |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_hdfs_bytes_written_rate | APP_HDFS_BYTES_WRITE |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_maps_rate | APP_MAP_LAUNCH_RATE |
YARN | Hadoop YARN Resource Manager | PERF | yarn_application_reduces_rate | APP_RED_LAUNCH_RATE |
YARN | Hadoop YARN Resource Manager | PERF | total_allocated_vcores_across_yarn_pools | VCORES_USED |
YARN | Hadoop YARN Resource Manager | PERF | total_allocated_vcores_across_yarn_pools/(total_allocated_vcores_across_yarn_pools+total_available_vcores_across_yarn_pools) | CPU_UTIL |
YARN | Hadoop YARN Resource Manager | PERF | total_available_memory_mb_across_yarn_pools | MEM_FREE | 1024*1024
YARN | Hadoop YARN Resource Manager | PERF | total_allocated_memory_mb_across_yarn_pools | MEM_USED | 1024*1024
YARN | Hadoop YARN Resource Manager | PERF | total_allocated_memory_mb_across_yarn_pools/(total_allocated_memory_mb_across_yarn_pools+total_available_memory_mb_across_yarn_pools) | MEM_UTIL |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_completed_cumulative_rate | APP_COMPLETION_RATE |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_failed_cumulative_rate | APP_FAILED_RATE |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_killed_cumulative_rate | APP_KILLED_RATE |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_pending_cumulative | APP_PENDING |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_running_cumulative | APP_RUNNING |
YARN_ALL_POOLS | Hadoop YARN Resource Manager | PERF | apps_submitted_cumulative_rate | APP_SUBMITTED_RATE |
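As a worked example of two conventions in the mapping table, derived metrics divide summed rates by core counts, and metrics reported in MB carry a 1024*1024 conversion factor to bytes. A small sketch of the Cluster CPU_UTIL formula and the MB conversion (the function names are illustrative, not part of the product):

```python
def cluster_cpu_util(total_cpu_system_rate, total_cpu_user_rate, total_cores):
    # Cluster CPU_UTIL = (total_cpu_system_rate_across_hosts
    #                     + total_cpu_user_rate_across_hosts)
    #                    / total_cores_across_hosts
    return (total_cpu_system_rate + total_cpu_user_rate) / total_cores

def mb_to_bytes(value_mb):
    # Conversion factor 1024*1024 applied to *_mb metrics
    # (e.g. allocated_memory_mb -> MEM_USED in bytes).
    return value_mb * 1024 * 1024
```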
