Overview of datasets in an ETL task

A dataset is a logical structure used to define and organize the data collected by an ETL task in a standard format.

The dataset structure is predefined in

WarningSome content is unavailable due to permissions.

; every dataset has a name and a numeric identifier (datasetid) that univocally refers to a specific structure. Every structure details a group of metrics, for example, all metrics inherent to hard drives are defined in the System Disk (SYSDSK) dataset.

Datasets are used to ensure that the imported data satisfy a well defined format specification, and to let you know when some imported metric is unsupported (that is not listed in a dataset).

The Administration > DATA WAREHOUSE > Datasets & metrics page displays the available datasets, and lists the supported metrics for each dataset.

WarningSome content is unavailable due to permissions.

integrators and ETL developers can find information about the dataset structure within the

WarningSome content is unavailable due to permissions.

ETL Development Studio. For more information, see Developing-custom-ETLs.

Horizontal and Vertical datasets

There are two kinds of datasets in

WarningSome content is unavailable due to permissions.

Horizontal datasets: For every supported metric there is one column, and every row defines a new data sample. Most datasets have a horizontal structure and using horizontal datasets, you can load only the metrics (OOTB and custom) related to the specified (instantiated) dataset.
Vertical datasets: They have generic column names, and each metric sample is added to a new row. Using vertical datasets, you can load any
WarningSome content is unavailable due to permissions.
metric (all the metrics related to all the horizontal datasets). The SYSDAT or WKLDAT datasets in
WarningSome content is unavailable due to permissions.
have a Vertical structure. SYSDAT is used to load system performance data, and system configuration data in a vertical structure, to System entities, whereas WKLDAT is used to load business driver performance data, and business driver configuration data in a vertical structure, to Business Driver entities, and APPDAT is used to load domain configuration data in a vertical structure, to domain entities. (Performance metrics are NOT applicable to domain entities.)

If you are going to import configuration and/or performance data contained in CSV files or you are writing a custom connector, you have to fill these two datasets in different ways:

In horizontal datasets, you can put values for more metrics in the same row.
In vertical datasets, for every metric, you have to fill one row specifying the OBJNAME (metric)

Example scenario

You want to import the samples for CPU_UTIL and CPU_UTILMHZ of two systems, each sampled at midnight.

Case 1: Horizontal dataset
Dataset name: SYSGLB

Case 2: Vertical dataset
Dataset name: SYSDAT

Dataset conventions

The following conventions must be kept in mind while working with datasets:

Timestamps must be in the YYYY-MM-DD HH:MI:SS format.
Data for percentage metrics must be between 0 and 1.
Data for the Duration column must be in seconds.

Overview of datasets in an ETL task

Dataset conventions

Related topics

BMC Helix Continuous Optimization 25.1

On this page