Overview of datasets in an ETL task

A dataset is a logical structure used to define and organize the data collected by an ETL task in a standard format.

The dataset structure is predefined in BMC Helix Continuous Optimization; every dataset has a name and a numeric identifier (datasetid) that univocally refers to a specific structure. Every structure details a group of metrics, for example, all metrics inherent to hard drives are defined in the System Disk (SYSDSK) dataset.

Datasets are used to ensure that the imported data satisfy a well defined format specification, and to let you know when some imported metric is unsupported (that is not listed in a dataset).

The Administration > DATA WAREHOUSE > Datasets & metrics page displays the available datasets, and lists the supported metrics for each dataset. BMC Helix Continuous Optimization integrators and ETL developers can find information about the dataset structure within the BMC Helix Continuous Optimization ETL Development Studio. For more information, see Developing custom ETLs.

Horizontal and Vertical datasets

There are two kinds of datasets in BMC Helix Continuous Optimization:

Horizontal datasets: For every supported metric there is one column, and every row defines a new data sample. Most datasets have a horizontal structure and using horizontal datasets, you can load only the metrics (OOTB and custom) related to the specified (instantiated) dataset.
Vertical datasets: They have generic column names, and each metric sample is added to a new row. Using vertical datasets, you can load any BMC Helix Continuous Optimization metric (all the metrics related to all the horizontal datasets). The SYSDAT or WKLDAT datasets in BMC Helix Continuous Optimization have a Vertical structure. SYSDAT is used to load system performance data, and system configuration data in a vertical structure, to System entities, whereas WKLDAT is used to load business driver performance data, and business driver configuration data in a vertical structure, to Business Driver entities, and APPDAT is used to load domain configuration data in a vertical structure, to domain entities. (Performance metrics are NOT applicable to domain entities.)

If you are going to import configuration and/or performance data contained in CSV files or you are writing a custom connector, you have to fill these two datasets in different ways:

In horizontal datasets, you can put values for more metrics in the same row.
In vertical datasets, for every metric, you have to fill one row specifying the OBJNAME (metric)

Example scenario

You want to import the samples for CPU_UTIL and CPU_UTILMHZ of two systems, each sampled at midnight.

Case 1: Horizontal dataset
Dataset name: SYSGLB

TS	DURATION	SYSNM	DS_SYSNM	SYSTYPENM	CPU_UTIL	CPU_UTILMHZ
2014-08-30 22:07:14	86400	System1	System1	gm:vmw	30	500
2014-08-30 22:07:14	86400	System2	System2	gm:vmw	60	600
2014-08-30 22:07:14	86400	System1	System1	gm:vmw	80	1100
2014-08-30 22:07:14	86400	System2	System2	gm:vmw	10	650

Case 2: Vertical dataset
Dataset name: SYSDAT

TS	DURATION	SYSNM	DS_SYSNM	SYS TYPENM	OBJNM	SUB OBJNM	VALUE
2014-08-30 22:07:14 00:00	86400	System1	System1	gm:vmw	CPU_UTIL	GLOBAL	500
2014-08-30 22:07:14 00:00	86400	System1	System1	gm:vmw	CPU_ UTILMHZ	GLOBAL	30
2014-08-30 22:07:14 00:00	86400	System2	System2	gm:vmw	CPU_UTIL	GLOBAL	600
2014-08-30 22:07:14 00:00	86400	System2	System2	gm:vmw	CPU_ UTILMHZ	GLOBAL	60
2014-08-30 22:07:14	86400	System1	System1	gm:vmw	CPU_UTIL	GLOBAL	1100
2014-08-30 22:07:14	86400	System1	System1	gm:vmw	CPU_ UTILMHZ	GLOBAL	80
2014-08-30 22:07:14	86400	System2	System2	gm:vmw	CPU_UTIL	GLOBAL	650
2014-08-30 22:07:14	86400	System2	System2	gm:vmw	CPU_ UTILMHZ	GLOBAL	10

Dataset conventions

The following conventions must be kept in mind while working with datasets:

Timestamps must be in the YYYY-MM-DD HH:MI:SS format.
Data for percentage metrics must be between 0 and 1.
Data for the Duration column must be in seconds.

Overview of datasets in an ETL task

Dataset conventions

Related topics

Comments