Overview of datasets in an ETL task
A dataset is a logical structure used to define and organize the data collected by an ETL task in a standard format.
The dataset structure is predefined in BMC Helix Capacity Optimization; every dataset has a name and a numeric identifier (datasetid) that univocally refers to a specific structure. Every structure details a group of metrics, for example, all metrics inherent to hard drives are defined in the System Disk (SYSDSK) dataset.
Datasets are used to ensure that the imported data satisfy a well defined format specification, and to let you know when some imported metric is unsupported (that is not listed in a dataset).
The Administration > DATA WAREHOUSE > Datasets & metrics page displays the available datasets, and lists the supported metrics for each dataset. BMC Helix Capacity Optimization integrators and ETL developers can find information about the dataset structure within the BMC Helix Capacity Optimization ETL Development Studio. For more information, see Developing-custom-ETLs.
Horizontal and Vertical datasets
There are two kinds of datasets in BMC Helix Capacity Optimization:
- Horizontal datasets: For every supported metric there is one column, and every row defines a new data sample. Most datasets have a horizontal structure, and using horizontal datasets, you can load only the metrics (OOTB and custom) related to the specified (instantiated) dataset.
- Vertical datasets: They have generic column names, and each metric sample is added to a new row. Using vertical datasets, you can load any BMC Helix Capacity Optimization metric (all the metrics related to all the horizontal datasets). The SYSDAT or WKLDAT datasets in BMC Helix Capacity Optimization have a Vertical structure. SYSDAT is used to load system data in a vertical structure, whereas WKLDAT is used to load business driver data in a vertical structure.
If you are going to import configuration and/or performance data contained in CSV files or you are writing a custom connector, you have to fill these two datasets in different ways:
- In horizontal datasets, you can put values for more metrics in the same row.
- In vertical datasets, for every metric you have to fill one row specifying the OBJNAME (metric)
Dataset conventions
The following conventions must be kept in mind while working with datasets:
- Timestamps must be in the YYYY-MM-DD HH:MI:SS format.
- Data for percentage metrics must be between 0 and 1.
- Data for the Duration column must be in seconds.
Related topics
