Overview of datasets in an ETL task
A dataset is a logical structure used to define and organize the data collected by an ETL task in a standard format.
The dataset structure is predefined in BMC Helix Continuous Optimization; every dataset has a name and a numeric identifier (datasetid) that univocally refers to a specific structure. Every structure details a group of metrics, for example, all metrics inherent to hard drives are defined in the System Disk (SYSDSK) dataset.
Datasets are used to ensure that the imported data satisfy a well defined format specification, and to let you know when some imported metric is unsupported (that is not listed in a dataset).
The Administration > DATA WAREHOUSE > Datasets & metrics page displays the available datasets, and lists the supported metrics for each dataset. BMC Helix Continuous Optimization integrators and ETL developers can find information about the dataset structure within the BMC Helix Continuous Optimization ETL Development Studio. For more information, see Developing custom ETLs.
Horizontal and Vertical datasets
There are two kinds of datasets in BMC Helix Continuous Optimization:
- Horizontal datasets: For every supported metric there is one column, and every row defines a new data sample. Most datasets have a horizontal structure and using horizontal datasets, you can load only the metrics (OOTB and custom) related to the specified (instantiated) dataset.
- Vertical datasets: They have generic column names, and each metric sample is added to a new row. Using vertical datasets, you can load any BMC Helix Continuous Optimization metric (all the metrics related to all the horizontal datasets). The SYSDAT or WKLDAT datasets in BMC Helix Continuous Optimization have a Vertical structure. SYSDAT is used to load system performance data, and system configuration data in a vertical structure, to System entities, whereas WKLDAT is used to load business driver performance data, and business driver configuration data in a vertical structure, to Business Driver entities, and APPDAT is used to load domain configuration data in a vertical structure, to domain entities. (Performance metrics are NOT applicable to domain entities.)
If you are going to import configuration and/or performance data contained in CSV files or you are writing a custom connector, you have to fill these two datasets in different ways:
- In horizontal datasets, you can put values for more metrics in the same row.
- In vertical datasets, for every metric, you have to fill one row specifying the OBJNAME (metric)
Example scenario
You want to import the samples for CPU_UTIL and CPU_UTILMHZ of two systems, each sampled at midnight.
Case 1: Horizontal dataset
Dataset name: SYSGLB
TS | DURATION | SYSNM | DS_SYSNM | SYSTYPENM | CPU_UTIL | CPU_UTILMHZ |
---|---|---|---|---|---|---|
2014-08-30 22:07:14 | 86400 | System1 | System1 | gm:vmw | 30 | 500 |
2014-08-30 22:07:14 | 86400 | System2 | System2 | gm:vmw | 60 | 600 |
2014-08-30 22:07:14 | 86400 | System1 | System1 | gm:vmw | 80 | 1100 |
2014-08-30 22:07:14 | 86400 | System2 | System2 | gm:vmw | 10 | 650 |
Case 2: Vertical dataset
Dataset name: SYSDAT
TS | DURATION | SYSNM | DS_SYSNM | SYS TYPENM | OBJNM | SUB OBJNM | VALUE |
---|---|---|---|---|---|---|---|
2014-08-30 22:07:14 00:00 | 86400 | System1 | System1 | gm:vmw | CPU_UTIL | GLOBAL | 500 |
2014-08-30 22:07:14 00:00 | 86400 | System1 | System1 | gm:vmw | CPU_ UTILMHZ | GLOBAL | 30 |
2014-08-30 22:07:14 00:00 | 86400 | System2 | System2 | gm:vmw | CPU_UTIL | GLOBAL | 600 |
2014-08-30 22:07:14 00:00 | 86400 | System2 | System2 | gm:vmw | CPU_ UTILMHZ | GLOBAL | 60 |
2014-08-30 22:07:14 | 86400 | System1 | System1 | gm:vmw | CPU_UTIL | GLOBAL | 1100 |
2014-08-30 22:07:14 | 86400 | System1 | System1 | gm:vmw | CPU_ UTILMHZ | GLOBAL | 80 |
2014-08-30 22:07:14 | 86400 | System2 | System2 | gm:vmw | CPU_UTIL | GLOBAL | 650 |
2014-08-30 22:07:14 | 86400 | System2 | System2 | gm:vmw | CPU_ UTILMHZ | GLOBAL | 10 |
Dataset conventions
The following conventions must be kept in mind while working with datasets:
- Timestamps must be in the YYYY-MM-DD HH:MI:SS format.
- Data for percentage metrics must be between 0 and 1.
- Data for the Duration column must be in seconds.
Comments
Log in or register to comment.