A dataset is an instance of the ETL::DataSet class, which provides a sequential-write, random-read matrix data structure.
A dataset is best understood as a collection of data presented in tabular form, where each column represents a particular variable and each row corresponds to one member of the dataset. Sequential-write means that rows can be added only at the end of the data structure; random-read means that you can read a row or a column at any position in the dataset.
Because a dataset can be very large, the DataSet object is not loaded entirely into memory. Instead, ETL saves the data to a temporary file on disk and restores it into memory when needed, which prevents data loss and keeps handling of the dataset efficient.
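The combination of append-only writes, reads at any position, and temporary-file storage can be sketched in a few lines of Python. This is only a conceptual model and not the ETL::DataSet implementation: the `SpillDataset` class, its method names, and the use of `pickle` for the on-disk format are all assumptions made for illustration.

```python
import pickle
import tempfile


class SpillDataset:
    """Hypothetical model: rows live in a temporary file, not in memory."""

    def __init__(self, columns):
        self.columns = list(columns)
        self._file = tempfile.TemporaryFile()  # deleted automatically on close
        self._offsets = []                     # byte offset of each stored row

    def add_row(self, values):
        # Sequential write: a new row can only be placed at the end of the file.
        self._file.seek(0, 2)
        self._offsets.append(self._file.tell())
        pickle.dump(list(values), self._file)

    def row(self, index):
        # Random read: jump directly to the recorded offset of the row.
        self._file.seek(self._offsets[index])
        return pickle.load(self._file)

    def __len__(self):
        return len(self._offsets)


ds = SpillDataset(["id", "name"])
ds.add_row([1, "alpha"])
ds.add_row([2, "beta"])
ds.add_row([3, "gamma"])
middle = ds.row(1)  # reads one row without loading the others
first = ds.row(0)
```

Only the list of offsets stays in memory, so in this model the memory footprint grows with the number of rows rather than with the size of the data.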
The ETL Datasets view in Integration Studio provides information about the structure of the dataset. The following figure shows a list of available datasets with their unique IDs, names, and other related information.
The ETL Datasets view
The structure of the preceding figure can be explained as follows:
|Icons to identify special columns|
In an ETL, every dataset contained in the result set must follow a standard, consistent structure. To achieve this, use the ETL::DefChecker object, which correctly initializes the dataset columns and creates the appropriate structure.
Populating a dataset comprises the following activities:
The following code illustrates how to create and initialize a
DataSet object corresponding to a standard dataset named
To add data to the dataset, call the addRow method explicitly. The parameter array in the call must correspond positionally to the column structure of the dataset. The following code illustrates this operation:
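As a language-neutral illustration of positional population, here is a minimal Python sketch. The `PositionalDataset` class and its `add_row` method are hypothetical stand-ins and not the ETL API; in particular, the length check is an assumption about sensible behavior, not a documented feature of addRow.

```python
class PositionalDataset:
    """Hypothetical stand-in for a dataset with a fixed column order."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add_row(self, values):
        # Values are matched to columns purely by position,
        # so their count has to equal the number of columns.
        if len(values) != len(self.columns):
            raise ValueError(
                f"expected {len(self.columns)} values, got {len(values)}"
            )
        self.rows.append(list(values))


ds = PositionalDataset(["id", "name", "amount"])
ds.add_row([1, "alpha", 10.5])  # id=1, name="alpha", amount=10.5

try:
    ds.add_row([2, "beta"])     # one value short, so it is rejected
    rejected = False
except ValueError:
    rejected = True
```

Because nothing but position ties a value to its column, swapping two values of compatible types would go unnoticed, which is the weakness that name-based population avoids.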
An effective way to add data to a dataset by column name is associative population. The advantage of this approach is that it does not depend on the structure of the dataset. The required steps are:
The following code illustrates the preceding steps:
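The idea behind associative population can be sketched in Python as follows. The helper `add_row_by_name`, its `default` parameter, and the choice to reject unknown column names are assumptions made for this example and do not describe the ETL API.

```python
def add_row_by_name(columns, rows, record, default=None):
    """Append a row built from a {column name: value} mapping.

    Hypothetical helper: columns missing from `record` receive `default`,
    and names that are not dataset columns are rejected.
    """
    unknown = set(record) - set(columns)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # Arrange the values in the dataset's own column order.
    rows.append([record.get(name, default) for name in columns])


columns = ["id", "name", "amount"]
rows = []
# The order of keys in the mapping does not matter.
add_row_by_name(columns, rows, {"amount": 10.5, "id": 1, "name": "alpha"})
# A column left out of the mapping is filled with the default.
add_row_by_name(columns, rows, {"name": "beta", "id": 2})
```

The caller never needs to know where a column sits, so in this model the calling code keeps working if columns are reordered or new ones are added.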
After you populate a dataset, you can read it using the following methods:
Row and column indices are zero-based: they range from 0 to one less than the number of rows or columns, respectively.
The following code illustrates the read operation:
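Reading by row or by column with zero-based indices can be modeled in Python as shown here. The functions `get_row` and `get_column` are hypothetical names chosen for the sketch and are not the dataset's actual read methods.

```python
def get_row(rows, index):
    """Return row `index`; valid indices run from 0 to len(rows) - 1."""
    if not 0 <= index < len(rows):
        raise IndexError(f"row index {index} out of range")
    return list(rows[index])


def get_column(rows, index):
    """Return column `index` as a list with one value per row."""
    width = len(rows[0]) if rows else 0
    if not 0 <= index < width:
        raise IndexError(f"column index {index} out of range")
    return [row[index] for row in rows]


rows = [[1, "alpha"], [2, "beta"], [3, "gamma"]]
last_row = get_row(rows, len(rows) - 1)  # highest valid row index
names = get_column(rows, 1)              # second column, across all rows
```

An index equal to the number of rows or columns is already out of range, which is the usual off-by-one mistake with zero-based indexing.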
To copy data from a source dataset to a target dataset according to the target dataset's structure, use the