Using ETL datasets
A dataset is an instance of the ETL::DataSet class, which provides a sequential-write, random-read matrix data structure.
It is best understood as a collection of data presented in tabular form, where each column represents a particular variable and each row corresponds to one member of the dataset. Sequential-write means that rows can be added only to the end of the data structure. Random-read means that you can read a row or a column at any position in the dataset.
A DataSet object is not loaded entirely into memory, because it can be very large. Instead, ETL saves the data to a temporary file on disk and later restores it into memory, which prevents data loss and keeps handling of the dataset efficient.
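The sequential-write, random-read behavior described above can be sketched with a small stand-in class. This is only an illustration under stated assumptions, not the ETL::DataSet implementation: the class name SketchTable, its methods, and the tab-separated temporary file layout are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT ETL::DataSet: rows are appended to a
# temporary file and located again later through their byte offsets.
package SketchTable;
use File::Temp qw(tempfile);
use Fcntl qw(:seek);

sub new {
    my ($class, @columns) = @_;
    my ($fh) = tempfile(UNLINK => 1);   # read/write handle, removed at exit
    return bless { cols => [@columns], fh => $fh, offsets => [] }, $class;
}

# Sequential write: a row can only be appended at the end of the file.
# Values must not contain tabs or newlines in this simplified layout.
sub add_row {
    my ($self, @values) = @_;
    my $fh = $self->{fh};
    seek($fh, 0, SEEK_END);
    push @{ $self->{offsets} }, tell($fh);
    print {$fh} join("\t", @values), "\n";
    return scalar @{ $self->{offsets} };   # number of rows stored so far
}

# Random read: any row can be fetched by its index.
sub get_row {
    my ($self, $idx) = @_;
    die "row index $idx out of range\n"
        if $idx < 0 || $idx > $#{ $self->{offsets} };
    my $fh = $self->{fh};
    seek($fh, $self->{offsets}[$idx], SEEK_SET);
    my $line = <$fh>;
    chomp $line;
    return split /\t/, $line, -1;
}

# Random read of a column: collect one field from every row.
sub get_column {
    my ($self, $col) = @_;
    return map { my @r = $self->get_row($_); $r[$col] }
           0 .. $#{ $self->{offsets} };
}

package main;

my $t = SketchTable->new(qw(A B));
$t->add_row(1, "alpha");
$t->add_row(2, "beta");

my @second = $t->get_row(1);      # read a row at an arbitrary position
my @col_b  = $t->get_column(1);   # read a whole column
print "@second\n";                # prints "2 beta"
print "@col_b\n";                 # prints "alpha beta"
```

Keeping only the row offsets in memory is one way to hold a large table on disk while still allowing reads at any position; how ETL manages its temporary file is not described in this topic.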
Populating the dataset
In an ETL, every dataset contained in the result set must have a standard, consistent structure. To achieve this, use the ETL::DefChecker object, which correctly initializes the dataset columns and creates the appropriate structure.
Populating a dataset comprises the following activities:
Creating and initializing a dataset
The following code illustrates how to create and initialize a DataSet object corresponding to a standard dataset named SYSGLB:
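# $res is assumed to be the ETL::DataSet object for SYSGLB and $conf the
# ETL configuration object; their creation is not shown in this excerpt.
my $dc = $conf->getDefChecker();   # Get the definition checker
$dc->initializeColumns($res);      # Initialize the standard columns of $res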
Adding data to a dataset
To add data to the dataset, call the addRow method explicitly. The array passed in the call must correspond positionally to the column structure of the dataset.
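Positional population can be sketched as follows. This is a self-contained illustration rather than a call into the ETL library: the column names, the @rows storage, and the add_row_positional helper are hypothetical, and the length check is an assumption about how a mismatch might be caught.

```perl
use strict;
use warnings;

# Hypothetical column layout; in an ETL the real structure is set up
# through ETL::DefChecker.
my @columns = qw(TS DS_SYSNM VALUE);
my @rows;   # stand-in for the dataset's row storage

# Stand-in for addRow: takes the values in column order and appends them.
sub add_row_positional {
    my (@values) = @_;
    die "expected " . @columns . " values, got " . @values . "\n"
        unless @values == @columns;
    push @rows, [@values];
    return scalar @rows;   # number of rows stored so far
}

# The first value goes to TS, the second to DS_SYSNM, the third to VALUE.
add_row_positional("2024-01-01 00:00:00", "host01", 42);

printf "%s = %s\n", $columns[1], $rows[0][1];   # prints "DS_SYSNM = host01"
```

Because the values are matched to columns purely by position, swapping two arguments silently stores them in the wrong columns, which is the dependency that associative population avoids.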
Associative population
Associative population is an effective way to add data to a dataset on the basis of column names. Its advantage is that it does not depend on the structure of the dataset. The steps are:
- Create a new and empty row in the dataset.
- Populate each column separately.
- Add the new row to the dataset.
The following code illustrates the preceding steps:
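# @row is the new, empty row created in the first step; the call that
# creates it is not shown in this excerpt.
$res->fillRow("A", 1, \@row);        # Populate column 'A'
$res->fillRow("B", "beta", \@row);   # Populate column 'B'
$res->addRow(@row);                  # Add the row, with 'A' and 'B' populated, to the dataset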
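The idea behind associative population can be sketched without the ETL library as a lookup from column name to column index. The helpers new_row and fill_row below are hypothetical stand-ins, and the three-column layout is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical column layout and a name-to-index lookup table.
my @columns = qw(A B C);
my %index_of;
@index_of{@columns} = 0 .. $#columns;

# Step 1: create an empty row with one undefined slot per column.
sub new_row { return (undef) x @columns }

# Step 2: store a value in the slot that belongs to the named column.
sub fill_row {
    my ($name, $value, $row) = @_;
    die "unknown column '$name'\n" unless exists $index_of{$name};
    $row->[ $index_of{$name} ] = $value;
    return;
}

my @row = new_row();
fill_row("B", "beta", \@row);   # order of the calls does not matter
fill_row("A", 1,      \@row);

# Step 3 would append @row to the dataset; here the row is only printed.
print join(",", map { defined $_ ? $_ : "undef" } @row), "\n";   # prints "1,beta,undef"
```

Columns that are never filled stay undefined in this sketch; how ETL treats unfilled columns is not stated in this topic.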
Reading the dataset
After you populate a dataset, you can read it using the following methods:
- getRow
- get
The following code illustrates how to read a dataset:
for (my $rowidx = 0; $rowidx < $res->size(); $rowidx++) {
    # getting the entire row as an array
    my @row = $res->getRow($rowidx);
    # reading associatively from the row array
    my $adata = $res->getByColName("a", @row);
    # or, equivalently, finding the column index
    my $colidx = $res->findColumn("a");
    # and then accessing positionally by (row, column)
    $adata = $res->get($rowidx, $colidx);
}
Copying data between datasets
To copy data from a source dataset to a target dataset according to the target dataset's structure, use the fillAll method.
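The signature of fillAll is not given in this topic, so the following is only a sketch of what copying according to the target's structure could look like. It assumes that columns are matched by name, that source columns missing from the target are dropped, and that target columns missing from the source stay undefined. All names in it are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical source dataset: its own column order plus an extra column X.
my @src_cols = qw(B A X);
my @src_rows = ( [ "beta", 1, "extra" ], [ "gamma", 2, "extra" ] );

# Hypothetical target structure: different order, and a column C that
# the source does not provide.
my @dst_cols = qw(A B C);

# Look up source positions by column name.
my %src_index;
@src_index{@src_cols} = 0 .. $#src_cols;

# Rebuild every source row in the target's column order.
my @dst_rows;
for my $src (@src_rows) {
    my @row;
    for my $name (@dst_cols) {
        push @row, exists $src_index{$name} ? $src->[ $src_index{$name} ] : undef;
    }
    push @dst_rows, \@row;
}

for my $row (@dst_rows) {
    print join(",", map { defined $_ ? $_ : "undef" } @$row), "\n";
}
# prints "1,beta,undef" then "2,gamma,undef"
```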