Using ETL datasets
A dataset is an instance of the ETL::DataSet
class that has a sequential-write, and random-read matrix data structure.
It can be better understood as a collection of data presented in tabular form, where each column represents a particular variable, and each row corresponds to a given member of the dataset in question. Sequential-write means that rows can be added only to the end of the data structure, and random-read means you can read a row or a column present at any position in the dataset.
The Dataset object is not loaded entirely into memory, as it is very large in size. Instead, ETL uses a temporary file to save data to disk, and later restore it into memory to prevent data loss, and to ensure efficient handling of the dataset.
Populating the dataset
In an ETL, the structure of every dataset contained in the result set must be standard and similar. This can be achieved by using the ETL::DefChecker
object, that correctly initializes dataset columns and creates the appropriate structure.
Populating a dataset comprises the following activities:
Creating and initializing a dataset
The following code illustrates how to create and initialize a DataSet
object corresponding to a standard dataset named SYSGLB
:
$res = new ETL::DataSet("SYSGLB");
my $dc = $conf->getDefChecker();
$dc->initializeColumns($res);
Adding data to a dataset
To add data to the dataset, use the addRow
method explicitly. The parameter array in the call must positionally correspond to the column structure of the dataset. The following code illustrates this operation:
$res->addRow(1,"alpha");
Associative population
A rather effective way to add data to datasets on the basis of column names is associative population. This approach has an advantage in that it does not depend on the structure of the dataset. The necessary steps required to achieve this are:
- Create a new and empty row in the dataset.
- Populate each column separately.
- Add the new row to the dataset.
The following code illustrates the preceding steps:
my @row = $res->newRow(); # Create an empty row in the dataset #
$res->fillRow("A",1,\@row); # Populate column 'A' #
$res->fillRow("B","beta",\@row); # Populate column 'B' #
$res->addRow(@row); # Add rows 'A' and 'B' to the dataset #
Reading the dataset
After you populate a dataset, you can read it using the following methods:
getRow
get
Note
Row and column indices range from 0 to -1.
The following code explains this Read activity:
# a loop through DataSet rows
for (my $rowidx = 0; $rowidx<$res->size();$rowidx++) {
# getting the entire row as an array
my @row = $res->getRow($rowidx);
# reading associatively from the row array
my $adata = $res->getByColName("a",@row);
# or, equivalently finding the column index
my $colidx = $res->findColumn("a");
# and then accessing positionally by (row,column)
$adata = $res->get($rowidx,$colidx);
}
Copying data between datasets
To copy data from a source dataset to target as per the target dataset's structure, use the fillAll
method.
Comments
Log in or register to comment.