The ETL (Extract, Transform and Load) process is implemented by different modules that run on top of a common engine framework (see ETL development API constructs for details). As shown in the diagram, the data import process is divided into three phases:
During the loading phase, the ETL engine stores data in the stage area. The warehousing engine then completes the job by processing the data and aggregating it at the standard time aggregation levels.
It is important to note that "before" and "after" commands are part of the import process but have nothing to do with the data itself: they only support the smooth execution of the process. These commands can be written in any language, such as shell script, Java, or C#.
An action is an operation that can be performed at any step of the import process and that generally deals with data manipulation.
The modularity of the process allows you to change many of its settings and modules by only editing the configuration, without writing a single line of code.
The following diagram summarizes the standard ETL workflow:
Sometimes data sources adhere to a naming convention that is different from the one used by BMC TrueSight Capacity Optimization. This makes it necessary to translate names according to the naming policy of the data warehouse.
You can configure ETL-specific lookup tables that set, for each entity, the translation from the name used by the ETL task (ds_sysnm or ds_wklnm metrics) to the identifier used in the data warehouse (sysnm or wklnm). This process decouples data source naming from data warehouse (DWH) naming, and each ETL has its own lookup table.
The lookup tables are automatically populated by ETL tasks when they find new objects in the source data: a new entry is created for each new object, with identical data warehouse and data source names. You can manually modify the DWH name to make it comply with the BMC TrueSight Capacity Optimization naming policies.
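The auto-population behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the `LookupTable` class, its `resolve` and `rename` methods, and the sample entity names are all hypothetical, and a plain in-memory dictionary stands in for the real lookup table.

```python
class LookupTable:
    """Hypothetical in-memory stand-in for an ETL-specific lookup table."""

    def __init__(self):
        # Maps the data source name (ds_sysnm) to the DWH name (sysnm).
        self._entries = {}

    def resolve(self, ds_name):
        """Return the DWH name for a data source name.

        A new object gets an entry whose DWH name is identical to its
        data source name, mirroring the automatic population step.
        """
        return self._entries.setdefault(ds_name, ds_name)

    def rename(self, ds_name, dwh_name):
        """Manually change the DWH name of an existing entry."""
        if ds_name not in self._entries:
            raise KeyError(f"no lookup entry for {ds_name!r}")
        self._entries[ds_name] = dwh_name


lookup = LookupTable()
first = lookup.resolve("srv01.example.com")   # new object, entry is created
lookup.rename("srv01.example.com", "SRV01")   # manual edit to match a naming policy
second = lookup.resolve("srv01.example.com")  # later runs use the edited name
```

Because later runs go through the same entry, a manual rename takes effect without any change to the data source.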
BMC TrueSight Capacity Optimization has no built-in naming policy. You can set your own naming policy and then ensure that each new entity name complies with it.
It is important to spot potential naming conflicts when creating new ETL tasks.
If the measured entities are new, the ETL automatically proposes a name that you can manually modify. On the other hand, if the entities are already measured by another ETL, they are already associated with a name in the data warehouse.
If you detect this situation before the first run of the ETL task, you can perform a manual or shared lookup to solve the issue; this is the recommended solution. If you detect this situation only after running the ETL task, you must perform the reconciliation process.
To avoid naming conflicts, remember to run a newly configured ETL task in simulation mode and check the extracted data before actually importing it into the BMC TrueSight Capacity Optimization Data Warehouse. Lookup reconciliation is possible but discouraged; you should always try to fix every lookup issue before running the ETL task for the first time.
When running ETL tasks, only new data must be loaded into the warehouse; thus, a way to mark imported data is needed.
Each ETL has an associated parameter called lastcounter which keeps track of the last imported samples. This counter is updated after every run and, at the start of the next one, the extractor module reads it to distinguish between old and new data.
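As a rough illustration of this mechanism, the following Python sketch filters samples against a timestamp-based counter. It only shows the idea: the function name `extract_new_samples`, the `(timestamp, value)` record shape, and the use of `None` for a first run are assumptions made for the example, not part of the product.

```python
from datetime import datetime


def extract_new_samples(samples, lastcounter):
    """Return the samples newer than lastcounter and the updated counter.

    samples: list of (timestamp, value) tuples (hypothetical record shape).
    lastcounter: datetime of the last imported sample, or None on the first run.
    """
    new = [s for s in samples if lastcounter is None or s[0] > lastcounter]
    if new:
        # Advance the counter to the most recent sample just imported.
        lastcounter = max(ts for ts, _ in new)
    return new, lastcounter


source = [
    (datetime(2024, 1, 1, 0, 0), 10.0),
    (datetime(2024, 1, 1, 1, 0), 12.5),
]

# First run: no counter yet, so every sample is new.
run1, counter = extract_new_samples(source, None)

# The source grows by one sample before the next run.
source.append((datetime(2024, 1, 1, 2, 0), 11.0))

# Second run: only the sample after the stored counter is extracted.
run2, counter = extract_new_samples(source, counter)
```

If a run finds nothing new, the counter is left unchanged, so the next run starts from the same point.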
Because ETL extractors deal with both databases and files, the lastcounter logic differs by source type. It can be a timestamp, as is usual for ETL tasks that collect data from databases, or a regular expression that prevents the ETL task from parsing files whose extension indicates that they have already been examined. In particular: