ETL data flow and the lastcounter value


Each time a connector runs, it needs to pick up the data from its data source where it left off the last time. If the data source is changing at the same time, then the connector, as the consumer, needs to track data flow through the data source that the upstream producer is populating. 

If the data source is using time stamps or version numbers, then the connector needs to use these to keep track of which data it extracted the last time it ran. BMC Helix Continuous Optimization provides the connector with a mechanism, the lastcounter value, that the connector can store in BMC Helix Continuous Optimization between runs.

If the data source is not exposing any time stamps, version numbers, or similar metadata, then the connector has no reliable way to detect changes. In this case, the connector must make assumptions based on knowledge about the data source. For example, if an external database is being updated with a daily feed, then the connector might assume that each daily run is extracting a new day's data. In this case, there is no need to use the lastcounter mechanism.

About the lastcounter value

The lastcounter value is an arbitrary string that the connector can store in BMC Helix Continuous Optimization after the extraction is complete. This string should denote the point in the data flow that the next run should start from. It could be a time stamp, or a sequence or record number, or a version number, that is meaningful to the connector. 

BMC Helix Continuous Optimization stores this string as part of the "status" of the connector, and an administrator can view this string for each run, as follows.

Lastcounter status detail example

lastcounter_status_detail.PNG

 

The above screen shot shows two examples of lastcounter (numbered 3 in the figure), one that looks like a timestamp, and a second that looks like a sequence or record number. Along with the lastcounter value, the connector can also store a relevant message string that might be helpful to the administrator (numbered 2 in the figure).

Prior to beginning extraction, the connector should load the lastcounter string from BMC Helix Continuous Optimization and use its value to decide where to start extraction from in the data source.

Multiple data flows and table of lastcounter values

Consider a connector that is extracting data from an external database. There may be multiple processes populating different sub-sets of data to that database, on different schedules.

A single connector can manage the data flow for each such sub-set of data separately,so that it can keep track of the data that was extracted the last time. The mechanism for this is a table of lastcounter values that the connector can maintain within BMC Helix Continuous Optimization.

The connector can identify each sub-setof data with a name, called a data flow, and track a separate status for each data flow with a separate lastcounter value. The data flow, then, is the most granular part of data that the connector intends to manage in append mode. For example, if a particular ETL imports data from two different tables, it may want to remember the lastcounter value separately for each one of them. The DSStatus object allows you to specify a different lastcounter parameter for each data flow.

In the Status detail example above, a single connector has two data flows, identified in the column SOURCE (numbered 1 in the figure).

How to choose data flows

Consider an example where an ETL imports CPU data from a table into a database. This gives you the following choices for data flow identification for each monitored system:

Choices for data flow identification

How to store lastcounter: the DSStatus object

The DSStatus object allows you to set the value of a lastcounter parameter for each data flow. The connector chooses strings for identifying data flows, and for each data flow, it interprets the corresponding lastcounter parameter.

After the very first connector execution, the connector should call Insert status in order to create a new DSStatus object.

In the subsequent executions, you can get the last counter and modify it inside the DSStaus object using getLastCntr(<dataflowname>); and setLastCntr(<dataflowname>);

Information
Example - first time creation
//Get the current status
DSStatus status = conf.getStatus();
if (status == null){
//create a new status
status = new DSStatus();
//sets initial lastcounter and message
status.setLastCntr("MAINFLOW","1900-01-01 00:00:00");
status.setMsg("MAINFLOW","New flow");
//and insert it
conf.insertStatus(status);
}
Information
Example - get and set
String startpoint = status.getLastCntr("MAINFLOW");
//Perfrom data extraction here
status.setLastCntr("MAINFLOW", "Add here last extracted point");

After the data extraction, before exiting from the code, you have to save the updated last counter values by calling the updateStatus() method onDBConfobject. Here is an example:

conf.updateStatus(status);

When updateStatus() is invoked, changes done via setLastCntr and setMsg() are stored into the database and become visible from the Status detailpage.

The following table lists some commonly used DSStatus methods along with their descriptions:

Commonly used DSStatus methods

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*

BMC Helix Continuous Optimization 24.2