ETL data flow and the lastcounter value

Each time a connector runs, it needs to pick up the data from its data source where it left off the last time. If the data source is changing at the same time, then the connector, as the consumer, needs to track data flow through the data source that the upstream producer is populating. 

If the data source is using time stamps or version numbers, then the connector needs to use these to keep track of which data it extracted the last time it ran. TrueSight Capacity Optimization provides the connector with a mechanism, the lastcounter value, that the connector can store in TrueSight Capacity Optimization between runs.

If the data source is not exposing any time stamps, version numbers, or similar metadata, then the connector has no reliable way to detect changes. In this case, the connector must make assumptions based on knowledge about the data source. For example, if an external database is being updated with a daily feed, then the connector might assume that each daily run is extracting a new day's data. In this case, there is no need to use the lastcounter mechanism.

About the lastcounter value

The lastcounter value is an arbitrary string that the connector can store in TrueSight Capacity Optimization after the extraction is complete. This string should denote the point in the data flow that the next run should start from. It could be a time stamp, or a sequence or record number, or a version number, that is meaningful to the connector. 

TrueSight Capacity Optimization will store this string as part of the "status" of the connector, and the TrueSight Capacity Optimization administrator can view this string for each run, as follows.

Lastcounter status detail example

The above screen shot shows two examples of lastcounter (numbered 3 in the figure), one that looks like a timestamp, and a second that looks like a sequence or record number. Along with the lastcounter value, the connector can also store a relevant message string that might be helpful to the administrator (numbered 2 in the figure).

Prior to beginning extraction, the connector should load the lastcounter string from TrueSight Capacity Optimization and use its value to decide where to start extraction from in the data source.

Multiple data flows and table of lastcounter values

Consider a connector that is extracting data from an external database. There may be multiple processes populating different sub-sets of data to that database, on different schedules.

A single connector can manage the data flow for each such sub-set of data separately,so that it can keep track of the data that was extracted the last time. The mechanism for this is a table of lastcounter values that the connector can maintain within TrueSight Capacity Optimization.

The connector can identify each sub-setof data with a name, called a data flow, and track a separate status for each data flow with a separate lastcounter value. The data flow, then, is the most granular part of data that the connector intends to manage in append mode. For example, if a particular ETL imports data from two different tables, it may want to remember the lastcounter value separately for each one of them. The DSStatus object allows you to specify a different lastcounter parameter for each data flow.

In the Status detail example above, a single connector has two data flows, identified in the column SOURCE (numbered 1 in the figure).

How to choose data flows

Consider an example where an ETL imports CPU data from a table into a database. This gives you the following choices for data flow identification for each monitored system:

Choices for data flow identification

Using a single data flow

Using separate data flows

Making a single query

Making a query for each system

Setting the lastcounter parameter to the timestamp of the most recently extracted data

Setting the lastcounter parameter to the timestamp of the most recently extracted data for each system

If one system stops publishing while others still are, ETL cannot recover this non-working system's data. ETL continues to import data using the most recent flow as reference.
Although, a manual recovery of data for all systems can later be performed by modifying the lastcounter. A situation in which the alignment between flows is ensured by the source, this choice is preferred.

This choice is more granular, and thus more precise. If a single system stops publishing data and then restarts, the ETL isaware,and can recover this data. Although, this method turns out to be more expensive in terms of resource consumption, as it results in extensive query processing.

How to store lastcounter: the DSStatus object

The DSStatus object allows you to set the value of a lastcounter parameter for each data flow. The connector chooses strings for identifying data flows, and for each data flow, it interprets the corresponding lastcounter parameter.

After the very first connector execution, the connector should call Insert status in order to create a new DSStatus object.

In the subsequent executions, you can get the last counter and modify it inside the DSStaus object using getLastCntr(<dataflowname>); and setLastCntr(<dataflowname>);

Example - first time creation

//Get the current status
DSStatus status = conf.getStatus();
if (status == null){
	//create a new status
	status = new DSStatus();
	//sets initial lastcounter and message
	status.setLastCntr("MAINFLOW","1900-01-01 00:00:00");
	status.setMsg("MAINFLOW","New flow");
	//and insert it
	conf.insertStatus(status);
}

Example - get and set

String startpoint = status.getLastCntr("MAINFLOW");
//Perfrom data extraction here
status.setLastCntr("MAINFLOW", "Add here last extracted point");

After the data extraction, before exiting from the code, you have to save the updated last counter values by calling the updateStatus() method onDBConfobject. Here is an example:

conf.updateStatus(status);

When updateStatus() is invoked, changes done via setLastCntr and setMsg() are stored into the database and become visible from the Status detailpage.

The following table lists some commonly used DSStatus methods along with their descriptions:

Commonly used DSStatus methods

Method

Description

status.getLastCntr(<dataflowname>);

Read the lastcounter parameter for a data flow

status.getMsg(<dataflowname>);

Read the message for a data flow

status.setLastCntr(<dataflowname>,<lastcounter>);

Set the lastcounter parameter for a data flow

status.setMsg(<dataflowname>,<message>);

Set a human-readable message for a data flow

String[] dataflownames = status.getLastCntrTables();

List the data flows for which DSStatus has a valid lastcounter

Was this page helpful? Yes No Submitting... Thank you

Comments