ETL data flow and the lastcounter value
Each time a connector runs, it needs to pick up the data from its data source where it left off the last time. If the data source is changing at the same time, then the connector, as the consumer, needs to track data flow through the data source that the upstream producer is populating.
If the data source is using time stamps or version numbers, then the connector needs to use these to keep track of which data it extracted the last time it ran. TrueSight Capacity Optimization provides the connector with a mechanism, the lastcounter
value, that the connector can store in TrueSight Capacity Optimization between runs.
If the data source is not exposing any time stamps, version numbers, or similar metadata, then the connector has no reliable way to detect changes. In this case, the connector must make assumptions based on knowledge about the data source. For example, if an external database is being updated with a daily feed, then the connector might assume that each daily run is extracting a new day's data. In this case, there is no need to use the lastcounter
mechanism.
About the lastcounter
value
The lastcounter
value is an arbitrary string that the connector can store in TrueSight Capacity Optimization after the extraction is complete. This string should denote the point in the data flow that the next run should start from. It could be a time stamp, or a sequence or record number, or a version number, that is meaningful to the connector.
TrueSight Capacity Optimization will store this string as part of the "status" of the connector, and the TrueSight Capacity Optimization administrator can view this string for each run, as follows.
Lastcounter status detail example
The above screen shot shows two examples of lastcounter
(numbered 3 in the figure), one that looks like a timestamp, and a second that looks like a sequence or record number. Along with the lastcounter
value, the connector can also store a relevant message
string that might be helpful to the administrator (numbered 2 in the figure).
Prior to beginning extraction, the connector should load the lastcounter
string from TrueSight Capacity Optimization and use its value to decide where to start extraction from in the data source.
Multiple data flows and table of lastcounter
values
Consider a connector that is extracting data from an external database. There may be multiple processes populating different sub-sets of data to that database, on different schedules.
A single connector can manage the data flow for each such sub-set of data separately,so that it can keep track of the data that was extracted the last time. The mechanism for this is a table of lastcounter
values that the connector can maintain within TrueSight Capacity Optimization.
The connector can identify each sub-setof data with a name, called a data flow, and track a separate status for each data flow with a separate lastcounter
value. The data flow, then, is the most granular part of data that the connector intends to manage in append mode. For example, if a particular ETL imports data from two different tables, it may want to remember the lastcounter
value separately for each one of them. The DSStatus
object allows you to specify a different lastcounter
parameter for each data flow.
In the Status detail example above, a single connector has two data flows, identified in the column SOURCE
(numbered 1 in the figure).
How to choose data flows
Consider an example where an ETL imports CPU data from a table into a database. This gives you the following choices for data flow identification for each monitored system:
Choices for data flow identification
Using a single data flow | Using separate data flows |
---|---|
Making a single query | Making a query for each system |
Setting the | Setting the |
If one system stops publishing while others still are, ETL cannot recover this non-working system's data. ETL continues to import data using the most recent flow as reference. | This choice is more granular, and thus more precise. If a single system stops publishing data and then restarts, the ETL isaware,and can recover this data. Although, this method turns out to be more expensive in terms of resource consumption, as it results in extensive query processing. |
How to store lastcounter
: the DSStatus
object
The DSStatus
object allows you to set the value of a lastcounter
parameter for each data flow. The connector chooses strings for identifying data flows, and for each data flow, it interprets the corresponding lastcounter
parameter.
After the very first connector execution, the connector should call Insert status
in order to create a new DSStatus
object.
In the subsequent executions, you can get the last counter and modify it inside the DSStaus
object using getLastCntr(<dataflowname>);
and setLastCntr(<dataflowname>);
After the data extraction, before exiting from the code, you have to save the updated last counter values by calling the updateStatus() method onDBConfobject. Here is an example:
conf.updateStatus(status);
When updateStatus()
is invoked, changes done via setLastCntr
and setMsg()
are stored into the database and become visible from the Status detail
page.
The following table lists some commonly used DSStatus
methods along with their descriptions:
Commonly used DSStatus
methods
Method | Description |
---|---|
status.getLastCntr(<dataflowname>); | Read the |
status.getMsg(<dataflowname>); | Read the message for a data flow |
status.setLastCntr(<dataflowname>,<lastcounter>); | Set the |
status.setMsg(<dataflowname>,<message>); | Set a human-readable message for a data flow |
String[] dataflownames = status.getLastCntrTables(); | List the data flows for which |
Comments
Log in or register to comment.