Collection Schedules, Extraction Windows and Latency General

Collectors extract data on a schedule. This schedule is known as the ‘Collection Schedule’ and is typically set when initially configuring a collector, or modified subsequently.

This tells the collector how often to run the extraction cycle. So if we set a Collection Schedule of 5min, we’d generally expect an extraction cycle to run every 5min, e.g. at 00:00, 00:05, 00:10 and so on.

However, a point in time like this is not sufficient to fully specify a range of time over which data is selected. The data sources typically have selection criteria like ‘before this time’ or ‘between these times’. SWP therefore determines a window of time within which data is selected. In other words, SWP requests that the source provide data within a specific window of time.

Starting from a point in time, there are a number of approaches or conventions which can be applied to determine an actual time window, as well as how to interpret the boundaries of that time window, e.g. include data with timestamps <= the end of the window. The time windows can also have different widths.

So, to describe the time window, a few conventions are employed. The first is the Data Time Window.

This is often the ‘width’ of the window, how much time it spans. Sometimes it can be considered the ‘period’ of the data e.g. 5min data, 15min data. Width alone is not sufficient to locate the Window on the time line. The location of a leading (or trailing) edge must be defined.

This is determined by using the ‘Collection Schedule’ time, i.e. the time the schedule is run (‘Now’). Given a time ‘Now’ and the Data Time Window, we can precisely locate the extraction window on the timeline: it ends at ‘Now’ and starts one Data Time Window earlier.

However, it proves to be much more useful in practice to first ‘align’ or ‘snap’ the extraction time to one of the ‘major’ time intervals (e.g. this makes it much easier to see which extraction data sets correspond to which intervals of time during troubleshooting). The ‘major’ interval depends on the extraction cycle. At 15min extractions, the intervals would be 00:00, 00:15, 00:30, 00:45 etc. At 1min they would be 00:00, 00:01, 00:02, 00:03 etc., and at 5min they’d be 00:00, 00:05, 00:10 etc.
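The ‘snap’ step can be sketched in a few lines of Python. This is only an illustration of the alignment idea, not SWP's actual implementation: the function name `snap_to_interval` is hypothetical, and it assumes the interval divides a day evenly and that intervals are counted from midnight.

```python
from datetime import datetime, timedelta


def snap_to_interval(now: datetime, interval: timedelta) -> datetime:
    """Round `now` down to the most recent 'major' interval boundary.

    Assumption (not from SWP): boundaries are counted from midnight of
    the same day, and `interval` divides 24 hours evenly.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = now - midnight
    # timedelta // timedelta gives the whole number of intervals elapsed.
    return midnight + (elapsed // interval) * interval


if __name__ == "__main__":
    run_time = datetime(2024, 1, 1, 0, 31, 12)  # 'Now', just after 00:31
    for minutes in (15, 5, 1):
        snapped = snap_to_interval(run_time, timedelta(minutes=minutes))
        print(f"{minutes}min schedule snaps to {snapped:%H:%M}")
```

With a run time of 00:31:12, a 15min or 5min schedule snaps back to 00:30, while a 1min schedule snaps back to 00:31.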

Using our example of 5min extraction and 5min Data Time Window we would have:

In the above example, if we started an extraction cycle just after 00:30, first SWP would ‘snap’ back to 00:30, and then use the Data Time Window (5min) to arrive at a time window of 00:25-00:30. It would request data for this time window from the source. Usually the queries sent to the sources are >= start time and < end time.
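As a rough sketch of the example above, the following Python computes the window for a 5min Collection Schedule and 5min Data Time Window, and applies the usual half-open bounds (>= start, < end). The names `extraction_window` and `in_window` are hypothetical and chosen for illustration; they are not SWP functions, and snapping from midnight is an assumption.

```python
from datetime import datetime, timedelta


def extraction_window(now, schedule, data_time_window):
    """Return (start, end) of the extraction window for a run at `now`.

    Assumption: the run time is snapped back to the nearest schedule
    boundary counted from midnight, and that boundary is the window end.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end = midnight + ((now - midnight) // schedule) * schedule
    start = end - data_time_window
    return start, end


def in_window(timestamp, start, end):
    """Half-open selection: start is included, end is excluded."""
    return start <= timestamp < end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 0, 30, 7)  # just after 00:30
    start, end = extraction_window(now, timedelta(minutes=5), timedelta(minutes=5))
    print(f"window: {start:%H:%M} - {end:%H:%M}")
    print(in_window(datetime(2024, 1, 1, 0, 25), start, end))  # start boundary
    print(in_window(datetime(2024, 1, 1, 0, 30), start, end))  # end boundary
```

Because the end bound is exclusive, a record stamped exactly 00:30 is left for the next cycle (00:30-00:35), so consecutive windows neither overlap nor leave gaps.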

One other extremely useful configuration parameter is also available for situations where data is slow or late to become available at the source, i.e. the Data Latency.

This lets you control how far back the query time bounds are shifted. For example, with the Data Latency set to ~2min, the extraction window would be 00:23 - 00:28 (when run from just after 00:30).
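The effect of Data Latency can be sketched by extending the window calculation with one more shift. This is a minimal illustration under stated assumptions, not SWP's code: `shifted_window` and its parameter names are hypothetical, and it assumes the latency is applied after the run time has been snapped to the schedule boundary.

```python
from datetime import datetime, timedelta


def shifted_window(now, schedule, data_time_window, data_latency):
    """Return (start, end) of the extraction window, shifted back by latency.

    Assumptions: snapping is counted from midnight, and both window
    bounds are moved back by the same `data_latency` after snapping.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    snapped = midnight + ((now - midnight) // schedule) * schedule
    end = snapped - data_latency
    start = end - data_time_window
    return start, end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 0, 30, 7)  # just after 00:30
    start, end = shifted_window(
        now,
        schedule=timedelta(minutes=5),
        data_time_window=timedelta(minutes=5),
        data_latency=timedelta(minutes=2),
    )
    print(f"window: {start:%H:%M} - {end:%H:%M}")
```

The window width stays at 5min; only its position on the timeline moves, which gives a slow source an extra 2min to make the data available before it is queried.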

So, in summary:

Collection Schedule controls how often data extractions are attempted
Data Time Window controls how much data to try to get (by time)
Data Latency controls how far back to ‘shift’ the time window

