Data collection
Data collection schedule
Collectors extract data based on a schedule, which is known as the data collection schedule. This schedule indicates how often to run the extraction cycle.
For example, if you set the data collection schedule to 5 mins, the collector runs every 5 mins, as shown in the following image:
You can run the collector in one of the following ways:
- Constantly by specifying the schedule in minutes, hours, or days
- Periodically by specifying the schedule using a cron expression, which is a string consisting of five subexpressions (also called fields) that describe individual details of the schedule. These fields, separated by spaces, can contain any of the allowed values with various combinations of the allowed characters for that field. Cron expressions are useful when data becomes available late at the source, or when you want to avoid having all extractions query the source at the same time.
You can specify a cron expression in the following format: Minutes Hours Day of Month Month Day of Week
The following table shows the allowed values for different fields:
| Field | Allowed values |
|---|---|
| Minutes | 0-59 |
| Hours (24-hour clock format) | 0-23 |
| Day of Month | 1-31 |
| Month | 1-12 |
| Day of Week | 0-6 or Sun-Sat |
For example, if you specify 10 14 3 3 *, data is collected at 14:10 on the third day of March.
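The field ranges listed above can be checked with a few lines of code. The following is a minimal sketch, not the product's parser: the function name parse_simple_cron and its return format are hypothetical, and it handles only single numbers, day names, and the * wildcard (ranges, lists, and step values are left out).

```python
# Hypothetical helper for illustration; not part of the product.
# Field names and ranges are taken from the table of allowed values.
FIELDS = [
    ("Minutes", 0, 59),
    ("Hours", 0, 23),
    ("Day of Month", 1, 31),
    ("Month", 1, 12),
    ("Day of Week", 0, 6),
]
DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def parse_simple_cron(expression):
    """Split a five-field expression and check each plain value.

    Only single numbers, day names, and "*" are supported in this sketch.
    """
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    parsed = {}
    for (name, low, high), token in zip(FIELDS, parts):
        if token == "*":
            parsed[name] = None  # None stands for "any value"
            continue
        if name == "Day of Week" and token.lower() in DAY_NAMES:
            value = DAY_NAMES[token.lower()]
        else:
            value = int(token)  # raises ValueError for non-numeric tokens
        if not low <= value <= high:
            raise ValueError(f"{name} value {value} is outside {low}-{high}")
        parsed[name] = value
    return parsed


if __name__ == "__main__":
    # The example expression from the text: 14:10 on day 3 of month 3.
    print(parse_simple_cron("10 14 3 3 *"))
```

A value outside the allowed range, such as 60 in the Minutes field, raises a ValueError in this sketch.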
Data collection window
The data collection schedule alone is not sufficient to specify the range of time over which data should be extracted, because data sources typically have selection criteria such as "before this time" and "between these times". Another parameter, the data collection window, enables you to specify the range of time (window) over which data should be extracted. For example, if you set the data collection schedule to 5 mins (considering 00:32 as the current time) and the data collection window to 5 mins, data is extracted from 00:27 to 00:32, as shown in the following image:
We recommend that you align (or snap) the extraction time (shown as Now in the preceding image) to one of the major time intervals. Snapping makes it much easier during troubleshooting to see which extraction data sets correspond to which intervals of time. The major interval depends on the extraction cycle. For 15-minute extractions, the intervals would be 00:00, 00:15, 00:30, 00:45, and so on. For 1-minute extractions, they would be 00:00, 00:01, 00:02, 00:03, and so on, and for 5-minute extractions, they would be 00:00, 00:05, 00:10, and so on, as shown on the preceding timeline.
For example, if we align the extraction time to 00:30, the data time window would be 00:25 to 00:30, as shown in the following image:
In the preceding example, if an extraction cycle starts just after 00:30, BMC Helix Intelligent Integrations first snaps back to 00:30 and then applies the data collection window (5 mins) to arrive at a time window of 00:25 to 00:30. It then requests data for this time window from the source.
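The snapping arithmetic described above can be sketched as follows. This is an illustration only: the function name extraction_window is hypothetical, and the sketch assumes that major intervals are counted from midnight of the current day, which matches the 1, 5, and 15 minute examples in the text.

```python
from datetime import datetime, timedelta


def extraction_window(now, schedule, window):
    """Return (start, end) of the data time window for a run at `now`.

    Assumption: major intervals are multiples of `schedule` counted
    from midnight of the same day.
    """
    midnight = datetime(now.year, now.month, now.day)
    elapsed = now - midnight
    # Round down to the most recent major interval (the "snap").
    snapped = midnight + (elapsed // schedule) * schedule
    return snapped - window, snapped


if __name__ == "__main__":
    # A run that starts just after 00:30 with a 5-minute schedule and window.
    run_time = datetime(2024, 1, 1, 0, 30, 12)
    start, end = extraction_window(
        run_time, timedelta(minutes=5), timedelta(minutes=5)
    )
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```

For a run at 00:30:12, the sketch snaps to 00:30 and yields the window 00:25 to 00:30, as in the example above.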
Data latency
The data latency parameter lets you control how far back the query time bounds are shifted. This parameter is useful when the source is slow to make data available.
In the following example, data latency is set to approximately 2 mins, so an extraction that runs just after 00:30 uses the window 00:23 to 00:28.
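The effect of data latency on the query bounds can be written out as simple arithmetic. The sketch below is an assumption about how the three values combine, based only on the example above; the function name query_bounds and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def query_bounds(snapped_time, window, latency):
    """Shift the data time window back by `latency`.

    `snapped_time` is the extraction time after alignment to a major
    interval (for example, 00:30 for a run just after 00:30).
    """
    end = snapped_time - latency
    start = end - window
    return start, end


if __name__ == "__main__":
    start, end = query_bounds(
        datetime(2024, 1, 1, 0, 30),
        window=timedelta(minutes=5),
        latency=timedelta(minutes=2),
    )
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```

With a snapped time of 00:30, a 5-minute window, and a 2-minute latency, the bounds are 00:23 and 00:28.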
Subscription time
The subscription time indicates the duration for which a client (or subscriber) remains connected to a data source and collects data from the subscribed topic.
For example, if the data collection schedule is set to 5 minutes and the subscription time is set to 2 minutes, the client (or subscriber) connects to the specified topic every 5 minutes and collects data for 2 minutes each time.
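The resulting connect and disconnect times can be listed with a short sketch. It assumes that each connection starts at the beginning of a schedule cycle; the function name subscription_sessions and the count parameter are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta


def subscription_sessions(first_connect, schedule, subscription_time, count):
    """Return `count` (connect, disconnect) pairs.

    Assumption: the subscriber connects at the start of every cycle
    and stays connected for `subscription_time`.
    """
    sessions = []
    for i in range(count):
        connect = first_connect + i * schedule
        sessions.append((connect, connect + subscription_time))
    return sessions


if __name__ == "__main__":
    sessions = subscription_sessions(
        datetime(2024, 1, 1, 0, 0),
        schedule=timedelta(minutes=5),
        subscription_time=timedelta(minutes=2),
        count=3,
    )
    for connect, disconnect in sessions:
        print(connect.strftime("%H:%M"), "to", disconnect.strftime("%H:%M"))
```

With a 5-minute schedule and a 2-minute subscription time starting at 00:00, the first three sessions run from 00:00 to 00:02, 00:05 to 00:07, and 00:10 to 00:12.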