Collecting data into the system
The easiest way of collecting data is to create a data collector from the Administration > Data Collectors tab. Data collectors collect data and send it to the Indexer for indexing. When you perform a search, the indexed data is made available as a series of individual records (or search results).
When creating a data collector, you need to provide various inputs such as details of the target host from where you plan to collect data, and other parameters such as the data pattern (matching the data that you want to collect), the date format to use for indexing the date and time string, the file encoding to use, and other advanced (but optional) settings. Before you create a data collector, you need to understand the kind of data that you want to collect and collate all the inputs required for creating the particular data collector type.
The following information can help you understand the data collector creation process and associated best practices.
Preparing for data collector creation
As a best practice, before you start creating data collectors, you must create a. If you plan to configure a large number of data collectors for multiple applications, plan for your needs from various aspects to help you save time and make your data-collection process as efficient as possible. You can start by creating a list of sources of all the data that you want to collect and then fill in key properties about each data source. For example, the attached spreadsheet contains a list of sample inputs required while creating data collectors.
Icons and associated functions on the Data Collectors tab
The Data Collectors tab allows you to manage data collectors. To access this tab, navigate to Administration > Data Collectors.
The Data Collectors tab displays a default data collector that collects data available in the Collection_metrics.log file. The Dashboards tab displays a graph summarizing the data collected by this data collector. For more information, see Creating and managing dashboards. For more information about collecting data from the product metrics files, see Monitoring the product metric files.
You can perform the following actions on the Data Collectors tab.
|Add Data Collector|
Add a new data collector.
The kind of data that you want to collect determines the type of data collector. The inputs required for creating a data collector vary based on the data collector type. For more information, see Creating data collectors.
Best practice: Add new data collectors incrementally, and follow each addition with a simple validation phase. As a part of the validation, you can check whether data collection has started and whether you can perform searches.
Configuring data collectors incrementally has the following advantages:
|Edit Data Collector |
Edit the selected data collector.
You can modify the same details that you provided while adding a data collector.
Note: You cannot modify a data collector if the data collector is of the Upload File type.
|View Data Collector ||View details of the selected data collector|
|Delete Data Collector |
Delete the selected data collectors. Optionally, select the Delete data for this Data Collector check box if you want to delete all the data collected by that data collector so that it is no longer available for searching. Click OK to confirm your action.
Note: There might be some residual data remaining in the system that was still being collected when you decided to delete the data collected by the data collector. Such data is deleted from the system when the data retention period is over.
|Clone Data Collector|
Make a copy of the selected data collector.
Collection Status History
(Last 10 polls status)
View the individual status of the last ten polls for a data collector. For more information, see Understanding the data collection status.
To refresh the list displayed, click Refresh Collection Status History. Note that this feature is not supported for the Upload File type of data collector because this data collector is meant for one-time collection of data.
|Start Data Collector(s)|
Start the selected data collectors.
This action is not relevant for the Upload File type of data collector because it performs one-time data collection after the data collector is created.
|Stop Data Collector(s)|
Stop the selected data collectors that are already started.
|Refresh Data Collector List||Manually refresh the list of data collectors to see the latest poll status and any other updates made to the data collectors.|
|Change Maximum Data Retention Period|
Change the maximum data retention period (in days) for the selected data collectors.
Use the following options to set the data retention period:
By default, the upper limit for changing the data retention period is set to 14 days. This upper limit is available at. You can customize this value by modifying the following property value:
After changing the property value, you need to restart the Search component to apply the change.
For more information about how data retention works, see Understanding data retention and deletion.
In the search bar, at the top-right side of your screen, you can filter data collectors based on the following columns:
To filter data collectors, you can specify one of the preceding names (except tags) either fully or partially. Each time you search, the data collectors are filtered based on values found in one or more of the preceding columns.
To filter data collectors by tags, you need to specify the tag name and value in the format, TagName=TagValue. You can also specify a comma-separated list of tag name=value pairs.
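The TagName=TagValue format described above can be illustrated with a small sketch. The helper names below are hypothetical and are not part of the product; the code only shows how a comma-separated list of pairs might be composed and checked before you type it into the search bar. How the product combines multiple pairs when filtering is not covered here.

```python
from typing import Dict


def build_tag_filter(tags: Dict[str, str]) -> str:
    """Join tag names and values into 'Name=Value,Name2=Value2' (hypothetical helper)."""
    return ",".join(f"{name}={value}" for name, value in tags.items())


def parse_tag_filter(text: str) -> Dict[str, str]:
    """Split a comma-separated list of TagName=TagValue pairs into a dict."""
    tags: Dict[str, str] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected TagName=TagValue, got {pair!r}")
        tags[name.strip()] = value.strip()
    return tags


# Example tag names and values are made up for illustration.
print(build_tag_filter({"Environment": "Production", "Application": "Payroll"}))
```

Parsing the string back with parse_tag_filter is a quick way to catch a pair that is missing its equals sign.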
In addition, you can filter data collectors in the following ways:
The Data Collectors tab provides the following information:
|Show Data Collected|
Click Show Data Collected next to one of the data collectors, in the last column on the right, to search the data collected by that data collector.
When you click Show Data Collected, by default the search is run for the search string, COLLECTOR_NAME="DataCollectorName".
In the preceding search string, DataCollectorName refers to the name of the data collector.
Name of the data collector.
|Path||File path of the data file used while creating the data collector.|
|Host||Host name of the server on which the data exists.|
List of tag names with their corresponding values added to the data collector.
Overall polling status for the data collector, as follows:
For more information, see Understanding the data collection status.
Displays the data collector state, to indicate whether the data collector was started or stopped, as follows:
Note: The start and stop actions are not relevant for the Upload File type of data collector because it performs one-time collection. Therefore, for this data collector, the State column displays a dash (-).
Date and time when the data collector was modified.
If you did not modify the data collector, then the date and time when the data collector was created is displayed.
|Last Event Timestamp||Date and time of the last event that got indexed by the data collector.|
|Type||Type of the data collector.|
Data pattern used for creating the data collector.
For more information about assigning a data pattern, see Assigning the data pattern and date format to a data collector.
Best practices for using common data collector settings
The following table lists the best practices that you can apply while using the common data collector settings (at the time of creating a data collector).
Use a consistent data-collector naming convention. Doing this can help you accomplish the following tasks:
Retain the default poll interval unless you have a special reason for changing it (such as running one of the script collectors only once every hour).
Polling every 5 minutes (versus every 1 minute) does not make any noticeable difference in performance.
|Filename/ rollover pattern|
If the current log file being written to consistently uses the same name, provide the exact log file name.
If the current log file name is consistently changing, use a match pattern (such as error.*.log). However, it is important to ensure that your pattern matches only the current log file, to avoid processing extra data that might not be intended.
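Before you save a match pattern, it can help to check which files it would pick up. The following is a minimal sketch that assumes glob-style semantics, where the asterisk matches any run of characters; the product might interpret patterns differently, and the directory listing is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_files(pattern: str, file_names: List[str]) -> List[str]:
    """Return the file names that the pattern matches, in input order."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical directory listing used only for illustration.
log_dir = [
    "error.2014-06-09.log",
    "error.2014-06-10.log",
    "error.1.log.gz",
    "access.log",
]

matches = matching_files("error.*.log", log_dir)
if len(matches) > 1:
    # More than one match means rolled-over files would be processed too.
    print("Pattern matches more than one file:", matches)
```

In this listing the pattern matches both dated files, which signals that it is too broad if only the current log file should be collected.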
If you are not enforcing access control on your data collectors, disable the data access control setting.
Instead of selecting every access group for every data collector, it is more efficient to disable the data access control setting.
Do not set a time zone explicitly for a log file whose time stamps already contain time zone information. If a time zone is not set explicitly and the log file does not contain time zone information, the IT Data Analytics server time zone is used when converting the date and time to UTC.
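The fallback to the server time zone can be sketched as follows. This is an illustration only, not the product's implementation: the function name and the UTC-05:00 server offset are assumptions, and the sketch also assumes that a time zone found in the time stamp itself takes precedence over any other setting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the server runs at UTC-05:00. A fixed offset keeps the sketch self-contained.
SERVER_TZ = timezone(timedelta(hours=-5))


def to_utc(parsed: datetime,
           collector_tz: Optional[timezone] = None,
           server_tz: timezone = SERVER_TZ) -> datetime:
    """Convert a parsed time stamp to UTC.

    Zone used, in order: the zone in the time stamp itself (assumed precedence),
    the zone set explicitly on the data collector, then the server zone.
    """
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=collector_tz or server_tz)
    return parsed.astimezone(timezone.utc)


# A time stamp without zone information is interpreted in the server time zone.
print(to_utc(datetime(2014, 6, 10, 2, 45)))
```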
Note: If the data you are collecting has a time stamp that is more than 24 hours in the future, that data is not indexed. Therefore, you must ensure that the time settings on the target hosts and the collection hosts are set up correctly and are synchronized.
If your data file does not have a reference to the year in the time stamp (as in Syslog files, for example), and at the time of indexing the product detects that the time at which the data occurred is ahead of the current time, the product assumes that this data is from the previous year. Such an instance might occur if the time settings for the target host and the collection host time are not synchronized. Based on the maximum data retention period (if set in days), such data might not be indexed.
Example: If the product server date and time are set to June 10, 2014 2:45 AM, and the events received have a date and time stamp of July 10, 3:45 AM, the product assumes that the year in which the data occurred is 2013 (the previous year). If the data retention period in the product is set to 15 days, this data is not indexed, because the time at which the data occurred is outside of the maximum data-retention period.
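The arithmetic in the preceding example can be reproduced with a short sketch. The function names are hypothetical, and the logic is a simplified reading of the rule described above, not the product's actual code.

```python
from datetime import datetime, timedelta


def resolve_event_time(month: int, day: int, hour: int, minute: int,
                       now: datetime) -> datetime:
    """Assign a year to a time stamp that has none.

    The current year is tried first; if the result is ahead of the current
    time, the previous year is assumed. February 29 is not handled here.
    """
    event = datetime(now.year, month, day, hour, minute)
    if event > now:
        event = event.replace(year=now.year - 1)
    return event


def within_retention(event: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if the event is no older than the retention period."""
    return now - event <= timedelta(days=retention_days)


# Values taken from the example above.
server_now = datetime(2014, 6, 10, 2, 45)
event_time = resolve_event_time(7, 10, 3, 45, server_now)
print(event_time.year, within_retention(event_time, server_now, 15))
```

With the values from the example, the event is assigned to 2013 and falls outside the 15-day retention period, so it would not be indexed.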
Scenarios when data is not collected
The following table provides scenarios in which data might not be collected and therefore is not searchable.
|When a data collector is stopped and then started.||The time for which the data collector remains down.|
|When a new Collection Station is added to the pool.||This involves the restart of all the Collection Agents. Data is not collected for the time taken by the Collection Agents to restart.|
|When the configuremasters CLI command is run.||This involves the restart of the Collection Station. Data is not collected for the time taken by the Collection Station to restart.|
|When the movecomponents CLI command is run.||This involves the restart of all the Collection Agents. Data is not collected for the time taken by the Collection Agents to restart.|
|When the data collector is started, past data is not collected.|
|If the Payload channel is full.||This can occur due to a number of reasons. For example, if the Collection Station is not reachable or there is a sudden burst in incoming data.|