Managing data collectors

The data collector collects data and sends it to the Indexer for indexing. When you perform a search, the indexed data is made available as a series of individual records (or search results). You can also view these search results in the form of charts. For more information, see search results.

This topic covers the following aspects of the data collection process:

What is a data collector?

Data collectors are responsible for collecting your data and sending it to the Indexer, which indexes it and makes it available for search.

Data collectors contain the inputs needed to connect to the target host (where the data resides), as well as other inputs such as the data pattern and, if you are collecting rolling logs, the rollover pattern. You create data collectors on the Administration > Data Collectors tab. Several types of data collector are available, and you must create a data collector of the type that matches the data you want to collect. For example, to collect Windows events, you must create a data collector of the type Monitor Windows events.

What kind of data can I collect?

You can collect the following kinds of data:

Any kind of machine data, such as logs and events from servers and from applications (including web servers and databases)
Historical data and data generated continuously

Note

You cannot collect data whose timestamp contains non-English characters.


When you create a data collector, at a minimum, you need to specify information about:

Your data source (for example, target server where the data is located and file location)
How you want to index the data (for example, data pattern to use)
How frequently you want to collect the data (for example, poll interval)
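The minimum inputs listed above can be pictured as a simple configuration record. The following Python sketch is purely illustrative: the class and attribute names (DataCollectorConfig, target_host, file_path, data_pattern, poll_interval_minutes) and the example values are hypothetical and do not represent the product's API or configuration format.

```python
from dataclasses import dataclass


@dataclass
class DataCollectorConfig:
    """Hypothetical model of the minimum inputs a data collector needs."""

    name: str
    target_host: str            # data source: server where the data is located
    file_path: str              # data source: location of the file on that server
    data_pattern: str           # how to index the data (name of a data pattern)
    poll_interval_minutes: int  # how frequently to collect the data

    def __post_init__(self) -> None:
        # Reject configurations that leave out any of the minimum inputs.
        for attr in ("name", "target_host", "file_path", "data_pattern"):
            if not getattr(self, attr).strip():
                raise ValueError(f"{attr} must not be empty")
        if self.poll_interval_minutes <= 0:
            raise ValueError("poll_interval_minutes must be positive")


# Example usage (all values are hypothetical).
web_logs = DataCollectorConfig(
    name="web-server-logs",
    target_host="webhost01.example.com",
    file_path="/var/log/httpd/access_log",
    data_pattern="Apache access log",
    poll_interval_minutes=1,
)
```

Validating in `__post_init__` mirrors the idea that a data collector is not usable until all three kinds of information are supplied.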

The Indexer uses this information to index the data and make it available as events that you can search immediately. If the data is not indexed the way you require, you can modify the data pattern and check whether the results then match your criteria.

Where is my data?

The data that you want to collect can be on the same computer on which the Collection Station (or Collection Agent) is installed (local data), or it can be on a different computer (remote data). You can collect data remotely by creating an SSH connection or connecting to a shared network drive on a Windows computer.

For more information, see Agent-types.

Which Agent type should I use?

You can collect data by using one of the following collection mechanisms:

Collection Station – An entity that is automatically installed with the product and is responsible for collecting data and providing it to the Indexer for further processing. The Collection Station serves a dual role: it collects data, and it acts as a proxy (or receiver) for Collection Agents that forward data to the IT Data Analytics server.
Collection Agent – Another entity that can be used for collecting data. With respect to data collection, the Collection Agent serves a role similar to that of the Collection Station, but it is designed to run on the remote target hosts from which you want to collect data.

There are two types of Collection Agent: the standalone Collection Agent and the Collection Agent available for the PATROL infrastructure.
Before you can use a Collection Agent for data collection, you must set it up. For more information, see Setting-up-Collection-Agents.

To understand how to choose a data collection mechanism for your environment, see Agent-types.

Data collection process

At the time of data collection, the product automatically extracts certain information from the data, such as the timestamp and fields. However, most of the extraction is driven by the data pattern used for collecting the data. Data patterns are objects that define how data is extracted, organized, and categorized. Based on the data pattern, the data collector collects data and makes it available as a series of individual records that you can search. For more information, see Managing-data-patterns.
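Conceptually, a data pattern describes how raw text is divided into records and which parts of each record become fields. The following Python sketch uses an ordinary regular expression with named groups to stand in for a data pattern. The log format, the pattern, and the function name apply_pattern are hypothetical; they do not reflect how the product actually defines or applies data patterns.

```python
import re
from typing import Dict, List

# Hypothetical "data pattern": each record starts with a timestamp,
# followed by a severity and a free-text message.
DATA_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<severity>[A-Z]+) "
    r"(?P<message>.*)$"
)


def apply_pattern(raw: str) -> List[Dict[str, str]]:
    """Split raw log text into records and extract named fields from each."""
    records: List[Dict[str, str]] = []
    for line in raw.splitlines():
        match = DATA_PATTERN.match(line)
        if match:
            # A line that matches the pattern starts a new record.
            records.append(match.groupdict())
        elif records:
            # A non-matching line continues the previous record
            # (for example, a stack trace). Lines before the first
            # record are dropped in this simplified sketch.
            records[-1]["message"] += "\n" + line
    return records


raw_text = (
    "2024-01-15 10:00:00 ERROR Disk full\n"
    "  at volume /data\n"
    "2024-01-15 10:00:05 INFO Retry scheduled"
)
for record in apply_pattern(raw_text):
    print(record["timestamp"], record["severity"], repr(record["message"]))
```

The multi-line handling illustrates why the pattern matters: the same text yields different records depending on which lines the pattern treats as record boundaries.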

Fields are name=value pairs, such as HOST=clm.bmc.com, that add meaning to the data and help you search more effectively. They help you classify portions of your data that might otherwise go unnoticed. Some fields, such as the timestamp and name=value pairs already present in the data, are extracted automatically by the product; others are defined by the data pattern. Fields act as the building blocks for running search commands and creating dashboards. You can also define tags to group or categorize fields with similar values. For example, you can add a tag called Location with the values Houston, San Jose, and California. You can then add these tags to your search queries to refine your results. Tags can be added when you create a data collector. For more information about fields and tags, see Understanding-fields.
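To make the distinction between extracted fields and collector-level tags concrete, the following Python sketch pulls name=value pairs out of a record and attaches tags to it. The extraction rule (a word, "=", then non-space characters), the record layout, and the function names are hypothetical simplifications; the product's actual extraction rules may differ, for example for quoted values.

```python
import re
from typing import Any, Dict, List

# Hypothetical extraction rule: a word, "=", then a run of non-space characters.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(text: str) -> Dict[str, str]:
    """Return the name=value pairs found in a record's text."""
    return dict(KV_PATTERN.findall(text))


def build_record(text: str, tags: Dict[str, str]) -> Dict[str, Any]:
    """Combine raw text, extracted fields, and tags defined on the collector."""
    return {"raw": text, "fields": extract_fields(text), "tags": dict(tags)}


def filter_by_tag(records: List[Dict[str, Any]], name: str, value: str) -> List[Dict[str, Any]]:
    """Keep only records carrying the given tag value, as a tag in a query would."""
    return [r for r in records if r["tags"].get(name) == value]


houston = build_record("user=jsmith HOST=clm.bmc.com status=200", {"Location": "Houston"})
san_jose = build_record("HOST=other.example.com status=500", {"Location": "San Jose"})
print(houston["fields"])
print(len(filter_by_tag([houston, san_jose], "Location", "Houston")))
```

Fields come from the data itself, while tags are attached to every record a given data collector produces, which is why a tag such as Location can narrow a search even when no such field appears in the raw text.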

You can collect data for one-time or continuous monitoring, depending on the type of data collector and the poll interval defined in it. For more information, see Data collector types.

Data collector types

The following table lists the supported data collector types categorized on the basis of the data sources and whether you want to perform local or remote collection:

Which functions are supported while creating data collectors?

Various functions are available while you create a data collector. For example, you can add tags that help you search data more effectively, and you can assign group permissions so that the collected data is accessible only to particular user groups.
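The effect of group permissions on search can be sketched as a set intersection. In this hypothetical Python model, each record carries an allowed_groups set copied from its data collector's permissions, and a user sees a record only if they belong to at least one of those groups. The field name, group names, and function name are assumptions for illustration, not the product's access-control implementation.

```python
from typing import Any, Dict, List, Set


def visible_records(records: List[Dict[str, Any]], user_groups: Set[str]) -> List[Dict[str, Any]]:
    """Return only the records whose collector grants access to one of the user's groups.

    Each record is assumed to carry an "allowed_groups" set taken from the
    group permissions of the data collector that produced it (hypothetical).
    """
    return [r for r in records if r["allowed_groups"] & user_groups]


records = [
    {"id": 1, "allowed_groups": {"ops"}},
    {"id": 2, "allowed_groups": {"ops", "dba"}},
    {"id": 3, "allowed_groups": {"dba"}},
]
print([r["id"] for r in visible_records(records, {"ops"})])
```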

These functions might vary depending on the data collector type that you select.

For more information, see Functions-available-while-creating-data-collectors.

How do I know the data collection status?

You can determine the data collection status from the following information available on the Administration > Data Collectors page.

For more information, see Data-collection-status.

Data retention and deletion

After creating a data collector, data collection starts when the first poll happens.

Suppose you want to monitor a file to which data is continuously added. Data collection starts at the point when the first poll happens, and all data already present in the file before then is ignored. How long collected data stays in the system is governed by the data retention period, which defines the maximum number of days in the past, counted from the current date, for which data is retained. By default, the data retention period is seven days. To change it, navigate to Administration > System Settings.
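Skipping existing content and collecting only what is appended later resembles tailing a file. The following Python sketch models that behavior with a byte offset recorded at the first poll, which also matches the later note that data is fetched only from the second poll onward. The FileTailer class is hypothetical; it ignores rollover and partially written lines, and it is not how the product is implemented.

```python
import os
from typing import List, Optional


class FileTailer:
    """Hypothetical poller that skips content present before the first poll."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Byte position read up to; unset until the first poll happens.
        self.offset: Optional[int] = None

    def poll(self) -> List[str]:
        if self.offset is None:
            # First poll: record the current end of file and return nothing,
            # so data already in the file is ignored.
            self.offset = os.path.getsize(self.path)
            return []
        # Later polls: read only the bytes appended since the previous poll.
        # Simplification: assumes whole UTF-8 lines and no file rollover.
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            new_bytes = handle.read()
        self.offset += len(new_bytes)
        return new_bytes.decode("utf-8").splitlines()
```

For example, if a file already holds two lines when the tailer is created, the first call to poll() returns an empty list; after another line is appended, the next call returns only that new line.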

The data retention period acts as a moving window (depicted in green in the following figure).

On the time scale shown in the following figure, suppose you create a data collector at time T1. Data collection starts at T1, when the first poll happens, and data collected at T1 remains in the system until T1 + 7 days. As time passes, data older than the seven-day period is deleted and is no longer available for searching.

Figure: Data retention period as a moving window
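The moving window can be expressed as a simple date comparison. This Python sketch assumes a retention period in whole days (seven by default, as stated above) and hypothetical record timestamps. It shows which records would still be searchable at a given time; it does not describe how or when the product actually performs deletion.

```python
from datetime import datetime, timedelta
from typing import List

DEFAULT_RETENTION_DAYS = 7  # product default stated in this topic


def retained(timestamps: List[datetime], now: datetime,
             retention_days: int = DEFAULT_RETENTION_DAYS) -> List[datetime]:
    """Return the timestamps that fall inside the retention window ending at `now`."""
    cutoff = now - timedelta(days=retention_days)
    # Anything older than the cutoff is outside the window and would be deleted.
    return [ts for ts in timestamps if ts >= cutoff]


t1 = datetime(2024, 1, 1)  # hypothetical time T1 when the collector starts
collected = [t1, t1 + timedelta(days=3)]
print(retained(collected, now=t1 + timedelta(days=7)))  # both still retained
print(retained(collected, now=t1 + timedelta(days=8)))  # data from T1 has aged out
```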

The data retention period also affects the Read from Past (# days) setting, which defines how far back from the current time data can be collected. This setting is available for the following data collectors:

Upload-file used for uploading a file into the system and indexing all the data present in the file.
Monitor-remote-Windows-events used for collecting events generated by a Windows computer.
Monitor-using-external-configuration used for collecting event data from external systems integrated with the product.
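One way to reason about this interaction: data older than the retention period would be deleted shortly after it is collected, so reading further back than the retention period gains nothing. The following Python sketch assumes that the useful look-back is the smaller of the two values. This is an illustration of that reasoning, not documented product behavior, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def effective_lookback_days(read_from_past_days: int, retention_days: int = 7) -> int:
    """Assumed rule: the useful look-back is capped at the retention period."""
    if read_from_past_days < 0:
        raise ValueError("read_from_past_days must not be negative")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return min(read_from_past_days, retention_days)


def earliest_useful_timestamp(now: datetime, read_from_past_days: int,
                              retention_days: int = 7) -> datetime:
    """Oldest data worth collecting under the assumed rule."""
    return now - timedelta(days=effective_lookback_days(read_from_past_days, retention_days))


# Asking for 30 days of history with the default 7-day retention:
print(effective_lookback_days(30))  # capped at 7 under this assumption
```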

Note

After you create a data collector, the first poll might take some time (approximately 1 minute) to happen. The first poll only prepares the data collector for data collection; data is fetched starting from the second poll.

Expected time delay (to see the first set of data for a search) = (Time for first poll) + (Poll interval set for the data collector).
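As a worked example of this formula: with a first poll of approximately 1 minute and a hypothetical poll interval of 5 minutes, the first set of data becomes searchable after about 1 + 5 = 6 minutes. The same arithmetic as a small Python helper (the function name is illustrative):

```python
def expected_first_data_delay(poll_interval_minutes: float,
                              first_poll_minutes: float = 1.0) -> float:
    """Expected minutes before the first data can be searched.

    The first poll (about 1 minute after creation) only prepares the
    collector; data is fetched at the second poll, one interval later.
    """
    if poll_interval_minutes <= 0 or first_poll_minutes < 0:
        raise ValueError("poll interval must be positive and first poll non-negative")
    return first_poll_minutes + poll_interval_minutes


print(expected_first_data_delay(5))  # about 6 minutes
```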

Viewing and searching configured data collectors

The Data Collectors tab allows you to manage data collectors. To access this tab, navigate to Administration > Data Collectors.

This tab displays a default data collector for collecting the data in the Collection_metrics.log file. The Search tab displays a graph summarizing the data collected by this data collector. For more information, see Collecting product metrics.

You can perform the following actions on the Data Collectors tab.

The Data Collectors tab provides the following information:

Collecting product metrics

You can collect and analyze metrics (or logs) generated by BMC TrueSight IT Data Analytics for its Collection Station and Search components. After you install the product, the data collector for the Collection Station metrics is created automatically, and you can view a line chart summarizing the total data indexed on the Default dashboard page. However, you must create the data collector for the Search component logs yourself. For more information, see Monitoring-the-product-metric-files.