Data collection best practices

This topic describes best practices for data collection.

Prerequisites for data collection configuration

Before you start configuring data collectors, you must create a deployment plan.

If you plan to configure a large number of data collectors for multiple applications, plan your requirements in advance to save time and make your data-collection process as efficient as possible. Start by listing every data source that you want to collect from, and then fill in the key properties of each data source.

The following table provides a sample list of inputs you need to plan for the data collection configuration:

Notes

You can collect data with UTF-8 character encoding only.
You cannot collect data in which the time stamp contains non-English characters.
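
Before you add a file to your deployment plan, you can confirm that it decodes as UTF-8. The following Python sketch is one way to run that check outside the product; the function name is illustrative and is not part of the product.

```python
import codecs


def is_valid_utf8(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the whole file decodes cleanly as UTF-8."""
    # An incremental decoder copes with multi-byte characters that
    # happen to be split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # final=True reports a multi-byte character truncated at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check needs to be converted to UTF-8 before a data collector can read it.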

When planning your data-collection configuration, be sure to choose the Agent type that aligns with your existing environment or architecture. The following scenarios can help you decide which Agent type to use:

If you want to perform local data collection, then you can install standalone Collection Agents on the target hosts from where you want to collect data.
If you have BMC PATROL Agents deployed in your current environment, consider configuring Collection Agents by using the BMC PATROL Knowledge Module (KM) for IT Data Analytics.
If you use the Collection Agents to collect data locally, then you can configure security for the data sent from the Collection Agent to the product (Collection Station component).
If your environment consists of all Windows servers, you can use the Monitor-file-over-Windows-share data collector.
If you have data stored in a database or another product repository and this data is unavailable via log files, you might want to use the following script collectors:
- Monitor-script-output-over-SSH
- Monitor-script-output-on-Collection-Agent
If you want to perform remote data collection (for example, by using the Receive-over-TCP-UDP data collector), BMC recommends that you use a Collection Agent that is located on a computer other than the target host.

You might want to experiment with the various data collectors to determine what works best for your environment.

Recommendation

BMC recommends that you do not perform remote data collection across a WAN. If you want to collect data remotely, you can use a Collection Agent located on a remote computer (other than the target host) and create remote data collectors. The Collection Agent then collects data via the remote data collectors and forwards it to the Collection Station. For more information, see Using-a-Collection-Agent-for-remote-data-collection.

Data collector configuration options

You can create data collectors in various ways:

The Administration > Data Collectors tab allows you to create individual data collectors by using the product user interface. For more information, see Managing-data-collectors.
You can create collection profiles containing data-collector templates that help you automate your data-collector configuration.
You can use CLI commands to export and import data collectors.

You are likely to progress from one approach to another over time.

Note

If the data you are collecting has a time stamp that is more than 24 hours in the future, that data is not indexed. Therefore, you must ensure that the time settings on the target hosts and the collection hosts are set up correctly and are synchronized. You must also ensure that you specify the time zone correctly when you create the data collector.

If the time stamp in your data file does not include the year (as in syslog files, for example), and at the time of indexing the product detects that the time at which the data occurred is ahead of the current time, the product assumes that this data is from the previous year. Such an instance might occur if the time settings on the target host and the collection host are not synchronized. Depending on the maximum data-retention period (if set in days), such data might not be indexed.

Example: If the product server date and time are set to June 10, 2014 2:45 AM, and the events received have a date and time stamp of July 10, 3:45 AM, the product assumes that the year in which the data occurred is 2013 (the previous year). If the data retention period in the product is set to 15 days, this data is not indexed, because the time at which the data occurred is outside of the maximum data-retention period.
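
The example above can be reproduced with a few lines of Python. This is only an illustration of the behavior described in this note, not the product's implementation; the function names are assumptions made for the sketch.

```python
from datetime import datetime, timedelta


def resolve_year(month: int, day: int, hour: int, minute: int, now: datetime) -> datetime:
    """Attach a year to a time stamp that has none.

    Start with the current year. If the result is ahead of `now`,
    assume that the event occurred in the previous year.
    February 29 is not handled in this sketch.
    """
    candidate = datetime(now.year, month, day, hour, minute)
    if candidate > now:
        candidate = candidate.replace(year=now.year - 1)
    return candidate


def within_retention(event_time: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if the event is no older than the retention period."""
    return now - event_time <= timedelta(days=retention_days)


now = datetime(2014, 6, 10, 2, 45)        # product server date and time
event = resolve_year(7, 10, 3, 45, now)   # "July 10, 3:45 AM", year missing

print(event.year)                         # 2013
print(within_retention(event, now, 15))   # False, so the data is not indexed
```

An event stamped one month ahead of the server clock is pushed back to the previous year and therefore falls far outside a 15-day retention period, which is why synchronized clocks matter.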

Using collection profiles

Collection profiles allow you to save multiple data-collector configuration settings (typically associated with a host) to apply against other hosts simultaneously. This approach promotes consistency and is more efficient when compared to individual data-collector configuration.

This approach is useful when you have a host-centric view of your environment (for example, when using BMC PATROL Agent with the BMC PATROL KM for IT Data Analytics). Suppose that in your environment you have seven Linux hosts. Two of the Linux hosts have the JBOSS application installed on them, while the remaining five hosts have the Apache Tomcat application installed on them. Now you can create collection profiles with data-collector templates in the following ways:

Create a collection profile for the JBOSS application and apply it to the appropriate Linux hosts.
Create a collection profile for the Apache Tomcat application and apply it to the appropriate Linux hosts.

After you create the collection profile, you can expect data collectors to be automatically created and to be ready to start data collection.
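
To see how profiles multiply into data collectors, the seven-host scenario above can be modeled in a short Python sketch. The host names and template names are hypothetical, and the code only models the idea; it does not call the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionProfile:
    name: str
    templates: tuple  # hypothetical data-collector template names


def expand(profile, hosts):
    """Return one (host, template) pair per data collector to be created."""
    return [(host, template) for host in hosts for template in profile.templates]


jboss = CollectionProfile("jboss", ("jboss-server-log", "jboss-boot-log"))
tomcat = CollectionProfile("tomcat", ("tomcat-catalina-log",))

# Two hosts run JBOSS, and the remaining five run Apache Tomcat.
assignments = {
    jboss: ["linux01", "linux02"],
    tomcat: ["linux03", "linux04", "linux05", "linux06", "linux07"],
}

collectors = [
    pair
    for profile, hosts in assignments.items()
    for pair in expand(profile, hosts)
]
print(len(collectors))  # 9: 2 hosts x 2 templates + 5 hosts x 1 template
```

Two profile definitions stand in for nine individually configured data collectors, which is where the consistency and efficiency gains come from.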

Using the command line interface

The product command line interface supports export and import of data-collector configurations. You can export a small set of data collectors and then use that base export file as a master copy. You can make changes in the exported copy and then import the copy into the product. This approach can be an efficient and consistent method of configuring data collectors. It ensures that you have a backup copy of your configuration settings in files outside of the product (in the event of a serious failure). It also allows you to automate data-collector configuration, because the command line can be triggered from other scripts or workflows.

The command line interface provides the capability to start or stop all data collectors configured on your system. While the user interface allows you to start or stop data collectors individually, to start or stop all data collectors simultaneously, you must use the command line interface. This method can be useful when you want to stop data collection during a maintenance window.

Note

When the Indexer restarts after a long time, the result might be a sudden increase in data-collection traffic from the Collection Station and Collection Agents. To avoid this problem, consider shutting down all data collectors if the Indexer is down for a long time, and restart them after the Indexer is up again.

Hosts and data collectors

Using host objects can help simplify the management of your data collectors.

Creating hosts in your system first and then creating data collectors associated with the host objects allows key properties to be inherited by the data collectors assigned to that host. The following properties of a host can be inherited by the data collector assigned to it:

Host name
Host-level tags
Host-level access groups

Using hosts ensures consistency and avoids instances in which you accidentally forget to set up the same property for each data collector.
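
The inheritance described above can be pictured with a small Python model. The class and field names are assumptions made for illustration and do not reflect the product's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    tags: dict = field(default_factory=dict)
    access_groups: set = field(default_factory=set)


@dataclass
class DataCollector:
    name: str
    host: Host

    # The three inherited properties are read from the host object,
    # so each value is defined in exactly one place.
    @property
    def host_name(self) -> str:
        return self.host.name

    @property
    def tags(self) -> dict:
        return dict(self.host.tags)

    @property
    def access_groups(self) -> set:
        return set(self.host.access_groups)


host = Host("linux01", {"location": "Houston", "os": "Linux"}, {"middleware-admins"})
server_log = DataCollector("jboss-server-log", host)
boot_log = DataCollector("jboss-boot-log", host)

print(server_log.tags == boot_log.tags)  # True
```

Because neither data collector stores its own copy of the tags or access groups, there is no per-collector setting to forget.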

Using collection profiles is another way of leveraging host objects. One or more collection profiles can be applied to a host object. Data collectors for all data-collector templates (contained in the collection profiles) are created for each host. For example, in the following table, two collection profiles are applied to a single host, and a data collector is created on that host for each template.

Collection Profile 1	Collection Profile 2	Host 1	Data collectors created
Data Collector Template 1 (T1), Data Collector Template 2 (T2)	Data Collector Template 3 (T3)	H1	T1H, T2H, T3H

This approach can be useful when you are using the BMC PATROL KM for IT Data Analytics with a PATROL Agent associated with each host.

Using tags and naming conventions

Tags allow the administrator to associate a set of properties with each data collector and the data it collects. Tags that are set properly can be very useful in the search process because they allow you to filter or isolate search results intuitively. Data collector tags must represent data properties that are clearly defined across all data sources; for example, location, application group, and OS are common tag names that can be applied across most data sources. Add a tag only if it is likely to be useful in search filtering. Tags have some performance overhead associated with them, so define a clear tag convention ahead of time and create only those tags that will be used.

Using a consistent data-collector naming convention will help you to accomplish the following tasks:

Easily find your data collector on the Administration > Data Collectors table
Easily search for data from particular data collectors by using the collector name as a filter (COLLECTOR_NAME field)
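
A naming convention is easier to keep consistent when names are generated and checked by a script. The following Python sketch assumes a hypothetical convention of the form location_application_host_source; the convention and the function names are examples only and are not prescribed by the product.

```python
import re

# Four lowercase parts joined by underscores; hyphens are allowed inside a part.
NAME_PATTERN = re.compile(r"^[a-z0-9-]+(_[a-z0-9-]+){3}$")


def collector_name(location: str, application: str, host: str, source: str) -> str:
    """Build a data collector name from its four parts."""
    parts = (location, application, host, source)
    return "_".join(part.strip().lower().replace(" ", "-") for part in parts)


def follows_convention(name: str) -> bool:
    """Return True if an existing name matches the convention."""
    return NAME_PATTERN.match(name) is not None


name = collector_name("Houston", "Apache Tomcat", "linux03", "access log")
print(name)                                # houston_apache-tomcat_linux03_access-log
print(follows_convention(name))            # True
print(follows_convention("MyCollector1"))  # False
```

With predictable names, a filter on the COLLECTOR_NAME field can target one location, application, or host without looking up individual data collectors.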

Common data collector settings

The following table lists common data collector settings that you can apply when creating data collectors:

Setting	Best practice
Poll interval	Retain the default poll interval unless you have a special reason for changing it (such as running one of the script collectors only once every hour). Polling every 5 minutes (versus every 1 minute) does not make any noticeable difference in performance.
Filename/ rollover pattern	If the current log file being written to consistently uses the same name, provide the exact log file name. If the current log file name is consistently changing, use a match pattern (such as *error..log**). However, it is important to ensure that your pattern matches only the current log file, to avoid processing extra data that might not be intended.
Group access	If you are not enforcing access control on your data collectors, disable the data access control setting. Instead of selecting every access group for every data collector, it is more efficient to disable the data access control setting.
Time zone	Do not set a time zone explicitly for any log file that contains time zone information. If a time zone is not set explicitly and the log file does not contain time zone information, the IT Data Analytics server time zone is used when converting the date and time to UTC.
User name and passwords	For any data collector that requires user name and password credentials, create stored credentials that the data collector can reference. Stored credentials are useful in the following scenarios: (1) You plan to use the CLI import/export feature. When you export data collectors by using the CLI command, passwords are not saved in the exported file. If the data collector references a stored credential instead of manually entered details, you do not need to make manual password changes before you import the data collector into the system. (2) Your applications require periodic password changes. If company policy requires a periodic password change for applications, you can edit the stored credential to apply the change to all data collectors that reference it.
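
For the Filename/rollover pattern setting, you can test a candidate pattern against a directory listing before you use it. The Python sketch below uses shell-style wildcard matching from the standard library as a stand-in; whether the product interprets patterns in exactly the same way is an assumption, and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_files(pattern: str, filenames: list) -> list:
    """Return the file names that the wildcard pattern matches."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


def matches_only_current(pattern: str, filenames: list, current: str) -> bool:
    """Return True if the pattern selects the current log file and nothing else."""
    return matching_files(pattern, filenames) == [current]


files = ["error.log", "error.log.1", "error.log.2.gz", "access.log"]

print(matching_files("error.log", files))  # ['error.log']
print(matching_files("error*", files))     # ['error.log', 'error.log.1', 'error.log.2.gz']
print(matches_only_current("error*", files, "error.log"))  # False
```

The broad pattern also picks up the rolled-over files, which is the extra processing that the table warns against.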

Configuring incrementally

Add new data collectors incrementally, and follow each addition with a simple validation phase. As a part of the validation, check that data collection has started and that you can search the collected data.

Configuring data collectors incrementally has the following advantages:

If you provided any incorrect settings as part of the data-collector configuration, you do not have to start the whole process from the beginning; you need only re-edit or delete the existing data collectors and then re-create a smaller set of data collectors.
Assessing performance impact to the system is easier if you apply a validation phase between configurations.
Configuring similar data collectors in one configuration session (for example, on one day) and then a different set of similar data collectors in another session (for example, on another day) keeps each session simple.
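
The incremental approach can be expressed as a simple loop: apply a small batch, validate it, and stop as soon as a batch fails. In the following Python sketch, the apply and validate callables are placeholders for whatever mechanism you use (the user interface, collection profiles, or the CLI); they are assumptions of the sketch and not product functions.

```python
from typing import Callable, List, Sequence, Tuple


def configure_incrementally(
    collectors: Sequence[str],
    batch_size: int,
    apply: Callable[[Sequence[str]], None],
    validate: Callable[[Sequence[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Apply collectors in batches and validate after each batch.

    Returns (validated collectors, failed batch). The failed batch is
    empty if every batch passed validation.
    """
    validated: List[str] = []
    for start in range(0, len(collectors), batch_size):
        batch = list(collectors[start:start + batch_size])
        apply(batch)
        if not validate(batch):
            # Stop here: only this batch needs to be edited or re-created.
            return validated, batch
        validated.extend(batch)
    return validated, []
```

If validation fails, only the returned batch needs attention; the data collectors that were validated earlier remain untouched.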

Backing up data collection configurations

You can back up your data-collection configuration in the following ways:

Export data patterns via content pack
Export data collectors by using the CLI export command