This topic describes best practices for data collection.
Before you start configuring data collectors, you must create a deployment plan.
If you plan to configure a large number of data collectors for multiple applications, plan your needs up front to save time and make your data-collection process as efficient as possible. Start by listing every data source that you want to collect from, and then fill in the key properties of each data source.
The following table provides a sample list of inputs you need to plan for the data collection configuration:
Data source | Location | Host | Format/pattern | Encoding | Contains time zone in time stamp? | Collection mechanism | Local/remote collection? | Data collector |
---|---|---|---|---|---|---|---|---|
Application A Access common log | San Jose | sj-lab46.bmc.com | Access Log Common | UTF8 | Yes | Collection Station | Remote | Monitor file over Windows share |
Application A Oracle log | San Jose | sj-lab51.bmc.com | Oracle Database Alert | UTF8 | Yes | Collection Agent | Local | Monitor file on Collection Agent |
Application A Error log | San Jose | sj-lab46.bmc.com | Custom | UTF8 | No | Collection Agent | Local | Monitor file on Collection Agent |
Application A Audit log | - | - | Custom | Unknown | No | Collection Agent | Local | Monitor script output on Collection Agent |
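A plan like the sample above can also be kept as structured data, so that gaps are easy to spot before you start configuring data collectors. The following Python sketch is only an illustration: the field names and the `incomplete_sources` and `sources_without_time_zone` helpers are hypothetical and are not part of the product. It uses a subset of the columns from the sample table.

```python
# Hypothetical representation of the sample plan; field names are illustrative.
PLAN = [
    {"source": "Application A Access common log", "location": "San Jose",
     "host": "sj-lab46.bmc.com", "encoding": "UTF8", "tz_in_timestamp": True},
    {"source": "Application A Oracle log", "location": "San Jose",
     "host": "sj-lab51.bmc.com", "encoding": "UTF8", "tz_in_timestamp": True},
    {"source": "Application A Error log", "location": "San Jose",
     "host": "sj-lab46.bmc.com", "encoding": "UTF8", "tz_in_timestamp": False},
    {"source": "Application A Audit log", "location": "-",
     "host": "-", "encoding": "Unknown", "tz_in_timestamp": False},
]

# Values that mean "not filled in yet" in the sample table.
PLACEHOLDERS = {"", "-", "Unknown"}


def incomplete_sources(plan):
    """Return the names of sources that still have unfilled properties."""
    return [
        row["source"]
        for row in plan
        if any(isinstance(v, str) and v in PLACEHOLDERS for v in row.values())
    ]


def sources_without_time_zone(plan):
    """Return the names of sources whose time stamps carry no time zone."""
    return [row["source"] for row in plan if not row["tz_in_timestamp"]]


if __name__ == "__main__":
    print("Incomplete:", incomplete_sources(PLAN))
    print("No time zone in time stamp:", sources_without_time_zone(PLAN))
```

Running a check like this against the full plan highlights the sources (here, the audit log) that need more information before a data collector can be configured for them.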
Notes
When planning your data-collection configuration, be sure to choose a collector mechanism that aligns with your existing environment or architecture. The following scenarios can help you decide which collection mechanism to use:
You might want to experiment with the various data collectors to determine what works best for your environment.
You can create data collectors in various ways:
You are likely to progress from one approach to another over time.
Note
If the data you are collecting has a time stamp that is more than 24 hours in the future, that data is not indexed. Therefore, you must ensure that the time settings on the target hosts and the collection hosts are set up correctly and are synchronized. You must also ensure that you specify the time zone correctly when you create the data collector.
If the time stamp in your data file does not include the year (as in syslog files, for example), and at the time of indexing the product detects that the time at which the data occurred is ahead of the current time, the product assumes that the data is from the previous year. This situation can occur if the time settings on the target host and the collection host are not synchronized. Depending on the maximum data-retention period (if set in days), such data might not be indexed.
Example: If the product server date and time are set to June 10, 2014 2:45 AM, and the events received have a date and time stamp of July 10, 3:45 AM, the product assumes that the year in which the data occurred is 2013 (the previous year). If the data retention period in the product is set to 15 days, this data is not indexed, because the time at which the data occurred is outside of the maximum data-retention period.
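The year-inference rule and the retention check from this note can be sketched in Python as follows. This is an illustration of the documented behavior only, not the product's implementation; the function names `infer_event_time` and `within_retention` are hypothetical, and the retention check is simplified to a comparison of the event's age against the retention period in days.

```python
from datetime import datetime, timedelta


def infer_event_time(month, day, hour, minute, now):
    """Assign a year to a time stamp that does not include one.

    Assumption: the current year is tried first; if the result is ahead of
    the current time, the previous year is used instead.
    (February 29 is not handled in this sketch.)
    """
    candidate = datetime(now.year, month, day, hour, minute)
    if candidate > now:
        candidate = candidate.replace(year=now.year - 1)
    return candidate


def within_retention(event_time, now, retention_days):
    """Return True if the event is no older than the retention period."""
    return now - event_time <= timedelta(days=retention_days)


if __name__ == "__main__":
    server_now = datetime(2014, 6, 10, 2, 45)   # June 10, 2014 2:45 AM
    event = infer_event_time(7, 10, 3, 45, server_now)  # July 10, 3:45 AM
    print("Assumed year:", event.year)
    print("Within 15-day retention:", within_retention(event, server_now, 15))
```

With the values from the example, the event is assigned to 2013 and falls outside a 15-day retention period, which is why it is not indexed.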
Collection profiles allow you to save multiple data-collector configuration settings (typically associated with a host) to apply against other hosts simultaneously. This approach promotes consistency and is more efficient when compared to individual data-collector configuration.
This approach is useful when you have a host-centric view of your environment (for example, when using BMC PATROL Agent with the BMC PATROL KM for IT Data Analytics). Suppose that your environment has seven Linux hosts. Two of the hosts have the JBoss application installed, and the remaining five hosts have the Apache Tomcat application installed. You can create collection profiles with data-collector templates in the following ways:
After you create the collection profile, you can expect data collectors to be automatically created and to be ready to start data collection.
The product command line interface supports export and import of data-collector configurations. You can export a small set of data collectors and then use that base export file as a master copy. You can make changes in the exported copy and then import the copy into the product. This approach can be an efficient and consistent method of configuring data collectors. It ensures that you have a backup copy of your configuration settings in files outside of the product (in the event of a serious failure). It also allows you to automate data-collector configuration, because the command line can be triggered from other scripts or workflows.
The command line interface can start or stop all data collectors configured on your system. The user interface allows you to start or stop data collectors only individually, so to start or stop all data collectors simultaneously, you must use the command line interface. This method can be useful when you want to stop data collection during a maintenance window.
Note
When the Indexer restarts after a long period of downtime, the result might be a sudden increase in data-collection traffic from the Collection Station and Collection Agents. To avoid this problem, consider stopping all data collectors if the Indexer is down for a long time, and restart them after the Indexer is up again.
Using host objects can help simplify the management of your data collectors.
Creating hosts in your system first and then creating data collectors associated with the host objects allows key properties to be inherited by the data collectors assigned to that host. The following properties of a host can be inherited by the data collector assigned to it:
Using hosts ensures consistency and avoids instances in which you accidentally forget to set up the same property for each data collector.
Using collection profiles is another way of leveraging host objects. One or more collection profiles can be applied against a host object. Data collectors for all data-collector templates (contained in the collection profiles) are created for each host. For example, in the following table, two collection profiles are applied to a single host, and a data collector is created on that host for each template.
Collection Profile 1 | Collection Profile 2 | Host 1 | Data collectors created |
---|---|---|---|
Data Collector Template 1 (T1), Data Collector Template 2 (T2) | Data Collector Template 3 (T3) | H1 | T1H, T2H, T3H |
This approach can be useful when you are using the BMC PATROL KM for IT Data Analytics with a PATROL Agent associated with each host.
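The way collection profiles expand into data collectors can be sketched as a simple cross product of templates and hosts. The following Python sketch is an illustration under stated assumptions: the `expand_profiles` function and the `(template, host)` pair representation are hypothetical and do not reflect how the product names or stores data collectors.

```python
# Profiles and templates from the table above (T1, T2, T3 are template labels).
PROFILES = {
    "Collection Profile 1": ["T1", "T2"],
    "Collection Profile 2": ["T3"],
}


def expand_profiles(profiles, hosts):
    """Return one (template, host) pair per template in every profile, per host."""
    collectors = []
    for host in hosts:
        for templates in profiles.values():
            for template in templates:
                collectors.append((template, host))
    return collectors


if __name__ == "__main__":
    for template, host in expand_profiles(PROFILES, ["H1"]):
        print(f"Data collector from template {template} on host {host}")
```

Applying the same two profiles to a second host would double the number of data collectors, which is worth keeping in mind when estimating how many collectors a profile will create.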
Tags allow the administrator to associate a set of properties with each data collector and the data it collects. Tags that are set properly can be very useful in the search process, because they allow you to filter or isolate search results intuitively. Data collector tags must represent data properties that are clearly defined across all data sources; for example, location, application group, and OS are common tag names that can be applied across most data sources. Add a tag only if it might be useful in search filtering. Tags have some performance overhead associated with them, so think through a clear tag convention ahead of time and define only those tags that will be used.
Using a consistent data-collector naming convention will help you to accomplish the following tasks:
The following table lists common data collector settings that you can apply when creating data collectors:
Setting | Best practice |
---|---|
Poll interval | Retain the default poll interval unless you have a special reason for changing it (such as running one of the script collectors only once every hour). Polling every 5 minutes (versus every 1 minute) does not make any noticeable difference in performance. |
Filename/ rollover pattern | If the current log file being written to consistently uses the same name, provide the exact log file name. If the current log file name is consistently changing, use a match pattern (such as error.*.log). However, it is important to ensure that your pattern matches only the current log file, to avoid processing extra data that might not be intended. |
Group access | If you are not enforcing access control on your data collectors, disable the data access control setting. Doing so is more efficient than selecting every access group for every data collector. |
Time zone | You must set a time zone explicitly for any log file that does not contain a time zone. If a time zone is not set explicitly and the log file does not contain time zone information, the IT Data Analytics server time zone is used when converting the date and time to UTC. |
User names and passwords | For any data collectors that require user name and password credentials, create stored credentials that the data collector can reference. Using credentials can be useful in the following scenarios:
|
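Before using a match pattern for the Filename/rollover pattern setting, it can help to check which files the pattern would actually pick up. The following Python sketch assumes glob-style matching (the product's own pattern syntax might differ), and the file names and the `files_matching` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of a log directory after a few rollovers.
LOG_FILES = [
    "error.log",
    "error.2014-06-09.log",
    "error.2014-06-10.log",
    "error.2014-06-09.log.gz",
    "access.log",
]


def files_matching(pattern, file_names):
    """Return the file names that a glob-style pattern matches."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    matches = files_matching("error.*.log", LOG_FILES)
    print("Matched:", matches)
    if len(matches) > 1:
        print("Pattern matches more than the current log file; consider narrowing it.")
```

If a pattern matches rolled-over files as well as the current one, as `error.*.log` does in this hypothetical directory, the data collector might process more data than intended.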
Add new data collectors incrementally, and follow each addition with a simple validation phase. As part of the validation, check whether data collection has started and whether you can perform searches on the collected data.
Configuring data collectors incrementally has the following advantages:
You can back up your data-collection configuration in the following way: