Data collection best practices
This topic describes best practices for data collection.
Prerequisites for data collection configuration
Before you start configuring data collectors, you must create a deployment plan.
If you plan to configure a large number of data collectors for multiple applications, plan your requirements up front to save time and make your data-collection process as efficient as possible. Start by listing every data source from which you want to collect data, and then fill in the key properties of each data source.
The following table provides a sample list of inputs you need to plan for the data collection configuration:
When planning your data-collection configuration, be sure to choose the Agent type that aligns with your existing environment or architecture. The following scenarios can help you decide which Agent type to use:
- If you want to perform local data collection, then you can install standalone Collection Agents on the target hosts from where you want to collect data.
- If you have BMC PATROL Agents deployed in your current environment, it might be a good idea to configure Collection Agents by using the BMC PATROL Knowledge Module (KM) for IT Data Analytics.
- If you use the Collection Agents to collect data locally, then you can configure security for the data sent from the Collection Agent to the product (Collection Station component).
- If your environment consists of all Windows servers, you can use the Monitor-file-over-Windows-share data collector.
- If you have data stored in a database or another product repository and this data is unavailable via log files, you might want to use the following script collectors:
- If you want to perform remote data collection (for example, using the Receive-over-TCP-UDP data collector), BMC recommends that you use a Collection Agent that is located on a computer other than the target host.
You might want to experiment with the various data collectors to determine what works best for your environment.
Data collector configuration options
You can create data collectors in various ways:
- The Administration > Data Collectors tab allows you to create individual data collectors by using the product user interface. For more information, see Managing data collectors.
- You can create collection profiles containing data-collector templates that help you automate your data-collector configuration.
- You can use CLI commands to export and import data collectors.
You are likely to progress from one approach to another over time.
Using collection profiles
Collection profiles allow you to save multiple data-collector configuration settings (typically associated with a host) to apply against other hosts simultaneously. This approach promotes consistency and is more efficient when compared to individual data-collector configuration.
This approach is useful when you have a host-centric view of your environment (for example, when using BMC PATROL Agent with the BMC PATROL KM for IT Data Analytics). Suppose that in your environment you have seven Linux hosts. Two of the Linux hosts have the JBOSS application installed on them, while the remaining five hosts have the Apache Tomcat application installed on them. Now you can create collection profiles with data-collector templates in the following ways:
- Create a collection profile for the JBOSS application and apply it to the appropriate Linux hosts.
- Create a collection profile for the Apache Tomcat application and apply it to the appropriate Linux hosts.
After you create the collection profile, you can expect data collectors to be automatically created and to be ready to start data collection.
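The seven-host example above can be sketched as a simple grouping step. This is only an illustration of the planning logic: the host names and profile names are hypothetical, and the code does not call any product API.

```python
from collections import defaultdict

# Hypothetical inventory: host name -> application installed on that host.
HOST_APPS = {
    "linux01": "jboss",
    "linux02": "jboss",
    "linux03": "tomcat",
    "linux04": "tomcat",
    "linux05": "tomcat",
    "linux06": "tomcat",
    "linux07": "tomcat",
}

# Hypothetical profile names, one collection profile per application.
PROFILE_BY_APP = {"jboss": "JBoss profile", "tomcat": "Tomcat profile"}


def hosts_per_profile(host_apps, profile_by_app):
    """Group hosts under the collection profile that matches their application."""
    grouped = defaultdict(list)
    for host, app in sorted(host_apps.items()):
        grouped[profile_by_app[app]].append(host)
    return dict(grouped)


if __name__ == "__main__":
    for profile, hosts in hosts_per_profile(HOST_APPS, PROFILE_BY_APP).items():
        print(profile, "->", ", ".join(hosts))
```

Keeping the host-to-application mapping in one place makes it easy to see which hosts each profile must be applied to before you create the profiles in the product.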
Using the command line interface
The product command line interface supports export and import of data-collector configurations. You can export a small set of data collectors and then use that base export file as a master copy. You can make changes in the exported copy and then import the copy into the product. This approach can be an efficient and consistent method of configuring data collectors. It ensures that you have a backup copy of your configuration settings in files outside of the product (in the event of a serious failure). It also allows you to automate data-collector configuration, because the command line can be triggered from other scripts or workflows.
The command line interface provides the capability to start or stop all data collectors configured on your system. While the user interface allows you to start or stop data collectors individually, to start or stop all data collectors simultaneously, you must use the command line interface. This method can be useful when you want to stop data collection during a maintenance window.
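Because the command line can be triggered from other scripts, a maintenance window could be wrapped as in the following minimal sketch. The actual stop and start commands are not given in this topic, so the function takes them as arguments; `run_during_maintenance` and its parameters are hypothetical names.

```python
import subprocess


def run_during_maintenance(stop_cmd, start_cmd, maintenance):
    """Stop all data collectors, run the maintenance task, then restart them.

    stop_cmd and start_cmd are argument lists for the product CLI; they are
    supplied by the caller because this sketch does not assume their syntax.
    """
    subprocess.run(stop_cmd, check=True)
    try:
        return maintenance()
    finally:
        # Restart even if the maintenance task raises an exception,
        # so that data collection is not left stopped.
        subprocess.run(start_cmd, check=True)
```

The `try`/`finally` block is the point of the sketch: data collection resumes whether or not the maintenance task succeeds.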
Hosts and data collectors
Using host objects can help simplify the management of your data collectors.
Creating hosts in your system first and then creating data collectors associated with the host objects allows key properties to be inherited by the data collectors assigned to that host. The following properties of a host can be inherited by the data collector assigned to it:
- Host name
- Host-level tags
- Host-level access groups
Using hosts ensures consistency and avoids instances in which you accidentally forget to set up the same property for each data collector.
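The inheritance idea can be illustrated with a small data model. This is a conceptual sketch, not the product's internal representation; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host object that holds the inheritable properties."""
    name: str
    tags: dict = field(default_factory=dict)
    access_groups: frozenset = frozenset()


@dataclass
class DataCollector:
    """Hypothetical data collector that reads shared properties from its host."""
    name: str
    host: Host

    @property
    def host_name(self):
        return self.host.name

    @property
    def tags(self):
        return self.host.tags

    @property
    def access_groups(self):
        return self.host.access_groups


if __name__ == "__main__":
    web01 = Host("web01", {"location": "Houston", "os": "Linux"}, frozenset({"web-admins"}))
    access_log = DataCollector("web01_access", web01)
    error_log = DataCollector("web01_error", web01)
    # Both collectors report the same host-level tags without repeating them.
    print(access_log.tags == error_log.tags)
```

Because each property is defined once on the host, a change to the host is reflected in every data collector assigned to it.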
Using collection profiles is another way of leveraging host objects. One or more collection profiles can be applied against a host object. Data collectors for all data-collector templates (contained in the collection profiles) are created for each host. For example, the following table shows two collection profiles applied to a single host; one data collector is created on that host for each template in the profiles.
Collection Profile 1 | Collection Profile 2 | Host 1 | Data collectors created |
---|---|---|---|
Data Collector Template 1 (T1), Data Collector Template 2 (T2) | Data Collector Template 3 (T3) | H1 | T1H, T2H, T3H |
This approach can be useful when you are using the BMC PATROL KM for IT Data Analytics with a PATROL Agent associated with each host.
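The expansion shown in the table can be expressed as a short function. This is a sketch of the counting logic only; `expand_profiles` is a hypothetical helper, and the pairs it returns stand in for the data collectors that the product creates.

```python
def expand_profiles(profiles, host):
    """Return one (template, host) pair for every template in every profile."""
    return [
        (template, host)
        for templates in profiles.values()
        for template in templates
    ]


if __name__ == "__main__":
    profiles = {
        "Collection Profile 1": ["T1", "T2"],
        "Collection Profile 2": ["T3"],
    }
    print(expand_profiles(profiles, "H1"))
    # [('T1', 'H1'), ('T2', 'H1'), ('T3', 'H1')]
```

Applying the same profiles to a second host would produce three more pairs, which is a quick way to estimate how many data collectors a rollout will create.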
Using tags and naming conventions
Tags allow the administrator to associate a set of properties with each data collector and the data it collects. Tags that are set properly can be very useful in the search process, because they let you filter or narrow a search intuitively. Data collector tags must represent data properties that are clearly defined across all data sources; for example, location, application group, and OS are common tag names that can be applied across most data sources. Add a tag only if it is likely to be useful in search filtering. Tags have some performance overhead associated with them, so think through a clear tag convention ahead of time and define only those tags that will be used.
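A tag convention is easier to enforce if it is written down and checked before data collectors are created. The following minimal sketch assumes that the agreed tag names are kept in a set; the key spellings and the `unexpected_tags` helper are hypothetical.

```python
# Agreed tag names, based on the examples above; the key spellings are illustrative.
ALLOWED_TAGS = {"location", "application_group", "os"}


def unexpected_tags(collector_tags, allowed=ALLOWED_TAGS):
    """Return the tag names that are not part of the agreed convention."""
    return sorted(set(collector_tags) - set(allowed))


if __name__ == "__main__":
    planned = {"location": "Houston", "os": "Linux", "owner": "jdoe"}
    print(unexpected_tags(planned))  # ['owner']
```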
Using a consistent data-collector naming convention will help you to accomplish the following tasks:
- Easily find your data collector on the Administration > Data Collectors table
- Easily search for data from particular data collectors by using the collector name as a filter (COLLECTOR_NAME field)
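One possible naming convention, shown here only as an example, joins the application, host, and log type into a single name. The component order and the `collector_name` helper are assumptions, not a product requirement.

```python
import re


def _slug(text):
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def collector_name(application, host, log_type):
    """Build a name of the form <application>_<host>_<log type>."""
    parts = [_slug(part) for part in (application, host, log_type)]
    if not all(parts):
        raise ValueError("every name component must contain a letter or digit")
    return "_".join(parts)


if __name__ == "__main__":
    print(collector_name("Apache Tomcat", "linux03", "access"))
    # apache-tomcat_linux03_access
```

Names built this way sort together by application in the Data Collectors table and give a predictable value to filter on in the COLLECTOR_NAME field.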
Common data collector settings
The following table lists common data collector settings that you can apply when creating data collectors:
Setting | Best practice |
---|---|
Poll interval | Retain the default poll interval unless you have a special reason for changing it (such as running one of the script collectors only once every hour). Polling every 5 minutes (versus every 1 minute) does not make any noticeable difference in performance. |
Filename/rollover pattern | If the log file currently being written to always uses the same name, provide the exact log file name. If the name of the current log file changes regularly, use a match pattern (such as error.*.log). However, ensure that your pattern matches only the current log file, to avoid processing extra data that you did not intend to collect. |
Group access | If you are not enforcing access control on your data collectors, disable the data access control setting. Instead of selecting every access group for every data collector, it is more efficient to disable the data access control setting. |
Time zone | Do not set a time zone explicitly for any log file that contains time zone information. If a time zone is not set explicitly and the log file does not contain time zone information, the IT Data Analytics server time zone is used when converting the date and time to UTC. |
User name and password | For any data collector that requires user name and password credentials, create stored credentials that the data collector can reference. Using credentials can be useful in the following scenarios: |
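The time zone rule in the table can be illustrated with a short conversion sketch. It models the rule as described above and is not the product's implementation; `to_utc` and its parameters are hypothetical names, and raising an error when both sources of time zone information are present is an assumption made to reflect the recommendation.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp, server_tz, explicit_tz=None):
    """Convert a parsed log timestamp to UTC.

    timestamp   -- datetime parsed from the log; it is timezone-aware only
                   if the log entry itself contains time zone information
    server_tz   -- tzinfo of the server, used as the fallback
    explicit_tz -- tzinfo set explicitly on the data collector, if any
    """
    if timestamp.tzinfo is not None:
        if explicit_tz is not None:
            # Mirrors the recommendation: do not set a time zone explicitly
            # when the log already contains one.
            raise ValueError("log entry already contains a time zone")
        return timestamp.astimezone(timezone.utc)
    zone = explicit_tz if explicit_tz is not None else server_tz
    return timestamp.replace(tzinfo=zone).astimezone(timezone.utc)


if __name__ == "__main__":
    server = timezone(timedelta(hours=-6))
    print(to_utc(datetime(2024, 1, 15, 12, 0), server))
    # 2024-01-15 18:00:00+00:00
```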
Configuring incrementally
Add new data collectors incrementally, and follow each addition with a simple validation phase. As part of the validation, check whether data collection has started and whether you can search the collected data.
Configuring data collectors incrementally has the following advantages:
- If you provided any incorrect settings as part of the data-collector configuration, you do not have to start the whole process from the beginning; you need only re-edit or delete the existing data collectors and then re-create a smaller set of data collectors.
- Assessing the performance impact on the system is easier if you apply a validation phase between configurations.
- Configuring a set of similar data collectors in one configuration session (for example, on one day) and a different set of similar data collectors in another session (for example, on another day) keeps each session simple.
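The incremental approach can be sketched as a batching loop. The `create` and `validate` callables are hypothetical placeholders for whatever mechanism you use to create data collectors and to confirm that data is arriving and searchable.

```python
def configure_incrementally(collectors, batch_size, create, validate):
    """Create collectors in small batches, validating after each batch.

    Returns (configured, failed_batch). failed_batch is empty when every
    batch passes validation; otherwise it holds the batch to re-edit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    configured = []
    for start in range(0, len(collectors), batch_size):
        batch = collectors[start:start + batch_size]
        for collector in batch:
            create(collector)
        if not validate(batch):
            # Stop here: only this small batch needs to be corrected.
            return configured, batch
        configured.extend(batch)
    return configured, []
```

Stopping at the first failed batch keeps the amount of rework small, which is the main advantage listed above.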
Backing up data collection configurations
You can back up your data-collection configuration in the following ways:
- Export data patterns by using a content pack
- Export data collectors by using the CLI export command