About field extraction
Fields represent small portions of your data displayed as name=value pairings, such as Source=<host-name>.
At the time of data indexing, fields are automatically extracted. This process is known as field extraction. Fields can also be extracted at search-time by using certain search commands.
The following sections help you understand how fields can be identified and the various ways in which fields can be extracted.
Data collection-time field extraction
Data collection-time field extraction takes place just before the collected data is indexed.
The following processes are involved in data collection-time field extraction.
Automatic field extraction
The product automatically discovers and extracts name=value pairs from the data and displays them as fields in your search results. This means that any data that appears in the "name=value" syntax is treated as a field. In the "name=value" syntax, the name portion refers to the field name while the value portion refers to the value of that field.
When you search for a name=value pair, the search returns only the data that contains the exact field name with the exact value.
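As a rough illustration of the name=value idea, the following Python sketch scans one record for name=value pairs and then checks for an exact name and value. The regular expression, the discover_fields and matches helper names, and the sample record are assumptions made for this example; they are not the product's actual extraction logic.

```python
import re

# Assumed pattern: a name made of word characters, an equals sign,
# then either a double-quoted value or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def discover_fields(record):
    """Return the name=value pairs found in one data record as a dict."""
    fields = {}
    for name, value in PAIR.findall(record):
        fields[name] = value.strip('"')
    return fields


def matches(record, name, value):
    """True only if the record carries this exact field name with this exact value."""
    return discover_fields(record).get(name) == value


# Hypothetical sample record, invented for illustration
sample = 'Source=server01 status=200 user="jane doe"'
print(discover_fields(sample))
print(matches(sample, "status", "200"))
```

In this sketch, a lookup for status=20 or Status=200 does not match the sample record, because both the name and the value must match exactly.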
Default fields
For every data record (or event) that is indexed, the product assigns certain fields based on the inputs specified at the time of creating a data collector or on certain default settings. These fields are known as default fields.
The following table provides a list of default fields:
User input | Field name |
---|---|
Name: Refers to the name specified to identify the data collector | COLLECTOR_NAME |
Server Name | HOST |
Pattern: Refers to the name of the data pattern used for creating the data collector | DATA_PATTERN |
Absolute file path retrieved from one or more of the user inputs: | COLLECTOR |
Fields extracted for the BMC ProactiveNet (or BMC TrueSight Infrastructure Management) events as defined by the bppm.reader.index.slotNames property in the custom directory for the Collection Station | mc_host, pn_object_id, pn_object_class_id, mc_parameter, severity, mc_incident_time, mc_arrival_time |
Data pattern based field extraction
Data pattern based field extractions can help you capture important information in the data that is not already discovered and captured by automatic field extraction.
If your data follows a particular pattern or structure that can be easily identified, then you can capture data appearing in that pattern by creating a data pattern. You can capture such data by defining custom fields at the time of creating the data pattern. For more information about identifying fields in the data, see Understanding-fields.
The data pattern wizard automatically detects the date format and the portions of data that seem to follow a pattern. These data portions can be added as fields. The rest of the data is treated as miscellaneous details and is automatically extracted as free text. If you want to extract fields from the miscellaneous details, or if you want to perform a more advanced field extraction, you can save the data pattern and edit it later. In the edit mode, you can customize the primary pattern to suit your needs. To customize the primary pattern, you need knowledge of Java regular expressions. For more information about editing the data pattern, see Editing or cloning data patterns.
At the time of adding fields, you need to provide inputs such as the following:
- Field type – Defines the way in which fields must be stored in the data store. Storing fields with a field type enables you to use particular search commands to search fields effectively.
- Field name – Defines the name by which the field must be saved.
Internal fields
The following fields are treated as internal fields:
- details
- SEQUENCE_ID
- _ignore
- utcdiffminutes
- timestamp
- _raw
- RAW_EVENT_DATA
Internal fields are usually not available for searching. However, you can use the timestamp field as part of your search criteria. The timestamp field is added at the time of indexing a data record and is most useful with search commands. For example, you can use the timestamp field with the filter search command to display search results matching the filter criteria associated with the field.
Search-time field extraction
You can use the following search commands to extract fields at the time of searching.
- extract can be used to extract field values or raw event data by using the Java regular expression capturing groups.
- extractkv can be used to extract name=value pairs from raw event data depending on the delimiters specified.
Note that fields extracted at search-time are virtual fields and cannot be added to the Fields section on the Search page.
This kind of field extraction is more suited if your data does not follow any structure or pattern that can be easily identified or if you want to extract data that is not delimited by the standard equals sign.
Data collection-time versus search-time field extraction
Fields represent small portions of valuable knowledge in your data. To capture all the fields in your data that are not automatically extracted, you need to define custom fields.
Custom fields can be defined in the following ways:
- During data collection: By performing a data pattern based extraction.
- During search: By performing a search command based extraction.
As an administrator, it is important to consider the distinction between these two kinds of field extraction.
Both kinds of field extraction have their own advantages. However, as a general rule, it is recommended that you plan your fields beforehand and extract them while creating a data pattern. With data pattern based field extraction, search can be faster. This kind of field extraction also gives you better control of your data and improved results, and it helps you run advanced search commands and find meaningful insights.
Both kinds of field extraction have some performance impact. The number of fields and the unique values per field determine the amount of performance impact. For more information, see Variables-that-impact-product-performance.
Data pattern based extraction
This kind of field extraction adds a storage cost. However, search performance improves for all kinds of searches, including advanced search commands such as timechart and stats. In this kind of extraction, parsing has to be done on all the data that is indexed, and therefore more CPU is required. Thus, this kind of extraction requires more hardware resources.
Performing this kind of field extraction requires some advance planning during data collection.
This kind of field extraction is more suited in the following scenarios:
- If your data follows some structure or pattern that can be easily identified.
- If you plan to use statistical operations by running advanced search commands.
Search command based extraction
This kind of extraction takes place only on the data that satisfies the search criteria. This means that the extraction runs on an already narrowed-down set of results.
Fields extracted at search-time are normally virtual fields. You cannot see the count of these fields in the Filters panel > Fields section. Any further processing on such fields negatively impacts search performance.
Extracting fields at search-time does not require any planning. You can decide which fields to extract at run time.
This kind of field extraction can be done by running search commands like extract and extractkv. To be able to run the extract command, you need knowledge of Java regular expressions.
This kind of field extraction is more suited in the following scenarios:
- If your data does not follow any structure or pattern that can be easily identified.
- If you want to extract data that is not delimited by the standard equals sign.
For more information, see the scenarios described at Search-time field extraction.
Use cases for performing search-time field extraction
The following scenarios provide an understanding of when search-time field extraction might be useful:
When the data does not follow a set pattern
If the data that you want to collect does not follow a set pattern, then it might be useful to perform search-time field extraction.
For example, suppose you want to extract the log level (warning) from the following sample data. In this scenario, you can use the extract command.
To see examples of the search queries that can be used for extracting various portions of the sample data, see extract search command.
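To make the idea of a capturing group concrete, the following Python sketch pulls a log level out of an unstructured line. The sample line, the logLevel group name, and the extract helper are assumptions made for this example; the helper is only a stand-in for the concept and is not the product's extract command. Note that Python writes a named group as (?P<name>...), whereas Java regular expressions, which the extract command uses, write it as (?<name>...).

```python
import re


def extract(record, pattern):
    """Return the named capturing groups of the first match as virtual fields."""
    match = re.search(pattern, record)
    return match.groupdict() if match else {}


# Hypothetical unstructured log line; the real sample data may look different.
line = "Mar 12 10:15:32 appserver: warning: connection pool is nearly exhausted"

# Assumed set of log levels, matched as whole words.
LOG_LEVEL = r"\b(?P<logLevel>debug|info|warning|error)\b"

print(extract(line, LOG_LEVEL))
```

A record that contains none of the assumed log levels yields an empty result in this sketch, which mirrors the fact that the field exists only for records the pattern matches.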
When name-value pairs are delimited with other special characters
If your data contains name-value pairs that do not appear in the standard delimited syntax of "name=value", then you might want to perform a search-time field extraction.
For example, suppose you want to extract the start time and end time from the following sample data. In this scenario, you can use the extractkv command.
To see examples of the search queries that can be used for extracting various portions of the sample data, see the extractkv search command.
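The delimiter-based approach can be sketched in Python as follows. The extract_kv function, its parameter names, and the sample line are hypothetical and chosen only for illustration; they do not reflect the actual options of the extractkv command.

```python
def extract_kv(record, pair_delimiter, kv_delimiter):
    """Split a record into name/value pairs using caller-supplied delimiters."""
    fields = {}
    for chunk in record.split(pair_delimiter):
        # partition splits on the first delimiter only, so values may contain it
        name, found, value = chunk.partition(kv_delimiter)
        if found and name.strip():
            fields[name.strip()] = value.strip()
    return fields


# Hypothetical sample: pairs separated by ";" and names separated from values by ":"
line = "job:backup; start time:10:30:00; end time:11:45:00"
print(extract_kv(line, ";", ":"))
```

Splitting each pair on the first delimiter only is a design choice of this sketch: it keeps values such as 10:30:00 intact even though they contain the name/value delimiter.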
When the field name contains a period or other special characters
If name=value pairs exist in the data, but the name portion contains a period or some other special character consistently throughout the data, then you might want to perform a search-time field extraction.
For example, suppose you want to extract the CPU temperature only and not the disk temperature from the following sample data. In this scenario, you can use the extract command.
If you index the preceding sample data, the product automatically extracts the name=value pair (for example, temperature=40). To extract only the values associated with the CPU, use a regular expression with the extract command that matches only the CPU values.
In this scenario, you can run the following search query:
extract field=".*?cpu\.temperature=(?<cpuTemperature>\d+).*"
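To see what the regular expression in this query captures, the following Python sketch applies the same pattern to two hypothetical records. The records and the cpu_temperature helper are invented for illustration, and the use of a full-string match is an assumption about how the pattern is applied. The only change to the pattern is the named group, which Python writes as (?P<name>...) instead of the Java form (?<name>...).

```python
import re

# The pattern from the query above, with the named group rewritten in
# Python's (?P<name>...) spelling; Java uses (?<name>...).
CPU_TEMPERATURE = re.compile(r".*?cpu\.temperature=(?P<cpuTemperature>\d+).*")


def cpu_temperature(record):
    """Return the CPU temperature as a string, or None when the record has none."""
    match = CPU_TEMPERATURE.fullmatch(record)
    return match.group("cpuTemperature") if match else None


# Hypothetical records; the values are invented for illustration.
print(cpu_temperature("host01 cpu.temperature=40 disk.temperature=55"))
print(cpu_temperature("host01 disk.temperature=55"))
```

Because the pattern names cpu\.temperature explicitly, with the period escaped, a record that carries only a disk temperature produces no cpuTemperature value.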