About field extraction
The following sections help you understand how fields can be identified and the various ways in which fields can be extracted.
Data collection-time field extraction
Data collection-time field extraction takes place just before the collected data is indexed.
The following processes are involved in the data collection-time field extraction.
Automatic field extraction
The product automatically discovers and extracts name=value pairs from the data and displays them as fields in your search results. This means that all data that appears in the "name=value" syntax is treated as a field. In this syntax, the name portion is the field name and the value portion is the value of that field.
When you search for a name=value pair, the search returns only the data that contains the exact field name with the exact value.
Suppose you run the following search query.
This search finds data in which the Source field has a value of Win-Hou-02. The search does not find data records with any other Source value. It also does not find data records in which other fields share the Win-Hou-02 value. Thus, searching for a name=value pair returns results that are more focused than those you get if you search for just the value.
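The difference between searching for a name=value pair and searching for a bare value can be sketched in Python. This is a simplified illustration, not the product's search implementation: the records and the Host field are made up, and only the Source field and the Win-Hou-02 value come from the example above.

```python
# Hypothetical indexed records; each record maps field names to values.
records = [
    {"Source": "Win-Hou-02", "Host": "app-01"},
    {"Source": "Win-Hou-03", "Host": "Win-Hou-02"},
    {"Source": "Win-Hou-03", "Host": "app-02"},
]

def search_pair(records, name, value):
    """Return records in which the named field has exactly this value."""
    return [r for r in records if r.get(name) == value]

def search_value(records, value):
    """Return records in which any field has this value."""
    return [r for r in records if value in r.values()]

pair_hits = search_pair(records, "Source", "Win-Hou-02")
value_hits = search_value(records, "Win-Hou-02")
print(len(pair_hits), len(value_hits))
```

In this sketch, the name=value search matches one record, while the value-only search also matches the record in which the hypothetical Host field carries the same value.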
During automatic field extraction, the field value can contain only the characters in the following list. When the product spots a name=value pair, it extracts the value up to the first character that is not in the list. That character determines the end point of the field value.
List of characters included in the field value
- Letters (irrespective of case)
- Numbers (0 to 9)
- Underscore (_)
- Hyphen (-)
- Period (.)
For example, in the following sample data, searchText=COLLECTOR_NAME is extracted. The % character defines the end point of the field value.
[06/Jan/2016:12:23:21 -0600] "GET /olaengine/rest/olaapi
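The character rule described above can be approximated with a regular expression. The following Python sketch is only an illustration under stated assumptions: the field-name character set (letters, digits, underscore) is assumed, the input strings are hypothetical, and the function is not the product's extraction logic.

```python
import re

# Value characters come from the list above: letters, digits, underscore,
# hyphen, and period. Assumption: field names use letters, digits, underscore.
PAIR = re.compile(r"([A-Za-z0-9_]+)=([A-Za-z0-9_.-]+)")

def auto_extract(line):
    """Return a dict of name -> value for each name=value pair in the line."""
    return dict(PAIR.findall(line))

# Hypothetical fragments; only searchText=COLLECTOR_NAME followed by %
# comes from the example above.
truncated = auto_extract("searchText=COLLECTOR_NAME%2C")
multiple = auto_extract("host=web-01.example.com port=8080")
print(truncated)
print(multiple)
```

The value stops at the % character because % is not in the allowed set, whereas hyphens and periods inside a value are kept.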
If you want to capture and extract particular portions of your data that are important to your needs, but that are not automatically discovered and extracted, you need to define fields for those portions. You can define fields by performing a data pattern based field extraction during data collection, or you can perform a search-time field extraction by using particular search commands.
BMC recommends that you plan your fields beforehand and perform a data pattern based field extraction. For more information, see Data collection-time versus search-time field extraction.
By default, automatic fields are extracted with the field type STRING.
To change the field type, perform the following steps, and then click Update:
- Copy sample lines of your data containing details that you want to extract as a field in the Sample Text box.
- Edit the primary pattern to capture the particular field for which you want to change the field type.
While naming this field, it is important that the name is unique and not already used.
- Click Preview to validate the sample data entries and then change the field type for the field that you just added.
For more information, see Editing or cloning data patterns.
For every data record (or event) that is indexed, the product assigns certain fields based on the inputs specified at the time of creating a data collector or based on certain default settings. These fields are known as default fields.
The following table provides a list of default fields:
|User input||Field name|
Refers to the name specified to identify the data collector
Refers to the name of the data pattern used for creating the data collector
Absolute file path retrieved from one or more of the user inputs:
Fields extracted for the ProactiveNet (or TrueSight Infrastructure Management) events as defined by the
|Fields extracted from change management data|
|Fields extracted from incident management data|
Data pattern based field extraction
Data pattern based field extractions can help you capture important information in the data that is not already discovered and captured by automatic field extraction.
The data pattern wizard automatically detects the date format and the portions of data that seem to follow some pattern. These data portions can be added as fields. The rest of the data is treated as miscellaneous details and is automatically extracted as free text. If you want to extract fields from the miscellaneous details or if you want to perform an even more advanced field extraction, you can save and later edit the data pattern. In the edit mode, you can customize the primary pattern to suit your needs. To be able to customize the primary pattern, you need the knowledge of Java regular expressions. For more information about editing the data pattern, see Editing or cloning data patterns.
At the time of adding fields, you need to provide the following inputs:
- Field type – Defines the way in which fields must be stored in the data store. Storing fields with a field type enables you to use particular search commands to search fields effectively.
- Field name – Defines the name by which the field must be saved.
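One reason the field type matters can be shown with a small Python sketch. Only the STRING type is named in this topic; the numeric type and the sample values below are assumptions made for illustration.

```python
# Hypothetical values of a response-time field extracted from four records.
raw_values = ["9", "10", "250", "31"]

# Stored as strings, values are compared character by character.
as_strings = sorted(raw_values)

# Stored as integers (an assumed numeric field type), values are compared
# numerically, and statistical operations become meaningful.
as_integers = sorted(int(v) for v in raw_values)
average = sum(as_integers) / len(as_integers)

print(as_strings)
print(as_integers)
print(average)
```

With string storage, "10" sorts before "9", so range filters and statistics on such a field would not behave as expected.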
The following fields are treated as internal fields:
Internal fields are usually not available for searching. However, you can use the timestamp field as part of your search criteria. The timestamp field is added at the time of indexing a data record and is most useful with search commands. For example, you can use the timestamp field with the filter search command to display search results matching the filter criteria associated with the field.
Search-time field extraction
You can use the following search commands to extract fields at the time of searching.
- extract search command can be used to extract field values or raw event data by using the Java regular expression capturing groups.
- extractkv search command can be used to extract name=value pairs from raw event data depending on the delimiters specified.
Note that fields extracted at search-time are virtual fields and cannot be added to the Fields section on the Search page.
This kind of field extraction is more suited if your data does not follow any structure or pattern that can be easily identified or if you want to extract data that is not delimited by the standard equals sign.
Data collection-time versus search-time field extraction
Fields represent small portions of valuable knowledge in your data. To capture all the fields in your data that are not automatically extracted, you need to define custom fields.
Custom fields can be defined in the following ways:
- During data collection: By performing a data pattern based extraction.
- During search: By performing a search command based extraction.
As an administrator, it is important to consider the distinction between these two kinds of field extraction.
Both kinds of field extraction have their own advantages. However, as a general rule, it is recommended that you plan your fields beforehand and extract them while creating a data pattern. With data pattern based field extraction, searches can be faster. This kind of field extraction also gives you better control of your data and improved results, and it helps you run advanced search commands and find meaningful insights.
Both kinds of field extraction have some performance impact. The number of fields and the unique values per field determine the amount of performance impact. For more information, see.
Data pattern based extraction
This kind of field extraction adds a storage cost. However, search performance improves for all kinds of searches, including advanced search commands such as the timechart search command and the stats search command. In this kind of extraction, all the data that is indexed must be parsed, so more CPU is required. Thus, this kind of extraction requires more hardware resources.
Performing this kind of field extraction requires some advance planning during data collection.
This kind of field extraction is more suited in the following scenarios:
- If your data follows some structure or pattern that can be easily identified.
- If you plan to use statistical operations by running advanced search commands.
Search command based extraction
This kind of extraction takes place only on the data that satisfies the search criteria. This means that the extraction runs on an already narrowed-down set of results.
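The difference in parsing work between the two approaches can be sketched roughly as follows. This Python example is a simplified model, not a description of the product's internals; the records, the level field, and both function names are hypothetical.

```python
import re

LEVEL = re.compile(r"level=(\w+)")

# Hypothetical raw records.
records = [
    "host=a level=info msg=started",
    "host=b level=error msg=failed",
    "host=a level=error msg=timeout",
    "host=c level=info msg=stopped",
]

def collection_time_extract(records):
    """Parse every record once at indexing; return (rows, parse count)."""
    rows = [{"raw": r, "level": LEVEL.search(r).group(1)} for r in records]
    return rows, len(records)

def search_time_extract(records, term):
    """Parse only records that match the term; return (rows, parse count)."""
    hits = [r for r in records if term in r]
    rows = [{"raw": r, "level": LEVEL.search(r).group(1)} for r in hits]
    return rows, len(hits)

indexed, parsed_at_index = collection_time_extract(records)
found, parsed_at_search = search_time_extract(records, "host=a")
print(parsed_at_index, parsed_at_search)
```

In this model, collection-time extraction parses all four records up front, while search-time extraction parses only the two records that match the search, but repeats that work for every search.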
Fields extracted at search-time are normally virtual fields. You cannot see the count of these fields in the Filters panel > Fields section. Any further processing on such fields negatively impacts search performance.
Extracting fields at search-time does not require any planning. You can decide which fields to extract at run time.
This kind of field extraction can be done by running search commands such as the extract search command and the extractkv search command. To run the extract search command, you need knowledge of Java regular expressions.
This kind of field extraction is more suited in the following scenarios:
- If your data does not follow any structure or pattern that can be easily identified.
- If you want to extract data that is not delimited by the standard equals sign.
For more information, see the scenarios described at Search-time field extraction.
Use cases for performing search-time field extraction
The following scenarios provide an understanding of when search-time field extraction might be useful:
When the data does not follow a set pattern
If the data that you want to collect does not follow a set pattern, then it might be useful to perform search-time field extraction.
For example, suppose you want to extract the log level (warning) from the following sample data. In this scenario, you can use the
2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler'
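The idea of pulling the log level out with regular expression capturing groups can be sketched in Python. This is not the product's search syntax: the pattern, the group names, and the assumed line layout are illustrative, and Python writes named groups as (?P<name>...) where Java uses (?<name>...).

```python
import re

# Sample fragment copied from the example above (truncated in the source).
line = "2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler'"

# Assumed layout: "[" + numeric ID + log level + quoted component name.
pattern = re.compile(
    r"\[(?P<id>\d+)\s+(?P<loglevel>\w+)\s+'(?P<component>[^']+)'"
)

match = pattern.search(line)
fields = match.groupdict() if match else {}
print(fields)
```

Each named group becomes a field, so the log level is available as the hypothetical loglevel field.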
When name-value pairs are delimited with other special characters
If your data contains name-value pairs that do not appear in the standard delimited syntax of "name=value", then you might want to perform a search-time field extraction.
For example, suppose you want to extract the start time and end time from the following sample data. In this scenario, you can use thecommand.
ChartData found for searchId = 1401867925702, index:bw-2014-06-02-06-006;
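Extraction with non-standard delimiters can be sketched in Python as follows. The sample line is truncated in this topic and does not show the start time and end time, so the sketch works on the pairs that are visible. The function and its parameters are hypothetical and do not reflect the syntax of the product's search commands.

```python
def extract_kv(text, kv_delim, pair_delim):
    """Split text into pairs on pair_delim, then each pair on kv_delim."""
    fields = {}
    for chunk in text.split(pair_delim):
        if kv_delim not in chunk:
            continue
        left, right = chunk.split(kv_delim, 1)
        name_tokens = left.split()
        value = right.strip().rstrip(";")
        if name_tokens and value:
            # Use the last word before the delimiter as the field name.
            fields[name_tokens[-1]] = value
    return fields

# Sample fragment copied from the example above (truncated in the source).
line = "ChartData found for searchId = 1401867925702, index:bw-2014-06-02-06-006;"

by_equals = extract_kv(line, "=", ",")
by_colon = extract_kv(line, ":", ",")
print(by_equals)
print(by_colon)
```

Changing the key-value delimiter from an equals sign to a colon is what lets the index value be picked up as a field.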
When the field name contains a period or other special characters
If name=value pairs exist in the data, but the name portion contains a period or some other special character consistently throughout the data, then you might want to perform a search-time field extraction.
For example, suppose you want to extract the CPU temperature only and not the disk temperature from the following sample data. In this scenario, you can use the
[10/26/2015 6:04 PM] Cpu is overheated. cpu.temperature=40
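A regular expression that targets the full field name, including the period, can isolate the CPU temperature. The following Python sketch illustrates the idea only: the group name cpu_temperature and the disk.temperature test string are hypothetical, and the code is not the product's search query.

```python
import re

# Sample line copied from the example above.
line = "[10/26/2015 6:04 PM] Cpu is overheated. cpu.temperature=40"

# The period is escaped so that it matches a literal "." in the field name.
pattern = re.compile(r"cpu\.temperature=(?P<cpu_temperature>\d+)")

match = pattern.search(line)
cpu_temperature = int(match.group("cpu_temperature")) if match else None

# A hypothetical disk reading is not matched by the same pattern.
disk_match = pattern.search("disk.temperature=55")
print(cpu_temperature, disk_match)
```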
In this scenario, you can run the following search query: