About field extraction

Fields represent small portions of your data displayed as name=value pairings, such as Source=<host-name>.

At the time of data indexing, fields are automatically extracted. This process is known as field extraction. Fields can also be extracted at search-time by using certain search commands.

The following sections help you understand how fields can be identified and the various ways in which fields can be extracted.

Data collection-time field extraction

Data collection-time field extraction takes place just before the collected data is indexed.

The following processes are involved in the data collection-time field extraction.

Automatic field extraction
Default fields
Data pattern based field extraction
Internal fields

Automatic field extraction

The product automatically discovers name=value pairs in the data, extracts them, and displays them as fields in your search results. This means that all data that appears in the "name=value" syntax is treated as a field. In this syntax, the name portion is the field name and the value portion is the value of that field.

When you search for a name=value pair, the search returns only data that contains the exact field name with the exact value.

Example

Suppose you run the following search query.

Source=Win-Hou-02

This search finds data in which the Source field has a value of Win-Hou-02. The search does not find data records with any other Source value. It also does not find data records in which other fields share the Win-Hou-02 value. Thus, searching for a name=value pair returns results that are more focused than those you get if you search for just the value.
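The difference between a name=value search and a value-only search can be sketched in Python. This is only an illustration under stated assumptions: the records, the Backup and Status fields, and both helper functions are hypothetical, and the product's search engine is not implemented this way.

```python
# Hypothetical records: each one is a dict of already-extracted fields.
records = [
    {"Source": "Win-Hou-02", "Status": "OK"},
    {"Source": "Win-Hou-03", "Status": "OK"},
    {"Backup": "Win-Hou-02", "Status": "FAILED"},
]

def search_name_value(records, name, value):
    """Return records in which the field `name` has exactly `value`."""
    return [r for r in records if r.get(name) == value]

def search_value_only(records, value):
    """Return records in which any field has `value`."""
    return [r for r in records if value in r.values()]

focused = search_name_value(records, "Source", "Win-Hou-02")
broad = search_value_only(records, "Win-Hou-02")
```

Here, focused contains only the first record, while broad also picks up the record in which a different field happens to hold the same value.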

During automatic field extraction, the field value can contain only the characters in the following list. When the product finds a name=value pair, it extracts the value up to the first character that is not in the list. That character marks the end point of the field value.

List of characters included in the field value

Letters (irrespective of case)
Numbers (0 to 9)
Underscore (_)
Hyphen (-)
Period (.)

For example, in the following sample data, searchText=COLLECTOR_NAME is extracted. The % character defines the end point of the field value.

Sample data
[06/Jan/2016:12:23:21 -0600] "GET /olaengine/rest/olaapi /admin/manage/searchSuggestions/ user?searchText=COLLECTOR_NAME%3D%22Tomcat%20Access%20Logs
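The end-point rule can be approximated with a regular expression. The following Python sketch is not the product's implementation. The value character class is taken from the list above, but the rule used for field names (letters, digits, and underscores) is an assumption, and auto_extract is a hypothetical helper.

```python
import re

# Assumption: a field name is made of letters, digits, and underscores.
# The value class follows the list above: letters, digits, _, -, and .
PAIR = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=([A-Za-z0-9_.\-]+)")

def auto_extract(line):
    """Return the name=value pairs found in a line as a dict."""
    return dict(PAIR.findall(line))

sample = ('[06/Jan/2016:12:23:21 -0600] "GET /olaengine/rest/olaapi '
          '/admin/manage/searchSuggestions/ user?searchText='
          'COLLECTOR_NAME%3D%22Tomcat%20Access%20Logs')

# The value stops at the % character, which is not in the allowed list.
fields = auto_extract(sample)
```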

Recommendation

If you want to capture and extract particular portions in your data that are important to your needs, but which are not automatically discovered and extracted, then you need to define fields for such portions. You can define fields by performing a data pattern based field extraction while creating a data pattern or you can perform a search-time field extraction by using particular search commands.

BMC recommends that you plan your fields beforehand and perform a data pattern based field extraction. For more information, see Data collection-time versus search-time field extraction.

Note

By default, automatic fields are extracted with the field type STRING.

To change the field type, edit the data pattern, perform the following steps, and then click Update:

Copy sample lines of your data containing details that you want to extract as a field in the Sample Text box.
Edit the primary pattern to capture the particular field for which you want to change the field type.
Give this field a unique name that is not already in use.
Click Preview to validate the sample data entries and then change the field type for the field that you just added.

For more information, see Editing-or-cloning-data-patterns.

Default fields

For every data record (or event) that is indexed, the product assigns certain fields based on the inputs specified at the time of creating a data collector or on certain default settings. These fields are known as default fields.

The following table provides a list of default fields:

User input	Field name
Name: The name specified to identify the data collector.	COLLECTOR_NAME
Server Name: The host name, IP address, or fully qualified domain name of the computer from which the data entry originates. Can be used to locate data originating from a specific host.	HOST
Pattern: The name of the data pattern used for creating the data collector.	DATA_PATTERN
Absolute file path retrieved from one or more of the following user inputs: File Path for the Upload File data collector; Script Path for the Monitor Script Output over SSH and Monitor Script Output on Collection Agent data collectors; Directory Path and Filename/Rolloverpattern for the Monitor File over Windows Share and Monitor File over SSH data collectors.	COLLECTOR
Fields extracted for the ProactiveNet (or TrueSight Infrastructure Management) events as defined by the bppm.reader.index.slotNames property in the custom directory for the Collection Station.	mc_host, pn_object_id, pn_object_class_id, mc_parameter, severity, mc_incident_time, mc_arrival_time
Fields extracted from change management data	Request_ID, Submitter, Submit_Date, Last_Modified_Date, Description, Change_ID
Fields extracted from incident management data	Entry_ID, Submit_Date, Description, Last_Modified_Date, Submitter, Incident_ID

Data pattern based field extraction

Data pattern based field extractions can help you capture important information in the data that is not already discovered and captured by automatic field extraction.

If your data follows a particular pattern or structure that can be easily identified, then you can capture data appearing in that pattern by creating a data pattern. You can capture such data by defining custom fields at the time of creating the data pattern. For more information about identifying fields in the data, see Understanding-fields.

The data pattern wizard automatically detects the date format and the portions of data that seem to follow some pattern. These data portions can be added as fields. The rest of the data is treated as miscellaneous details and is automatically extracted as free text. If you want to extract fields from the miscellaneous details, or if you want to perform a more advanced field extraction, you can save the data pattern and edit it later. In the edit mode, you can customize the primary pattern to suit your needs. To customize the primary pattern, you need knowledge of Java regular expressions. For more information about editing the data pattern, see Editing-or-cloning-data-patterns.

At the time of adding fields, you need to provide the following inputs:

Field type – Defines the way in which fields must be stored in the data store. Storing fields with a field type enables you to use particular search commands to search fields effectively.
Field name – Defines the name by which the field must be saved.
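As a rough illustration of why a field type matters, the following Python sketch pairs a pattern that has named capturing groups with a per-field type. Everything in it is an assumption made for illustration: the pattern, the field names, the sample line, and the INTEGER type are hypothetical (this topic names only STRING), and Python writes named groups as (?P<name>...) where Java writes (?<name>...).

```python
import re

# Hypothetical pattern with one named group per field.
PATTERN = re.compile(r"(?P<client>\S+) (?P<status>\d{3}) (?P<bytes>\d+)")

# Hypothetical field definitions: field name -> field type.
# STRING is the default type named in this topic; INTEGER is assumed.
FIELD_TYPES = {"client": "STRING", "status": "INTEGER", "bytes": "INTEGER"}
CONVERTERS = {"STRING": str, "INTEGER": int}

def extract_typed(line):
    """Extract the named groups and store each value with its field type."""
    match = PATTERN.search(line)
    if match is None:
        return {}
    return {
        name: CONVERTERS[FIELD_TYPES[name]](value)
        for name, value in match.groupdict().items()
    }

record = extract_typed("10.0.0.5 200 5120")
```

Because status and bytes are stored as numbers in this sketch, numeric comparisons and statistics on them behave as expected, which would not be the case if they were kept as strings.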

Internal fields

The following fields are treated as internal fields:

details
SEQUENCE_ID
_ignore
utcdiffminutes
timestamp
_raw
RAW_EVENT_DATA

Internal fields are usually not available for searching. However, you can use the timestamp field as a part of your search criteria. The timestamp field is added when a data record is indexed and is most useful with search commands. For example, you can use the timestamp field with the filter-search-command search command to display search results that match the filter criteria associated with the field.

Search-time field extraction

You can use the following search commands to extract fields at the time of searching.

extract-search-command can be used to extract field values or raw event data by using the Java regular expression capturing groups.
extractkv-search-command can be used to extract name=value pairs from raw event data depending on the delimiters specified.

Note that fields extracted at search-time are virtual fields and cannot be added to the Fields section on the Search page.

This kind of field extraction is more suited if your data does not follow any structure or pattern that can be easily identified or if you want to extract data that is not delimited by the standard equals sign.

Data collection-time versus search-time field extraction

Fields represent small portions of valuable knowledge in your data. To capture all the fields in your data that are not automatically extracted, you need to define custom fields.

Custom fields can be defined in the following ways:

During data collection: By performing a data pattern based extraction.
During search: By performing a search command based extraction.

As an administrator, it is important to consider the distinction between these two kinds of field extraction.

Both kinds of field extraction have their own advantages. However, as a general rule, it is recommended that you plan your fields beforehand and extract them while creating a data pattern. With data pattern based field extraction, search can be faster. This kind of field extraction also gives you better control of your data and improved results, and it helps you run advanced search commands and find meaningful insights.

Both kinds of field extraction have some performance impact. The number of fields and the unique values per field determine the amount of performance impact. For more information, see Variables-that-impact-product-performance.
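To get a feel for these two variables, you could count the fields and the unique values per field in a sample of your data. The following Python sketch assumes the records are already available as dictionaries of fields. The field_cardinality helper and the sample records are hypothetical and are not part of the product.

```python
from collections import defaultdict

def field_cardinality(records):
    """Return the number of unique values seen for each field name."""
    values = defaultdict(set)
    for record in records:
        for name, value in record.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}

# Hypothetical sample of extracted records.
sample_records = [
    {"HOST": "srv01", "severity": "WARN"},
    {"HOST": "srv02", "severity": "WARN"},
    {"HOST": "srv03", "severity": "ERROR"},
]
cardinality = field_cardinality(sample_records)
```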

Data pattern based extraction

This kind of field extraction adds a storage cost. However, search performance improves for all kinds of searches, including advanced search commands such as timechart-search-command and stats-search-command. Because all the data that is indexed must be parsed, this kind of extraction needs more CPU and therefore more hardware resources.

Performing this kind of field extraction requires some planning before data collection.

This kind of field extraction is more suited in the following scenarios:

If your data follows some structure or pattern that can be easily identified.
If you plan to use statistical operations by running advanced search commands.

Search command based extraction

This kind of extraction takes place only on the data that satisfies the search criteria. This means that the extraction runs on an already narrowed-down set of results.
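The order of operations can be sketched as follows in Python. This is a simplified model, not the product's search pipeline: the events, the "took <n> ms" format, and the search_then_extract helper are all hypothetical.

```python
import re

# Hypothetical field: a response time written as "took <n> ms".
DURATION = re.compile(r"took (\d+) ms")

def search_then_extract(raw_events, keyword):
    """Filter first, then extract only from the events that matched."""
    matched = [event for event in raw_events if keyword in event]
    results = []
    for event in matched:
        m = DURATION.search(event)
        results.append((event, int(m.group(1)) if m else None))
    return results

events = [
    "checkout request took 35 ms",
    "login request took 12 ms",
    "checkout request took 48 ms",
]

# Only the two checkout events are parsed; the login event is never touched.
results = search_then_extract(events, "checkout")
```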

Fields extracted at search-time are normally virtual fields. You cannot see the count of these fields in the Filters panel > Fields section. Any further processing on such fields negatively impacts search performance.

Extracting fields at search-time does not require any planning. You can decide which fields to extract at run time.

This kind of field extraction can be done by running search commands like extract-search-command and extractkv-search-command. To be able to run the extract-search-command command, you need knowledge of Java regular expressions.

This kind of field extraction is more suited in the following scenarios:

If your data does not follow any structure or pattern that can be easily identified.
If you want to extract data that is not delimited by the standard equals sign.

For more information, see the scenarios described at Search-time field extraction.

Use cases for performing search-time field extraction

The following scenarios provide an understanding of when search-time field extraction might be useful:

When the data does not follow a set pattern

If the data that you want to collect does not follow a set pattern, then it might be useful to perform search-time field extraction.

For example, suppose you want to extract the log level (warning) from the following sample data. In this scenario, you can use the extract command.
To see examples of the search queries that can be used for extracting various portions of the sample data, see extract-search-command.
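The idea behind this scenario, capturing a value with a regular expression group, can be sketched in Python. The sample line and the bracketed log-level format are assumptions made for illustration, and extract_log_level is a hypothetical helper. Python writes named groups as (?P<name>...), whereas Java regular expressions use (?<name>...).

```python
import re

# Hypothetical format: the log level appears in square brackets.
LOG_LEVEL = re.compile(r"\[(?P<logLevel>[A-Z]+)\]")

def extract_log_level(line):
    """Return the captured log level, or None if the line has none."""
    match = LOG_LEVEL.search(line)
    return match.group("logLevel") if match else None

# Hypothetical sample line with no name=value structure.
line = "2016-01-06 12:23:21 [WARNING] Disk usage is above the threshold"
level = extract_log_level(line)
```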

When name-value pairs are delimited with other special characters

If your data contains name-value pairs that do not appear in the standard delimited syntax of "name=value", then you might want to perform a search-time field extraction.

For example, suppose you want to extract the start time and end time from the following sample data. In this scenario, you can use the extractkv command.
To see examples of the search queries that can be used for extracting various portions of the sample data, see the extractkv-search-command.
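Delimiter-based extraction of this kind can be sketched in Python. This is not the syntax or the implementation of the extractkv command: the extract_kv helper, its parameters, and the sample text with its ~ and ; delimiters are hypothetical.

```python
def extract_kv(text, pair_delim, kv_delim):
    """Split text into pairs, then split each pair into a name and a value."""
    fields = {}
    for chunk in text.split(pair_delim):
        name, sep, value = chunk.partition(kv_delim)
        if sep:  # skip chunks that do not contain the key-value delimiter
            fields[name.strip()] = value.strip()
    return fields

# Hypothetical data: pairs separated by ";", names and values by "~".
text = "start_time~09:15:02;end_time~09:20:47"
times = extract_kv(text, ";", "~")
```

Using partition rather than split keeps any later occurrence of the key-value delimiter inside the value instead of cutting the value short.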

When the field name contains a period or other special characters

If name=value pairs exist in the data, but the name portion contains a period or some other special character consistently throughout the data, then you might want to perform a search-time field extraction.

For example, suppose you want to extract the CPU temperature only and not the disk temperature from the following sample data. In this scenario, you can use the extract command.
If you index the preceding sample data, note that the product automatically extracts name=value (for example, temperature=40). To extract only the values associated with the CPU, use a regular expression when you run the extract-search-command command.

In this scenario, you can run the following search query:

extract field=".*?cpu\.temperature=(?<cpuTemperature>\d+).*"
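To see what the regular expression in this query captures, you can try the same pattern outside the product. The following Python sketch uses a hypothetical sample line, and the only change to the pattern is the spelling of the named group, because Python writes (?P<name>...) where Java writes (?<name>...).

```python
import re

# Same pattern as in the query above, with Python's named-group spelling.
CPU_TEMP = re.compile(r".*?cpu\.temperature=(?P<cpuTemperature>\d+).*")

def extract_cpu_temperature(line):
    """Return the CPU temperature as a string, or None if it is absent."""
    match = CPU_TEMP.match(line)
    return match.group("cpuTemperature") if match else None

# Hypothetical sample line with two fields that end in "temperature".
sample = "host=srv01 cpu.temperature=40 disk.temperature=35"
cpu_temperature = extract_cpu_temperature(sample)
```

The escaped period in cpu\.temperature matches a literal period, so the disk temperature is never captured.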