extract search command

The extract search command extracts values from a field or from the raw event data and assigns them to new fields by using Java regular expression capturing groups. You specify a regular expression that matches the field value (or raw event data) that you want to extract from; each named capturing group in the expression defines a new field and the portion of the value that is assigned to it. The regular expression must exactly match the field value (or raw event data) in the search results.

For a list of all search commands, see Search commands.

Syntax

extract field=[<Source-Field>] "<Regex-Expression>"

In the preceding syntax, the following definitions apply:

  • <Source-Field> refers to the source field name that you want to use to extract particular information. Specifying this information is optional. If you do not specify a field name, the raw event data is used to extract particular information.
  • <Regex-Expression> refers to the Java regular expression (capturing groups) that you want to specify. This expression must be a combination of the regular expression and the new field or fields to which you want to assign the extracted information. This expression must be enclosed in double quotes (").
  • Square brackets ([ ]) indicate that the enclosed item is optional.

For more information about specifying the regular expression, see Specifying the regular expression correctly.

Short examples

Example 1: Extract the log level data entry (for example, warning) and the corresponding component action data entry (for example, VpxUtil_InvokeWithOpId), and assign the values to the new fields LogLevel and ComponentAction, respectively.

... | extract field=".*?\[\d+\s+(?<LogLevel>\w+).*?\]\s+(?<ComponentAction>\w+).*"

Example 2: Extract two portions (host name and domain name) of the value of the HOST field and assign those values to two new fields, Hostname and Domainname.

... | extract field=HOST "(?<Hostname>[A-Za-z-]+)\.(?<Domainname>.+)"

Long examples

The following sample data and sample indexed data (displayed on the Search tab) will help you understand the examples of using the extract command. 

Sample data

2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler' 
opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms

Sample indexed data

2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler' 
opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms
HOST=my-server.bmc.com |COLLECTOR_NAME=my_coll |opID=SWI-661f7bb8

Extract log level and component action

In this example, you use the command to extract the log level (warning) and the component (VpxUtil) along with the related action for that component (InvokeWithOpId).

When you run this command, two new fields are added to the output: LogLevel and ComponentAction.

Command

... | extract field=".*?\[\d+\s+(?<LogLevel>\w+).*?\]\s+(?<ComponentAction>\w+).*"

Output

2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler' 
opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms
HOST=my-server.bmc.com |COLLECTOR_NAME=my_coll |opID=SWI-661f7bb8 |LogLevel=warning |ComponentAction=VpxUtil_InvokeWithOpId
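
To try the expression outside the product, the following minimal Java sketch applies the same regular expression with java.util.regex. It rests on assumptions: the class and method names are hypothetical, the sample event is joined onto one line (treating the line break above as display wrapping), and Matcher.matches() stands in for the requirement that the expression match the whole value. It is not the product's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the product.
class ExtractLogLevelDemo {
    // Sample event joined onto one line (assumption: the break in the sample is display wrapping).
    static final String SAMPLE =
        "2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler' "
        + "opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms";

    // Same expression as in the command, with backslashes doubled for a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        ".*?\\[\\d+\\s+(?<LogLevel>\\w+).*?\\]\\s+(?<ComponentAction>\\w+).*");

    // Returns {LogLevel, ComponentAction}, or null when the whole input does not match.
    static String[] extract(String event) {
        Matcher m = PATTERN.matcher(event);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("LogLevel"), m.group("ComponentAction") };
    }

    public static void main(String[] args) {
        String[] fields = extract(SAMPLE);
        // Prints: LogLevel=warning |ComponentAction=VpxUtil_InvokeWithOpId
        System.out.println("LogLevel=" + fields[0] + " |ComponentAction=" + fields[1]);
    }
}
```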

Extract module name and time

In this example, you use the command to extract the module name (VpxProfiler) and the time taken by the module (in milliseconds) (2750) for executing the respective operation (opID=SWI-661f7bb8).

When you run this command, two new fields are added to the output: Module and TimeInms.

Command

... | extract field=".*?'(?<Module>\w+).*?(?<TimeInms>\d+)\sms"

Output

2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler'
opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms
HOST=my-server.bmc.com |COLLECTOR_NAME=my_coll |opID=SWI-661f7bb8 |Module=VpxProfiler |TimeInms=2750
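
Extracted values are text. If you post-process them in your own code, a numeric field such as TimeInms usually needs to be parsed first. The sketch below illustrates this with plain java.util.regex; the class and method names are hypothetical, and the event is assumed to be a single line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the product.
class ExtractModuleTimeDemo {
    // Same expression as in the command, escaped for a Java string literal.
    static final Pattern PATTERN =
        Pattern.compile(".*?'(?<Module>\\w+).*?(?<TimeInms>\\d+)\\sms");

    // Returns the module name, or null when the whole event does not match.
    static String module(String event) {
        Matcher m = PATTERN.matcher(event);
        return m.matches() ? m.group("Module") : null;
    }

    // Returns the elapsed time in milliseconds as a number, or -1 when the event does not match.
    static long timeInMs(String event) {
        Matcher m = PATTERN.matcher(event);
        return m.matches() ? Long.parseLong(m.group("TimeInms")) : -1L;
    }

    public static void main(String[] args) {
        String event = "2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler' "
            + "opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms";
        // Prints: Module=VpxProfiler |TimeInms=2750
        System.out.println("Module=" + module(event) + " |TimeInms=" + timeInMs(event));
    }
}
```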

Extract value of the HOST field into two separate fields

In this example, you use the command to extract the host name (my-server) and domain name (bmc.com) separately from the value of the HOST field.

When you run this command, two new fields are added to the output: Hostname and Domainname.

Command

... | extract field=HOST "(?<Hostname>[A-Za-z-]+)\.(?<Domainname>.+)"

Output

2014-11-18T15:50:53.872+05:30 [03140 warning 'VpxProfiler'
opID=SWI-661f7bb8] VpxUtil_InvokeWithOpId [TotalTime] took 2750 ms
HOST=my-server.bmc.com |COLLECTOR_NAME=my_coll |opID=SWI-661f7bb8 |Hostname=my-server|Domainname=bmc.com
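
The idea of reading one field and adding a new field per named capturing group can be sketched generically in Java. Everything here is an assumption made for illustration: the event is modeled as a map of field names to values, the class, method, and GROUP_NAME helper are hypothetical, and group names are discovered by scanning the expression text, which is a simplification that ignores escaped parentheses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the product.
class ExtractFieldDemo {
    // Finds group names such as (?<Hostname>...) in the expression text (simplified scan).
    private static final Pattern GROUP_NAME =
        Pattern.compile("\\(\\?<([A-Za-z][A-Za-z0-9]*)>");

    // Returns a copy of the event with one new field per named group.
    // The copy is unchanged when the source field is missing or does not match entirely.
    static Map<String, String> extract(Map<String, String> event, String sourceField, String regex) {
        Map<String, String> result = new LinkedHashMap<>(event);
        String value = event.get(sourceField);
        if (value == null) {
            return result;
        }
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.matches()) {
            return result;
        }
        Matcher names = GROUP_NAME.matcher(regex);
        while (names.find()) {
            String name = names.group(1);
            result.put(name, m.group(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("HOST", "my-server.bmc.com");
        event.put("COLLECTOR_NAME", "my_coll");
        System.out.println(extract(event, "HOST", "(?<Hostname>[A-Za-z-]+)\\.(?<Domainname>.+)"));
    }
}
```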

Specifying the regular expression correctly

Suppose you have the following line in your indexed data:

ChartData found for searchId = 1401867925702, index = bw-2014-06-02-06-006, 
fetched records = 24, remaining = 52
HOST=my-server.bmc.com |COLLECTOR_NAME=my_coll

You want to extract the search ID information (1401867925702) and the index information (bw-2014-06-02-06-006).

If you use the following incorrect syntax, the actual output does not match the expected output.

Incorrect syntax

... | extract field=".*=\s*(?<SearchID>\d+).*=\s*(?<Index>\w+).*"

Expected output: SearchID=1401867925702 and Index=bw-2014-06-02-06-006
Actual output: SearchID=24 and Index=52

This discrepancy occurs because the quantifier in the expression .* is greedy and tries to match as much data as possible. To extract the exact information that you are looking for, you must use a reluctant (nongreedy) quantifier, such as .*?.

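You can use the following correct syntax to extract the exact information for search ID and index. The Index group uses [\w-]+ rather than \w+ because \w does not match the hyphens in the index value.

Correct syntax

... | extract field=".*?=\s*(?<SearchID>\d+).*?=\s*(?<Index>[\w-]+).*"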
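
The effect of greedy and reluctant quantifiers can be reproduced with plain java.util.regex, as in the following sketch. The class, constant, and method names are hypothetical, the sample line is assumed to be a single line, and Matcher.matches() stands in for whole-value matching. Because \w does not match a hyphen in java.util.regex, the reluctant expression in this sketch uses [\w-]+ for the Index group so that the full index value is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the product.
class GreedyVsReluctantDemo {
    static final String LINE =
        "ChartData found for searchId = 1401867925702, index = bw-2014-06-02-06-006, "
        + "fetched records = 24, remaining = 52";

    // Greedy: each .* consumes as much as possible and backtracks to the last usable '='.
    static final String GREEDY = ".*=\\s*(?<SearchID>\\d+).*=\\s*(?<Index>\\w+).*";

    // Reluctant: each .*? stops at the first '=' that lets the rest of the expression match.
    static final String RELUCTANT = ".*?=\\s*(?<SearchID>\\d+).*?=\\s*(?<Index>[\\w-]+).*";

    // Returns "SearchID=<value> and Index=<value>", or "no match".
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return "SearchID=" + m.group("SearchID") + " and Index=" + m.group("Index");
    }

    public static void main(String[] args) {
        System.out.println(extract(GREEDY, LINE));    // SearchID=24 and Index=52
        System.out.println(extract(RELUCTANT, LINE)); // SearchID=1401867925702 and Index=bw-2014-06-02-06-006
    }
}
```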

Notes

  • The product supports only Java regular expressions that are compatible with Java Runtime Environment (JRE) version 1.6.
  • The regular expression that you specify in the command must match the specified field value or raw data entry.
  • You cannot use the default field names HOST, COLLECTOR_NAME, or DATA_PATTERN as the value of the target field.
     