Logs parsing and filtering


BMC Helix Log Analytics uses open-source data collectors to collect logs from the data sources and applications deployed in your environment. Logs are parsed based on the format present in the logs. The following formats are supported:

Format | Supported in logs collection from files | Supported in logs collection from Amazon Web Services
------ | --------------------------------------- | ------------------------------------------------------
Apache | ✅️ | ✅️
Apache error | ✅️ | ✅️
Nginx | ✅️ | ✅️
Regexp | ✅️ | ✅️
Java multiline | ✅️ | ❌️
JSON | ✅️ | ✅️
CSV | ✅️ | ✅️
No parser | ✅️ | ✅️

Logs parsing

Logs are run through a parser before collection, and the parsed logs are displayed on the Discover tab in BMC Helix Log Analytics. A log expression tells the parser what information is present in the logs. You can also use the expression to filter which logs are collected.

Let's look at an example to understand parsing. Here are the expression and time format for the Apache format. These expressions are provided for all supported formats (wherever required) when you configure a log collection.

Expression (Apache): /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)")?$/
Time Format: %d/%b/%Y:%H:%M:%S %z

Log entry: 192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0"

Parsed as:

time:
1362020400 (28/Feb/2013:12:00:00 +0900)
record:
{
"user" : nil,
"method" : "GET",
"code" : 200,
"size" : 777,
"host" : "192.168.0.1",
"path" : "/",
"referer": nil,
"agent" : "Opera/12.0"
}
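The same parse can be reproduced outside the collector. Here is a minimal Python sketch using the Apache expression above, translated to Python's `(?P<name>...)` named-group syntax. Note one difference: the Fluentd parser converts the literal `-` to nil for fields such as user and referer, while this sketch keeps the raw captures.

```python
import re
from datetime import datetime

# Apache access-log expression from above, in Python named-group syntax
APACHE = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>(?:[^"]|\\.)*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>(?:[^"]|\\.)*)" "(?P<agent>(?:[^"]|\\.)*)")?$'
)

line = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0"'
record = APACHE.match(line).groupdict()

# The captured time string parses with the time format %d/%b/%Y:%H:%M:%S %z
event_time = datetime.strptime(record.pop('time'), '%d/%b/%Y:%H:%M:%S %z')
```

Running this produces the same field breakdown as the parsed record shown above, and `event_time` corresponds to epoch 1362020400.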

To parse logs with different expressions, you can either update the default expressions or use a custom format.

For more information, see Fluentd documentation.

Logs filtering

After the logs are parsed, you can filter them to include relevant log data and exclude data that you do not require. For example, suppose you set up the following grep configuration:

GrepFilter.png

This configuration collects logs that meet all of the following conditions:
  • The value of the message field contains cool.
  • The value of the hostname field matches web<INTEGER>.example.com.
  • The value of the message field does NOT contain uncool.

The following logs are collected:
{"message":"It's cool outside today", "hostname":"web001.example.com"}
{"message":"That's not cool", "hostname":"web1337.example.com"}
The following logs are excluded:
{"message":"I am cool but you are uncool", "hostname":"db001.example.com"}
{"hostname":"web001.example.com"}
{"message":"It's cool outside today"}
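The include/exclude behavior can be sketched in Python: a record is collected only if every include (Regex) pattern matches its field and no exclude pattern matches, and a missing field never matches. The field names and patterns below mirror the example configuration; the helper name `grep` is our own.

```python
import re

def grep(records, include, exclude):
    """Keep a record only if every include pattern matches its field
    and no exclude pattern matches (missing fields never match)."""
    kept = []
    for rec in records:
        ok = all(re.search(p, str(rec.get(k, ''))) for k, p in include.items())
        bad = any(re.search(p, str(rec.get(k, ''))) for k, p in exclude.items())
        if ok and not bad:
            kept.append(rec)
    return kept

logs = [
    {"message": "It's cool outside today", "hostname": "web001.example.com"},
    {"message": "That's not cool", "hostname": "web1337.example.com"},
    {"message": "I am cool but you are uncool", "hostname": "db001.example.com"},
    {"hostname": "web001.example.com"},
    {"message": "It's cool outside today"},
]

collected = grep(
    logs,
    include={"message": r"cool", "hostname": r"^web\d+\.example\.com$"},
    exclude={"message": r"uncool"},
)
```

Only the first two records survive: the others fail the hostname pattern, contain uncool, or lack a required field.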

Configuring parsing and filtering for logs collection

The following table lists the steps to configure each format and the grep filter:

Format

Description

Apache, Apache Error, Nginx, and Regexp

For these formats, the default expression and supported time format are displayed in the Expression and Time Format fields. Update them to match the expression and time format present in your log files.

Example

Sample log:

[Mon Jan 10 02:13:55 2022] [necessitatibus:notice] [pid 5441:tid 6660] [client 11.111.111.111:2222] The TCP bus is down, override the wireless capacitor so we can connect the XML interface!
[Mon Jan 10 02:13:55 2022] [necessitatibus:info] [pid 9948:tid 2588] [client 22.222.222.22:3333] You can't bypass the program without programming the bluetooth HDD sensor!
[Mon Jan 10 02:13:55 2022] [et:notice] [pid 4498:tid 4891] [client 111.111.111.1:4444] Programming the alarm won't do anything, we need to hack the 1080p EXE protocol!

Default expression: /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>[^\]]*)\] (?<message>.*)$/

Updated expression to parse the client IP address without the port number: /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>\d+\.\d+\.\d+\.\d+):\d+\].(?<message>.*)$/

Notes

  • Updates to the Apache default expression and time format are not supported.
  • If you update the default expression, ensure that you retain the time parameter that contains the log generation time.


ApacheLogs_withoutPort.png
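To see the effect of the updated expression, here is a quick Python check against the first sample line above (named groups translated to Python's `(?P<name>...)` syntax). The client group now captures only the IP address, without the port.

```python
import re

# Updated Apache-error expression from above, in Python named-group syntax
PATTERN = re.compile(
    r'^\[[^ ]* (?P<time>[^\]]*)\] \[(?P<level>[^\]]*)\]'
    r'(?: \[pid (?P<pid>[^\]]*)\])?'
    r' \[client (?P<client>\d+\.\d+\.\d+\.\d+):\d+\].(?P<message>.*)$'
)

line = ('[Mon Jan 10 02:13:55 2022] [necessitatibus:notice] '
        '[pid 5441:tid 6660] [client 11.111.111.111:2222] '
        'The TCP bus is down, override the wireless capacitor '
        'so we can connect the XML interface!')
fields = PATTERN.match(line).groupdict()
```

Here `fields['client']` is `11.111.111.111`; the `:2222` port suffix is matched but not captured.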

To filter logs
  1. From the Log Filter list, select Grep.
  2. From the Directive field, select Regex (to include logs) or Exclude.
  3. In the Key field, enter the key from the log expression.
    Get the keys from the log expression. For example, in the Apache expression, host, user, time, method, path, code, size, referer, and agent are keys.
  4. In the Pattern field, enter the value to be included or excluded, enclosed within forward slashes (//).
  5. Click + to add another grep expression.
    Here is an example:
    GrepFilter.png
    This configuration collects logs that meet all of the following conditions:
    • The value of the message field contains cool.
    • The value of the hostname field matches web<INTEGER>.example.com.
    • The value of the message field does NOT contain uncool.

    The following logs are collected:
    {"message":"It's cool outside today", "hostname":"web001.example.com"}
    {"message":"That's not cool", "hostname":"web1337.example.com"}
    The following logs are excluded:
    {"message":"I am cool but you are uncool", "hostname":"db001.example.com"}
    {"hostname":"web001.example.com"}
    {"message":"It's cool outside today"}

Java multiline

The default firstline expression, format expression, and time format are displayed in the Format Firstline, Format 1, and Time Format fields.

To parse the following sample logs:

2021-09-07 14:19:17 INFO [main] Generating some log messages 0
2021-09-07 14:19:17 INFO [main] Sleeping for 1 second.
2021-09-07 14:19:17 INFO [main] Generating some log messages 1

Modify the default multiline expression as follows (note the location of the square brackets in the two expressions):
Default: /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/

Updated: /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) (?<thread>.*) \[(?<level>[^\s]+)\](?<message>.*)/

To verify the expression, use a tool such as Rubular or Fluentular.

Note

If you are updating the default expression, ensure that you retain the time parameter that contains the log generation time.  
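You can sanity-check the updated expression the same way. Here is a Python sketch matching the first sample line (named groups translated to Python syntax); note that because the group names keep their positions while the brackets move, the thread group captures INFO and the level group captures main for these sample logs.

```python
import re

# Updated multiline expression from above, with the brackets moved
PATTERN = re.compile(
    r'^(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) '
    r'(?P<thread>.*) \[(?P<level>[^\s]+)\](?P<message>.*)'
)

line = '2021-09-07 14:19:17 INFO [main] Generating some log messages 0'
fields = PATTERN.match(line).groupdict()
```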

To filter logs
  1. From the Log Filter list, select Grep.
  2. From the Directive field, select Regex (to include logs) or Exclude.
  3. In the Key field, enter the key from the log expression.
    You can get the keys from the log expression. For example, in the Java multiline expression, time, thread, level, and message are keys.
  4. In the Pattern field, enter the value to be included or excluded, enclosed within forward slashes (//).
  5. Click + to add another grep expression.
    Here is an example:
    GrepFilter1.png
    This configuration collects logs that meet both of the following conditions:
    • The value of the message field contains cool.
    • The value of the message field does NOT contain uncool.

    The following logs are collected:
    {"message":"It's cool outside today"}
    The following logs are excluded:
    {"message":"I am cool but you are uncool"}

JSON

In the Time Key field, enter the key or field that contains the time value in your logs. In the Time Format field, enter the time format present in your logs.

To filter logs
  1. From the Log Filter list, select Grep.
  2. From the Directive field, select Regex (to include logs) or Exclude.
  3. In the Key field, enter the key from the log expression.
    Get the keys from logs. For example, you have the following log entry: {"time":1362020400,"host":"111.111.0.1","size":777,"method":"PUT"}. Here, you have the following keys: time, host, size, and method.
  4. In the Pattern field, enter the value to be included or excluded, enclosed within forward slashes (//).
  5. Click + to add another grep expression.
    Here is an example:
    GrepFilterJSON.png
    Sample logs:
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"PUT"}
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"POST"}
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"GET"}

    The following logs are collected:
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"GET"}
    The following logs are excluded:
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"PUT"}
    {"time":1362020400,"host":"111.111.0.1","size":777,"method":"POST"}
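The JSON case needs no parsing expression: each line is decoded, and the grep directives are applied to the decoded fields. A minimal sketch, assuming the Regex directive uses key method with pattern GET as in the example above:

```python
import json
import re

lines = [
    '{"time":1362020400,"host":"111.111.0.1","size":777,"method":"PUT"}',
    '{"time":1362020400,"host":"111.111.0.1","size":777,"method":"POST"}',
    '{"time":1362020400,"host":"111.111.0.1","size":777,"method":"GET"}',
]

# Decode each line, then keep only records whose method field matches /GET/
records = [json.loads(l) for l in lines]
collected = [r for r in records if re.search(r'GET', str(r.get('method', '')))]
```

Only the GET record is collected; the PUT and POST records are excluded.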

CSV

In the Keys field, enter comma-separated field names to assign to the values in the CSV file, in the order in which the values appear. In the Time Format field, enter the time format present in your CSV file.

For example, a CSV contains the following values:

2013/02/28 12:00:00,192.168.0.1,111,user1

2013/02/28 12:00:00,192.168.0.1,112,user2

2013/02/28 12:00:00,192.168.0.1,113,user3

For this example, enter time,host,req_ID,user.

The CSV is parsed as:

ParsedCSV.png

Note

Ensure that one of the keys is time and that its column contains the log generation time.
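The key assignment can be sketched as follows: each row's values are zipped with the keys you entered, and the time column is parsed with its time format, assumed here to be %Y/%m/%d %H:%M:%S to match the sample rows.

```python
import csv
from datetime import datetime
from io import StringIO

KEYS = ['time', 'host', 'req_ID', 'user']   # the keys entered for this example

sample = """2013/02/28 12:00:00,192.168.0.1,111,user1
2013/02/28 12:00:00,192.168.0.1,112,user2
2013/02/28 12:00:00,192.168.0.1,113,user3"""

records = []
for row in csv.reader(StringIO(sample)):
    # Pair each value with its key, in order of appearance
    rec = dict(zip(KEYS, row))
    # The time key holds the log generation time
    rec['time'] = datetime.strptime(rec['time'], '%Y/%m/%d %H:%M:%S')
    records.append(rec)
```

Each CSV row becomes one record with named fields, which is what the parsed output above shows.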

To filter logs
  1. From the Log Filter list, select Grep.
  2. From the Directive field, select Regex (to include logs) or Exclude.
  3. In the Key field, enter the key from the log expression.
    Keys are the field names that you entered for the columns in the CSV file.
  4. In the Pattern field, enter the value to be included or excluded, enclosed within forward slashes (//).
  5. Click + to add another grep expression.
    Here is an example:
    GrepFilterCSV.png
    Sample CSV format:
    2013/02/28 12:00:00,111.111.0.1,111,user1
    2013/02/28 12:00:00,111.111.0.1,111,user2
    2013/02/28 12:00:00,111.111.0.1,111,user3

    The following logs are collected:
    2013/02/28 12:00:00,111.111.0.1,111,user2
    2013/02/28 12:00:00,111.111.0.1,111,user3
    The following logs are excluded:
    2013/02/28 12:00:00,111.111.0.1,111,user1

No parser

Logs are collected without any parser. Each log line is collected as a separate record. The @timestamp field of these collected logs contains the log collection time.

 
