Creating a parsing rule


A parsing rule consists of a regular expression that parses the data in your log files. Logs are parsed according to the format in which they are written. For more information, see the Fluentd documentation.


Example

The following expression and time format are the defaults for the Apache log format. When you configure a parsing rule, default expressions are provided for every supported format that requires one.

Expression (Apache): /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)")?$/
Time Format: %d/%b/%Y:%H:%M:%S %z

Log entry: 192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0"

Parsed as:

time:
1362020400 (28/Feb/2013:12:00:00 +0900)
record:
{
"user" : nil,
"method" : "GET",
"code" : 200,
"size" : 777,
"host" : "192.168.0.1",
"path" : "/",
"referer": nil,
"agent" : "Opera/12.0"
}
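The parsed result above can be reproduced outside the product with a short Python sketch. It is an illustration only, not the connector's implementation. Python's re module needs (?P<name>...) where the Ruby-style expression uses (?<name>...), so the groups are rewritten accordingly. The helper name parse_apache_line, the conversion of "-" to None, and the integer conversion of code and size are assumptions modeled on the record shown above.

```python
import re
from datetime import datetime

# The Apache expression from above, with Ruby-style (?<name>...) groups
# rewritten as (?P<name>...) for Python's re module.
APACHE_PATTERN = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>(?:[^\"]|\\.)*)" "(?P<agent>(?:[^\"]|\\.)*)")?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_line(line):
    """Return (epoch_seconds, record) for one log line, or None if it does not match."""
    match = APACHE_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The time group is parsed separately with the documented time format.
    timestamp = datetime.strptime(record.pop("time"), TIME_FORMAT)
    # Assumption: "-" means "no value", mirroring the nil entries shown above.
    record = {key: (None if value == "-" else value) for key, value in record.items()}
    # Assumption: numeric fields become integers, as in the documented record.
    for key in ("code", "size"):
        if record[key] is not None:
            record[key] = int(record[key])
    return int(timestamp.timestamp()), record


entry = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0"'
epoch, record = parse_apache_line(entry)
print(epoch)   # 1362020400
print(record)
```

Because the time format includes %z, the +0900 offset is honored and the epoch value does not depend on the time zone of the machine that runs the sketch.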

Before you begin

Install the connector for log collection. For more information, see Installing and managing connectors.

To create a parsing rule

  1. Click the Collection menu and select Parsing Rules.
  2. On the Parsing Rules page, click Create.
  3. Enter a unique name and a description for the rule.
  4. From the Format list, select the format of your log files.
  5. Based on the log format, perform the steps described in the following table:

    Format

    Steps

    Apache, Apache Error, Nginx, and Regexp

    For these formats, the default expression and time format are displayed in the Expression and Time Format fields. Update the expression or the time format to match the format used in your log files.

    To parse logs with a custom expression, use the Regexp format.

    Example

    Sample log:

    [Mon Jan 10 02:13:55 2022] [necessitatibus:notice] [pid 5441:tid 6660] [client 11.111.111.111:2222] The TCP bus is down, override the wireless capacitor so we can connect the XML interface!
    [Mon Jan 10 02:13:55 2022] [necessitatibus:info] [pid 9948:tid 2588] [client 22.222.222.22:3333] You can't bypass the program without programming the bluetooth HDD sensor!
    [Mon Jan 10 02:13:55 2022] [et:notice] [pid 4498:tid 4891] [client 111.111.111.1:4444] Programming the alarm won't do anything, we need to hack the 1080p EXE protocol!

    Default expression: /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>[^\]]*)\] (?<message>.*)$/

    Updated expression that captures the client IP address without the port number: /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>\d+\.\d+\.\d+\.\d+):\d+\].(?<message>.*)$/
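The effect of the two expressions on the client field can be checked with a small Python sketch. This is an illustration only. The Ruby-style (?<name>...) groups are rewritten as (?P<name>...) because Python's re module requires that form, and the variable names are hypothetical.

```python
import re

# Both expressions from above, adapted to Python's named-group syntax.
DEFAULT_PATTERN = re.compile(
    r'^\[[^ ]* (?P<time>[^\]]*)\] \[(?P<level>[^\]]*)\]'
    r'(?: \[pid (?P<pid>[^\]]*)\])? \[client (?P<client>[^\]]*)\] (?P<message>.*)$'
)
UPDATED_PATTERN = re.compile(
    r'^\[[^ ]* (?P<time>[^\]]*)\] \[(?P<level>[^\]]*)\]'
    r'(?: \[pid (?P<pid>[^\]]*)\])? \[client (?P<client>\d+\.\d+\.\d+\.\d+):\d+\].(?P<message>.*)$'
)

# First line of the sample log above.
sample = ("[Mon Jan 10 02:13:55 2022] [necessitatibus:notice] [pid 5441:tid 6660] "
          "[client 11.111.111.111:2222] The TCP bus is down, override the wireless "
          "capacitor so we can connect the XML interface!")

default_fields = DEFAULT_PATTERN.match(sample).groupdict()
updated_fields = UPDATED_PATTERN.match(sample).groupdict()

print(default_fields["client"])  # 11.111.111.111:2222
print(updated_fields["client"])  # 11.111.111.111
```

Both expressions keep the time group, so the log generation time remains available after the update.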

    Important

    • Updates to the Apache default expression and time format are not supported.
    • If you update the default expression, ensure that you retain the time parameter, which contains the log generation time.


    [Image: ApacheLogs_withoutPort.png]

    Java multiline

    The default date, firstline, and time format expressions are displayed in the Format Firstline, Format 1, and Time Format fields.

    To parse the following sample logs:

    2021-09-07 14:19:17 INFO [main] Generating some log messages 0
    2021-09-07 14:19:17 INFO [main] Sleeping for 1 second.
    2021-09-07 14:19:17 INFO [main] Generating some log messages 1

    Modify the default multiline expression so that it matches the layout of these logs. The default expression expects the thread name in square brackets before the log level, whereas the sample logs place the log level first and the bracketed thread name second (note the location of the square brackets in the expressions):

    Default: /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/

    Updated: /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) (?<level>[^\s]+) \[(?<thread>.*)\](?<message>.*)/

    To verify the expression, use Rubular or Fluentular.
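An expression can also be checked locally before it is entered in the rule. The Python sketch below is an illustration under stated assumptions: the groups use Python's (?P<name>...) syntax instead of the Ruby-style (?<name>...), the variable names are hypothetical, and the second pattern is written for the sample layout so that level captures INFO and thread captures main.

```python
import re

TIMESTAMP = r'(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})'

# Default layout: timestamp, [thread], level, message.
DEFAULT_PATTERN = re.compile(
    r'^' + TIMESTAMP + r' \[(?P<thread>.*)\] (?P<level>[^\s]+)(?P<message>.*)'
)
# Layout of the sample logs: timestamp, level, [thread], message.
SAMPLE_PATTERN = re.compile(
    r'^' + TIMESTAMP + r' (?P<level>[^\s]+) \[(?P<thread>.*)\](?P<message>.*)'
)

sample = "2021-09-07 14:19:17 INFO [main] Generating some log messages 0"

# The default expression looks for "[" right after the timestamp, so it fails here.
print(DEFAULT_PATTERN.match(sample))  # None

fields = SAMPLE_PATTERN.match(sample).groupdict()
print(fields)
```

The message group starts immediately after the closing bracket, so the captured message keeps its leading space, as it does with the default expression.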

    Important

    If you are updating the default expression, ensure that you retain the time parameter that contains the log generation time.  

    Json

    In the Time Key field, enter the key (field) that contains the time value in your logs. In the Time Format field, enter the time format used in your logs.
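The relationship between the two fields can be sketched in Python. Everything specific here is hypothetical and invented for illustration: the log line, the key name timestamp, and the time format. The sketch shows only that the Time Key selects the field and the Time Format describes how its value is written.

```python
import json
from datetime import datetime

# Hypothetical values for the Time Key and Time Format fields.
TIME_KEY = "timestamp"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# Hypothetical JSON log line, invented for this illustration.
line = '{"timestamp": "2021-09-07T14:19:17+0000", "level": "INFO", "message": "Sleeping for 1 second."}'

record = json.loads(line)
# The Time Key names the field; the Time Format tells the parser how to read it.
event_time = datetime.strptime(record[TIME_KEY], TIME_FORMAT)

print(event_time.isoformat())
print(record)
```

If the Time Format does not match the value stored under the Time Key, strptime raises a ValueError in this sketch, which is a quick way to spot a mismatch before saving the rule.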

    CSV

    In the Keys field, enter the comma-separated field names to assign to the values in the CSV file, in the order in which the values appear. In the Time Format field, enter the time format used in your CSV file.

    For example, a CSV contains the following values:

    2013/02/28 12:00:00,192.168.0.1,111,user1

    2013/02/28 12:00:00,192.168.0.1,112,user2

    2013/02/28 12:00:00,192.168.0.1,113,user3

    For this example, enter time,host,req_ID,user in the Keys field.

    The CSV is parsed as:

    [Image: ParsedCSV.png]

    Important

    Ensure that one of the keys is time and that it corresponds to the value containing the log generation time.
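How the keys line up with the CSV values can be sketched in Python. This is an illustration, not the connector's implementation. The time format %Y/%m/%d %H:%M:%S is an assumption chosen to match the sample values, and the variable names are hypothetical.

```python
import csv
from datetime import datetime

# Value of the Keys field from the example above.
KEYS = "time,host,req_ID,user".split(",")
# Assumption: a Time Format value that matches the sample timestamps.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

lines = [
    "2013/02/28 12:00:00,192.168.0.1,111,user1",
    "2013/02/28 12:00:00,192.168.0.1,112,user2",
    "2013/02/28 12:00:00,192.168.0.1,113,user3",
]

records = []
for row in csv.reader(lines):
    record = dict(zip(KEYS, row))  # keys are assigned in column order
    event_time = datetime.strptime(record["time"], TIME_FORMAT)
    records.append((event_time, record))

for event_time, record in records:
    print(event_time, record)
```

Because the keys are matched to the values by position, listing them in a different order than the columns would attach the wrong name to each value.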

    No parser

    Logs are collected without any parser. Each log line is collected as a separate record, and the @timestamp field of each record contains the log collection time, not the log generation time. In the Parameter Name and Parameter Value fields, enter the name of a log field and the value to assign to it in the collected logs.
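The behavior described for this format can be pictured with a short Python sketch. It is a rough model under stated assumptions, not the product's implementation: the function name collect_unparsed, the message field that holds the raw line, and the environment/staging parameter pair are all hypothetical.

```python
from datetime import datetime, timezone


def collect_unparsed(lines, parameter_name, parameter_value):
    """Wrap each raw line in its own record, stamped with the collection time."""
    records = []
    for line in lines:
        records.append({
            "message": line,  # hypothetical field name for the raw, unparsed line
            "@timestamp": datetime.now(timezone.utc).isoformat(),  # collection time, not log time
            parameter_name: parameter_value,  # Parameter Name / Parameter Value pair
        })
    return records


raw_lines = [
    "2021-09-07 14:19:17 INFO [main] Generating some log messages 0",
    "2021-09-07 14:19:17 INFO [main] Sleeping for 1 second.",
]
for record in collect_unparsed(raw_lines, "environment", "staging"):
    print(record)
```

The timestamps inside the raw lines are left untouched in this model, which is why @timestamp reflects when the line was collected rather than when it was written.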

Where to go from here

Creating collection policies

 
