Creating a data pattern for a file with multiple data formats

This topic provides an example of creating a data pattern for indexing a log file with multiple data formats.

This example helps you answer questions such as:

How do I create a data pattern for a log file that contains multiple data formats?
How do I identify fields that are common and different across multiple data formats?
How do I edit the primary pattern to accommodate multiple data formats?

The following step-by-step instructions will help you understand the creation of the IBM Websphere - SystemErr data pattern that is available by default.

Note

The following steps walk through the design of a data pattern that is available by default with the product. Therefore, you do not need to create this data pattern.

These steps are provided to help you understand the creation of a data pattern for indexing a log file that contains multiple data formats.

Sample text

[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R com.ibm.ws.exception.RuntimeError:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
com.ibm.lang.management.MemoryMXBeanImpl (initialization failure)

[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R at
com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:789)

The sample text contains data in two different formats. The second row contains structured information from which you can easily extract fields such as the class name and line number. The first row does not follow the same structure.

Step 1: Enter sample timestamp and click Auto-detect

Navigate to Administration > Data Patterns > Add Data Pattern.

Copy the timestamp from the sample text into the Sample Timestamp field, and click Auto-detect. Because the data pattern for this log file already exists, you can see the automatically detected date format "MM/dd/yy HH:mm:ss:SSS Z" that exactly matches the sample timestamp. Leave this date format unchanged and proceed further.

Copying the sample timestamp to automatically detect the appropriate date format
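
To see how the detected date format lines up with the sample timestamp outside the product, you can parse the same string in a short script. The following Python sketch is illustrative only and is not part of the product. The function name parse_websphere_timestamp is hypothetical, and the strptime directives are an assumed translation of the detected format "MM/dd/yy HH:mm:ss:SSS Z". The time zone abbreviation is split off and returned as a label, because Python's strptime recognizes only a limited set of zone names.

```python
from datetime import datetime

SAMPLE = "5/15/12 16:14:07:113 PDT"  # timestamp copied from the sample text


def parse_websphere_timestamp(text):
    """Return (naive datetime, zone label) for a 'M/d/yy HH:mm:ss:SSS ZONE' string.

    Hypothetical helper: the directive string below is an assumed equivalent
    of the detected format, not something taken from the product.
    """
    # Separate the trailing zone abbreviation from the date and time portion.
    stamp, zone = text.rsplit(" ", 1)
    # %f accepts one to six digits, so "113" is read as 113 milliseconds.
    parsed = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S:%f")
    return parsed, zone


parsed, zone = parse_websphere_timestamp(SAMPLE)
print(parsed.isoformat(), zone)
```

If the string parses without an error, every element of the format (month, day, two-digit year, time, milliseconds, and zone) has a counterpart in the sample timestamp.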


Step 2: Editing the primary pattern to verify timestamp extraction

Edit the primary pattern to remove "%{Data:_ignore}\s*", because the sample text contains no data before the timestamp that needs to be ignored.

Also, surround the timestamp with square brackets ([ ]) so that the primary pattern looks as follows:

\[%{IbmWebsphereTimestamp:timestamp}\]\s*%{MultilineEntry:details}

The escaped square brackets (\[ and \]) match the literal brackets in the sample text. Because the brackets are outside the "timestamp" field, only the string that appears within the square brackets is captured as the timestamp.
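
The effect of the escaped brackets can be reproduced with an ordinary regular expression. The following Python sketch is an approximation, not the product's implementation: the TIMESTAMP expression is an assumed stand-in for %{IbmWebsphereTimestamp}, and a dot-matches-all group is an assumed stand-in for %{MultilineEntry}.

```python
import re

# Assumed stand-in for %{IbmWebsphereTimestamp}; the real definition may differ.
TIMESTAMP = r"\d{1,2}/\d{1,2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3,4}"

# \[ and \] match the literal brackets; only the text between them is captured.
PRIMARY = re.compile(
    r"\[(?P<timestamp>" + TIMESTAMP + r")\]\s*(?P<details>.*)",
    re.DOTALL,
)

line = "[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R at"
match = PRIMARY.match(line)
print(match.group("timestamp"))  # the timestamp without the brackets
print(match.group("details"))    # everything after the closing bracket
```

Without the escaped brackets in the expression, the match would fail at the first character of the line, because "[" is not part of the timestamp.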

Click Preview to verify that the timestamp was extracted correctly, as shown in the following figure.

Verifying the timestamp extraction


Step 3: Editing the primary pattern to extract fields of interest

Before you edit the primary pattern for extracting fields of interest, you must understand the format of the data that appears in the two rows of sample text.

The following table compares the two data formats to reveal their similarities and differences. The elements of each format are listed in the order in which they appear in the sample text (from left to right), as a numbered list (from top to bottom). For each element, the raw data is shown with a corresponding description under it.

Table of comparison for the two data formats

Additional information

In the preceding table, the portion "ApplicationMgrImpl.java" that appears after the function name and before the line number in the second row of the sample text is deliberately ignored. This information appears twice in the sample text and it is already covered as part of the fully qualified class name.

From the preceding table, you know that the primary pattern must be edited in such a way that the differences in the sample text formats are accommodated. To do so, proceed as follows.

Part 1 Edit the primary pattern to extract common fields

In the sample text, the timestamp is followed by the group ID that is common across the two data formats. To extract the group ID, edit the primary pattern as follows:

\[%{IbmWebsphereTimestamp:timestamp}\]\s%{Data:groupid}\s+%{MultilineEntry:details}

Click Preview to verify the "groupid" field extraction as shown in the following figure.

Extracting the group ID field


The group ID is followed by static text (SystemErr) that indicates the type of log file. This information is not valuable because you are already indexing the SystemErr.log file. Therefore, you do not need to extract it as a separate field. Instead, add it as expected text after the group ID by editing the primary pattern as follows:

\[%{IbmWebsphereTimestamp:timestamp}\]\s%{Data:groupid}\sSystemErr\s+%{MultilineEntry:details}

The static text is followed by the log level represented by "R" in the sample text.

Edit the primary pattern to extract this information (field "level") as follows:

\[%{IbmWebsphereTimestamp:timestamp}\]\s%{Data:groupid}\sSystemErr\s+%{Data:level}\s+%{MultilineEntry:details}

Click Preview to verify the "level" field extraction as shown in the following figure.

Extracting the level field
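
The common-field extraction in Part 1 can be approximated with a regular expression that uses named groups. The following Python sketch makes several assumptions that are not confirmed by the product: TIMESTAMP stands in for %{IbmWebsphereTimestamp}, %{Data} is treated as a lazy match, and %{MultilineEntry} is treated as "everything that remains, including line breaks". The second row is written on a single line on the assumption that the break after "at" in the sample text is display wrapping.

```python
import re

# Assumed stand-in for %{IbmWebsphereTimestamp}; the real definition may differ.
TIMESTAMP = r"\d{1,2}/\d{1,2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3,4}"

COMMON = re.compile(
    rf"\[(?P<timestamp>{TIMESTAMP})\]"  # literal brackets around the timestamp
    r"\s(?P<groupid>.*?)"               # %{Data:groupid}, assumed lazy
    r"\sSystemErr\s+"                   # static text: matched but not captured
    r"(?P<level>.*?)\s+"                # %{Data:level}, assumed lazy
    r"(?P<details>.*)",                 # %{MultilineEntry:details}
    re.DOTALL,
)

ROW1 = (
    "[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R com.ibm.ws.exception.RuntimeError:\n"
    "java.lang.RuntimeException: java.lang.NoClassDefFoundError:\n"
    "com.ibm.lang.management.MemoryMXBeanImpl (initialization failure)"
)
ROW2 = (
    "[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R at "
    "com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:789)"
)

for row in (ROW1, ROW2):
    fields = COMMON.match(row).groupdict()
    # Both formats yield the same group ID and level; only "details" differs.
    print(fields["groupid"], fields["level"], fields["details"].splitlines()[0])
```

Both rows produce the same "groupid" and "level" values, which is why these fields can be extracted before the pattern has to deal with the differences between the formats.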


Part 2 Edit the primary pattern to extract different fields together

The data formats in the sample text differ after the log level information. Therefore, you must edit the primary pattern in a way that works for both data formats:

For row 1: You can extract all the information that appears after the log level information as the "details" field.
For row 2: You can extract multiple fields for this information by breaking it down into the fully qualified class name, followed by the function name, and finally the line number, as described in the preceding table of comparison.

Start by editing the primary pattern to extract fields for the second row of data, and then use the OR operator (|) to add the "details" field required for the first row of data, as follows:

\[%{IbmWebsphereTimestamp:timestamp}\]\s%{Data:groupid}\sSystemErr\s+%{Data:level}\s+(?:at\s+%{GreedyData:class}\.%{Data:function}\((?:.*:%{Data:linenum}|.*)\)|%{MultilineEntry:details})

Click Preview to verify the field extraction, as shown in the following figure.

Extracting additional fields added to accommodate differences in the data formats
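
To see how the OR operator lets one primary pattern serve both formats, the pattern can be expanded into an ordinary regular expression and run against both rows. The following Python sketch is a rough emulation and not the product's engine. The SUBPATTERNS definitions and the expand and extract helpers are hypothetical; only the PRIMARY string is taken from this topic. The second row is written on a single line on the assumption that the break after "at" in the sample text is display wrapping.

```python
import re

# Assumed definitions of the named sub-patterns; the product's may differ.
SUBPATTERNS = {
    "IbmWebsphereTimestamp": r"\d{1,2}/\d{1,2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3,4}",
    "Data": r".*?",              # lazy match within a line
    "GreedyData": r".*",         # greedy match within a line
    "MultilineEntry": r"(?s:.*)",  # everything that remains, across lines
}

# The primary pattern from this topic, split only for readability.
PRIMARY = (
    r"\[%{IbmWebsphereTimestamp:timestamp}\]\s%{Data:groupid}\sSystemErr\s+"
    r"%{Data:level}\s+(?:at\s+%{GreedyData:class}\.%{Data:function}"
    r"\((?:.*:%{Data:linenum}|.*)\)|%{MultilineEntry:details})"
)


def expand(pattern):
    """Rewrite each %{Name:field} reference as a named regex group (hypothetical helper)."""
    def to_group(ref):
        return "(?P<{}>{})".format(ref.group(2), SUBPATTERNS[ref.group(1)])
    return re.sub(r"%\{(\w+):(\w+)\}", to_group, pattern)


PRIMARY_RE = re.compile(expand(PRIMARY))


def extract(row):
    """Return only the fields that took part in the match."""
    match = PRIMARY_RE.match(row)
    return {name: value for name, value in match.groupdict().items() if value is not None}


ROW1 = (
    "[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R com.ibm.ws.exception.RuntimeError:\n"
    "java.lang.RuntimeException: java.lang.NoClassDefFoundError:\n"
    "com.ibm.lang.management.MemoryMXBeanImpl (initialization failure)"
)
ROW2 = (
    "[5/15/12 16:14:07:113 PDT] 00000025 SystemErr R at "
    "com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:789)"
)

row1_fields = extract(ROW1)
row2_fields = extract(ROW2)
print(sorted(row1_fields))  # field names populated for row 1
print(sorted(row2_fields))  # field names populated for row 2
```

Under these assumptions, the first branch of the alternation applies only when the text after the log level starts with "at", so row 2 yields the "class", "function", and "linenum" fields, and row 1 falls through to the second branch and yields the "details" field.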


In the preview, the "level" field type is shown as INTEGER because an existing data pattern that contains a field with the same name marks it with the field type INTEGER. For this data, the field type STRING is more appropriate.

Click Cancel to exit without saving any changes.


BMC TrueSight IT Data Analytics 1.1
