Managing data patterns

A data pattern defines how collected (semi-structured) data is structured, indexed, and made available for searching. One of the primary purposes of a data pattern is to specify the fields that must be extracted from the collected data. Fields are name=value pairs that represent groupings by which your data can be categorized. The fields that you specify when creating a data pattern are added to each record in the indexed data, so that you can search effectively and carry out advanced analysis by using search commands. You can also assign a field type (an integer, a string, or a long integer) to each field that you want extracted. Assigning a field type enables you to run type-specific search commands on those fields and perform advanced analysis.
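As a rough illustration of why field types matter, the following Python sketch casts extracted name=value pairs to declared types so that numeric comparisons behave as expected. The field names, the FIELD_TYPES mapping, and the apply_field_types function are hypothetical; they do not describe how the product stores fields internally.

```python
# Hypothetical field-type declarations: field name -> Python type.
# A "long integer" is folded into int because Python integers are unbounded.
FIELD_TYPES = {"status": int, "bytes": int, "clientip": str}


def apply_field_types(fields, field_types=FIELD_TYPES):
    """Return a copy of `fields` with values cast to their declared types.

    Fields without a declared type are kept as strings.
    """
    typed = {}
    for name, value in fields.items():
        caster = field_types.get(name, str)
        typed[name] = caster(value)
    return typed


record = apply_field_types({"status": "404", "bytes": "5120", "clientip": "10.0.0.7"})
# A numeric comparison is only meaningful once the value is an integer.
large_response = record["bytes"] > 1024
```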

The Data Patterns tab allows you to configure data patterns that data collectors use to collect data in the specified way. When you create a data collector, it is important to select an appropriate data pattern. This ensures that the indexed data looks as you expect, with events categorized in multiple lines (raw event data), fields extracted, and the time stamp extracted. The better the data pattern fits your data, the more effective your searches are likely to be.

What is a data pattern made up of?

A data pattern is made up of a Java regular expression used for parsing the data file and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. The subpatterns included in the data pattern are displayed as fields in your search results. You can either provide a sample time stamp and sample text to enable automatic detection of the primary pattern and date format, or you can specify a custom primary pattern, date format, and subpatterns.

While searching, you can find data of a certain type (data pattern) rather than data coming from a single source file. To do this, click the DATA_PATTERN field in one of your search results so that it is added to your search criteria. For more information, see Understanding fields and tags.

When do I need to create a data pattern?

The product provides a list of default data patterns (mostly for log formats) that you can use directly to index your data files. Because default data patterns are available for most common log formats, in most cases you do not need to create a data pattern and can use the defaults directly. For more information about the default data patterns, see List of data patterns supported.

Note

Default data patterns are out-of-the-box data patterns provided with the product. You cannot edit or delete a default data pattern, but you can clone the default data pattern to create a copy and edit (or delete) the copy.

If you find that you need a custom data pattern for your particular data file, you can:

Clone an existing data pattern and customize it to suit your needs.
Create a new data pattern.

How do I know which data pattern is appropriate for my data file?

When you create a data collector, after you point to the actual data file while assigning a data pattern, you can apply a filter to find the most relevant data patterns (available by default) that match your data file. You can select one of the options that seems most appropriate and look at a preview of the records. If you are not satisfied with the current preview, you can refresh the data patterns filter that you applied earlier, select another data pattern, and see the preview results. You can repeat this cycle until you are satisfied with the results shown.

If no suitable data pattern is available, suitable date formats are displayed, provided that your data file contains a date and time stamp. By selecting a date format as the data pattern, you can index and categorize your data in a simple way: the time stamp is treated as one category and the rest of the data in your data file as another. Compared to a regular data pattern, a date format generally produces search results with less detailed categorization, which might restrict advanced searching. For examples of date formats, see Sample-date-formats.
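The two-category split that a date format produces can be sketched in Python as follows. The date format string, the fixed time stamp length, and the split_on_timestamp function are assumptions chosen to fit a line such as "Apr 24, 2014 03:16:40 PM ..."; the product's own date format syntax may differ, and the month and AM/PM parsing here assumes an English locale.

```python
from datetime import datetime

# Assumed date format for lines such as "Apr 24, 2014 03:16:40 PM ...".
DATE_FORMAT = "%b %d, %Y %I:%M:%S %p"
# Assumption: the time stamp always occupies the same number of characters.
TIMESTAMP_LENGTH = len("Apr 24, 2014 03:16:40 PM")


def split_on_timestamp(line):
    """Split a line into two categories: the time stamp and everything else."""
    stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], DATE_FORMAT)
    return stamp, line[TIMESTAMP_LENGTH:].strip()


stamp, rest = split_on_timestamp(
    "Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found."
)
```

Everything after the time stamp stays in one undivided block, which is why this option offers less to search on than a full data pattern.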

If there is no suitable data pattern found and no time stamp exists in your data, then you can select free text as an option. By selecting free text as a data pattern, you can index all of your data to appear in a raw format based on the time when it is indexed by the product.

Note

If you use free text as a data pattern, each line in the data file is indexed as a new record.
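A minimal Python sketch of the free-text behavior follows. The index_free_text function and the record layout are hypothetical, and skipping blank lines is an assumption rather than documented product behavior.

```python
from datetime import datetime, timezone


def index_free_text(text, indexed_at=None):
    """Turn each line into its own record, stamped with the indexing time.

    Assumption: blank lines are skipped.
    """
    if indexed_at is None:
        indexed_at = datetime.now(timezone.utc)
    return [
        {"timestamp": indexed_at, "raw": line}
        for line in text.splitlines()
        if line.strip()
    ]


records = index_free_text("service started\nlistening on port 8080\n")
```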

If you find a data pattern that provides results that partly match your expectations, you can either retain that selection or choose to clone that data pattern and customize it to suit your needs. Finally, while creating a data collector, you can select the customized data pattern. If none of the options available for selecting a data pattern suit your needs, you can choose to create a new data pattern.

Constructing a data pattern

A data pattern is formed of Java regular expressions used for parsing the data and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. When the data pattern is used, the Indexer captures all of the lines in your data, interprets the data structure, and eventually allows you to effectively search the data with the help of fields (name=value).

Fields act as a grouping mechanism that represents various categories for consistently repeated trends in your data. Before you create a data pattern, you need to identify the fields by which you would like to eventually search the data. For more information about identifying fields for creating a data pattern, see Identifying fields in the data file.

You can construct your data pattern depending on how searchable your data needs to be. You need to read your data file to identify additional sections or expressions that you might be interested in searching and investigating.

A data pattern consists of a primary pattern and supporting subpatterns that help index your data in the specified way.

Data pattern = Primary pattern + Supporting subpatterns

where,

Item	Description	Example
Primary pattern	A primary pattern defines the high-level structure in which the data appears. It lists the fields that you want to extract from the data, and each field represents a grouping in the data file. The primary pattern consists of multiple subpatterns combined in a certain order. That order must exactly match the order in which the groupings appear in the data, because the product parses the data and shows it as records in the order specified in the data pattern. The primary pattern also acts as a record delimiter: each time a line in your data file matches the primary pattern regular expression, that line marks the beginning of a new record.	%{Mytimestamp:timestamp} \[%{Data:debuglevel}\] %{Data:component} - \[Thread=%{Data:threadid}\] %{Ip:clientip} - %{MultilineEntry:details} You can see that in this primary pattern, the Ip subpattern example is used.
Subpattern	Subpatterns define the supporting details for particular sections in the primary pattern. Every subpattern can be reused to construct more subpatterns or primary patterns. The syntax for reusing a subpattern is %{<sub-pattern-logical-name>:<field-name>}, where the field name is optional. If the field name is not specified, the matched value is not extracted as a field.	Ip=(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]) where Ip is the logical name of the subpattern, and the rest of the expression is the corresponding subpattern value. You can see that this subpattern is used in the primary pattern example.
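To make the relationship between a primary pattern and its subpatterns concrete, the following Python sketch expands %{Name:field} references into a single regular expression with named groups. It is an illustration only, not the product's implementation: the expand function is hypothetical, the Data subpattern is an assumed lazy catch-all, and Python writes named groups as (?P<name>...) where Java uses (?<name>...).

```python
import re

# Subpattern definitions: logical name -> regular expression.
# "Ip" follows the example in the table; "Data" is an assumed definition.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})"
SUBPATTERNS = {
    "Ip": r"(?<![0-9])(?:" + r"[.]".join([OCTET] * 4) + r")(?![0-9])",
    "Data": r".*?",
}

# Matches %{Name} and %{Name:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, subpatterns=SUBPATTERNS):
    """Replace each subpattern reference with the corresponding regex group."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        # Subpatterns may themselves reference other subpatterns.
        body = expand(subpatterns[name], subpatterns)
        if field is None:
            # No field name: match the text but do not extract it.
            return "(?:" + body + ")"
        return "(?P<" + field + ">" + body + ")"
    return REFERENCE.sub(replace, pattern)


# A shortened, hypothetical primary pattern in the same style as the table.
PRIMARY = r"\[%{Data:debuglevel}\] %{Ip:clientip} - %{Data:details}$"
LINE = "[INFO] 192.168.1.20 - session opened"

fields = re.match(expand(PRIMARY), LINE).groupdict()
```

Here fields holds one entry per named reference (debuglevel, clientip, details). A reference written without a field name, such as %{Ip}, still has to match but yields no field.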

For more information about an overall use case of creating a data pattern, see Examples-of-creating-a-data-pattern.

For more information about data pattern examples, see Sample-data-patterns.

For more information about subpatterns examples, see Sample-subpatterns.

For more information about the various field inputs for creating a data pattern, see Adding a new data pattern.

How to identify fields in the data file

Before you create a data pattern, you need to analyze your data file to find out if the file follows a certain pattern that can be captured while creating the data pattern.

Suppose you want to create a data pattern to index the following data:

Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.

Apr 24, 2014 03:16:44 PM dbservice INFO: Starting Schema

Apr 24, 2014 03:16:44 PM dbservice INFO: CONFIGURATIONS table exists in the database.

Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:
select * from CONFIGURATIONS where userName = 'admin' and propertyName ='init'

Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.

In the preceding lines, every new line starts with the time stamp. If you try to analyze a consistently followed format in these lines, you will notice that each line provides the following information, presented in the same order as they appear in the lines:

Time stamp
Component name
Debug information
Application message

To identify what can be fields, you need to identify the format that is being consistently followed in the data file, and then identify how this information can be grouped. You can create a field for each grouping. This ensures that all information that can be categorized into a group is indexed and is available for search.
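The four groupings identified above can be sketched as a Python regular expression with one named group per field. The PRIMARY expression and the parse_records function are hypothetical stand-ins for a primary pattern and the indexing step; the sketch also shows the record-delimiter idea, where a line that does not start a new record is appended to the previous one.

```python
import re

# Hypothetical regex for the sample lines: time stamp, component name,
# debug level, then the application message.
PRIMARY = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2} [AP]M) "
    r"(?P<component>\S+) "
    r"(?P<debuglevel>[A-Z]+): "
    r"(?P<details>.*)"
)


def parse_records(lines):
    """Group lines into records; a line matching PRIMARY starts a new record."""
    records = []
    for line in lines:
        match = PRIMARY.match(line)
        if match:
            records.append(match.groupdict())
        elif records and line.strip():
            # Continuation line: append it to the current record's message.
            records[-1]["details"] += "\n" + line
    return records


SAMPLE = [
    "Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:",
    "select * from CONFIGURATIONS where userName = 'admin' and propertyName ='init'",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.",
]
records = parse_records(SAMPLE)
```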

Tip

You can assign the details field for all miscellaneous information that you do not want to categorize with a specific field. At the time of indexing, the details field is ignored, but all name=value pairs in the section to which this field is applied are extracted as fields.

Suppose that you think the word CONFIGURATIONS might be useful to extract as a group. You must not create a field for this word, because it does not appear consistently on each line. If you choose this word as a field, the product might skip the lines that do not contain the word, and those lines will not be available for searching. Instead, you can use search commands such as extract at search time to find expressions that you did not capture in the data pattern.
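The risk described above can be illustrated with a short Python sketch. The REQUIRES_TABLE expression is a hypothetical pattern that insists on the word being present, and extract_table is only a stand-in for a search-time extraction; it is not the product's extract command.

```python
import re

LINES = [
    "Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: CONFIGURATIONS table exists in the database.",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.",
]

# Hypothetical pattern that requires a "table" field on every line.
REQUIRES_TABLE = re.compile(r".*: .*(?P<table>CONFIGURATIONS)")

# Lines without the word fail the pattern and would be lost to searching.
matched = [line for line in LINES if REQUIRES_TABLE.match(line)]
skipped = [line for line in LINES if not REQUIRES_TABLE.match(line)]


def extract_table(message):
    """Search-time alternative: return the word if present, otherwise None."""
    found = re.search(r"\b(CONFIGURATIONS)\b", message)
    return found.group(1) if found else None
```

With the search-time approach, every line is still indexed, and the optional value is looked up only where it exists.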

Functions available for creating and using data patterns

When you are creating a data pattern, the following functions are available:

Function	Description
Automatically detect date formats and primary pattern	You can copy a few lines from your data file into the Sample Text field, copy the time stamp into the Timestamp field, and then click Auto-detect to automatically detect the primary pattern and date format relevant to your sample data. If you are not satisfied with the date formats detected by the product, you can define your own custom date format.
Allow multiline entries	You can select the Multiline Entry check box to accommodate log patterns that have multiple line entries.
Validate primary pattern results	After specifying the primary pattern, you can click Preview to see how the parsed data entries might look. You can use this feature for validating the sample log results and for experimenting with and refining your primary pattern until you are satisfied with the results it can achieve.
Search for available subpatterns by name or value	While creating a primary pattern, you can search for subpatterns from a list of default subpatterns, to find the ones that might be relevant to the kind of data pattern you are about to create. You can find subpatterns by an expected name (or Java regular expression) or by the expected subpattern value.
Create custom subpatterns and test them for accuracy	If you do not find subpatterns that suit your needs, you can create a new subpattern. While creating a new subpattern, you can provide some sample text and easily test if that text matches the subpattern expression that you specified.
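Testing a custom subpattern against sample text amounts to a check like the following Python sketch. The subpattern_matches function and the LogLevel subpattern are hypothetical, and requiring the whole sample to match is an assumption about how such a test behaves.

```python
import re


def subpattern_matches(expression, sample_text):
    """Return True when the whole sample text matches the expression."""
    try:
        compiled = re.compile(expression)
    except re.error:
        # An invalid expression cannot match anything.
        return False
    return compiled.fullmatch(sample_text) is not None


# Hypothetical custom subpattern: LogLevel=(?:DEBUG|INFO|WARN|ERROR)
LOG_LEVEL = r"(?:DEBUG|INFO|WARN|ERROR)"

accepts_warn = subpattern_matches(LOG_LEVEL, "WARN")
accepts_lowercase = subpattern_matches(LOG_LEVEL, "warning")
```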

At the time of creating a data collector, the following functions related to data patterns are available:

Functions available while creating a data collector

Function	Description
Filter relevant data patterns	You can filter the relevant data patterns (by using the Filter relevant data pattern icon next to the Pattern field) to automatically detect the data patterns that match your data file.
Preview results to validate or modify data pattern selection	You can select the data pattern that you think might be most appropriate and use the preview option (by using the Preview parsed log entries icon next to the Pattern field) to see how the parsed data records look. If the selected data pattern does not satisfy your needs, you can select another data pattern and again see a preview of the data records, until you are satisfied with the results.
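One way to picture the filter function is as a comparison of each candidate pattern against a sample of the data file. The Python sketch below keeps only the candidates that match at least one sample line and orders them by number of matching lines. The candidate names, their expressions, and the rank_patterns function are hypothetical; the product's actual matching logic is not described here.

```python
import re

# Hypothetical candidate patterns, already expanded to plain regexes.
CANDIDATES = {
    "service-log": r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2} [AP]M \S+ [A-Z]+: ",
    "access-log": r"\S+ \S+ \S+ \[[^\]]+\] \"",
    "date-only": r"[A-Z][a-z]{2} \d{1,2}, \d{4}",
}


def rank_patterns(sample_lines, candidates):
    """Return names of matching patterns, most matching lines first."""
    scores = {}
    for name, regex in candidates.items():
        compiled = re.compile(regex)
        hits = sum(1 for line in sample_lines if compiled.match(line))
        if hits:
            scores[name] = hits
    return sorted(scores, key=lambda name: scores[name], reverse=True)


SAMPLE = [
    "Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: Starting Schema",
]
relevant = rank_patterns(SAMPLE, CANDIDATES)
```

A matching pattern is only a candidate. Previewing the parsed records remains the way to confirm that the fields are extracted as expected.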

Viewing configured data patterns

The Administration > Data Patterns tab allows you to view and manage your data patterns. From here, you can perform the following actions:

The Data Patterns tab provides the following information: