Setting up data patterns to extract fields

A data pattern defines the way in which the data collected (semi-structured data) can be structured, indexed, and made available for searching. One of the primary functions of creating a data pattern is to specify fields that must be extracted from the data collected. Fields are name=value pairs that represent a grouping by which your data can be categorized. The fields that you specify at the time of creating a data pattern are added to each record in the data indexed, enabling you to both search effectively and carry out advanced analysis by using search commands. You can also assign a field type (data type) for the field values that you intend to extract at the time of creating a data pattern. Assigning a field type enables you to run specific search commands on the fields of a certain type and perform advanced analysis.

The Data Patterns tab allows you to configure data patterns that can be used by the data collectors for collecting data in the specified way. While creating a data collector, it is important that you select an appropriate data pattern. This is necessary so that the indexed data looks as you expected, with events categorized in multiple lines (raw event data), fields extracted, and time stamp extracted. The better the data pattern fits your data, the more effective your searches are likely to be.


What is a data pattern made up of?

A data pattern is made up of a Java regular expression used for parsing the data file and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. The subpatterns included in the data pattern are displayed as fields in your search results. You can either provide a sample time stamp and sample text to enable automatic detection of the primary pattern and date format, or you can specify a custom primary pattern, date format, and subpatterns.

While performing a search, you can find all data of a certain type (data pattern) rather than only data coming from a single source file. You can do this by clicking the DATA_PATTERN field in one of your search results, so that it is added to your search criteria. For more information, see Understanding fields and tags.

When do I need to create a data pattern?

The product provides a list of default data patterns (mostly for log formats) that you can use directly to index your data files. Data patterns are available for most of the common log formats. Therefore, most of the time you do not need to create a data pattern and can use the defaults directly. For more information about the default data patterns, see List of data patterns supported.

Note

Default data patterns are out-of-the-box data patterns provided with the product. You cannot edit or delete a default data pattern, but you can clone the default data pattern to create a copy and edit (or delete) the copy.

If you find that you need a custom data pattern for your particular data file, you can:

Clone an existing data pattern and customize it to suit your needs.
Create a new data pattern.

How do I know which data pattern is appropriate for my data file?

When you create a data collector, after you point to the actual data file, you might need to assign a data pattern. You can find the most relevant data patterns (available by default) that match your data file by using the Auto-Detect button. Along with the most relevant data patterns, you can also find the most relevant date formats. When you select a data pattern, you can also see a preview of how the records will be parsed. If you are not satisfied with the current preview, select another data pattern, and see the preview results. You can repeat this cycle until you are satisfied with the results shown.

If there is no suitable data pattern available, then you can select a suitable date format. By selecting a date format, you can index and categorize your data in a simple way whereby the time stamp is considered one category and the rest of the data in your data file is considered another category. Compared with a data pattern, a date format alone generally gives search results with a less detailed categorization of data, which might restrict advanced searching capabilities. If you select both a data pattern and a date format, then the date format is used to index the time stamp in the file and the data pattern is used to index the rest of the data appearing in the file. For examples of date formats, see Sample-date-formats.
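The two-category split described above can be sketched in a few lines. This is only an illustration, not the product's implementation: it uses Python's strptime directives as a stand-in for the product's own date-format syntax, and the names DATE_FORMAT and split_timestamp are hypothetical.

```python
from datetime import datetime

# Assumed date format for the sample line, written with Python strptime
# directives; the product's own date-format syntax may differ.
DATE_FORMAT = "%b %d, %Y %I:%M:%S %p"
# This particular format always produces a time stamp of the same width.
TIMESTAMP_LENGTH = len("Apr 24, 2014 03:16:40 PM")


def split_timestamp(line):
    """Return (parsed time stamp, remaining raw text) for one line."""
    stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], DATE_FORMAT)
    return stamp, line[TIMESTAMP_LENGTH:].strip()


stamp, raw = split_timestamp(
    "Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found."
)
print(stamp.isoformat())
print(raw)
```

Everything after the time stamp stays as one undivided string, which is why a date format alone offers fewer fields to search on than a full data pattern.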

If there is no suitable data pattern found, you can do one of the following:

Create a new data pattern and use that for indexing the data.
Either select the Free Text with Timestamp option or create a new date format and use that for indexing the timestamp.
This approach captures the timestamp while the rest of the data appears in raw format.
If no time stamp exists in your data, select Free Text without Timestamp as an option.
By selecting free text as a data pattern, you can index all of your data to appear in a raw format based on the time when it is indexed by the product.
Note
If you use free text as a data pattern, each line in the data file is indexed as a new record.
If you find a data pattern that provides results that partly match your expectations, you can either retain that selection or choose to clone that data pattern and customize it to suit your needs.

Constructing a data pattern

A data pattern is formed of Java regular expressions used for parsing the data and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. When the data pattern is used, the Indexer captures all of the lines in your data, interprets the data structure, and eventually allows you to effectively search the data with the help of fields (name=value).

Fields act as a grouping mechanism that represents various categories for consistently repeated trends in your data. Before you create a data pattern, you need to identify the fields by which you would like to eventually search the data. For more information about identifying fields for creating a data pattern, see Identifying fields in the data file.

You can construct your data pattern depending on how searchable your data needs to be. You need to read your data file to identify additional sections or expressions that you might be interested in searching and investigating.

A data pattern consists of a primary pattern and supporting subpatterns that help index your data in the specified way.

Data pattern = Primary pattern + Supporting subpatterns

where,

Item	Description	Example
Primary pattern	A primary pattern defines the high-level structure in which the data appears. It includes a list of fields that you want to extract from the data. The sequence of fields added in the primary pattern must exactly match the format in which the data appears. Each field represents a grouping in the data file. Therefore, the sequence in which the various groups appear must be reflected in the sequence of the fields in the primary pattern. The primary pattern consists of multiple subpatterns combined together in a certain order. The order is important, because the product parses the data and shows it as records in the same order as specified in the data pattern. The primary pattern also acts as a record delimiter. This means that each time a line in your data file matches the primary pattern regular expression, that line marks the beginning of a new record.	%{Mytimestamp:timestamp} \[%{Data:debuglevel}\] %{Data:component} - \[Thread=%{Data:threadid}\] %{Ip:clientip} - %{MultilineEntry:details} You can see that in this primary pattern, the Ip subpattern example is used.
Subpattern	Subpatterns define the supporting details for particular sections in the primary pattern. Every subpattern can be reused to further construct more subpatterns or primary patterns. The syntax for reusing a subpattern is %{<subpattern-logical-name>:<field-name>} where specifying the field name is optional. If the field name is not specified, then the matched value is not extracted as a field.	Ip=(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]) where Ip is the logical name of the subpattern, and the rest of the expression is the corresponding subpattern value. You can see that this subpattern is used in the primary pattern example.
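One way to picture how a primary pattern and its subpatterns combine is to expand each %{name:field} reference into a named capture group. The sketch below does this in Python rather than Java, so it is only an approximation of what the product does. The Ip expression is adapted from the subpattern example in the table; the Data, Mytimestamp, and MultilineEntry definitions, the expand function, and the sample line are assumptions made for illustration.

```python
import re

# Hypothetical subpattern library. Only Ip is adapted from the table;
# the other three definitions are assumptions for this sketch.
SUBPATTERNS = {
    "Ip": r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
          r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
          r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
          r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])",
    "Data": r".*?",
    "Mytimestamp": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",
    "MultilineEntry": r"(?:.|\n)*",
}

# Matches %{Name} or %{Name:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Replace each subpattern reference with that subpattern's regex."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = SUBPATTERNS[name]
        # Without a field name the text must still match, but nothing is captured.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, pattern)


PRIMARY = (r"%{Mytimestamp:timestamp} \[%{Data:debuglevel}\] %{Data:component} - "
           r"\[Thread=%{Data:threadid}\] %{Ip:clientip} - %{MultilineEntry:details}")

regex = re.compile(expand(PRIMARY))
line = ("2014-04-24 15:16:40 [WARN] configservice - [Thread=main-1] "
        "10.154.152.113 - No configuration found.")
fields = regex.match(line).groupdict()
print(fields)
```

Each named group in the expanded expression corresponds to one field in the search results, in the same left-to-right order as the primary pattern.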

For more information about data pattern examples, see Sample-data-patterns.

For more information about subpatterns examples, see Sample-subpatterns.

For more information about the various field inputs for creating a data pattern, see Adding a new data pattern.

How to identify fields in the data file

Before you create a data pattern, you need to analyze your data file to find out if the file follows a certain pattern that can be captured while creating the data pattern. Doing this can be useful while performing advanced field extraction at the time of editing or cloning a data pattern.

Suppose you want to create a data pattern to index the following data:

Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.

Apr 24, 2014 03:16:44 PM dbservice INFO: Starting Schema

Apr 24, 2014 03:16:44 PM dbservice INFO: CONFIGURATIONS table exists in the database.

Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:
select * from CONFIGURATIONS where userName = 'admin' and propertyName ='init'

Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.

In the preceding lines, every new record starts with a time stamp, and the file follows a consistent pattern.

The following information (groups) appears in the preceding lines from left to right:

Time stamp
Component name
Debug information
Application message

For each of the preceding groups, you can assign a field.
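The four groups can be captured with a single regular expression that also acts as the record delimiter. The following sketch is written in Python, not with the product's subpattern syntax, and the expression, the group names, and the parse_records function are assumptions chosen to fit this sample only.

```python
import re

# Assumed expression for the sample; group names are illustrative field names.
RECORD_START = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2} [AP]M) "
    r"(?P<component>\S+) "
    r"(?P<debuglevel>[A-Z]+): "
    r"(?P<message>.*)"
)

SAMPLE = """\
Apr 24, 2014 03:16:40 PM configservice WARN: No configuration found.
Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:
select * from CONFIGURATIONS where userName = 'admin' and propertyName ='init'
Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.
"""


def parse_records(text):
    """Start a record at each matching line; append other lines to the current record."""
    records = []
    for line in text.splitlines():
        match = RECORD_START.match(line)
        if match:
            records.append(match.groupdict())
        elif records and line.strip():
            # A line without a time stamp continues the previous record.
            records[-1]["message"] += "\n" + line
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record["timestamp"], record["component"], record["debuglevel"])
```

The line that begins with select carries no time stamp, so it is folded into the message of the record before it instead of becoming a record of its own.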

Tip

If your data contains miscellaneous details that cannot be categorized under a specific field, then you can assign the details field for such miscellaneous details. At the time of indexing, the details field is ignored, but all name=value pairs appearing in the section to which this field is applied are automatically extracted as fields.
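As a rough illustration of automatic name=value extraction, the sketch below pulls pairs out of a free-form string. The tokenization rule, the sample string, and the extract_pairs function are assumptions; this topic does not describe how the product actually delimits names and values.

```python
import re

# Assumed rule: a name is a run of word characters, and a value runs up to
# the next whitespace. The product's real rule is not described in this topic.
PAIR = re.compile(r"(\w+)=(\S+)")


def extract_pairs(details):
    """Return every name=value pair found in a free-form details string."""
    return dict(PAIR.findall(details))


details = "Request finished user=admin status=200 duration=35ms"
pairs = extract_pairs(details)
print(pairs)
```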

Additional information

In the preceding sample lines, suppose you think that extracting the word "CONFIGURATIONS" as a field might be useful. But this word does not appear consistently on each line.

If you decide to extract this portion as a field, the product might skip the other lines that do not contain the word, in which case those lines will not be available for searching. In this scenario, you can use one of the following options:

Extract this field at search-time by using search commands such as extract.
Extract this field while creating the data pattern, and then, while creating the data collector, enable the Best Effort Collection setting under the advanced options.
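The difference between requiring a value at indexing time and extracting it at search time can be shown with two expressions. This Python sketch only illustrates the matching behavior; it does not model the extract search command or the Best Effort Collection setting, and the expressions and the table_name function are hypothetical.

```python
import re

LINES = [
    "Apr 24, 2014 03:16:44 PM dbservice INFO: Starting Schema",
    "Apr 24, 2014 03:16:44 PM dbservice INFO: CONFIGURATIONS table exists in the database.",
]

# A pattern that requires the table name matches only some of the lines.
STRICT = re.compile(r".* INFO: (?P<table>[A-Z]{2,}) table .*")
matched = [line for line in LINES if STRICT.match(line)]
skipped = [line for line in LINES if not STRICT.match(line)]

# Search-time style: keep every line and extract the value only where it occurs.
TABLE = re.compile(r"\b(?P<table>[A-Z]{2,}) table\b")


def table_name(line):
    """Return the table name if the line mentions one, otherwise None."""
    match = TABLE.search(line)
    return match.group("table") if match else None


extracted = [table_name(line) for line in LINES]
print(len(matched), len(skipped), extracted)
```

With the strict pattern, the first line does not match at all; with the search-time approach, both lines are kept and the field is simply absent where the word does not appear.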

Can I use data patterns created in English in other languages?

TrueSight IT Data Analytics is tested on localized operating system platforms and browsers in a list of supported languages. For more information, see Language information for IT Data Analytics.

You can use data patterns created in English in the list of supported languages. To do this, you need to edit or clone the data pattern and change the Locale setting to the desired language and save the data pattern. After changing the Locale setting, you can also specify date formats and subpatterns in the language selected. For more information, see Editing-or-cloning-data-patterns.

Understanding field types

Data can be of various types, such as numeric and alphanumeric. A field type defines the type of data that the value of a field can take. The field type also defines how the fields are stored in the data store. Storing fields with the correct field type enables you to perform an effective search and to use the numeric operations of search commands.

The following example can help you understand how field types can be useful.

Example

Suppose you are analyzing Apache web server log files.

You want to find all requests that failed with various error response codes, such as:

404 Not Found
500 Internal Server Error
503 Service Unavailable

Suppose the preceding response codes are extracted as values of the response field.

In this scenario, you can run the following search query on the response field:

clientip=10.154.152.113 && (response >= 400)

This search query indicates that you want to find all data coming from the client IP address 10.154.152.113 where the value of the response field is greater than or equal to 400.

For this query to provide the correct results, the response field must be saved with a numeric field type as opposed to an alphanumeric field type.
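The reason is easiest to see by comparing the same values as text and as numbers. The sketch below shows general string-versus-integer comparison in Python, not the product's search engine; the list of values is hypothetical, and 1000 is an artificial value included only to expose the difference.

```python
# Hypothetical extracted values of the response field, held as text.
responses = ["200", "404", "500", "1000"]

# As strings, values are compared character by character from the left.
as_text = [value for value in responses if value >= "400"]

# As integers, values are compared by magnitude.
as_numbers = [value for value in responses if int(value) >= 400]

print(as_text)
print(as_numbers)
```

The text comparison drops 1000 because its first character, 1, sorts before 4, whereas the numeric comparison keeps it.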

When you create a new data pattern, you plan the fields that you want to extract, and you set the field type for each field. For example, if you want to extract a field that is expected to have alphanumeric values, you set the field type to STRING, but if you want to extract a field that is expected to have numeric values only, you set the field type to INTEGER, LONG, DOUBLE, or FLOAT.

Note

The capability of assigning field types is not available for fields that are automatically extracted by the product.

The following table describes the various field types supported:

Field type	Description
INTEGER	Whole numbers from 0 to 2,147,483,647 (2,147,483,647 = 2³¹-1)
LONG	Long integer values from 0 to 9,223,372,036,854,775,807 (9,223,372,036,854,775,807 = 2⁶³-1)
STRING	(Default) Text or numeric value
DOUBLE	Floating point numbers with double precision from 0 to 1.79769e+308.
FLOAT	Floating point numbers with single precision from 0 to 3.40282e+038.

The preceding field types are available for selection only if they are found relevant to the field value.
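A relevance check of this kind might look like the following sketch, which lists the field types from the table whose range can hold a given value. The candidate_types function and its rules are hypothetical and follow only the ranges stated in the table; how the product actually decides relevance is not documented in this topic.

```python
import math
import struct

INTEGER_MAX = 2**31 - 1  # 2,147,483,647
LONG_MAX = 2**63 - 1     # 9,223,372,036,854,775,807
# Largest finite single-precision value, decoded from its IEEE 754 bit pattern.
FLOAT_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


def candidate_types(value):
    """Hypothetical helper: list the field types that can hold a text value."""
    types = ["STRING"]  # any text or numeric value fits STRING
    try:
        number = float(value)
    except ValueError:
        return types
    if value.isdigit():
        whole = int(value)
        if whole <= INTEGER_MAX:
            types.append("INTEGER")
        if whole <= LONG_MAX:
            types.append("LONG")
    # The table lists ranges that start at 0, so negative values are excluded here.
    if math.isfinite(number) and number >= 0:
        types.append("DOUBLE")
        if number <= FLOAT_MAX:
            types.append("FLOAT")
    return types


for sample in ["404", "3000000000", "0.25", "admin"]:
    print(sample, candidate_types(sample))
```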

Functions available for creating and using data patterns

When you are creating a data pattern, the following functions are available:

Function	Description
Automatically detect date formats and primary pattern	You can copy a few lines from your data file into the Sample Text field, copy the time stamp into the Timestamp field, and then click Auto-detect to automatically detect the primary pattern and date format relevant to your sample data. If you are not satisfied with the date formats detected by the product, you can define your own custom date format.
Allow multiline entries	You can select the Multiline Entry check box to accommodate log patterns that have multiple line entries.
Validate primary pattern results	After specifying the primary pattern, you can click Preview to see how the parsed data entries might look. You can use this feature for validating the sample log results and for experimenting with and refining your primary pattern until you are satisfied with the results it can achieve.
Search for available subpatterns by name or value	While creating a primary pattern, you can search for subpatterns from a list of default subpatterns, to find the ones that might be relevant to the kind of data pattern you are about to create. You can find subpatterns by an expected name (or Java regular expression) or by the expected subpattern value.
Create custom subpatterns and test them for accuracy	If you do not find subpatterns that suit your needs, you can create a new subpattern. While creating a new subpattern, you can provide some sample text and easily test if that text matches the subpattern expression that you specified.

At the time of creating a data collector, the following functions related to data patterns are available:

Functions available while creating a data collector

Function	Description
Filter relevant data patterns	Click Auto-Detect next to the Pattern field to automatically detect the data patterns that match your data file.
Preview results to validate or modify data pattern selection	Manually select one of the data patterns and click Preview to see how the parsed data records look. This capability is also available when you filter relevant data patterns by using the Auto-Detect function. If the selected data pattern does not satisfy your needs, you can select another data pattern and see a preview of the data records, until you are satisfied with the results.

Icons and associated functions on the Data Patterns tab

The Administration > Data Patterns tab allows you to view and manage your data patterns. From here, you can perform the following actions:

The Data Patterns tab provides the following information: