Setting up data patterns to extract fields
A data pattern defines the way in which the data collected (semi-structured data) can be structured, indexed, and made available for searching. One of the primary functions of creating a data pattern is to specify the fields that must be extracted from the data collected. Fields are name=value pairs that represent a grouping by which your data can be categorized. The fields that you specify at the time of creating a data pattern are added to each record in the indexed data, enabling you to both search effectively and carry out advanced analysis by using search commands. You can also assign a field type (data type) to the field values that you intend to extract at the time of creating a data pattern. Assigning a field type enables you to run specific search commands on the fields of a certain type and perform advanced analysis.
The Data Patterns tab allows you to configure data patterns that can be used by the data collectors for collecting data in the specified way. While creating a data collector, it is important that you select an appropriate data pattern. This is necessary so that the indexed data looks as you expected, with events categorized in multiple lines (raw event data), fields extracted, and time stamp extracted. The better the data pattern fits your data, the more likely it is that your search will be effective.
What is a data pattern made up of?
A data pattern is made up of a Java regular expression used for parsing the data file and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. The subpatterns included in the data pattern are displayed as fields in your search results. You can either provide a sample time stamp and sample text to enable automatic detection of the primary pattern and date format, or you can specify a custom primary pattern, date format, and subpatterns.
When do I need to create a data pattern?
Default data patterns are out-of-the-box data patterns provided with the product. You cannot edit or delete a default data pattern, but you can clone the default data pattern to create a copy and edit (or delete) the copy.
If you find that you need a custom data pattern for your particular data file, you can:
- Clone an existing data pattern and customize it to suit your needs.
- Create a new data pattern.
How do I know which data pattern is appropriate for my data file?
When you create a data collector, after you point to the actual data file, you might need to assign a data pattern. You can find the most relevant data patterns (available by default) that match your data file by using the Auto-Detect button. Along with the most relevant data patterns, you can also find the most relevant date formats. When you select a data pattern, you can also see a preview of how the records will be parsed. If you are not satisfied with the current preview, select another data pattern, and see the preview results. You can repeat this cycle until you are satisfied with the results shown.
If there is no suitable data pattern available, then you can select a suitable date format. By selecting a date format, you can index and categorize your data in a simple way whereby the time stamp is considered one category and the rest of the data in your data file is considered another category. Compared to a data pattern, a date format alone generally gives search results with a less rich categorization of data, and this might restrict advanced searching capabilities. If you select both a data pattern and a date format, then the date format is used to index the time stamp in the file and the data pattern is used to index the rest of the data appearing in the file. For examples of date formats, see Sample date formats.
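The date-format-only case described above can be sketched in Java as follows. This is a minimal illustration and not the product's implementation: the class name DateFormatOnlySketch, the method split, the format yyyy-MM-dd HH:mm:ss, and the sample line are all assumptions made for the example. The sketch parses a leading time stamp with java.time and keeps the rest of the line as raw text.

```java
import java.text.ParsePosition;
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DateFormatOnlySketch {
    // Assumed date format; a real data file may need a different one.
    static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns {time stamp, raw remainder}; the time stamp is null when the
    // line does not start with text that matches the date format.
    static String[] split(String line) {
        ParsePosition pos = new ParsePosition(0);
        try {
            // Parsing stops after the time stamp; pos records where it ended.
            LocalDateTime timestamp = LocalDateTime.from(FORMAT.parse(line, pos));
            String rest = line.substring(pos.getIndex()).trim();
            return new String[] {timestamp.toString(), rest};
        } catch (DateTimeException e) {
            return new String[] {null, line};
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample line.
        String[] parts = split("2024-01-15 10:00:05 ERROR Request failed");
        System.out.println("timestamp=" + parts[0]);
        System.out.println("raw=" + parts[1]);
    }
}
```

Only two categories come out of this: the time stamp and everything else. That is why a date format on its own supports less detailed searching than a data pattern that also extracts fields.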
If there is no suitable data pattern found, you can do one of the following:
- Either select the Free Text with Timestamp option or create a new date format and use that for indexing the timestamp.
  This approach captures the timestamp while the rest of the data appears in raw format.
- If no time stamp exists in your data, select Free Text without Timestamp as an option.
  By selecting free text as a data pattern, you can index all of your data to appear in a raw format based on the time when it is indexed by the product.
  If you use free text as a data pattern, each line in the data file is indexed as a new record.
- If you find a data pattern that provides results that partly match your expectations, you can either retain that selection or choose to clone that data pattern and customize it to suit your needs.
Constructing a data pattern
A data pattern is formed of Java regular expressions used for parsing the data and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. When the data pattern is used, the Indexer captures all of the lines in your data, interprets the data structure, and eventually allows you to effectively search the data with the help of fields (name=value).
Fields act as a grouping mechanism that represents various categories for consistently repeated trends in your data. Before you create a data pattern, you need to identify the fields by which you would like to eventually search the data. For more information about identifying fields for creating a data pattern, see Identifying fields in the data file.
You can construct your data pattern depending on how searchable your data needs to be. You need to read your data file to identify additional sections or expressions that you might be interested in searching and investigating.
A data pattern consists of a primary pattern and supporting subpatterns that help index your data in the specified way.
Data pattern = Primary pattern + Supporting subpatterns
A primary pattern defines the high-level structure in which the data appears.
It includes a list of fields that you want to extract from the data.
The sequence of fields added in the primary pattern must exactly match the format in which the data appears. Each field represents a grouping in the data file. Therefore, the sequence in which the various groups appear must be reflected in the sequence of the fields in the primary pattern.
The primary pattern consists of multiple subpatterns combined together in a certain order. The order is important, because the product parses the data and shows it as records in the same order as specified in the data pattern.
The primary pattern also acts as a record delimiter. This means that each time a line in your data file matches the primary pattern regular expression, that line marks the beginning of a new record.
You can see that in this primary pattern, the
Subpatterns define the supporting details for particular sections in the primary pattern.
Every subpattern can be reused to further construct more subpatterns or primary patterns.
The syntax for reusing a subpattern is
You can see that this subpattern is used in the primary pattern example.
For more information about data pattern examples, see Sample data patterns.
For more information about subpattern examples, see Sample subpatterns.
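The relationship between subpatterns, the primary pattern, and record boundaries can be illustrated with plain Java regular expressions. This is a sketch under stated assumptions: the subpattern strings, the group names, the class RecordSplitterSketch, and the sample lines are hypothetical, and string concatenation stands in for the product's own subpattern reuse syntax, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RecordSplitterSketch {
    // Hypothetical subpatterns: small, reusable regular expression fragments.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";
    static final String LEVEL = "[A-Z]+";
    static final String MESSAGE = ".*";

    // Hypothetical primary pattern: the subpatterns combined in the same
    // order in which the groups appear in the data.
    static final Pattern PRIMARY = Pattern.compile(
        "^(?<timestamp>" + TIMESTAMP + ") (?<level>" + LEVEL + ") (?<message>" + MESSAGE + ")$");

    // A line that matches the primary pattern starts a new record; any other
    // line is appended to the record that is currently being built.
    static List<String> splitRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (PRIMARY.matcher(line).matches()) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
            // Lines that appear before the first match are dropped in this sketch.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2024-01-15 10:00:00 ERROR Request failed",
            "java.lang.IllegalStateException: boom",
            "    at com.example.Service.run(Service.java:42)",
            "2024-01-15 10:00:01 INFO Request retried");
        for (String record : splitRecords(lines)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

In this sample, the two stack trace lines do not match the primary pattern, so they stay attached to the ERROR record instead of starting records of their own.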
How to identify fields in the data file
Before you create a data pattern, you need to analyze your data file to find out if the file follows a certain pattern that can be captured while creating the data pattern. Doing this can be useful while performing advanced field extraction at the time of editing or cloning a data pattern.
Suppose you want to create a data pattern to index the following data:
In the preceding lines, every new line starts with the time stamp, and the file follows a consistent pattern.
The following information (groups) appears in the preceding lines from left to right:
- Time stamp
- Component name
- Debug information
- Application message
For each of the preceding groups, you can assign a field.
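As a rough illustration of assigning one field per group, the following Java sketch uses one named capturing group for each of the four groups listed above. The log line layout, the field names (timestamp, component, debuglevel, message), and the class FieldExtractionSketch are assumptions made for this example; they are not taken from the product or from the original sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractionSketch {
    // One named group per identified group, in left-to-right order.
    static final Pattern LINE = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) "
        + "\\[(?<component>[^\\]]+)\\] "
        + "(?<debuglevel>[A-Z]+) "
        + "(?<message>.*)$");

    static final String[] FIELD_NAMES = {"timestamp", "component", "debuglevel", "message"};

    // Returns the extracted fields, or an empty map when the line does not
    // follow the assumed layout.
    static Map<String, String> extract(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = LINE.matcher(line);
        if (matcher.matches()) {
            for (String name : FIELD_NAMES) {
                fields.put(name, matcher.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical line in the assumed layout.
        String line = "2024-01-15 10:00:00,123 [ConfigService] DEBUG Loading CONFIGURATIONS from disk";
        extract(line).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

A line that does not follow the layout yields no fields at all in this sketch, which is why it helps to confirm that the pattern is consistent across the file before you decide on the fields.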
If your data contains miscellaneous details that cannot be categorized under a specific field, then you can assign the details field for such miscellaneous details. At the time of indexing, the details field is ignored, but all name=value pairs appearing in the section to which this field is applied are automatically extracted as fields.
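The automatic extraction of name=value pairs from such a section might look roughly like the following Java sketch. The regular expression, the handling of quoted values, and the class NameValueSketch are assumptions for illustration only; the rules that the product actually applies are not documented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValueSketch {
    // Assumed shape of a pair: a word, '=', then a quoted or an unquoted value.
    static final Pattern PAIR = Pattern.compile("(\\w+)=(\"[^\"]*\"|\\S+)");

    // Collects every name=value pair found in the text; text that is not
    // part of a pair is ignored.
    static Map<String, String> extractPairs(String details) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(details);
        while (matcher.find()) {
            String value = matcher.group(2);
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip the quotes
            }
            fields.put(matcher.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical miscellaneous section of a record.
        String details = "user=alice status=200 msg=\"disk full\" trailing text";
        extractPairs(details).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```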
In the preceding sample lines, suppose you think that extracting the word "CONFIGURATIONS" as a field might be useful. But this word does not appear consistently on each line.
If you decide to extract this portion as a field, the product might skip the other lines that do not contain the word, in which case those lines will not be available for searching. In this scenario, you can use one of the following options:
Can I use data patterns created in English in other languages?
TrueSight IT Data Analytics is tested on localized operating system platforms and browsers in a list of supported languages. For more information, see
You can use data patterns created in English in any of the supported languages. To do this, edit or clone the data pattern, change the Locale setting to the desired language, and save the data pattern. After changing the Locale setting, you can also specify date formats and subpatterns in the language selected. For more information, see Editing or cloning data patterns.
Understanding field types
Data can be of various types, such as numeric and alphanumeric. A field type defines the type of data that the value of a field can take. The field type also defines how the fields are stored in the data store. Storing fields with the correct field type enables you to perform an effective search and to use the numeric operations of search commands.
The following example can help you understand how field types can be useful.
Suppose you are analyzing Apache web server log files.
You want to find all requests that failed with various error response codes, such as:
404 Not Found
500 Internal Server Error
503 Service Unavailable
Suppose the preceding response codes are extracted as values of the response field.
In this scenario, you can run the following search query on the response field:
clientip=10.154.152.113 && (response >= 400)
This search query indicates that you want to find all data coming from the client IP address 10.154.152.113 where the value of the response field is greater than or equal to 400.
For this query to provide the correct results, the response field must be saved with a numeric field type as opposed to an alphanumeric field type.
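The following Java sketch shows, in general terms, why the stored type matters for a comparison such as >= 400. It is not a description of how the product evaluates queries: the class FieldTypeComparisonSketch and the sample values 99 and 1000 are hypothetical, chosen because text comparison and numeric comparison disagree when values have different numbers of digits.

```java
class FieldTypeComparisonSketch {
    // Comparison when the value is stored as text: character by character.
    static boolean atLeastAsString(String value, String threshold) {
        return value.compareTo(threshold) >= 0;
    }

    // Comparison when the value is stored as a number.
    static boolean atLeastAsNumber(String value, int threshold) {
        return Integer.parseInt(value) >= threshold;
    }

    public static void main(String[] args) {
        // 404 is a response code from the example; 99 and 1000 are
        // hypothetical values that expose the difference.
        for (String value : new String[] {"404", "99", "1000"}) {
            System.out.println(value
                + ": as text >= 400 is " + atLeastAsString(value, "400")
                + ", as number >= 400 is " + atLeastAsNumber(value, 400));
        }
    }
}
```

As text, "99" sorts after "400" and "1000" sorts before it, so both give the wrong answer for a numeric threshold. Three-digit response codes happen to sort the same way under both comparisons, but relying on that is fragile.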
When you create a new data pattern, you plan the fields that you want to extract, and you set the field type for each field. For example, if you want to extract a field that is expected to have alphanumeric values, you set the field type to STRING, but if you want to extract a field that is expected to have numeric values only, you set the field type to INTEGER, LONG, DOUBLE, or FLOAT.
The capability of assigning field types is not available for fields that are automatically extracted by the product.
The following table describes the various field types supported:
|INTEGER||Whole numbers from 0 to 2,147,483,647 (2,147,483,647 = 2^31 - 1)|
|LONG||Long integer values from 0 to 9,223,372,036,854,775,807 (9,223,372,036,854,775,807 = 2^63 - 1)|
|STRING||(Default) Text or numeric value|
|DOUBLE||Floating point numbers with double precision from 0 to 1.79769e+308.|
|FLOAT||Floating point numbers with single precision from 0 to 3.40282e+038.|
The preceding field types are available for selection only if they are found relevant to the field value.
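One possible reading of "relevant to the field value" is that a type is offered only when a sample value fits within its range. The Java sketch below illustrates that reading by using the ranges from the preceding table. It is an assumption, not the product's documented logic, and the class FieldTypeSketch and the method candidateTypes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class FieldTypeSketch {
    // Lists the field types whose range (per the table above) can hold the
    // sample value. STRING is always a candidate.
    static List<String> candidateTypes(String value) {
        List<String> types = new ArrayList<>();
        types.add("STRING");
        if (value.matches("\\d+")) { // non-negative whole number
            try {
                long number = Long.parseLong(value);
                if (number <= Integer.MAX_VALUE) {
                    types.add("INTEGER"); // fits in 0 to 2^31 - 1
                }
                types.add("LONG"); // fits in 0 to 2^63 - 1
            } catch (NumberFormatException tooLarge) {
                // More digits than a LONG can hold: not a whole-number candidate.
            }
        }
        if (value.matches("\\d+(\\.\\d+)?([eE][+-]?\\d+)?")) { // non-negative decimal
            double number = Double.parseDouble(value);
            if (!Double.isInfinite(number)) {
                if (number <= Float.MAX_VALUE) {
                    types.add("FLOAT");
                }
                types.add("DOUBLE");
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        for (String value : new String[] {"404", "2147483648", "3.14", "abc"}) {
            System.out.println(value + " -> " + candidateTypes(value));
        }
    }
}
```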
Functions available for creating and using data patterns
When you are creating a data pattern, the following functions are available:
|Automatically detect date formats and primary pattern||You can copy a few lines from your data file into the Sample Text field, copy the time stamp into the Timestamp field, and then click Auto-detect to automatically detect the primary pattern and date format relevant to your sample data. If you are not satisfied with the date formats detected by the product, you can define your own custom date format.|
|Allow multiline entries||You can select the Multiline Entry check box to accommodate log patterns that have multiple line entries.|
|Validate primary pattern results||After specifying the primary pattern, you can click Preview to see how the parsed data entries might look. You can use this feature for validating the sample log results and for experimenting with and refining your primary pattern until you are satisfied with the results it can achieve.|
|Search for available subpatterns by name or value||While creating a primary pattern, you can search for subpatterns from a list of default subpatterns, to find the ones that might be relevant to the kind of data pattern you are about to create. You can find subpatterns by an expected name (or Java regular expression) or by the expected subpattern value.|
|Create custom subpatterns and test them for accuracy||If you do not find subpatterns that suit your needs, you can create a new subpattern. While creating a new subpattern, you can provide some sample text and easily test if that text matches the subpattern expression that you specified.|
At the time of creating a data collector, the following functions related to data patterns are available:
Functions available while creating a data collector
|Filter relevant data patterns||Click Auto-Detect next to the Pattern field to automatically detect the data patterns that match your data file.|
|Preview results to validate or modify data pattern selection||Manually select one of the data patterns and click Preview to see how the parsed data records look. This capability is also available when you filter relevant data patterns by using the Auto-Detect function. If the selected data pattern does not satisfy your needs, you can select another data pattern and see a preview of the data records, until you are satisfied with the results.|
Icons and associated functions on the Data Patterns tab
The Administration > Data Patterns tab allows you to view and manage your data patterns. From here, you can perform the following actions:
|Add Data Pattern||Add a new data pattern.|
|Edit Data Pattern||Edit the selected data pattern.|
|Delete Data Pattern||Delete the selected data pattern. Note: You cannot delete default data patterns provided with the product.|
|View Data Pattern||View details of the selected data pattern.|
|Clone Data Pattern||Create a copy of the selected data pattern.|
In the search bar, at the top-right side of your screen, you can filter data patterns in the following ways:
|Hide Pattern||None||Select this check box to hide the Primary Pattern column.|
The Data Patterns tab provides the following information:
Name of the data pattern.
Name of the content pack via which the data pattern was imported.
If the data pattern was newly created, then this column displays a hyphen (-).
|Category||Category to which the data pattern belongs.|
Date/time format used by the corresponding data pattern.
|Primary Pattern||Primary pattern used to define the data pattern.|