A data pattern defines how the collected data (semi-structured data) is structured, indexed, and made available for searching. One of the primary functions of a data pattern is to specify the fields that must be extracted from the collected data. Fields are name=value pairs that represent a grouping by which your data can be categorized. The fields that you specify when creating a data pattern are added to each record in the indexed data, enabling you to search effectively and carry out advanced analysis by using search commands. You can also assign a field type (an integer, a string, or a long integer) to each field that you want extracted. Assigning a field type enables you to run type-specific search commands on those fields and perform advanced analysis.
The Data Patterns tab allows you to configure data patterns that data collectors use to collect data in the specified way. When you create a data collector, it is important to select an appropriate data pattern so that the indexed data looks as you expect, with events categorized in multiple lines (raw event data), fields extracted, and the time stamp extracted. The better the data pattern fits your data, the more effective your searches are likely to be.
A data pattern is made up of a Java regular expression used for parsing the data file and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. The subpatterns included in the data pattern are displayed as fields in your search results. You can either provide a sample time stamp and sample text to enable automatic detection of the primary pattern and date format, or you can specify a custom primary pattern, date format, and subpatterns.
While performing a search, you can find all data of a certain type (data pattern) rather than only data coming from a single source file. To do this, click the DATA_PATTERN field in one of your search results so that it is added to your search criteria. For more information, see Understanding fields and tags.
The product provides a list of default data patterns (mostly for log formats) that you can use directly to index your data files. Because default data patterns are available for most common log formats, in most cases you do not need to create a data pattern. For more information about the default data patterns, see List of data patterns supported.
Note
Default data patterns are out-of-the-box data patterns provided with the product. You cannot edit or delete a default data pattern, but you can clone the default data pattern to create a copy and edit (or delete) the copy.
If you find that you need a custom data pattern for your particular data file, you can:
When you create a data collector, after you point to the actual data file while assigning a data pattern, you can apply a filter to find the most relevant data patterns (available by default) that match your data file. This filter also finds the most relevant date formats. You can select one of the data patterns that seems most relevant and look at a preview of the records. If you are not satisfied with the current preview, you can refresh the data patterns filter that you applied earlier, select another data pattern, and see the preview results. You can repeat this cycle until you are satisfied with the results shown.
If there is no suitable data pattern available, you can select a suitable date format instead. By selecting a date format, you can index and categorize your data in a simple way in which the time stamp is considered one category and the rest of the data in your data file is considered another category. Compared with a full data pattern, selecting only a date format generally yields search results with a less rich categorization of the data, which can also restrict advanced searching capabilities. If you select both a data pattern and a date format, the date format is used to index the time stamp in the file and the data pattern is used to index the rest of the data in the file. For examples of date formats, see Sample date formats.
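The two-category split that a date format alone produces can be pictured with a minimal Java sketch. The format string `MMM dd, yyyy hh:mm:ss a`, the class name, and the `Entry` type are assumptions made for illustration; the product's own date format syntax and internals may differ.

```java
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

final class DateFormatOnlySketch {
    // Assumed date format for time stamps such as "Apr 24, 2014 03:16:44 PM".
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("MMM dd, yyyy hh:mm:ss a", Locale.ENGLISH);

    /** Hypothetical two-category record: the time stamp and everything else. */
    static final class Entry {
        final LocalDateTime timestamp;
        final String rest;

        Entry(LocalDateTime timestamp, String rest) {
            this.timestamp = timestamp;
            this.rest = rest;
        }
    }

    static Entry split(String line) {
        ParsePosition position = new ParsePosition(0);
        // Parses only the leading time stamp and records where parsing stopped.
        TemporalAccessor parsed = DATE_FORMAT.parse(line, position);
        LocalDateTime timestamp = LocalDateTime.from(parsed);
        // Everything after the time stamp stays one uncategorized block of text.
        String rest = line.substring(position.getIndex()).trim();
        return new Entry(timestamp, rest);
    }

    public static void main(String[] args) {
        Entry entry = split(
                "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.");
        System.out.println("timestamp=" + entry.timestamp);
        System.out.println("rest=" + entry.rest);
    }
}
```

In this sketch nothing inside `rest` is broken into fields, which illustrates why a date format on its own supports a coarser categorization than a full data pattern.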
If there is no suitable data pattern found, you can do one of the following:
If no time stamp exists in your data, select free text as an option.
By selecting free text as a data pattern, you can index all of your data to appear in a raw format based on the time when it is indexed by the product.
Note
If you use free text as a data pattern, each line in the data file is indexed as a new record.
A data pattern is formed of Java regular expressions used for parsing the data and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. When the data pattern is used, the Indexer captures all of the lines in your data, interprets the data structure, and eventually allows you to effectively search the data with the help of fields (name=value).
Fields act as a grouping mechanism that represents various categories for consistently repeated trends in your data. Before you create a data pattern, you need to identify the fields by which you would like to eventually search the data. For more information about identifying fields for creating a data pattern, see Identifying fields in the data file.
You can construct your data pattern depending on how searchable your data needs to be. You need to read your data file to identify additional sections or expressions that you might be interested in searching and investigating.
A data pattern consists of a primary pattern and supporting subpatterns that help index your data in the specified way.
Data pattern = Primary pattern + Supporting subpatterns
where,
Item | Description | Example |
---|---|---|
Primary pattern | A primary pattern defines the high-level structure in which the data appears. It includes a list of fields that you want to extract from the data. The sequence of fields in the primary pattern must exactly match the format in which the data appears. Each field represents a grouping in the data file. Therefore, the sequence in which the various groups appear must be reflected in the sequence of the fields in the primary pattern. The primary pattern consists of multiple subpatterns combined in a certain order. The order is important, because the product parses the data and shows it as records in the same order as specified in the data pattern. The primary pattern also acts as a record delimiter. This means that each time a line in your data file matches the primary pattern regular expression, that line marks the beginning of a new record. | %{Mytimestamp:timestamp} You can see that in this primary pattern, the |
Subpattern | Subpatterns define the supporting details for particular sections in the primary pattern. Every subpattern can be reused to further construct more subpatterns or primary patterns. The syntax for reusing a subpattern is | Ip=(?<![0-9])(?:(?:25[0-5]|2[0-4] where,
You can see that this subpattern is used in the primary pattern example. |
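One way to picture how a primary pattern and its subpatterns fit together is to expand each `%{Name:field}` reference into a Java named capturing group. The following is a sketch under stated assumptions, not the product's implementation: the expansion logic, the class name, the `Word` subpattern, and the regular expression given for `Mytimestamp` are all hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubpatternExpansionSketch {
    // Matches references of the form %{SubpatternName:fieldName}.
    // Field names are limited to what Java accepts as a named-group name.
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+):([A-Za-z][A-Za-z0-9]*)\\}");

    /** Replaces every reference with (?<fieldName>subpatternRegex). */
    static String expand(String primary, Map<String, String> subpatterns) {
        Matcher m = REFERENCE.matcher(primary);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String regex = subpatterns.get(m.group(1));
            if (regex == null) {
                throw new IllegalArgumentException("Unknown subpattern: " + m.group(1));
            }
            out.append(primary, last, m.start())
               .append("(?<").append(m.group(2)).append('>').append(regex).append(')');
            last = m.end();
        }
        out.append(primary.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical subpattern definitions, for illustration only.
        Map<String, String> subpatterns = Map.of(
                "Mytimestamp", "[A-Z][a-z]{2} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} [AP]M",
                "Word", "\\S+");
        String regex = expand("%{Mytimestamp:timestamp} %{Word:component}", subpatterns);
        Matcher m = Pattern.compile(regex)
                .matcher("Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query");
        if (m.lookingAt()) {
            System.out.println("timestamp=" + m.group("timestamp"));
            System.out.println("component=" + m.group("component"));
        }
    }
}
```

Because each reference keeps its position in the expanded expression, the order of the fields in the primary pattern carries over directly to the order in which the data must appear.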
For more information about an overall use case of creating a data pattern, see Examples of creating a data pattern.
For more information about data pattern examples, see Sample data patterns.
For more information about subpatterns examples, see Sample subpatterns.
For more information about the various field inputs for creating a data pattern, see Adding a new data pattern.
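The record-delimiter role of the primary pattern can be sketched in Java as follows. This is a simplified illustration rather than the Indexer's actual logic: a hypothetical time stamp expression stands in for the primary pattern, every line that matches it starts a new record, and lines that do not match are appended to the current record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RecordDelimiterSketch {
    // Assumed stand-in for a primary pattern: a line that begins with a time stamp.
    private static final Pattern RECORD_START = Pattern.compile(
            "[A-Z][a-z]{2} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} [AP]M ");

    static List<String> splitIntoRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (RECORD_START.matcher(line).lookingAt()) {
                // A matching line closes the previous record and opens a new one.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // A non-matching line continues the current (multiline) record.
                current.append('\n').append(line);
            }
            // Lines that precede the first match are dropped in this sketch.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:",
                "    (hypothetical continuation line, not taken from a real log)",
                "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.");
        List<String> records = splitIntoRecords(lines);
        System.out.println(records.size() + " records");
        records.forEach(r -> System.out.println("---\n" + r));
    }
}
```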
Before you create a data pattern, you need to analyze your data file to find out if the file follows a certain pattern that can be captured while creating the data pattern.
Suppose you want to create a data pattern to index the following data:
Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:
Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.
In the preceding lines, every new line starts with the time stamp. If you try to analyze a consistently followed format in these lines, you will notice that each line provides the following information, presented in the same order as they appear in the lines:
To identify what can be fields, you need to identify the format that is being consistently followed in the data file, and then identify how this information can be grouped. You can create a field for each grouping. This ensures that all information that can be categorized into a group is indexed and is available for search.
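As an illustration of this grouping step, the following Java sketch splits the sample lines shown earlier into four groups by using named capturing groups. The field names `timestamp`, `component`, `level`, and `message` are hypothetical choices made for this example, and the expression is only one possible way to describe the format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldExtractionSketch {
    private static final String[] FIELD_NAMES = {"timestamp", "component", "level", "message"};

    // One named group per piece of information that appears on every line.
    private static final Pattern LINE = Pattern.compile(
            "^(?<timestamp>[A-Z][a-z]{2} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} [AP]M) "
            + "(?<component>\\S+) "
            + "(?<level>[A-Z]+): "
            + "(?<message>.*)$");

    /** Returns name=value pairs for a matching line, or an empty map otherwise. */
    static Map<String, String> extract(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (String name : FIELD_NAMES) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line =
                "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.";
        extract(line).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Every group in this expression is present on both sample lines, so neither line would be left out.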
Tip
You can assign the details field for all miscellaneous information that you do not want to categorize with a specific field. At the time of indexing, the details field is ignored, but all name=value pairs in the section to which this field is applied are extracted as fields.
Suppose that you think that the word CONFIGURATIONS might be useful to extract as a group. However, you must not create a field for this word, because it does not appear consistently on each line. If you choose this word as a field, the product might skip the lines that do not contain the word, and those lines will not be available for searching. Instead, you can use search commands such as extract to find expressions that you might have wanted to capture while creating a data pattern.
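The effect of choosing a field that does not appear on every line can be seen in a short Java sketch. The expression below is a hypothetical pattern that insists on a table name; it is an illustration of the risk, not a pattern shipped with the product.

```java
import java.util.List;
import java.util.regex.Pattern;

final class InconsistentFieldSketch {
    // Hypothetical pattern that demands a table name on every line.
    static final Pattern REQUIRES_TABLE =
            Pattern.compile("^.* INFO: .* (?<table>[A-Z]+) table\\.$");

    /** Counts the lines that the pattern would be able to match. */
    static long countMatching(List<String> lines) {
        return lines.stream()
                .filter(line -> REQUIRES_TABLE.matcher(line).matches())
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Apr 24, 2014 03:16:44 PM dbservice INFO: Executing Query to check init property:",
                "Apr 24, 2014 03:16:44 PM dbservice INFO: init property exists in CONFIGURATIONS table.");
        // Only the second line contains a table name, so only one line matches.
        System.out.println(countMatching(lines) + " of " + lines.size() + " lines match");
    }
}
```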
When you are creating a data pattern, the following functions are available:
Function | Description |
---|---|
Automatically detect date formats and primary pattern | You can copy a few lines from your data file into the Sample Text field, copy the time stamp into the Timestamp field, and then click Auto-detect to automatically detect the primary pattern and date format relevant to your sample data. If you are not satisfied with the date formats detected by the product, you can define your own custom date format. |
Allow multiline entries | You can select the Multiline Entry check box to accommodate log patterns that have multiple line entries. |
Validate primary pattern results | After specifying the primary pattern, you can click Preview to see how the parsed data entries might look. You can use this feature for validating the sample log results and for experimenting with and refining your primary pattern until you are satisfied with the results it can achieve. |
Search for available subpatterns by name or value | While creating a primary pattern, you can search for subpatterns from a list of default subpatterns, to find the ones that might be relevant to the kind of data pattern you are about to create. You can find subpatterns by an expected name (or Java regular expression) or by the expected subpattern value. |
Create custom subpatterns and test them for accuracy | If you do not find subpatterns that suit your needs, you can create a new subpattern. While creating a new subpattern, you can provide some sample text and easily test if that text matches the subpattern expression that you specified. |
At the time of creating a data collector, the following functions related to data patterns are available:
Functions available while creating a data collector
Function | Description |
---|---|
Filter relevant data patterns | You can filter the relevant data patterns (by using the Filter relevant data pattern icon next to the Pattern field) to automatically detect the data patterns that match your data file. |
Preview results to validate or modify data pattern selection | You can select the data pattern that you think might be most appropriate and use the preview option (by using the Preview parsed log entries icon next to the Pattern field) to see how the parsed data records look. If the selected data pattern does not satisfy your needs, you can select another data pattern and again see a preview of the data records, until you are satisfied with the results. |
The Administration > Data Patterns tab allows you to view and manage your data patterns. From here, you can perform the following actions:
Action | Icon | Description |
---|---|---|
Add Data Pattern | | Add a new data pattern. For instructions about adding a data pattern, see Adding a new data pattern. |
Edit Data Pattern | | Edit the selected data pattern. You can modify the same details that you provided while adding a new data pattern. |
Delete Data Pattern | | Delete the selected data pattern. Note: You cannot delete default data patterns provided with the product. |
View Data Pattern | | View details of the selected data pattern. |
Clone Data Pattern | | Create a copy of the selected data pattern. |
Search | | Search for a data pattern by entering an appropriate search string in the search bar at the top right of your screen. You can search for a data pattern by name or by the primary pattern. |
Hide Pattern | None | Select this check box to hide the Primary Pattern column. |
The Data Patterns tab provides the following information:
Field | Description |
---|---|
Name | Name of the data pattern. |
Category | Category to which the data pattern belongs. |
Date Format | Date/time format used by the corresponding data pattern. |
Primary Pattern | Primary pattern used to define the data pattern. |