Managing data patterns
A data pattern defines the way in which collected (semi-structured) data is structured, indexed, and made available for searching. One of the primary functions of a data pattern is to specify the fields to be extracted from the collected data. Fields are name=value pairs that represent groupings by which your data can be categorized. The fields that you specify when creating a data pattern are added to each record in the indexed data, enabling you both to search effectively and to carry out advanced analysis by using search commands. When creating a data pattern, you can also assign a field type (data type) to the field values that you intend to extract. Assigning a field type enables you to run specific search commands on fields of that type and perform advanced analysis.
The Data Patterns tab allows you to configure data patterns that data collectors use to collect data in the specified way. When you create a data collector, it is important to select an appropriate data pattern so that the indexed data looks as you expect, with multiline events grouped into records (raw event data), fields extracted, and the time stamp extracted. The more appropriate the data pattern, the more likely your searches are to be effective.
This topic contains the following information:
- What is a data pattern made up of?
- When do I need to create a data pattern?
- How do I know which data pattern is appropriate for my data file?
- Constructing a data pattern
- How to identify fields in the data file
- Can I use data patterns created in English in other languages?
- Understanding field types
- Functions available for creating and using data patterns
- Viewing configured data patterns
Related topics
What is a data pattern made up of?
A data pattern is made up of a Java regular expression used for parsing the data file and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. The subpatterns included in the data pattern are displayed as fields in your search results. You can either provide a sample time stamp and sample text to enable automatic detection of the primary pattern and date format, or you can specify a custom primary pattern, date format, and subpatterns.
While performing a search, you can narrow results to data of a certain type (data pattern) rather than data coming from a single source file. To do this, click the DATA_PATTERN field in one of your search results so that it is added to your search criteria. For more information, see Understanding fields and tags.
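As a sketch of the idea, assuming subpattern references expand to Java named capture groups whose names become fields in the search results (the pattern, field names, and log line below are hypothetical, not the product's defaults):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtraction {
    // Hypothetical expanded form of a primary pattern: each subpattern
    // reference becomes a Java named capture group, and each group name
    // becomes a field in the search results.
    static final Pattern PRIMARY = Pattern.compile(
        "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) "
      + "(?<severity>[A-Z]+) "
      + "(?<message>.*)");

    // Returns the value of one field (capture group) from a matching line,
    // or null when the line does not match the primary pattern.
    static String field(String line, String name) {
        Matcher m = PRIMARY.matcher(line);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String line = "2024-05-01 10:15:30 ERROR Disk quota exceeded";
        System.out.println("severity=" + field(line, "severity"));
        System.out.println("message=" + field(line, "message"));
    }
}
```

A line that does not match the primary pattern yields no fields, which is why previewing the parsed results (described later in this topic) matters before you commit to a pattern.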
When do I need to create a data pattern?
The product provides a list of default data patterns (mostly for log formats) that you can use directly to index your data files. Data patterns are available for most of the common log formats, so in most cases you do not need to create a data pattern; you can directly use the defaults available. For more information about the default data patterns, see List of data patterns supported.
If you find that you need a custom data pattern for your particular data file, you can:
- Clone an existing data pattern and customize it to suit your needs.
- Create a new data pattern.
How do I know which data pattern is appropriate for my data file?
When you create a data collector, after you point to the actual data file, you might need to assign a data pattern. You can find the most relevant data patterns (available by default) that match your data file by using the Auto-Detect button. Along with the most relevant data patterns, you can also find the most relevant date formats. When you select a data pattern, you can also see a preview of how the records will be parsed. If you are not satisfied with the current preview, select another data pattern, and see the preview results. You can repeat this cycle until you are satisfied with the results shown.
If no suitable data pattern is available, you can select a suitable date format instead. By selecting a date format, you index and categorize your data in a simple way: the time stamp is treated as one category and the rest of the data in your file as another. Compared to a full data pattern, a date format alone generally provides less richly categorized search results and might restrict advanced searching capabilities. If you select both a data pattern and a date format, the date format is used to index the time stamp in the file and the data pattern is used to index the rest of the data. For examples of date formats, see Sample-date-formats.
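The timestamp-plus-raw-text split described above can be sketched as follows. The assumption that date formats use Java date/time pattern letters, and the sample line itself, are illustrative only:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateFormatOnly {
    // Hypothetical date format expressed with Java date/time pattern
    // letters; check your product's documentation for the exact syntax.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern(PATTERN);

    // With only a date format selected, indexing splits each record into
    // two parts: the parsed time stamp and the rest of the raw line.
    static String[] split(String line) {
        int tsLen = PATTERN.length(); // literal and pattern lengths match here
        String ts = line.substring(0, tsLen);
        LocalDateTime parsed = LocalDateTime.parse(ts, FORMAT); // validates the time stamp
        return new String[] { parsed.toString(), line.substring(tsLen).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("2024-05-01 10:15:30 ERROR Disk quota exceeded");
        System.out.println("timestamp=" + parts[0]);
        System.out.println("rest=" + parts[1]);
    }
}
```

Note that everything after the time stamp stays as one undifferentiated string, which is exactly why a date format alone supports less categorization than a full data pattern.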
If no suitable data pattern is found, you can do one of the following:
- Create a new data pattern and use that for indexing the data.
- Either select the Free Text with Timestamp option or create a new date format and use that for indexing the time stamp. This approach captures the time stamp while the rest of the data appears in raw format. If no time stamp exists in your data, select the Free Text without Timestamp option. By selecting free text as a data pattern, you index all of your data in raw format, based on the time when it is indexed by the product.
- If you find a data pattern that provides results that partly match your expectations, you can either retain that selection or clone that data pattern and customize it to suit your needs.
Constructing a data pattern
A data pattern is formed of Java regular expressions used for parsing the data and eventually displaying it in the form of search results. The usefulness of your search results is determined by the definition of your data pattern. When the data pattern is used, the Indexer captures all of the lines in your data, interprets the data structure, and eventually allows you to effectively search the data with the help of fields (name=value).
Fields act as a grouping mechanism that represents various categories for consistently repeated trends in your data. Before you create a data pattern, you need to identify the fields by which you would like to eventually search the data. For more information about identifying fields for creating a data pattern, see Identifying fields in the data file.
You can construct your data pattern depending on how searchable your data needs to be. You need to read your data file to identify additional sections or expressions that you might be interested in searching and investigating.
A data pattern consists of a primary pattern and supporting subpatterns that help index your data in the specified way.
Data pattern = Primary pattern + Supporting subpatterns
where,
Item | Description | Example |
---|---|---|
Primary pattern | A primary pattern defines the high-level structure in which the data appears. It includes a list of fields that you want to extract from the data. The sequence of fields added in the primary pattern must exactly match the format in which the data appears. Each field represents a grouping in the data file. Therefore, the sequence in which the various groups appear must be reflected in the sequence of the fields in the primary pattern. The primary pattern consists of multiple subpatterns combined in a certain order. The order is important, because the product parses the data and shows it as records in the same order as specified in the data pattern. The primary pattern also acts as a record delimiter: each time a line in your data file matches the primary pattern regular expression, that line marks the beginning of a new record. | %{Mytimestamp:timestamp} You can see that in this primary pattern, the Ip subpattern example is used. |
Subpattern | Subpatterns define the supporting details for particular sections in the primary pattern. Every subpattern can be reused to further construct more subpatterns or primary patterns. The syntax for reusing a subpattern is %{<subpattern-logical-name>:<field-name>} | Ip=(?<![0-9])(?:(?:25[0-5]|2[0-4] where, Ip is the logical name of the subpattern, and the rest of the expression is the corresponding subpattern value. You can see that this subpattern is used in the primary pattern example. |
For more information about data pattern examples, see Sample-data-patterns.
For more information about subpatterns examples, see Sample-subpatterns.
For more information about the various field inputs for creating a data pattern, see Adding a new data pattern.
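The expansion of %{subpattern-logical-name:field-name} references can be sketched as follows. This is a minimal illustration, not the product's implementation: the Ip expression here is a simplified stand-in (the product's full Ip subpattern also guards octet boundaries), and the Word subpattern and field names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubpatternExpansion {
    // Registry of reusable subpatterns, keyed by logical name.
    // Simplified stand-ins for illustration only.
    static final Map<String, String> SUBPATTERNS = Map.of(
        "Ip", "(?:\\d{1,3}\\.){3}\\d{1,3}",
        "Word", "\\w+");

    // Expands each %{Subpattern:field} reference into a Java named
    // capture group: %{Ip:clientip} -> (?<clientip>...Ip regex...)
    static String expand(String primary) {
        Matcher m = Pattern.compile("%\\{(\\w+):(\\w+)\\}").matcher(primary);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String group = "(?<" + m.group(2) + ">" + SUBPATTERNS.get(m.group(1)) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(group));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = expand("%{Ip:clientip} %{Word:action}");
        Matcher m = Pattern.compile(expanded).matcher("192.168.1.10 LOGIN");
        if (m.matches()) {
            System.out.println("clientip=" + m.group("clientip"));
            System.out.println("action=" + m.group("action"));
        }
    }
}
```

Because each subpattern lives in one registry entry, reusing it in many primary patterns (or in other subpatterns) requires no duplication, which is the point of the %{name:field} syntax.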
How to identify fields in the data file
Before you create a data pattern, you need to analyze your data file to find out if the file follows a certain pattern that can be captured while creating the data pattern. Doing this can be useful while performing advanced field extraction at the time of editing or cloning a data pattern.
Suppose you want to create a data pattern to index the following data:
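As a hypothetical illustration (the original sample data is not reproduced here), consider an access-log-style line. Reading it left to right, each consistently repeating section, such as the time stamp, client IP address, HTTP method, path, and status code, is a candidate field:

```java
public class FieldCandidates {
    // Splits a space-delimited line into its sections; each consistently
    // repeating section is a candidate field for the data pattern.
    static String[] fields(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        // Hypothetical log line, used only to show the analysis step.
        String line = "2024-05-01 10:15:30 192.168.1.10 GET /index.html 200";
        String[] f = fields(line);
        System.out.println("timestamp=" + f[0] + " " + f[1]);
        System.out.println("clientip=" + f[2]);
        System.out.println("method=" + f[3]);
        System.out.println("path=" + f[4]);
        System.out.println("status=" + f[5] + " (numeric: a candidate for an INTEGER field type)");
    }
}
```

Once you have named the candidate fields this way, each one maps to a subpattern reference in the primary pattern, in the same left-to-right order.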
Can I use data patterns created in English in other languages?
IT Data Analytics is tested on localized operating system platforms and browsers in a list of supported languages. For more information, see Language-information.
You can use data patterns created in English in the list of supported languages. To do this, you need to edit or clone the data pattern and change the Locale setting to the desired language and save the data pattern. After changing the Locale setting, you can also specify date formats and subpatterns in the language selected. For more information, see Editing-or-cloning-data-patterns.
Understanding field types
Data can be of various types, such as numeric and alphanumeric. A field type defines the type of data that the value of a field can take. The field type also defines how the fields are stored in the data store. Storing fields with the correct field type enables you to perform an effective search and to use the numeric operations of search commands.
The following example can help you understand how field types can be useful.
When you create a new data pattern, you plan the fields that you want to extract, and you set the field type for each field. For example, if you want to extract a field that is expected to have alphanumeric values, you set the field type to STRING, but if you want to extract a field that is expected to have numeric values only, you set the field type to INTEGER, LONG, DOUBLE, or FLOAT.
The following table describes the various field types supported:
Field type | Description |
---|---|
INTEGER | Whole numbers from 0 to 2,147,483,647 (2,147,483,647 = 2^31 - 1) |
LONG | Long integer values from 0 to 9,223,372,036,854,775,807 (9,223,372,036,854,775,807 = 2^63 - 1) |
STRING | (Default) Text or numeric value |
DOUBLE | Floating point numbers with double precision from 0 to 1.79769e+308. |
FLOAT | Floating point numbers with single precision from 0 to 3.40282e+038. |
The preceding field types are available for selection only if they are found relevant to the field value.
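To see why the field type matters, compare how numeric values behave when treated as strings versus integers. This is an illustrative sketch with hypothetical field values, not the product's internal comparison logic:

```java
import java.util.Arrays;
import java.util.Comparator;

public class FieldTypeDemo {
    // Sorts values as the STRING field type would compare them:
    // character by character, so "100" sorts before "25" and "9".
    static String[] sortAsString(String[] values) {
        String[] out = values.clone();
        Arrays.sort(out);
        return out;
    }

    // Sorts values as a numeric field type (for example, INTEGER) would
    // compare them, enabling meaningful sorts and numeric search commands.
    static String[] sortAsNumber(String[] values) {
        String[] out = values.clone();
        Arrays.sort(out, Comparator.comparingInt(Integer::parseInt));
        return out;
    }

    public static void main(String[] args) {
        String[] responseTimes = { "9", "100", "25" };
        System.out.println(Arrays.toString(sortAsString(responseTimes))); // [100, 25, 9]
        System.out.println(Arrays.toString(sortAsNumber(responseTimes))); // [9, 25, 100]
    }
}
```

A range search such as "response time greater than 50" runs into the same problem: as strings, "9" compares greater than "100", so numeric fields indexed as STRING can return misleading results.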
Functions available for creating and using data patterns
When you are creating a data pattern, the following functions are available:
Function | Description |
---|---|
Automatically detect date formats and primary pattern | You can copy a few lines from your data file into the Sample Text field, copy the time stamp into the Timestamp field, and then click Auto-detect to automatically detect the primary pattern and date format relevant to your sample data. If you are not satisfied with the date formats detected by the product, you can define your own custom date format. |
Allow multiline entries | You can select the Multiline Entry check box to accommodate log patterns that have multiple line entries. |
Validate primary pattern results | After specifying the primary pattern, you can click Preview to see how the parsed data entries might look. You can use this feature for validating the sample log results and for experimenting with and refining your primary pattern until you are satisfied with the results it can achieve. |
Search for available subpatterns by name or value | While creating a primary pattern, you can search for subpatterns from a list of default subpatterns, to find the ones that might be relevant to the kind of data pattern you are about to create. You can find subpatterns by an expected name (or Java regular expression) or by the expected subpattern value. |
Create custom subpatterns and test them for accuracy | If you do not find subpatterns that suit your needs, you can create a new subpattern. While creating a new subpattern, you can provide some sample text and easily test if that text matches the subpattern expression that you specified. |
At the time of creating a data collector, the following functions related to data patterns are available:
Functions available while creating a data collector
Function | Description |
---|---|
Filter relevant data patterns | Click Auto-Detect next to the Pattern field to automatically detect the data patterns that match your data file. |
Preview results to validate or modify data pattern selection | Manually select one of the data patterns and click Preview to see how the parsed data records look. This capability is also available when you filter relevant data patterns by using the Auto-Detect function. If the selected data pattern does not satisfy your needs, you can select another data pattern and see a preview of the data records, until you are satisfied with the results. |
Viewing configured data patterns
The Administration > Data Patterns tab allows you to view and manage your data patterns. From here, you can perform the following actions:
The Data Patterns tab provides the following information: