Configuring Splunk Generic Extractor
The "Moviri– Splunk Generic Extractor" connector aims at importing almost any Business Driver or System metric contained in your Splunk instance that are not specifically mapped by the other Splunk connectors provided by Moviri.
It works similarly to the built-in 'Generic' connectors available in BMC Helix Continuous Optimization; its main use case is the analysis of custom metrics from a capacity management perspective, for example:
- Baselining
- Historical analysis, seasonality identification
- Trending and forecasting
- Correlation with other metrics (especially infrastructure utilization) already present in BMC Helix Continuous Optimization (or themselves imported from Splunk) to enable capacity modeling and what-if scenarios
To do so, the connector must be provided with:
- A Splunk search query for retrieving results
- A description of how the query results map to the BMC Helix Continuous Optimization data model
At each execution, the connector:
- Executes the provided search query (this step can be skipped for Splunk saved searches)
- Retrieves the result set of the query
- Transforms the result set according to the specified mapping, producing BMC Helix Continuous Optimization datasets
- Loads the datasets into BMC Helix Continuous Optimization
- Stores the most recent timestamp, to be used as the lower time boundary in the next execution
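As an illustration of the kind of result set retrieved and transformed at each run, here is a minimal SPL sketch. The index, search terms, and the explicit earliest boundary are illustrative assumptions only: the connector manages the lower time boundary internally based on the stored timestamp, and the actual mechanism may differ from an inline time modifier.

```
index=app_logs sourcetype=orders earliest="06/09/2013:00:00:00"
| bin _time span=1h
| stats count AS orders_per_hour BY _time
```

Each row of the output carries a _time value (the start of the hour) and one value column, matching the result set requirements described in the following section.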
How to specify the search query for retrieving results
Two options are available for specifying the search query:
- a Splunk "saved search": i.e. a ready to use search saved on Splunk instance which can retain historical results of previous executions for a settable retention period and/or be scheduled to be periodically executed. Additionally, saved searches can be "accelerated" for faster execution. "Moviri Integration for BMC Helix Continuous Optimization – Splunk (Generic)" can exploit Splunk saved searches functionality to
- define queries on the Splunk side, thus avoiding the management and maintenance of search query syntax on the BMC Helix Continuous Optimization side
- rely on already available results produced by previous on-demand or scheduled executions, and only optionally execute the saved search if need arises
- take advantage of saved searches acceleration
- a search query text: the search is saved in the connector configuration properties and it is passed at each connector execution to the Splunk instance
In both cases search results must meet the following requirements:
- Results must come in a tabular format comprising multiple fields
- Each record of the results table must refer to the same aggregated period of time (e.g. one record for each hour, one record for each day…)
- A timestamp field must be present in order to identify the beginning of the time period (e.g. "2013-05-24 13:00:00" for hourly granularity, "2013-06-03 00:00:00" for daily granularity). Splunk has an implicit "_time" field for each event it processes, so the natural choice is to use it as the timestamp.
- At least one value field must be present containing the data series to be transferred
The table below presents a minimal result set example conforming to the requirements above: the daily number of logins on a system.
| _time | Logins_on_sysA |
|---|---|
| 2013-06-05T00:00:00.000+0200 | 28 |
| 2013-06-06T00:00:00.000+0200 | 24 |
| 2013-06-07T00:00:00.000+0200 | 25 |
| 2013-06-08T00:00:00.000+0200 | 1 |
| 2013-06-09T00:00:00.000+0200 | 1 |
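As a sketch only, a search like the following could produce a result set shaped like the table above; the index, host filter, and search terms are hypothetical and would need to be adapted to the actual data:

```
index=syslog host=sysA "login succeeded"
| timechart span=1d count AS Logins_on_sysA
```

timechart emits the implicit _time field as the first column, so it satisfies the timestamp requirement without further manipulation.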
How to select the appropriate dataset
When creating an ETL task that uses the 'Moviri – Splunk Generic Extractor', the first configuration step is associating one or more datasets to the ETL task.
- Select 'WKLDAT' if the task is going to import 'Business Drivers'
- Select 'SYSDAT' if the task is going to import 'System' data
- Select 'APPDAT' if the task is going to import Application configuration. This case is not covered in this document; please contact support for more information if required
How to map search query results to BMC Helix Continuous Optimization data model
The most important part of the connector configuration is the definition of how each time series extracted from Splunk maps to:
- Entities (either Business Drivers or Systems);
- Metrics (also referred to as resources);
- Metric Subobjects (also referred to as subresources).
These mappings are very similar to the ones required by the built-in Generic – Database extractor:
- Entities:
- For Business Drivers (WKLDAT dataset) this is equivalent to DS_WKLDNM column, representing "Business driver lookup identifier"
- For Systems (SYSDAT dataset) this is the equivalent of the DS_SYSNM column, representing the "System lookup identifier"
- Metrics (also referred to as resources)
- Equivalent of the OBJNM column
- Metric Subobjects (also referred to as subresources)
- Equivalent of the SUBOBJNM column
- required when the metric is not of type 'GLOBAL', for example for every metric that has a sub-category dimension
For any time series, the connector offers the following mapping options for the three dimensions listed above:
- Entities
- Use the name of the Splunk search query column containing the series values
- Input a fixed string
- Use values from another Splunk search query column to identify entity name
- Metric (or resource)
- Select among some proposed generic metrics (applicable to Business Drivers entities)
- Input a specific metric (Advanced)
- Metric Subobject (or subresource), when metric is not of type GLOBAL
- Input a fixed string
- Use values from another Splunk search query column to identify subobject
Some examples are reported in the remainder of this section to facilitate the application of the principles described above: each example includes the Splunk search query result set, the Splunk-to-BMC Helix Continuous Optimization mappings, and the resulting BMC Helix Continuous Optimization series, as configured in the "Splunk – Query and Mapping" ETL task configuration tab.
EXAMPLE 1: number of daily logins on two services
| _time | svcA | svcB |
|---|---|---|
| 2013-06-05T00:00:00.000+0200 | 28 | 12 |
| 2013-06-06T00:00:00.000+0200 | 24 | 32 |
| 2013-06-07T00:00:00.000+0200 | 25 | 11 |
| 2013-06-08T00:00:00.000+0200 | 1 | 4 |
| 2013-06-09T00:00:00.000+0200 | 1 | 3 |
In this example, the 'Saved Search' query on Splunk returns the number of daily logins on two services ('svcA' and 'svcB') in two separate columns, and we need to import these metrics into BMC Helix Continuous Optimization as two separate business drivers.
- Select 'WKLDAT' as dataset, as this data is related to Business Drivers
- As 'Entity', we specify to use the name of the column (so that business drivers 'svcA' and 'svcB' will be created)
- As 'Metric', the number of daily logins can be mapped to "a count of events over time", which corresponds to the TOTAL_EVENTS metric.
- As 'Metric subresource', the selected metric is global and does not require a 'subobject', so none is specified.
This is the final mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| svcA | svcA | TOTAL_EVENTS | GLOBAL |
| svcB | svcB | TOTAL_EVENTS | GLOBAL |
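A result set shaped like Example 1 could be obtained, for instance, with a timechart split by a service field; the index, sourcetype, and field name below are assumptions for illustration:

```
index=auth sourcetype=login_events
| timechart span=1d count BY service
```

With this form, each distinct value of the split field becomes a separate column (here svcA and svcB), which is why the "use the column name as Entity" option fits this example.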
EXAMPLE 2: number of daily logins on monitored services
| _time | Services | Logins |
|---|---|---|
| 2013-06-05T00:00:00.000+0200 | svcB | 12 |
| 2013-06-06T00:00:00.000+0200 | svcB | 32 |
| 2013-06-07T00:00:00.000+0200 | svcB | 11 |
| 2013-06-08T00:00:00.000+0200 | svcB | 4 |
| 2013-06-09T00:00:00.000+0200 | svcB | 3 |
| 2013-06-05T00:00:00.000+0200 | svcA | 28 |
| 2013-06-06T00:00:00.000+0200 | svcA | 24 |
| 2013-06-07T00:00:00.000+0200 | svcA | 25 |
| 2013-06-08T00:00:00.000+0200 | svcA | 1 |
| 2013-06-09T00:00:00.000+0200 | svcA | 1 |
| 2013-06-05T00:00:00.000+0200 | svcC | 45 |
| 2013-06-06T00:00:00.000+0200 | svcC | 56 |
| 2013-06-07T00:00:00.000+0200 | svcC | 44 |
| 2013-06-08T00:00:00.000+0200 | svcC | 67 |
| 2013-06-09T00:00:00.000+0200 | svcC | 87 |
In this example, the query on Splunk returns the number of daily logins on the monitored services ('svcA', 'svcB' and 'svcC'), identifying each service in the 'Services' column, and we need to import these metrics into BMC Helix Continuous Optimization as separate business drivers.
- Select 'WKLDAT' as dataset, as this data is related to Business Drivers
- As 'Entity', we specify to use the values in the 'Services' column (so that business drivers 'svcA', 'svcB' and 'svcC' will be created)
- As 'Metric', the number of daily logins can be mapped to "a count of events over time", which corresponds to the TOTAL_EVENTS metric.
- As 'Metric subresource', the selected metric is global and does not require a 'subobject', so none is specified.
This is the final mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| Logins | svcA | TOTAL_EVENTS | GLOBAL |
| Logins | svcB | TOTAL_EVENTS | GLOBAL |
| Logins | svcC | TOTAL_EVENTS | GLOBAL |
Note that in this case, as new services appear in the query results, new entities (Business Drivers) will be created in BMC Helix Continuous Optimization.
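A result set in the "long" format of Example 2 (one row per service and day) could be produced, for instance, with stats instead of timechart; the index, sourcetype, and field names are again hypothetical:

```
index=auth sourcetype=login_events
| bin _time span=1d
| stats count AS Logins BY _time, Services
```

Because the service name travels in a column value rather than in a column name, the "Based on Query Column" entity mapping is the natural choice, and new services automatically become new business drivers.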
EXAMPLE 3: data volumes managed by a data warehouse system, split by its sub-procedures (steps)
| _time | step | numOfLogLines | Avg Parallelism | Execution Time (s) | Tot Samples Processed |
|---|---|---|---|---|---|
| 2013-07-05T21:00:00.000+0200 | step1 | 12 | 1 | 3,854 | 136 |
| 2013-07-05T21:00:00.000+0200 | step2 | 4 | 1 | 3,017 | 899 |
| 2013-07-05T21:00:00.000+0200 | step3 | 2 | 8 | 1,485 | 64 |
| 2013-07-05T21:00:00.000+0200 | step4 | 1 | 8 | 0,312 | 1 |
| 2013-07-05T21:00:00.000+0200 | step5 | 8 | 3 | 1,083 | 65 |
| 2013-07-05T21:00:00.000+0200 | step6 | 4 | 3 | 1,507 | 170 |
This is the mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step1 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step2 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step3 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step4 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step5 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step1 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step2 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step3 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step4 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step5 |
Note that in this case, as new steps appear in the query results, new subresources will be attached to the existing entities.
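A sketch of a search that could yield the Example 3 layout is shown below; the index, sourcetype, the raw fields parallelism, duration, and samples, and the choice of aggregation functions are all assumptions for illustration:

```
index=dwh sourcetype=etl_step_log
| bin _time span=1d
| stats count AS numOfLogLines,
        avg(parallelism) AS "Avg Parallelism",
        sum(duration) AS "Execution Time (s)",
        sum(samples) AS "Tot Samples Processed"
  BY _time, step
```

Here the step column is used as the subresource ("Based on Query Column"), so each value column fans out into one series per step.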
EXAMPLE 4: CPU utilization by host
| _time | host | numSamples | pctUser | pctSystem | pctIowait | pctUtil |
|---|---|---|---|---|---|---|
| 2013-06-09T10:00:00.000+0200 | movvm123 | 120 | 0,0111 | 0,0127 | 0,0047 | 0,0238 |
| 2013-06-09T11:00:00.000+0200 | movvm123 | 120 | 0,0193 | 0,0147 | 0,0017 | 0,0340 |
| 2013-06-09T12:00:00.000+0200 | movvm123 | 120 | 0,0178 | 0,0158 | 0,0021 | 0,0336 |
| 2013-06-09T13:00:00.000+0200 | movvm123 | 120 | 0,0210 | 0,0158 | 0,0047 | 0,0368 |
| 2013-06-09T14:00:00.000+0200 | movvm123 | 120 | 0,0088 | 0,0148 | 0,0107 | 0,0236 |
| 2013-06-09T15:00:00.000+0200 | movvm123 | 120 | 0,0123 | 0,0146 | 0,0024 | 0,0269 |
| 2013-06-09T16:00:00.000+0200 | movvm123 | 120 | 0,0123 | 0,0139 | 0,0128 | 0,0262 |
| 2013-06-09T17:00:00.000+0200 | movvm123 | 120 | 0,0097 | 0,0118 | 0,0022 | 0,0215 |
| 2013-06-09T18:00:00.000+0200 | movvm123 | 120 | 0,0125 | 0,0158 | 0,0023 | 0,0283 |
| 2013-06-09T19:00:00.000+0200 | movvm123 | 120 | 0,0107 | 0,0139 | 0,0030 | 0,0246 |
| 2013-06-09T20:00:00.000+0200 | movvm123 | 120 | 0,0098 | 0,0135 | 0,0026 | 0,0233 |
This is the mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| pctUser | movvm123 | CPU_UTIL_USER | GLOBAL |
| pctSystem | movvm123 | CPU_UTIL_SYSTEM | GLOBAL |
| pctUtil | movvm123 | CPU_UTIL | GLOBAL |
| pctIowait | movvm123 | CPU_UTIL_WAIO | GLOBAL |
Note that in this case, as new hosts appear in the query results, new BMC Helix Continuous Optimization entities will be created.
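For Example 4, a search along these lines could produce the hourly per-host CPU figures; the index, sourcetype, and raw field names are assumptions, and the pctUtil definition (user plus system) simply mirrors the sample values above and may differ in a real deployment:

```
index=os sourcetype=cpu host=movvm123
| bin _time span=1h
| stats count AS numSamples,
        avg(pctUser) AS pctUser,
        avg(pctSystem) AS pctSystem,
        avg(pctIowait) AS pctIowait
  BY _time, host
| eval pctUtil = pctUser + pctSystem
```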
Full list of configuration properties
The following are the specific settings for the "Moviri – Splunk Generic Extractor" connector; they are presented in the "Splunk – Query and Mapping" configuration panel.
| Property Name | Condition | Type | Required? | Default | Description |
|---|---|---|---|---|---|
| Search Type | | Selection | Yes | | Specifies whether to use a Splunk Saved Search or to manually input the Search Text |
| Query Text | Search Type = "Input Text" | String | No | | The Splunk search query, conforming to Splunk search syntax (refer to the Splunk documentation about searches and search syntax: http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearch) and producing a result set conforming to the requirements described above |
| Saved Search App | Search Type = "Splunk Saved Search" | String | No | | The Splunk app the saved search belongs to (optional) |
| Saved Search Name | Search Type = "Splunk Saved Search" | String | Yes | | The Splunk Saved Search name |
| Saved Search Mode | Search Type = "Splunk Saved Search" | String | Yes | "Import data only if search is scheduled" and "Import data from existing results" | Three independent behaviors that affect how data is extracted from the saved search; each one can be enabled separately |
| Timestamp column | | String | Yes | _time | The name of the result set column containing the records' timestamps |
| Value Columns to import | | String (max column length: 28) | Yes | | Semicolon-separated list of the result set columns containing the values of the data series to be imported. No spaces must be included after the semicolons or in the column names |

The following properties are repeated for each value column specified in "Value Columns to import".

| Property Name | Condition | Type | Required? | Default | Description |
|---|---|---|---|---|---|
| Use <<columnX>> as BCO Entity Name? | | Selection | Yes | Yes | If set to "Yes", tells the connector to use the value column name as the BMC Helix Continuous Optimization entity name |
| BCO Entity Name | Use <<columnX>> as Entity Name? = "No" | Selection | Yes | | Specifies whether to use as BMC Helix Continuous Optimization entity name a "Fixed" string or the values taken from another result set column ("Based on Query Column") |
| Entity Name Value = | Entity Name = "Fixed" | String | Yes | | The BMC Helix Continuous Optimization entity name |
| Query Column for Entity Name = | Entity Name = "Based on Query Column" | String | Yes | | The query column from which to read the BMC Helix Continuous Optimization entity name |
| BCO Metric: <<columnX>> represents | | Selection | Yes | | Specifies which BMC Helix Continuous Optimization metric to use to map the data series. A textual description is provided for commonly used Business Driver metrics (see Table 1, BMC Helix Continuous Optimization metric descriptions) |
| BCO Metric = | Metric: <<columnX>> represents = "Specify Metric (Advanced)" | String | Yes | | A valid BMC Helix Continuous Optimization metric that represents the data series to be imported |
| Subobject (sub-category) Name | "Metric: <<columnX>> represents" contains a sub-category or is equal to "Specify Metric (Advanced)" | Selection | Yes | | Specifies whether to use as subresource name a "Fixed" string or the values taken from another result set column ("Based on Query Column") |
| Subobject Name Value = | Subobject Name = "Fixed" | String | Yes | | The subobject (subresource) name |
| Query Column for Subobject Name = | Subobject Name = "Based on Query Column" | String | Yes | | The query column from which to read the subobject (subresource) name |
| Number of events/operations (weight of the response time) | "Metric: <<columnX>> represents" refers to a response time or is equal to "Specify Metric (Advanced)" | String | Yes | | Metrics referring to response times (or custom metrics) need a weight in order to correctly compute averages; the weight can be a "Fixed" value or taken from another result set column ("Based on Query Column") |
| Weight Value = | Number of events/operations (weight of the response time) = "Fixed" | Integer | Yes | | The value of the weight |
| Query Column for Weight = | Number of events/operations (weight of the response time) = "Based on Query Column" | String | Yes | | The query column from which to read the weight |
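As a purely illustrative sketch, the following shows how the properties above might be filled in for the Example 2 scenario; the values are hypothetical and are shown in a compact key = value form rather than as they appear in the configuration panel:

```
Search Type                          = Input Text
Query Text                           = index=auth sourcetype=login_events | bin _time span=1d | stats count AS Logins BY _time, Services
Timestamp column                     = _time
Value Columns to import              = Logins
Use <<Logins>> as BCO Entity Name?   = No
BCO Entity Name                      = Based on Query Column
Query Column for Entity Name         = Services
BCO Metric: <<Logins>> represents    = a count of events/items/operations over time
```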
Table 1 – BMC Helix Continuous Optimization metric descriptions

| Description | Corresponding BMC Helix Continuous Optimization Metric |
|---|---|
| a count of events/items/operations over time | TOTAL_EVENTS |
| a number of concurrent/standing/open items/customers... | EVENTS_CURRENT |
| the number of users in a system | USERS_CURRENT |
| a rate of events/items/operations over time (events/s) | EVENT_RATE |
| a response time | EVENT_RESPONSE_TIME |
| a count of events/items/operations over time split by a sub-category | BYSET_EVENTS |
| a number of concurrent/standing/open items split by a sub-category | BYSET_EVENTS_CURRENT |
| the number of users in a system split by a sub-category | BYSET_USERS_CURRENT |
| a rate of events/items/operations over time (events/s) split by a sub-category | BYSET_EVENT_RATE |
| a response time split by a sub-category | BYSET_RESPONSE_TIME |
| Specify Metric (Advanced) | – |