Configuring Splunk Generic Extractor
The "Moviri– Splunk Generic Extractor" connector aims at importing almost any Business Driver or System metric contained in your Splunk instance that are not specifically mapped by the other Splunk connectors provided by Moviri.
It works similarly to the built-in 'Generic' connectors available in BMC Helix Continuous Optimization; its main use case is the analysis of custom metrics from a capacity management perspective, for example:
- Baselining
- Historical analysis, seasonality identification
- Trending and forecasting
- Correlation with other metrics (especially infrastructure utilization) already present in BMC Helix Continuous Optimization (or themselves imported from Splunk) to enable capacity modeling and what-if scenarios
To do so, the connector must be provided with:
- A Splunk search query for retrieving results
- A description of how the query results map to the BMC Helix Continuous Optimization data model
At each execution, the connector:
- Executes the provided search query (this step can be skipped for Splunk saved searches)
- Retrieves the result set of the query
- Transforms the result set according to the specified mapping, producing BMC Helix Continuous Optimization datasets
- Loads the datasets into BMC Helix Continuous Optimization
- Stores the most recent timestamp, to be used as the lower time boundary in the next execution
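As an illustration of the kind of result set retrieved and transformed at each run, here is a minimal SPL sketch. The index, search terms, and the explicit earliest boundary are illustrative assumptions only: the connector manages the lower time boundary internally based on the stored timestamp, and the actual mechanism may differ from an inline time modifier.

```
index=app_logs sourcetype=orders earliest="06/09/2013:00:00:00"
| bin _time span=1h
| stats count AS orders_per_hour BY _time
```

Each row of the output carries a _time value (the start of the hour) and one value column, matching the result set requirements described in the following section.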
How to specify the search query for retrieving results
Two options are available for specifying the search query:
- a Splunk "saved search": i.e. a ready to use search saved on Splunk instance which can retain historical results of previous executions for a settable retention period and/or be scheduled to be periodically executed. Additionally, saved searches can be "accelerated" for faster execution. "Moviri Integration for BMC Helix Continuous Optimization – Splunk (Generic)" can exploit Splunk saved searches functionality to
- define queries on the Splunk side, thus avoiding the management and maintenance of search query syntax on the BMC Helix Continuous Optimization side
- rely on already available results produced by previous on-demand or scheduled executions, and only optionally execute the saved search if need arises
- take advantage of saved searches acceleration
- a search query text: the search is saved in the connector configuration properties and it is passed at each connector execution to the Splunk instance
In both cases search results must meet the following requirements:
- Results must come in a tabular format comprising multiple fields
- Each record of the results table must refer to the same aggregated period of time (e.g. one record for each hour, one record for each day…)
- A timestamp field must be present in order to identify the beginning of the time period (e.g. "2013-05-24 13:00:00" for hourly granularity, "2013-06-03 00:00:00" for daily granularity). Splunk has an implicit "_time" field for each event it processes, so the natural choice is to use it as the timestamp.
- At least one value field must be present containing the data series to be transferred
The table below presents a minimal result set example conforming to the requirements above: the daily number of logins on a system.
| _time | Logins_on_sysA |
|---|---|
| 2013-06-05T00:00:00.000+0200 | 28 |
| 2013-06-06T00:00:00.000+0200 | 24 |
| 2013-06-07T00:00:00.000+0200 | 25 |
| 2013-06-08T00:00:00.000+0200 | 1 |
| 2013-06-09T00:00:00.000+0200 | 1 |
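As a sketch only, a search like the following could produce a result set shaped like the table above; the index, host filter, and search terms are hypothetical and would need to be adapted to the actual data:

```
index=syslog host=sysA "login succeeded"
| timechart span=1d count AS Logins_on_sysA
```

timechart emits the implicit _time field as the first column, so it satisfies the timestamp requirement without further manipulation.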
How to select the appropriate dataset
When creating an ETL task that uses the 'Moviri – Splunk Generic Extractor', the first configuration step is associating one or more datasets to the ETL task.
- Select 'WKLDAT' if the task is going to import 'Business Drivers'
- Select 'SYSDAT' if the task is going to import 'System' data
- Select 'APPDAT' if the task is going to import Application configuration. This case is not covered in this document; please contact support for more information if required
How to map search query results to BMC Helix Continuous Optimization data model
The most important part of the connector configuration is the definition of how each time series extracted from Splunk maps to:
- Entities (either Business Drivers or Systems);
- Metrics (also referred to as resources);
- Metric Subobjects (also referred to as subresources).
These mappings are very similar to the ones required by the built-in Generic – Database extractor:
- Entities:
- For Business Drivers (WKLDAT dataset) this is equivalent to DS_WKLDNM column, representing "Business driver lookup identifier"
- For Systems (SYSDAT dataset) this is the equivalent of the DS_SYSNM column, representing the "System lookup identifier"
- Metrics (also referred to as resources)
- Equivalent of the OBJNM column
- Metric Subobjects (also referred to as subresources)
- Equivalent of the SUBOBJNM column
- required when the metric is not of type 'GLOBAL', for example for every metric that has a sub-category dimension
For any time series, the connector offers the following mapping options for the three dimensions listed above:
- Entities
- Use the name of the Splunk search query column containing the series values
- Input a fixed string
- Use values from another Splunk search query column to identify entity name
- Metric (or resource)
- Select among some proposed generic metrics (applicable to Business Drivers entities)
- Input a specific metric (Advanced)
- Metric Subobject (or subresource), when metric is not of type GLOBAL
- Input a fixed string
- Use values from another Splunk search query column to identify subobject
Some examples are reported in the remainder of this section to facilitate the application of the principles described above: each example includes the Splunk search query result set, the Splunk-to-BMC Helix Continuous Optimization mappings, and the resulting BMC Helix Continuous Optimization series, as configured in the "Splunk – Query and Mapping" ETL task configuration tab.
EXAMPLE 1: number of daily logins on two services
| _time | svcA | svcB |
|---|---|---|
| 2013-06-05T00:00:00.000+0200 | 28 | 12 |
| 2013-06-06T00:00:00.000+0200 | 24 | 32 |
| 2013-06-07T00:00:00.000+0200 | 25 | 11 |
| 2013-06-08T00:00:00.000+0200 | 1 | 4 |
| 2013-06-09T00:00:00.000+0200 | 1 | 3 |
In this example, the 'Saved Search' query on Splunk returns the number of daily logins on two services ('svcA' and 'svcB') in two separate columns, and we need to import these metrics into BMC Helix Continuous Optimization as two separate business drivers.
- Select 'WKLDAT' as dataset, as this data is related to Business Drivers
- As 'Entity', we specify to use the name of the column (so that business drivers 'svcA' and 'svcB' will be created)
- As 'Metric', the number of daily logins can be mapped to "a count of events over time", which corresponds to the TOTAL_EVENTS metric.
- As 'Metric subresource', the selected metric is global and does not require a 'subobject', so none is specified.
This is the final mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| svcA | svcA | TOTAL_EVENTS | GLOBAL |
| svcB | svcB | TOTAL_EVENTS | GLOBAL |
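A result set shaped like Example 1 could be obtained, for instance, with a timechart split by a service field; the index, sourcetype, and field name below are assumptions for illustration:

```
index=auth sourcetype=login_events
| timechart span=1d count BY service
```

With this form, each distinct value of the split field becomes a separate column (here svcA and svcB), which is why the "use the column name as Entity" option fits this example.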
EXAMPLE 2: number of daily logins on monitored services
| _time | Services | Logins |
|---|---|---|
| 2013-06-05T00:00:00.000+0200 | svcB | 12 |
| 2013-06-06T00:00:00.000+0200 | svcB | 32 |
| 2013-06-07T00:00:00.000+0200 | svcB | 11 |
| 2013-06-08T00:00:00.000+0200 | svcB | 4 |
| 2013-06-09T00:00:00.000+0200 | svcB | 3 |
| 2013-06-05T00:00:00.000+0200 | svcA | 28 |
| 2013-06-06T00:00:00.000+0200 | svcA | 24 |
| 2013-06-07T00:00:00.000+0200 | svcA | 25 |
| 2013-06-08T00:00:00.000+0200 | svcA | 1 |
| 2013-06-09T00:00:00.000+0200 | svcA | 1 |
| 2013-06-05T00:00:00.000+0200 | svcC | 45 |
| 2013-06-06T00:00:00.000+0200 | svcC | 56 |
| 2013-06-07T00:00:00.000+0200 | svcC | 44 |
| 2013-06-08T00:00:00.000+0200 | svcC | 67 |
| 2013-06-09T00:00:00.000+0200 | svcC | 87 |
In this example, the query on Splunk returns the number of daily logins on the monitored services ('svcA', 'svcB' and 'svcC'), identifying each service in the 'Services' column, and we need to import these metrics into BMC Helix Continuous Optimization as separate business drivers.
- Select 'WKLDAT' as dataset, as this data is related to Business Drivers
- As 'Entity', we specify to use the values in the 'Services' column (so that business drivers 'svcA', 'svcB' and 'svcC' will be created)
- As 'Metric', the number of daily logins can be mapped to "a count of events over time", which corresponds to the TOTAL_EVENTS metric.
- As 'Metric subresource', the selected metric is global and does not require a 'subobject', so none is specified.
This is the final mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| Logins | svcA | TOTAL_EVENTS | GLOBAL |
| Logins | svcB | TOTAL_EVENTS | GLOBAL |
| Logins | svcC | TOTAL_EVENTS | GLOBAL |
Note that in this case, as new services appear in the query results, new entities (Business Drivers) will be created in BMC Helix Continuous Optimization.
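A result set in the "long" format of Example 2 (one row per service and day) could be produced, for instance, with stats instead of timechart; the index, sourcetype, and field names are again hypothetical:

```
index=auth sourcetype=login_events
| bin _time span=1d
| stats count AS Logins BY _time, Services
```

Because the service name travels in a column value rather than in a column name, the "Based on Query Column" entity mapping is the natural choice, and new services automatically become new business drivers.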
EXAMPLE 3: data volumes managed by a data warehouse system, split by its sub-procedures (steps)
| _time | step | numOfLogLines | Avg Parallelism | Execution Time (s) | Tot Samples Processed |
|---|---|---|---|---|---|
| 2013-07-05T21:00:00.000+0200 | step1 | 12 | 1 | 3,854 | 136 |
| 2013-07-05T21:00:00.000+0200 | step2 | 4 | 1 | 3,017 | 899 |
| 2013-07-05T21:00:00.000+0200 | step3 | 2 | 8 | 1,485 | 64 |
| 2013-07-05T21:00:00.000+0200 | step4 | 1 | 8 | 0,312 | 1 |
| 2013-07-05T21:00:00.000+0200 | step5 | 8 | 3 | 1,083 | 65 |
| 2013-07-05T21:00:00.000+0200 | step6 | 4 | 3 | 1,507 | 170 |
This is the mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step1 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step2 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step3 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step4 |
| Avg Parallelism | Active Jobs | BYSET_EVENTS_CURRENT | step5 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step1 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step2 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step3 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step4 |
| Tot Samples Processed | Processed Samples | BYSET_EVENTS_CURRENT | step5 |
Note that in this case, as new steps appear in the query results, new subresources will be attached to the existing entities.
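A sketch of a search that could yield the Example 3 layout is shown below; the index, sourcetype, the raw fields parallelism, duration, and samples, and the choice of aggregation functions are all assumptions for illustration:

```
index=dwh sourcetype=etl_step_log
| bin _time span=1d
| stats count AS numOfLogLines,
        avg(parallelism) AS "Avg Parallelism",
        sum(duration) AS "Execution Time (s)",
        sum(samples) AS "Tot Samples Processed"
  BY _time, step
```

Here the step column is used as the subresource ("Based on Query Column"), so each value column fans out into one series per step.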
EXAMPLE 4: CPU utilization by host
| _time | host | numSamples | pctUser | pctSystem | pctIowait | pctUtil |
|---|---|---|---|---|---|---|
| 2013-06-09T10:00:00.000+0200 | movvm123 | 120 | 0,0111 | 0,0127 | 0,0047 | 0,0238 |
| 2013-06-09T11:00:00.000+0200 | movvm123 | 120 | 0,0193 | 0,0147 | 0,0017 | 0,0340 |
| 2013-06-09T12:00:00.000+0200 | movvm123 | 120 | 0,0178 | 0,0158 | 0,0021 | 0,0336 |
| 2013-06-09T13:00:00.000+0200 | movvm123 | 120 | 0,0210 | 0,0158 | 0,0047 | 0,0368 |
| 2013-06-09T14:00:00.000+0200 | movvm123 | 120 | 0,0088 | 0,0148 | 0,0107 | 0,0236 |
| 2013-06-09T15:00:00.000+0200 | movvm123 | 120 | 0,0123 | 0,0146 | 0,0024 | 0,0269 |
| 2013-06-09T16:00:00.000+0200 | movvm123 | 120 | 0,0123 | 0,0139 | 0,0128 | 0,0262 |
| 2013-06-09T17:00:00.000+0200 | movvm123 | 120 | 0,0097 | 0,0118 | 0,0022 | 0,0215 |
| 2013-06-09T18:00:00.000+0200 | movvm123 | 120 | 0,0125 | 0,0158 | 0,0023 | 0,0283 |
| 2013-06-09T19:00:00.000+0200 | movvm123 | 120 | 0,0107 | 0,0139 | 0,0030 | 0,0246 |
| 2013-06-09T20:00:00.000+0200 | movvm123 | 120 | 0,0098 | 0,0135 | 0,0026 | 0,0233 |
This is the mapping:
| Query Column | Entity | Metric | Subresource |
|---|---|---|---|
| pctUser | movvm123 | CPU_UTIL_USER | GLOBAL |
| pctSystem | movvm123 | CPU_UTIL_SYSTEM | GLOBAL |
| pctUtil | movvm123 | CPU_UTIL | GLOBAL |
| pctIowait | movvm123 | CPU_UTIL_WAIO | GLOBAL |
Note that in this case, as new hosts appear in the query results, new BMC Helix Continuous Optimization entities will be created.
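For Example 4, a search along these lines could produce the hourly per-host CPU figures; the index, sourcetype, and raw field names are assumptions, and the pctUtil definition (user plus system) simply mirrors the sample values above and may differ in a real deployment:

```
index=os sourcetype=cpu host=movvm123
| bin _time span=1h
| stats count AS numSamples,
        avg(pctUser) AS pctUser,
        avg(pctSystem) AS pctSystem,
        avg(pctIowait) AS pctIowait
  BY _time, host
| eval pctUtil = pctUser + pctSystem
```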
Full list of configuration properties
The following are the specific settings for the "Moviri – Splunk Generic Extractor" connector; they are presented in the "Splunk – Query and Mapping" configuration panel.
| Property Name | Condition | Type | Required? | Default | Description |
|---|---|---|---|---|---|
| Search Type | | Selection | Yes | | Specifies whether to use a Splunk Saved Search or to manually input the Search Text |
| Query Text | Search Type = "Input Text" | String | No | | The Splunk search query, conforming to Splunk search syntax (refer to the Splunk documentation about searches and search syntax: http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearch) and producing a result set conforming to the requirements described above |
| Saved Search App | Search Type = "Splunk Saved Search" | String | No | | The Splunk app the saved search belongs to (optional) |
| Saved Search Name | Search Type = "Splunk Saved Search" | String | Yes | | The Splunk Saved Search name |
| Saved Search Mode | Search Type = "Splunk Saved Search" | String | Yes | "Import data only if search is scheduled" and "Import data from existing results" | Three independent behaviors that affect how data is extracted from the saved search; each one can be enabled separately |
| Timestamp column | | String | Yes | _time | The name of the result set column containing the records' timestamps |
| Value Columns to import | | String (max column length: 28) | Yes | | Semicolon-separated list of the result set columns containing the values of the data series to be imported. No spaces must be included after the semicolons or in the column names |

The following properties are repeated for each value column specified in "Value Columns to import".

| Property Name | Condition | Type | Required? | Default | Description |
|---|---|---|---|---|---|
| Use <<columnX>> as BCO Entity Name? | | Selection | Yes | Yes | If set to "Yes", tells the connector to use the value column name as the BMC Helix Continuous Optimization entity name |
| BCO Entity Name | Use <<columnX>> as Entity Name? = "No" | Selection | Yes | | Specifies whether to use as BMC Helix Continuous Optimization entity name a "Fixed" string or the values taken from another result set column ("Based on Query Column") |
| Entity Name Value = | Entity Name = "Fixed" | String | Yes | | The BMC Helix Continuous Optimization entity name |
| Query Column for Entity Name = | Entity Name = "Based on Query Column" | String | Yes | | The query column from which to read the BMC Helix Continuous Optimization entity name |
| BCO Metric: <<columnX>> represents | | Selection | Yes | | Specifies which BMC Helix Continuous Optimization metric to use to map the data series. A textual description is provided for commonly used Business Driver metrics (see Table 1, BMC Helix Continuous Optimization metric descriptions) |
| BCO Metric = | Metric: <<columnX>> represents = "Specify Metric (Advanced)" | String | Yes | | A valid BMC Helix Continuous Optimization metric that represents the data series to be imported |
| Subobject (sub-category) Name | "Metric: <<columnX>> represents" contains a sub-category or is equal to "Specify Metric (Advanced)" | Selection | Yes | | Specifies whether to use as subresource name a "Fixed" string or the values taken from another result set column ("Based on Query Column") |
| Subobject Name Value = | Subobject Name = "Fixed" | String | Yes | | The subobject (subresource) name |
| Query Column for Subobject Name = | Subobject Name = "Based on Query Column" | String | Yes | | The query column from which to read the subobject (subresource) name |
| Number of events/operations (weight of the response time) | "Metric: <<columnX>> represents" refers to a response time or is equal to "Specify Metric (Advanced)" | String | Yes | | Metrics referring to response times (or custom metrics) need a weight in order to correctly compute averages; the weight can be a "Fixed" value or taken from another result set column ("Based on Query Column") |
| Weight Value = | Number of events/operations (weight of the response time) = "Fixed" | Integer | Yes | | The value of the weight |
| Query Column for Weight = | Number of events/operations (weight of the response time) = "Based on Query Column" | String | Yes | | The query column from which to read the weight |
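As a purely illustrative sketch, the following shows how the properties above might be filled in for the Example 2 scenario; the values are hypothetical and are shown in a compact key = value form rather than as they appear in the configuration panel:

```
Search Type                          = Input Text
Query Text                           = index=auth sourcetype=login_events | bin _time span=1d | stats count AS Logins BY _time, Services
Timestamp column                     = _time
Value Columns to import              = Logins
Use <<Logins>> as BCO Entity Name?   = No
BCO Entity Name                      = Based on Query Column
Query Column for Entity Name         = Services
BCO Metric: <<Logins>> represents    = a count of events/items/operations over time
```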
Table 1 – BMC Helix Continuous Optimization metric descriptions

| Description | Corresponding BMC Helix Continuous Optimization Metric |
|---|---|
| a count of events/items/operations over time | TOTAL_EVENTS |
| a number of concurrent/standing/open items/customers... | EVENTS_CURRENT |
| the number of users in a system | USERS_CURRENT |
| a rate of events/items/operations over time (events/s) | EVENT_RATE |
| a response time | EVENT_RESPONSE_TIME |
| a count of events/items/operations over time split by a sub-category | BYSET_EVENTS |
| a number of concurrent/standing/open items split by a sub-category | BYSET_EVENTS_CURRENT |
| the number of users in a system split by a sub-category | BYSET_USERS_CURRENT |
| a rate of events/items/operations over time (events/s) split by a sub-category | BYSET_EVENT_RATE |
| a response time split by a sub-category | BYSET_RESPONSE_TIME |
| Specify Metric (Advanced) | – |