Parameters and Recovery Actions

Parameters

Parameters allow you to gather and display metrics of an application. A typical KM defines a number of parameter scripts that collect various metrics. As soon as an application instance is created, all the parameters defined in the application class are automatically instantiated as well. Parameters that have scripts defined are put in the RUNQ and scheduled by the scheduler.

There are three types of parameters (standard, collector, and consumer), each with different characteristics. The following table lists the mandatory properties of a parameter.

Mandatory Properties of a Parameter

Property	Description
Name	Identifies the parameter in the namespace. After the name has been defined, it can no longer be changed with the developer console. If you really need to change the name of a parameter and don't want to duplicate it manually, you can edit the KM file with a text editor outside of the developer console. After modifying the KM this way, you must restart the developer console for the changes to take effect.
Active	Allows you to create parameters that are inactive by default. Usually you will only create active parameters. The active flag is frequently used by customers who are not interested in seeing the parameter in the console.
Help	Allows you to enter the help context ID and the help file. The help context ID must be numeric (remember this when you are writing your Windows help files, because Windows also supports string context IDs). The help file must contain a file name without path information.
Type	Standard, collector, or consumer.

Besides these mandatory properties, there are properties that you can enter only for certain parameter types. These properties fall into two categories: execution and visualization. Depending on the type of parameter you are defining, you might or might not be able to enter information for the properties listed in the following table.

pconfig KM Files (Part 1 of 2)

.kml Files

Description

Execution information

Parameters that collect data will have to run scripts to gather the data. If the parameter executes a script, you can enter the following information:

Command type - Only command types that are defined on this application level or computer class level will be visible (and usable). The default command types are OS and PSL.
Environment - You can specify OS environment variables you want to add to the environment. The environment variable value can contain %-macros such as %{/hostname}.
Command - The actual command to run.
Credentials - You can hardcode OS credentials, but it is not recommended, since this would force you to modify the KM whenever the username or password of an application changes.
Scheduling information - Defines when the execution should run and instructs the agent when a PSL process should be created. However, this can lead to confusion. For example, if you instruct the agent to start your process at 1:00 PM, but the agent was down and starts up at 1:05 PM, the agent does not know that it should still start the execution.

Visualization and actions

Parameters that are responsible for visualizing data or executing recovery actions will need to define how the data should be shown and when an alarm should be triggered. You will be able to enter the following information:

Alarm ranges and recovery actions - Defines conditions to check and actions to take when a new value for a parameter is obtained.
Output type and attributes - Defines how the parameter must be displayed on the console. It is also possible to specify a 'no-output' display type, which suppresses the creation of the icon. Depending on the output type, you can specify additional attributes such as title (except for the 'no-output' type) and units (except for the 'no-output' and text types).
History - Defines how many days of history must be kept when storing values for this parameter. This setting can be overridden by changing the namespace variable history Length for this parameter.

Different Parameter Types

Standard Parameters

The standard parameter is the most complete of the three types of parameters. This parameter offers both collection and visualization.

Scripts that use a non-PSL command type set the parameter's value with the data returned on standard output. Ensure that you return only one value if your parameter type is not text.
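As an illustration of the "only one value" rule, here is a minimal sketch of an external script that could be run through a non-PSL command type. It is an assumption for illustration only: the metric (file system usage), the function name `percent_used`, and the monitored path are hypothetical and not part of any KM.

```python
import shutil

# Hypothetical path to measure; replace it with whatever your KM monitors.
MONITORED_PATH = "."


def percent_used(path: str) -> float:
    """Return the used share of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return round(100.0 * usage.used / usage.total, 2)


if __name__ == "__main__":
    # Print exactly one numeric value on standard output and nothing else,
    # because standard output becomes the value of the parameter.
    print(percent_used(MONITORED_PATH))
```

Anything else the script wants to report (warnings, debug text) should stay off standard output, otherwise it would end up in the parameter value.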

If you are using PSL, the standard parameter can act as a collector as well. In that case, the standard parameter behaves like a collector with visualization built in.

Even turning a very important collector into a standard parameter could be beneficial. In this case the value of the standard parameter could contain an indication of the collection status.

If you set the output to something other than no output, you can update the parameter directly from the PATROL Console.

Note

A no-output standard parameter is not the same as a collector: a standard parameter has history and alarm ranges and can run recovery actions.

Consumer Parameters

A consumer parameter only implements the visualization and recovery actions.

The most challenging issue with a consumer parameter is that you can't determine which parameter, menu command, or discovery script sets the consumer. Therefore, you must document where the setting of the parameter occurs.

The following are characteristics of consumer parameters:

"Listens" to changes in "value" attributes
No scheduling
Not updateable (no scripts)
No backward references (such as who issued a set() function)
Must be well documented

Collector Parameters

A collector parameter only implements execution of scripts.

You must only use the PSL command type when defining a collector. If you use another command type, stdout will implicitly set the value attribute of the parameter, but since a collector doesn't have a value attribute, you will get a run-time error.

Parameter Styles

There are several parameter styles to select from. As listed above, not all parameter types support all styles. Each parameter style and its meaning is described below.

Text Parameters

The text parameter is the only parameter that does not store history, but "no history" does not mean "no memory". The value of the text parameter is stored in the agent's memory. The more text you store in the parameter, the bigger the agent becomes, so you have to be very careful if you write statements like this:

set("value",get("value").new_info);

Such cumulative parameters risk consuming a great deal of memory. For the same reason, you must not use text parameters for displaying entire log files. If you really want to show something of the log, you could decide to show only the new log entries since the last collection cycle.
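The idea of showing only the new log entries can be sketched as follows. This is a Python model of the bookkeeping, not PSL and not agent code; the function name `new_entries`, the `last_seen` counter, and the `max_lines` cap are assumptions made for the illustration.

```python
def new_entries(log_lines, last_seen, max_lines=20):
    """Return the text to display and the updated line counter.

    log_lines: all lines currently in the log
    last_seen: number of lines already processed in earlier cycles
    max_lines: upper bound on the lines kept in the text value
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    if last_seen > len(log_lines):
        # The log shrank (for example, it was rotated): start over.
        last_seen = 0
    fresh = log_lines[last_seen:]
    # Replace the displayed text instead of appending to it, and cap its size.
    text = "\n".join(fresh[-max_lines:])
    return text, len(log_lines)
```

Because the value is replaced on every cycle and capped at `max_lines`, the memory held by the text parameter stays bounded no matter how long the agent runs.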

You cannot define alarm ranges on a text parameter, but you can change the state of the text parameter by changing the status attribute like this:

set("status",ALARM);

Because you can't define alarm ranges on a text parameter, it will be impossible to run recovery actions as well. If you really want to run recovery actions, you will have to do it in the collection cycle or by creating a parameter that allows you to run the recovery actions.

No Output Parameters

Although no-output parameters don't show up, you can use them to store history. Even a consumer parameter can have a no-output type.

You can define alarm ranges, but they will be ignored.

Gauge Parameters

A gauge parameter can only show a single value at a time. When annotated data is available, the Info button on the gauge will be activated. You can click the button to display the annotation data.

Graph Parameters

Graph parameters provide the most flexibility. After opening them, you immediately get an overview of the status over time and the trend of the collected data. It is also possible to drag and drop graph type parameters together so that they are displayed as a single graph. The graph type can be changed to, for example, a pie or a bar.

Annotated data points will be marked with an asterisk.

State Boolean Parameters

The boolean output type is not a true boolean, but rather a tri-state boolean. Although the output type suggests only true and false (1 and 0, OK or not OK), this boolean respects all PATROL states (OK, WARNING, and ALARM). The WARNING and ALARM states are shown as a false sign; OK is shown as OK.

Stoplight Parameters

This output type behaves somewhat unexpectedly on NT consoles earlier than 3.4: the stoplight on NT shows only the green or red light, and the platter displays the actual difference in state. A yellow platter indicates a WARNING state. If the stoplight is red and blinking, it indicates an ALARM state.

From a UNIX console, the stoplight turns green for OK, yellow for WARNING, and red for ALARM. From PATROL v3.4 on, the behavior of the UNIX and NT consoles is the same.

ExtraFilesList Parameter

ExtraFilesList allows a KM developer to specify extra files which should be committed to an agent when the KM is committed. Below are the steps required to specify extra files.

Creating an ExtraFilesList Parameter

From a Parameter Dialog Box, perform the following steps:

Create a standard parameter called ExtraFilesList in the desired KM. Do not change the scheduling information.
Clear the Active check box so that the parameter is not active.
Add the list of files to be committed in the command text window. Each file must be specified on a separate line. You can distinguish between PSL library files and any other file that should be sent by putting the keyword #LIB or #EXTRA before the file name.

The #LIB keyword should be used when you specify PSL libraries, although the #LIB keyword doesn't really make a difference (yet).

The #EXTRA keyword should be used for all non-library files.

Files for both keywords are sent unconditionally during commit.
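The layout of the command text can be sketched with a small helper that renders one line per file. This is only an illustration: the single space between keyword and file name is an assumption (the text above only says the keyword goes before the file name), and the function name and file names are hypothetical.

```python
def extra_files_list(libraries, extras):
    """Render command text for an ExtraFilesList parameter.

    Assumption: keyword and file name are separated by one space.
    """
    lines = [f"#LIB {name}" for name in libraries]
    lines += [f"#EXTRA {name}" for name in extras]
    # Every line starts with '#', so the text is a harmless comment
    # if the parameter is ever activated by accident.
    assert all(line.startswith("#") for line in lines)
    return "\n".join(lines)


# Hypothetical file names, relative to the psl directory.
print(extra_files_list(["my_km_utils.lib"], ["my_km/settings.cfg"]))
```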

Note

File locations are relative to the local or global psl directory ($PATROL_HOME/lib/psl). When the console attempts to find the specified file, it always considers the local directory first. The files end up on the agent in the same location relative to the global psl directory, regardless of whether the file was taken from the local directory or the global directory.

You must start the line with a # character because the # character is used as the comment character by most shells and PSL. Thus, if someone accidentally activates the ExtraFilesList parameter, it won't cause an issue.

ExtraFilesList will only work with developer consoles, since it uses the commit function. You must never specify big binary files in the ExtraFilesList, because this can impact the network performance. It is also impossible to commit different executables to different platforms.

Parameter History

PATROL stores historical values for parameters (except text parameters) in a binary history file local to the agent. The disk space used is approximately 8 bytes per value and time stamp. There is also a separate index file with one index entry per parameter. You can turn off history collection for a parameter by changing the setting from "inherited" to "local" and setting the number of days to "0". Since PATROL uses the double data type for storing values, you might experience incorrect results when storing values that exceed the storage size of a history value.
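To get a feeling for the disk space involved, the approximate figure of 8 bytes per stored data point can be turned into a rough estimate. The helper below is a back-of-the-envelope sketch; the function name is hypothetical, and the index file and annotations are deliberately ignored.

```python
BYTES_PER_POINT = 8  # approximate figure quoted in the text
SECONDS_PER_DAY = 86_400


def history_bytes(poll_interval_seconds: int, retention_days: int) -> int:
    """Estimate history file usage for one parameter on one instance."""
    if poll_interval_seconds <= 0 or retention_days < 0:
        raise ValueError("interval must be positive, retention non-negative")
    points_per_day = SECONDS_PER_DAY // poll_interval_seconds
    return points_per_day * retention_days * BYTES_PER_POINT


# A parameter polled every 10 minutes and kept for 30 days:
# 144 points per day * 30 days * 8 bytes = 34560 bytes.
print(history_bytes(600, 30))
```

The figure is per parameter and per instance, so multiply it by the number of instances and parameters to judge whether keeping history is worth it.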

Annotations are stored in a separate history database and use disk space approximately equal to the number of text characters.

You can extract the contents of this history via the dump_hist utility or the PSL dump_hist() function.

Annotated Data-points

If you would like to save textual data for a certain value, you can annotate the datapoint. The syntax for annotation is:

annotate("<param path>","<fmt>","<data>",...);

The annotation data is kept for as long as the history of the parameter.

When you annotate a datapoint, you will always annotate the last value that was set(). Remember to set() before you annotate.

Annotating data doesn't come for free: all the data you annotate is saved to disk. When monitoring a log file, do not annotate every datapoint with all the new log entries that were collected. An annotated point must be an exception; otherwise the operator won't know where to look first.

Alarm Ranges

Alarm ranges define the ranges in which the state of the parameter must be considered OK, WARNING, or ALARM.

Settings

There are three ALARM ranges, as listed below, each with a minimum and a maximum attribute. Both minimum and maximum can only be defined as integers.

When a new value arrives, it is evaluated against the ALARM range rules like this:

Border [-oo,min border[ and ]max border,oo]

Alarm1 [min alarm1,max alarm1]

Alarm2 [min alarm2,max alarm2]

Note

To denote an exclusive range, use ]...,...[. To denote an inclusive range, use [...,...].

These attributes will have to meet the following rule:

min border <= min alarm1 <= max alarm1 <= min alarm2 <= max alarm2 <= max border

In this notation the border range is exclusive: if the definition is ]0,100[, then 0 is not part of the border range, but 100.1 is. Alarm1 and alarm2 are inclusive: if the definition is [10,30], then 30 would be ALARM, but 30.01 would not.

Range Overlapping

In case max alarm1 = min alarm2, alarm1 takes precedence.
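The range rules can be summarized in a small model. The sketch below is written in Python for illustration and is not the agent's implementation; the class and function names are assumptions, and it ignores that individual ranges might be disabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ranges:
    border_min: int
    alarm1_min: int
    alarm1_max: int
    alarm2_min: int
    alarm2_max: int
    border_max: int

    def __post_init__(self):
        chain = [self.border_min, self.alarm1_min, self.alarm1_max,
                 self.alarm2_min, self.alarm2_max, self.border_max]
        # Enforce: min border <= min alarm1 <= ... <= max border
        if any(a > b for a, b in zip(chain, chain[1:])):
            raise ValueError("range limits violate the ordering rule")


def classify(value: float, r: Ranges) -> Optional[str]:
    """Return the range a new value falls in, or None if it is in no range."""
    # Border is exclusive: only values strictly outside the limits match.
    if value < r.border_min or value > r.border_max:
        return "border"
    # Alarm1 is checked first, so it wins when max alarm1 == min alarm2.
    if r.alarm1_min <= value <= r.alarm1_max:
        return "alarm1"
    if r.alarm2_min <= value <= r.alarm2_max:
        return "alarm2"
    return None
```

With border ]0,100[, alarm1 [10,30], and alarm2 [30,60], the value 30 lands in alarm1, 30.01 lands in alarm2, 0 is in no range, and 100.1 is in the border range.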

Recovery Actions

Recovery actions arise from the combination of KM design and the agent. The recovery action is an agent-executed corrective action that is launched after an issue is detected, to attempt to correct it. Recovery actions are automatic and do not require any user intervention.

Recovery actions can also be escalated across multiple actions. A sequence of recovery actions can be defined to attempt alternative solutions to an alarm. Recovery actions can also be used to return to normal operation if the issue is corrected.

When a new value is set, the agent performs a range check. This explains why you don't see something going into alarm immediately after you modify the range: the agent first has to collect a new value.

No matter what happens, and no matter how often you switch between ranges, only one recovery action runs per parameter at a time. If a recovery action must be executed and no recovery action is running, it fires off immediately. If a recovery action is already running (even for another range), your recovery action will not run.

Each range can have multiple recovery actions. The moment the agent changes range (NOT the same as a state change), it restarts processing the list of recovery actions from the top. If the range is still the same when a new value arrives, the agent tries to fire off the next recovery action in the list (as long as no other recovery action is running for that parameter).
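The sequencing rules above can be modeled as a small state machine. This is a Python sketch for illustration, not the agent's code. The class and method names are hypothetical, and one detail is an assumption: when a value arrives while an action is still running, the model skips the launch without advancing through the list.

```python
class RecoveryState:
    """Model of recovery action sequencing for one parameter."""

    def __init__(self, actions_by_range):
        self.actions_by_range = actions_by_range  # range name -> list of actions
        self.current_range = None
        self.next_index = 0
        self.running = False

    def on_new_value(self, range_name):
        """Return the action to launch for this value, or None."""
        if range_name != self.current_range:
            # A range change restarts the list from the top.
            self.current_range = range_name
            self.next_index = 0
        if self.running:
            # Only one recovery action per parameter at a time.
            return None
        actions = self.actions_by_range.get(range_name, [])
        if self.next_index >= len(actions):
            return None  # no action defined, or list exhausted
        action = actions[self.next_index]
        self.next_index += 1
        self.running = True
        return action

    def on_action_finished(self):
        self.running = False
```

For example, with two actions on alarm1, the first value in alarm1 launches the first action, a second value that arrives while it is still running launches nothing, and the next value after it finishes launches the second action. Leaving alarm1 and coming back starts again with the first action.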

Use Recovery Actions Intelligently

In addition to using recovery actions to recover from an issue, consider using them to prevent an issue from happening, and to prevent situations that adversely affect availability or application throughput.

Recovery Actions on Alarm1, Alarm2, and Border

A recovery action is defined for a range and not for a certain state. The state is merely an attribute of this range. This is often overlooked, and it seems to be taken for granted that you must not define a recovery action for an OK state. This is unfortunate, because recovery actions on OK states can be very useful. For example, you can define a recovery action for an OK state that allows more processes to connect to a certain service.

Can Be Used to Tune or Detune

You can define automatic recovery actions to correct problems indicated by a parameter's ALARM, WARNING, or border alarms or to take other appropriate actions based on the parameter's value. When a parameter changes to a triggering state, one or more recovery action command scripts can be queued for execution, each according to its properties settings. Recovery actions can be set up to execute by increasing strength of action depending on the state of the parameter such as ALARM or WARN. In addition, you can create recovery actions that determine the corrective action to take based on other parameter values or data available in the PATROL Agent namespace.

Debugging Recovery Actions

Debugging recovery actions is much the same as debugging menu commands. Recovery actions only have processes when they are spawned. To debug a recovery action, either load, compile, and run it within the PSL debugger (remember that it then runs in the wrong context) or use the PSL debugger function within your recovery action (to keep the context).

Pitfalls

The following sections describe pitfalls with recovery actions.

Turn History Off If Not Needed

If it is very unlikely that a customer will ever use the history for one of your parameters, you must set the history Length to zero. (Maybe you must also reconsider if this parameter serves a purpose.)

Set Scheduling to a Reasonable Interval

If out-of-the-box scheduling is set to 10 minutes, that interval might be too long or too short. You must adjust the scheduling according to your needs.

Value attribute

You do not set an object; you set an attribute of the object. The following instruction is therefore invalid and can cause a lot of confusion:

set("/MYKM/myinst/param1",50);

Set the value attribute of the parameter instead:

set("/MYKM/myinst/param1/value",50);

Parameter-related observations

The following are parameter-related observations:

Parameters are always executed per instance
Turn off history if not needed
One collector can replace multiple "standards"
Set scheduling to a reasonable interval
Annotated datapoints reside on disk
Text parameters are resident in memory (make sure they do not grow at each execution)
Make sure to set the value attribute (/<APPL>/<INST>/<PARM>/value)
