Parameters and Recovery Actions
Parameters allow you to gather and display metrics of an application. A typical KM defines a number of parameter scripts that collect various metrics. As soon as an application instance is created, all the parameters defined in the application class are automatically instantiated as well. Parameters that have scripts defined are put in the RUNQ and are scheduled by the scheduler.
There are three types of parameters (standard, collector, and consumer) each with different characteristics. The following table lists the mandatory properties of a parameter.
Mandatory Properties of a Parameter
Name: Identifies the parameter in the namespace. After the name has been defined, it can no longer be changed with the developer console. If you really need to change the name of a parameter and don't want to duplicate it manually, you can edit the KM file with a text editor outside of the developer console. After the KM has been modified in this way, you must restart the developer console for the changes to take effect.
Active: Allows you to create parameters that are inactive by default. Usually you will create only active parameters. The active flag is frequently used by customers who are not interested in seeing the parameter in the console.
Help: Allows you to enter the help context ID and the help file. The help context ID has to be numeric (remember this when you are writing your Windows help files, because Windows also supports string context IDs). The help file must contain a file name without path information.
Type: Standard, collector, or consumer.
Besides these mandatory properties, there are also properties that you can enter only for certain parameter types. These properties can be divided into two categories: execution and visualization. Depending on the type of parameter you are defining, you might or might not be able to enter information for the properties listed in the following table.
pconfig KM Files (Part 1 of 2)
Parameters that collect data will have to run scripts to gather the data. If the parameter executes a script, you can enter the following information:
Visualization and actions
Parameters that are responsible for visualizing data or executing recovery actions will need to define how the data should be shown and when an alarm should be triggered. You will be able to enter the following information:
Different Parameter Types
The standard parameter is the most complete of the three types of parameters. This parameter offers both collection and visualization.
Non-PSL command type scripts set the parameter's value with the data returned on standard output. Ensure that you return only one value if your parameter type is not text.
If you are using PSL, the standard parameter can act as a collector as well. In that case the standard parameter behaves like a collector with visualization built in.
Even turning a very important collector into a standard parameter can be beneficial. In this case the value of the standard parameter can contain an indication of the collection status.
If you set the output to something other than no output, you can update the parameter directly from the PATROL Console.
A no-output standard parameter is not the same as a collector: a standard parameter has history and alarm ranges and can run recovery actions.
A consumer parameter only implements the visualization and recovery actions.
The most challenging issue with a consumer parameter is that you can't determine which parameter, menu command, or discovery script sets the consumer. Therefore, you must document where the setting of the parameter occurs.
The following are characteristics of consumer parameters:
- "Listens" to changes in "value" attributes
- No scheduling
- Not updateable (no scripts)
- No backward references (such as who issued a set() function)
- Must be well documented
A collector parameter only implements execution of scripts.
You must only use the PSL command type when defining a collector. If you use another command type, stdout will implicitly set the value attribute of the parameter, but since a collector doesn't have a value attribute, you will get a run-time error.
There are several parameter styles to select from. As listed above, not all parameter types support all styles. A definition of each parameter style and its meaning is given below.
The text parameter is the only parameter that does not store history, but "No history" does not mean "no memory". The value of the text parameter is stored in the agent's memory. The more text you store in the parameter, the bigger the agent will become, so you have to be very careful if you type commands like this:
These cumulative parameters risk consuming a great deal of memory. For the same reason, you must not use text parameters for displaying entire log files. If you really want to show part of the log, you could show only the new log entries since the last collection cycle.
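The idea of showing only the new log entries can be sketched outside of PSL. The following Python fragment is a minimal illustration, not PATROL code: the function name `read_new_entries` and the offset-tracking approach are assumptions made for this example. It returns only the text appended since the previous collection, so the stored value stays bounded instead of growing with every cycle.

```python
import os


def read_new_entries(path, last_offset):
    """Return (new_text, new_offset) for data appended since last_offset.

    Hypothetical helper: the caller keeps new_offset between collection
    cycles and stores only new_text in the text parameter.
    """
    size = os.path.getsize(path)
    if size < last_offset:
        # The file shrank (truncated or rotated), so start from the top.
        last_offset = 0
    with open(path, "rb") as handle:
        handle.seek(last_offset)
        data = handle.read()
    return data.decode("utf-8", errors="replace"), last_offset + len(data)
```

The caller replaces the parameter value with `new_text` on every cycle rather than appending to the previous value.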
You cannot define alarm ranges on a text parameter, but you can change the state of the text parameter by changing the status attribute like this:
Because you can't define alarm ranges on a text parameter, it will be impossible to run recovery actions as well. If you really want to run recovery actions, you will have to do it in the collection cycle or by creating a parameter that allows you to run the recovery actions.
No Output Parameters
Although no-output parameters don't show up, you can use them to store history. Even a consumer parameter can have a no-output type.
You can define alarm ranges, but they will be ignored.
A gauge parameter can only show a single value at a time. When annotated data is available, the Info button on the gauge will be activated. You can click the button to display the annotation data.
Graph parameters provide the most flexibility. After opening one, you immediately get an overview of the status over time and the trend of the collected data. It is also possible to drag and drop graph parameters together so that they are displayed as a single graph. The graph type can be changed to, for example, a pie chart or a bar chart.
Annotated data points will be marked with an asterisk.
State Boolean Parameters
The boolean output type is not a true boolean, but rather a tri-state boolean. Although the output type suggests only true and false (1 and 0, OK or not OK), this boolean respects all PATROL states (OK, WARNING, and ALARM). The WARNING and ALARM states are shown as a false sign; OK is shown as OK.
This output type behaves somewhat oddly on NT consoles earlier than 3.4: the stoplight on NT shows only the green or red light, and the platter displays the actual difference in the state. A yellow platter indicates a WARNING state. If the stoplight is red and blinking, it indicates an ALARM state.
From a UNIX console, the stoplight turns green for OK, yellow for WARNING, and red for ALARM. From PATROL v3.4 on the behavior on the UNIX and NT console is the same.
ExtraFilesList allows a KM developer to specify extra files which should be committed to an agent when the KM is committed. Below are the steps required to specify extra files.
Creating an ExtraFilesList Parameter
From a Parameter Dialog Box, perform the following steps:
- Create a standard parameter called ExtraFilesList in the desired KM. Do not change the scheduling information.
- Clear the Active check box so that the parameter is not active.
- Add the list of files to be committed in the command text window. Each file must be specified on a separate line. To distinguish between PSL library files and any other file that should be sent, put the keyword #EXTRA or #LIB before the file name.
The #LIB keyword should be used when you specify PSL libraries, although the lib keyword doesn't really make a difference (yet).
The #EXTRA keyword should be used for all non-library files.
Files for both keywords are sent unconditionally during commit.
File locations are relative to the local or global psl directory ($PATROL_HOME/lib/psl). When the console attempts to find the specified file, it always considers the local directory first. The files end up on the agent in the same location relative to the global psl directory, regardless of whether the file was taken from the local directory or the global directory.
You must start the line with a # character because the # character is used as the comment character by most shells and PSL. Thus, if someone accidentally activates the ExtraFilesList parameter, it won't cause an issue.
ExtraFilesList will only work with developer consoles, since it uses the commit function. You must never specify big binary files in the ExtraFilesList, because this can impact the network performance. It is also impossible to commit different executables to different platforms.
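The file list format and the local-before-global lookup described above can be sketched as follows. This is an illustrative Python model, not the console's actual implementation: the function names, the whitespace separator between keyword and file name, and the directory arguments are all assumptions made for the example.

```python
from pathlib import Path
from typing import List, Optional, Tuple

KEYWORDS = ("#EXTRA", "#LIB")


def parse_extra_files(command_text: str) -> List[Tuple[str, str]]:
    """Parse command text into (keyword, relative_file) pairs.

    Assumes one entry per line, written as '<keyword> <file name>'.
    Lines that do not start with a known keyword are ignored.
    """
    entries = []
    for raw in command_text.splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in KEYWORDS:
            entries.append((parts[0].lstrip("#"), parts[1].strip()))
    return entries


def resolve(relative_file: str, local_dir: str, global_dir: str) -> Optional[Path]:
    """Return the first existing file, checking the local directory first."""
    for base in (local_dir, global_dir):
        candidate = Path(base) / relative_file
        if candidate.is_file():
            return candidate
    return None
```

Because every entry starts with `#`, the same text is harmless as a shell or PSL script, which is the safety property the keyword format relies on.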
PATROL stores historical values for parameters (except text parameters) in a binary history file local to the agent. The disk space used is approximately 8 bytes per data point for value and time stamp. There is also a separate index file with one index entry per parameter. You can turn off history collection for a parameter by changing the setting from "inherited" to "local" and setting the number of days to "0". Since PATROL uses the double data type for storing values, you might experience incorrect results when storing values that exceed the storage size of a history value.
Annotations are stored in a separate history database and use disk space approximately equal to the number of text characters.
You can extract the contents of this history via the dump_hist utility or the PSL dump_hist() function.
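As a rough sizing aid, the 8-byte figure above can be turned into a back-of-the-envelope estimate. The Python function below is only a sketch: its name and arguments are made up for this example, and it ignores the index file and any annotations.

```python
SECONDS_PER_DAY = 86400


def history_bytes(poll_interval_seconds: int, retention_days: int,
                  bytes_per_point: int = 8) -> int:
    """Estimate history disk usage for one parameter on one instance.

    Assumes one stored data point per polling cycle and roughly
    8 bytes per point, as stated in the text above.
    """
    points = (retention_days * SECONDS_PER_DAY) // poll_interval_seconds
    return points * bytes_per_point


# A parameter polled every 10 minutes and kept for 30 days:
print(history_bytes(600, 30))  # 4320 points * 8 bytes = 34560
```

Multiply the result by the number of instances and parameters to see how quickly unneeded history adds up.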
If you would like to save textual data for a certain value, you can annotate the datapoint. The syntax for annotation is:
The annotation data will be saved as long as the history of the parameter.
When you annotate a datapoint, you will always annotate the last value that was set(). Remember to set() before you annotate.
Annotating data doesn't come for free. All the data you annotate will be saved to disk. Make sure to not just annotate every datapoint with all new log-entries that were collected when monitoring a log file. An annotation point must be an exception because otherwise the operator won't know where to look first.
Alarm ranges define the ranges in which the state of the parameter must be considered OK, WARNING, or ALARM.
There are three ALARM ranges, as listed below, each with a minimum and a maximum attribute. Both minimum and maximum can only be defined as integers.
When a new value arrives, it is evaluated against the ALARM range rules like this:
Border [-oo,min border[ and ]max border,oo]
Alarm1 [min alarm1,max alarm1]
Alarm2 [min alarm2,max alarm2]
To denote an exclusive range, use ]...,...[. To denote an inclusive range, use [...,...].
These attributes will have to meet the following rule:
min border <= min alarm1 <= max alarm1 <= min alarm2 <= max alarm2 <= max border
This rule means that the border limits are exclusive: if the definition is ]0,100[, then 0 is not part of the border range, but 100.1 is. Alarm1 and alarm2 are inclusive: if the definition is [10,30], then 30 would be in the alarm range, but 30.01 would not.
In case max alarm1 = min alarm2, alarm1 takes precedence.
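The range rules above can be modeled in a few lines. The following Python sketch is an illustration of the evaluation as described in this section, not the agent's actual code; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ranges:
    min_border: int
    max_border: int
    min_alarm1: int
    max_alarm1: int
    min_alarm2: int
    max_alarm2: int

    def is_valid(self) -> bool:
        """Check the ordering rule that the six attributes must meet."""
        return (self.min_border <= self.min_alarm1 <= self.max_alarm1
                <= self.min_alarm2 <= self.max_alarm2 <= self.max_border)


def classify(value: float, r: Ranges) -> str:
    """Return the name of the range a value falls in, or 'none'."""
    # Border limits are exclusive: only values strictly outside count.
    if value < r.min_border or value > r.max_border:
        return "border"
    # Alarm ranges are inclusive; alarm1 is tested first, so it takes
    # precedence when max alarm1 equals min alarm2.
    if r.min_alarm1 <= value <= r.max_alarm1:
        return "alarm1"
    if r.min_alarm2 <= value <= r.max_alarm2:
        return "alarm2"
    return "none"
```

With border ]0,100[, alarm1 [10,30], and alarm2 [30,60], the value 30 is classified as alarm1, 30.01 as alarm2, 0 as none, and 100.1 as border.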
Recovery actions arise from the combination of KM design and the agent. The recovery action is an agent-executed corrective action that is launched after an issue is detected, to attempt to correct it. Recovery actions are automatic and do not require any user intervention.
Recovery actions can also consist of multiple, escalating actions. A sequence of recovery actions can be defined to attempt alternative solutions to an alarm. Recovery actions can also be used to return to normal operation if the issue is corrected.
When a new value is set, the agent performs a range check. This explains why a parameter does not go into alarm immediately after you modify the range: the agent first has to collect a new value.
No matter what happens, and no matter how often you switch between ranges, there will only be one recovery action running per parameter. If there is a recovery action that must be executed (and no recovery action is running), the recovery action fires off immediately. If a recovery action is already running (even for another range), your recovery action will not run.
Each range can have multiple recovery actions. The moment the agent changes range (NOT the same as a state change), it restarts processing the list of recovery actions from the top. If the range is still the same when a new value arrives, the agent tries to fire off the next recovery action in the list (as long as there is not another recovery action running for that parameter).
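The sequencing rules above can be summarized in a small state model. This Python sketch is only an interpretation of the behavior described in this section, not agent code. The class name and methods are hypothetical, and one detail is an assumption: when an action is skipped because another one is still running, the position in the list is not advanced.

```python
from typing import Dict, List, Optional


class RecoveryState:
    """Per-parameter model of recovery action scheduling."""

    def __init__(self, actions_by_range: Dict[str, List[str]]):
        self.actions_by_range = actions_by_range
        self.current_range: Optional[str] = None
        self.next_index = 0
        self.running: Optional[str] = None

    def on_value(self, new_range: str) -> Optional[str]:
        """Handle a new value; return the action started, if any."""
        if new_range != self.current_range:
            # A range change restarts the action list from the top.
            self.current_range = new_range
            self.next_index = 0
        if self.running is not None:
            # Only one recovery action may run per parameter.
            return None
        actions = self.actions_by_range.get(new_range, [])
        if self.next_index >= len(actions):
            return None  # list exhausted for this range
        action = actions[self.next_index]
        self.next_index += 1
        self.running = action
        return action

    def finish(self) -> None:
        """Mark the running recovery action as completed."""
        self.running = None
```

In this model, a parameter that stays in the same range works its way down the list one action per new value, which is how escalation is expressed.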
Use Recovery Actions Intelligently
In addition to using recovery actions to recover from an issue, consider using them to prevent an issue from happening, and to prevent situations that adversely affect availability or application throughput.
Recovery Actions on Alarm1, Alarm2, and Border
A recovery action is defined for a range and not for a certain state. The state is merely an attribute of this range. This behavior is often overlooked, and it seems to be taken for granted that you must not define a recovery action for an OK state. This is unfortunate, because recovery actions on OK states can be very useful. For example, you can define a recovery action for an OK state that would allow more processes to connect to a certain service.
Can Be Used to Tune or Detune
You can define automatic recovery actions to correct problems indicated by a parameter's ALARM, WARNING, or border alarms, or to take other appropriate actions based on the parameter's value. When a parameter changes to a triggering state, one or more recovery action command scripts can be queued for execution, each according to its properties settings. Recovery actions can be set up to execute with increasing strength of action depending on the state of the parameter, such as ALARM or WARNING. In addition, you can create recovery actions that determine the corrective action to take based on other parameter values or data available in the PATROL Agent namespace.
Debugging Recovery Actions
Debugging recovery actions is much the same as debugging menu commands. Recovery actions only have processes when they are spawned. To debug a recovery action, you need to load, compile, and run it within the PSL debugger (remember that it then runs in the wrong context) or use the PSL debugger function within your recovery action (to keep the context).
The following sections describe pitfalls with recovery actions.
Turn History Off If Not Needed
If it is very unlikely that a customer will ever use the history for one of your parameters, you must set the history Length to zero. (Maybe you must also reconsider if this parameter serves a purpose.)
Set Scheduling to a Reasonable Interval
If out-of-the-box scheduling is set to 10 minutes, that interval might be too long or too short. You must adjust the scheduling according to your needs.
You do not set an object. You will set an attribute of the object. The following instruction is therefore invalid and can cause a lot of confusion:
The following are parameter-related observations:
- Parameters are always executed per instance
- Turn off history if not needed
- One collector can replace multiple "standards"
- Set scheduling to a reasonable interval
- Annotated datapoints reside on disk
- Text parameters are resident in memory (make sure they do not grow at each collection cycle)
- Make sure to set the value attribute (/<APPL>/<INST>/<PARM>/value)
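The last point, setting an attribute rather than an object, can be made concrete with a small path builder. The Python helper below is purely illustrative: the function name and the sample application, instance, and parameter names are hypothetical. It always produces a full attribute path, so the attribute segment can never be left out by accident.

```python
def attribute_path(appl: str, inst: str, parm: str,
                   attribute: str = "value") -> str:
    """Build a namespace path of the form /<APPL>/<INST>/<PARM>/<attribute>."""
    parts = (appl, inst, parm, attribute)
    for part in parts:
        if not part or "/" in part:
            raise ValueError("each path component must be a non-empty "
                             "name without '/'")
    return "/" + "/".join(parts)


# Hypothetical names, for illustration only:
print(attribute_path("MYAPP", "inst1", "MyParm"))  # /MYAPP/inst1/MyParm/value
```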