This is the latest documentation for BMC Helix Network Management (formerly known as Netreo).


System Management Tools


Description

Caution

These tools are intended to be used only as directed by a BMC Helix support engineer and are available only to users with the SuperAdmin access level. They are not intended for customers to use when troubleshooting their own systems and are not supported for that use.

The System Management page contains several tools that can be used to aid BMC Helix support in diagnosing issues with your BMC Helix Network Management deployment.

Select Administration >> System >> System Management from the main menu to navigate to the System Management page.

These tools can be run from the UI of any active BMC Helix Network Management appliance (with one exception), including:

  • Any primary instance (including high-availability and Overview)
  • The replica instance of a high-availability deployment
  • Any service engine instance
  • Any client instance of a BMC Helix Network Management Overview deployment

(Exception) These tools cannot be run from the arbitrator instance of a high-availability deployment, because it does not have a UI. However, an arbitrator can be selected as the target of the tools when they are run from another instance.

If run from a primary instance, any other connected BMC Helix Network Management instance can be selected as the tool's target. However, if run from a non-primary instance, only that instance (that is, itself) can be selected as the target.

MySQL Query Tool

With this tool, users can run read-only (SELECT) queries against the MySQL databases of BMC Helix Network Management VMs.

The purpose of this tool is to enable BMC Helix Network Management support personnel to easily obtain database information from a customer's system to aid in troubleshooting. BMC Helix Network Management support personnel will provide the customer with specific queries that can be run from this tool to collect the necessary data. That data can then be exported via the tool and sent to BMC Helix Network Management for evaluation. This enables support personnel to examine the data without requiring direct access to the customer system.

Procedure

To run a BMC Helix Network Management-provided query through this tool, follow the procedure below.

  1. Log in as a user with the SuperAdmin access level to either your primary BMC Helix Network Management appliance or the BMC Helix Network Management instance against which you plan to run the query.
  2. Select Administration >> System >> System Management from the main menu to open the System Management page.
  3. If not already selected, select the "MySQL Query" tool by using the buttons at the top of the page.
  4. On the Application System panel, in the NETREO DEVICE field, use the pull-down menu to select the BMC Helix Network Management instance against which to run the query.
  5. On the Query panel, paste the query provided to you by BMC Helix Network Management support. (Only read-only commands are supported.)
  6. Click the Submit button.
  7. The results of the query are displayed in the Result panel.
  8. Click the CSV button to export the data to a CSV file.
  9. Send the file to BMC Helix Network Management using the method requested by support.

The tool returns a maximum of 1000 results. To return fewer, limit the results within the query itself (for example, with a LIMIT clause).

Processes Tool

With this tool, users can manually restart selected BMC Helix Network Management internal processes or completely restart a BMC Helix Network Management VM from within the BMC Helix Network Management UI.

The purpose of this tool is to enable BMC Helix Network Management support personnel to restart BMC Helix Network Management internal processes (a common troubleshooting step) without resorting to the command line. With this tool, support personnel can direct customers to restart specific processes without needing direct access to the customer's system. The tool can also restart a BMC Helix Network Management VM from within the BMC Helix Network Management UI; expect the affected system to be unavailable while the VM reboots.

When the BMC Helix Network Management processes affected by this tool are listed, they are organized into the following categories:

  • Availability Engine - Processes responsible for service checks and host availability checking.
  • Database - Processes responsible for the database that holds BMC Helix Network Management system configurations. (This does not include managed device historical performance data. That is stored separately.)
  • Incident Management - Processes responsible for incident creation, alarm correlation, and the sending of alert notifications.
  • Logging and Traps - Processes responsible for processing SNMP traps and syslogs sent to BMC Helix Network Management.
  • NetFlow - Processes responsible for processing traffic flow packets sent to BMC Helix Network Management.
  • BMC Helix Network Management Monitor - Processes responsible for starting, stopping, and monitoring BMC Helix Network Management's internal functions. Includes all of the services and processes listed in these categories.
  • Polling Engine - Processes responsible for all the jobs that request, receive, process, and store all historical performance data for managed devices.

This tool supports two methods of process restart:

  • Graceful Restart (uses the restart command) - The square icon. Attempts to restart the process normally.
  • Force Restart (uses the kill -9 command) - The X icon. Forces the process to stop without attempting to restart it. BMC Helix Network Management then attempts to restart the process using its own internal logic.
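To illustrate what the Force Restart method means at the operating-system level, here is a minimal Python sketch for a POSIX system. The child process and the `spawn_worker` and `force_restart` helpers are hypothetical stand-ins, not BMC Helix Network Management processes or its internal restart logic. The point it shows: SIGKILL (the signal `kill -9` sends) stops a process immediately with no cleanup, and the process only comes back because a supervisor, here the script itself, starts a new one.

```python
import signal
import subprocess
import sys


def spawn_worker() -> subprocess.Popen:
    # Hypothetical stand-in for a long-running internal process.
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def force_restart(proc: subprocess.Popen) -> subprocess.Popen:
    # Equivalent of `kill -9`: SIGKILL cannot be caught or ignored,
    # so the process stops at once without running any shutdown logic.
    proc.send_signal(signal.SIGKILL)
    proc.wait()
    # The kill itself restarts nothing; a supervisor must spawn a replacement.
    return spawn_worker()
```

After `force_restart`, the old process reports a return code of `-signal.SIGKILL` and a fresh process is running in its place.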

Procedure

Follow the procedure below to use the tool to restart a group of BMC Helix Network Management services (or a BMC Helix Network Management VM).

  1. Log in as a user with the SuperAdmin access level to either your primary BMC Helix Network Management appliance or the BMC Helix Network Management instance against which you plan to run the tool.
  2. Select Administration >> System >> System Management from the main menu to open the System Management page.
  3. If not already selected, select the "Processes" tool by using the buttons at the top of the page.
  4. On the Application System panel, in the NETREO DEVICE field, use the pull-down menu to select the BMC Helix Network Management instance against which to run the tool.
  5. Click the List Processes button.
  6. A list of BMC Helix Network Management internal process categories is displayed, along with their current status. (Each category includes multiple related processes.)
    • The list is the result of a single query and is not a real-time display of running processes.
    • Different categories are displayed depending on your BMC Helix Network Management deployment configuration (not all categories are available on all appliances).
    • BMC Helix Network Management processes currently not running are displayed as "missing" in the category info area.
  7. As directed by a BMC Helix support engineer, restart a process category using the buttons to the right.
    1. In the pop-up dialog that appears, click the Yes button.
  8. Click the List Processes button again to confirm that the selected category has actually stopped all of its processes.

Note that BMC Helix Network Management frequently restarts its own processes during normal operation, so stopped processes displayed in the list are not necessarily a cause for concern. However, if a process continually shows as stopped, we recommend you contact BMC Helix support for additional help.

HA Database Processes

Due to the way databases work in a BMC Helix Network Management high-availability (HA) deployment, the Database category of processes is never available for appliances with an active HA configuration (Administration >> System >> High Availability). Delete your HA configuration to access the Database category of processes for those appliances.

BMC Helix Network Management Workers

Advanced BMC Helix Network Management Users Only

The BMC Helix Network Management Workers tool is intended for use by advanced BMC Helix Network Management Administrators. Although BMC Helix Network Management checks memory usage to prevent system crashes caused by unreasonable values, misconfiguring it with this tool can still lead to extreme performance issues, resulting in missing or lost data.

This tool enables you to manually override the maximum number of simultaneously spawned BMC Helix Network Management worker instances for a specific appliance.

A worker is a single process tasked with completing a specific job within the BMC Helix Network Management monitoring workflow. Many worker instances for different jobs are continuously spawned and terminated as BMC Helix Network Management monitors your environment.

If your BMC Helix Network Management deployment includes a highly demanding environment (for example, a large network environment monitoring QoS and traffic flows), this tool provides you with the ability to customize BMC Helix Network Management worker spawning to maximize the usage of your available resources.

The adjustable worker types are:

  • Pollmaster Workers - These workers collect metrics from a managed device. They know which device to query and what metrics to collect.
  • Pollmaster Result Workers - These workers process the metrics data retrieved by the Pollmaster Workers and send the results to storage.
  • NetFlow Workers - These workers process the data from incoming traffic flow packets and send the results to storage.
  • OAM Workers - These workers perform the service checks assigned to managed devices. Each worker executes one service check and processes the results.

Normally, the maximum number of simultaneous workers for each type is calculated based on the available resources for the appliance at the time of deployment (except for NetFlow Workers, which use a static value). Refer to the Fields section below for details on how the default values for each worker type are calculated.

Fields

  • Application System Panel
    • NETREO DEVICE - Selects the appliance on which to make changes. You can select the core appliance of a standard on-premises deployment, the primary appliance in an HA deployment, a service engine appliance, or a service engine group. Selecting a service engine group applies the settings equally to all service engine appliances within the group. Replica and arbitrator appliances for HA deployments cannot be selected, as their settings automatically mirror those of the primary appliance.
  • BMC Helix Network Management Workers Configuration Panel
    • POLLMASTER WORKERS - Limits the maximum number of simultaneous Pollmaster Workers. By default, this value is the number of CPU cores available to the appliance, up to a maximum of 40.
    • POLLMASTER RESULT WORKERS - Limits the maximum number of simultaneous Pollmaster Result Workers. The default value is based on the type of appliance:
      • Primary appliance default = number of CPU cores * 1.5, up to a maximum of 20
      • Service engine appliance default = 4
    • NETFLOW WORKERS - Limits the maximum number of simultaneous NetFlow Workers. By default, this value is 3.
    • OAM WORKERS - Limits the maximum number of simultaneous OAM Workers. By default, this value is the number of CPU cores * 5, up to a maximum of 125.
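Because the UI does not display the calculated defaults, it can help to write the formulas above out explicitly, for example to find the baseline before applying the 10% tuning increments described in the recommendations below. This is a minimal Python sketch based only on the formulas listed here: the `default_worker_limits` name and the `"primary"` / `"service_engine"` labels are hypothetical, and since the documentation does not say how `cores * 1.5` is rounded, rounding down is an assumption.

```python
def default_worker_limits(cpu_cores: int, appliance: str) -> dict:
    """Approximate the default worker limits from the documented formulas.

    appliance: "primary" or "service_engine" (hypothetical labels).
    Rounding cores * 1.5 down to a whole worker is an assumption.
    """
    if appliance not in ("primary", "service_engine"):
        raise ValueError("appliance must be 'primary' or 'service_engine'")

    # Pollmaster Workers: one per CPU core, capped at 40.
    pollmaster = min(cpu_cores, 40)

    # Pollmaster Result Workers: cores * 1.5 (max 20) on a primary, fixed 4 on a service engine.
    if appliance == "primary":
        pollmaster_result = min(cpu_cores * 3 // 2, 20)
    else:
        pollmaster_result = 4

    return {
        "pollmaster": pollmaster,
        "pollmaster_result": pollmaster_result,
        "netflow": 3,                      # static default
        "oam": min(cpu_cores * 5, 125),    # five per core, capped at 125
    }
```

For a 16-core primary appliance, this gives 16 Pollmaster Workers, 20 Pollmaster Result Workers (24 capped at 20), 3 NetFlow Workers, and 80 OAM Workers.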

By default, the worker configuration fields do not display the current value calculated for that field, as this would require the field to live-update on any resource changes. Instead, the fields display "Job count limit" until the user manually overrides them.

To set a custom value: Check the OVERWRITE checkbox for the desired field, enter the new value in the text field, and click Save. Once a value has been overwritten, the new value displays in the text field.

To remove a custom value and return to the default: Delete the value in the text field, uncheck the OVERWRITE checkbox, and click Save. BMC Helix Network Management will recalculate the correct default value based on currently available resources for that appliance.

After changing the worker configuration, a primary appliance will take at least 5 minutes to adjust to the new values. Service engines have their worker values updated when synchronizing their data with the primary appliance, so allow at least 20 minutes for the new values to take effect.

Recommendations

Follow the recommendations below to maximize BMC Helix Network Management's performance based on your deployment's available resources.

Note: When checking the various performance statistics identified in the recommendations during tuning, also check the Swap memory statistic on the Performance tab of the Device Dashboard of the BMC Helix Network Management managed device you want to tune. If the value in the Swap graph is greater than 10% before tuning, then increasing the workers will not help (in fact, it would be detrimental), as you are likely already experiencing performance issues due to insufficient memory resources.

Caution

Setting worker values that are too high can cause gaps in historical data, as an overall system load that is too high can cause delays in processing. Conversely, setting values that are too low can also result in gaps in historical data, as there might not be enough workers to process the work volume according to BMC Helix Network Management's schedule.

Pollmaster Workers

When tuning the Pollmaster Workers value, first check the Performance tab of the Device Dashboard for the BMC Helix Network Management managed device that you want to tune. (Remember, this can be the BMC Helix Network Management core appliance, a service engine, or a service engine group.)

On the Performance tab, look for the Polling Queue statistic. This value represents the percentage of the total monitored device count for which pollmaster jobs are waiting to be processed. Ideally, the value of this statistic would always be zero. Occasional spikes in the graph are considered normal and acceptable. However, if there is a persistent nonzero value in this graph, try increasing the number of pollmaster workers until the Polling Queue consistently remains at zero. Increase the number of workers in increments of 10% of the default value. Because the default value is not displayed in the UI, determine the default value for your appliance by using the information provided in the Fields section discussed earlier.

If Swap is at zero and increasing the number of workers fails to affect the Polling Queue value, it is likely that you have devices that are responding slowly to polling, and they are the reason for the increased polling queue. Remove any worker adjustments and address any slow-responding devices first. Then, begin increasing the workers again to see if it reduces the Polling Queue value. As you increase the number of pollmaster workers, watch the Swap value to make sure it does not increase as well.

When tuning worker values, never set them lower than what the default calculations would produce. For example, if 16 CPU cores are available for the appliance, don't set the number of pollmaster workers to less than 16. See the Fields section discussed earlier for default values.
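The Pollmaster Workers guidance (10% steps based on the default, never below the default, and no increases while Swap is above 10%) can be summarized as a small decision function. This is an illustrative Python sketch only: `next_pollmaster_value` is a hypothetical helper, the Swap percentage and default are values you read manually from the Device Dashboard and the Fields section, and rounding each step up to at least one whole worker is an assumption.

```python
def next_pollmaster_value(default: int, current: int, swap_percent: float) -> int:
    """Suggest the next Pollmaster Workers value during tuning (illustrative).

    default: the calculated default for the appliance (see Fields).
    current: the value in effect now.
    swap_percent: the Swap statistic from the Performance tab.
    """
    if current < default:
        return default  # never run below the calculated default
    if swap_percent > 10:
        return current  # memory-constrained: more workers would make things worse
    # Step size is 10% of the default, rounded up (assumption) to at least 1.
    step = max(1, (default + 9) // 10)
    return current + step
```

For a 16-core appliance (default 16), successive steps would be 18, 20, 22, and so on, each applied only while the Polling Queue remains nonzero and Swap stays low.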

Pollmaster Result Workers

Change the number of pollmaster result workers only when the job completion rate for the pollmaster_result statistic is less than 100% for more than 10% of the time (2-3 hours out of a given 24-hour period). Ideally, the value of this statistic would always be 100%.

Check the pollmaster_result statistic by opening the Performance tab of the Device Dashboard for the BMC Helix Network Management managed device that you want to tune and locating the pollmaster_result statistic within the BMC Helix Network Management Queue Performance statistical group. The percentage of completed jobs is displayed in the "JOB COMPLETION" column. Click the value in the column to open a 24-hour graph for that statistic.
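The 10% rule for pollmaster_result can be checked against a day of JOB COMPLETION readings. A minimal Python sketch, assuming you have the readings as a list of evenly spaced percentages covering one 24-hour window (how you collect them and the sampling interval are assumptions; `should_tune_pollmaster_result` is a hypothetical helper name):

```python
def should_tune_pollmaster_result(completion_samples: list) -> bool:
    """Return True if job completion was below 100% in more than 10% of samples.

    completion_samples: evenly spaced JOB COMPLETION percentages covering
    24 hours (for example, 288 samples at a 5-minute interval; the interval
    is an assumption).
    """
    if not completion_samples:
        return False
    below = sum(1 for pct in completion_samples if pct < 100.0)
    # Integer comparison avoids floating-point edge cases at exactly 10%.
    return below * 10 > len(completion_samples)
```

With 288 five-minute samples, 30 readings below 100% (2.5 hours) would cross the threshold, while 28 readings (about 2.3 hours) would not.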

Try increasing the number of pollmaster result workers by 10% of the default value. (Because the default value is not displayed in the UI, determine the default value for your appliance by using the information provided in the Fields section discussed earlier.)

If adjusting the number of pollmaster result workers fails to affect the pollmaster_result job completion value within roughly 2 hours, it might indicate a database/storage speed issue. Do not increase the number of workers beyond the initial 10%. Checking the I/O Performance statistics for consistently high values might reveal a storage performance bottleneck.

NetFlow Workers

BMC Helix Network Management tracks several traffic flow processing statistics for each appliance that is receiving flow data. Review these statistics by locating them on the Performance tab of the Device Dashboard for the BMC Helix Network Management managed device that you would like to tune. Try increasing the number of NetFlow workers by 1. Wait at least 1 hour, and then check to see if the traffic flow statistic values increase.

If the values do not increase, then BMC Helix Network Management is already processing all received traffic flows, and no adjustment is necessary. Remove the override to return to the default value.

If the values do increase, then BMC Helix Network Management was not processing all received traffic flows. Continue increasing the number of NetFlow workers by 1 and rechecking the statistics until they stabilize and no further gains are made. At that point, BMC Helix Network Management is processing all received traffic flows.
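The NetFlow procedure is a simple step-up search: add one worker, wait, and stop when the flow statistics stop improving. A minimal Python sketch of that loop, where `measure(workers)` is a hypothetical stand-in for "set the value, wait at least 1 hour, then read the traffic flow statistics on the Performance tab", and `max_workers` is an arbitrary safety cap added for illustration (not a documented limit):

```python
from typing import Callable


def tune_netflow_workers(measure: Callable[[int], float],
                         default: int = 3,
                         max_workers: int = 32) -> int:
    """Step NetFlow Workers up by 1 until the processed-flow statistic stops improving.

    Returns the value to keep. A result equal to `default` means no override
    is needed, so the adjustment should be removed.
    """
    best_workers = default
    best_rate = measure(default)
    workers = default
    while workers < max_workers:
        workers += 1
        rate = measure(workers)
        if rate <= best_rate:
            break  # no further gain: keep the previous value
        best_workers, best_rate = workers, rate
    return best_workers
```

For example, if throughput levels off at 6 workers, the loop tries 7, sees no gain, and returns 6. If the first increase brings no gain, it returns the default of 3.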

OAM Workers

When tuning the OAM Workers value, first check the Performance tab of the Device Dashboard for the BMC Helix Network Management managed device that you want to tune. If using service engines, you should only need to tune this value on them (or the service engine group to which they are assigned). If not using service engines, you would tune this value on the core appliance.

On the Performance tab, look for the OAM Latency statistic. Ideally, this value would be 0, with minor bursts for no more than a few minutes. However, if the graph shows persistent latency greater than 5 seconds over a period of more than 30 minutes, you can try increasing the OAM Workers value until the OAM Latency is reduced.
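The trigger condition above (latency above 5 seconds persisting for more than 30 minutes, as opposed to short bursts) can be expressed as a check over time-ordered readings. A minimal Python sketch, assuming you have exported or transcribed the OAM Latency graph as `(minute, latency_seconds)` pairs sorted by time; the helper name and the input format are hypothetical, and the duration is measured between sample timestamps:

```python
def has_sustained_oam_latency(samples: list,
                              threshold_s: float = 5.0,
                              window_min: float = 30.0) -> bool:
    """Return True if latency stays above threshold_s for longer than window_min.

    samples: (minute_timestamp, latency_seconds) pairs sorted by time.
    """
    run_start = None  # timestamp where the current above-threshold run began
    for minute, latency in samples:
        if latency > threshold_s:
            if run_start is None:
                run_start = minute
            if minute - run_start > window_min:
                return True
        else:
            run_start = None  # a dip below the threshold ends the run
    return False
```

A 10-minute burst of 8-second latency returns `False`, while 8-second latency from minute 10 through minute 45 (35 minutes) returns `True`.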

Try increasing the number of OAM workers by 10% of the default value. (Because the default value is not displayed in the UI, determine the default value for your appliance by using the information provided in the Fields section discussed earlier.)

If increasing the OAM workers is effective in reducing latency, but the latency value continues to be greater than desired, continue increasing the number of workers by 10% of the default value with each attempt until latency is reduced to the desired level.

Grow Disk Size

With this tool, users can allocate more virtual storage space to the BMC Helix Network Management disk partition. This is useful if you plan to add more devices to BMC Helix Network Management for monitoring (see the On-premise Deployment Hardware Guide for information on storage requirements). Additional space must first be allocated in your virtual environment so that the tool can detect it; this tool cannot change your virtual environment for you.

Procedure

Follow the procedure below to grow your BMC Helix Network Management disk partition to fully utilize the allocated storage space.

Note: Before performing this procedure, you must use your virtual environment management console to allocate the desired additional storage space to the BMC Helix Network Management VM. (Depending on your environment, you might or might not need to shut down your VM. Refer to your virtual environment manager documentation for requirements.)

  1. Log in as a user with the SuperAdmin access level.
  2. Select Administration >> System >> System Management from the main menu to open the System Management page.
  3. Click the Grow Disk Size button at the top of the page.
  4. In the NETREO DEVICE field, select the deployed BMC Helix Network Management virtual appliance whose disk partition you want to grow.
  5. Click Show Disk Partition to analyze and display information about the current disk partition. The following information is displayed:
    • Partition name
    • Current total size
    • Current usage
    • Any unallocated space (there must be unallocated space to grow the BMC Helix Network Management disk partition)
  6. Click the play button icon to the right of the partition information to extend the disk size. Then click Yes when prompted.
  7. The BMC Helix Network Management disk partition is extended, consuming all available unallocated space.
  8. The partition information is updated, displaying the new values.
