Troubleshooting performance issues

If scanning or consolidation in BMC Discovery performs poorly, use the following troubleshooting steps to identify and resolve the problem, or to create a BMC Support case.

Issue symptoms

The user interface response is unreasonably slow.
It takes too long to execute maintenance operations, such as compact, backup, upgrade, restart, reboot, and new generations.
The hardware usage is unduly high.
Scans must be stopped to let BMC Discovery delete old nodes and compact the datastore. Otherwise, the disk could become saturated.
The scan or consolidation appears slow or stuck.
The storage is saturated. The relation between this symptom and the appliance performance is documented in the impact diagram in this article on the BMC Community page.

Issue scope

The issue can affect any of the installed appliances.

How to diagnose a performance issue in an appliance?

You decide what level of performance is acceptable. The following guidelines help you determine whether appliance performance is an issue:

If a UI page consistently takes more than 2 to 4 seconds to reload, it's slow. If it takes more than 5 to 6 seconds, it's very slow.
If a restart or a reboot takes more than 15 minutes, it's slow. If it takes more than 25 minutes, it's very slow.
If a TKU upgrade takes more than 1 hour, it's slow. If it takes more than 2 hours, it's very slow.
If a backup takes more than 2 to 3 hours, it's slow. If it takes more than 6 hours, it's very slow.
If a monthly or weekly compaction takes over 6 hours, it's slow. If it takes more than 12 hours, it's very slow.
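
The thresholds above can be captured in a small lookup table. The following Python sketch is illustrative only and is not part of BMC Discovery. The operation names and the classify function are hypothetical, and where the guidelines give a range (for example, 2 to 4 seconds), the sketch assumes the lower bound.

```python
# Thresholds in seconds as (slow, very_slow), taken from the guidelines above.
# Assumption: when the text gives a range, the lower bound is used.
THRESHOLDS = {
    "ui_page_reload": (2, 5),
    "restart_or_reboot": (15 * 60, 25 * 60),
    "tku_upgrade": (1 * 3600, 2 * 3600),
    "backup": (2 * 3600, 6 * 3600),
    "compaction": (6 * 3600, 12 * 3600),
}


def classify(operation: str, duration_seconds: float) -> str:
    """Return 'acceptable', 'slow', or 'very slow' for a measured duration."""
    slow, very_slow = THRESHOLDS[operation]
    if duration_seconds > very_slow:
        return "very slow"
    if duration_seconds > slow:
        return "slow"
    return "acceptable"
```

For example, classify("backup", 7 * 3600) rates a 7-hour backup against the backup thresholds.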

How to set your scan performance expectations?

Perform the following steps to determine your performance expectations:

Decide the volume of IPs you want to scan per day. For example, you may plan to scan 100,000 IPs per day.
Divide this volume by 18 hours, which leaves the remaining 6 hours of the day for deleting old nodes. For example, scanning 100,000 IPs in 18 hours every day requires 100,000 / 18 ≈ 5,556 IPs per hour.
Compare this number of IPs scanned per hour with the one measured with tw_support_tool.
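
The calculation in these steps can be sketched in Python as follows. The function names are hypothetical. The 18-hour scan window comes from the steps above, and the measured rate is assumed to be the ip/h value reported by tw_support_tool.

```python
def required_scan_rate(ips_per_day: int, scan_hours: float = 18.0) -> float:
    """IPs per hour needed to scan ips_per_day within the daily scan window."""
    if scan_hours <= 0:
        raise ValueError("scan_hours must be positive")
    return ips_per_day / scan_hours


def meets_expectation(measured_ip_per_hour: float, ips_per_day: int,
                      scan_hours: float = 18.0) -> bool:
    """True if the measured scan rate reaches the required rate."""
    return measured_ip_per_hour >= required_scan_rate(ips_per_day, scan_hours)


# 100,000 IPs per day over an 18-hour window:
print(round(required_scan_rate(100_000)))  # 5556
```

With a target of 100,000 IPs per day, a measured rate of 1034 IPs per hour would not meet the expectation.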

How to measure the scan performance?

Perform the following steps to measure the scan performance:

Install tw_support_tool as described in this article on the BMC Community page. If possible, install the tool on the coordinator.
Run tw_support_tool in an ssh session opened with the tideway user.
Check the following section in /tmp/tw_support_tool.latest:
------------ average scan metrics

             daily activity[...]
             scan rate measured while scans are running
                   1034 ip/h
                   41 success/h
                   986 dropped/h
In this sample, the scan rate over the last 10 days was 1034 IPs per hour. This metric refers to IP scan attempts.
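
If you want to extract these figures with a script, the following Python sketch parses the three rate lines. It assumes that the report section looks exactly like the sample above; the actual layout of /tmp/tw_support_tool.latest may differ between tool versions. The function names are hypothetical.

```python
import re
from pathlib import Path

# Sample text copied from the section shown above.
SAMPLE = """\
------------ average scan metrics

             daily activity[...]
             scan rate measured while scans are running
                   1034 ip/h
                   41 success/h
                   986 dropped/h
"""

# Matches lines such as "   1034 ip/h" (assumed format, based on the sample).
_METRIC = re.compile(r"^[ \t]*(\d+)[ \t]+(ip|success|dropped)/h[ \t]*$",
                     re.MULTILINE)


def parse_scan_metrics(section_text: str) -> dict[str, int]:
    """Return the hourly rates found in the 'average scan metrics' section."""
    return {name: int(value) for value, name in _METRIC.findall(section_text)}


def load_scan_metrics(path: str = "/tmp/tw_support_tool.latest") -> dict[str, int]:
    """Read the report file and parse it. Assumes the rate lines appear once."""
    return parse_scan_metrics(Path(path).read_text())


print(parse_scan_metrics(SAMPLE))
```

The ip value returned for the sample is the scan rate to compare with your expected number of IPs per hour.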

Best practices to follow if performance expectations are not met

Performance problems are often systemic issues, as suggested by the impact diagram in this article on the BMC Community page. A single action or optimization is rarely enough to meet your expectations. For this reason, the best practice is to review the most frequent problems listed in the following sections and identify the scenario that applies to your environment:

Problem 1: Unreasonable accumulation of DDD nodes
Problem 2: Infrequent compaction
Problem 3: The BMC Discovery version is old
Problem 4: The hardware may be undersized
Problem 5: Some patterns are too slow

You can also look up additional information to investigate performance issues. If this does not help, or if you need assistance to investigate further, create a new BMC Support case that includes the following information:

Your expectations (minimum acceptable scan rate in IP/hour)
The result of tw_support_tool
The list of checkpoints that you verified and the changes you implemented to meet your expectations

Problem 1: Unreasonable accumulation of DDD nodes

Use the guidance available in this article on the BMC Community page to diagnose the problem of an unreasonable accumulation of DDD nodes. It could be enough to resolve your problem.

Such a problem can be a cause or a consequence of a performance issue. For more information, see the impact diagram on the BMC Community page that shows the links between this problem and the performance issue.

Problem 2: Infrequent compaction

To diagnose the problem of infrequent compaction, check the compaction frequency. Review the section compaction history (10 last ones) in /tmp/tw_support_tool.latest and compare it with the section deployment date and the latest datastore reset date in tw_model_wipe history.

Perform any of the following actions depending on your environment settings:

If the multi-generational datastore is disabled by default, and the latest compaction date is older than 1 month, perform compaction again and check if the performance is acceptable thereafter.
If the multi-generational datastore is enabled, and the latest compaction date is older than 1 week, perform compaction again and check if the performance is acceptable thereafter.
Review the size of the doomed files. You can locate the section size of the doomed file(s) in /tmp/tw_support_tool.latest. You can decide the need for compaction based on the following guidelines:
- If the biggest doomed file size is < 50 MB, compaction is probably not needed.
- If the biggest doomed file size is > 200 GB, compaction will probably help.
- If the biggest doomed file size is > 500 GB, compaction will certainly help.
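
The compaction guidelines above can be expressed as two small checks. This Python sketch is illustrative only. The function names are hypothetical, one month is assumed to be 30 days, sizes are assumed to use decimal units, and the size thresholds are copied as stated in the guidelines.

```python
from datetime import datetime, timedelta

MB = 1000 ** 2  # assumption: decimal units
GB = 1000 ** 3


def compaction_overdue(last_compaction: datetime, multi_generational: bool,
                       now: datetime) -> bool:
    """True if the latest compaction is older than the guideline allows."""
    # 1 week with the multi-generational datastore, otherwise 1 month (30 days).
    max_age = timedelta(weeks=1) if multi_generational else timedelta(days=30)
    return now - last_compaction > max_age


def doomed_file_advice(biggest_doomed_bytes: int) -> str:
    """Map the biggest doomed file size to the guideline wording."""
    if biggest_doomed_bytes > 500 * GB:
        return "compaction will certainly help"
    if biggest_doomed_bytes > 200 * GB:
        return "compaction will probably help"
    if biggest_doomed_bytes < 50 * MB:
        return "compaction is probably not needed"
    # The guidelines give no advice for sizes between these thresholds.
    return "no guideline for this size"
```

Take the dates from the compaction history section and the sizes from the doomed file section of /tmp/tw_support_tool.latest.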

Problem 3: The BMC Discovery version is old

The latest version of BMC Discovery is usually the fastest. If you are using one of the older versions of BMC Discovery, especially a version earlier than 21.3 (12.3), then an upgrade may be enough to meet your performance expectations.

Problem 4: The hardware may be undersized

To diagnose the problem of undersized hardware, perform the following checks:

For clusters: Check that all the members of a cluster are compliant with the documented minimum requirements. Otherwise, upgrade the hardware accordingly.
The hardware must be the same for all the members of a cluster. If you upgrade the hardware of a member, upgrade the hardware of the other members in the cluster too.
For standalone appliances: Make sure that the appliance is compliant with the documented sizing recommendations. Otherwise, upgrade the hardware accordingly.

If this solution does not help, review /tmp/tw_support_tool.latest after measuring the performance with tw_support_tool as described in How to measure the scan performance.

For clusters, check the section ARCHITECTURE of /tmp/tw_support_tool.latest to review the hardware settings. Upgrade the RAM or CPU settings to make the settings consistent across all the cluster members.
Review the section top 3 biggest db files. If a file is bigger than the RAM, upgrade the RAM.
Review the sections swapping hard fault/s, swap page/s (in+out), and cpu time spent by kswapd0. These sections allow you to evaluate the swapping activity. If the swapping activity is unreasonable, upgrade the RAM.
Review the sections load per core, cpu stats, and most loaded core. These sections allow you to evaluate the processor activity. If the processor activity is unreasonable, upgrade the CPU.
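
Two of these checks are simple comparisons and can be sketched in Python as follows. The Member class, its fields, and the function names are hypothetical. You would fill in the values by hand from the ARCHITECTURE and top 3 biggest db files sections of /tmp/tw_support_tool.latest.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hardware settings of one cluster member (values entered manually)."""
    name: str
    ram_gb: float
    cpu_cores: int


def cluster_is_consistent(members: list[Member]) -> bool:
    """True if all members have the same RAM and CPU settings."""
    return len({(m.ram_gb, m.cpu_cores) for m in members}) <= 1


def ram_too_small(db_file_sizes_gb: list[float], ram_gb: float) -> bool:
    """True if the biggest db file is bigger than the RAM."""
    return max(db_file_sizes_gb, default=0) > ram_gb
```

If cluster_is_consistent returns False, align the RAM or CPU settings across the members. If ram_too_small returns True, upgrade the RAM.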

Problem 5: Some patterns are too slow

If a pattern is slow, it can be a cause or a consequence of a performance issue. Refer to the following guidelines for debugging pattern-related issues:

In /tmp/tw_support_tool.latest, check the section most time consuming patterns, and identify the slowest patterns.
- If there is no big difference between the first pattern and the following ones, the slowness is probably a consequence of a performance issue. Therefore, disabling the patterns is unlikely to help.
- If there is a big difference between the first pattern and the following ones, the slowness is probably the root cause of a performance issue. To confirm the cause, disable the first pattern and measure the performance again.
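
The comparison between the first pattern and the following ones can be sketched in Python as follows. The guidelines do not define what a big difference is, so the ratio of 3 used here is an arbitrary assumption that you should adjust. The function name and the pattern names in the example are hypothetical.

```python
from typing import Optional


def dominant_pattern(seconds_by_pattern: dict[str, float],
                     ratio: float = 3.0) -> Optional[str]:
    """Return the slowest pattern if it clearly dominates the next one.

    Returns None when the slowest patterns are comparable, which suggests
    that the slowness is a consequence rather than the root cause.
    """
    ranked = sorted(seconds_by_pattern.items(), key=lambda kv: kv[1],
                    reverse=True)
    if len(ranked) < 2:
        return None
    (first_name, first_time), (_, second_time) = ranked[0], ranked[1]
    # Assumption: "big difference" means at least `ratio` times slower.
    if first_time > 0 and first_time >= ratio * second_time:
        return first_name
    return None


# Hypothetical pattern names and timings, for illustration only.
print(dominant_pattern({"PatternA": 9000.0, "PatternB": 800.0, "PatternC": 750.0}))
```

If the function returns a pattern name, that pattern is the candidate to disable before you measure the performance again.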

The patterns released by BMC Software are rarely involved in performance issues.

Additional information to investigate performance issues

Besides the solutions offered for problems listed in the preceding sections, the following additional information may help you investigate the performance issues:

Review the black boxes in the impact diagram. Each of them can cause or contribute to a performance issue.
Review the factors affecting performance.
If the problem looks specific to some IPs, review the Unfinished Endpoints and Endpoint Timings reports on the DiscoveryRun page.