Troubleshooting performance issues
When you observe the poor performance of scanning or consolidation in BMC Discovery, you can use the following troubleshooting steps to identify and resolve the problem or create a BMC Support case.
Issue symptoms
- The user interface response is unreasonably slow.
- It takes too long to execute maintenance operations, such as compaction, backup, upgrade, restart, reboot, and creating new generations.
- The hardware usage is unduly high.
- Scans must be stopped to let BMC Discovery delete old nodes and compact the datastore. Failure to stop scans could cause disk saturation.
- The scan or consolidation appears slow or stuck.
- The storage is saturated. The relationship between this symptom and appliance performance is documented in the impact diagram.
How to diagnose a performance issue?
- Ultimately, you decide what is fast enough and what is not. The following recommendations help you determine whether performance is an issue:
- If a UI page consistently takes more than 2 to 4 seconds to reload, it's slow. If it takes more than 5 to 6 seconds, it's very slow.
- If a restart or a reboot takes more than 15 minutes, it's slow. If it takes more than 25 minutes, it's very slow.
- If a TKU upgrade takes over 1 hour, it is slow. If it takes more than 2 hours, it's very slow.
- If a backup takes more than 2 to 3 hours, it's slow. If it takes more than 6 hours, it's very slow.
- If a monthly or weekly compaction takes over 6 hours, it's slow. If it takes more than 12 hours, it's very slow.
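These rules of thumb can be encoded in a small helper for scripted health checks. This is an illustrative sketch, not a BMC tool; the function name, threshold table, and operation keys are all assumptions:

```python
# Illustrative helper encoding the rough thresholds above (not part of
# BMC Discovery). Durations are in minutes; each entry is (slow, very_slow).
THRESHOLDS_MIN = {
    "restart": (15, 25),        # restart or reboot
    "tku_upgrade": (60, 120),   # 1 h / 2 h
    "backup": (180, 360),       # 3 h / 6 h
    "compaction": (360, 720),   # 6 h / 12 h
}

def rate_duration(operation: str, minutes: float) -> str:
    """Classify a measured duration as 'ok', 'slow', or 'very slow'."""
    slow, very_slow = THRESHOLDS_MIN[operation]
    if minutes > very_slow:
        return "very slow"
    if minutes > slow:
        return "slow"
    return "ok"

print(rate_duration("backup", 250))      # a 4 h 10 m backup -> "slow"
print(rate_duration("tku_upgrade", 45))  # a 45-minute TKU upgrade -> "ok"
```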
How to set your scan performance expectations?
The following recommendations help you determine your performance expectations:
- Decide the volume of IPs you want to scan per day. For example, you may plan to scan 100,000 IPs per day.
- Divide this volume by 18 hours, leaving the remaining 6 hours of the day for deleting old nodes. For example, scanning 100,000 IPs in 18 hours every day requires about 5,555 IPs per hour.
- Compare this number of IPs scanned per hour with the rate measured by tw_support_tool.
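The arithmetic in these steps can be sketched in a few lines (plain Python, nothing BMC-specific; the 18-hour window follows the recommendation above):

```python
# Compute the target scan rate from a daily IP volume, assuming an
# 18-hour scanning window; the remaining 6 hours are left for node
# deletion and maintenance.
def target_scan_rate(ips_per_day: int, scan_hours: int = 18) -> float:
    """Return the required scan rate in IPs per hour."""
    return ips_per_day / scan_hours

target = target_scan_rate(100_000)
print(f"Required rate: {target:.0f} IPs/hour")  # roughly 5556 IPs/hour

# Compare against the rate reported by tw_support_tool:
measured = 1034  # example value from the tool's "average scan metrics"
if measured < target:
    print("Measured scan rate is below target; investigate performance.")
```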
How to measure the scan performance?
Perform the following steps to measure the scan performance:
- Install tw_support_tool. If possible, install the tool on the coordinator.
- Run tw_support_tool in an SSH session opened as the tideway user, and check the following section in /tmp/tw_support_tool.latest:
------------ average scan metrics daily activity[...]
scan rate measured while scans are running
1034 ip/h
41 success/h
986 dropped/h
The above section means that, over the last 10 days, the scan rate was 1034 IPs per hour. This metric counts IP scan attempts.
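If you want to feed these numbers into a monitoring script, the metric lines can be parsed with a regular expression. This is a minimal sketch that assumes the line format shown in the excerpt; real tw_support_tool output may differ between versions:

```python
import re

# Sample text copied from the excerpt above; the field names ("ip",
# "success", "dropped") come from that output, not from a documented API.
sample = """\
scan rate measured while scans are running
1034 ip/h
41 success/h
986 dropped/h
"""

def parse_scan_metrics(text: str) -> dict:
    """Extract the per-hour scan metrics into a {name: value} dict."""
    return {name: int(value)
            for value, name in re.findall(r"(\d+)\s+(\w+)/h", text)}

print(parse_scan_metrics(sample))  # {'ip': 1034, 'success': 41, 'dropped': 986}
```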
Issue symptoms
- Slow scanning
- Slow consolidation
- Slow user interface
Issue scope
The affected appliance or appliances
Knowledge articles
000302844: Troubleshooting performance issues when discovery scans and/or the UI are slow, in BMC Communities.
000379045: How to quickly collect performance data for BMC Customer Support?, in BMC Communities.
Identification of problem
Investigate when the performance problem began. Did it begin after a TKU update, a version update, or has the performance slowly degraded over time? Is the performance generally good, but bad for a specific scan?
- If the problem began after a TKU update, proceed to Problem 5.
- If the problem is for a specific scan, proceed to Problem 6.
In all other cases, review the following information to identify problem(s) that may be present on the appliance:
Problem 1: Too much DDD (directly discovered data)
- In the UI, go to the Administration > Performance > DDD Removal chart.
- Check the following:
- Ideally, the blue line, DAs in datastore, should be below one million. On very large systems with clusters, it may be normal for the blue line to exceed one million.
- Ideally, the red line and the orange line should both touch zero often (zero is at the bottom of the chart). If they are both always at zero, the two lines overlap and are hard to distinguish.
A large number of DAs in the datastore will drag down the performance of BMC Discovery. If DAs are not removed in a timely manner, the number of DAs will increase, and performance is adversely affected.
Solution:
- In the UI, go to the Administration > Model Maintenance page and lower the DDD removal age. Many users lower the value to 14 days from the default of 28 days. Some users lower it even further to 7 days.
- After lowering the DDD removal age, you must STOP ALL SCANS, and then START ALL SCANS to cause the new setting to take effect.
- After lowering the DDD removal age, wait a few days to allow the DAs to be removed.
- After a few days, check the DDD Removal chart again. The number of DAs should be lower.
- After the number of DAs is sufficiently lower, you may want to perform a datastore compaction to reduce the size of the datastore.
- If the red and orange lines never go to zero, you should open a Customer Support case about the problem. There are some known defects and solutions which may be causing problems.
Problem 2: Insufficient RAM or swap
- In the UI, go to the Administration > Performance > Hardware page.
- Look at the Hourly Memory Usage Statistics. If the application is constantly using Swap space (magenta color), it may indicate insufficient RAM or swap.
Solution:
You may benefit by increasing the amount of RAM.
Problem 3: Insufficient CPU
- In the UI, go to the Administration > Configuration page to see how many CPU engines (“Logical Processors”) are on your appliance.
Discovery determines the number of engines based on the number of CPUs available; the more CPUs you have, the more discovery engines. CPU must be carefully balanced with the amount of RAM and swap.
Solution:
Increase the CPU along with RAM/Swap. If you increase the number of CPU, you should also increase RAM. For example, if you double your CPU, then also double the RAM. Otherwise, there will not be enough RAM to keep up with all the discovery engines, and the appliance will run out of memory during the discovery scanning.
Problem 4: Datastore is bloated
A bloated datastore could occur when you have not compacted it for a considerable period of time.
Solution:
The best practice is to compact the datastore every 3 to 6 months. If you have not been compacting the datastore, you are strongly advised to do so. This should improve the performance of the UI and the scans. For more information, see the Communities article 000223668, Best practices for compacting the datastore.
Problem 5: Some patterns are too slow
Did the performance problem begin after a TKU update, or after installing a custom pattern?
- Check pattern performance in the UI. To do this, go to Administration > Performance > Patterns.
- Change the date to a day where slow scanning performance occurred for a large part of the day.
- Sort twice on Average Execution Time so that the longest times appear first. Check whether a pattern at the top of the list is taking an extraordinary amount of time.
- Sort twice on Total Execution Time. Again, look for problematic patterns at the top of the list.
The times on the page are expressed in seconds. For example, 35.16 translates to 35.16 seconds. You can compare the performance to earlier dates by changing the date on the page. This is especially useful if the performance slowed down due to a TKU update. Remember to choose dates when the same problematic scans were running.
The date range can go back for 30 days because there are 30 days of logs. But to get the full 30 days, you will need to decompress the pattern performance log files from the appliance command-line as the “tideway” user with this command:
gunzip tw_svc_eca_perf_pattern*.gz
Solution:
You can temporarily disable the problematic patterns and see if this resolves the problems with performance. Open a Support case if you think there is a performance problem in a TKU pattern. Rework or disable any custom patterns that may be causing a problem.
If you open a Support case, include a CSV export of the Pattern Performance chart (Actions > Export as CSV).
Problem 6: Some specific scans are slow
To investigate if specific scans are slow, look at some of the useful Discovery reports.
- Click on the currently-running or completed DiscoveryRun. A Reports list is displayed.
Some useful reports in the list are:
- Unfinished Endpoints (for currently-running scans) – shows the IPs that are still in progress. The duration is printed as d.hh:mm:ss (days.hours:minutes:seconds).
- Endpoint Timings – shows the durations of all finished IPs. The IPs with the largest Total Duration are at the top. The duration is printed as: d.hh:mm:ss (days.hours:minutes:seconds).
Note that the report includes both Total Duration and Total Discovery Duration. When Total Duration is significantly larger than Total Discovery Duration, pattern execution and datastore activity account for most of the difference.
- DiscoveryAccess Finishing Rate.
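To sort or compare these durations outside the UI (for example, in an exported CSV), the d.hh:mm:ss format can be converted to seconds. A minimal sketch, assuming durations always follow the format described above:

```python
def duration_to_seconds(value: str) -> int:
    """Convert a 'd.hh:mm:ss' duration (days part optional) to seconds."""
    head = value.split(":")[0]
    if "." in head:
        # A leading "d." days component is present.
        days_part, rest = value.split(".", 1)
        days = int(days_part)
    else:
        days, rest = 0, value
    hours, minutes, seconds = (int(p) for p in rest.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

print(duration_to_seconds("1.02:30:00"))  # 1 day 2 h 30 m = 95400 seconds
print(duration_to_seconds("00:05:30"))    # 330 seconds
```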
Solution:
Investigate the problematic IPs.
- Try scanning just one long-running IP by itself and check if it still takes a long time.
- Try excluding the longest-running IPs from your discovery run. Does that help the run go faster?
- If the Total Duration is significantly longer than the Total Discovery Duration, then investigate the performance of the patterns (see Problem 5). Also, you may consider compacting your datastore (Problem 4).
Open a Discovery Support Case
If none of these steps are helpful, open a Support case. See KA 000379045 in BMC Communities for instructions on running a tool that collects performance information that you can attach to your Support case.