Troubleshooting Java memory management


This topic describes some of the issues and troubleshooting related to TrueSight Infrastructure Management Java memory management. The goal is to help you avoid potential performance impacts.

Java memory management - Useful links from BMC Communities

Memory issues can appear on almost any component, but the most common areas of concern are pserver, jserver, csr, index server, and agent controller.

If you experience performance and/or memory issues, run the TrueSight Health Check Tool. See TrueSight Health Check Tool.


Infrastructure Management Maintenance Tool

The Infrastructure Management Maintenance Tool utility is included with the Infrastructure Management installation, and is made up of tabbed pages that provide options for administering Infrastructure Management. You can use this utility to:

  • View installation and configuration log files
  • Locate and view other log files on your system
  • Zip and send files to BMC Customer Support
  • Run the post-installation health check
  • Set an encrypted password
  • Run the memory usage and disk space check
  • Update performance tuning parameters

Administering Infrastructure Management Maintenance Tool - Documentation link

Tune Parameters tab

You can use the Infrastructure Management Maintenance Tool Tune Parameters tab to configure the performance tuning parameters. 

 Perform the following steps to configure the performance tuning parameters using the Tune Parameters tab:

  1. Start Infrastructure Management Maintenance Tool.
  2. Click the Tune Parameters tab.
  3. After modifying the parameter values in the ServerPerformanceParameters.csv file, located in the pw\custom\conf directory, click Tune Parameters to tune the server performance parameters.
  4. When the performance tuning is completed, restart all the Infrastructure Management services. 

Diagnosing and reporting an issue

After you identify the symptoms and scope of the issue, use the troubleshooting guide to help diagnose and resolve the issue or to contact Customer Support. 

Action

Steps

Jserver process crash due to insufficient memory:

If the jserver process crashes because of insufficient memory, a memory dump file with the .hprof extension is created in the <PnServerPath>\pw\pronto\logs directory.

If the JVM itself crashes, a crash log file named 'hs_err_pid*.log' is created under <PnServerPath>\pw\pronto\tmp, where 'pid' is the process ID of the jserver process.
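As a minimal sketch, the two locations described above can be checked with a short script. The function name `find_crash_artifacts` and the `server_path` argument (standing in for <PnServerPath>) are illustrative assumptions, not part of the product.

```python
from pathlib import Path


def find_crash_artifacts(server_path):
    """List heap dumps (*.hprof) and JVM crash logs (hs_err_pid*.log), newest first.

    server_path stands in for <PnServerPath>; the subdirectories follow the
    locations named in the text above.
    """
    root = Path(server_path) / "pw" / "pronto"

    def newest_first(directory, pattern):
        # A missing directory simply yields no results.
        if not directory.is_dir():
            return []
        return sorted(directory.glob(pattern),
                      key=lambda p: p.stat().st_mtime, reverse=True)

    return {
        "heap_dumps": newest_first(root / "logs", "*.hprof"),
        "jvm_error_logs": newest_first(root / "tmp", "hs_err_pid*.log"),
    }
```

Calling `find_crash_artifacts(r"C:\path\to\server")` with your own installation path returns both lists, which helps when deciding which files to send to Customer Support.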

Jserver process in hung state

Determining this state is not straightforward. Typically, the Operations or Admin console becomes unresponsive because requests to jserver take longer to complete. Note that this is only a possible symptom, not a definitive one.
In this case, collect a thread dump:
On Windows: pw threaddump jserver
On Solaris: kill -QUIT <pid>
On Windows, a file named jserverTD.out is created in the <PnServerPath>\pw\pronto\logs directory.
On Solaris, the output is captured in <PnServerPath>\pw\pronto\logs\jserver.out.
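Once a thread dump has been collected, a quick summary of thread states can help decide whether jserver looks hung. The following is a minimal sketch that assumes the file contains standard HotSpot thread dump output, in which each thread has a `java.lang.Thread.State:` line; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: BLOCKED (on object monitor)"
STATE_LINE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def count_thread_states(dump_text):
    """Count how many threads are in each state in a thread dump."""
    return Counter(STATE_LINE.findall(dump_text))
```

A large number of BLOCKED threads may point to lock contention, but this is only a first indication; the full dump is still what Support needs.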

How to enable debug logs for jserver process

To enable debug logging for the jserver process, run the following command:

pw debug on -p jserver [-s <subsystem>]

Enabling debug for all subsystems is expensive and floods the log files with messages, so enable it only for the subsystems that you need. Choose the subsystem based on the nature of the issue.

Use the following command to list the subsystems available under the jserver process:

pw debug list -p jserver

When debug is enabled, the log messages are captured in <BPPM_HOME>\pw\pronto\logs\debug\jserver.log

Turn debug off after collecting the logs, because it is an expensive operation.

Resolutions for common issues

Slow performance in TrueSight Infrastructure Management

You might see poor GUI performance, with jserver consuming high levels of CPU, when you navigate the tool.

Slowness can have many causes, so Support will ask for hardware specifications, the load on the application, and any outside factors (patching, other applications on the box, network connectivity issues, and so on). The goal is to get an overall view of your environment to help narrow down the search.

While a variety of factors may be involved, one way to reduce slowness is to enable lazy loading. This property allows more efficient loading of objects in the console.

Certain performance problems can be observed within the navigation tree of the Operations Console. Examples include:

  • Navigation tree does not load at all
  • Navigation tree only partially loads
  • Navigation tree takes a long time to load
  • Operations Console becomes inaccessible and the number of https processes increases

Setting the lazy loading parameter to true can help resolve these problems. The performance problems seen in the navigation tree can have a number of causes, such as:

  • High number of events
  • High number of dynamic collectors
  • High number of CIs in the service model
  • Depth of service model
  • Large number of component folders
  • Large number of event folders

 

We recommend that you uncheck those items in the navigation tree preferences that you are not interested in. For example, if you are interested in events only, then uncheck the boxes related to service model and folders.

This is because the navigation tree loads data from top to bottom: event collectors first, then groups, then service model CIs, then component folders, then event folders. Loading all of this takes time because the data comes from various sources, such as the cell, jserver, and the backend database, and it can create a bottleneck.

With the lazy loading option, only the top-level elements are loaded initially. The data under a top-level element is fetched only when a user expands it. As a result, traversing down from the top level can seem slower, but the load on the jserver is greatly reduced.
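The general idea of lazy loading can be illustrated with a small sketch. This is not the product's implementation; the `LazyNode` class and the `fetch_children` callback are hypothetical names used only to show that child data is fetched on first expansion rather than up front.

```python
class LazyNode:
    """Tree node whose children are fetched only when the node is first expanded."""

    def __init__(self, name, fetch_children):
        self.name = name
        # fetch_children: callable taking a node name and returning child names.
        self._fetch_children = fetch_children
        self._children = None  # None means "not loaded yet"

    @property
    def loaded(self):
        return self._children is not None

    def expand(self):
        # The expensive fetch happens once, on first expansion only.
        if self._children is None:
            names = self._fetch_children(self.name)
            self._children = [LazyNode(n, self._fetch_children) for n in names]
        return self._children
```

Creating the root node costs nothing; each `expand()` call triggers one fetch for that level, which is why drilling down can feel slower while the overall load stays low.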

 To turn the lazyloading option on, add the following property to the pw/custom/conf/pronet.conf file:

    pronet.navtree.lazyloading=true

Then reload the jserver properties with the following command:

    pw jproperties reload

Then monitor the performance of the TrueSight Infrastructure Management Operations Console.
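If you prefer to script the property change, the following minimal sketch sets a key in a properties-style file, replacing an existing entry or appending a new one. It assumes pronet.conf uses plain key=value lines with # comments; the function name is hypothetical, and you should back up the file before changing it.

```python
from pathlib import Path


def set_property(conf_path, key, value):
    """Set key=value in a properties-style file.

    Returns True if an existing entry was replaced, False if the entry was appended.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments untouched
        if "=" in stripped and stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
    return replaced
```

For example, `set_property("pw/custom/conf/pronet.conf", "pronet.navtree.lazyloading", "true")` makes the change described above; the jserver properties still need to be reloaded afterwards.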

If performance continues to be an issue, the nearcache properties can also help with TrueSight Infrastructure Management slowness.

Add the following properties to the pw/custom/conf/pronet.conf file:

pronet.hotrod.client.rate.custom.nearcache.enable=false
pronet.hotrod.client.jserver.custom.nearcache.enable=true
pronet.tsim.console.data.fetch.optimized=false

Then restart the TrueSight Infrastructure Management server (pw system start). These properties help resolve performance issues in the Operations Console and Administration Console, and they are set by default in TrueSight 11.3.01.
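A quick check that the three properties above are present with the listed values could look like the following sketch. It assumes the file uses plain key=value lines; the helper names are hypothetical.

```python
# Expected values, copied from the properties listed above.
EXPECTED = {
    "pronet.hotrod.client.rate.custom.nearcache.enable": "false",
    "pronet.hotrod.client.jserver.custom.nearcache.enable": "true",
    "pronet.tsim.console.data.fetch.optimized": "false",
}


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_mismatches(text, expected=EXPECTED):
    """Return {key: actual_value} for keys that are missing (None) or differ."""
    actual = parse_properties(text)
    return {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}
```

An empty result from `find_mismatches(open("pw/custom/conf/pronet.conf").read())` means all three properties match.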

If the issue remains after enabling the properties above, collect the pw dump 1 output and send the details to Support for further assistance.

TrueSight Infrastructure Management component stuck in the “Initializing” state

While there may be many factors involved in this type of situation, it is always best to send the logs to Support to review them for other possible causes of the issue.

The Elasticsearch errors below prevent the component from communicating (syncing) with TSPS, which causes initialization to fail:

INFO 06/21 09:32:53.393 [EventMsg_28] TsimAudit Failed to insert events to ES
java.util.concurrent.ExecutionException: RemoteTransportException[[ZbkOM4q][127.0.0.1:9300][cluster:admin/ingest/pipeline/put]];
nested: ScriptException[compile error]; nested: IllegalArgumentException[Scripts may be no longer than 16384 characters. The passed in
script is 21057 characters. Consider using a plugin if a script longer than this length is a requirement.];
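To confirm that a log contains this specific error, a search like the following sketch can be used. It assumes the message wording shown above and tolerates line wrapping inside the message; the function name is hypothetical, and the log file to scan is left to you.

```python
import re

# Whitespace between words is matched loosely so wrapped log lines still match.
SCRIPT_LIMIT = re.compile(
    r"Scripts may be no longer than\s+(\d+)\s+characters\.\s+"
    r"The passed in\s+script is\s+(\d+)\s+characters"
)


def find_script_length_errors(log_text):
    """Return (limit, actual_length) pairs for each script-length error found."""
    return [(int(limit), int(actual))
            for limit, actual in SCRIPT_LIMIT.findall(log_text)]
```

A non-empty result suggests that the component is affected by the issue described in this section.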

This Elasticsearch issue is fixed in TrueSight 11.3.01. As a workaround, delete the Elasticsearch data folder and restart the Presentation Server. The Elasticsearch folder is located here: <TSPS_HOME>/modules/elasticsearch/

Delete the entire folder; it is recreated when the Presentation Server restarts.

If clearing the Elasticsearch folder does not allow the TrueSight Infrastructure Management component to initialize, collect the output of the tssh dump export command and send it to Support for further review.


Question

Why do TrueSight Infrastructure Management Java processes establish connections with the remote cell? How do I stop this from happening?

Answer

This behavior is by product design. To stop it from happening:

  1. Create a separate <process>.dir file for each of the TSIM processes (Rate, Agent Controller, Local Agent, and Jserver) that are establishing connections with remote cells. Make sure that you do not add any remote cell entry to the <process>.dir file.
  2. Amend the respective pw\custom\conf\<process>.conf file to read the newly created <process>.dir file.
  3. After making the changes, restart the process and check the result.

For example, to stop the 'Rate' process from establishing connections with remote cells:

  • Create pnrate.dir without any remote cell entry in it.
  • Modify pnrate.conf so that it reads the newly defined pnrate.dir.

Add the following lines under the Options section of the pw\custom\conf\pnrate.conf file:
Option=DCELLCONFDATAPATH= <path to the pnrate.dir file>
Option=DCUSTOM_MCELL_DIR_FILE_NAME=pnrate.dir

Similarly, to stop the 'Agent Controller' process from establishing multiple connections with remote cells:

  • Create pnagentcntl.dir without any remote cell entry in it.
  • Modify pnagentcntl.conf so that it reads the newly defined pnagentcntl.dir.

Add the following lines under the Options section of the pw\custom\conf\pnagentcntl.conf file:
Option=DCELLCONFDATAPATH= <path to the pnagentcntl.dir file>
Option=DCUSTOM_MCELL_DIR_FILE_NAME=pnagentcntl.dir

Similarly, to stop the 'Local Agent' process from establishing multiple connections with remote cells:

  • Create pnagent.dir without any remote cell entry in it.
  • Modify pnagent.conf so that it reads the newly defined pnagent.dir.

Add the following lines under the Options section of the pw\custom\conf\pnagent.conf file:
Option=DCELLCONFDATAPATH= <path to the pnagent.dir file>
Option=DCUSTOM_MCELL_DIR_FILE_NAME=pnagent.dir

Similarly, to stop the 'Jserver' process from establishing multiple connections with remote cells:

  • Create pnjserver.dir without any remote cell entry in it.
  • Modify pnjserver.conf so that it reads the newly defined pnjserver.dir.

Add the following lines under the Options section of the pw\custom\conf\pnjserver.conf file:
Option=DCELLCONFDATAPATH= <path to the pnjserver.dir file>
Option=DCUSTOM_MCELL_DIR_FILE_NAME=pnjserver.dir
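Because the same two lines are repeated for each process, they can be generated from a single template. The sketch below copies the line format verbatim from the examples above; the function name, the mapping, and the `dir_path` argument (your own path to the .dir file) are illustrative assumptions.

```python
# Process names and the base names of their .conf/.dir files, as used above.
PROCESS_FILES = {
    "Rate": "pnrate",
    "Agent Controller": "pnagentcntl",
    "Local Agent": "pnagent",
    "Jserver": "pnjserver",
}


def option_lines(base_name, dir_path):
    """Build the two Option lines for pw\\custom\\conf\\<base_name>.conf.

    The format, including the space after 'DCELLCONFDATAPATH=', is copied
    from the examples in the text; verify it against your own installation.
    """
    return [
        f"Option=DCELLCONFDATAPATH= {dir_path}",
        f"Option=DCUSTOM_MCELL_DIR_FILE_NAME={base_name}.dir",
    ]
```

For example, `option_lines(PROCESS_FILES["Rate"], "<path to the pnrate.dir file>")` reproduces the pnrate lines shown earlier.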

After making the above changes, restart the process. (We recommend running 'pw system start' instead of restarting individual processes.)


NOTE
If you remove remote cells from the pnjserver.dir file, you can no longer connect to those cells under the 'Other Cells' drawer in the operations console. If you still need to connect to those cells, consider not configuring Jserver to use a pnjserver.dir file.

 
