Troubleshooting ARServer Thread Tuning and Queue Wait Time issues

When the AR System server performs poorly and the Queue Wait Time values in the API log are high, use the following information to troubleshoot and resolve the issue or to create a BMC Support case.

Symptoms

When the Queue Wait Time is high, one or more of the following symptoms might occur because the server might be using all of the available threads assigned to a specific Remote Procedure Call (RPC) queue:

Clients report timeout errors, typically ARERROR 93.
Poor performance is observed for a private queue.
General performance is intermittently poor.

Scope

Multiple users are affected. However, if the problem is related to a private RPC queue used by a specific client, the problems are limited to components using that queue.
The problem might be intermittent but occurs more frequently during periods of high activity such as the start of the working day when many users begin to use the system.

Diagnosing and reporting an issue

Step	Task	Description
1	Understand RPC queues and threads	The AR System server is a multi-threaded application in which a configurable number of threads is grouped to service each RPC queue. There are several default queues, including fast and list, and private queues can also be defined. Each API call is assigned to its default RPC queue unless the client or a server restriction directs it to a private queue. After an API call is assigned to a thread, it has exclusive use of that thread until the call completes. As an analogy, consider an airport check-in process. When you (the API call) arrive at the terminal, a dispatcher directs you to the set of check-in desks for your ticket type (the RPC queue). Each ticket type has several check-in desks (threads), and you wait in line until a desk is available; you are then processed and check-in is completed. Delays occur if there are not enough check-in desks for a ticket type or if check-in takes a long time for each passenger. The capacity of the system (the rate at which passengers are checked in) can be increased in three ways: adding check-in desks (threads) for a ticket type (RPC queue); adding lines with dedicated check-in desks (private RPC queues and threads); or reducing the time taken for each check-in (workflow or SQL tuning). If your system lacks threads, or other issues cause threads to be consumed for extended periods, the effect is recorded as the queue wait time at the end of each API line in the AR API logs. The // :q: value is the time the API call waited between being received by the server and being assigned to a thread. For example: +GLEWF ARGetListEntryWithFields -- schema HPD:WorkLog from Mid-tier (protocol 23) at IP address 192.168.1.1 using RPC // :q:0.3s
2	Collect logs	Enable server API logging while the problem is occurring. If exception logging is enabled on your system, any API calls that time out or exceed the configured threshold are recorded in the arexception.log file. AR thread logging shows the RPC queues and threads as the server creates them. If thread logging is enabled during startup, arthread.log records entries such as:
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0350 */ Thread Id 381(thread number 18) on FAST queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0350 */ Thread Id 382(thread number 19) on FAST queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0350 */ Thread Id 383(thread number 20) on FAST queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0360 */ Thread Id 384(thread number 1) on PRV:390626 queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0380 */ Thread Id 385(thread number 2) on PRV:390626 queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0390 */ Thread Id 386(thread number 3) on PRV:390626 queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0390 */ Thread Id 387(thread number 4) on PRV:390626 queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0390 */ Thread Id 388(thread number 5) on PRV:390626 queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0390 */ Thread Id 389(thread number 1) on LIST queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0390 */ Thread Id 390(thread number 2) on LIST queue started.
<THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0400 */ Thread Id 391(thread number 3) on LIST queue started.
As new threads are created after startup, they are also recorded in arthread.log. This information can be useful when tuning server performance or optimizing the number of threads assigned to a queue, and the log can also help with troubleshooting. For example, if multiple new threads are logged (up to the configured maximum for a queue) in a short period of time, this suggests either a sudden increase in activity or delays in completing the API calls assigned to that queue, either of which causes the server to allocate additional threads.
3	Review logs	Use the AR System Log Analyzer utility to review the API logs and list the API calls with the longest queue wait times. Values of less than a second are generally not a concern, but further investigation is recommended if the queued time is very long (more than a few seconds) or is large relative to the API execution time. For more information, see Analyzing AR System Log Analyzer output.
4	Determine the problem type	There are two main causes of large queue wait times. Not enough threads to handle the volume of API calls: This scenario occurs when a server receives API calls for an RPC queue at a rate greater than its processing capacity. If the API calls are running as expected, this may be a short-term issue that does not require action but should be monitored to ensure that it does not occur frequently. For example, onboarding a new customer may increase concurrent usage, so that more threads are needed for the fast and list queues to maintain performance. If you determine that additional threads are required, use the steps in this table to configure them. Long-running API calls consuming threads: Each API call has exclusive use of a thread until the call completes. If an issue causes API calls to run for an extended time, all of the threads for that RPC queue can be occupied at the same time. In this case the number of threads is not the main concern. Increasing the maximum number of threads may provide only a temporary solution, because the additional threads may simply get into the same state. The focus should be on identifying and resolving the cause of the API delays.
5	Review your configuration	Compare your current RPC queue configuration with the current checklist from BMC engineering. For more information, see Configuration checklist for AR System.
6	Adjust thread configuration	If you need to change the number of threads assigned to different RPC queues, use the Private-RPC-Socket settings for each server. You can set them by using the AR System Administration: AR System Configuration Generic UI form or the AR System Administration Console: Server Information: Ports and Queues tab. Each RPC queue definition includes a queue number and minimum and maximum thread values. The format of the setting is: Private-RPC-Socket: <queue_number> <min_threads> <max_threads> Private queues can be defined in the following ranges: 390621 – 390634, 390636 – 390669, and 390680 – 390694. On startup, the server creates each RPC queue with the minimum configured number of threads. If there is sufficient load, the server adds threads up to the configured maximum. Changes to the minimum and maximum thread values for a queue require a restart to become effective.
7	Find a solution	Use the table in Error messages and resolution to troubleshoot specific problems with Queue Wait Times. If the cause has not been found or no solution is available, proceed to the next step to gather logs and create a Support case.
8	Create a BMC Support case	When you create a case with BMC Support, collect and send logs and detailed information as follows: Run the log zipper on each affected AR System server. Select the option Zip Logs (not Zip All Logs). If you are providing log zipper files from multiple servers, rename each zip file to include the server name as part of the filename. Attach the zip file to your case (up to 2 GB) or transfer the files to BMC using FTP. For more information, see Steps to send logs, files, screenshots, etc to BMC Support for a Remedy Product related case.
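
The queue wait value described in steps 1 and 3 can also be pulled out of raw API log lines with a short script. The following Python sketch is illustrative only and is not a BMC utility: it assumes the value appears at the end of a line as :q:<seconds>s, as in the sample line in step 1, and the function names and the one-second default threshold are hypothetical choices.

```python
import re

# Assumption: the queue wait is written as ":q:<seconds>s" at the end of an
# API log line, matching the sample line shown in step 1 of the table.
QUEUE_WAIT = re.compile(r":q:(\d+(?:\.\d+)?)s\s*$")


def queue_wait_seconds(line):
    """Return the queue wait in seconds, or None if the line has no :q: value."""
    match = QUEUE_WAIT.search(line)
    return float(match.group(1)) if match else None


def longest_waits(lines, threshold_s=1.0):
    """Return (wait, line) pairs at or above threshold_s, longest wait first.

    threshold_s is a hypothetical cut-off; choose one that suits your system.
    """
    hits = []
    for line in lines:
        wait = queue_wait_seconds(line)
        if wait is not None and wait >= threshold_s:
            hits.append((wait, line.rstrip("\n")))
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# The first line is the sample from step 1; the second is a synthetic
# variation of it with a longer wait, used only to exercise the filter.
sample = [
    "+GLEWF ARGetListEntryWithFields -- schema HPD:WorkLog from Mid-tier "
    "(protocol 23) at IP address 192.168.1.1 using RPC // :q:0.3s",
    "+GLEWF ARGetListEntryWithFields -- schema HPD:WorkLog from Mid-tier "
    "(protocol 23) at IP address 192.168.1.1 using RPC // :q:4.2s",
]

for wait, line in longest_waits(sample):
    print(f"{wait:.1f}s  {line}")
```

Passing an open log file instead of the sample list works the same way, because longest_waits accepts any iterable of lines. For full analysis, the AR System Log Analyzer remains the tool the procedure above recommends.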

Error messages and resolution

Issue	Where	Description	Resolution	Reference
High // :q: values but short execution time for related API calls.	API log lines in arapi.log or arexception.log	If you observe high // :q: values for threads in a particular RPC queue but none of the API calls are long running, there are not enough threads configured for the queue. There may be periods when activity on the system is very high, which leads to increased queue times. A balance must be found between the number of threads and the acceptable delay for API calls. For example, delays of one or two seconds during peak load may be acceptable if they do not adversely impact client activity. Extended periods of delay, or very high delays, may warrant an increase in the number of threads for that RPC queue.	Increase the maximum number of threads for the related RPC queue: If you see increased queue wait time values for a particular RPC queue over an extended period, or the delays at peak times are enough to cause client errors, add processing capacity by adding new threads. Segregate clients using private RPC queues: If there is a significant volume of API calls from one type of client, creating a new RPC queue dedicated to this activity may help. For example, an integration that sends many calls in a short period might swamp an RPC queue and prevent user API calls from being processed in a timely manner. Defining a private RPC queue for this integration separates its activity from normal users and prevents delays.	Analyzing AR System Log Analyzer output
High // :q: values and long execution times for related API calls.	API log lines in arapi.log or arexception.log	If individual API calls take a long time to complete and there are multiple, concurrent, similar calls, all of the threads in an RPC queue may be consumed due to the delay. Possible causes include: delays in authentication due to AREA problems (typically seen on fast queue threads); filters making web service calls that take time to complete; poorly qualified SQL searches leading to table scans, which take a long time on large forms; and database problems such as blocking, deadlocks, or missing or sub-optimal indices.	Increasing the number of threads is not the appropriate action in this case. Instead, review the longest running API calls assigned to the RPC queue to determine the cause of the delays. Enable server API, SQL, and FILTER logging when the problem is experienced. Use tools such as the AR System Log Analyzer to identify the longest running API calls and review them to find the cause of the delays. Remember that the long-running calls may not themselves show long queued times, because they cause the problem for other calls once they are being processed. After the reason for the poor API performance has been identified and addressed, performance should improve without requiring changes to the RPC queue and thread configuration. If it is possible to isolate the problem API calls to a particular client, create a private RPC queue dedicated to that activity to minimize the impact on other users.	Troubleshooting database performance issues; Analyzing AR System Log Analyzer output
Error encountered while executing a Web Service.	ARException: Web Service; java.net.SocketTimeoutException:Read timed out	Sometimes the Level 2 (L2) incidents created in the ITSM workflow do not populate the CI information. Possible cause: If the threads are busy processing incident creation requests when a CMDB API call is added to the queue, the CMDB API call is not processed because no thread is available. A timeout error occurs and the incident workflow process stops. Because the default number of FAST threads is low, the API calls pile up and cause the timeout error.	Increase the number of FAST threads to at least 60.
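
The distinction between the first two rows of the table can be expressed as a small decision rule. The following Python sketch is a hedged illustration, not a BMC tool: it assumes you already have queue wait and execution times (in seconds) for the calls on one RPC queue, for example from AR System Log Analyzer output, and the function name, the return labels, and both thresholds are hypothetical.

```python
def triage_queue(calls, high_wait_s=2.0, long_exec_s=10.0):
    """Suggest which table row applies to one RPC queue.

    calls: iterable of (queue_wait_s, exec_s) pairs for calls on that queue.
    high_wait_s, long_exec_s: hypothetical thresholds; tune them to your
    own view of an acceptable delay.
    """
    calls = list(calls)
    high_wait = any(wait >= high_wait_s for wait, _ in calls)
    long_running = any(run >= long_exec_s for _, run in calls)

    if not high_wait:
        return "ok"
    if long_running:
        # Long-running calls hold threads, so find and fix the slow calls
        # rather than adding threads.
        return "long-running calls"
    # High waits although every call is quick: the queue lacks capacity.
    return "too few threads"


print(triage_queue([(0.1, 0.2), (0.3, 0.1)]))
print(triage_queue([(3.0, 0.2), (4.0, 0.3)]))
print(triage_queue([(3.0, 0.2), (0.0, 45.0)]))
```

The rule looks at the queue as a whole rather than at each call, because a long-running call may show little queue wait itself while causing the waits seen by other calls.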
