Troubleshooting ARServer Thread Tuning and Queue Wait Time issues
Symptoms
When the Queue Wait Time is high, one or more of the following symptoms might occur because the server might be using all of the available threads assigned to a specific Remote Procedure Call (RPC) queue:
- Clients report timeout errors, typically ARERROR 93.
- Poor performance is observed for a private queue.
- Intermittent general poor performance.
Scope
- Multiple users are affected. However, if the problem is related to a private RPC queue used by a specific client, the problems are limited to components using that queue.
- The problem might be intermittent but occurs more frequently during periods of high activity such as the start of the working day when many users begin to use the system.
Diagnosing and reporting an issue
Step | Task | Description |
---|---|---|
1 | Understand RPC queues and threads | The AR System server is a multi-threaded application where a configurable number of threads might be grouped to service a particular RPC queue. There are several default queues, including fast and list, and private queues which might also be defined. Each API call has a default RPC queue which will be assigned unless it is configured to use a private queue by the client or a server restriction. After an API call is assigned to a thread, it has exclusive use of that thread till completion of the service. For example, consider an airport check-in process. Upon arrival at the terminal, you (the API call) are directed (by a dispatcher) to a collection of check-in desks for different types of ticket (RPC queues). There are multiple check-in desks (threads) for each ticket type (RPC queue) and you are placed in a line for the appropriate ticket by the dispatcher. You wait in line until a check-in desk (thread) is available and you are then processed and check-in is completed. Delays might occur if there are not enough check-in desks for a type of ticket or the check-in process takes a long time for each passenger. The capacity of the system (the rate at which passengers are checked-in) might be increased in several different ways:
If your system is suffering from a lack of threads or other issues which are causing threads to be consumed for extended periods, this will be recorded as the queue wait time at the end of each API line in the AR API logs. The // :q: value is the time the API call waited between being received by the server and assigned to a thread. +GLEWF ARGetListEntryWithFields -- schema HPD:WorkLog from Mid-tier (protocol 23) at IP address 192.168.1.1 using RPC // :q:0.3s |
2 | Collect Logs | Enable server API logging during the time the problem occurs. If your system has exception logging enabled, any API calls which time out or exceed the configured threshold are recorded in the arexception.log file. AR thread logging shows the RPC queues and threads as the server creates them. If the exception logging is enabled during startup, the arthread.log records entries such as: <THRD> <0000000042> /* Wed Sep 11 2019 16:31:25.0350 */ Thread Id 381(thread number 18) on FAST queue started. As new threads are created after startup, they are recorded in arthread.log. This information may be useful when tuning server performance or trying to optimize the number of threads assigned to a queue. The log may also be helpful in troubleshooting. For example, if multiple new threads are logged (up to the configured maximum for a queue) in a short period of time, this suggests that there is a sudden increase in activity or that there are delays in completing API calls assigned to that queue, causing the server to allocate additional threads. |
3 | Review Logs | Use the AR System Log Analyzer utility to review API logs and show a list of the API calls with the longest queue wait times. Small (less than a second) values are not generally a concern but further investigation is recommended if:
|
4 | Determine the problem type | There are two main causes of large queue wait times:
|
5 | Review your configuration | Compare your current RPC queue configuration with respect to the current checklist from BMC engineering. For more information, see Configuration checklist for AR System. |
6 | Adjust thread configuration | If you need to change the number of threads assigned to different RPC queues, use the Private-RPC-Socket settings for each server. These settings can be set by using the AR System Administration: AR System Configuration Generic UI form or the AR System Administration Console: Server Information: Ports and Queues tab. Each RPC queue definition includes a queue number, minimum and maximum threads values. Private-RPC-Socket: <queue_number> <min_threads> <max_threads> Private queues might be defined in the ranges between:
On startup, the server will create an RPC queue with the minimum configured number of threads. If there is sufficient load, the server will add maximum additional threads. Changes to the minimum and maximum number of thread values for a queue become effective immediately and do not require a restart. |
7 | Find a solution |
|
8 | Creating a BMC Support Case | Collect and send logs and detailed information when creating a case with BMC Support: Provide the following information as part of your case:
|
Error messages and resolution
Issue | Where | Description | Resolution | Reference |
---|---|---|---|---|
High // :q: values but short execution time for related API calls. | API log lines in arapi.log or arexception.log | If you observe high // :q: values for threads in a particular RPC queue but none of the API calls are long running. This indicates that there are not enough threads configured for the queue. It is possible that there may be periods when activity on the system is very high which leads to increased queue times. A balance must be found between the number of threads and the acceptable delay for API calls. For example, delays of one or two seconds during peak load time may be acceptable if they do not adversely impact client activity. Extended periods of delay, or very high delays, may warrant an increase in the number of threads for that RPC queue. |
| |
High // :q: values and long execution times for related API calls. | API log lines in arapi.log or arexception.log | If individual API calls take a long time to complete and there are multiple, concurrent, similar calls, all of the threads in an RPC queue may be consumed due to the delay. Possible causes include:
| Appropriate action in this case is to increase in the number of threads. The next step should be to review the longest running API calls assigned to the RPC queue to determine the cause of the delays.
After the reason for the poor API performance has been identified and addressed, performance should improve without requiring changes to the RPC queue and thread configuration. If it is possible to isolate the problem API calls to a particular client, create a private RPC queue to dedicate to that activity to minimize the impact on other users. | |
Error encountered while executing a Web Service. | ARException: Web Service; java.net.SocketTimeoutException:Read timed out | Sometimes the Level 2 (L2) incidents created in the ITSM workflow do not populate the CI information. Possible causes include: If the threads are processing the incident creation request and if a CMDB API call is also added to the queue, the CMDB API call request will not be processed due to unavailability of threads. A timeout error occurs and the incident workflow process stops. As the default setting for number of FAST threads is low, the API calls pile-up and cause a timeout error. | Increase the number of FAST threads to at least 60. |