Troubleshooting Database Blocking Issues
Symptoms
- API calls take a very long time to complete.
- The SQLs called by the API take a long time due to database blocking.
- Often, an INSERT, UPDATE, or DELETE statement do not work due to database blocking.
Scope
- One or more users can be affected. When blocking occurs, an AR Server thread becomes tied up. If many threads are tied up due to blocking, the entire AR Server can stop responding.
- One ARServer can cause blocking on other servers in the group since they share a database. So the problem may affect multiple or all servers in a server group.
Diagnosing and reporting an issue
Step | Task | Description |
1 | Understand Database Blocking | The first step is to understand the symptom:
Often, a database blocking issue will be observed as an UPDATE statement that takes a very long time or never ends. An UPDATE statement in Remedy is expected to be very fast since it uses a WHERE clause with the C1 column, which is a unique index. The reason of slowness is because some other transaction has the same row, page, or table locked open and is not committing the transaction. A few possible reasons for database blocking are:
|
2 | Stop the blocking | It is difficult to troubleshoot a blocking condition that occurred in the past. If the database administrator (DBA) had some database monitoring enabled, you may be able to use that to gather some information. If you had AR Server SQL/API/Filter logging enabled on all of your servers, use the following steps:
MSSQL: You can use the Activity Monitor to both identify a blocking operation and the blocked operation. Once you identify the blocker, you can right-click on it and kill the process: You can right click on a row and choose 'Details' to see the exact SQL that is running in that session. Oracle: The ASH report is often used to identify a blocking session. Other tools are mentioned in step 4. The ASH report will display something like this: If it is determined that this is the session that should be killed, you can you the first 2 values, '152' and '29690' in the following SQL*PLUS command: ALTER SYSTEM KILL SESSION 'sid,serial#; For example: ALTER SYSTEM KILL SESSION '152,29690' IMMEDIATE; Note: When killing a SQL connection, the current AR Server transaction will abort and the activity will have to be retried. |
3 | Enable AR System Logging |
|
4 | Other AR Monitoring Tools |
|
5 | Enable Database Monitoring |
To actively monitor for database blocking: MSSQL: You can use the Activity Monitor as mentioned in step 2. There are many 3rd party tools available as well. One such tool is 'sp_whoisactive' from http://whoisactive.com/. Here is an example output. You can see that it makes it easy to identify the blocking and blocker SQLs: Oracle: As mentioned, you can use the ASH report to identify a blocking session. There are various other tools available to the DBA. One of many is a set of SQL statements at this topic. These provide the sql text of both the blocking and blocked sessions. This may make it easier to identify the correct session to kill. |
6 | Reproduce the problem or wait for it to occur | Usually, blocking conditions do not occur often enough or with enough consistency to predict when they will occur. If you can reproduce the blocking condition, do that now. Otherwise, wait until it occurs on its own. |
7 | Collect and Analyze Diagnostics |
|
8 | Identify the "Blocked" SQL |
Oracle: the ASH report may be a good place to start. MSSQL: Running sp_who2 or other related Stored Procedures may help. If you were able to determine the host that is causing the blocking as well as the 'blocking' SQL, proceed to step 10, otherwise:
|
9 | Identify the "Blocking" SQL | Identify the blocking SQL with the AR System logs since this will help understand the cause. Using AR Log Analyzer or a better text editor, look through the logs for an API call that UPDATEs the same table at a time equal or earlier than the ‘Blocked” SQL from the prior step.
Note: It is possible that there are more than one Blocked API on multiple servers. It is important to look at all the servers' logs to find the first API call at the correct timeframe that UPDATED the table in question. |
10 | Identify why the SQL was blocking | Once you have found the “Blocking” API call and SQL, you need to understand why it ran for so long.
|
11 | Find a solution | a. If the cause has not been found or no solution is available, proceed to the next step to gather logs and create a Support Case. b. Use the table below to troubleshoot specific problems with blocking. |
12 | Run log zipper | If you're still experiencing issues after following the steps or if you are unable to complete any of the steps above, run the log zipper utility. These logs will be requested by BMC Support when submitting a case on database blocking. You will need to run it on each server that you enabled logs on. Rename the zipped file to contain the server name. |
13 | Create a support ticket | Collect and send output of log zipper and detailed information when creating a case with BMC Support:
|
Error messages and resolution
Symptom | Where | Resolution |
---|---|---|
A SQL was not called by the AR System server but was called from an external source. | Database Report | Identify why it was run and try to prevent it in the future |
A valid API call takes a long time | ARLogAnalyzer | Ensure that this API does not run during normal business hours. |
An API running abnormally | ARLogAnalyzer | Create a support case. |