Troubleshooting Database Blocking Issues
Symptoms
- API calls take a very long time to complete.
- The SQLs called by the API take a long time due to database blocking.
- Often, an INSERT, UPDATE, or DELETE statement do not work due to database blocking.
Scope
- One or more users can be affected.
- When blocking occurs, an AR System server thread becomes tied up. If many threads are tied up due to blocking, the entire AR System server can stop responding.
- One AR System server can cause blocking on other servers in the group since they share a database. So the problem may affect multiple or all servers in a server group.
Diagnosing and reporting an issue
Step | Task | Description |
---|---|---|
1 | Understand database blocking | The first step is to understand the symptom:
Often, a database blocking issue will be observed as an UPDATE statement that takes a very long time or never ends. An UPDATE statement in AR System is expected to be very fast since it uses a WHERE clause with the C1 column, which is a unique index. The reason of slowness is because some other transaction has the same row, page, or table locked open and is not committing the transaction. A few possible reasons for database blocking are:
|
2 | Stop the blocking | It is difficult to troubleshoot a blocking condition that occurred in the past. If the database administrator (DBA) had some database monitoring enabled, you may be able to use that to gather some information. If you had AR System server SQL/API/Filter logging enabled on all of your servers, perform the following steps:
MSSQL: You can use the Activity Monitor to both identify a blocking operation and the blocked operation. Once you identify the blocker, you can right-click on it and kill the process: You can right-click a row and choose 'Details' to see the exact SQL that is running in that session. Oracle: The Active Session History (ASH) report is often used to identify a blocking session. Other tools are mentioned in step 4. Here is a example of the ASH report: If it is determined that this is the session that should be killed, you can use the first 2 values, '152' and '29690' in the following SQL*PLUS command: ALTER SYSTEM KILL SESSION 'sid,serial#; For example: ALTER SYSTEM KILL SESSION '152,29690' IMMEDIATE; Note: When killing a SQL connection, the current AR System server transaction will abort and the activity will have to be retried. |
3 | Enable AR System Logging |
|
4 | Other AR System Monitoring Tools |
|
5 | Enable database monitoring |
To actively monitor for database blocking: MSSQL: You can use the Activity Monitor as mentioned in step 2. However, there are many 3rd party tools available as well; for example, 'sp_whoisactive' from http://whoisactive.com/. Here is an example output. You can see that it makes it easy to identify the blocking and blocker SQLs: Oracle: As mentioned, you can use the ASH report to identify a blocking session. There are various other tools available to the DBA. These provide the sql text of both the blocking and blocked sessions. This may make it easier to identify the correct session to kill. |
6 | Reproduce the problem or wait for it to occur | Usually, blocking conditions do not occur often enough or with enough consistency to predict when they will occur. If you can, reproduce the blocking condition. Else, wait until it occurs again. |
7 | Collect and analyze diagnostics |
|
8 | Identify the Blocked SQL |
MSSQL: Running sp_who2 or other related Stored Procedures may help. If you were able to determine the host that is causing the blocking as well as the 'blocking' SQL, proceed to step 10, else perform the following steps:
|
9 | Identify the "Blocking" SQL | Identify the blocking SQL with the AR System logs since this will help understand the cause. By using the AR Log Analyzer or any other text editor, look through the logs for an API call that UPDATEs the same table at a time equal or earlier than the Blocked SQL.
Note: It is possible that there are more than one Blocked API on multiple servers. It is important to look at logs from all the servers to find the first API call at the correct timeframe that UPDATED the table in question. |
10 | Identify why the SQL was blocking | After you have found the Blocking API call and SQL, you must understand why it ran for so long.
|
11 | Find a solution | If the cause has not been found or no solution is available, proceed to the next step to gather logs and create a Support Case. Use the table below to troubleshoot specific problems with blocking. |
12 | Run support assistance tool | If you're still experiencing issues after following the steps or if you are unable to complete any of the steps above, Viewing and collecting logs by using the Support Assistant tool. You must run the utility on each server that you enabled logs on. Rename the zipped file to contain the server name. BMC Support will request for the logs when you submit a case on database blocking. |
13 | Create a support ticket |
|
Error messages and resolution
Symptom | Where | Resolution |
---|---|---|
A SQL was not called by the AR System server but was called from an external source. | Database Report | Identify why it was run and try to prevent it in future. |
A valid API call takes a long time | AR System Log Analyzer | Make sure that this API does not run during normal business hours. |
An API is running abnormally | AR System Log Analyzer | Create a support case. |