Default language.

Troubleshooting Database Blocking Issues


When you observe performance issues due to blocking issues at the database, use this information to either perform the appropriate logging and troubleshooting steps to resolve the problem or create a BMC Support case

Related topics

Symptoms

  • API calls take a very long time to complete. 
  • The SQLs called by the API take a long time due to database blocking.  
  • Often, an INSERT, UPDATE, or DELETE statement do not work due to database blocking.

Scope

  • One or more users can be affected. When blocking occurs, an AR Server thread becomes tied up. If many threads are tied up due to blocking, the entire AR Server can stop responding.
  • One ARServer can cause blocking on other servers in the group since they share a database. So the problem may affect multiple or all servers in a server group.

Diagnosing and reporting an issue


Step

Task

Description

1

Understand Database Blocking

The first step is to understand the symptom:

  • How was it observed? 
  • What do the AR Server or Database logs show?  

Often, a database blocking issue will be observed as an UPDATE statement that takes a very long time or never ends. An UPDATE statement in Remedy is expected to be very fast since it uses a WHERE clause with the C1 column, which is a unique index. The reason of slowness is because some other transaction has the same row, page, or table locked open and is not committing the transaction.  

A few possible reasons for database blocking are:

  1. Another transaction is updating the same row, page, or table and is taking a long time due to the number of records or amount of data that it is updating.
    This is not common since Remedy usually updates quickly.
  2. A bulk transaction has this row, page, or table locked as one operation, but other operations are not yet complete so the entire transaction has not been committed.
  3. A Remedy transaction that has the row, page, or table locked is stuck in a mutex lock or other programmatic loop or hang that is not allowing it to complete and commit the transaction.

2

Stop the blocking

It is difficult to troubleshoot a blocking condition that occurred in the past. If the database administrator (DBA) had some database monitoring enabled, you may be able to use that to gather some information. If you had AR Server SQL/API/Filter logging enabled on all of your servers, use the following steps:

  • If you need to stop a currently active blocking condition, consult the DBA to identify and break the SQL or the process that is causing the blocking. Within the AR Server, you cannot stop a single SQL or database connection.
  • Once the blocking has been stopped, the DBA can try to identify the 'blocked' and 'blocking' SQLs.
  • To identify and stop the blocking, note the SQL statements and the host that caused the blocking. We can use this later to identify the cause.

MSSQL:  You can use the Activity Monitor to both identify a blocking operation and the blocked operation. Once you identify the blocker, you can right-click on it and kill the process:

DB1.png

You can right click on a row and choose 'Details' to see the exact SQL that is running in that session.

Oracle: The ASH report is often used to identify a blocking session. Other tools are mentioned in step 4. The ASH report will display something like this:

DB2.png

If it is determined that this is the session that should be killed, you can you the first 2 values, '152' and '29690'  in the following SQL*PLUS command:

ALTER SYSTEM KILL SESSION 'sid,serial#;

For example:

ALTER SYSTEM KILL SESSION '152,29690' IMMEDIATE;

Note: When killing a SQL connection, the current AR Server transaction will abort and the activity will have to be retried.

3

Enable AR System Logging

  • In order to get the best information about the blocking and to help identify a root cause, enable the SQL, API, and Filter logging on every server in the server group.
  • If we are able to determine the AR Server host that is causing the blocking from the database information, you can simply focus on that server's logs.
    To enable logs on multiple servers in a server group, use the Server Group Logging option from the Server Group Admin Console.
  • Because it may take some time for the blocking condition to occur, set the log size to be relatively high such as 500 MB or higher. You can combine the logs into a single file, but this will cause the logs to grow quickly, meaning the total time of logs gathered will be shorter. 
  • Ensure that you have at least 8 backups set for log history. Also, ensure that you have enough disk space to contain all the logs.

4

Other AR Monitoring Tools

  • Another valuable tool is the "Longest Running APIs and SQLs" data. You can configure AR Server to record all API and SQL statements that exceed a specified threshold and write all of these to the arexception.log
    For example, for blocking conditions that last over 60 seconds, you know that the 'blocker' will be at least that long or unfinished. See this topic for information on how to set this feature up. 
  • If the blocker SQL takes a long time but completes, you will see it recorded in the arexception.log. If the blocker has never completed and is still blocking, you can dump the 'Pending API' and 'Pending SQL' operations to "Server Statistics: Longest APIs" and Server Statistics: Longest SQLs" forms, respectively. You can view the data in these forms to identify SQL statements (and the API call that called the SQL) that are the likely ones to be the 'blocker'.

5

Enable Database Monitoring

  • The DBA can enable database monitoring to try to report when a blocking condition occurs and log the blocking and blocked SQLs. There are various tools for doing this and this is outside the scope of this document.

To actively monitor for database blocking:

MSSQL:  You can use the Activity Monitor as mentioned in step 2. There are many 3rd party tools available as well. One such tool is 'sp_whoisactive' from http://whoisactive.com/.

Here is an example output. You can see that it makes it easy to identify the blocking and blocker SQLs:

DB3.png

Oracle: As mentioned, you can use the ASH report to identify a blocking session. There are various other tools available to the DBA. One of many is a set of SQL statements at this topic. These provide the sql text of both the blocking and blocked sessions. This may make it easier to identify the correct session to kill.

6

Reproduce the problem or wait for it to occur

Usually, blocking conditions do not occur often enough or with enough consistency to predict when they will occur.  

If you can reproduce the blocking condition, do that now. Otherwise, wait until it occurs on its own.

7

Collect and Analyze Diagnostics

  1. Collect the SQL/API/Filter logs from every server in the server group.  The logs should include the timeframe from a time well before the problem was observed until well after (At least 30 minutes on either side) if possible.
  2. Collect the arexception.log, arerror.log, and ardebug.log from each server in the group.
  3. Collect any database information that the DBA can provide:
    1. MSSQL:  This will be dependent on which tools the DBA is using. At a minimum, run the "All Blocking Transactions" report and save as a PDF. You can run this report in SQL Server Studio by right-clicking on the AR System database and choosing it from the standard reports.
    2. Oracle:  ASH, ADDM, and AWR reports are all useful. The ASH report is often used for blocking issues.  Ensure that these reports are created for the proper time that includes the blocking timeframe. The ASH report will provide the SID and serial# of the blocking session. The ADDM report will provide the SQLID, SID, and Serial# for the blocking session.

8

Identify the "Blocked" SQL

  • Use database tools or AR System Log Analyzer to identify the blocked and blocking SQL. See step 2 in this table. It is important to identify the 'Blocked' SQL, the 'Blocked By' or 'Blocking' SQL, and the hostname of the blocker.
  • This information will help identify which logs are set to analyze (the blocking host) and which SQLs to look for.

Oracle:  the ASH report may be a good place to start.

MSSQL:  Running sp_who2 or other related Stored Procedures may help.

If you were able to determine the host that is causing the blocking as well as the 'blocking' SQL, proceed to step 10, otherwise:


    1. Start with the server where the problem was observed.  You will need to perform these steps on every server in the group.
    2. You should see an extremely long or unfinished (EXCEPTION) API call that contains the potential ‘blocked’ SQL.
    3. The SQLs are often UPDATEs, INSERTs, or DELETEs.  Therefore, often the APIs are CE, SE, ME, DE, SVE, SGE.
  • Once you have identified the “blocked” SQL,  note the tablename and the timestamp.

9

Identify the "Blocking" SQL

Identify the blocking SQL with the AR System logs since this will help understand the cause.

Using AR Log Analyzer or a better text editor, look through the logs for an API call that UPDATEs the same table at a time equal or earlier than the ‘Blocked” SQL from the prior step.

  • This API should also be very long or unfinished (EXCEPTION). 
  • You can use the results of AR Log Analyzer to identify the ThreadID, Timestamp, and SQL text for the blocking SQL. If there are multiple results, you want to find the earliest one that is in the same time frame as the blocking.
  • The blocker will start before all of the blocked SQLs start and will end just before all of the blocked SQLs end. Once the block is freed, the blocked SQLs can continue quickly.
  • If you see the blocking SQL in AR Log Analyzer, you can investigate the related API call to understand what the entire API is doing.
  • You can also use a text editor to filter the logs by the Thread ID (bookmark and only look at those bookmarked lines) and timestamp to identify all the SQL, API, and Filter lines for that singular API call that causes the blocking.

Note: It is possible that there are more than one Blocked API on multiple servers. 

It is important to look at all the servers' logs to find the first API call at the correct timeframe that UPDATED the table in question.

10

Identify why the SQL was blocking

Once you have found the “Blocking” API call and SQL, you need to understand why it ran for so long.


    1. What type of API was it?
      1. A bulk transaction might simply take a long time. 
      2. Identify the cause of the delay.
    2. A CE, SE, ME, ,SGE, or SVE might update a table but is expected to be fast.
      1. Identify if a specific filter in the API call was taking a long time and caused the transaction to not commit.
      2. An example would be that a plugin was called during filter processing and waited for 10 minutes to timeout while leaving the transaction uncommitted.
      3. Troubleshoot the filter/plugin issue to resolve the blocking.

11

Find a solution

a. If the cause has not been found or no solution is available, proceed to the next step to gather logs and create a Support Case.

b. Use the table below to troubleshoot specific problems with blocking.

12

Run log zipper

If you're still experiencing issues after following the steps or if you are unable to complete any of the steps above, run the log zipper utility. These logs will be requested by BMC Support when submitting a case on database blocking. You will need to run it on each server that you enabled logs on. Rename the zipped file to contain the server name.

13

Create a support ticket

Collect and send output of log zipper and detailed information when creating a case with BMC Support:

  1. Provide the following information as part of your case:
    • Users that experienced the problem.
    • Approximate timestamp that the block was noticed.
    • If identified, provide the SQL statement and hostname of the 'Blocked" and "Blocking" SQLs.
  2. Attach any materials/reports provided by the DBA.
  3. Attach the Log Zipper zip file to your case (up to 2 GB) or transfer the files to BMC using FTP. See Steps to send logs, files, screenshots, etc to BMC Support for a Remedy Product related case.

Error messages and resolution

Symptom

Where

Resolution

A SQL was not called by the AR System server but was called from an external source.

Database Report

Identify why it was run and try to prevent it in the future

A valid API call takes a long time

ARLogAnalyzer

Ensure that this API does not run during normal business hours.

An API running abnormally

ARLogAnalyzer

Create a support case.

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*