Troubleshooting Database Blocking Issues


When you observe performance issues due to blocking issues at the database, use this information to enable appropriate logging, troubleshoot and resolve the issues, or create a BMC Support case

Related topics

Symptoms

  • API calls take a very long time to complete. 
  • The SQLs called by the API take a long time due to database blocking.  
  • Often, an INSERT, UPDATE, or DELETE statement do not work due to database blocking.

Scope

  • One or more users can be affected.
  • When blocking occurs, an AR System server thread becomes tied up. If many threads are tied up due to blocking, the entire AR System server can stop responding.
  • One AR System server can cause blocking on other servers in the group since they share a database. So the problem may affect multiple or all servers in a server group.

Diagnosing and reporting an issue


Step

Task

Description

1

Understand database blocking

The first step is to understand the symptom:

  • How was it observed? 
  • What do the AR System server or database logs show?  

Often, a database blocking issue will be observed as an UPDATE statement that takes a very long time or never ends. An UPDATE statement in AR System is expected to be very fast since it uses a WHERE clause with the C1 column, which is a unique index. The reason of slowness is because some other transaction has the same row, page, or table locked open and is not committing the transaction.  

A few possible reasons for database blocking are:

  1. Another transaction is updating the same row, page, or table and is taking a long time due to the number of records or amount of data that it is updating.
    This is not common since AR Systemusually updates quickly.
  2. A bulk transaction has this row, page, or table locked as one operation, but other operations are not yet complete so the entire transaction has not been committed.
  3. A AR System transaction that has the row, page, or table locked is stuck in a synchronization lock or other programmatic loop or hang that is not allowing it to complete and commit the transaction.

2

Stop the blocking

It is difficult to troubleshoot a blocking condition that occurred in the past. If the database administrator (DBA) had some database monitoring enabled, you may be able to use that to gather some information. If you had AR System server SQL/API/Filter logging enabled on all of your servers, perform the following steps:

  • If you need to stop a currently active blocking condition, consult the DBA to identify and break the SQL or the process that is causing the blocking. Within the AR System server, you cannot stop a single SQL or database connection.
  • Once the blocking is stopped, the DBA can try to identify the 'blocked' and 'blocking' SQLs.
  • To identify and stop the blocking, note the SQL statements and the host that caused the blocking. You can use this later to identify the cause.

MSSQL:  You can use the Activity Monitor to both identify a blocking operation and the blocked operation. Once you identify the blocker, you can right-click on it and kill the process:

DB1.png

You can right-click a row and choose 'Details' to see the exact SQL that is running in that session.

Oracle: The Active Session History (ASH) report is often used to identify a blocking session. Other tools are mentioned in step 4. Here is a example of the ASH report:

DB2.png

If it is determined that this is the session that should be killed, you can use the first 2 values, '152' and '29690'  in the following SQL*PLUS command:

ALTER SYSTEM KILL SESSION 'sid,serial#;

For example:

ALTER SYSTEM KILL SESSION '152,29690' IMMEDIATE;

Note: When killing a SQL connection, the current AR System server transaction will abort and the activity will have to be retried.

3

Enable AR System Logging

  • To get the best information about the blocking and to help identify the root cause, enable the SQL, API, and Filter logging on every server in the server group.
  • If we are able to determine the AR System server host that is causing the blocking from the database information, you can simply focus on that server's logs.
    To enable logs on multiple servers in a server group, use the Server Group Logging option from the Server Group Admin Console.
  • Because it may take some time for the blocking condition to occur, set the log size to be relatively high, such as 500 MB or higher.
    You can combine the logs into a single file, but this will cause the logs to grow quickly, that is, the total time of logs gathered will be shorter. 
  • Ensure that you have at least 8 backups set for log history and make sure that you have enough disk space to contain all the logs.

4

Other AR System Monitoring Tools

  • Another valuable tool is the Longest Running APIs and SQLs data. You can configure AR System server to record all API and SQL statements that exceed a specified threshold and write all of these to the arexception.log
    For example, for blocking conditions that last over 60 seconds, you know that the 'blocker' will be at least that long or unfinished. 
    See Managing-logs-for-server-groups for more information. 
  • If the blocker SQL takes a long time but completes, you will see it recorded in the arexception.log. If the blocker has never completed and is still blocking, you can dump the Pending API and Pending SQL operations to Server Statistics: Longest APIs and Server Statistics: Longest SQLs forms, respectively.
    You can view the data in these forms to identify SQL statements (and the API call that called the SQL) that were likely responsible for blocking.

5

Enable database monitoring

  • The DBA can enable database monitoring to report when a blocking condition occurs and log the blocking and blocked SQLs. 

To actively monitor for database blocking:

MSSQL:  You can use the Activity Monitor as mentioned in step 2. However, there are many 3rd party tools available as well; for example, 'sp_whoisactive' from http://whoisactive.com/.

Here is an example output. You can see that it makes it easy to identify the blocking and blocker SQLs:

DB3.png

Oracle: As mentioned, you can use the ASH report to identify a blocking session. There are various other tools available to the DBA. These provide the sql text of both the blocking and blocked sessions. This may make it easier to identify the correct session to kill.

6

Reproduce the problem or wait for it to occur

Usually, blocking conditions do not occur often enough or with enough consistency to predict when they will occur.  

If you can, reproduce the blocking condition. Else, wait until it occurs again.

7

Collect and analyze diagnostics

  1. Collect the SQL/API/Filter logs from every server in the server group.  The logs should include the timeframe from a time well before the problem was observed until well after (At least 30 minutes on either side) if possible.
  2. Collect the arexception.log, arerror.log, and ardebug.log from each server in the group.
  3. Collect any database information that the DBA can provide:
    1. MSSQL: At a minimum, run the All Blocking Transactions report and save as a PDF. You can run this report in SQL Server Studio by right-clicking the AR System database and selecting the report from the standard reports.
    2. Oracle:  ASH, ADDM, and AWR reports are all useful.
      The ASH report is often used for blocking issues. Make sure that these reports are created for the proper time that includes the blocking timeframe.
      The ASH report will provide the SID and serial# of the blocking session.
      The ADDM report will provide the SQLID, SID, and Serial# for the blocking session.

8

Identify the Blocked SQL

  • Use database tools or AR System Log Analyzer to identify the blocked and blocking SQL. See step 2 in this table. It is important to identify the Blocked SQL, the Blocked By or Blocking SQL, and the hostname of the blocker.
  • This information will help identify which logs are set to analyze (the blocking host) and which SQLs to look for.

MSSQL: Running sp_who2 or other related Stored Procedures may help.
Oracle
: The ASH report may be a good place to start.

If you were able to determine the host that is causing the blocking as well as the 'blocking' SQL, proceed to step 10, else perform the following steps:


    1. Start with the server where the problem was observed.  You must perform these steps on every server in the group.
    2. You should see an extremely long or unfinished (EXCEPTION) API call that contains the potential blocked SQL.
    3. The SQLs are often UPDATEs, INSERTs, or DELETEs.  Therefore, often the APIs are CE, SE, ME, DE, SVE, SGE.
  • After you have identified the blocked SQL,  note the table name and the timestamp.

9

Identify the "Blocking" SQL

Identify the blocking SQL with the AR System logs since this will help understand the cause.

By using the AR Log Analyzer or any other text editor, look through the logs for an API call that UPDATEs the same table at a time equal or earlier than the Blocked SQL.

  • This API should also be very long or unfinished (EXCEPTION). 
  • You can use the results of AR Log Analyzer to identify the ThreadID, Timestamp, and SQL text for the blocking SQL. If there are multiple results, you should find the earliest one that is in the same time frame as the blocking.
  • The blocker will start before all of the blocked SQLs start and will end just before all of the blocked SQLs end. Once the block is freed, the blocked SQLs can continue quickly.
  • If you see the blocking SQL in AR Log Analyzer, you can investigate the related API call to understand what the entire API is doing.
  • You can also use a text editor to filter the logs by the Thread ID (bookmark and only look at those bookmarked lines) and timestamp to identify all the SQL, API, and Filter lines for that singular API call that causes the blocking.

Note: It is possible that there are more than one Blocked API on multiple servers. 

It is important to look at logs from all the servers to find the first API call at the correct timeframe that UPDATED the table in question.

10

Identify why the SQL was blocking

After you have found the Blocking API call and SQL, you must understand why it ran for so long.


    1. What type of API was it?
      1. A bulk transaction might take a long time. 
      2. Identify the cause of the delay.
    2. A CE, SE, ME, ,SGE, or SVE might update a table but is expected to be fast.
      1. Identify if a specific filter in the API call was taking a long time and caused the transaction to not commit.
        For example, a plug-in is called during filter processing. The plug-in times out after 10 minutes. Therefore, the transaction is not committed.
      2. Troubleshoot the filter/plugin issue to resolve the blocking.

11

Find a solution

If the cause has not been found or no solution is available, proceed to the next step to gather logs and create a Support Case. Use the table below to troubleshoot specific problems with blocking.

12

Run support assistance tool

If you're still experiencing issues after following the steps or if you are unable to complete any of the steps above,  Viewing and collecting logs by using the Support Assistant tool. You must run the utility on each server that you enabled logs on. Rename the zipped file to contain the server name.

BMC Support will request for the logs when you submit a case on database blocking. 

13

Create a support ticket

  1. Provide the following information as part of your case:
    • Users who experienced the problem.
    • Approximate timestamp that the block was noticed.
    • If identified, provide the SQL statement and hostname of the Blocked and Blocking SQLs.
  2. Attach any material/report provided by the DBA.

Error messages and resolution

Symptom

Where

Resolution

A SQL was not called by the AR System server but was called from an external source.

Database Report

Identify why it was run and try to prevent it in future.

A valid API call takes a long time

AR System Log Analyzer

Make sure that this API does not run during normal business hours.

An API is running abnormally

AR System Log Analyzer

Create a support case.

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*