Troubleshooting Business Rules Engine issues in server groups
Important
Starting with version 9.1.04, BMC does not provide support for the SLM Collector or for any BMC-provided integrations that use the SLM Collector.
You might encounter the following issues with the Business Rules Engine in server groups.
Issue symptoms
You might get ARERR 92 and ARERR 94 (database time-out) errors, ARERR 93 (busy server) errors, and ARAPPERR 4502 and ARAPPNOTE 4501 errors. One of the most important symptoms is confirmation, from the entries in the armonitor.log file, that the BRIE server terminated (crashed) and restarted.
The following are sample error messages that you might get:
ddd mmm xx hh:mm:ss yyyy BRIE : Operation cancelled due to error (ARAPPERR 4502)
ddd mmm xx hh:mm:ss BRIE : Operation cancelled due to error (ARAPPERR 4502)
ddd mmm xx hh:mm:ss - -
ddd mmm xx hh:mm:ss BRIE : Timeout during database update – the operation has been accepted by the server and will usually complete successfully (<server>) ARERR - 92
ddd mmm xx hh:mm:ss BRIE : AR System Application server terminated – fatal error encountered (ARAPPNOTE 4501)
ddd mmm xx hh:mm:ss BRIE : Timeout during database query – consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)
ddd mmm xx hh:mm:ss BRIE : <server>
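When reviewing a large log file, it can help to tally how often each of these codes appears on BRIE lines. The following is a minimal Python sketch, not a BMC tool: the function name `count_brie_errors` is hypothetical, and it assumes that each message sits on one line and contains the text BRIE, as in the samples above.

```python
import re
from collections import Counter

# Matches forms such as "ARERR 92", "ARERR - 92", and "ARAPPERR 4502".
ERROR_PATTERN = re.compile(r"\b(ARERR|ARAPPERR|ARAPPNOTE)\s*-?\s*(\d+)\b")

# Codes named in this topic.
WATCHED = {
    ("ARERR", "92"), ("ARERR", "93"), ("ARERR", "94"),
    ("ARAPPERR", "4502"), ("ARAPPNOTE", "4501"),
}

def count_brie_errors(lines):
    """Count watched error codes on lines that mention BRIE."""
    counts = Counter()
    for line in lines:
        if "BRIE" not in line:
            continue  # skip messages from other components
        for kind, number in ERROR_PATTERN.findall(line):
            if (kind, number) in WATCHED:
                counts[f"{kind} {number}"] += 1
    return counts
```

To use it, pass the lines of the log file, for example `count_brie_errors(open(path, encoding="utf-8", errors="replace"))`, where `path` points to your copy of arerror.log.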
Issue scope
Business Rules Engine (BRIE) errors are reported to the arerror.log file during the build of service targets, at startup, and when there is no BRIE server activity other than normal polling action. The BRIE server is a client connection to the AR System server, and any systemic issues that cause ARERR 92, 93, and 94 errors for users accessing the system have a similar impact on the BRIE server connection.
Entries in the arerror.log file that include ARERR 4502 errors indicate the failure of a service target build. However, not all service target build failures generate ARERR 4502 errors in the arerror.log file.
Resolution
If the errors for the BRIE server in the arerror.log file are reported during the same time frame in which users report the same errors, they can be considered a symptom of underlying system issues (for example, poor database performance or inadequate resources) rather than the source of the problem. In that case, troubleshoot the underlying issue by reviewing the logs and the system performance data that relate to these errors.
BRIE server issues
The following section covers some of the systemic problems that can impact the BRIE server when the server group configuration is either incomplete or incorrect.
To re-evaluate BRIE errors
Restart all servers in the server group with the following setting:
Server-Group-Member: F
Restart the servers after you deselect Server Group Member on the Configuration tab of the Application Administration Console, or after you manually set Server-Group-Member in the ar.cfg[conf] file. The next time you restart the AR System server, it starts without being a member of a server group.
Best practice
We recommend that you run one AR System server at a time during maintenance to prevent duplication of server group operations on the same database (for example, escalations). The system runs with a single AR System server for the duration of the maintenance period. This keeps your production system online when it is not possible to temporarily shut down production for a maintenance window.
For a robust server group configuration, the following entries are in the ar.cfg[conf] file for each server in the server group:
- Server-Name entry in common for all servers in the server group
- Server-Connect-Name set to the host name (FQDN) of the machine where this member of the server group is running
- Separate IP-Name entry for each of the following fields, for every member in the server group:
- hostname
- FQDN
- IP address
For a minimum server group with two AR System servers, there are six IP-Name entries in each ar.cfg[conf] file:
IP-Name: hostnameServer1
IP-Name: FQDNServer1
IP-Name: IPAddressServer1
IP-Name: hostnameServer2
IP-Name: FQDNServer2
IP-Name: IPAddressServer2
Enter the Domain-Name.
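The IP-Name requirement described above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the configuration file uses one `Key: value` entry per line, as in the examples in this topic. It builds the expected IP-Name entries (host name, FQDN, and IP address for every member) and reports the ones that are missing from a configuration file's text.

```python
def expected_ip_names(servers):
    """Build the IP-Name lines for every member of the server group.

    Each item in servers is a (host name, FQDN, IP address) tuple.
    """
    return [f"IP-Name: {value}" for server in servers for value in server]

def missing_ip_names(cfg_text, servers):
    """Return the expected IP-Name lines that are absent from cfg_text."""
    present = set()
    for raw in cfg_text.splitlines():
        # Split on the first colon only, so values that contain colons survive.
        key, sep, value = raw.partition(":")
        if sep and key.strip() == "IP-Name":
            present.add(f"IP-Name: {value.strip()}")
    return [line for line in expected_ip_names(servers) if line not in present]
```

For a server group with two members, `expected_ip_names` returns six lines. Run `missing_ip_names` against the file contents of each server, because every member needs the full set.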
By default, the value for Server-Name is the host name of the server machine where the AR System server is running. Because the Server-Name entry must be common to all servers in a server group, this default value is no longer suitable. Typically, the Server-Name entry is the name of the load balancer when a load balancer is part of the environment.
Important
You do not need to have a load balancer in order to configure server groups.
The value for the Server-Name entry is the alias for the AR System server group and can be any value that is appropriate. The Server-Name alias must be resolvable in the environment where each AR System server is running. You can verify this by using the ping utility.
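If you want to script the name-resolution part of this check, the following Python sketch is one possible approach. It is not a replacement for ping: it only confirms that the name resolves to an address and does not test whether the host is reachable. The function name `resolves` is hypothetical.

```python
import socket

def resolves(name):
    """Return the IPv4 address that name resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers malformed names.
        return None
```

Run the check on every machine that hosts a member of the server group, passing the Server-Name alias, because the alias must resolve in each environment.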
To update the server group configuration
- Update the ar.cfg file for the primary AR System server to temporarily disengage server group functionality:
Server-Group-Member: F
- Add IP-Name entries to the ar.cfg file for both AR System servers:
- IP-Name: hostnameServer1
- IP-Name: FQDNServer1
- IP-Name: IPAddressServer1
- IP-Name: hostnameServer2
- IP-Name: FQDNServer2
- IP-Name: IPAddressServer2
- Shut down both primary and secondary AR System servers.
- Restart the primary AR System server.
- Verify the value for the Server-Group-Name on the Advanced tab in the Application Administration Console (not required for AR System server 7.5 and later).
- Update the ar.cfg[conf] file for the primary AR System server to engage server group functionality:
Server-Group-Member: T
- Restart all AR System servers in the server group.
- Verify the entries in the Server Group Operation Ranking form, specifically whether the expected values are listed for Server field options. The expected values are a single option for each server in the server group with values corresponding to the Server-Connect-Name for that AR System server.
- If the entries listed for the Server field options in the Server Group Operation Ranking form are as expected, maintenance on the server group configuration is complete. If the entries listed for the Server field are not what is expected, the following tasks are required:
- Delete all entries in the form and re-enter them.
In some cases, it might be necessary to do some maintenance at the database level and delete entries in the server group tables, but this should be done under the supervision of technical support as changes at the database level are not supported. If you need to update server group tables at the database level, provide output from the following queries to technical support:
select * from servgrp_applic
select * from servgrp_board
select * from servgrp_config
select * from servgrp_ftslic
select * from servgrp_op_mstr
select * from servgrp_userlic
Best practice
We recommend that you not make any changes at the database level until you have a viable database backup, especially in critical production systems, and that you make them only under the specific direction of technical support.
If you made any changes to the Server Group Operation Ranking form, you must restart all AR System servers in the server group. When the server group is properly configured, the ar.cfg[conf] file for each server in the server group is updated with entries for the server group operations, where the value F indicates that the AR System server is ranked as 1 for that operation and T indicates that the AR System server is not ranked as 1 for that operation. For example:
Approval-Server-Suspended: F (or T)
Assignment-Engine-Suspended: F (or T)
Business-Rules-Engine-Suspended: F (or T)
CMDB-Service-Suspended: F (or T)
Reconciliation-Engine-Suspended: F (or T)
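To see at a glance which operations a given server is ranked as 1 for, you can read these entries from the configuration file. The following Python sketch is an illustration with a hypothetical function name. It assumes one `Key: value` entry per line and relies only on the convention described above, where F means that the server is ranked as 1 for the operation.

```python
def ranked_first_operations(cfg_text):
    """Return the operations whose *-Suspended entry is F in the config text."""
    suffix = "-Suspended"
    active = []
    for raw in cfg_text.splitlines():
        key, sep, value = raw.partition(":")
        key = key.strip()
        # F means this server is ranked as 1 (not suspended) for the operation.
        if sep and key.endswith(suffix) and value.strip() == "F":
            active.append(key[: -len(suffix)])
    return active
```

If Business-Rules-Engine is missing from the result on the server where you expect BRIE to run, review the Server Group Operation Ranking form.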
For a server group configuration where BMC SLM is a component of the environment, BRIE must have the same ranking as the Administration operation. To build service targets, you must log in directly to the AR System server where the Administration operation is ranked as 1.