Troubleshooting installer failure during upgrade
This topic provides the following information:
Restoring background processes
During upgrade, a group of background processes is disabled until the upgrade completes. See Processes automated during upgrade.
This behavior is controlled by the Operating-Mode server setting. During upgrade, the installer sets Operating-Mode to 1 when it runs on the primary server and to 2 when it runs on a secondary server. This applies to all Remedy installers. However, if the installer crashes or aborts unexpectedly, the background processes might not be restored to their pre-upgrade settings. To restore them, set Operating-Mode back to 0, the normal operating mode.
To restore the background processes to their pre-upgrade values:
- Double-click the javadriver.bat (Windows) or javadriver.sh (UNIX) file. The file is located in the following directory:
- (Windows) <InstallDirectory>\ARSystem\Arserver\api\lib
Ensure that the value of JAVA_HOME is set correctly. If you do not set the JAVA_HOME correctly, you may not be able to run the BMC Remedy Configuration Check utility outside of the installer to perform pre-upgrade and configuration checks.
Use Java 8 update 45 or later.
- To initialize, enter the
- To log on, enter the log command and provide details such as user name, password, and server name.
- If you are not using the port mapper, enter the ssp (Set Server Port) command and then enter the server port number.
- Enter 0 or a blank for Using private socket.
- Enter the ver command to verify the login information.
- Enter the ssi (Set Server Info) command and perform the following:
- Enter 1 for the Number of server info operations that you want to perform.
- Enter 463 as the Operation number to set the server to the operating mode.
- Enter 2 for integer as the Datatype.
- Enter 0 or 1 as the Integer Value.
Command: ssi
SET SERVER INFO
Number of server info operations (0): 1
Operation (1-605) (1):463
Datatype Null/Key/Int/Real/Char/DiaryList/Enum/Time/Bitmask/Byte Decimal/attach/currency/date/timeofday/join/trim/control/Table/Column/ulong/ coords/view/display (0 - 14, 30-34, 40-43) (0): 2
Integer Value (0): 0
Set Server Information Status
ReturnCode: OK
Status List : 0 items
The ar.cfg is populated with pre-upgrade values.
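To confirm the result, it can help to read the current Operating-Mode value back and compare it with the meanings given above. The following is a minimal sketch, not a BMC utility. It assumes the setting is available as a `Name: value` line in exported configuration text; whether Operating-Mode appears in your ar.cfg in that form is an assumption, and the function names and the sample text are hypothetical.

```python
# Meaning of each Operating-Mode value as described in this topic.
OPERATING_MODES = {
    0: "normal operation",
    1: "upgrade in progress on the primary server",
    2: "upgrade in progress on a secondary server",
}


def read_operating_mode(config_text):
    """Return the Operating-Mode value found in config_text, or None.

    Assumes one 'Name: value' setting per line (an assumed layout).
    """
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = stripped.partition(":")
        if sep and name.strip() == "Operating-Mode":
            return int(value.strip())
    return None


def describe_operating_mode(mode):
    """Translate an Operating-Mode value into the wording used above."""
    if mode is None:
        return "Operating-Mode not found"
    return OPERATING_MODES.get(mode, "unknown value")


# Hypothetical sample text, for illustration only.
sample = "Server-Name: arserver1\nOperating-Mode: 2\n"
print(describe_operating_mode(read_operating_mode(sample)))
```

A value other than 0 after a crashed or aborted installer suggests that the ssi procedure above still needs to be run.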
Frequently asked questions
This section lists some of the scenarios you might come across while performing the zero-downtime upgrade of the platform components:
If you upgrade the platform components on some of the servers of a server group, the AR System Administration > Server Information form, Platform tab displays the Upgrade status as pending.
From 9.1.04 onward, when you upgrade the platform components, you must upgrade AR, Atrium, and Atrium Integrator on all the servers of a server group. The AR System Administration > Server Information form, Platform tab displays the Upgrade status as Done after you upgrade the platform components on all the servers of a server group. The ITSM installer cannot perform the upgrade unless the platform components are upgraded on all the primary servers.
After upgrading the platform components on all the servers of a server group, if you still see the upgrade status as pending, check the following:
- Check the platform component version information on the AR System Server Group Operation Ranking form. The following fields on the AR System Server Group Operation Ranking form display 9.1.04 for all the upgraded servers:
- AR Server Version
- CMDB Version
- AI Version
- If the server group has any inactive servers (either a server is down or the ranking form still has entries for a deprecated server), the server waits for 48 hours and then updates the status to Done.
- Check the AR error log if there are any errors related to [Post Upgrade]. If there are any errors, resolve them.
- If you find multiple entries in the ft_pending table, the post-upgrade activities take time. Wait till the process is completed. Check the status in the AR error log.
Example for inactive servers: You have 3 servers in your server group, and the platform components are upgraded on server 1 and server 2. Server 3 is down for more than 48 hours. After 48 hours, the post-upgrade activity is triggered and the upgrade status is displayed as Done. As a result, you cannot start server 3, which is not upgraded. To bring up server 3, you must upgrade it. During that upgrade, if any of the platform components fails, the file system is rolled back but the server does not come up because of a DB version mismatch. See the DB version mismatch error in the AR error logs.
Example for an active server that is not upgraded: You have 3 servers in your server group, and the platform components are upgraded on server 1 and server 2. Server 3 is up but not yet upgraded. Even after 48 hours, the post-upgrade activity is not triggered on server 3 and the upgrade status remains Pending.
On server 3 (the last server), if you completed the AR installation, ensure that the Atrium Core and AI upgrades are completed within 48 hours. If you do not upgrade Atrium Core or AI within 48 hours, the post-upgrade activity is triggered and the upgrade status is marked as Done. If you then upgrade Atrium Core and AI and the upgrade fails, only the file system of AR, Atrium Core, and AI is rolled back, and the server does not come up because of a DB version mismatch. See the DB version mismatch error in the AR error logs.
If you want to change the default 48hrs wait time, add ZDT-Upgrade-Max-Wait-Hour-For-Inactive-Server as a CCS shared parameter and add a higher value (> 48hrs). The maximum value that you can set is 336hrs.
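The waiting rule described in the examples above can be summarized as a small model. This is only a sketch of the behavior as this topic describes it, not the server's actual implementation. The function names and the dictionary keys are hypothetical; the 48 and 336 hour limits come from the text above.

```python
DEFAULT_WAIT_HOURS = 48   # default wait stated in this topic
MAX_WAIT_HOURS = 336      # maximum value stated in this topic


def validate_wait_hours(hours):
    """Return hours if it is a usable override, otherwise raise ValueError."""
    if not DEFAULT_WAIT_HOURS < hours <= MAX_WAIT_HOURS:
        raise ValueError(
            f"wait time must be above {DEFAULT_WAIT_HOURS} "
            f"and at most {MAX_WAIT_HOURS} hours"
        )
    return hours


def expected_upgrade_status(servers, wait_hours=DEFAULT_WAIT_HOURS):
    """Model the Pending/Done rule for a server group.

    servers: list of dicts with the keys 'upgraded' (bool), 'active' (bool),
    and 'hours_inactive' (number, only read for inactive servers).
    """
    for server in servers:
        if server["upgraded"]:
            continue
        if server["active"]:
            # An active server that is not upgraded keeps the status Pending.
            return "Pending"
        if server["hours_inactive"] < wait_hours:
            # An inactive server is waited for until the wait time elapses.
            return "Pending"
    return "Done"


group = [
    {"upgraded": True, "active": True, "hours_inactive": 0},
    {"upgraded": True, "active": True, "hours_inactive": 0},
    {"upgraded": False, "active": False, "hours_inactive": 50},
]
print(expected_upgrade_status(group))
```

Raising the wait time gives you longer to bring an inactive server back and upgrade it before the status is marked as Done.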
Zero-downtime upgrade is not supported for the applications.
Before upgrading the application components on the primary server, ensure that the AR server is down on all the secondary servers. If you do not shut down the secondary servers, you might encounter issues such as a longer upgrade time and deadlocks.
Yes, you can perform the zero-downtime upgrade for AR even if you do not have CMDB and AI. After upgrading the AR on all the servers of a server group, AR System Administration > Server Information form, Platform tab displays the Upgrade status as Done.
You can also perform the zero-downtime upgrade in a single-server environment. Some operations might not be available during the upgrade.
After upgrading the platform components successfully, the installer deletes the backup folders. If the backup folders are not deleted automatically, you have to manually run the cleanup utility. For information about backup folder, see
BMC recommends not to cancel the installation after you click Install. However, if you cancel the installation, platform components are not rolled back to the earlier version automatically. You must manually run the Rollback utility to roll back the platform components and the file system to the earlier version.
Example (to rollback a particular component): If you have upgraded AR and CMDB successfully to 9.1.04 and cancel the upgrade process, while running the AI installer, you must run the Rollback utility with Atrium Integrator parameter in the following manner:
[Usage] rollback.bat "<AR Server Name>" "<AR Admin User Name>" "<AR Admin password>" "<AtriumIntegrator>"
After reverting AI, run AI installer again to upgrade it to 9.1.04. After upgrading AI successfully, check the Upgrade Summary screen for the status of the upgrade.
For more details, see Rollback mechanism
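If you script the rollback, building the argument list in code avoids quoting mistakes with server names or passwords that contain spaces. The sketch below only assembles the arguments shown in the usage line above; it does not run anything. It assumes that rollback.sh accepts the same arguments as rollback.bat, and the function name is hypothetical. Only the two component parameters named in this topic are accepted.

```python
# Component parameters named in this topic; other values are not covered here.
KNOWN_COMPONENTS = ("AtriumCore", "AtriumIntegrator")


def build_rollback_command(server, admin_user, admin_password, component,
                           windows=True):
    """Return the rollback command as an argument list.

    Assumes rollback.sh takes the same arguments as rollback.bat.
    """
    if component not in KNOWN_COMPONENTS:
        raise ValueError(f"unsupported component parameter: {component}")
    script = "rollback.bat" if windows else "rollback.sh"
    return [script, server, admin_user, admin_password, component]


# Hypothetical values, for illustration only.
print(build_rollback_command("arserver1", "Demo", "secret", "AtriumIntegrator"))
```

An argument list like this can be passed to `subprocess.run` with the working directory set to the folder that contains the rollback utility.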
Yes, upgrade fails. BMC recommends that you upgrade the platform components (AR, CMDB, and AI) on all the servers of a server group, and then start upgrading the application components.
AR only upgrade is not supported. You must upgrade CMDB and AI also as part of platform upgrade.
There is no separate zero-downtime installer. It is the regular installer. By default, the installer runs in zero-downtime mode. For more information, see Preparing for zero downtime upgrade of the platform
Yes. Normally it is the first server that was installed in the server group, but correct definition is any server with rank=1 for Administration operation is the primary server. That is the first server to be upgraded in a server group.
In a server group environment, if the platform components are upgraded to the latest version (for example, 9.1.04) on some servers and a couple of servers are yet to be upgraded (still on the old version), the group is said to be in mixed-server-mode.
Yes. Any client such as SmartIT/DWP that uses AR API to connect to the AR server continues to work in case of mixed-server-versions mode. Since all APIs and functionality of the server is backward compatible, receiving a call from a client such as SmartIT using the same API to any version of server is served with the same expected response.
If users do not use a true cluster, there will not be seamless failover. So if that mid tier is taken out of load balancer, users will see session timeout. If they need to use zero-downtime, they need to have true cluster with failover configured.
Only the backup and restore functionality is changed in Mid Tier installer 9.1.04. The Mid Tier installer continues to work in the same manner as it was in the versions prior to 9.1.04.
No. There are dependencies in AR, CMDB and AI - so AI will rollback AI+CMDB+AR and it has to start again from AR.
If AI installation is cancelled, run the rollback utility manually to rollback only AI and reinstall AI.
Yes. The installer always takes a backup of the latest file system every time you start an upgrade.
No backups of VM/DB. There is live data coming into the database, so it cannot be restored. If the upgrade fails, fix the issues that caused the failure and then start the upgrade again. Restoring the database is not recommended.
In a production environment, Mid Tier is usually installed on a different machine than AR. AR, CMDB and AI are rolled back together as they have to be on the same version at any time. If AR is upgraded successfully and CMDB upgrade fails, users may decide to stop the upgrade and resume it after a week. In that case, all the platform components (AR, CMDB, and AI) have to be on the same version so that users can continue to use the older version.
It can be done technically. However, it is not recommended for a zero-downtime upgrade, because users are still accessing the system and a single primary server might not be able to handle the user load.
After successfully performing the zero-downtime upgrade of the platform components through a script or through a job (for example, a Jenkins job), you may notice the following scenarios:
- The AI installer cleans up the AI backup folder but is unable to clean up the CMDB and AR backup folders.
- If Atrium Core (CMDB) upgrade fails, the CMDB installer cleans up the Atrium Core backup folder but is unable to clean up the AR backup folder.
- If AR upgrade fails, AR installer cleans up the AR backup folder.
Workaround: Before starting the zero-downtime upgrade through a script or a job on a Windows or UNIX environment, ensure that the environment variable is correctly set for all the 3 platform components - AR, CMDB, and AI.
If the BMC Atrium Core Web Services are installed on external Tomcat 7.0.58 or lower, the rollback mechanism does not work. BMC recommends that you upgrade your external Tomcat to 7.0.59 or above.
The upgrade process gets stuck if the backup directory path contains a soft link. Instead of providing a soft link for the backup path, point to the actual directory where you want to create the backup folder. Make sure that the directory intended for the backup has the required free space.
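Both conditions can be checked before you start the installer. The following is a minimal sketch that uses only the Python standard library. The function names are hypothetical, and the amount of space to require is something you must determine for your own environment.

```python
import os
import shutil


def find_soft_link(path):
    """Return the first component of path that is a symbolic link, or None."""
    current = os.path.abspath(path)  # abspath does not resolve links
    while True:
        if os.path.islink(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the file system root
            return None
        current = parent


def check_backup_directory(path, required_bytes):
    """Return a list of problems found with the intended backup directory."""
    problems = []
    link = find_soft_link(path)
    if link is not None:
        problems.append(f"path contains a soft link: {link}")
    if not os.path.isdir(path):
        problems.append("backup directory does not exist")
    elif shutil.disk_usage(path).free < required_bytes:
        problems.append("not enough free space for the backup")
    return problems
```

An empty list means that neither problem was detected. The check walks up every parent directory, because a soft link anywhere in the path counts, not only in the last component.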
If you are upgrading the Remedy platform components in a server group, the installer triggers the post installation tasks after successfully upgrading all the secondary servers. However, no message is displayed indicating all the secondary servers are upgraded or the post install tasks are triggered.
- To view the upgrade status of a server group, go to AR System Administration > Server Information form, Platform tab. The Server Group Upgrade Status field displays the status Done if all the servers of a server group are upgraded successfully.
- To verify if the post upgrade activities are triggered, check the arerror.log located at <InstallDirectory>\BMC Software\ARSystem\Arserver\Db.
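Because no message is displayed, scanning arerror.log for post-upgrade entries is the practical way to confirm that the tasks were triggered. The sketch below filters lines that contain the [Post Upgrade] marker mentioned in this topic. Treating the word "error" as the sign of a failure is an assumption, the function names are hypothetical, and the sample lines are invented for illustration and are not real arerror.log output.

```python
def post_upgrade_entries(log_lines, marker="[Post Upgrade]"):
    """Return the log lines that contain the post-upgrade marker."""
    return [line.rstrip("\n") for line in log_lines if marker in line]


def post_upgrade_errors(log_lines):
    """Return post-upgrade lines that also mention an error.

    Matching on the word 'error' is an assumption about the log wording.
    """
    return [
        line for line in post_upgrade_entries(log_lines)
        if "error" in line.lower()
    ]


# Invented sample lines, for illustration only.
sample_log = [
    "server started\n",
    "[Post Upgrade] activity triggered\n",
    "[Post Upgrade] error while processing an item\n",
]
print(len(post_upgrade_entries(sample_log)), len(post_upgrade_errors(sample_log)))
```

To use it on a real file, open arerror.log and pass the file object as `log_lines`.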
After automatic rollback of AR, when you click the Application License list on the User form, the following error is displayed:
The specified menu is invalid. (ARERR 9372)
As a workaround, manually enter the license details in the Application License field.
When zero-downtime upgrade fails, platform components and file system are rolled back to the older version either automatically or manually.
- Automatic rollback: The installer (AR, Atrium Core, and AI) triggers rollback whenever upgrade fails.
- Manual rollback: Every installer provides a rollback utility. You have to run it manually if the automatic rollback fails.
The database remains upgraded even if the upgrade fails. As database is upgraded, some of the forms may display additional fields introduced in 9.1.04 but do not function.
The graphic below shows the platform components that are rolled back at different stages of zero-downtime upgrade:
While performing zero-downtime upgrade, BMC recommends that you do not cancel the installer after clicking Install. If you cancel the installer, you have to manually run the rollback utility and use the scripts to rollback the respective platform component to the older version. For example, after performing zero-downtime upgrade successfully for AR and CMDB, if you cancel the AI installer, you have to run the rollback utility and use the script to roll back AI to the older version.
Note: Rollback utility works if the zdt backup folder is available. If the backup folder is deleted, manual rollback does not work.
Do not delete the backup folder. The installer deletes the backup folder after successfully upgrading the platform components on an upgraded server. For example, if you have AR, Atrium Core, and AI system, only after completing the AI upgrade, the AI installer deletes the backup folder of AR, Atrium Core, and AI. If the automatic cleanup of backup fails, you must run the cleanup utility from the installation folder.
Running the rollback utility manually
You must run the rollback utility manually only when one of the following conditions arises:
- Zero-downtime upgrade of Remedy platform failed and automatic rollback failed. For example, automatic rollback did not copy some of the files or folders.
- Zero-downtime upgrade of Remedy platform is cancelled explicitly or abruptly.
When automatic rollback fails on a server in a server group, run the rollback utility on the server where automatic rollback failed.
BMC suggests that you do not run the rollback utility after all servers of a server group are successfully upgraded, because a couple of the tasks that run after the upgrade cannot be reversed through rollback.
To run the rollback utility
The table below lists the actions to be performed if automatic rollback fails:
|Run the rollback.sh (UNIX) or rollback.bat (Windows) utility located at <AR installation directory>\AR\installcompletionutility\ rollback.sh or rollback.bat.|
|Run the rollback.sh (UNIX) or rollback.bat (Windows) utility located at <AI installation directory>\AI\installcompletionutility\ rollback.sh or rollback.bat.|
|Cancel the installer for a platform component|
Run the rollback utility from the respective installation directory and pass the required parameter.
For example, while performing the zero-downtime upgrade for Atrium Core, if you cancel the installer, you must run the rollback utility from the Atrium Core installation folder manually to roll back CMDB to the older version. You have to pass on the AtriumCore parameter through the rollback utility in the following manner:
While passing the
For more information regarding the Rollback utility, see
Free and available ports
While you install the BMC Remedy AR System server, if the port number you provide during the installation of the BMC Remedy AR System components is already in use, the following error messages are displayed in the error log or on the installation panels during preinstallation:
The AR System Server TCP Port Number :: PortNumber in use. Specify an unused port address between 1024 and 65535
The Java Plugin Server TCP Port Address :: PortNumber in use. Specify an unused port address between 1024 and 65535
You must change the port number and then continue with the installation. For more information, see Overview of the portmapper service.
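You can check a candidate port before you enter it in the installer. The following is a minimal sketch that uses the Python standard library; it is not part of the installer, and the function names are hypothetical. It tries to bind a TCP socket to the port, which fails if another process already holds it.

```python
import socket


def port_in_range(port):
    """Return True if port is within the range the installer accepts."""
    return 1024 <= port <= 65535


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to host:port.

    An empty host means all local interfaces.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def port_is_usable(port, host=""):
    """Combine the range check and the availability check."""
    return port_in_range(port) and port_is_free(port, host)
```

Run the check on the machine where you are installing the server. A port that is free now can still be taken by another process before the installer uses it.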
Troubleshooting data and workflow import issues
During installation, if you encounter data import issues (.ARX file) or workflow import issues (.DEF file), perform the following steps for troubleshooting:
- Examine the ProductName_Install_log.txt file located in the Temp directory.
- Look for a statement with RIKj exceptions, followed by -n and -l parameters, where -n indicates the name of the log file and -l indicates the location of the log file.
- Refer to the log file, for example logfilename.log, indicated by the -n parameter for specific information on the failure. Ideally, this file is located in the InstallDirectory\Logs directory.
- If an error has occurred while importing a particular file, a message is written to the corresponding error file, for example logfilename_error.log file, located in the InstallDirectory\Logs directory.
The message indicates the reason for the import failure.
- In the logfilename_error.log file, note the time stamp of the failure entry.
- Scan the logfilename.log file for the entries immediately preceding the failure entry in the chronological order.
- If the error message in the log file does not describe the issue, enable SQL and API logging and import the .DEF or .ARX file again. This will reproduce the error with additional information.
When you enable these logging options, log files are created in the following directory:
- (Microsoft Windows) ApplicationInstallDirectory\db
- (UNIX) ApplicationInstallDirectory/db
You can change the log file names and their locations at any time:
- In the BMC Remedy AR System Administration Console, click on System > General > Server Information > Log Files
- On the Log Files tab, change the name and location of the required log files.
To enable SQL and API logging:
- In the BMC Remedy AR System Administration Console, click on System > General > Server Information > Log Files.
On the Log Files tab, select the API Log and SQL Log checkboxes.
- Examine the status of all the data and workflow imports in the ApplicationInstallDirectory\Logs\applicationName_error.html file, which is generated when an installation fails.
- Examine the arerror.log for issues with the server. For more information about arerror.log, see from the BMC Atrium Core documentation.
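The step of matching the failure time stamp from logfilename_error.log against the entries in logfilename.log can be sketched as follows. The log line format is not specified in this topic, so the sketch takes entries that are already parsed into (datetime, message) pairs; parsing the lines is left to you. The function name and the sample entries are hypothetical.

```python
from datetime import datetime


def entries_before_failure(log_entries, failure_time, count=5):
    """Return up to count entries logged at or before failure_time.

    log_entries: iterable of (datetime, message) tuples in any order.
    The result is in chronological order, oldest first.
    """
    if count <= 0:
        return []
    earlier = sorted(
        (entry for entry in log_entries if entry[0] <= failure_time),
        key=lambda entry: entry[0],
    )
    return earlier[-count:]


# Invented sample entries, for illustration only.
entries = [
    (datetime(2018, 1, 1, 10, 0, 0), "import started"),
    (datetime(2018, 1, 1, 10, 0, 5), "importing first file"),
    (datetime(2018, 1, 1, 10, 0, 9), "importing second file"),
    (datetime(2018, 1, 1, 10, 1, 0), "later activity"),
]
failure = datetime(2018, 1, 1, 10, 0, 10)
for when, message in entries_before_failure(entries, failure, count=2):
    print(when.isoformat(), message)
```

Entries with the same time stamp as the failure are included, because the main log may record the failing operation at that same moment.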
Escalations stop running after upgrading in server group environment
If escalations do not run after you have successfully upgraded the BMC Remedy AR System server, change the Disable-Escalations-Global configuration parameter to F.
See, in BMC Remedy AR System documentation.
In a server group environment this configuration parameter is shared among all the servers. If you set this parameter on any one of the servers, it will be applicable for all the servers.
For a non-server group environment, this parameter is same as Disable-Escalations. If you configure Disable-Escalations parameter, Disable-Escalations-Global is automatically updated with the same value and vice-versa.
The Disable-Escalations-Global configuration parameter is enabled (set to T) in the following scenarios:
- In a 9.x server group environment, you removed one of the secondary servers from the server group while it still points to the same database as the server group. You then set Disable-Escalations to T. As a result, the Disable-Escalations-Global parameter is set to T, and none of the escalations work in the server group environment.
- In a non-server group environment, you set the Disable-Escalations parameter to T and the Disable-Escalations-Global parameter to T. If you edit the ar.cfg file and change the Disable-Escalations parameter to F, Centralized Configuration does not update the global parameter Disable-Escalations-Global to F; it only updates the local parameter Disable-Escalations to F, and therefore escalations do not work in your environment.
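Both scenarios come down to the same rule: escalations run only when neither parameter is set to T. The sketch below models that rule as this topic describes it, so that you can see which parameter to change. It is an illustration, not server code, and the function names are hypothetical.

```python
def is_true(value):
    """Interpret a T/F configuration value; anything but T counts as false."""
    return value.strip().upper() == "T"


def escalation_state(disable_local, disable_global):
    """Return which parameter, if any, stops escalations from running.

    disable_local:  value of Disable-Escalations
    disable_global: value of Disable-Escalations-Global
    """
    if is_true(disable_global):
        return "blocked by Disable-Escalations-Global"
    if is_true(disable_local):
        return "blocked by Disable-Escalations"
    return "enabled"


# Second scenario above: the local value was changed to F, the global was not.
print(escalation_state("F", "T"))
```

In the second scenario the local parameter alone looks correct, which is why the global parameter has to be checked as well.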