Troubleshooting cell startup failures
Content contributed by Steve Mundy
This section describes how to troubleshoot cell startup failures.
Enabling cell tracing and starting the cell in the foreground
To determine why the cell is unable to start, enable cell tracing and start the cell in foreground mode. Foreground mode is the preferred way to troubleshoot startup failures because the cell prints its messages to the console even when it cannot write to the trace file.
- In the pw/server/etc/<cell> directory, create a file named mcell.trace (if it is not already present) with the following contents: ALL ALL stderr
- Start the cell in foreground mode: mcell -d -n <cell>
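The two steps above can be scripted. The following Python sketch is a minimal illustration, not a BMC-supplied tool. It assumes that mcell is on the PATH, that etc_dir points at your pw/server/etc directory, and that appending the ALL ALL stderr line to an existing mcell.trace is acceptable. It also keeps a copy of the foreground output in a log file so the messages can be reviewed afterwards.

```python
import os
import subprocess
import sys

TRACE_LINE = "ALL ALL stderr"


def enable_cell_trace(etc_dir: str, cell: str) -> str:
    """Ensure <etc_dir>/<cell>/mcell.trace sends all trace output to stderr.

    Assumption: if the file already exists, the ALL ALL stderr line is
    appended (once) rather than replacing the existing contents.
    """
    trace_path = os.path.join(etc_dir, cell, "mcell.trace")
    os.makedirs(os.path.dirname(trace_path), exist_ok=True)
    existing = ""
    if os.path.exists(trace_path):
        with open(trace_path, encoding="utf-8") as f:
            existing = f.read()
    if TRACE_LINE not in existing.splitlines():
        with open(trace_path, "a", encoding="utf-8") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # keep the new setting on its own line
            f.write(TRACE_LINE + "\n")
    return trace_path


def foreground_command(cell: str) -> list[str]:
    """Command line that starts the cell in foreground mode."""
    return ["mcell", "-d", "-n", cell]


def run_foreground(cell: str, log_path: str) -> int:
    """Start the cell in the foreground, echoing its output and saving a copy."""
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        foreground_command(cell),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the trace goes to stderr; merge it with stdout
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    return proc.returncode
```

If the cell does start, mcell -d keeps running in the foreground, so run_foreground blocks until the cell exits or you stop it.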
The trace is displayed in the console window and shows why the cell failed to start. The common causes of cell startup failure are as follows:
Cell fails to start due to a corrupt mcdb
Starting the cell in the foreground displays the following error:
The preceding error message indicates an inconsistency in the state file (mcdb). In this situation, you must revert to a previous mcdb file located in the same directory. A dir listing shows:
Directory of C:\Program Files\BMC Software\ProactiveNet\pw\server\var\Admin
15/05/2013 10:38 <DIR> .
15/05/2013 10:38 <DIR> ..
15/05/2013 10:38 11 datid.txt
15/05/2013 10:38 11 evtid.txt
15/05/2013 10:38 0 mcdb
15/05/2013 08:26 237,483 mcdb.11932a910
15/05/2013 09:28 237,326 mcdb.119338a70
15/05/2013 10:17 237,495 mcdb.119344760
14/05/2013 12:51 12 smid
15/05/2013 10:37 4,286 xact
15/05/2013 09:28 47,343 xact.119338a70.1
15/05/2013 10:16 38,431 xact.119344760.1
15/05/2013 10:17 299 xact.119344870.1
15/05/2013 10:37 9,105 xact.119349320.1
12 File(s) 811,802 bytes
2 Dir(s) 10,376,409,088 bytes free
The dir listing shows that the current mcdb is 0 bytes, that mcdb.119344760 is the most recent previous state file, and that xact.119349320.1 is the transaction file that needs to be reapplied. Follow these steps to avoid losing data:
- Back up the cell's var directory
- Rename mcdb to mcdb.bak
- Rename mcdb.119344760 to mcdb
- Rename xact to xact.2
- Rename xact.119349320.1 to xact.1
- Run statbld -n <cell> to build a new mcdb from the xact.1 and xact.2 files
- After statbld completes successfully, start the cell.
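The restore procedure above can be sketched in Python. These are hypothetical helpers, not part of the product: plan_restore picks the newest mcdb.<hex> and xact.<hex>.1 files by modification time, mirroring how they were identified from the dir listing, and apply_restore copies the var directory to a separate backup location before performing the four renames in order. The sketch assumes the cell is stopped, and it leaves running statbld -n <cell> to you so you can review the plan first.

```python
import os
import re
import shutil
from datetime import datetime


def _newest(var_dir: str, pattern: str) -> str:
    """Name of the most recently modified file whose name fully matches pattern."""
    with os.scandir(var_dir) as entries:
        matches = [(e.stat().st_mtime, e.name) for e in entries
                   if e.is_file() and re.fullmatch(pattern, e.name)]
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {var_dir}")
    return max(matches)[1]


def plan_restore(var_dir: str) -> list[tuple[str, str]]:
    """(source, destination) renames for reverting to the last good mcdb."""
    mcdb_backup = _newest(var_dir, r"mcdb\.[0-9a-f]+")       # e.g. mcdb.119344760
    xact_journal = _newest(var_dir, r"xact\.[0-9a-f]+\.1")   # e.g. xact.119349320.1
    return [
        ("mcdb", "mcdb.bak"),      # keep the corrupt state file
        (mcdb_backup, "mcdb"),     # previous state file becomes current
        ("xact", "xact.2"),        # current transaction log
        (xact_journal, "xact.1"),  # transactions to reapply
    ]


def apply_restore(var_dir: str, plan: list[tuple[str, str]], backup_root: str) -> str:
    """Back up var_dir under backup_root, then perform the renames in order."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = os.path.join(backup_root, "var_backup_" + stamp)
    shutil.copytree(var_dir, backup_dir)  # backup_root must be outside var_dir
    for src, dst in plan:
        dst_path = os.path.join(var_dir, dst)
        if os.path.exists(dst_path):
            raise FileExistsError(f"refusing to overwrite {dst_path}")
        os.rename(os.path.join(var_dir, src), dst_path)
    return backup_dir
```

Printing the result of plan_restore before calling apply_restore lets you confirm that the selected files match what the dir listing shows.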
Cell fails to start due to statbld not working
Starting the cell in the foreground shows the following error:
The preceding error messages indicate a problem with the statbld process, which can fail for a number of reasons. First, run the mlogchk -n <cell> command, which performs a consistency check and advises of any action required. If mlogchk does not find any inconsistency, run statbld with tracing enabled:
- In the pw/server/etc directory, edit the statbld.trace file so that it contains: ALL ALL stderr
- Run statbld from a command window: statbld -n <cell>
The trace is displayed in the console window and shows the reason for the statbld failure.
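If you need to repeat the statbld trace, a small context manager can apply the ALL ALL stderr setting only for the duration of the run. This is a sketch under assumptions: temporary_trace and run_statbld_traced are hypothetical helpers, statbld is assumed to be on the PATH, and restoring the original statbld.trace afterwards is a design choice to avoid leaving full tracing enabled, not a documented requirement. Run mlogchk -n <cell> and review its advice first, as described above.

```python
import contextlib
import os
import subprocess


@contextlib.contextmanager
def temporary_trace(trace_path: str, contents: str = "ALL ALL stderr\n"):
    """Write contents to trace_path for the duration of the block, then restore it."""
    original = None
    if os.path.exists(trace_path):
        with open(trace_path, encoding="utf-8") as f:
            original = f.read()
    with open(trace_path, "w", encoding="utf-8") as f:
        f.write(contents)
    try:
        yield trace_path
    finally:
        if original is None:
            os.remove(trace_path)  # the file did not exist before
        else:
            with open(trace_path, "w", encoding="utf-8") as f:
                f.write(original)


def run_statbld_traced(etc_dir: str, cell: str) -> int:
    """Run statbld -n <cell> with full tracing enabled; return its exit status."""
    with temporary_trace(os.path.join(etc_dir, "statbld.trace")):
        return subprocess.run(["statbld", "-n", cell]).returncode
```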
Cell fails to start with message "Impossible to bind endpoint"
Starting the cell in the foreground shows the following error:
This indicates that the cell was unable to bind to the port defined for it in the pw\server\etc\mcell.dir file. In this case, the message shows port 1827. The known causes of this problem are:
- Another process is already using the port. Run netstat to check whether anything is already listening on it.
- This is an HA cell, and the definitions for the cell in the mcell.dir files on the primary and secondary servers differ.
- This is a secondary HA cell, and its mcell.conf incorrectly contains CellDuplicateMode=1.
- The cell is already running.
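Two of the causes above can be checked with a short script. In this sketch, port_in_use tries to bind the port itself and reports whether the operating system rejects it as already in use, and duplicate_mode_enabled is a hypothetical helper that assumes mcell.conf uses simple Key=Value lines with # comments and that the last setting wins. Neither replaces netstat, and the bind test is only a heuristic: run it on the cell's host while the cell is stopped.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Heuristic: True if the OS refuses to bind host:port because it is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied: report it rather than guess
    return False


def duplicate_mode_enabled(conf_text: str) -> bool:
    """Hypothetical check for CellDuplicateMode=1 in mcell.conf text.

    Assumes Key=Value lines, '#' comments, and that the last setting wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, val = line.partition("=")
        if sep and key.strip() == "CellDuplicateMode":
            value = val.strip()
    return value == "1"
```

For example, port_in_use(1827) returning True while the cell is stopped points to another process holding the port. On Windows, netstat -ano also shows the ID of the owning process.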
Cell fails to start with message "BMC-IMC032205F: Cannot read knowledge base file"
Starting the cell in the foreground shows the following error:
20130515 160755.223000 mcell: EVTPROC: BMC-IMC090004F: Failed to load knowledgebase definitions
This indicates that the cell is unable to load the knowledge base (KB). Open a command window and run mccomp -n <cell> to recompile the KB, resolve any errors it reports, and then start the cell again.
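The recompile-and-restart step can be sketched as follows. It assumes, without confirmation from this article, that mccomp exits with a non-zero status when compilation fails, and it starts the cell in foreground mode so that any remaining errors are visible. The run parameter exists only so the function can be exercised without the BMC binaries installed.

```python
import subprocess


def recompile_kb_then_start(cell: str, run=subprocess.run) -> bool:
    """Recompile the KB with mccomp; start the cell in the foreground only if it compiles.

    Assumption: mccomp returns a non-zero exit status when compilation fails.
    """
    if run(["mccomp", "-n", cell]).returncode != 0:
        return False  # resolve the errors mccomp reported, then try again
    run(["mcell", "-d", "-n", cell])  # blocks while the cell runs in the foreground
    return True
```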