Analyzing a CI's critical failures
After service components and associated functions are identified, you need to monitor their status to analyze their effects and watch for failures. To do so, perform the following tasks:
- Identify the causes of failures and degraded performance for the service CI.
- Categorize the failures into availability, performance, and capacity.
- Identify the effects of the failures.
- Assign a severity level to each failure.
Severity level values are listed in the following table:
Severity level index
- Assign a frequency or occurrence level to each failure.
Occurrence level index values are listed in the following table:
Occurrence level index
Sample failure mode and effects analysis
- Component–Message Transfer Agent (MTA)
- Function–Routes and converts messages
- Point of failure–Queue length growing
- Issue type–Performance
- Cause of failure–Network connection failure, receiving MTA failure, problem on the sending or receiving computer
- Effect of failure–Remote recipients do not receive email messages while the MTA is down
- Severity–Significant
- Occurrence–Slight
- Prevention–Monitoring of the system, network, and Exchange services
- Detection–PATROL NT and Exchange parameters related to the issue
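The sample above can also be kept as a structured record, which makes it easier to filter failures by category when you analyze several components. The following Python sketch is illustrative only: the names FailureMode, IssueType, and by_issue_type are hypothetical and are not part of any product API. Severity and occurrence are stored as plain labels because the index tables are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class IssueType(Enum):
    """The three failure categories named in the task list."""
    AVAILABILITY = "Availability"
    PERFORMANCE = "Performance"
    CAPACITY = "Capacity"


@dataclass(frozen=True)
class FailureMode:
    """One row of a failure mode and effects analysis (hypothetical layout)."""
    component: str
    function: str
    point_of_failure: str
    issue_type: IssueType
    causes: Tuple[str, ...]
    effect: str
    severity: str      # label from the severity level index
    occurrence: str    # label from the occurrence level index
    prevention: str
    detection: str


# The MTA sample from this topic, expressed as a record.
mta_queue_growth = FailureMode(
    component="Message Transfer Agent (MTA)",
    function="Routes and converts messages",
    point_of_failure="Queue length growing",
    issue_type=IssueType.PERFORMANCE,
    causes=(
        "Network connection failure",
        "Receiving MTA failure",
        "Problem on the sending or receiving computer",
    ),
    effect="Remote recipients do not receive email messages while the MTA is down",
    severity="Significant",
    occurrence="Slight",
    prevention="Monitoring of the system, network, and Exchange services",
    detection="PATROL NT and Exchange parameters related to the issue",
)


def by_issue_type(records: List[FailureMode], issue_type: IssueType) -> List[FailureMode]:
    """Return only the failure modes that fall into the given category."""
    return [r for r in records if r.issue_type is issue_type]


performance_issues = by_issue_type([mta_queue_growth], IssueType.PERFORMANCE)
print(len(performance_issues), performance_issues[0].component)
```

Using an enumeration for the issue type rejects categories other than availability, performance, and capacity, while free-text fields such as the causes and effect stay as you recorded them.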