Modifying reconciliation identification rules to avoid duplicates
After investigating and ruling out other potential causes of duplicate CIs, you may find that it is not possible to uniquely identify a CI with the out-of-the-box identification rules. The following are three ways to address this situation.
Solving reconciliation identification issues by adding identification attributes
If the combination of identification attributes is not sufficient to uniquely identify a CI, it may be necessary to add another attribute to be used for identification.
For example, LPARs are populated to the BMC_ComputerSystem class.
The attributes used to identify computer systems include Hostname, Domain, SerialNumber, isVirtual, and TokenId. Several LPARs on a host may have the same values for all five of these attributes, so these attributes alone are not sufficient to uniquely identify the LPAR. To distinguish the LPARs, an attribute could be added to the BMC_ComputerSystem class and used in the identification rules.
For example, if the PartitionId attribute were discovered from the LPAR and populated to an attribute on the computer system, this attribute plus the other five may be sufficient to uniquely identify the LPAR.
This is the most general solution, but it requires more steps. You should:
- Identify a suitable attribute or add one to the BMC_ComputerSystem class.
- Have each discovery source populate the attribute with a consistent value which can be discovered from the LPAR.
- Update Reconciliation identification rules to use this attribute.
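The steps above can be sketched in Python. This is a simplified model of attribute-based identification, not the CMDB's actual identification engine; the attribute names mirror the BMC_ComputerSystem attributes discussed above, and the sample values are hypothetical.

```python
# Identification attributes discussed above for BMC_ComputerSystem.
ID_ATTRS = ("HostName", "Domain", "SerialNumber", "isVirtual", "TokenId")

def matches(source_ci, target_ci, attrs=ID_ATTRS):
    """True if every identification attribute has the same non-null value."""
    return all(
        source_ci.get(a) is not None and source_ci.get(a) == target_ci.get(a)
        for a in attrs
    )

def identify(source_ci, target_dataset, attrs=ID_ATTRS):
    """Return matching target CIs; more than one match means ambiguity."""
    return [ci for ci in target_dataset if matches(source_ci, ci, attrs)]

# Two LPARs on the same host share all five standard attribute values
# but differ in PartitionId (hypothetical sample data):
lpar_a = {"HostName": "hostA", "Domain": "corp", "SerialNumber": "S1",
          "isVirtual": True, "TokenId": "T1", "PartitionId": "1"}
lpar_b = dict(lpar_a, PartitionId="2")
target = [lpar_a, lpar_b]

# The five standard attributes match both LPARs (ambiguous):
print(len(identify(lpar_a, target)))                               # 2
# Adding PartitionId to the rule disambiguates them:
print(len(identify(lpar_a, target, ID_ATTRS + ("PartitionId",))))  # 1
```

The sketch shows why a "multiple match" error occurs with the standard attributes, and how one additional discovered attribute resolves it.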
The example problem occurred in earlier versions of BMC Atrium Discovery and Dependency Mapping (ADDM) and BMC Atrium CMDB Suite (CMDB). It was resolved by adding and populating the PartitionId attribute. On earlier versions of the product, the workaround described in Knowledge article KA344625 was used, which is an example of the approach described below: Solving reconciliation identification issues using an IntegrationId attribute.
Solving reconciliation identification issues using an IntegrationId attribute
Many methods of bulk loading data into the CMDB use an IntegrationId attribute as a foreign key to the CI in the discovery product or data source. Examples of IntegrationIds include:
- ADDMIntegrationId – used by BMC Atrium Discovery and Dependency Mapping product
- CDIntegrationId – used by the BMC BladeLogic Server Automation and BMC BladeLogic Client Automation products
- SMSParentID – used by some integrations which load data from Microsoft System Center Configuration Manager (SCCM)
Since the IntegrationId is a unique identifier within the dataset, it may be tempting to use it as an identification attribute.
The IntegrationId attribute is not a good primary identification attribute because:
- The value will be NULL for CIs merged from other datasets
- No CIs in the target dataset will have a matching IntegrationId value until after the CI is identified and merged into the target dataset.
For these reasons, the IntegrationId is only useful as an additional identification attribute to handle the case of re-identification of a dataset, or for CIs which are only populated by one data source.
To investigate whether this solution is viable:
- Determine if there is only one data provider which populates the class in the CMDB. For example, BMC Atrium Discovery and Dependency Mapping populates the BMC_SoftwareServer class, but no other known data sources populate that class at this time.
- Determine if there is only one data provider which populates the CI to the CMDB. For example, it is common to have one data provider for servers and another discovery source for laptops and workstations. If there is a logical separation of the devices discovered by different data sources, this also makes it possible to match on an IntegrationId attribute.
- Determine which other identification rules besides Computer System encounter “multiple match” errors during reconciliation.
- Add the IntegrationId attribute to those identification rules as well.
- Determine whether the value of IntegrationId utilized is ever re-used, or if there are scenarios where the value of IntegrationId may change.
The area of risk is when a CI is removed from the discovery source because it was not found, and is then re-discovered at a later time. Problems arise if the discovery provider re-uses the IntegrationId for a different CI, or changes the IntegrationId value for the same computer system. A persistent, predictable IntegrationId value is needed to use it for identification. For an example of this approach, see Knowledge article KA344629.
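The role of the IntegrationId as an additional rule, rather than a primary one, can be illustrated with a small Python sketch. This is a simplified model under assumed sample data, not the actual reconciliation engine: the first rule matches on standard attributes, and the ADDMIntegrationId rule only helps when the target CI already carries a value (re-identification of the same dataset), because CIs merged from other datasets carry NULL.

```python
def match_on(attrs):
    """Build a rule that matches when all attrs have equal non-null values."""
    def rule(src, tgt):
        return all(src.get(a) is not None and src.get(a) == tgt.get(a)
                   for a in attrs)
    return rule

# Rules are tried in order; the IntegrationId rule is additional, not primary.
rules = [
    match_on(("HostName", "Domain", "SerialNumber")),  # primary rule
    match_on(("ADDMIntegrationId",)),                  # additional rule
]

def identify(src, target_dataset):
    for rule in rules:
        hits = [t for t in target_dataset if rule(src, t)]
        if hits:
            return hits
    return []

# A CI merged from another data source has no ADDMIntegrationId, so the
# IntegrationId rule alone could never find it -- the primary rule must:
merged = {"HostName": "web01", "Domain": "corp", "SerialNumber": "S9",
          "ADDMIntegrationId": None}
rediscovered = {"HostName": "web01", "Domain": "corp", "SerialNumber": "S9",
                "ADDMIntegrationId": "addm-123"}
print(len(identify(rediscovered, [merged])))        # 1 (primary rule)

# Re-identification case: the serial number changed, but the persistent
# IntegrationId still matches the previously merged CI:
reidentified = dict(merged, SerialNumber="S10", ADDMIntegrationId="addm-123")
print(len(identify(rediscovered, [reidentified])))  # 1 (IntegrationId rule)
```

The second case only works because the IntegrationId value is persistent, which is exactly the risk area described above.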
Solving reconciliation identification issues by adding a fallback identification rule
If you select the “Generate IDs” option for the source dataset in the Identify activity, reconciliation generates a new identity when no match is found in the target dataset with any of the identification rules. If duplicate CIs are created, that means none of the identification rules found a match and the CI was auto-identified. One way to address this is to add another identification rule that runs after all of the existing ones. The goal is to add a less restrictive match than the earlier identification rules. For example, if the hostname is unique within the enterprise, a fallback identification rule can be added to match on hostname alone.
Always query the data in the source dataset to verify the attribute is as unique as expected before implementing a fallback rule. For example, LPARs generally produce multiple computer systems with the same hostname value, so matching on hostname alone is not viable as an out-of-the-box rule. Adding a fallback identification rule that is too open can cause over-reconciliation, where two different computer systems are merged into one in the target dataset, so be careful to validate your assumptions about uniqueness within your environment.
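The recommended pre-flight uniqueness check can be sketched in Python. This is a hypothetical helper over sample data, not a CMDB query, but it models the same question: does the candidate fallback attribute repeat within the source dataset?

```python
from collections import Counter

def check_uniqueness(dataset, attr):
    """Return attribute values that occur more than once in the dataset."""
    counts = Counter(ci.get(attr) for ci in dataset if ci.get(attr) is not None)
    return {value for value, n in counts.items() if n > 1}

# Hypothetical source dataset: two LPARs share a hostname.
source = [
    {"HostName": "lpar01", "SerialNumber": "S1"},
    {"HostName": "lpar01", "SerialNumber": "S2"},
    {"HostName": "web01",  "SerialNumber": "S3"},
]

# Hostname is NOT unique here, so a hostname-only fallback rule would
# over-reconcile the two LPARs into one target CI:
print(check_uniqueness(source, "HostName"))      # {'lpar01'}
# SerialNumber has no repeats, so it would be a safer fallback attribute:
print(check_uniqueness(source, "SerialNumber"))  # set()
```

A non-empty result means the fallback rule on that attribute would merge distinct computer systems.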
Testing reconciliation job changes to address duplicate CIs
Remove duplicate CIs from the target dataset before testing reconciliation changes. See Removing duplicate CIs from the CMDB for more details on this process. You must test on a development or QA server. It may be helpful to test with a few representative computer systems, or to remove all data from the target dataset as a simple way to meet the requirement that no duplicate CIs exist in the target dataset while validating reconciliation changes.
To test changes to the reconciliation job, use cmdbdiag to reset the
ReconciliationIdentity for the CIs to be tested. This causes the identification activity to attempt to identify the CIs again. See Resetting reconciliation identities for more details.
Test each of the following scenarios:
- No data exists in the target dataset, and the data in the source dataset is unidentified.
This is a simple case where issues are rarely uncovered, but it is the initial situation for data population.
- Data already exists in the target dataset, but the entire source dataset is unidentified.
This is an uncommon situation, but it is a good stress test that confirms that the reconciliation rules are effective in identifying all the data, independent of the order the data was discovered and populated to the target dataset.
- Two or more source datasets are populated with some of the same computer systems, but some different computer systems.
This is a test that the computer systems are identified as the same in the target dataset, and not duplicated.
- Data already exists in the target dataset from multiple source datasets, and all of the source datasets are entirely unidentified.
This is a more comprehensive stress test.
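The test cycle above can be sketched as a small Python harness. This is a hypothetical client-side model: resetting ReconciliationIdentity is actually done server-side with cmdbdiag, and the attribute and key names here are assumptions for illustration.

```python
def reset_identities(dataset):
    """Mimic resetting ReconciliationIdentity so CIs are re-identified."""
    for ci in dataset:
        ci["ReconciliationIdentity"] = None

def duplicate_keys(target, key_attrs=("HostName", "SerialNumber")):
    """Return identification-key tuples appearing more than once in the target."""
    seen, dups = set(), set()
    for ci in target:
        key = tuple(ci.get(a) for a in key_attrs)
        if key in seen:
            dups.add(key)
        seen.add(key)
    return dups

# Before each scenario: reset the source CIs so identification runs again.
source = [{"HostName": "web01", "SerialNumber": "S1",
           "ReconciliationIdentity": "OI-123"}]
reset_identities(source)
print(source[0]["ReconciliationIdentity"])  # None

# After running the job: an empty result means no CI was duplicated.
target = [{"HostName": "web01", "SerialNumber": "S1"},
          {"HostName": "web01", "SerialNumber": "S1"}]
print(duplicate_keys(target))  # {('web01', 'S1')}
```

Running a duplicate check like this after each of the four scenarios gives a quick pass/fail signal for the rule changes.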