Modifying reconciliation identification rules to avoid duplicates

After you investigate and rule out other potential causes of duplicate CIs, the issue may be that the out-of-the-box identification rules cannot uniquely identify a CI. Below are three ways to address this situation.

To solve reconciliation identification issues by adding identification attributes

If the combination of identification attributes is not sufficient to uniquely identify a CI, it may be necessary to add another attribute to be used for identification.

For example, LPARs are populated to the BMC_ComputerSystem class.

The attributes used to identify computer systems include Hostname, Domain, SerialNumber, isVirtual, and TokenId. Several LPARs on a host may have the same values for all five of these attributes, so these attributes alone are not sufficient to uniquely identify an LPAR. To distinguish the LPARs from one another, an attribute could be added to the BMC_ComputerSystem class and used in the identification rules.

For example, if the partition ID were discovered from the LPAR and populated to a PartitionId attribute on BMC_ComputerSystem, this attribute plus the other five attributes may be sufficient to uniquely identify the LPAR.

Perform the following steps to resolve reconciliation identification issues:

Identify a suitable attribute or add an attribute to the BMC_ComputerSystem class.
Ensure that every discovery source populates the attribute with a consistent value that can be discovered from the LPAR.
Update reconciliation identification rules to use this attribute.
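
Before changing the rules, it can help to confirm that the candidate attribute really removes the ambiguity. The following is a minimal sketch that works on exported records held in memory, not on the CMDB itself. The attribute names come from the text above; the sample values, the TokenId format, and the find_ambiguous_groups helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical exported records: attribute names follow the text, values are invented.
LPARS = [
    {"Name": "lpar-a", "Hostname": "host01", "Domain": "example.com",
     "SerialNumber": "SN100", "isVirtual": "Yes",
     "TokenId": "host01:example.com", "PartitionId": "1"},
    {"Name": "lpar-b", "Hostname": "host01", "Domain": "example.com",
     "SerialNumber": "SN100", "isVirtual": "Yes",
     "TokenId": "host01:example.com", "PartitionId": "2"},
    {"Name": "srv-c", "Hostname": "host02", "Domain": "example.com",
     "SerialNumber": "SN200", "isVirtual": "No",
     "TokenId": "host02:example.com", "PartitionId": None},
]

OUT_OF_BOX = ("Hostname", "Domain", "SerialNumber", "isVirtual", "TokenId")


def find_ambiguous_groups(records, attributes):
    """Return {attribute values: [names]} for value sets shared by more than one record."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(attr) for attr in attributes)
        groups[key].append(record["Name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# The five attributes alone leave lpar-a and lpar-b indistinguishable.
print(find_ambiguous_groups(LPARS, OUT_OF_BOX))
# Adding PartitionId to the comparison leaves no ambiguous group in this sample.
print(find_ambiguous_groups(LPARS, OUT_OF_BOX + ("PartitionId",)))
```

An empty result for the extended attribute list only shows that the sample is unambiguous; the same check would need to be run against data from every discovery source that populates the class.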

Note

The example problem occurred in earlier versions of BMC Discovery and BMC Helix CMDB (CMDB). It was resolved by adding and populating the PartitionId attribute. On earlier versions of the product, the workaround described in Knowledge Article number 000344625 (Support logon ID required) was used, which is an example of the IntegrationId approach described below.

To solve reconciliation identification issues using an IntegrationId attribute

Many methods of bulk loading data to the CMDB use an IntegrationId attribute as a foreign key to the CI in the discovery product or data source. Examples of IntegrationId attributes include:

ADDMIntegrationId – used by BMC Discovery product
CDIntegrationId – used by BMC BladeLogic for Server Automation and BMC BladeLogic for Client Automation products
SMSParentID – used by some integrations that load data from Microsoft System Center Configuration Manager (SCCM)

Even though the IntegrationId attribute is a unique identifier within the dataset, it is not a good primary identification attribute because:

The value will be NULL for CIs merged from other datasets.
No CI in the target dataset will have a matching IntegrationId value until after the CI is identified and merged to the target dataset.

The IntegrationId is only useful as an additional identification attribute to handle the case of re-identification of a dataset, or for CIs which are only populated by a single data source.

To investigate if the solution is usable

Determine if there is only one data provider which populates the class in the CMDB. For example, BMC Discovery populates the BMC_SoftwareServer class, but no other known data sources populate that class at a given time.
Determine if there is only one data provider which populates the CI to the CMDB. For example, it is common to have one data provider for servers and another discovery source for laptops and workstations. If there is a logical separation of the devices discovered by different data sources, this also makes it possible to match on an IntegrationId attribute.
Determine which other identification rules besides BMC_ComputerSystem encounter “multiple match” errors during reconciliation.
Also add the IntegrationId attribute to those identification rules.
Determine whether the IntegrationId value is ever reused, and whether there are scenarios where the IntegrationId value may change.
The main risk arises when a CI is not found, is removed from the discovery source, and is then rediscovered by the discovery source at a later time. In this situation, some discovery providers either reuse the IntegrationId for a different CI or change the IntegrationId value for the same computer system. A persistent, predictable IntegrationId value is needed to use it for identification. For an example of this approach, see Knowledge article KA344629.
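
The reuse and change checks described above can be sketched by comparing two exports of the same discovery source taken at different times. This is an illustration only: it assumes the exports are available as in-memory records and that SerialNumber is a trustworthy stable key for the systems being compared. The integration_id_risks helper and all sample values are hypothetical.

```python
# Hypothetical exports of one discovery source at two points in time.
BEFORE = [
    {"SerialNumber": "SN1", "ADDMIntegrationId": "id-1"},
    {"SerialNumber": "SN2", "ADDMIntegrationId": "id-2"},
    {"SerialNumber": "SN3", "ADDMIntegrationId": "id-3"},
]
AFTER = [
    {"SerialNumber": "SN1", "ADDMIntegrationId": "id-1"},  # unchanged
    {"SerialNumber": "SN2", "ADDMIntegrationId": "id-9"},  # same system, new id
    {"SerialNumber": "SN4", "ADDMIntegrationId": "id-3"},  # old id on a new system
]


def integration_id_risks(before, after, stable_key="SerialNumber",
                         id_attr="ADDMIntegrationId"):
    """Return (changed, reused): systems whose id changed, and ids now on another system."""
    id_by_key = {r[stable_key]: r[id_attr] for r in before}
    key_by_id = {r[id_attr]: r[stable_key] for r in before}
    changed, reused = [], []
    for record in after:
        key, integration_id = record[stable_key], record[id_attr]
        # Same system as before, but the source now reports a different id.
        if key in id_by_key and id_by_key[key] != integration_id:
            changed.append(key)
        # Same id as before, but it now belongs to a different system.
        if integration_id in key_by_id and key_by_id[integration_id] != key:
            reused.append(integration_id)
    return sorted(changed), sorted(reused)


changed, reused = integration_id_risks(BEFORE, AFTER)
print("changed for same system:", changed)
print("reused for different system:", reused)
```

If either list is non-empty for a data source, its IntegrationId is not persistent enough to rely on in an identification rule.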

To solve reconciliation identification issues by adding a fallback identification rule

If you select the “Generate IDs” option for the source dataset in the Identify activity, reconciliation generates a new identity when none of the identification rules finds a match in the target dataset. If duplicate CIs are created, none of the identification rules found a match and the CI was auto-identified. To address this, add a rule that matches less restrictively than the earlier identification rules.

Add another identification rule.
Configure it to run after all of the existing identification rules.
For example, if the hostname is unique to the enterprise, a fallback identification rule can be added to match only on hostname.

Before implementing a fallback rule, always query the data in your source dataset to confirm that the attribute is as unique as expected. For example, LPARs generally create multiple computer systems with the same hostname value, so matching on hostname alone is not viable as an out-of-the-box rule. A fallback identification rule that is too permissive can introduce over-reconciliation, where two different computer systems merge into one computer system in the target dataset. Verify that the attribute is unique within your environment.
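
The trade-off can be illustrated with a simplified model of ordered identification rules. This sketch is not the product's identification algorithm: the identify function, the rule tuples, the "RE-n" identity format, and the sample CIs are all assumptions made for illustration, and every sample record carries a value for every attribute used.

```python
import itertools

# One CI already in the (hypothetical) target dataset.
TARGET = [
    {"Hostname": "host01", "Domain": "example.com", "SerialNumber": "SN100",
     "ReconciliationIdentity": "RE-1"},
]

RULES = [
    ("Hostname", "Domain", "SerialNumber"),  # stands in for the existing, stricter rules
    ("Hostname",),                           # hypothetical fallback rule, evaluated last
]


def identify(ci, target, rules, id_counter):
    """Return (identity, reason) using the first rule that finds any match."""
    for rule in rules:
        matches = [t for t in target if all(t.get(a) == ci.get(a) for a in rule)]
        if len(matches) == 1:
            return matches[0]["ReconciliationIdentity"], "matched on " + "+".join(rule)
        if len(matches) > 1:
            return None, "multiple match on " + "+".join(rule)
    # No rule matched: mimic the "Generate IDs" option with a new identity.
    return f"RE-{next(id_counter)}", "generated"


counter = itertools.count(2)

# Same machine as RE-1, reported by a source that has no serial number.
same_machine = {"Hostname": "host01", "Domain": "example.com", "SerialNumber": ""}
# A different machine that happens to share the hostname.
other_machine = {"Hostname": "host01", "Domain": "other.example", "SerialNumber": "SN999"}

print(identify(same_machine, TARGET, RULES[:1], counter))  # strict rules only
print(identify(same_machine, TARGET, RULES, counter))      # with the fallback rule
print(identify(other_machine, TARGET, RULES, counter))     # fallback merges a different machine
```

In this model the fallback rule prevents the duplicate for the first CI, but it also assigns the second, unrelated machine the same identity, which is the over-reconciliation case described above.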

To test reconciliation job changes to address duplicate CIs

Remove duplicate CIs from the target dataset before testing reconciliation changes. See Removing duplicate CI's from the CMDB for more details on this process. You must test on a development or QA server. Reconciliation changes can only be validated when the CIs in the target dataset are unique, so it may be helpful to test with a few representative computer systems or to remove all data from the target dataset.

Start the cmdbdiag program.
For information about starting the cmdbdiag program, see Verifying-your-data-model.
To test changes to the reconciliation job, reset the ReconciliationIdentity for the CIs that will be tested. To perform the reset, see Cleaning-up-CMDB-data-by-using-the-CI-and-Relationship-Correction-Tool-option.
This causes the identification activity to attempt to identify the CIs again.

Test each of the following scenarios:

No data exists in the target dataset, and data in the source dataset is unidentified.
This is a simple case where issues are rarely uncovered, but it is the initial situation for data population.
Data already exists in the target dataset, but the entire source dataset is unidentified.
This is an uncommon situation, but it is a good stress test that confirms that the reconciliation rules are effective in identifying all the data, independent of the order the data was discovered and populated to the target dataset.
Two or more source datasets are populated with some of the same computer systems, but some different computer systems.
This tests that computer systems present in more than one source dataset are identified as the same CI in the target dataset and are not duplicated.
Data already exists in the target dataset from two or more source datasets, and all of those source datasets are entirely unidentified.
This is a more comprehensive stress test.
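
After each scenario, the result in the target dataset can be compared with the number of distinct systems the test data is known to contain. The sketch below assumes the test uses a few representative computer systems whose SerialNumber is a reliable ground truth, that the target dataset holds only those test systems, and that the datasets have been exported to in-memory records. The check_scenario helper and the sample data are hypothetical.

```python
def check_scenario(source_datasets, target, truth_attrs=("SerialNumber",)):
    """Compare distinct known systems in the sources with identities in the target."""
    distinct_systems = {
        tuple(ci[a] for a in truth_attrs)
        for dataset in source_datasets
        for ci in dataset
    }
    identities = {ci["ReconciliationIdentity"] for ci in target}
    expected, actual = len(distinct_systems), len(identities)
    if actual > expected:
        verdict = "duplicates"        # some system received more than one identity
    elif actual < expected:
        verdict = "over-reconciled"   # different systems were merged together
    else:
        verdict = "ok"
    return {"expected": expected, "actual": actual, "verdict": verdict}


# Two hypothetical source datasets that overlap on SN2.
DATASET_A = [{"SerialNumber": "SN1"}, {"SerialNumber": "SN2"}]
DATASET_B = [{"SerialNumber": "SN2"}, {"SerialNumber": "SN3"}]

# Hypothetical target contents after three different test runs.
GOOD = [{"ReconciliationIdentity": i} for i in ("RE-1", "RE-2", "RE-3")]
DUPLICATED = [{"ReconciliationIdentity": i} for i in ("RE-1", "RE-2", "RE-3", "RE-4")]
MERGED = [{"ReconciliationIdentity": i} for i in ("RE-1", "RE-2")]

for name, target in (("good", GOOD), ("duplicated", DUPLICATED), ("merged", MERGED)):
    print(name, check_scenario([DATASET_A, DATASET_B], target))
```

A matching count does not prove that each system received the right identity, so a spot check of individual CIs is still worthwhile.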