Investigating issues with duplicate data in CMDB

Successful identification between multiple source datasets must fulfil the following conditions:

Each data provider populates data to the same classes in the CMDB, but into different source datasets.
Each data provider populates values to the attributes used for identification.
Each data provider populates identical values to the attributes used for identification.
Reconciliation identification rules look for a match on the identification attributes, from the most restrictive to less restrictive combination of attribute values.
Reconciliation tries to identify the CI in the second source dataset after the first one has been identified and merged into the golden dataset
Reconciliation identification rules ignore default values such as NULL or 0 in identification attributes.
Data providers do not populate non-unique, non-default values like "N/A" or "None" for identification attributes.
If host names are re-used by different computer systems, they are purged from the golden dataset before the new computer system using the host name is leveraged.

Note

Some of these requirements depend on implementation decisions and quality of data before reconciliation is attempted. Below is a sequence of steps to follow to determine which requirements are met.

To check for ...	Query or operation	Results and next steps
Duplicate values of IntegrationId attribute	Perform a query in the ARSystem database for the specified DatasetId and attribute, for example BMC.ADDM and ADDMIntegrationId SELECT COUNT(),ADDMIntegrationId FROM BMC_CORE_BMC_BASEELEMENT WHERE DatasetId = 'BMC.ADDM' GROUP BY ADDMIntegrationid HAVING COUNT () > 1 ;	The query should return no results for the IntegrationId, DatasetId combination. Check if the discovery source has multiple CIs with the same value of the IntegrationId, or if one of the CIs is considered deleted in the discovery source. This may indicate a deletion was not propagated to the source dataset in CMDB. Also check if the attribute used as an IntegrationId is recycled, or re-used by the discovery source after it is considered deleted. This is another case where you would not want to leave-soft-deleted data in the CMDB. See Purging-soft-deleted-data-to-optimize-CMDB-performance
Hosts with duplicate identification attribute values within a specified DatasetId	Examine identification rules for the class and make a list of all attributes used for identification. Perform a query in the ARSystem database for a match on all these attributes within a dataset. Query: SELECT Hostname, isVirtual FROM BMC_CORE_BMC_ComputerSystem WHERE DatasetId = 'BMC.ASSET' GROUP BY HostName, Domain, SerialNumber, TokenId, isVirtual HAVING COUNT(*)>1;	If this returns any results, it means the data in the identification attributes is insufficient to uniquely identify a CI. See section below on "Solving Reconciliation issues by adding identification attributes".
Computer System Identification attributes not populated	Perform a query in the ARSystem database: SELECT Name, instanceid, DatasetId FROM BMC_CORE_BMC_ComputerSystem WHERE TokenId = '0' AND SerialNumber IS NULL AND Hostname IS NULL;	Computer systems without values for TokenId, SerialNumber, or Hostname are the primary attributes for identifying a computer system. If none of these three attributes are populated, Reconciliation is unlikely to find a match with data populated by another data source. See section below on "Solving Reconciliation identification issues using an IntegrationID attribute"
Identification attribute value consistency for computer systems	Perform a query in the ARSystem database: SELECT Name, SerialNumber, Hostname, Domain, TokenId, isVirtual FROM BMC_CORE_BMC_ComputerSystem ORDER BY Hostname, SerialNumber, DatasetId, TokenID;	View the results to verify that the identification attribute values are populated to both datasets in the same format, with consistent values and case. Also scan for meaningless values like N/A, None, or Unknown.
Identification attribute value consistency for weak members between datasets	Perform a query in the ARSystem database, for example, on the BMC_OperatingSystem class: SELECT Name, DatasetId, TokenId FROM BMC_CORE_BMC_OperatingSystem GROUP BY Name, DatasetId, TokenId;	View the results to verify that the Name value is used consistent across datasets. For example, if the Name attribute is used to identify OperatingSystems, it is important that both data sources provide the same value of the operating system name. Displaying the query results on classes of interest will keep the results list short enough to scan. If the OperatingSystem name is different for different datasets, consider using Normalization before running reconciliation.
Sequence of operations	Select an example of a CI duplicated in the golden dataset which is merged from two source datasets with identical values for identification attributes. Check whether the CreateDate of both CIs in the golden dataset are later than the Modified Date of both CIs in the source datasets. This indicates that both the CIs were identified before they were merged to the golden dataset.	Avoid this situation by scheduling reconciliation jobs appropriately so identification and merging of the first source dataset is complete, before the second reconciliation job runs.

Investigating issues with duplicate data in CMDB

On this page