Investigating issues with duplicate data in CMDB
Successful identification between multiple source datasets must fulfil the following conditions:
- Each data provider populates data to the same classes in the CMDB, but into different source datasets.
- Each data provider populates values to the attributes used for identification.
- Each data provider populates identical values to the attributes used for identification.
- Reconciliation identification rules look for a match on the identification attributes, from the most restrictive to less restrictive combination of attribute values.
- Reconciliation tries to identify the CI in the second source dataset after the first one has been identified and merged into the golden dataset
- Reconciliation identification rules ignore default values such as NULL or 0 in identification attributes.
- Data providers do not populate non-unique, non-default values like "N/A" or "None" for identification attributes.
- If host names are re-used by different computer systems, they are purged from the golden dataset before the new computer system using the host name is leveraged.
To check for ... | Query or operation | Results and next steps |
---|---|---|
Duplicate values of IntegrationId attribute
| Perform a query in the ARSystem database for the specified DatasetId and attribute, for example BMC.ADDM and ADDMIntegrationId SELECT COUNT(*),ADDMIntegrationId FROM BMC_CORE_BMC_BASEELEMENT WHERE DatasetId = 'BMC.ADDM' GROUP BY ADDMIntegrationid HAVING COUNT (*) > 1 ; | The query should return no results for the IntegrationId, DatasetId combination. Check if the discovery source has multiple CIs with the same value of the IntegrationId, or if one of the CIs is considered deleted in the discovery source. This may indicate a deletion was not propagated to the source dataset in CMDB. Also check if the attribute used as an IntegrationId is recycled, or re-used by the discovery source after it is considered deleted. This is another case where you would not want to leave-soft-deleted data in the CMDB. See Purging-soft-deleted-data-to-optimize-CMDB-performance |
Hosts with duplicate identification attribute values within a specified DatasetId |
Query: SELECT Hostname, isVirtual FROM BMC_CORE_BMC_ComputerSystem WHERE DatasetId = 'BMC.ASSET' GROUP BY HostName, Domain, SerialNumber, TokenId, isVirtual HAVING COUNT(*)>1; | If this returns any results, it means the data in the identification attributes is insufficient to uniquely identify a CI. See section below on "Solving Reconciliation issues by adding identification attributes". |
Computer System Identification attributes not populated | Perform a query in the ARSystem database: SELECT Name, instanceid, DatasetId FROM BMC_CORE_BMC_ComputerSystem WHERE TokenId = '0' AND SerialNumber IS NULL AND Hostname IS NULL; | Computer systems without values for TokenId, SerialNumber, or Hostname are the primary attributes for identifying a computer system. |
Identification attribute value consistency for computer systems | Perform a query in the ARSystem database: SELECT Name, SerialNumber, Hostname, Domain, TokenId, isVirtual FROM BMC_CORE_BMC_ComputerSystem ORDER BY Hostname, SerialNumber, DatasetId, TokenID; | View the results to verify that the identification attribute values are populated to both datasets in the same format, with consistent values and case. Also scan for meaningless values like N/A, None, or Unknown. |
Identification attribute value consistency for weak members between datasets | Perform a query in the ARSystem database, for example, on the BMC_OperatingSystem class: SELECT Name, DatasetId, TokenId FROM BMC_CORE_BMC_OperatingSystem GROUP BY Name, DatasetId, TokenId; | View the results to verify that the Name value is used consistent across datasets. For example, if the Name attribute is used to identify OperatingSystems, it is important that both data sources provide the same value of the operating system name. Displaying the query results on classes of interest will keep the results list short enough to scan. |
Sequence of operations | Select an example of a CI duplicated in the golden dataset which is merged from two source datasets with identical values for identification attributes. Check whether the CreateDate of both CIs in the golden dataset are later than the Modified Date of both CIs in the source datasets. This indicates that both the CIs were identified before they were merged to the golden dataset. | Avoid this situation by scheduling reconciliation jobs appropriately so identification and merging of the first source dataset is complete, before the second reconciliation job runs. |