Datasets to partition data
Datasets can also represent the following aspects:
- Intended vs. actual configurations for verification.
- Production, test, or obsolete data.
- Multitenancy, such as data from different companies.
- Subsets like departments or regions.
Each CI or relationship in BMC Helix CMDB must belong to a dataset, identified by a unique DatasetId. Instances of the same CI in different datasets share a common reconciliation ID for consistency across contexts.
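The relationship between dataset IDs and reconciliation IDs can be pictured with a small in-memory sketch. This is not the CMDB schema or API: the `CIInstance` class, the `group_by_reconciliation_id` helper, the `EXAMPLE.STAGING` dataset ID, and all ID values are hypothetical and used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CIInstance:
    """Hypothetical, simplified stand-in for a CI record (not the CMDB schema)."""
    instance_id: str        # unique per instance
    dataset_id: str         # the one dataset this instance belongs to
    reconciliation_id: str  # shared by instances of the same CI across datasets
    name: str


def group_by_reconciliation_id(instances):
    """Map each reconciliation ID to the set of dataset IDs holding an instance of it."""
    groups = defaultdict(set)
    for ci in instances:
        groups[ci.reconciliation_id].add(ci.dataset_id)
    return dict(groups)


# "EXAMPLE.STAGING" is an invented dataset ID; BMC.ASSET is the production dataset.
instances = [
    CIInstance("I-001", "EXAMPLE.STAGING", "R-100", "web-server-01"),
    CIInstance("I-002", "BMC.ASSET", "R-100", "web-server-01"),
    CIInstance("I-003", "BMC.ASSET", "R-200", "db-server-01"),
]
views = group_by_reconciliation_id(instances)
```

Here the two `web-server-01` instances live in different datasets and have different instance IDs, but the shared reconciliation ID ties them to the same CI.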
Default datasets in BMC Helix CMDB
This section describes the default datasets provided by BMC Helix CMDB, BMC Helix ITSM: Asset Management, and BMC Discovery products. If you use a non-BMC Discovery product, use Atrium Integrator to import data.
The following image represents a few data sources and the datasets that they populate:
The most important dataset is the golden or production dataset. By default, that is the BMC Asset (BMC.ASSET) dataset. This dataset is treated as the single source of truth for the CIs in your organization. The other datasets are staging areas for the data that your organization's other tools and applications collect about the environment. The following table lists some of the common datasets and their recommended purposes:
The following table provides the details of the default BMC Helix CMDB datasets used by discovery products:
Scenario: How Apex Global partitions data into datasets
Overlay datasets in BMC Helix CMDB
Use overlay datasets in BMC Helix CMDB for the following tasks:
- Make changes in a separate partition without overwriting your production data.
- See your changes in context with the unchanged portions of your data.
- Avoid duplicating your entire production dataset.
- Create multiple overlay datasets that reuse one set of reconciliation definitions for merging each into the production dataset.
How overlays work
You create an overlay dataset by specifying an existing regular dataset (the underlying dataset). The overlay dataset starts empty. When you request a CI from the overlay dataset, the request checks the overlay first and falls back to the underlying dataset if the CI isn’t in the overlay. Requests sent directly to the underlying dataset retrieve CIs from that dataset only, never from the overlay dataset.
When you modify a CI in the overlay dataset for the first time, the overlay copies it from the underlying dataset and applies your modifications to the overlay version. You can also create new CIs directly in the overlay dataset; these CIs do not exist in the underlying dataset.
When you request a modified or newly created CI from the overlay dataset, it retrieves it from the overlay. If you request an unmodified CI, the overlay retrieves it from the underlying dataset. This approach keeps the underlying dataset unchanged while letting you make modifications and additions in the overlay.
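The read and write behavior described above resembles a copy-on-write layer. The following is a minimal in-memory sketch of that idea, not the BMC Helix CMDB API: the `Dataset` and `OverlayDataset` classes, the `new_instance_id` helper, and the `EXAMPLE.OVERLAY` dataset ID are hypothetical, and keying CIs by reconciliation ID is a simplification made for the example.

```python
import copy
import itertools

_ids = itertools.count(1)


def new_instance_id():
    """Hypothetical ID generator; real instance IDs are assigned by the CMDB."""
    return f"I-{next(_ids):04d}"


class Dataset:
    """Toy regular dataset: maps reconciliation ID -> CI attributes."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self.cis = {}

    def get(self, recon_id):
        return self.cis.get(recon_id)

    def create(self, recon_id, **attrs):
        self.cis[recon_id] = {"instance_id": new_instance_id(), **attrs}
        return self.cis[recon_id]


class OverlayDataset(Dataset):
    """Toy overlay: starts empty and falls back to the underlying dataset on reads."""

    def __init__(self, dataset_id, underlying):
        super().__init__(dataset_id)
        self.underlying = underlying

    def get(self, recon_id):
        # Check the overlay first, then fall back to the underlying dataset.
        if recon_id in self.cis:
            return self.cis[recon_id]
        return self.underlying.get(recon_id)

    def modify(self, recon_id, **changes):
        # On the first modification, copy the CI from the underlying dataset
        # and give the overlay copy its own instance ID.
        if recon_id not in self.cis:
            base = self.underlying.get(recon_id)
            if base is None:
                raise KeyError(recon_id)
            copied = copy.deepcopy(base)
            copied["instance_id"] = new_instance_id()
            self.cis[recon_id] = copied
        self.cis[recon_id].update(changes)
        return self.cis[recon_id]


prod = Dataset("BMC.ASSET")
prod.create("R-100", name="web-server-01", memory_gb=16)
prod.create("R-200", name="db-server-01", memory_gb=64)

overlay = OverlayDataset("EXAMPLE.OVERLAY", prod)
overlay.modify("R-100", memory_gb=32)                # copied, then changed
overlay.create("R-300", name="cache-01", memory_gb=8)  # exists only in the overlay
```

In this sketch, reading `R-100` through the overlay returns the modified copy, reading `R-200` falls through to the production data, and nothing in `prod` changes.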
The following figure illustrates queries made to an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID:
Deleting instances from an overlay dataset
If you delete a CI from an overlay dataset, the result depends on where the CI exists:
- If the CI exists in the overlay dataset, it is removed from the overlay dataset but remains unchanged in the underlying dataset.
- If the CI exists only in the underlying dataset, the deletion applies to that dataset and removes the CI entirely.
You can also mark a CI as deleted in the overlay dataset, regardless of where it exists. In this case, the overlay dataset creates or updates the CI as a deleted entry, leaving the underlying dataset unaffected.
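The two deletion paths can be contrasted in a short sketch. It models each dataset as a plain dictionary keyed by reconciliation ID; the function names and the `marked_as_deleted` flag are hypothetical and do not represent CMDB API calls or attribute names.

```python
def delete_from_overlay(overlay, underlying, recon_id):
    """Hard delete issued against the overlay dataset (toy model)."""
    if recon_id in overlay:
        # The overlay copy is removed; the underlying dataset is not touched.
        del overlay[recon_id]
    elif recon_id in underlying:
        # The CI exists only in the underlying dataset, so it is removed there.
        del underlying[recon_id]
    else:
        raise KeyError(recon_id)


def mark_as_deleted(overlay, underlying, recon_id):
    """Soft delete: record the CI as deleted in the overlay only (toy model)."""
    if recon_id in overlay:
        entry = overlay[recon_id]            # update the existing overlay entry
    elif recon_id in underlying:
        entry = dict(underlying[recon_id])   # create an overlay entry from a copy
    else:
        raise KeyError(recon_id)
    entry["marked_as_deleted"] = True
    overlay[recon_id] = entry


underlying = {
    "R-100": {"name": "web-server-01", "memory_gb": 16},
    "R-200": {"name": "db-server-01"},
    "R-300": {"name": "old-router"},
}
overlay = {"R-100": {"name": "web-server-01", "memory_gb": 32}}

delete_from_overlay(overlay, underlying, "R-100")  # removes the overlay copy only
delete_from_overlay(overlay, underlying, "R-300")  # removes it from the underlying dataset
mark_as_deleted(overlay, underlying, "R-200")      # overlay records the deletion
```

The sketch makes the trade-off visible: a hard delete of a CI that exists only in the underlying dataset changes that dataset, whereas marking the CI as deleted keeps the change inside the overlay.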
Instance ID and identity in overlay datasets
When you modify a CI (instance) for the first time in an overlay dataset, it is created in the overlay with the same reconciliation identity as the CI in the underlying dataset but is assigned a new instance ID.
If the underlying CI has not been identified at the time of modification in the overlay dataset, it will not have a reconciliation identity in either dataset. When you later identify and merge the datasets, your Identify rules should match these CIs correctly, ensuring they share the same reconciliation identity.
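As an illustration of the unidentified case, the sketch below matches instances on a set of key attributes and gives matching instances one shared reconciliation ID. The `identify` function, the choice of `name` and `serial_number` as matching attributes, and the ID format are all assumptions made for the example; actual Identify rules are configured in the reconciliation engine.

```python
import uuid


def identify(instances, key_attrs=("name", "serial_number")):
    """Toy Identify rule: instances with equal key attributes share one reconciliation ID."""
    known = {}
    # First pass: remember reconciliation IDs that are already assigned.
    for ci in instances:
        if ci["reconciliation_id"] is not None:
            known[tuple(ci[a] for a in key_attrs)] = ci["reconciliation_id"]
    # Second pass: give unidentified instances the matching (or a new) ID.
    for ci in instances:
        if ci["reconciliation_id"] is None:
            key = tuple(ci[a] for a in key_attrs)
            ci["reconciliation_id"] = known.setdefault(key, f"R-{uuid.uuid4().hex[:8]}")
    return instances


# The underlying CI was never identified, so the overlay copy has no
# reconciliation ID either; only the instance IDs differ.
underlying_ci = {
    "instance_id": "I-0001", "dataset_id": "BMC.ASSET", "reconciliation_id": None,
    "name": "web-server-01", "serial_number": "SN-42", "memory_gb": 16,
}
overlay_ci = {
    "instance_id": "I-0002", "dataset_id": "EXAMPLE.OVERLAY", "reconciliation_id": None,
    "name": "web-server-01", "serial_number": "SN-42", "memory_gb": 32,
}
identify([underlying_ci, overlay_ci])
```

After the call, both instances carry the same reconciliation ID while keeping their own instance IDs, which is the state a later merge depends on.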
If you decide to keep the changes you modeled in an overlay dataset, you can merge them into the underlying regular dataset.