Datasets to partition data
The primary purpose of datasets is to partition data according to the providers of that data. Each discovery application that you use should store the data that it discovers in a separate dataset. Similarly, data that you import from an external database into BMC Helix Configuration Management Database (BMC Helix CMDB) through Atrium Integrator should be stored in its own dataset.
You can also use datasets for other methods of partitioning data. For example, you could use datasets to represent production data or obsolete data. Your datasets do not all need to contain different versions of the same CIs and relationships. For example, you could use datasets to hold:
- Subsets of your overall data, such as departments or regions
- Data from different companies for multitenancy
- Test data
A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
Each CI and relationship in BMC Helix CMDB must reside in a dataset; its DatasetId attribute must always contain a value.
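The following Python sketch is purely illustrative, not product code: the attribute names (InstanceId, DatasetId, ReconciliationIdentity) follow BMC Helix CMDB conventions, but the class, the dataset IDs, and the values are examples chosen for the sketch. It shows two instances that represent the same CI in different datasets, sharing one reconciliation ID while having distinct instance IDs.

```python
from dataclasses import dataclass

@dataclass
class CIInstance:
    instance_id: str         # unique per instance
    dataset_id: str          # DatasetId: every instance must belong to a dataset
    reconciliation_id: str   # shared by instances that represent the same real CI
    name: str

# The same physical server represented in a discovery dataset and a production dataset:
discovered = CIInstance("OI-0001", "BMC.ADDM",  "RE-1234", "payroll-server-01")
production = CIInstance("OI-0002", "BMC.ASSET", "RE-1234", "payroll-server-01")

# Different instance IDs and datasets, but one reconciliation identity,
# so reconciliation can treat them as the same CI.
assert discovered.reconciliation_id == production.reconciliation_id
assert discovered.instance_id != production.instance_id
```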
BMC datasets in BMC Helix CMDB
This section describes the default datasets provided by BMC Helix CMDB, BMC Helix ITSM: Asset Management, and BMC Discovery products. If you use a discovery product other than BMC Discovery, use Atrium Integrator to import its data.
Default BMC Helix CMDB datasets used by BMC Discovery products
Scenario: How Calbro Services partitions data into datasets
Overlay datasets in BMC Helix CMDB
BMC Helix CMDB offers overlay datasets, which enable you to:
- Make changes in a separate partition without overwriting your production data.
- See your changes in context with the unchanged portions of your data.
- Avoid duplicating your entire production dataset.
- Create multiple overlay datasets that reuse a single set of reconciliation definitions to merge each one into the production dataset.
How overlays work
When you create an overlay dataset, you must specify an existing regular dataset for it to overlay. Although an overlay dataset starts out empty like any other dataset, a request for an instance in the overlay dataset passes through to the underlying dataset and returns that instance from there.
The first time you modify an instance through the overlay dataset, the instance is copied from the underlying dataset into the overlay dataset with your modifications applied. You can also create new instances directly in the overlay dataset. The underlying dataset still holds the unmodified versions of its original instances, but it does not hold the newly created instances. A request to the overlay dataset for a new or modified instance returns that instance from the overlay dataset; a request for an unmodified instance returns it from the underlying dataset.
The following figure illustrates queries against an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID.
Query to an overlay dataset
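The read-through and copy-on-modify behavior described above can be summarized in a short sketch. This is a conceptual model only, assuming hypothetical class, method, and attribute names; it is not the BMC Helix CMDB API.

```python
import itertools

_ids = itertools.count(2)

def new_instance_id() -> str:
    """Hypothetical instance ID generator, used only for this sketch."""
    return f"OI-{next(_ids):04d}"

class OverlayDataset:
    """Models an overlay dataset: reads fall through to the underlying
    dataset, and the first modification copies the instance into the overlay."""

    def __init__(self, underlying: dict):
        self.underlying = underlying   # reconciliation ID -> instance attributes
        self.overlay = {}              # holds only new or modified instances

    def get(self, recon_id: str) -> dict:
        # A request returns the overlay copy if one exists; otherwise it
        # passes through and returns the instance from the underlying dataset.
        if recon_id in self.overlay:
            return self.overlay[recon_id]
        return self.underlying[recon_id]

    def modify(self, recon_id: str, **changes) -> None:
        # First modification: copy the instance into the overlay, keep its
        # reconciliation identity (the key), and assign a new instance ID.
        if recon_id not in self.overlay:
            copy = dict(self.underlying[recon_id])
            copy["InstanceId"] = new_instance_id()
            self.overlay[recon_id] = copy
        self.overlay[recon_id].update(changes)

    def create(self, recon_id: str, **attributes) -> None:
        # Newly created instances exist only in the overlay dataset.
        self.overlay[recon_id] = {"InstanceId": new_instance_id(), **attributes}

# Usage: a production dataset with one instance, overlaid by a sandbox.
production = {"RE-1234": {"InstanceId": "OI-0001", "Name": "payroll-server-01"}}
sandbox = OverlayDataset(production)

assert sandbox.get("RE-1234")["InstanceId"] == "OI-0001"     # read-through
sandbox.modify("RE-1234", Name="payroll-server-02")
assert sandbox.get("RE-1234")["InstanceId"] != "OI-0001"     # overlay copy, new instance ID
assert production["RE-1234"]["Name"] == "payroll-server-01"  # underlying dataset unchanged
```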
Result of deleting instances from an overlay dataset
If you delete from an overlay dataset an instance that actually exists there, the instance is removed only from the overlay dataset; the unmodified version remains in the underlying dataset. If you delete from an overlay dataset an instance that exists only in the underlying dataset, the instance is deleted from the underlying dataset.
Alternatively, you can mark an instance as deleted, regardless of whether it already exists in the overlay dataset. In either case, the result is an instance in the overlay dataset that is marked as deleted.
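These two delete paths, and the mark-as-deleted option, can be sketched with the same kind of illustrative model: plain dictionaries keyed by reconciliation ID. The function names and the boolean MarkAsDeleted attribute shown here are assumptions of the sketch, not the product API.

```python
def delete(overlay: dict, underlying: dict, recon_id: str) -> None:
    """Delete an instance through an overlay dataset (illustrative only)."""
    if recon_id in overlay:
        # The instance exists in the overlay: only the overlay copy is removed;
        # the unmodified instance remains in the underlying dataset.
        del overlay[recon_id]
    elif recon_id in underlying:
        # The instance exists only in the underlying dataset: it is deleted there.
        del underlying[recon_id]

def mark_as_deleted(overlay: dict, underlying: dict, recon_id: str) -> None:
    """Mark an instance as deleted; the flagged copy always lands in the overlay."""
    source = overlay[recon_id] if recon_id in overlay else underlying[recon_id]
    copy = dict(source)
    copy["MarkAsDeleted"] = True
    overlay[recon_id] = copy
```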
Instance ID and identity in overlay datasets
When an instance is first created in an overlay dataset as the result of a modification, it retains the reconciliation identity of the instance in the underlying dataset, but is assigned a new instance ID.
If the underlying instance has not yet been identified when it is modified in the overlay dataset, the instance has no reconciliation identity in either dataset. This is not a problem. When you eventually identify and merge the two datasets, your Identify rules should be able to match these instances so that they receive the same identity.
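As an illustration of that matching, the sketch below assigns the same reconciliation identity to instances whose key attributes match. It is only a conceptual stand-in for an Identify activity, with assumed match attributes and with 0 used to model an unidentified instance; real Identify rules are defined in reconciliation jobs, not in code.

```python
def identify(instances: list, match_keys=("Name", "Domain")) -> list:
    """Give instances with matching key attributes the same reconciliation
    identity. Unidentified instances are modeled here with an ID of 0."""
    next_identity = 1000
    assigned = {}  # key-attribute tuple -> reconciliation ID
    for inst in instances:
        if inst.get("ReconciliationIdentity", 0) != 0:
            continue  # already identified; leave its identity alone
        key = tuple(inst.get(k) for k in match_keys)
        if key not in assigned:
            assigned[key] = next_identity
            next_identity += 1
        inst["ReconciliationIdentity"] = assigned[key]
    return instances

# The underlying instance and its overlay copy match on Name and Domain,
# so both receive the same reconciliation identity.
underlying = {"InstanceId": "OI-0001", "Name": "web01", "Domain": "calbro.com"}
overlay    = {"InstanceId": "OI-0002", "Name": "web01", "Domain": "calbro.com"}
identify([underlying, overlay])
assert underlying["ReconciliationIdentity"] == overlay["ReconciliationIdentity"]
```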
If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.