Datasets to partition data
The primary purpose of datasets is to partition data according to the providers of that data. Each discovery application that you use should store the data that it discovers in a separate dataset. Similarly, data that you import from an external database into BMC Helix Configuration Management Database (BMC Helix CMDB) through Atrium Integrator should be stored in its own dataset.
You can also use datasets for other methods of partitioning data. For example, you could use datasets to represent production data or obsolete data. Your datasets do not all need to contain different versions of the same CIs and relationships. For example, you could use datasets to hold:
- Subsets of your overall data, such as departments or regions
- Data from different companies for multitenancy
- Test data
A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
Each CI and relationship in BMC Helix CMDB must reside in a dataset; its DatasetId attribute must always contain a value.
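The following Python sketch is purely illustrative, not product code: the attribute names (InstanceId, DatasetId, ReconciliationIdentity) follow BMC Helix CMDB conventions, but the class, the dataset IDs, and the values are examples chosen for the sketch. It shows two instances that represent the same CI in different datasets, sharing one reconciliation ID while having distinct instance IDs.

```python
from dataclasses import dataclass

@dataclass
class CIInstance:
    instance_id: str         # unique per instance
    dataset_id: str          # DatasetId: every instance must belong to a dataset
    reconciliation_id: str   # shared by instances that represent the same real CI
    name: str

# The same physical server represented in a discovery dataset and a production dataset:
discovered = CIInstance("OI-0001", "BMC.ADDM",  "RE-1234", "payroll-server-01")
production = CIInstance("OI-0002", "BMC.ASSET", "RE-1234", "payroll-server-01")

# Different instance IDs and datasets, but one reconciliation identity,
# so reconciliation can treat them as the same CI.
assert discovered.reconciliation_id == production.reconciliation_id
assert discovered.instance_id != production.instance_id
```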
BMC datasets in BMC Helix CMDB
This section describes the default datasets provided by BMC Helix CMDB, BMC Helix ITSM: Asset Management, and BMC Discovery products. If you use a discovery product other than BMC Discovery, use Atrium Integrator to import its data.
Default BMC Helix CMDB datasets used by BMC Discovery products
Scenario: How Calbro Services partitions data into datasets
Overlay datasets in BMC Helix CMDB
BMC Helix CMDB offers overlay datasets, which enable you to:
- Make changes in a separate partition without overwriting your production data.
- See your changes in context with the unchanged portions of your data.
- Avoid duplicating your entire production dataset.
- Create multiple overlay datasets that reuse a single set of reconciliation definitions to merge each one into the production dataset.
How overlays work
When you create an overlay dataset, you must specify an existing regular dataset for it to overlay. Although an overlay dataset starts out empty like any other dataset, a request for an instance in the overlay dataset passes through to the underlying dataset and returns that instance from there.
The first time you modify an instance through the overlay dataset, the instance is copied from the underlying dataset into the overlay dataset with your modifications applied. You can also create new instances directly in the overlay dataset. The underlying dataset still holds the unmodified versions of its original instances, but it does not hold the newly created instances. A request to the overlay dataset for a new or modified instance returns that instance from the overlay dataset; a request for an unmodified instance returns it from the underlying dataset.
The following figure illustrates queries against an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID.
Query to an overlay dataset
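The read-through and copy-on-modify behavior described above can be summarized in a short sketch. This is a conceptual model only, assuming hypothetical class, method, and attribute names; it is not the BMC Helix CMDB API.

```python
import itertools

_ids = itertools.count(2)

def new_instance_id() -> str:
    """Hypothetical instance ID generator, used only for this sketch."""
    return f"OI-{next(_ids):04d}"

class OverlayDataset:
    """Models an overlay dataset: reads fall through to the underlying
    dataset, and the first modification copies the instance into the overlay."""

    def __init__(self, underlying: dict):
        self.underlying = underlying   # reconciliation ID -> instance attributes
        self.overlay = {}              # holds only new or modified instances

    def get(self, recon_id: str) -> dict:
        # A request returns the overlay copy if one exists; otherwise it
        # passes through and returns the instance from the underlying dataset.
        if recon_id in self.overlay:
            return self.overlay[recon_id]
        return self.underlying[recon_id]

    def modify(self, recon_id: str, **changes) -> None:
        # First modification: copy the instance into the overlay, keep its
        # reconciliation identity (the key), and assign a new instance ID.
        if recon_id not in self.overlay:
            copy = dict(self.underlying[recon_id])
            copy["InstanceId"] = new_instance_id()
            self.overlay[recon_id] = copy
        self.overlay[recon_id].update(changes)

    def create(self, recon_id: str, **attributes) -> None:
        # Newly created instances exist only in the overlay dataset.
        self.overlay[recon_id] = {"InstanceId": new_instance_id(), **attributes}

# Usage: a production dataset with one instance, overlaid by a sandbox.
production = {"RE-1234": {"InstanceId": "OI-0001", "Name": "payroll-server-01"}}
sandbox = OverlayDataset(production)

assert sandbox.get("RE-1234")["InstanceId"] == "OI-0001"     # read-through
sandbox.modify("RE-1234", Name="payroll-server-02")
assert sandbox.get("RE-1234")["InstanceId"] != "OI-0001"     # overlay copy, new instance ID
assert production["RE-1234"]["Name"] == "payroll-server-01"  # underlying dataset unchanged
```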
Result of deleting instances from an overlay dataset
If you delete from an overlay dataset an instance that actually exists there, the instance is removed only from the overlay dataset; the unmodified version remains in the underlying dataset. If you delete from an overlay dataset an instance that exists only in the underlying dataset, the instance is deleted from the underlying dataset.
Alternatively, you can mark an instance as deleted, regardless of whether it already exists in the overlay dataset. In either case, the result is an instance in the overlay dataset that is marked as deleted.
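These two delete paths, and the mark-as-deleted option, can be sketched with the same kind of illustrative model: plain dictionaries keyed by reconciliation ID. The function names and the boolean MarkAsDeleted attribute shown here are assumptions of the sketch, not the product API.

```python
def delete(overlay: dict, underlying: dict, recon_id: str) -> None:
    """Delete an instance through an overlay dataset (illustrative only)."""
    if recon_id in overlay:
        # The instance exists in the overlay: only the overlay copy is removed;
        # the unmodified instance remains in the underlying dataset.
        del overlay[recon_id]
    elif recon_id in underlying:
        # The instance exists only in the underlying dataset: it is deleted there.
        del underlying[recon_id]

def mark_as_deleted(overlay: dict, underlying: dict, recon_id: str) -> None:
    """Mark an instance as deleted; the flagged copy always lands in the overlay."""
    source = overlay[recon_id] if recon_id in overlay else underlying[recon_id]
    copy = dict(source)
    copy["MarkAsDeleted"] = True
    overlay[recon_id] = copy
```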
Instance ID and identity in overlay datasets
When an instance is first created in an overlay dataset as the result of a modification, it retains the reconciliation identity of the instance in the underlying dataset, but is assigned a new instance ID.
If the underlying instance has not yet been identified when it is modified in the overlay dataset, the instance has no reconciliation identity in either dataset. This is not a problem. When you eventually identify and merge the two datasets, your Identify rules should be able to match these instances so that they receive the same identity.
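As an illustration of that matching, the sketch below assigns the same reconciliation identity to instances whose key attributes match. It is only a conceptual stand-in for an Identify activity, with assumed match attributes and with 0 used to model an unidentified instance; real Identify rules are defined in reconciliation jobs, not in code.

```python
def identify(instances: list, match_keys=("Name", "Domain")) -> list:
    """Give instances with matching key attributes the same reconciliation
    identity. Unidentified instances are modeled here with an ID of 0."""
    next_identity = 1000
    assigned = {}  # key-attribute tuple -> reconciliation ID
    for inst in instances:
        if inst.get("ReconciliationIdentity", 0) != 0:
            continue  # already identified; leave its identity alone
        key = tuple(inst.get(k) for k in match_keys)
        if key not in assigned:
            assigned[key] = next_identity
            next_identity += 1
        inst["ReconciliationIdentity"] = assigned[key]
    return instances

# The underlying instance and its overlay copy match on Name and Domain,
# so both receive the same reconciliation identity.
underlying = {"InstanceId": "OI-0001", "Name": "web01", "Domain": "calbro.com"}
overlay    = {"InstanceId": "OI-0002", "Name": "web01", "Domain": "calbro.com"}
identify([underlying, overlay])
assert underlying["ReconciliationIdentity"] == overlay["ReconciliationIdentity"]
```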
If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.