This documentation supports the 19.08 version of BMC CMDB.


Datasets to partition data

Partitioning divides your configuration data into subsets, each of which represents a logical group of configuration items (CIs) and relationships. In BMC CMDB, these partitions are called datasets. The same real-world object or relationship can be represented by instances in more than one dataset. For example, different discovery applications can create CI and relationship instances in different datasets. You can later merge those instances into a single production dataset.

This capability is important for verifying and correcting configuration records against your actual infrastructure. You can create one dataset representing your intended configuration, then use a discovery application to create another dataset representing your actual configuration, and verify the former against the latter.
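The comparison described above hinges on instances that represent the same real-world CI sharing a reconciliation ID across datasets. The following Python sketch illustrates the idea with plain dictionaries; the dataset contents and attribute names are hypothetical, and this is not the BMC CMDB API.

```python
# Illustrative sketch: verifying an intended dataset against a discovered one.
# Keys are reconciliation IDs; values are CI attributes (hypothetical data).
intended   = {"RE-1": {"memory_gb": 32}, "RE-2": {"os": "Linux"}}
discovered = {"RE-1": {"memory_gb": 16}, "RE-3": {"os": "Windows"}}

# Instances representing the same real-world CI share a reconciliation ID,
# so the IDs serve as the join key for the comparison.
drift   = {rid for rid in intended
           if rid in discovered and intended[rid] != discovered[rid]}
missing = set(intended) - set(discovered)   # intended but not discovered
unknown = set(discovered) - set(intended)   # discovered but not intended

assert drift == {"RE-1"} and missing == {"RE-2"} and unknown == {"RE-3"}
```

In BMC CMDB itself, this comparison is performed by reconciliation jobs rather than ad hoc code; the sketch only shows why the shared reconciliation ID makes the comparison possible.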


The primary purpose of datasets is to partition data according to the providers of that data. Each discovery application that you use should store the data that it discovers in a separate dataset. Likewise, data from an external database that you import into BMC Configuration Management Database (BMC CMDB) through Atrium Integrator should be stored in a separate dataset.

You can also use datasets for other methods of partitioning data. For example, you could use datasets to represent production data or obsolete data. Your datasets do not all need to contain different versions of the same CIs and relationships. For example, you could use datasets to hold:

  • Subsets of your overall data, such as departments or regions
  • Data from different companies for multitenancy
  • Test data

A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.

Each CI and relationship in BMC CMDB must reside in a dataset; each has a DatasetId attribute that must contain a value.
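The rules above can be sketched as a simple in-memory model: every instance carries a dataset ID, and instances representing the same real-world CI in different datasets share a reconciliation ID while keeping distinct instance IDs. This is an illustrative Python model with hypothetical field and ID values, not the BMC CMDB API.

```python
from dataclasses import dataclass

@dataclass
class CI:
    instance_id: str        # unique per instance
    dataset_id: str         # required: every CI resides in exactly one dataset
    reconciliation_id: str  # shared by instances of the same real-world CI
    name: str

# The same real-world server represented in two datasets: one discovered,
# one in the production dataset. Both share a reconciliation ID but have
# distinct instance IDs and dataset IDs.
discovered_ci = CI("OI-001", "BMC.ADDM", "RE-42", "payroll-srv-01")
production_ci = CI("OI-002", "BMC.ASSET", "RE-42", "payroll-srv-01")

assert discovered_ci.reconciliation_id == production_ci.reconciliation_id
assert discovered_ci.dataset_id != production_ci.dataset_id
assert discovered_ci.instance_id != production_ci.instance_id
```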

BMC datasets in BMC CMDB

This section describes the default datasets provided by BMC CMDB, Remedy Asset Management, and BMC Discovery products. If you use a non-BMC discovery product, use Atrium Integrator to import its data.


Default BMC CMDB datasets used by BMC Discovery products

| Data created by | Dataset name | Dataset ID | Purpose |
| --- | --- | --- | --- |
| BMC CMDB | BMC Asset | BMC.ASSET | Production dataset, which is the dataset that you treat as your single source of reference and that you use to make business decisions. |
| BMC CMDB | BMC Sample | BMC.SAMPLE | A safe place to do testing. |
| Remedy Asset Management | BMC.ASSET.SANDBOX | BMC.ASSET.SANDBOX | If Remedy Asset Management is configured with a sandbox dataset, CIs that you manually create or modify flow through the sandbox dataset; otherwise, CIs go directly into the production dataset. |
| BMC BladeLogic Client Automation | BMC Configuration Import | BMC.IMPORT.CONFIG | Import CIs and relationships from the BMC BladeLogic Client Automation database for reconciliation. |
| BMC Discovery | (User-defined) | BMC.ADDM | Import CIs and relationships from the BMC Discovery data store for reconciliation. |

Scenario: How Calbro Services partitions data into datasets

Scenario

Calbro Services needs to discover various hardware and software, bring in relevant information from its payroll services, compare this data, and put the preferred pieces of information in the production dataset.

Calbro Services uses the BMC BladeLogic Client Automation product to discover the desktop and laptop computer systems (including hardware and software) used by Calbro Services employees. This information is stored in the BMC Configuration Import dataset of BMC CMDB.

Calbro Services also uses BMC Discovery to discover information about the servers, software, and other devices used to deliver banking information to Calbro Services customers. This data is stored in the BMC.ADDM dataset.

Lastly, Calbro Services uses a third-party discovery tool to collect information about the equipment that supports the corporate payroll services. Calbro Services uses Atrium Integrator to bring relevant data from the payroll database into BMC CMDB. The administrator creates a new dataset named Calbro Payroll specifically for this information.

Because some of the instances in these different datasets might represent the same real-world CIs, the administrator configures BMC CMDB reconciliation jobs to compare those datasets against each other and put the preferred pieces of information in the production BMC Asset dataset.




Overlay datasets in BMC CMDB

BMC CMDB offers overlay datasets, which enable you to:

  • Make changes in a separate partition without overwriting your production data.
  • See your changes in context with the unchanged portions of your data.
  • Avoid duplicating your entire production dataset.
  • Create multiple overlay datasets that reuse one set of reconciliation definitions for merging each into the production dataset.

Warning

Overlay dataset functionality applies only to BMC CMDB API clients. If you use the BMC Atrium Core Console or the class forms to view or modify instances in an overlay dataset, you receive unpredictable results and can compromise data integrity.

How overlays work

When you create an overlay dataset, you must specify an existing regular dataset for it to overlay. Although an overlay dataset starts out empty like any other dataset, a request for an instance in the overlay dataset passes through it and returns that instance from the underlying dataset.

When you modify an instance in the overlay dataset for the first time, it is copied there from the underlying dataset with your modifications applied. You can also create instances in the overlay dataset. The underlying dataset still holds the unmodified versions of its original instances, but it does not hold the newly created instances. A request to the overlay dataset for a new or modified instance returns that instance from the overlay dataset, and a request to the overlay dataset for an unmodified instance returns it from the underlying dataset.

Note

Requests made to the underlying dataset always return instances from that dataset, never from an overlay dataset.
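The read-through and copy-on-write behavior described above can be sketched as follows. This is a minimal illustrative model assuming a dict-backed store with hypothetical data; it is not the BMC CMDB API.

```python
class OverlayDataset:
    """Illustrative copy-on-write view over an underlying dataset."""

    def __init__(self, underlying: dict):
        self.underlying = underlying  # original, unmodified instances
        self.overlay = {}             # modified or newly created instances

    def get(self, recon_id):
        # A request passes through the overlay: modified or new instances
        # are returned from the overlay, unmodified ones from underneath.
        if recon_id in self.overlay:
            return self.overlay[recon_id]
        return self.underlying.get(recon_id)

    def modify(self, recon_id, **changes):
        # The first modification copies the instance into the overlay;
        # the underlying dataset keeps its unmodified version.
        base = self.overlay.get(recon_id, dict(self.underlying[recon_id]))
        base.update(changes)
        self.overlay[recon_id] = base

prod = {"RE-42": {"name": "payroll-srv-01", "memory_gb": 16}}
view = OverlayDataset(prod)

view.modify("RE-42", memory_gb=32)           # copied into overlay with the change
assert view.get("RE-42")["memory_gb"] == 32  # overlay answers the query
assert prod["RE-42"]["memory_gb"] == 16      # underlying dataset is unchanged
```

Note that requests made directly to `prod` never see the overlay's changes, matching the behavior described in the note above.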

The following figure illustrates queries against an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID.

Query to an overlay dataset

Result of deleting instances from an overlay dataset

If you attempt to delete from an overlay dataset an instance that actually exists there, the instance is deleted only from the overlay dataset and remains in the underlying dataset. If you attempt to delete from an overlay dataset an instance that exists only in the underlying dataset, the instance is deleted from the underlying dataset.

You can mark an instance as deleted regardless of whether it already exists in the overlay dataset. In either case, this results in an instance in the overlay dataset that is marked as deleted.
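The two delete behaviors, plus mark-as-deleted, can be sketched in the same illustrative style. The class and data here are hypothetical, not the actual API; MarkAsDeleted is the CMDB attribute referred to by "mark an instance as deleted" above.

```python
class OverlayDeletes:
    """Illustrative sketch of overlay-dataset delete semantics."""

    def __init__(self, underlying: dict):
        self.underlying = underlying
        self.overlay = {}

    def delete(self, recon_id):
        if recon_id in self.overlay:
            # Instance exists in the overlay: it is deleted only there,
            # and the underlying dataset keeps its copy.
            del self.overlay[recon_id]
        else:
            # Instance exists only underneath: it is deleted from
            # the underlying dataset.
            self.underlying.pop(recon_id, None)

    def mark_deleted(self, recon_id):
        # Either way, this results in an overlay instance flagged as deleted.
        base = dict(self.overlay.get(recon_id, self.underlying.get(recon_id, {})))
        base["MarkAsDeleted"] = True
        self.overlay[recon_id] = base

prod = {"RE-1": {"name": "old-srv"}, "RE-2": {"name": "new-srv"}}
ds = OverlayDeletes(prod)
ds.overlay["RE-2"] = {"name": "new-srv", "memory_gb": 8}

ds.delete("RE-2")          # existed in the overlay: removed there only
assert "RE-2" in prod      # underlying copy survives

ds.delete("RE-1")          # existed only underneath: removed from it
assert "RE-1" not in prod

ds.mark_deleted("RE-2")
assert ds.overlay["RE-2"]["MarkAsDeleted"] is True
```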

Instance ID and identity in overlay datasets

When an instance is first created in an overlay dataset as the result of a modification, it retains the reconciliation identity of the instance in the underlying dataset, but is assigned a new instance ID.

If the underlying instance has not yet been identified when it is modified in the overlay dataset, the instance has no reconciliation identity in either dataset. This is not a problem. When you eventually identify and merge the two datasets, your Identify rules should be able to match these instances so that they receive the same identity.

Warning

For each modification that you make to an instance before it is identified, an instance is created in the overlay dataset. You should identify instances before modifying them a second time in the overlay dataset.

If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.

