Datasets to partition data

Partitioning in BMC Helix CMDB organizes configuration data into subsets called datasets, each representing a logical group of Configuration Items (CIs) and relationships. The same real-world object or relationship can appear in multiple datasets, for example when different discovery tools create CI instances in separate datasets. These instances can later be merged into a single production dataset for validation and correction.

Datasets primarily serve to partition data by source. For example, each discovery tool stores its discovered data in a separate dataset, and data imported through Atrium Integrator resides in its own dataset.

Default datasets in BMC Helix CMDB

This section describes the default datasets provided by BMC Helix CMDB, BMC Helix ITSM: Asset Management, and BMC Discovery products. If you use a discovery product that is not from BMC, use Atrium Integrator to import its data.

The following image represents a few data sources and the datasets that they populate:

dataset partition.jpg

The most important dataset is the golden or production dataset. By default, it is the BMC Asset (BMC.ASSET) dataset. This dataset is treated as the single source of truth for the CIs in your organization. The other datasets serve as staging areas for data that the tools and applications in your organization gather about the environment.

The following table lists the default BMC Helix CMDB datasets, the products that populate them, and their recommended purpose:

Data created by	Dataset name	Dataset ID	Purpose
BMC Helix CMDB	BMC Asset	BMC.ASSET	Important: Do not update the production dataset directly. This is the dataset that you treat as your single source of reference and that you use to make business decisions. The consuming applications consume the CIs from this dataset.
BMC Helix CMDB	BMC Sample	BMC.SAMPLE	This is a staging dataset. A safe place to do testing.
BMC Helix ITSM: Asset Management	BMC.ASSET.SANDBOX	BMC.ASSET.SANDBOX	If BMC Helix ITSM: Asset Management is configured with a sandbox dataset, CIs that you manually create or modify flow through the sandbox dataset; otherwise, CIs go directly into the production dataset.
BMC Client Management	BMC Configuration Import	BMC.IMPORT.CONFIG	Import CIs and relationships from the BMC Client Management database for reconciliation.
BMC Discovery	BMC.ADDM	BMC.ADDM	Import CIs and relationships from the BMC Discovery data store for reconciliation.
BMC Helix CMDB	BMC Atrium Explorer Asset	BMC.AE.ASSET	This is a staging dataset. When a user edits a CI in the BMC.ASSET dataset (the production dataset), CMDB creates a sandbox for that user. For example, if your user name is Chris, the BMC.AE.SB.Chris dataset is created for internal use. When the user promotes the CI changes, the changes are copied to the BMC.ASSET dataset and the entries are deleted from this internal dataset.
BMC Helix CMDB	BMC Atrium Explorer	BMC.AE	This dataset is used to reconcile data from sandbox datasets to BMC.ASSET. Its reconciliation job, Atrium Explorer - Identification and Merge, performs that reconciliation.
BMC Helix CMDB	BMC.SANDBOX.DSM	BMC.SANDBOX.DSM	When a relationship is created by using Dynamic Service Modeling (DSM) based on a given qualification, the relationship is first created in the BMC.SANDBOX.DSM dataset. The relationship is later reconciled to the BMC.ASSET dataset.

Scenario: How Apex Global partitions data into datasets

Scenario

Apex Global needs to discover various hardware and software, bring in relevant information from the payroll services, compare this data, and put the preferred pieces of information in the production dataset.

Apex Global uses the BMC Client Management product to discover the desktop and laptop computer systems (including hardware and software) used by Apex Global employees. This information is stored in the BMC Configuration Import dataset of BMC Helix CMDB.

Apex Global also uses BMC Discovery to discover information about the servers, software, and other devices used to deliver banking information to Apex Global customers. This data is stored in the BMC.ADDM dataset.

Lastly, Apex Global uses a third-party discovery tool to collect information about the equipment that supports the corporate payroll services. Apex Global uses Atrium Integrator to bring relevant data from the payroll database into BMC Helix CMDB. The administrator creates a new dataset named AG Payroll specifically for this information.

Because some of the instances in these different datasets might represent the same real-world CIs, the administrator configures BMC Helix CMDB reconciliation jobs to compare those datasets against each other and put the preferred pieces of information in the production BMC Asset dataset.

Apex Global gathers and consolidates the following data from various data discovery sources to effectively manage information about hardware, software, and payroll services:

Employee Devices: Apex Global uses the BMC Client Management product to discover desktop and laptop systems, including their hardware and software. This information is stored in the BMC Configuration Import dataset within BMC Helix CMDB.
IT Infrastructure: Apex Global uses BMC Helix Discovery to collect information about the servers, software, and devices in its IT infrastructure. This data is stored in the BMC.ADDM dataset.
Payroll Services: For corporate payroll services, a third-party discovery tool collects information about supporting equipment. Relevant data from the payroll database is imported into BMC Helix CMDB by using Atrium Integrator, and a dedicated dataset named AG Payroll is created for this purpose.

The administrator configures reconciliation jobs in BMC Helix CMDB to keep the data consistent and to eliminate duplicates. These jobs compare data across the different datasets and consolidate the preferred information into the production BMC.ASSET dataset.
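The consolidation step can be pictured with a small Python sketch. It is a conceptual model only, not the BMC Helix CMDB reconciliation engine or its API. The function name merge_into_production, the serial_number identity attribute, the sample CIs, and the numeric precedence values are all assumptions made for illustration.

```python
def merge_into_production(staging, precedence, identity_key):
    """Merge staging datasets into one production view.

    staging:      dict of dataset ID -> list of CI dicts
    precedence:   dict of dataset ID -> number (higher wins); hypothetical values
    identity_key: attribute used to decide that two instances are the same CI
    """
    production = {}
    # Remembers which precedence supplied each attribute value so far.
    winner = {}
    for dataset_id, cis in staging.items():
        rank = precedence[dataset_id]
        for ci in cis:
            identity = ci[identity_key]
            target = production.setdefault(identity, {})
            for attr, value in ci.items():
                if value is None:
                    continue  # a missing value never overrides a known one
                if rank > winner.get((identity, attr), float("-inf")):
                    target[attr] = value
                    winner[(identity, attr)] = rank
    return production


# Hypothetical sample data: the same laptop seen by two sources.
staging = {
    "BMC.IMPORT.CONFIG": [
        {"serial_number": "SN1", "hostname": "lap-01", "ram_gb": 16},
    ],
    "BMC.ADDM": [
        {"serial_number": "SN1", "hostname": "lap-01.corp", "ram_gb": None, "os": "Windows"},
    ],
}
precedence = {"BMC.IMPORT.CONFIG": 100, "BMC.ADDM": 200}

production = merge_into_production(staging, precedence, "serial_number")
```

In this model the two instances collapse into a single production CI: the hostname comes from the higher-precedence dataset, while the memory size, which only the lower-precedence dataset knows, is still retained.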

Overlay datasets in BMC Helix CMDB

Use overlay datasets in BMC Helix CMDB for the following tasks:

Making changes in a separate partition without overwriting your production data.
Viewing your changes in context with the unchanged portions of your data.
Avoiding duplication of the entire production dataset.
Creating multiple overlay datasets that reuse one set of reconciliation definitions for merging each dataset into the production dataset.

Warning

Overlay dataset functionality applies only to BMC Helix CMDB API clients.

How overlays work

You create an overlay dataset by specifying an existing regular dataset (the underlying dataset). The overlay dataset starts empty. When you request a CI from the overlay dataset, BMC Helix CMDB first checks the overlay and, if the CI isn’t there, retrieves it from the underlying dataset. Requests sent directly to the underlying dataset retrieve CIs only from the underlying dataset, never from the overlay.

When you modify a CI in the overlay dataset for the first time, the overlay copies it from the underlying dataset and applies your modifications to the overlay version. You can also create new CIs directly in the overlay dataset; these CIs do not exist in the underlying dataset.

When you request a modified or newly created CI from the overlay dataset, it retrieves the CI from the overlay. If you request an unmodified CI, the overlay retrieves it from the underlying dataset. This approach keeps the underlying dataset unchanged while letting you make modifications and additions in the overlay.
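The read-through and copy-on-first-modification behavior described above can be modeled in a few lines of Python. This is a simplified sketch and not the BMC Helix CMDB API: the Dataset and OverlayDataset classes, the lookup by CI name, and the INST-n instance ID format are assumptions made for illustration.

```python
import itertools

_counter = itertools.count(1)


def new_instance_id():
    """Return a fresh, hypothetical instance ID."""
    return f"INST-{next(_counter)}"


class Dataset:
    """A regular dataset, modeled as a mapping of CI name -> CI record."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self.cis = {}

    def create(self, name, recon_id=None, **attrs):
        ci = {"instance_id": new_instance_id(), "recon_id": recon_id,
              "name": name, **attrs}
        self.cis[name] = ci
        return ci

    def get(self, name):
        return self.cis.get(name)


class OverlayDataset(Dataset):
    """Starts empty and falls back to the underlying dataset on reads."""

    def __init__(self, dataset_id, underlying):
        super().__init__(dataset_id)
        self.underlying = underlying

    def get(self, name):
        # Check the overlay first, then the underlying dataset.
        if name in self.cis:
            return self.cis[name]
        return self.underlying.get(name)

    def modify(self, name, **changes):
        if name not in self.cis:
            base = self.underlying.get(name)
            if base is None:
                raise KeyError(name)
            # First modification: copy the CI into the overlay with a new
            # instance ID; the reconciliation ID is carried over unchanged.
            copied = dict(base)
            copied["instance_id"] = new_instance_id()
            self.cis[name] = copied
        self.cis[name].update(changes)
        return self.cis[name]


production = Dataset("BMC.ASSET")
production.create("srv01", recon_id="RE-1", os="Linux")
production.create("srv02", recon_id="RE-2", os="Windows")

overlay = OverlayDataset("OVERLAY.EXAMPLE", production)  # hypothetical dataset ID
overlay.modify("srv01", os="AIX")
```

After the call to modify, a request for srv01 through the overlay returns the changed copy, a request for srv02 falls through to the underlying dataset, and the production record for srv01 is untouched.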

The following figure illustrates queries made to an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID:

Overlay datasets.jpg

Deleting instances from an overlay dataset

If you delete a CI instance from an overlay dataset, the behavior depends on where the CI exists. The CI is deleted according to the following conditions:

If the CI exists in the overlay dataset, it is removed from the overlay dataset but remains unchanged in the underlying dataset.
If the CI exists only in the underlying dataset, the deletion applies to that dataset and the CI is removed entirely.

You can also mark a CI as deleted in the overlay dataset, regardless of where it exists. In this case, the overlay dataset creates or updates the CI as a deleted entry, leaving the underlying dataset unaffected.
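The deletion rules above can be expressed as a short Python sketch. It is a conceptual model under stated assumptions, not product code: datasets are plain dictionaries keyed by CI name, and the function names delete_ci and mark_deleted and the marked_deleted flag are hypothetical.

```python
def delete_ci(overlay, underlying, name):
    """Delete a CI through the overlay, following the two rules above."""
    if name in overlay:
        # The CI exists in the overlay: remove only the overlay copy.
        del overlay[name]
    elif name in underlying:
        # The CI exists only in the underlying dataset: remove it there.
        del underlying[name]
    else:
        raise KeyError(name)


def mark_deleted(overlay, underlying, name):
    """Mark a CI as deleted in the overlay without touching the underlying dataset."""
    if name not in overlay:
        # Create the overlay entry from the underlying CI before flagging it.
        overlay[name] = dict(underlying[name])
    overlay[name]["marked_deleted"] = True


# Hypothetical sample data.
underlying = {
    "srv01": {"os": "Linux"},
    "srv02": {"os": "Windows"},
    "srv03": {"os": "AIX"},
}
overlay = {"srv01": {"os": "Linux", "patched": True}}

delete_ci(overlay, underlying, "srv01")     # removes only the overlay copy
delete_ci(overlay, underlying, "srv02")     # removes the CI from the underlying dataset
mark_deleted(overlay, underlying, "srv03")  # flags a copy in the overlay
```

The sketch makes the asymmetry visible: a plain delete on a CI that exists only in the underlying dataset changes that dataset, whereas marking the CI as deleted confines the change to the overlay.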

Instance ID and identity in overlay datasets

When you modify a CI instance for the first time in an overlay dataset, it is created in the overlay with the same reconciliation identity as the CI in the underlying dataset but is assigned a new instance ID.

If the underlying CI has not been identified when you modify it in the overlay dataset, it does not have a reconciliation identity in either dataset. When you later identify and merge the datasets, your Identify rules should match these CIs correctly so that they share the same reconciliation identity.

Warning

For each modification that you make to an instance before it is identified, a separate instance is created in the overlay dataset. You must identify instances before modifying them a second time in the overlay dataset.

If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.