Datasets to partition data


Partitioning in BMC Helix CMDB organizes configuration data into subsets called datasets, each of which represents a logical group of Configuration Items (CIs) and relationships. The same real-world object or relationship can appear in multiple datasets, for example when different discovery tools create CI instances in separate datasets. These instances can later be merged into a single production dataset for validation and correction.

Datasets primarily serve to partition data by source. For example, discovery tools store their discovered data in separate datasets, and data imported through Atrium Integrator resides in its own dataset.

Datasets can also represent the following aspects:

  • Intended vs. actual configurations for verification.
  • Production, test, or obsolete data.
  • Multitenancy, such as data from different companies.
  • Subsets like departments or regions.

Each CI or relationship in BMC Helix CMDB must belong to a dataset, identified by a unique DatasetId. Instances of the same CI in different datasets share a common reconciliation ID for consistency across contexts.
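The relationship between datasets, instances, and reconciliation IDs can be pictured as a small data model. The following Python sketch is illustrative only and is not a BMC Helix CMDB API: the `CI` class, its field names, and the `INST-`/`RE-` identifier formats are hypothetical. It groups instances by reconciliation ID to show one real-world CI appearing in two datasets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CI:
    """Hypothetical, simplified CI record with only the fields discussed here."""
    instance_id: str        # unique for each instance
    dataset_id: str         # every CI belongs to exactly one dataset
    reconciliation_id: str  # shared by instances of the same real-world CI
    name: str


def group_by_reconciliation_id(cis):
    """Return {reconciliation_id: {dataset_id: CI}}."""
    groups = defaultdict(dict)
    for ci in cis:
        groups[ci.reconciliation_id][ci.dataset_id] = ci
    return dict(groups)


cis = [
    CI("INST-001", "BMC.ADDM", "RE-42", "web01"),
    CI("INST-002", "BMC.ASSET", "RE-42", "web01"),
    CI("INST-003", "BMC.ADDM", "RE-43", "db01"),
]

groups = group_by_reconciliation_id(cis)
print(sorted(groups["RE-42"]))  # ['BMC.ADDM', 'BMC.ASSET']
```

Each instance has its own instance ID, but the two `web01` instances are linked by their shared reconciliation ID.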


Default datasets in BMC Helix CMDB

This section describes the default datasets provided by BMC Helix CMDB, BMC Helix ITSM: Asset Management, and BMC Discovery products. If you use a non-BMC discovery product, use Atrium Integrator to import data.

The following image represents a few data sources and the datasets that they populate:

Figure: Data sources and the datasets that they populate

The most important dataset is the golden, or production, dataset. By default, this is the BMC Asset (BMC.ASSET) dataset, which is treated as the single source of truth for the CIs in your organization. The other datasets serve as staging areas for the data that your organization's other tools and applications collect about the environment. The following table lists some of the common datasets and their recommended purposes:

The following table provides the details of the default BMC Helix CMDB datasets used by discovery products:

Scenario: How Apex Global partitions data into datasets

Scenario

Apex Global needs to discover its hardware and software, bring in relevant information from its payroll services, compare this data, and put the preferred pieces of information in the production dataset.

Apex Global gathers and consolidates the following data from various discovery data sources to effectively manage information about hardware, software, and payroll services. 

  • Employee Devices—Apex Global uses the BMC Client Management product to discover desktop and laptop systems, including their hardware and software.
    This information is stored in the BMC Configuration Import dataset within BMC Helix CMDB.
  • IT Infrastructure—To manage information about the servers, software, and devices in its IT infrastructure, Apex Global uses BMC Helix Discovery.
    This data is stored in the BMC.ADDM dataset.
  • Payroll Services—For corporate payroll services, a third-party discovery tool collects information about supporting equipment. Relevant data from the payroll database is imported into BMC Helix CMDB using Atrium Integrator, and a dedicated dataset named "AG Payroll" is created for this purpose.

Because some instances in these datasets might represent the same real-world CIs, the administrator configures reconciliation jobs in BMC Helix CMDB to keep the data consistent and eliminate duplication. These jobs compare data across the different datasets and consolidate the preferred information into the production BMC.ASSET dataset.
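Reconciliation jobs are configured in BMC Helix CMDB itself, not in code. To illustrate the idea of consolidating the preferred information into the production BMC.ASSET dataset, the following Python sketch assumes a simplified rule: each dataset has one hypothetical precedence value, the highest value wins for each attribute, and empty (`None`) values never overwrite a value. The precedence numbers, the `merge_into_production` function, and the attribute names are invented for the example.

```python
# Hypothetical precedence values; in this sketch a higher value wins.
# Real reconciliation precedence is configured in BMC Helix CMDB, not in code.
PRECEDENCE = {
    "BMC.ADDM": 300,
    "BMC Configuration Import": 200,
    "AG Payroll": 100,
}


def merge_into_production(instances):
    """Merge instances of one real-world CI (same reconciliation ID).

    instances maps dataset name -> {attribute: value}. For each attribute, the
    value from the highest-precedence dataset that supplies a non-None value wins.
    """
    merged = {}
    # Apply datasets from lowest to highest precedence so higher ones overwrite.
    for dataset_id in sorted(instances, key=lambda d: PRECEDENCE.get(d, 0)):
        for attribute, value in instances[dataset_id].items():
            if value is not None:
                merged[attribute] = value
    return merged


same_server = {
    "BMC.ADDM": {"Name": "pay-srv01", "IPAddress": "10.0.0.5", "Owner": None},
    "AG Payroll": {"Name": "PAYROLL-01", "Owner": "Payroll Team"},
}
production_ci = merge_into_production(same_server)
print(production_ci)
# {'Name': 'pay-srv01', 'Owner': 'Payroll Team', 'IPAddress': '10.0.0.5'}
```

In this example, the name and IP address come from BMC.ADDM, while the owner, which BMC.ADDM leaves empty, comes from the AG Payroll dataset.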


Overlay datasets in BMC Helix CMDB

Use overlay datasets in BMC Helix CMDB for the following tasks:

  • Make changes in a separate partition without overwriting your production data.
  • See your changes in context with the unchanged portions of your data.
  • Avoid duplicating your entire production dataset.
  • Create multiple overlay datasets that reuse one set of reconciliation definitions for merging each into the production dataset.

Warning

Overlay dataset functionality applies only to BMC Helix CMDB API clients.

How overlays work

You create an overlay dataset by specifying an existing regular dataset, called the underlying dataset. A new overlay dataset is empty. When you request a CI from the overlay dataset, the overlay is checked first; if the CI isn't there, it is retrieved from the underlying dataset. When you request CIs from the underlying dataset, they are retrieved from that dataset only, not from the overlay dataset.

When you modify a CI in the overlay dataset for the first time, the CI is copied from the underlying dataset and your modifications are applied to the overlay version. You can also create CIs directly in the overlay dataset; these CIs do not exist in the underlying dataset.

When you request a modified or newly created CI from the overlay dataset, the CI is retrieved from the overlay. When you request an unmodified CI, it is retrieved from the underlying dataset. This approach keeps the underlying dataset unchanged while you make modifications and additions in the overlay.
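The read and first-modification behavior of an overlay is essentially copy-on-write. The following Python sketch models it with in-memory dictionaries. `OverlayDataset`, its methods, the `OVL-` instance ID format, and the attribute names are hypothetical, and for simplicity CIs are keyed by reconciliation ID. It is not how BMC Helix CMDB stores overlay data.

```python
import copy
import itertools


class OverlayDataset:
    """Hypothetical model of overlay reads and copy on first modification."""

    def __init__(self, underlying):
        self.underlying = underlying    # {reconciliation_id: ci_dict}; never modified here
        self.overlay = {}               # an overlay dataset starts empty
        self._ids = itertools.count(1)  # source of new overlay instance IDs

    def get(self, recon_id):
        # Check the overlay first, then fall back to the underlying dataset.
        if recon_id in self.overlay:
            return self.overlay[recon_id]
        return self.underlying.get(recon_id)

    def modify(self, recon_id, **changes):
        if recon_id not in self.overlay:
            # First modification: copy the CI and give the copy a new instance ID.
            ci = copy.deepcopy(self.underlying[recon_id])
            ci["instance_id"] = f"OVL-{next(self._ids)}"
            self.overlay[recon_id] = ci
        self.overlay[recon_id].update(changes)
        return self.overlay[recon_id]

    def create(self, recon_id, **attrs):
        # A CI created directly in the overlay has no underlying counterpart.
        ci = {"instance_id": f"OVL-{next(self._ids)}", **attrs}
        self.overlay[recon_id] = ci
        return ci


base = {
    "RE-1": {"instance_id": "INST-1", "Name": "web01", "Status": "Deployed"},
    "RE-2": {"instance_id": "INST-2", "Name": "db01", "Status": "Deployed"},
}
what_if = OverlayDataset(base)
what_if.modify("RE-1", Status="Down")
what_if.create("RE-9", Name="test01")

print(what_if.get("RE-1")["Status"])       # Down      (read from the overlay)
print(what_if.get("RE-2")["Status"])       # Deployed  (read from the underlying dataset)
print(base["RE-1"]["Status"])              # Deployed  (underlying dataset unchanged)
print(what_if.get("RE-1")["instance_id"])  # OVL-1     (new instance ID)
```

Copying with `deepcopy` on the first modification is what keeps later changes from leaking into the underlying dataset.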


The following figure illustrates queries made to an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID:

Figure: Queries to an overlay dataset for a modified instance and an unmodified instance

Deleting instances from an overlay dataset

If you delete a CI from an overlay dataset, the behavior depends on where the CI exists. The CI is deleted according to the following conditions:

  • If the CI exists in the overlay dataset, it is removed from the overlay dataset but remains unchanged in the underlying dataset.
  • If the CI exists only in the underlying dataset, the deletion applies to that dataset and removes the CI entirely.

You can also mark a CI as deleted in the overlay dataset, regardless of where it exists. In this case, the overlay dataset creates or updates the CI as a deleted entry, leaving the underlying dataset unaffected.
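The delete rules above can be modeled in a few lines to show which dataset each operation changes. In this hypothetical Python sketch, `OverlayDeletes` and its methods are invented for illustration, both datasets are plain dictionaries, and the `MarkAsDeleted` key stands in for the CMDB's mark-as-deleted flag.

```python
class OverlayDeletes:
    """Hypothetical model of delete and mark-as-deleted in an overlay dataset."""

    def __init__(self, underlying):
        self.underlying = underlying  # {key: ci_dict}
        self.overlay = {}             # CIs that exist in the overlay

    def get(self, key):
        # Overlay entries hide the underlying ones; otherwise fall through.
        if key in self.overlay:
            return self.overlay[key]
        return self.underlying.get(key)

    def delete(self, key):
        if key in self.overlay:
            # CI exists in the overlay: remove only the overlay copy.
            del self.overlay[key]
        elif key in self.underlying:
            # CI exists only in the underlying dataset: the delete reaches it.
            del self.underlying[key]
        else:
            raise KeyError(key)

    def mark_as_deleted(self, key):
        # Create or update an overlay entry flagged as deleted;
        # the underlying dataset is left as it is.
        if key not in self.overlay:
            self.overlay[key] = dict(self.underlying[key])
        self.overlay[key]["MarkAsDeleted"] = True


base = {"RE-1": {"Name": "web01"}, "RE-2": {"Name": "db01"}, "RE-3": {"Name": "app01"}}
ov = OverlayDeletes(base)
ov.overlay["RE-1"] = {"Name": "web01-changed"}  # pretend RE-1 was modified in the overlay

ov.delete("RE-1")           # removes the overlay copy; base keeps RE-1
ov.delete("RE-2")           # RE-2 is only in base, so base loses it
ov.mark_as_deleted("RE-3")  # flagged in the overlay only
print(sorted(base))         # ['RE-1', 'RE-3']
```

After the first delete, reading `RE-1` through the overlay returns the unchanged underlying CI again.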

Instance ID and identity in overlay datasets

When you modify a CI (instance) for the first time in an overlay dataset, it is created in the overlay with the same reconciliation identity as the CI in the underlying dataset but is assigned a new instance ID.  

If the underlying CI has not yet been identified when you modify it in the overlay dataset, neither the underlying CI nor its overlay copy has a reconciliation identity. When you later identify and merge the datasets, your Identify rules should match these CIs so that they share the same reconciliation identity.

Warning

For each modification that you make to an instance before it is identified, an instance is created in the overlay dataset. You should identify instances before modifying them a second time in the overlay dataset.

If you keep the changes you modeled in an overlay dataset, you can merge them into the underlying regular dataset.
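One way to see why unidentified instances multiply in an overlay is to assume that the overlay finds its existing copy of an underlying CI by reconciliation identity. The following Python sketch makes that assumption explicit. The `Overlay` class, the `recon_id` field, and the use of `None` for an unidentified CI are hypothetical and do not describe BMC Helix CMDB internals.

```python
import itertools


class Overlay:
    """Hypothetical model: the overlay finds its copy of a CI by reconciliation ID.

    Assumption for illustration: an unidentified CI has no reconciliation ID
    (None here), so the lookup can never match an earlier copy.
    """

    def __init__(self, underlying):
        self.underlying = underlying  # {instance_id: ci_dict}
        self.instances = []           # instances stored in the overlay
        self._ids = itertools.count(1)

    def _find_copy(self, recon_id):
        if recon_id is None:
            return None               # nothing to match on
        for ci in self.instances:
            if ci["recon_id"] == recon_id:
                return ci
        return None

    def modify(self, underlying_instance_id, **changes):
        source = self.underlying[underlying_instance_id]
        ci = self._find_copy(source["recon_id"])
        if ci is None:
            # Same reconciliation identity as the source, new instance ID.
            ci = dict(source, instance_id=f"OVL-{next(self._ids)}")
            self.instances.append(ci)
        ci.update(changes)
        return ci


base = {
    "INST-1": {"instance_id": "INST-1", "recon_id": "RE-1", "Name": "web01"},
    "INST-2": {"instance_id": "INST-2", "recon_id": None, "Name": "db01"},
}
ov = Overlay(base)
ov.modify("INST-1", Name="web01a")
ov.modify("INST-1", Name="web01b")  # identified: updates the same overlay instance
ov.modify("INST-2", Name="db01a")
ov.modify("INST-2", Name="db01b")   # unidentified: creates a second overlay instance
print(len(ov.instances))            # 3
```

In this model, the identified CI `INST-1` keeps a single overlay instance across both modifications, while the unidentified `INST-2` ends up with two.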

