Datasets to partition data


Partitioning in  organizes configuration data into subsets called datasets. Each dataset represents a logical group of Configuration Items (CIs) and relationships. The same real-world object or relationship can appear in multiple datasets, for example, when different discovery tools create CI instances for it in separate datasets. These instances can later be merged into a single production dataset for validation and correction.

Datasets primarily serve to partition data by source. For example, discovery tools store their discovered data in separate datasets, and data imported through  resides in its own dataset.

Datasets can also represent the following aspects:

  • Intended vs. actual configurations for verification.
  • Production, test, or obsolete data.
  • Multitenancy, such as data from different companies.
  • Subsets like departments or regions.

Each CI or relationship in  must belong to a dataset, and each dataset is identified by a unique DatasetId. Instances that represent the same CI in different datasets share a common reconciliation ID, which links them across datasets.
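To make the relationship between datasets, instances, and reconciliation IDs concrete, the following Python sketch models it in memory. It is a conceptual illustration only: the `CIInstance` class, its field names, and the sample IDs are hypothetical and do not represent the product API or its data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CIInstance:
    # Hypothetical, simplified record; not the CMDB schema.
    instance_id: str        # unique for every instance
    dataset_id: str         # the dataset that this instance belongs to
    reconciliation_id: str  # shared by instances of the same real-world CI
    name: str


def group_by_reconciliation_id(instances):
    """Return {reconciliation_id: {dataset_id, ...}} to show where each CI appears."""
    groups = defaultdict(set)
    for ci in instances:
        groups[ci.reconciliation_id].add(ci.dataset_id)
    return dict(groups)


instances = [
    CIInstance("I-001", "BMC.ADDM", "R-100", "web-server-01"),
    CIInstance("I-002", "BMC.ASSET", "R-100", "web-server-01"),
    CIInstance("I-003", "BMC.IMPORT.CONFIG", "R-200", "laptop-42"),
    CIInstance("I-004", "BMC.ASSET", "R-200", "laptop-42"),
]

groups = group_by_reconciliation_id(instances)
print(sorted(groups["R-100"]))  # ['BMC.ADDM', 'BMC.ASSET']
```

Each instance has its own instance ID and dataset, while the shared reconciliation ID is what ties the copies of one real-world CI together.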

Default datasets in 

This section describes the default datasets provided by , , and  products. If you use a non- product, use  to import data.

The following image represents a few data sources and the datasets that they populate:

dataset partition.jpg

The most important dataset is the golden, or production, dataset. By default, this is the BMC Asset (BMC.ASSET) dataset, which is treated as the single source of truth for the CIs in your organization. The other datasets serve as staging areas for the data that your organization's tools and applications gather about its environment.

The following table lists common default datasets, the products whose data they hold, and the recommended purpose of each:

| Data created by | Dataset name | Dataset ID | Purpose |
|---|---|---|---|
|  | BMC Asset | BMC.ASSET | Important: Do not update the production dataset directly. This is the dataset that you treat as your single source of reference and that you use to make business decisions. Consuming applications use the CIs from this dataset. |
|  | BMC Sample | BMC.SAMPLE | A staging dataset that provides a safe place for testing. |
|  | BMC.ASSET.SANDBOX | BMC.ASSET.SANDBOX | If  is configured with a sandbox dataset, CIs that you manually create or modify flow through the sandbox dataset; otherwise, they go directly into the production dataset. |
| BMC Client Management | BMC Configuration Import | BMC.IMPORT.CONFIG | Imports CIs and relationships from the BMC Client Management database for reconciliation. |
|  | BMC.ADDM | BMC.ADDM | Imports CIs and relationships from the  data store for reconciliation. |
|  | BMC Atrium Explorer Asset | BMC.AE.ASSET | A staging dataset. When a user edits a CI in the BMC.ASSET (production) dataset, CMDB creates a sandbox for that user. For example, if your user name is Chris, the BMC.AE.SB.Chris dataset is created for internal use. When the user promotes the CI changes, the changes are copied to the BMC.ASSET dataset and the entries are deleted from this internal dataset. |
|  | BMC Atrium Explorer | BMC.AE | The Atrium Explorer - Identification and Merge reconciliation job for this dataset reconciles data from sandbox datasets into BMC.ASSET. |
|  | BMC.SANDBOX.DSM | BMC.SANDBOX.DSM | When a relationship is created by using Dynamic Service Modeling (DSM) based on a given qualification, the relationship is first created in the BMC.SANDBOX.DSM dataset and is later reconciled to the BMC.ASSET dataset. |

Scenario: How Apex Global partitions data into datasets


Apex Global needs to discover various hardware and software, bring in relevant information from its payroll services, compare this data, and put the preferred pieces of information in the production dataset. To do this, Apex Global gathers and consolidates data from the following discovery sources:

  • Employee Devices: Apex Global uses the BMC Client Management product to discover the desktop and laptop computer systems, including their hardware and software, used by Apex Global employees.
    This information is stored in the BMC Configuration Import dataset within .
  • IT Infrastructure: Apex Global uses  to discover information about the servers, software, and other devices used to deliver banking information to Apex Global customers.
    This data is stored in the BMC.ADDM dataset.
  • Payroll Services: A third-party discovery tool collects information about the equipment that supports the corporate payroll services. Apex Global uses  to bring relevant data from the payroll database into , and the administrator creates a dedicated dataset named AG Payroll for this information.

Because some of the instances in these datasets might represent the same real-world CIs, the administrator configures  reconciliation jobs to compare the datasets against each other, eliminate duplication, and consolidate the preferred pieces of information into the production BMC.ASSET dataset.
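Deciding which source supplies the "preferred pieces of information" is the core of this consolidation. The following Python sketch shows one way such a merge can work, assuming a simple per-dataset precedence order chosen by the administrator. The `PRECEDENCE` values, the dataset order, and the attribute names are hypothetical; real reconciliation behavior is defined through reconciliation jobs, not written as code like this.

```python
# Hypothetical precedence: a higher number wins when datasets disagree.
PRECEDENCE = {"BMC.ADDM": 3, "AG Payroll": 2, "BMC.IMPORT.CONFIG": 1}


def merge_instances(instances):
    """Merge instances that share one reconciliation ID into a single record.

    Each instance is a (dataset_id, attributes) pair. For every attribute,
    the value from the highest-precedence dataset that supplies it wins;
    on a tie, the first value seen is kept. None means "not supplied".
    """
    merged = {}
    winner = {}  # attribute name -> precedence of the dataset that supplied it
    for dataset_id, attributes in instances:
        rank = PRECEDENCE.get(dataset_id, 0)
        for name, value in attributes.items():
            if value is None:
                continue
            if name not in winner or rank > winner[name]:
                merged[name] = value
                winner[name] = rank
    return merged


# Two copies of the same real-world server, found by different sources.
sources = [
    ("AG Payroll", {"Name": "srv-01", "OS": "Windows", "Owner": "Payroll"}),
    ("BMC.ADDM", {"Name": "srv-01", "OS": "Windows Server 2019", "Owner": None}),
]
production_ci = merge_instances(sources)
print(production_ci)  # {'Name': 'srv-01', 'OS': 'Windows Server 2019', 'Owner': 'Payroll'}
```

Attribute-level precedence lets each source contribute what it knows best: here the discovery data supplies the operating system, while the payroll data fills in the owner that discovery did not provide.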


Overlay datasets in 

Use overlay datasets in  for the following tasks:

  • Making changes in a separate partition without overwriting your production data.
  • Viewing your changes in context with the unchanged portions of your data.
  • Avoiding duplication of the entire production dataset. 
  • Creating multiple overlay datasets that reuse one set of reconciliation definitions for merging each dataset into the production dataset.

Warning

Overlay dataset functionality applies only to  API clients.

How overlays work

You create an overlay dataset by specifying an existing regular dataset, called the underlying dataset. A new overlay dataset is empty. When you request a CI from the overlay dataset, the overlay is checked first; if the CI is not there, it is retrieved from the underlying dataset. Requests sent directly to the underlying dataset return CIs from the underlying dataset only, never from the overlay dataset.

When you modify a CI in the overlay dataset for the first time, the CI is copied from the underlying dataset into the overlay dataset, and your modifications are applied to the overlay copy. You can also create new CIs directly in the overlay dataset; these CIs do not exist in the underlying dataset.

When you request a modified or newly created CI from the overlay dataset, it retrieves the CI from the overlay. If you request an unmodified CI, the overlay retrieves it from the underlying dataset. This approach keeps the underlying dataset unchanged while letting you make modifications and additions in the overlay.
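The lookup-then-fallback and copy-on-first-modification behavior described above can be modeled with two dictionaries. The following Python sketch is a conceptual model under stated assumptions, not the product API: the `OverlayDataset` class, its method names, and keying CIs by reconciliation ID are hypothetical simplifications. The modified copy keeps the reconciliation ID of the original but receives a new instance ID.

```python
import copy
import itertools


class OverlayDataset:
    """Conceptual model of an overlay dataset (hypothetical, not the product API).

    CIs are plain dicts keyed by reconciliation ID, which is a simplification.
    """

    def __init__(self, underlying):
        self.underlying = underlying  # never written to by this class
        self.overlay = {}             # a new overlay dataset starts empty
        self._new_ids = itertools.count(1)

    def get(self, key):
        # Check the overlay first; fall back to the underlying dataset.
        if key in self.overlay:
            return self.overlay[key]
        return self.underlying.get(key)

    def modify(self, key, **changes):
        if key not in self.overlay:
            # First modification: copy the CI from the underlying dataset.
            # The copy keeps its reconciliation ID but gets a new instance ID.
            ci = copy.deepcopy(self.underlying[key])
            ci["instance_id"] = f"OV-{next(self._new_ids)}"
            self.overlay[key] = ci
        self.overlay[key].update(changes)
        return self.overlay[key]

    def create(self, key, **attributes):
        # A new CI exists only in the overlay, never in the underlying dataset.
        ci = {"instance_id": f"OV-{next(self._new_ids)}", **attributes}
        self.overlay[key] = ci
        return ci


underlying = {
    "R-1": {"instance_id": "I-1", "reconciliation_id": "R-1", "Name": "srv-01", "OS": "Linux"},
    "R-2": {"instance_id": "I-2", "reconciliation_id": "R-2", "Name": "srv-02", "OS": "Linux"},
}
overlay = OverlayDataset(underlying)
overlay.modify("R-1", OS="Windows")
overlay.create("NEW-1", Name="srv-03")

print(overlay.get("R-1")["OS"], overlay.get("R-1")["instance_id"])  # Windows OV-1
print(overlay.get("R-2")["instance_id"])  # I-2 (served from the underlying dataset)
print(underlying["R-1"]["OS"])            # Linux (underlying data is unchanged)
print("NEW-1" in underlying)              # False
```

Because the underlying dictionary is only ever read and deep-copied, the sketch keeps the underlying data unchanged while the overlay holds every modification and addition.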

The following figure illustrates queries made to an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID:

Overlay datasets.jpg

Deleting instances from an overlay dataset

If you delete a CI instance from an overlay dataset, the result depends on where the CI exists:

  • If the CI exists in the overlay dataset, it is removed from the overlay dataset but remains unchanged in the underlying dataset.
  • If the CI exists only in the underlying dataset, the deletion applies to that dataset and the CI is removed entirely.

You can also mark a CI as deleted in the overlay dataset, regardless of where it exists. In this case, the overlay dataset creates or updates the CI as a deleted entry, leaving the underlying dataset unaffected.
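The two deletion cases and the mark-as-deleted option can be sketched as follows. This Python model is hypothetical: the function names, the dict-based datasets keyed by reconciliation ID, and the `marked_deleted` flag are illustrative assumptions, not product attributes or API calls.

```python
import copy


def delete_from_overlay(overlay, underlying, key):
    """Delete a CI through the overlay, following the two documented cases.

    overlay and underlying are dicts that map a (hypothetical) key to a CI dict.
    """
    if key in overlay:
        # The CI exists in the overlay: only the overlay copy is removed.
        del overlay[key]
    elif key in underlying:
        # The CI exists only in the underlying dataset: it is removed there.
        del underlying[key]
    else:
        raise KeyError(key)


def mark_deleted_in_overlay(overlay, underlying, key):
    """Mark a CI as deleted in the overlay only; the underlying dataset is untouched."""
    if key not in overlay:
        # Create the overlay entry from the underlying CI before flagging it.
        overlay[key] = copy.deepcopy(underlying[key])
    overlay[key]["marked_deleted"] = True  # hypothetical flag name


underlying = {"R-1": {"Name": "srv-01"}, "R-2": {"Name": "srv-02"}, "R-3": {"Name": "srv-03"}}
overlay = {"R-1": {"Name": "srv-01-modified"}}

delete_from_overlay(overlay, underlying, "R-1")      # overlay copy removed; R-1 stays underneath
delete_from_overlay(overlay, underlying, "R-2")      # only in underlying: removed entirely
mark_deleted_in_overlay(overlay, underlying, "R-3")  # flagged in the overlay only

print(sorted(underlying))  # ['R-1', 'R-3']
print(overlay)             # {'R-3': {'Name': 'srv-03', 'marked_deleted': True}}
```

Marking a CI as deleted is the only one of the three operations in this sketch that is guaranteed to leave the underlying dataset unaffected, which is why it suits what-if modeling.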

Instance ID and identity in overlay datasets

When you modify a CI instance for the first time in an overlay dataset, it is created in the overlay with the same reconciliation identity as the CI in the underlying dataset but is assigned a new instance ID.  

If the underlying CI has not been identified when you modify it in the overlay dataset, neither the underlying CI nor the overlay copy has a reconciliation identity. When you later identify and merge the datasets, your Identify rules should match these CIs so that they share the same reconciliation identity.

Warning

For each modification that you make to an instance before it is identified, an instance is created in the overlay dataset. You must identify instances before modifying them a second time in the overlay dataset.
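The following Python sketch illustrates why this warning matters. It assumes, as a simplification consistent with the warning, that the overlay can find its existing copy of a CI only through the reconciliation identity; the function name, dict layout, and `OV-n` instance IDs are hypothetical and not part of the product API.

```python
import itertools

_instance_ids = itertools.count(1)


def modify_in_overlay(overlay_instances, underlying_ci, **changes):
    """Record a modification of underlying_ci in the overlay (hypothetical model).

    Assumption: the overlay locates its existing copy only by reconciliation ID,
    so an unidentified CI (reconciliation_id is None) can never be matched and
    every modification adds a new overlay instance.
    """
    recon_id = underlying_ci["reconciliation_id"]
    if recon_id is not None:
        for instance in overlay_instances:
            if instance["reconciliation_id"] == recon_id:
                instance.update(changes)  # reuse the existing overlay copy
                return instance
    new_instance = {**underlying_ci, **changes, "instance_id": f"OV-{next(_instance_ids)}"}
    overlay_instances.append(new_instance)
    return new_instance


unidentified = {"instance_id": "I-1", "reconciliation_id": None, "Name": "srv-01"}
identified = {"instance_id": "I-2", "reconciliation_id": "R-2", "Name": "srv-02"}

overlay = []
modify_in_overlay(overlay, unidentified, OS="Linux")
modify_in_overlay(overlay, unidentified, OS="Windows")  # adds a second instance
modify_in_overlay(overlay, identified, OS="Linux")
modify_in_overlay(overlay, identified, OS="Windows")    # updates the existing copy

print(len(overlay))  # 3
```

Two modifications of the unidentified CI produce two overlay instances, while the identified CI keeps a single overlay copy, which is why you identify instances before modifying them a second time.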

If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.
