Datasets to partition data
Partitioning is dividing your configuration data into subsets, each representing a logical group of configuration items (CIs) and relationships. In BMC Atrium Core, these partitions are called datasets. The same real-world object or relationship can be represented by instances in more than one dataset. For example, different discovery applications can create CI and relationship instances in different datasets. You can later merge those instances into a single production dataset.
Partitioning supports the goal of verifying and correcting configuration records against the infrastructure. You can create one dataset that represents your intended configuration, use a discovery application to create another dataset that represents your actual configuration, and then verify the former against the latter.
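As a rough illustration of verifying an intended configuration against a discovered one, the following sketch compares two in-memory datasets. It is not the BMC CMDB API: the `verify` function, the dictionary layout, and the sample IDs and attributes are all hypothetical, and instances are assumed to be already matched by reconciliation ID.

```python
# Illustrative in-memory model only; this is not the BMC CMDB API.
# Each dataset maps a reconciliation ID to a dict of attribute values.

def verify(intended, actual):
    """Compare two datasets keyed by reconciliation ID.

    Returns a tuple (missing, unexpected, mismatched):
      missing    - IDs in the intended dataset that were not discovered
      unexpected - IDs that were discovered but are not intended
      mismatched - {id: {attribute: (intended_value, actual_value)}}
    """
    missing = sorted(set(intended) - set(actual))
    unexpected = sorted(set(actual) - set(intended))
    mismatched = {}
    for rid in set(intended) & set(actual):
        diffs = {
            attr: (intended[rid].get(attr), actual[rid].get(attr))
            for attr in set(intended[rid]) | set(actual[rid])
            if intended[rid].get(attr) != actual[rid].get(attr)
        }
        if diffs:
            mismatched[rid] = diffs
    return missing, unexpected, mismatched


# Hypothetical sample data.
intended = {
    "RE-1": {"Name": "web01", "MemoryGB": 16},
    "RE-2": {"Name": "db01", "MemoryGB": 64},
}
actual = {
    "RE-1": {"Name": "web01", "MemoryGB": 8},
    "RE-3": {"Name": "unknown-host", "MemoryGB": 4},
}

missing, unexpected, mismatched = verify(intended, actual)
print("Missing from infrastructure:", missing)
print("Not in intended configuration:", unexpected)
print("Attribute mismatches:", mismatched)
```

Keeping the two datasets separate is what makes this comparison possible: neither side overwrites the other until you decide which values to keep.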
The primary purpose of datasets is to partition data according to the providers of that data. Each discovery application that you use should store the data that it discovers in a separate dataset. Data in a separate database that you import to BMC Configuration Management Database (BMC CMDB) through Atrium Integrator should be stored in a separate dataset.
You can also use datasets for other methods of partitioning data. For example, you could use datasets to represent production data or obsolete data. Your datasets do not all need to contain different versions of the same CIs and relationships. For example, you could use datasets to hold:
- Subsets of your overall data, such as departments or regions
- Data from different companies for multitenancy
- Test data
A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
Each CI and relationship in BMC CMDB must reside in a dataset, meaning that each has a DatasetId attribute that must contain a value.
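The rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the `Instance` and `Store` classes, their field names, and the dataset ID `EXAMPLE.PRODUCTION` are hypothetical and do not reflect how BMC CMDB stores data. Only the rules themselves come from the text: a dataset ID is mandatory, a dataset holds at most one instance of a given CI, and instances of the same CI share a reconciliation ID across datasets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    dataset_id: str
    name: str
    reconciliation_id: Optional[str] = None  # None until the instance is identified


class Store:
    """Hypothetical in-memory store that enforces the dataset rules."""

    def __init__(self):
        self._instances = []

    def add(self, inst):
        # Every instance must reside in a dataset.
        if not inst.dataset_id:
            raise ValueError("DatasetId must contain a value")
        # A dataset can contain only one instance of a given CI.
        if inst.reconciliation_id is not None:
            for other in self._instances:
                if (other.dataset_id == inst.dataset_id
                        and other.reconciliation_id == inst.reconciliation_id):
                    raise ValueError("dataset already holds an instance of this CI")
        self._instances.append(inst)

    def datasets_representing(self, reconciliation_id):
        """Return the IDs of all datasets that hold an instance of the CI."""
        return sorted(i.dataset_id for i in self._instances
                      if i.reconciliation_id == reconciliation_id)


store = Store()
store.add(Instance("I-1", "BMC.ADDM", "web01", "RE-1"))
store.add(Instance("I-2", "EXAMPLE.PRODUCTION", "web01", "RE-1"))
print(store.datasets_representing("RE-1"))
```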
BMC datasets in BMC CMDB
This section describes the default datasets provided by BMC CMDB, BMC Asset Management, and BMC discovery products. If you use a non-BMC discovery product, use Atrium Integrator to import its data.
Default BMC CMDB datasets used by BMC discovery products
- BMC Asset: Production dataset, which is the dataset that you treat as your single source of reference and that you use to make business decisions.
- Test dataset: A safe place to do testing.
- Sandbox dataset. Data created by: BMC Asset Management. If BMC Asset Management is configured with a sandbox dataset, CIs that you manually create or modify flow through the sandbox dataset; otherwise, CIs go directly into the production dataset.
- BMC Configuration Import. Data created by: BMC BladeLogic Client Automation. Import CIs and relationships from the BMC BladeLogic Client Automation database for reconciliation.
- BMC.ADDM. Data created by: BMC Discovery. Import CIs and relationships from the BMC Discovery data store for reconciliation.
Scenario: How Calbro Services partitions data into datasets
Calbro Services needs to discover various hardware and software, bring in relevant information from the payroll services, compare this data, and put the preferred pieces of information in the production dataset.
Calbro Services uses the BMC BladeLogic Client Automation product to discover the desktop and laptop computer systems (including hardware and software) used by Calbro Services employees. This information is stored in the BMC Configuration Import dataset of BMC CMDB.
Calbro Services also uses BMC Discovery to discover information about the servers, software, and other devices used to deliver banking information to Calbro Services customers. This data is stored in the BMC.ADDM dataset.
Lastly, Calbro Services uses a third-party discovery tool to collect information about the equipment that supports the corporate payroll services. Calbro Services uses Atrium Integrator to bring relevant data from the payroll database into BMC CMDB. The administrator creates a new dataset named Calbro Payroll specifically for this information.
Because some of the instances in these different datasets might represent the same real-world CIs, the administrator configures BMC CMDB reconciliation jobs to compare those datasets against each other and put the preferred pieces of information in the production BMC Asset dataset.
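The idea of putting the preferred pieces of information into production can be sketched as follows. This is a deliberate simplification and not the BMC CMDB reconciliation engine: the `merge_preferred` function, the single per-dataset preference ranking, and the sample attributes are assumptions made for illustration, and instances are assumed to be already identified (matched by reconciliation ID).

```python
# Illustrative only; real reconciliation jobs are configured in BMC CMDB.

def merge_preferred(sources, preference):
    """Merge instances that share a reconciliation ID.

    sources    - {dataset_name: {reconciliation_id: {attribute: value}}}
    preference - dataset names ordered from most to least preferred

    For each attribute, the value from the most preferred dataset that
    supplies a non-None value wins.
    """
    production = {}
    # Walk from least to most preferred so that preferred values overwrite.
    for dataset in reversed(preference):
        for rid, attrs in sources.get(dataset, {}).items():
            target = production.setdefault(rid, {})
            for attr, value in attrs.items():
                if value is not None:
                    target[attr] = value
    return production


# Hypothetical data for the three Calbro Services source datasets.
sources = {
    "BMC.ADDM": {
        "RE-100": {"Name": "pay-srv-01", "OSVersion": "8.6", "Owner": None},
    },
    "Calbro Payroll": {
        "RE-100": {"Name": "PAYSRV01", "Owner": "Payroll IT"},
    },
    "BMC Configuration Import": {
        "RE-200": {"Name": "laptop-17"},
    },
}
# Hypothetical ranking; the scenario does not state which source is preferred.
preference = ["BMC.ADDM", "Calbro Payroll", "BMC Configuration Import"]

production = merge_preferred(sources, preference)
print(production["RE-100"])
print(production["RE-200"])
```

In this sketch the server known to both BMC.ADDM and Calbro Payroll ends up as one production instance: the name comes from the preferred dataset, and the owner is filled in from the payroll data because the preferred dataset supplies no value for it.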
Overlay datasets in BMC CMDB
BMC CMDB offers overlay datasets, which enable you to:
- Make changes in a separate partition without overwriting your production data.
- See your changes in context with the unchanged portions of your data.
- Avoid duplicating your entire production dataset.
- Create multiple overlay datasets that reuse one set of reconciliation definitions for merging each into the production dataset.
Overlay dataset functionality applies only to BMC CMDB API clients. If you use the BMC Atrium Core Console or the class forms to view or modify instances in an overlay dataset, you can get unpredictable results and compromise data integrity.
How overlays work
When you create an overlay dataset, you must specify an existing regular dataset for it to overlay. Although an overlay dataset starts out empty like any other dataset, any request for an instance in the overlay dataset passes through the overlay dataset and returns that instance from the underlying dataset.
When you modify an instance in the overlay dataset for the first time, it is copied there from the underlying dataset with your modifications. You can also create instances in the overlay dataset. The underlying dataset still holds the unmodified versions of its original instances, but it does not hold the newly created instances. A request to the overlay dataset for a new or modified instance returns that instance from the overlay dataset, and a request to the overlay dataset for an unmodified instance returns it from the underlying dataset.
Requests made to the underlying dataset always return instances from that dataset, never from an overlay dataset.
The following figure illustrates queries against an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID.
Query to an overlay dataset
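The pass-through and copy-on-modify behavior described in this section can be modeled in a few lines. This is a minimal sketch, not the BMC CMDB API: the `Dataset` and `OverlayDataset` classes, the lookup by CI name, and the `INST-n` ID format are all hypothetical.

```python
import copy
import itertools

_next_id = itertools.count(1)


def new_instance_id():
    # Hypothetical ID format, for illustration only.
    return f"INST-{next(_next_id)}"


class Dataset:
    """A regular dataset that maps a CI name to an instance (a plain dict)."""

    def __init__(self, name):
        self.name = name
        self.instances = {}

    def create(self, ci_name, reconciliation_id=None, **attrs):
        inst = {"instance_id": new_instance_id(),
                "reconciliation_id": reconciliation_id, **attrs}
        self.instances[ci_name] = inst
        return inst

    def get(self, ci_name):
        return self.instances.get(ci_name)


class OverlayDataset(Dataset):
    """Starts empty; reads fall through to the underlying dataset."""

    def __init__(self, name, underlying):
        super().__init__(name)
        self.underlying = underlying

    def get(self, ci_name):
        # New or modified instances come from the overlay itself;
        # unmodified instances come from the underlying dataset.
        if ci_name in self.instances:
            return self.instances[ci_name]
        return self.underlying.get(ci_name)

    def modify(self, ci_name, **changes):
        if ci_name not in self.instances:
            base = self.underlying.get(ci_name)
            if base is None:
                raise KeyError(ci_name)
            # First modification: copy the instance into the overlay.
            # It keeps its reconciliation ID but gets a new instance ID.
            copied = copy.deepcopy(base)
            copied["instance_id"] = new_instance_id()
            self.instances[ci_name] = copied
        self.instances[ci_name].update(changes)
        return self.instances[ci_name]


production = Dataset("production")
production.create("web01", reconciliation_id="RE-1", MemoryGB=16)
production.create("db01", reconciliation_id="RE-2", MemoryGB=64)

overlay = OverlayDataset("what-if", production)
overlay.modify("web01", MemoryGB=32)   # copied into the overlay, then changed
overlay.create("cache01", MemoryGB=8)  # exists only in the overlay

print(overlay.get("web01"))     # modified copy from the overlay
print(overlay.get("db01"))      # unmodified, served by the underlying dataset
print(production.get("web01"))  # the underlying dataset is unchanged
```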
Result of deleting instances from an overlay dataset
If you delete an instance through an overlay dataset and that instance actually exists in the overlay dataset, it is deleted only from the overlay dataset and remains in the underlying dataset. If the instance exists only in the underlying dataset, it is deleted from the underlying dataset.
You can mark an instance as deleted regardless of whether it already exists in the overlay dataset. In either case, this results in an instance in the overlay dataset that is marked as deleted.
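The two deletion cases and the mark-as-deleted case can be contrasted in a short sketch. This is an in-memory illustration, not the BMC CMDB API: the `Overlay` class, its method names, and the `marked_as_deleted` flag name are assumptions chosen for this example.

```python
# Illustrative model only; names are hypothetical.

class Overlay:
    def __init__(self, underlying):
        self.underlying = underlying  # dict: key -> instance dict
        self.local = {}               # instances that exist in the overlay

    def get(self, key):
        if key in self.local:
            return self.local[key]
        return self.underlying.get(key)

    def modify(self, key, **changes):
        if key not in self.local:
            self.local[key] = dict(self.underlying[key])
        self.local[key].update(changes)

    def delete(self, key):
        if key in self.local:
            # The instance exists in the overlay: only that copy is removed.
            del self.local[key]
        elif key in self.underlying:
            # The instance exists only underneath: the delete reaches it.
            del self.underlying[key]
        else:
            raise KeyError(key)

    def mark_as_deleted(self, key):
        # Always results in an overlay instance that is flagged as deleted.
        if key not in self.local:
            self.local[key] = dict(self.underlying[key])
        self.local[key]["marked_as_deleted"] = True


underlying = {
    "web01": {"MemoryGB": 16},
    "db01": {"MemoryGB": 64},
    "app01": {"MemoryGB": 32},
}
overlay = Overlay(underlying)

overlay.modify("web01", MemoryGB=32)
overlay.delete("web01")           # removes only the overlay copy
overlay.delete("db01")            # removes the instance from the underlying dataset
overlay.mark_as_deleted("app01")  # underlying instance is left untouched

print(sorted(underlying))
print(overlay.local)
```

Because an ordinary delete can reach through to the underlying dataset, marking an instance as deleted is the way to model a removal in this sketch without touching the data underneath.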
Instance ID and identity in overlay datasets
When an instance is first created in an overlay dataset as the result of a modification, it retains the reconciliation identity of the instance in the underlying dataset, but is assigned a new instance ID.
If the underlying instance has not yet been identified when it is modified in the overlay dataset, the instance has no reconciliation identity in either dataset. This is not a problem. When you eventually identify and merge the two datasets, your Identify rules should be able to match these instances so that they receive the same identity.
For each modification that you make to an instance before it is identified, an instance is created in the overlay dataset. You should identify instances before modifying them a second time in the overlay dataset.
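One way to picture why identification matters is to assume that the overlay locates its existing copy of an instance by reconciliation ID. The text implies this but does not state it, so treat the following as a sketch of one plausible mechanism; the `Overlay` class and the `OVL-n` instance ID format are hypothetical.

```python
import itertools

_ids = itertools.count(1)


class Overlay:
    """Hypothetical overlay that finds its copies by reconciliation ID."""

    def __init__(self):
        self.instances = []

    def modify(self, underlying_inst, **changes):
        rid = underlying_inst["reconciliation_id"]
        if rid is not None:
            # Identified instance: reuse the existing overlay copy, if any.
            for inst in self.instances:
                if inst["reconciliation_id"] == rid:
                    inst.update(changes)
                    return inst
        # First modification, or no identity to match on: create a copy
        # that keeps the reconciliation ID but gets a new instance ID.
        copied = dict(underlying_inst, **changes)
        copied["instance_id"] = f"OVL-{next(_ids)}"
        self.instances.append(copied)
        return copied


identified = {"instance_id": "I-1", "reconciliation_id": "RE-1", "MemoryGB": 16}
unidentified = {"instance_id": "I-2", "reconciliation_id": None, "MemoryGB": 64}

overlay_a = Overlay()
overlay_a.modify(identified, MemoryGB=32)
overlay_a.modify(identified, MemoryGB=48)
print(len(overlay_a.instances))  # one overlay copy, updated twice

overlay_b = Overlay()
overlay_b.modify(unidentified, MemoryGB=96)
overlay_b.modify(unidentified, MemoryGB=128)
print(len(overlay_b.instances))  # a separate overlay instance per modification
```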
If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.