Grouping CIs into datasets
This section aligns with Stage 3, Step 17, "Task 5. Group CIs into datasets," of the Step-by-Step Guide to Building a CMDB. A dataset is a collection of CIs and relationships for a given purpose. Together, the CIs and relationships in a dataset form a picture of some state, time, or configuration.
The primary purpose of datasets is to partition data according to the provider of that data. Each discovery application that you use should store the data that it discovers in its own dataset. Likewise, data that you import to BMC Atrium CMDB from a separate database through Atrium Integrator should be stored in its own dataset.
You can also use datasets to partition data in other ways, such as separating production data from obsolete data. Your datasets do not all need to contain different versions of the same CIs and relationships; for example, you could use datasets to hold:
- Subsets of your overall data, such as departments or regions
- Data from different companies for multitenancy
- Test data
A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
Each CI and relationship in BMC Atrium CMDB must reside in a dataset; that is, its DatasetId attribute must contain a value.
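The two rules above (every instance carries a DatasetId, and a dataset holds at most one instance of a given CI) can be illustrated with a small in-memory model. This is a minimal Python sketch, not the BMC Atrium CMDB API: the `Instance` and `Dataset` classes, the instance IDs, the reconciliation ID `RE-42`, and the `PRODUCTION` DatasetId are all hypothetical, and "same CI" is approximated as "same reconciliation ID", which only works for instances that have already been identified.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Instance:
    """Hypothetical, simplified CI or relationship instance."""
    instance_id: str
    dataset_id: str                   # models the DatasetId attribute
    reconciliation_id: Optional[str]  # None until the instance is identified
    attributes: Dict[str, str] = field(default_factory=dict)


class Dataset:
    """Hypothetical in-memory dataset holding at most one instance per CI."""

    def __init__(self, dataset_id: str) -> None:
        if not dataset_id:
            raise ValueError("DatasetId must contain a value")
        self.dataset_id = dataset_id
        self.instances: Dict[str, Instance] = {}  # instance ID -> instance

    def add(self, inst: Instance) -> None:
        if inst.dataset_id != self.dataset_id:
            raise ValueError("instance's DatasetId does not match this dataset")
        # Identified instances of the same CI share a reconciliation ID, so a
        # second instance with that ID would be a duplicate CI in this dataset.
        # Unidentified instances (reconciliation_id is None) are not checked.
        if inst.reconciliation_id is not None and any(
            other.reconciliation_id == inst.reconciliation_id
            for other in self.instances.values()
        ):
            raise ValueError("dataset already holds an instance of this CI")
        self.instances[inst.instance_id] = inst


# The same server in two datasets: one reconciliation ID, two instance IDs.
addm = Dataset("BMC.ADDM")
production = Dataset("PRODUCTION")  # hypothetical DatasetId
addm.add(Instance("INST-1", "BMC.ADDM", "RE-42", {"Name": "web01"}))
production.add(Instance("INST-2", "PRODUCTION", "RE-42", {"Name": "web01"}))
```

Adding a second `RE-42` instance to `addm` raises `ValueError`, while the copy in `production` is accepted because it lives in a different dataset.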
BMC datasets in BMC Atrium CMDB
This section describes the default datasets provided by BMC Atrium CMDB, BMC Asset Management, and BMC discovery products. If you use a non-BMC discovery product, use Atrium Integrator to import its data.
Default BMC Atrium CMDB datasets used by BMC discovery products
- BMC Asset dataset (data created by BMC Atrium CMDB): The production dataset, which you treat as your "single source of reference" and use to make business decisions.
- Test dataset (data created by BMC Atrium CMDB): A safe place to do testing.
- Sandbox dataset (data created by BMC Asset Management): If BMC Asset Management is configured with a sandbox dataset, CIs that you manually create or modify flow through the sandbox dataset; otherwise, CIs go directly into the production dataset.
- BMC Configuration Import dataset (data created by BMC BladeLogic Client Automation): Holds CIs and relationships imported from the BMC BladeLogic Client Automation database for reconciliation.
- BMC.ADDM dataset (data created by BMC Atrium Discovery and Dependency Mapping): Holds CIs and relationships imported from the BMC Atrium Discovery and Dependency Mapping data store for reconciliation.
How Calbro Services partitions data into datasets
Calbro Services uses the BMC BladeLogic Client Automation product to discover the desktop and laptop computer systems (including hardware and software) used by Calbro Services employees. This information is stored in the BMC Configuration Import dataset of BMC Atrium CMDB.
Calbro Services also uses BMC Atrium Discovery and Dependency Mapping to discover information about the servers, software, and other devices used to deliver banking information to Calbro Services customers. This data is stored in the BMC.ADDM dataset.
Lastly, Calbro Services uses a third-party discovery tool to collect information about the equipment that supports the corporate payroll services. Calbro Services uses Atrium Integrator to bring relevant data from the payroll database into BMC Atrium CMDB. The administrator creates a new dataset named Calbro Payroll specifically for this information.
Because some of the instances in these datasets might represent the same real-world CIs, the administrator configures BMC Atrium CMDB reconciliation jobs to compare the datasets against each other and merge the preferred information into the production dataset, BMC Asset.
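The idea of merging "preferred" information from several datasets can be sketched as a precedence-ordered merge. This Python example is a simplified illustration, not how reconciliation jobs are actually configured: the `merge_by_precedence` function, the reconciliation ID `RE-001`, and the attribute names and values are hypothetical. It assumes the instances have already been identified (matching instances share a reconciliation ID) and that one dataset is preferred over another for every attribute; real reconciliation can be configured in more detail.

```python
from typing import Dict, List

Attrs = Dict[str, str]


def merge_by_precedence(
    sources: Dict[str, Dict[str, Attrs]],  # dataset name -> {reconciliation ID -> attributes}
    precedence: List[str],                 # dataset names, most preferred first
) -> Dict[str, Attrs]:
    """Hypothetical merge: for each attribute, keep the most preferred dataset's value."""
    merged: Dict[str, Attrs] = {}
    # Walk from least to most preferred so that preferred values overwrite others.
    for name in reversed(precedence):
        for recon_id, attrs in sources.get(name, {}).items():
            merged.setdefault(recon_id, {}).update(attrs)
    return merged


sources = {
    "BMC.ADDM": {"RE-001": {"Name": "payroll01", "OS": "Linux"}},
    "Calbro Payroll": {"RE-001": {"Name": "PAYROLL01", "Owner": "Finance"}},
}
production = merge_by_precedence(sources, ["BMC.ADDM", "Calbro Payroll"])
# production["RE-001"] == {"Name": "payroll01", "OS": "Linux", "Owner": "Finance"}
```

Here BMC.ADDM wins the conflicting `Name` value, while `Owner`, which only Calbro Payroll supplies, is still carried into the merged result.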
Overlay datasets in BMC Atrium CMDB
BMC Atrium CMDB offers overlay datasets, which enable you to:
- Make changes in a separate partition without overwriting your production data.
- See your changes in context with the unchanged portions of your data.
- Avoid duplicating your entire production dataset.
- Create multiple overlay datasets that reuse one set of reconciliation definitions for merging each into the production dataset.
Overlay dataset functionality applies only to BMC Atrium CMDB API clients. If you use the BMC Atrium Core Console or the class forms to view or modify instances in an overlay dataset, the results are unpredictable and you can compromise data integrity.
How overlays work
When you create an overlay dataset, you must specify an existing regular dataset for it to overlay. An overlay dataset starts out empty like any other dataset, but any request to it for an instance that it does not contain passes through to the underlying dataset and returns the instance from there.
The first time you modify an instance through the overlay dataset, the instance is copied from the underlying dataset into the overlay dataset with your modifications. You can also create instances in the overlay dataset. The underlying dataset still holds the unmodified versions of its original instances, but it does not hold the newly created instances. A request to the overlay dataset for a new or modified instance returns that instance from the overlay dataset, and a request to the overlay dataset for an unmodified instance returns it from the underlying dataset.
Requests made to the underlying dataset always return instances from that dataset, never from an overlay dataset.
The following figure illustrates queries against an overlay dataset, one for a modified instance and one for an unmodified instance. Notice that the modified instance shares the same reconciliation ID with its unmodified counterpart, but not the same instance ID.
Query to an overlay dataset
For example, you can use an overlay dataset to make changes during the day and then, once the change requests for those changes are approved, reconcile the overlay dataset into your production dataset at the end of the day.
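The read and write behavior of an overlay dataset works like a copy-on-write lookup. The following minimal Python sketch models it under stated assumptions: `OverlayDataset`, `get`, `modify`, and `create` are hypothetical names, plain dictionaries keyed by a hypothetical CI key stand in for datasets, and instance IDs are ignored. It is not the BMC Atrium CMDB API.

```python
from typing import Dict, Optional

Attrs = Dict[str, str]


class OverlayDataset:
    """Hypothetical sketch of overlay read/write semantics."""

    def __init__(self, underlying: Dict[str, Attrs]) -> None:
        self.underlying = underlying     # CI key -> attributes in the regular dataset
        self.own: Dict[str, Attrs] = {}  # instances copied into or created in the overlay

    def get(self, key: str) -> Optional[Attrs]:
        # New or modified instances come from the overlay; others pass through.
        if key in self.own:
            return self.own[key]
        return self.underlying.get(key)

    def modify(self, key: str, changes: Attrs) -> None:
        if key not in self.own:
            base = self.underlying.get(key)
            if base is None:
                raise KeyError(key)
            # First modification: copy the underlying instance into the overlay.
            self.own[key] = dict(base)
        self.own[key].update(changes)

    def create(self, key: str, attrs: Attrs) -> None:
        # New instances exist only in the overlay, never in the underlying dataset.
        self.own[key] = dict(attrs)


base = {
    "srv1": {"Name": "srv1", "Status": "Deployed"},
    "srv2": {"Name": "srv2", "Status": "Deployed"},
}
overlay = OverlayDataset(base)
overlay.modify("srv1", {"Status": "In Repair"})
overlay.create("srv3", {"Name": "srv3"})
```

After these calls, `overlay.get("srv1")` returns the modified copy, `base["srv1"]` is unchanged, `overlay.get("srv2")` passes through to the underlying dataset, and `srv3` exists only in the overlay. Reading `base` directly mirrors a request to the underlying dataset, which never sees overlay contents.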
Result of deleting instances from an overlay dataset
When you delete an instance through an overlay dataset, the result depends on where the instance exists:
- If the instance exists in the overlay dataset, it is deleted only from the overlay dataset and remains in the underlying dataset.
- If the instance exists only in the underlying dataset, it is deleted from the underlying dataset.
You can mark an instance as deleted regardless of whether it already exists in the overlay dataset. In either case, this results in an instance in the overlay dataset that is marked as deleted.
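The two delete paths and the mark-as-deleted behavior can be contrasted in a short sketch. This minimal Python example is illustrative only: `OverlayDataset`, its methods, and the `mark_as_deleted` flag are hypothetical names, and dictionaries keyed by a hypothetical CI key stand in for datasets. It is not the BMC Atrium CMDB API.

```python
from typing import Dict, Optional

Attrs = Dict[str, object]


class OverlayDataset:
    """Hypothetical sketch of overlay delete semantics."""

    def __init__(self, underlying: Dict[str, Attrs]) -> None:
        self.underlying = underlying     # CI key -> attributes in the regular dataset
        self.own: Dict[str, Attrs] = {}  # CI key -> instances that exist in the overlay

    def get(self, key: str) -> Optional[Attrs]:
        if key in self.own:
            return self.own[key]
        return self.underlying.get(key)

    def modify(self, key: str, changes: Attrs) -> None:
        # First modification copies the underlying instance into the overlay.
        if key not in self.own:
            self.own[key] = dict(self.underlying[key])
        self.own[key].update(changes)

    def delete(self, key: str) -> None:
        if key in self.own:
            # Exists in the overlay: only the overlay copy is removed, so later
            # reads fall through to the underlying version again.
            del self.own[key]
        elif key in self.underlying:
            # Exists only underneath: the delete reaches the underlying dataset.
            del self.underlying[key]
        else:
            raise KeyError(key)

    def mark_as_deleted(self, key: str) -> None:
        # Either way, the overlay ends up holding an instance flagged as deleted;
        # the underlying instance is not flagged.
        if key not in self.own:
            self.own[key] = dict(self.underlying[key])
        self.own[key]["mark_as_deleted"] = True


base = {"a": {"Name": "a"}, "b": {"Name": "b"}, "c": {"Name": "c"}}
overlay = OverlayDataset(base)
overlay.modify("a", {"Name": "a-changed"})
overlay.delete("a")           # removes only the overlay copy of "a"
overlay.delete("b")           # removes "b" from the underlying dataset
overlay.mark_as_deleted("c")  # creates a flagged copy of "c" in the overlay
```

Note the asymmetry this sketch makes visible: deleting a modified instance reverts the overlay view to the underlying version, whereas deleting an unmodified instance removes it from the underlying dataset itself.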
Instance ID and identity in overlay datasets
When an instance is first created in an overlay dataset as the result of a modification, it retains the reconciliation identity of the instance in the underlying dataset, but is assigned a new instance ID.
If the underlying instance has not yet been identified when it is modified in the overlay dataset, the instance has no reconciliation identity in either dataset. This is not a problem. When you eventually identify and merge the two datasets, your Identify rules should be able to match these instances so that they receive the same identity.
For each modification that you make to an instance before it is identified, a new instance is created in the overlay dataset. For this reason, identify instances before modifying them a second time in the overlay dataset.
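The difference between modifying identified and unidentified instances can be sketched by tracking only identity and instance IDs. This Python example is a simplified model under stated assumptions: the `Overlay` class, the `new_instance_id` generator, the `OVL-` and `INST-` ID formats, and the reconciliation ID `RE-1` are hypothetical. It assumes an identified instance is matched to its overlay copy by reconciliation ID, and that an unidentified instance (no reconciliation ID) cannot be matched, so each modification produces another overlay instance, as the text describes.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical instance ID generator; real instance IDs are assigned by the CMDB.
_next_id = itertools.count(1)


def new_instance_id() -> str:
    return f"OVL-{next(_next_id)}"


@dataclass
class Instance:
    instance_id: str
    reconciliation_id: Optional[str]  # None means "not yet identified"
    attributes: Dict[str, str] = field(default_factory=dict)


class Overlay:
    """Hypothetical overlay that tracks identity and instance IDs only."""

    def __init__(self) -> None:
        self.instances: List[Instance] = []

    def modify(self, base: Instance, changes: Dict[str, str]) -> Instance:
        if base.reconciliation_id is not None:
            # Identified: a later modification updates the existing overlay copy.
            for inst in self.instances:
                if inst.reconciliation_id == base.reconciliation_id:
                    inst.attributes.update(changes)
                    return inst
        # First modification, or an unidentified base that cannot be matched:
        # create an overlay instance that keeps the base's identity (possibly
        # None) but receives a new instance ID.
        copy = Instance(new_instance_id(), base.reconciliation_id,
                        {**base.attributes, **changes})
        self.instances.append(copy)
        return copy


overlay = Overlay()
identified = Instance("INST-U1", "RE-1", {"Status": "Deployed"})
first = overlay.modify(identified, {"Status": "In Repair"})
second = overlay.modify(identified, {"Owner": "IT"})  # updates `first` in place

unidentified = Instance("INST-U2", None, {"Status": "Deployed"})
overlay.modify(unidentified, {"Status": "In Repair"})
overlay.modify(unidentified, {"Owner": "IT"})  # creates yet another instance
```

The identified instance ends up as one overlay instance with the same reconciliation ID and a new instance ID, while the two edits to the unidentified instance leave two separate overlay instances, which is why identifying first is recommended.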
If you decide to keep the changes that you modeled in an overlay dataset, you can merge them into the underlying regular dataset.
Controlling client write access to datasets
By default, all BMC Atrium CMDB clients can create, modify, and delete instances in a dataset. However, you can restrict write access to one or more specific clients: BMC Impact Solutions Publishing Server, BMC Impact Model Designer, and the Reconciliation Engine. When you do this, BMC Atrium CMDB users cannot write to the dataset through a browser. You can also set a dataset to allow no write access at all.
Consider restricting write access to your production dataset. By allowing only the Reconciliation Engine to write to your production dataset, you prevent unauthorized changes to your single source of reference. All changes must then be made in other datasets and reconciled into the production dataset.
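The three write-access settings (all clients, a restricted set of clients, or no clients) can be expressed as a simple policy check. This Python sketch is hypothetical: the `Client` enum and `WriteAccess` class are illustrative names rather than BMC Atrium CMDB configuration objects, although the client names come from the text above.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Client(Enum):
    # Client names are taken from the text; the enum itself is hypothetical.
    IMPACT_PUBLISHING_SERVER = "BMC Impact Solutions Publishing Server"
    IMPACT_MODEL_DESIGNER = "BMC Impact Model Designer"
    RECONCILIATION_ENGINE = "Reconciliation Engine"
    BROWSER_USER = "BMC Atrium CMDB user (browser)"


class WriteAccess:
    """Hypothetical dataset write policy.

    allowed=None      -> default: every client can create, modify, and delete
    allowed=frozenset() -> no write access at all
    otherwise         -> only the listed clients can write
    """

    def __init__(self, allowed: Optional[FrozenSet[Client]] = None) -> None:
        self.allowed = allowed

    def can_write(self, client: Client) -> bool:
        if self.allowed is None:
            return True
        return client in self.allowed


# Recommended setup for the production dataset: only reconciliation writes to it.
production_policy = WriteAccess(frozenset({Client.RECONCILIATION_ENGINE}))
```

With `production_policy`, a browser user's write is refused, so a change has to land in another dataset first and reach production through reconciliation.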