Best practices for managing datasets
A dataset in BMC CMDB is used to capture data from different data sources. It is recommended that each source has its own dataset. This simplifies the process of managing the data during later processes such as normalization and reconciliation.
- Do not update the production dataset directly. The production dataset is the result of the reconciliation process.
- Do not delete any dataset (either production or source).
- Data that comes from a particular data source is written to its own dataset. As a best practice, make each dataset writable only by the data source that populates it. For example, the data brought in by BMC Application Discovery and Dependency Mapping (ADDM) is written to the BMC.ADDM dataset.
- Though it is possible to update datasets manually, it is recommended that datasets are updated through automation. By following the best practice of having a single source per dataset, you improve the data integrity within your CMDB. You can keep access to the data read-only for everyone else and restrict write access to the data source alone.
- It is recommended not to delete datasets (either source or production). Source datasets record where the data was collected or discovered, and this information could prove vital in the long run. The only situation in which you can delete a dataset is immediately after creating it, when you realize that some settings are incorrect. Such a dataset must not contain any data (it must be empty) before you delete it.
- Dataset names are configurable, so give each dataset an appropriate, easily identifiable name.
- Use datasets primarily to represent different data providers. You can also use datasets to represent other types or groupings of data, such as test data, obsolete data, or data for different companies or organizations for multitenancy.
General best practices
- Make sure that each data provider has its own import dataset.
- The BMC.ASSET dataset is the default production dataset.
- Note which dataset is your production, or golden, dataset so that you can plan your normalization and reconciliation jobs.
- Use the production dataset as a master dataset to identify duplicate CIs by matching the attributes of each CI in the production dataset with those of the CIs in the imported datasets.
- The production dataset can be the target dataset in a merge activity so that the CIs are updated to keep the production dataset current and accurate.
- Do not normalize the production dataset. CIs should be normalized before they are identified and merged into it.
In cases where you need to merge more than one dataset at a time, you might want to create an intermediate dataset for merging. You should create a regular dataset instead of an overlay dataset for a data provider.
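The idea of using the production dataset as a master for identifying duplicates can be pictured with a small sketch. This is not the BMC Reconciliation Engine or its API; the `CI` class, the `identify` function, and the choice of `name` and `serial_number` as matching attributes are all hypothetical, chosen only to show attribute matching between an import dataset and the production dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class CI:
    """Hypothetical, simplified CI record (not a BMC class)."""
    dataset_id: str
    name: str
    serial_number: str
    reconciliation_id: Optional[str] = None


def identify(imported: List[CI], production: Iterable[CI],
             keys: Tuple[str, ...] = ("name", "serial_number")) -> List[CI]:
    """Copy the reconciliation ID of the matching production CI onto each
    imported CI. Imported CIs with no match are left unidentified (None)."""
    # Index production CIs by the tuple of matching attributes.
    index = {tuple(getattr(ci, k) for k in keys): ci.reconciliation_id
             for ci in production}
    for ci in imported:
        match_key = tuple(getattr(ci, k) for k in keys)
        ci.reconciliation_id = index.get(match_key)
    return imported


production = [CI("BMC.ASSET", "srv01", "SN-001", "RID-1")]
imported = [CI("BMC.ADDM", "srv01", "SN-001"),
            CI("BMC.ADDM", "srv02", "SN-002")]
identify(imported, production)
```

In this sketch the first imported CI ends up sharing the production CI's reconciliation ID, while the second has no counterpart in production and stays unidentified.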
Performance and maintenance
- For better performance and to minimize impact on users of the production dataset, BMC recommends that you merge one import or discovered dataset at a time with the production dataset.
- You might want to merge multiple source datasets in separate jobs to an intermediate dataset and then merge the intermediate dataset with the production dataset.
- BMC recommends that you plan your datasets in such a way that you never have to delete them. Deleting datasets can have huge repercussions on the jobs and the CMDB.
- A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
- When you create a dataset, you give it both a name and an ID. Write dataset IDs in all capital letters, using the following naming convention:
<VENDOR_NAME>.<PURPOSE>[.<VENDOR_SPECIFIC_PRODUCT>]
- <VENDOR_NAME> is the name of the company whose product provides or consumes data from the dataset. If it is a site-specific dataset, it should have a vendor name of SITE.
- <PURPOSE> is the purpose of the dataset, for example, ASSET, IMPORT, or ARCHIVE.
- <VENDOR_SPECIFIC_PRODUCT> is the product or functionality area within a purpose.
For example, BMC Asset Management uses the BMC.ASSET dataset ID.
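The naming convention can be checked mechanically. The following is a minimal sketch, assuming the components are joined by periods in the order listed (as the BMC.ASSET example suggests), that the product component is optional, and that each component consists of capital letters, digits, and underscores. The function names and the allowed character set are assumptions made for illustration, not part of BMC CMDB.

```python
import re
from typing import Optional

# Assumed shape: VENDOR.PURPOSE with an optional .PRODUCT suffix, all capitals.
_DATASET_ID = re.compile(r"[A-Z0-9_]+\.[A-Z0-9_]+(\.[A-Z0-9_]+)?")


def build_dataset_id(vendor: str, purpose: str,
                     product: Optional[str] = None) -> str:
    """Join the components with periods and convert them to capitals."""
    parts = [vendor, purpose] + ([product] if product else [])
    return ".".join(part.strip().upper() for part in parts)


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if the whole ID matches the assumed convention."""
    return _DATASET_ID.fullmatch(dataset_id) is not None
```

For example, `build_dataset_id("bmc", "asset")` produces `BMC.ASSET`, and `is_valid_dataset_id("bmc.asset")` is rejected because the ID is not written in capitals.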
Advanced dataset handling
In an environment where you need to run reconciliation frequently because of constant turnover in your business, you can use some advanced dataset handling practices. You can improve your data integrity by using a dataset hierarchy.
This means that you start with multiple source datasets such as dataset1, dataset2, dataset3, dataset4, and so on. These datasets are reconciled into intermediate datasets such as dataset1.a, dataset2.b, dataset3.c, dataset4.d, and so on. The intermediate datasets (dataset1.a, dataset2.b, dataset3.c, and so on) are then reconciled into a pre-production dataset. After the data is reconciled in the pre-production dataset, you can reconcile it into the production dataset, depending on your requirements.
This approach simplifies merging and data management, which improves data integrity.
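A dataset hierarchy like the one described can be written down as a mapping from each dataset to the dataset it is reconciled into, which makes the order of reconciliation jobs easy to derive. This is a sketch only: the `PRE_PRODUCTION` name, the `HIERARCHY` mapping, and both functions are hypothetical, and the mapping is assumed to contain no cycles.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy: source -> intermediate -> pre-production -> production.
HIERARCHY: Dict[str, str] = {
    "dataset1": "dataset1.a",
    "dataset2": "dataset2.b",
    "dataset3": "dataset3.c",
    "dataset4": "dataset4.d",
    "dataset1.a": "PRE_PRODUCTION",
    "dataset2.b": "PRE_PRODUCTION",
    "dataset3.c": "PRE_PRODUCTION",
    "dataset4.d": "PRE_PRODUCTION",
    "PRE_PRODUCTION": "BMC.ASSET",  # default production dataset
}


def depth(dataset: str, hierarchy: Dict[str, str]) -> int:
    """Count the reconciliation hops from a dataset to the top of the hierarchy."""
    hops = 0
    while dataset in hierarchy:  # assumes the mapping has no cycles
        dataset = hierarchy[dataset]
        hops += 1
    return hops


def job_order(hierarchy: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs, with the datasets farthest from
    production first so that each level is reconciled before the next."""
    return sorted(hierarchy.items(),
                  key=lambda pair: (-depth(pair[0], hierarchy), pair[0]))


for source, target in job_order(HIERARCHY):
    print(f"reconcile {source} -> {target}")
```

The ordering places the source datasets first, the intermediate datasets next, and the pre-production to production step last, mirroring the stages described above.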