Best practices for managing datasets
A dataset is a logical grouping of data in BMC Helix CMDB and is used to capture data from different data sources. We recommend that each data source has its own dataset. This simplifies managing the data during later steps such as normalization and reconciliation.
Data that comes from a particular data source is written to its own dataset. We recommend that a dataset be writable only by the data source that populates it. For example, the data brought in by BMC Discovery is written to the BMC.ADDM dataset.
- Do not update the production dataset directly. The production dataset is the result of the reconciliation process.
- Do not delete any dataset (either production or source).
- Though it is possible to update datasets manually, we recommend that datasets are updated through automation. Following the best practice of a single source per dataset improves data integrity within your CMDB. Everyone can read the data, and you can restrict write access to the data source alone.
- We recommend that you do not delete datasets (either source or production). Source datasets tell you where the data was collected or discovered, and this information could prove vital in the long run. The only situation in which you can delete a dataset is when, immediately after creating it, you realize that some settings are incorrect. Such a dataset is still blank (that is, it does not contain any data) and can be deleted.
- Dataset names are configurable. Give each dataset an appropriate, easily identifiable name.
- You must use datasets primarily to represent different data providers. However, you can also use datasets to represent other types or groupings of data, such as test data, obsolete data, or data for different companies or organizations for multitenancy.
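The single-source-per-dataset practice above can be checked mechanically. The following is a minimal sketch, not a product feature: it assumes you keep your own list of (data source, dataset) pairs, and only the BMC Discovery to BMC.ADDM pairing comes from this page. The other source and dataset names, and the function `find_shared_datasets`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory of which data source writes to which dataset.
# Only "BMC Discovery" -> "BMC.ADDM" is taken from the text; the rest is
# invented for illustration.
WRITERS = [
    ("BMC Discovery", "BMC.ADDM"),
    ("Spreadsheet import", "SITE.IMPORT.SPREADSHEET"),
    ("Reconciliation job", "BMC.ASSET"),
]


def find_shared_datasets(writers):
    """Return {dataset: [sources]} for every dataset with more than one writer."""
    by_dataset = defaultdict(list)
    for source, dataset in writers:
        by_dataset[dataset].append(source)
    return {ds: srcs for ds, srcs in by_dataset.items() if len(srcs) > 1}


if __name__ == "__main__":
    # An empty result means each dataset has exactly one writer.
    print(find_shared_datasets(WRITERS))
```

An empty result means the inventory follows the recommendation. Any dataset that appears in the result has more than one writer and is a candidate for splitting into separate source datasets.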
General best practices
- Make sure that each data provider has its own import dataset.
- The BMC.ASSET dataset is the default production dataset.
- Make a note of your production or golden dataset name so that you can plan your normalization and reconciliation jobs.
- Use the production dataset as a master dataset to identify duplicate CIs.
You can use this master dataset to match attributes for the CI in the production dataset with the CIs in the imported datasets.
- You can make the production dataset the target dataset in a merge activity, so that the CIs are updated to keep the production dataset current and accurate.
- Do not normalize the production dataset. Normalize CIs before you identify and merge them.
In cases where you need to merge more than one dataset at a time, you might want to create an intermediate dataset for merging.
You should create a regular dataset as an intermediate dataset instead of an overlay dataset for a data provider.
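To illustrate what using the production dataset as a master for matching attributes means, here is a toy model of an identification step. It is a sketch under stated assumptions, not how the product implements reconciliation: CIs are plain dictionaries, the attribute names (`name`, `serial`, `os`) are hypothetical, and matching is a simple equality check on the chosen attributes.

```python
def identify(production, imported, keys):
    """Match imported CIs against production CIs on the attributes in `keys`.

    Returns (matched, unmatched), where matched is a list of
    (imported_ci, production_ci) pairs and unmatched lists imported CIs
    with no counterpart in the production dataset.
    """
    # Index the master (production) dataset by the identifying attributes.
    index = {tuple(ci[k] for k in keys): ci for ci in production}
    matched, unmatched = [], []
    for ci in imported:
        hit = index.get(tuple(ci[k] for k in keys))
        if hit is None:
            unmatched.append(ci)
        else:
            matched.append((ci, hit))
    return matched, unmatched


if __name__ == "__main__":
    production = [{"name": "srv01", "serial": "A1"}]
    imported = [
        {"name": "srv01", "serial": "A1", "os": "Linux"},
        {"name": "srv02", "serial": "B2"},
    ]
    matched, unmatched = identify(production, imported, ["name", "serial"])
    print(len(matched), "matched;", len(unmatched), "new")
```

In this model, matched CIs are the ones a merge would update in the production dataset, and unmatched CIs are new to it.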
Performance and maintenance
- For better performance and to minimize impact on users of the production dataset, we recommend that you merge one import dataset or discovered dataset at a time with the production dataset.
- You might want to merge multiple source datasets in separate jobs to an intermediate dataset and then merge the intermediate dataset with the production dataset.
- We recommend that you plan your datasets in such a way that you never have to delete them. Deleting datasets can have huge repercussions on the jobs and the CMDB.
- A dataset can contain only one instance of a given CI. An instance of that CI might also exist in other datasets to represent the CI in the contexts of those datasets. Instances representing the same CI or relationship across datasets share the same reconciliation identity, or reconciliation ID.
- When you create a dataset, you give it both a name and an ID. Write dataset IDs in all capital letters, using the following naming convention:
<VENDOR_NAME>.<PURPOSE>[.<VENDOR_SPECIFIC_PRODUCT>]
- <VENDOR_NAME> is the name of the company whose product provides or consumes data from the dataset. If it is a site-specific dataset, it should have a vendor name of SITE.
- <PURPOSE> is the purpose of the dataset, for example, ASSET, IMPORT, or ARCHIVE.
- <VENDOR_SPECIFIC_PRODUCT> is the product or functionality area within a purpose.
For example, BMC Helix ITSM: Asset Management uses the BMC.ASSET dataset ID.
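The naming convention can be enforced with a small validator. The sketch below assumes that the parts are joined with periods and that the product segment is optional, which is inferred from the BMC.ASSET example. The allowed character set (capital letters, digits, underscores) and the function name `parse_dataset_id` are assumptions made for illustration.

```python
import re

# Assumed shape: VENDOR.PURPOSE with an optional third PRODUCT segment.
# Each segment starts with a capital letter and may contain capital
# letters, digits, and underscores (an assumption, not a documented rule).
DATASET_ID = re.compile(r"[A-Z][A-Z0-9_]*\.[A-Z][A-Z0-9_]*(\.[A-Z][A-Z0-9_]*)?")


def parse_dataset_id(dataset_id):
    """Split a dataset ID into its parts, or raise ValueError if malformed."""
    if not DATASET_ID.fullmatch(dataset_id):
        raise ValueError(f"not a valid dataset ID: {dataset_id!r}")
    parts = dataset_id.split(".")
    return {
        "vendor": parts[0],
        "purpose": parts[1],
        "product": parts[2] if len(parts) == 3 else None,
    }


if __name__ == "__main__":
    print(parse_dataset_id("BMC.ASSET"))
    print(parse_dataset_id("SITE.IMPORT.SPREADSHEET"))  # hypothetical site-specific ID
```

Lowercase IDs such as `bmc.asset` are rejected, which catches the most common deviation from the all-capitals rule.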
Using dataset hierarchy
In an environment where you need to run reconciliation frequently due to a constant turnaround in your business, you can use dataset hierarchy to improve your data integrity.
Using dataset hierarchy involves reconciling the datasets by dividing them into different layers. For example, you have multiple source datasets such as dataset1, dataset2, dataset3, dataset4, and so on. These datasets are first reconciled into datasets such as dataset1.a, dataset2.b, dataset3.c, dataset4.d, and so on. These datasets (dataset1.a, dataset2.b, dataset3.c, and so on) are then reconciled into a pre-production dataset. After the data is reconciled in the pre-production dataset, reconcile the pre-production dataset into the production dataset as your requirements dictate.
This simplifies merging and data management, thus improving data integrity.
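The layering described above can be sketched as a mapping from each dataset to the dataset it is reconciled into. The following is a minimal illustration that derives the order in which the reconciliation layers would run. The dataset names follow the example in the text; `preprod` and `production` are placeholder names, and the helper functions are hypothetical.

```python
# Each dataset maps to the dataset it is reconciled into (None for production).
TARGETS = {
    "dataset1": "dataset1.a",
    "dataset2": "dataset2.b",
    "dataset3": "dataset3.c",
    "dataset4": "dataset4.d",
    "dataset1.a": "preprod",
    "dataset2.b": "preprod",
    "dataset3.c": "preprod",
    "dataset4.d": "preprod",
    "preprod": "production",
    "production": None,
}


def distance_to_production(name, targets):
    """Count the reconciliation steps between a dataset and production."""
    steps, seen = 0, {name}
    while targets[name] is not None:
        name = targets[name]
        if name in seen:
            raise ValueError(f"cycle detected at {name!r}")
        seen.add(name)
        steps += 1
    return steps


def reconciliation_layers(targets):
    """Group (source, target) merges into layers, furthest from production first."""
    layers = {}
    for source, target in targets.items():
        if target is None:
            continue
        depth = distance_to_production(source, targets)
        layers.setdefault(depth, []).append((source, target))
    return [sorted(layers[depth]) for depth in sorted(layers, reverse=True)]


if __name__ == "__main__":
    for number, layer in enumerate(reconciliation_layers(TARGETS), start=1):
        print(f"layer {number}: {layer}")
```

With the mapping shown, the sketch yields three layers: source datasets into their first-level datasets, those into pre-production, and pre-production into production.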