Understanding entity identification and lookup

This topic explains the importance of entity lookup in BMC Helix Continuous Optimization. It describes the concepts of single and multiple lookup, and of shared entity catalogs.


What is entity lookup?

When BMC Helix Continuous Optimization connectors (ETLs) collect capacity-management metrics from a monitored environment (data source), they need to uniquely identify the entities for which they collect the metrics. The ETLs use certain properties to uniquely identify the entities. For example, the ETLs may use system name or asset ID to identify systems.

Before loading metrics into BMC Helix Continuous Optimization, the ETL checks a list of the entities for which metrics were loaded during previous ETL runs. This activity is called entity lookup, and the list of previously loaded entities is called an entity catalog. Entity lookup helps the ETL recognize the same entity across different runs of the ETL.

When the same ETL runs multiple times, entity lookup helps the ETL determine when to retain or overwrite old metrics. When you run multiple ETLs on the same source, entity lookup helps the ETLs associate the metrics that each of them collects with the same entity.

For simplicity, assume that the ETLs can uniquely identify systems by their color. Then, as shown in the following illustration, the ETLs can accurately correlate the various metrics with the correct entity.

The concept of entity lookup becomes more interesting when multiple ETLs use the same entity catalog but different properties or fields to identify an entity.

To extend this simple example, imagine that ETLs could also uniquely identify entities by shape, size, or hatching. The following sections describe these concepts in more technical terms.

What is lookup information, and who configures it?

ETLs use one or more properties to uniquely identify entities. These properties are collectively referred to as lookup information. Based on business logic, the ETL developer determines which property or properties the ETL must use for entity lookup. If the capacity planner plans to run more than one ETL in the same environment, the developer ensures that the ETLs use the same method to uniquely identify entities. As a BMC Helix Continuous Optimization user, you do not need to know exactly how this works. You must, however, enable lookup during ETL configuration.

As an ETL developer, however, you need to understand simple and multiple lookup, and strong and weak entries in a multiple lookup scenario. These concepts are discussed in detail in the following sections.

Entity lookup based on property DS_SYSNM

This section discusses entity lookup by using a system as the example entity and DS_SYSNM as the property (lookup information). The same concepts, however, apply to all kinds of entities.

Recognizing systems during different runs

An ETL needs to follow some convention to identify systems, so that each time it runs, it can tell which systems in its data source were seen before, and which ones are new.

If the data source contains obvious unique identification information, the connector only needs to use it to create a unique string, such as DS_SYSNM, that identifies the system.

In some cases, the data source uses different pieces of information, all of which need to exactly match. For example, a system may be identified by its asset identifier and database name.

The ETL developer carefully constructs the DS_SYSNM string from these pieces in a unique and systematic manner.
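As an illustration, the following Python sketch combines an asset identifier and a database name into one DS_SYSNM string. The helper name `build_ds_sysnm` and the `|` separator are hypothetical, not part of the product. The idea it shows is that the pieces are always taken in the same order and joined with a delimiter that cannot occur inside them, so the same system always yields the same string.

```python
# Hypothetical delimiter: choose one that cannot occur inside any identifying piece.
SEPARATOR = "|"


def build_ds_sysnm(asset_id: str, database_name: str) -> str:
    """Build a DS_SYSNM string from an asset ID and a database name (illustrative only)."""
    # Always use the same order so that every run produces the same string.
    parts = [asset_id.strip(), database_name.strip()]
    for part in parts:
        if not part:
            raise ValueError("every identifying piece must be non-empty")
        if SEPARATOR in part:
            raise ValueError(f"identifying piece {part!r} contains the separator")
    return SEPARATOR.join(parts)


print(build_ds_sysnm("AST-00042", "sales_db"))  # AST-00042|sales_db
```

Rejecting a piece that contains the separator avoids two different pairs of pieces collapsing into the same string.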

When loading data, the BMC Helix Continuous Optimization ETL infrastructure automatically checks to see if another system with the same DS_SYSNM is already loaded by that ETL instance. If so, BMC Helix Continuous Optimization will automatically associate the new data with the existing system. If not, BMC Helix Continuous Optimization will create a new entry for the system. All the systems loaded by the ETL instance are tracked in an entity catalog. You can view the entity catalog for any ETL instance in the Admin section of the console.

There are separate lookup tables for entities of type System, Business Driver, and Domain. Each table lists the unique identifier string for each entity of that type (for example, DS_SYSNM for Systems).
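The following sketch models the lookup behavior described above under simplifying assumptions. The `EntityCatalog` class, its in-memory tables, and the sequential entity IDs are hypothetical stand-ins for the product's entity catalog. The sketch keeps one table per entity type and either returns the existing entity for a known identifier or records a new one.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class EntityCatalog:
    """Hypothetical in-memory model of one ETL instance's entity catalog."""

    # One lookup table per entity type, mapping the unique identifier
    # string (for example, DS_SYSNM for systems) to an entity ID.
    tables: dict = field(
        default_factory=lambda: {"System": {}, "Business Driver": {}, "Domain": {}}
    )
    next_id: count = field(default_factory=lambda: count(1))

    def lookup_or_create(self, entity_type: str, identifier: str) -> tuple:
        """Return (entity_id, created) for the given identifier."""
        table = self.tables[entity_type]
        if identifier in table:
            # Seen in an earlier run: associate the new data with this entity.
            return table[identifier], False
        # Not seen before: record a new entity in the catalog.
        entity_id = next(self.next_id)
        table[identifier] = entity_id
        return entity_id, True


catalog = EntityCatalog()
print(catalog.lookup_or_create("System", "web01.bmc.com"))  # (1, True)
print(catalog.lookup_or_create("System", "web01.bmc.com"))  # (1, False)
print(catalog.lookup_or_create("Domain", "web01.bmc.com"))  # (2, True)
```

Because each entity type has its own table in this model, the same string can be registered once as a System and once as a Domain without colliding.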


This DS_SYSNM field is different from the SYSNM field, which is shown as the "Name" of the system in the BMC Helix Continuous Optimization console. SYSNM is a convenient, recognizable name for interacting with BMC Helix Continuous Optimization and need not be unique. For example, a server might have a simple, short name, while an ETL identifies it uniquely by using its fully qualified domain name (FQDN) as DS_SYSNM.

You can control whether BMC Helix Continuous Optimization uses only the host name as the DS_SYSNM string for comparisons, or also includes the Internet domain name (that is, uses the FQDN), by setting an option in the "Loader configuration" section of the ETL configuration (see Handling ETL lookup name). By default, the Internet domain name (for example, .bmc.com) is included.
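The following hypothetical helper shows the effect of that option on the string used for comparison. `lookup_name` and its `include_domain` parameter are illustrative names, not the actual option in the Loader configuration; the default of `True` mirrors the documented default. Leaving IP addresses untouched is an added assumption of the sketch.

```python
import ipaddress


def lookup_name(raw_name: str, include_domain: bool = True) -> str:
    """Return the string compared as DS_SYSNM (hypothetical helper).

    include_domain=True keeps the Internet domain, as in the documented default.
    include_domain=False compares on the short host name only.
    """
    if include_domain:
        return raw_name
    try:
        ipaddress.ip_address(raw_name)
        return raw_name  # assumption: an IP address has no domain part to strip
    except ValueError:
        return raw_name.split(".", 1)[0]


print(lookup_name("web01.bmc.com"))                        # web01.bmc.com
print(lookup_name("web01.bmc.com", include_domain=False))  # web01
```

With the domain excluded, web01.bmc.com and web01.example.org produce the same lookup name and would be treated as one system, so excluding the domain is only safe when host names are unique across domains.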

Sharing lookup between different ETLs

Often, you need to import information about the same entity from two different data sources. For example, a monitoring system might have performance data about a set of computers, while an asset management database may have administrative information about the same computers.

In BMC Helix Continuous Optimization, this means that two different ETLs need to load information about the same entities. The two ETLs are likely to run on different schedules, so one of them loads metrics for an entity first. The other ETL must then look up the entity catalog before loading its own metrics for the same entity.

You can configure two or more different ETLs to load the same set of entities; in that case, configure them to share lookup. When a set of connector instances is configured to share lookup tables, BMC Helix Continuous Optimization considers all the systems loaded by any of them when deciding whether a system is new or already exists.
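A minimal sketch of shared lookup, assuming a simplified model in which ETL instances that share lookup point at one common table and every other instance keeps a private table. `LookupRegistry`, `share`, and the instance names are hypothetical; the sketch only illustrates why an ETL outside the sharing group would create a duplicate entity.

```python
class LookupRegistry:
    """Hypothetical model of lookup tables shared between ETL instances."""

    def __init__(self):
        self._group_of = {}  # ETL instance name -> sharing-group name
        self._tables = {}    # table key -> {DS_SYSNM: entity ID}
        self._next_id = 1

    def share(self, group: str, *etl_instances: str) -> None:
        """Make the given ETL instances use one common lookup table."""
        for etl in etl_instances:
            self._group_of[etl] = group

    def lookup_or_create(self, etl: str, ds_sysnm: str) -> int:
        # An ETL that shares lookup uses its group's table; otherwise it has a private one.
        key = ("group", self._group_of[etl]) if etl in self._group_of else ("etl", etl)
        table = self._tables.setdefault(key, {})
        if ds_sysnm not in table:
            table[ds_sysnm] = self._next_id
            self._next_id += 1
        return table[ds_sysnm]


registry = LookupRegistry()
registry.share("inventory", "monitoring_etl", "asset_etl")
print(registry.lookup_or_create("monitoring_etl", "web01.bmc.com"))  # 1
print(registry.lookup_or_create("asset_etl", "web01.bmc.com"))       # 1 (shared lookup)
print(registry.lookup_or_create("other_etl", "web01.bmc.com"))       # 2 (not shared: duplicate)
```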

To share lookup between two different connectors, the connectors must use compatible conventions for the attributes that identify entities. BMC Helix Continuous Optimization provides connector developers with several mechanisms to specify the attributes used to identify entities loaded earlier, and to match entities loaded by other connectors. If the connectors are not fully compatible, duplicate entities can sometimes be loaded into BMC Helix Continuous Optimization. If this happens, manual reconciliation may be needed to discard one of them.

Single lookup and multiple lookup

An entity such as a system can be identified by a string such as DS_SYSNM that establishes the name of the system. This works only if DS_SYSNM corresponds to the same system in every ETL, which means that each ETL must construct the DS_SYSNM string by using exactly the same method. Only then can shared lookup tables (entity catalogs) work.

Sometimes, the data sources for different connectors do not share a unique naming convention for identifying entities. For example, an asset database may use an asset ID to identify systems, while a performance-monitoring system may identify the same systems by their FQDN.

In this case, the two ETLs may not be able to use DS_SYSNM to uniquely identify a system. BMC Helix Continuous Optimization, therefore, offers a multiple lookup option, where the ETLs look up different fields (properties) in a particular sequence to find an entity match.

For example, the VMware ETL ident

To facilitate multiple lookup, some out-of-the-box ETLs in BMC Helix Continuous Optimization have a field called _COMPATIBILITY in addition to the default field DS_SYSNM. If you write a custom ETL that you want to integrate with an out-of-the-box ETL, such as the VMware ETL, we recommend that you populate the _COMPATIBILITY field.

Developers write advanced ETLs so that they support multiple ways of identifying entities. This allows lookup to be shared even with connectors that do not have access to all of the identifying information for their entities.

Strong and weak lookup entries

If different data sources have different sets of identity information for an entity, it makes sense to define several different methods of identifying entities. Each method is called a "lookup entry", which is a set of named fields.

In simple lookup, as described above, the connector specifies only a single lookup field named DS_SYSNM. For multiple lookup, however, the ETL can specify any number of lookup fields with their respective names. A set of lookup fields forms one lookup entry. An entry matches only if all of the named fields in it match the corresponding values loaded earlier into BMC Helix Continuous Optimization. This scheme is backward compatible with existing ETLs that use simple lookup: for simple lookup, the ETL specifies DS_SYSNM as the property name, and the loader translates it into a lookup entry with a single field named DEFAULT.
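The matching rule for a lookup entry, and the translation of simple lookup into a DEFAULT entry, can be sketched as follows. Representing an entry as a dictionary of field names to values, and the function names `to_lookup_entry` and `entry_matches`, are assumptions made for illustration; they do not describe the loader's internal data structures.

```python
from typing import Mapping


def to_lookup_entry(lookup_fields: Mapping[str, str]) -> dict:
    """Translate simple lookup (DS_SYSNM only) into a single-field entry named DEFAULT."""
    if set(lookup_fields) == {"DS_SYSNM"}:
        return {"DEFAULT": lookup_fields["DS_SYSNM"]}
    return dict(lookup_fields)


def entry_matches(entry: Mapping[str, str], stored: Mapping[str, str]) -> bool:
    """A lookup entry matches only if every one of its named fields equals the stored value."""
    return bool(entry) and all(
        name in stored and stored[name] == value for name, value in entry.items()
    )


stored = {"HOSTNAME": "web01", "DOMAIN": "bmc.com"}
print(to_lookup_entry({"DS_SYSNM": "web01.bmc.com"}))                     # {'DEFAULT': 'web01.bmc.com'}
print(entry_matches({"HOSTNAME": "web01", "DOMAIN": "bmc.com"}, stored))   # True
print(entry_matches({"HOSTNAME": "web01", "DOMAIN": "acme.com"}, stored))  # False
```

An empty entry is treated as a non-match in this sketch, so an ETL that supplies no identifying fields never attaches data to an arbitrary entity.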

There are two kinds of lookup entries that can be defined by the ETL:

  • A strong lookup entry is used by the ETL to identify entities that it loaded earlier. These entities are looked up in the ETL's own lookup table, in the same way as with DS_SYSNM. An ETL can define one or more strong lookup entries. If it defines more than one, they are arranged in a sequence, and the first entry in the sequence is tried first.
  • A weak lookup entry is used by the ETL not to identify entities that it loaded earlier, but to match entities loaded by other ETLs. Multiple weak lookup entries are also defined in a sequence and are tried in that order.

Both strong and weak lookup entries are checked in sequence. In a lookup definition, ## acts as a logical OR between field combinations, and && acts as a logical AND between the fields within a combination. As soon as one combination of lookup fields matches, checking stops and the remaining combinations are not checked.
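The following sketch shows how a definition using ## (OR) and && (AND) could be evaluated with first-match short-circuiting. The expression `UUID##HOSTNAME&&DOMAIN`, the field names, and the functions `parse_lookup_expression` and `first_match` are hypothetical; only the meaning of ## and && comes from the text above.

```python
def parse_lookup_expression(expr: str) -> list:
    """Split 'UUID##HOSTNAME&&DOMAIN' into [['UUID'], ['HOSTNAME', 'DOMAIN']].

    '##' separates alternative field combinations (OR);
    '&&' joins fields that must all match (AND).
    """
    return [[name.strip() for name in combo.split("&&")] for combo in expr.split("##")]


def first_match(expr: str, incoming: dict, candidates: dict):
    """Return (entity_id, combination) for the first matching combination, or None.

    candidates maps an entity ID to the lookup field values stored for it.
    """
    for combo in parse_lookup_expression(expr):
        if not all(name in incoming for name in combo):
            continue  # the incoming entity lacks a field of this combination
        for entity_id, stored in candidates.items():
            if all(name in stored and stored[name] == incoming[name] for name in combo):
                return entity_id, combo  # stop: later combinations are not checked
    return None


candidates = {
    101: {"UUID": "u-1", "HOSTNAME": "web01", "DOMAIN": "bmc.com"},
    102: {"HOSTNAME": "web02", "DOMAIN": "bmc.com"},
}
print(first_match("UUID##HOSTNAME&&DOMAIN", {"HOSTNAME": "web02", "DOMAIN": "bmc.com"}, candidates))
# (102, ['HOSTNAME', 'DOMAIN'])
```

To mirror the strong/weak distinction, an ETL in this model would first evaluate its strong expression against its own lookup table and, only if nothing matched, evaluate its weak expression against entities loaded by other ETLs.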

The following is an example of a multiple lookup entry with strong and weak entries defined.


Review the following flow chart to understand how lookup entries are checked and how conflicts are handled. This multiple lookup facility makes it possible for different ETLs to correctly identify the same entities from different data sources.

To handle the conflicts properly, both weak and strong entries should be checked. Consider the following use cases in which UUID represents a strong lookup field, and name represents a weak lookup field.

Case 1

The incoming entity that the ETL wants to import weakly matches the existing entity, and the two entities have at least one strong lookup field combination in common, but with different values.


Though the weak field matches, the ETL creates a new entity to populate the data and the entity lookup fields, because the existing entity has a different UUID.

Case 2

The incoming entity that the ETL wants to import weakly matches the existing entity, and there are no strong lookup field combinations in common.


As the weak field matches, and there is no conflict on strong lookup fields, the ETL can populate the data and the entity lookup fields on the existing entity.
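The two cases can be sketched as a single decision, assuming that a weak match on NAME has already been found. The function name, field names, and returned labels are hypothetical, and this is a simplified reading of the two cases rather than the loader's actual conflict-resolution logic.

```python
def resolve_weak_match(incoming: dict, existing: dict, strong_combos: list) -> str:
    """Decide what to do when `incoming` weakly matches `existing` (illustrative only).

    strong_combos lists the strong lookup field combinations, for example [["UUID"]].
    """
    for combo in strong_combos:
        in_common = all(name in incoming and name in existing for name in combo)
        if in_common and any(incoming[name] != existing[name] for name in combo):
            # Case 1: both entities carry this strong combination, but with different
            # values, so they are different entities despite the weak match.
            return "create new entity"
    # Case 2: no conflicting strong combination, so reuse the existing entity and
    # populate its data and lookup fields.
    return "update existing entity"


existing_case1 = {"NAME": "web01", "UUID": "u-1"}
existing_case2 = {"NAME": "web01"}
incoming = {"NAME": "web01", "UUID": "u-2"}
print(resolve_weak_match(incoming, existing_case1, [["UUID"]]))  # create new entity
print(resolve_weak_match(incoming, existing_case2, [["UUID"]]))  # update existing entity
```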

Integrating lookup for two ETLs

For the steps of a common ETL integration example, integrating the VMware ETL with a custom CMDB ETL, see knowledge article 000025065 (Support logon ID required).
