Understanding entity identification and lookup
This topic explains the importance of entity lookup in TrueSight Capacity Optimization (CO). It describes the concepts of single and multiple lookup, and shared entity catalogs.
What is entity lookup?
When TrueSight Capacity Optimization connectors (ETLs) collect capacity-management metrics from a monitored environment (data source), they need to uniquely identify the entities for which they collect the metrics. The ETLs use certain properties to uniquely identify the entities. For example, the ETLs may use system name or asset ID to identify systems.
Before loading metrics into TrueSight Capacity Optimization, the ETL goes through a checklist of entities for which metrics were loaded during any previous ETL runs. This activity is called entity lookup. The checklist of previously-loaded entities is called an entity catalog. Entity lookup helps the ETL distinguish data for the same entity between different runs of the ETL.
When the same ETL runs multiple times, entity lookup helps the ETL determine when to retain or overwrite old metrics. If you run multiple ETLs on the same source, entity lookup helps the ETLs relate the various metrics that they collect to the same entity.
For simplicity, let us assume the ETLs are able to uniquely identify systems by their color. Then, as shown in the following illustration, the ETLs would be able to accurately correlate various metrics with the correct entity.
The concept of entity lookup becomes more interesting when multiple ETLs use the same entity catalog but different properties or fields to identify an entity.
To extend the above simple example, imagine that the ETLs were able to uniquely identify entities by shape, size, or hatching. The following sections describe these concepts in more technical terms.
What is lookup information? Who configures it?
ETLs use one or more properties to uniquely identify entities. Such properties are collectively referred to as lookup information. Based on business logic, the ETL developer determines the property or properties that the ETL must use for entity lookup. If the capacity planner plans to run more than one ETL in the same environment, the developer ensures that all the ETLs use the same method to uniquely identify entities. As a TrueSight Capacity Optimization user, you need not necessarily know exactly how this works. You must, however, enable lookup during ETL configuration.
As an ETL developer, however, you will need to understand simple and multiple lookup, and strong and weak entries in a multiple lookup scenario. These concepts are discussed in detail in the following sections.
Entity lookup based on property
This section discusses the concept of entity lookup by using the example of 'system' as the entity and DS_SYSNM as the property or lookup information. The following concepts, however, apply to all kinds of entities.
Recognizing systems during different runs
An ETL needs to follow some convention to identify systems, so that each time it runs, it can tell which systems in its data source were seen before, and which ones are new.
If the data source contains any obvious unique identification information, then the connector needs to only create a unique string, such as DS_SYSNM, to uniquely identify the system.
In some cases, the data source uses different pieces of information, all of which need to exactly match. For example, a system may be identified by its asset identifier and database name.
The ETL developer carefully constructs the DS_SYSNM string from these pieces in a unique and systematic manner.
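As a minimal sketch of what "unique and systematic" can mean in practice, the following Python function joins an asset identifier and a database name into a single string. The function name, the separator, and the normalization rules (trimming and lowercasing) are illustrative assumptions and are not part of any TrueSight Capacity Optimization API.

```python
def build_ds_sysnm(asset_id: str, database_name: str, sep: str = "|") -> str:
    """Combine identifying pieces into one lookup string (hypothetical helper)."""
    # Assumption: trimming and lowercasing make repeated runs produce the same string.
    parts = [asset_id.strip().lower(), database_name.strip().lower()]
    for part in parts:
        if not part:
            raise ValueError("every identifying piece must be non-empty")
        # Rejecting the separator inside a piece keeps ("a|b", "c") and
        # ("a", "b|c") from collapsing into the same string.
        if sep in part:
            raise ValueError(f"piece {part!r} contains the separator {sep!r}")
    return sep.join(parts)


print(build_ds_sysnm("A-1001", "OrdersDB"))    # a-1001|ordersdb
print(build_ds_sysnm(" a-1001 ", "ORDERSDB"))  # a-1001|ordersdb
```

Whatever convention is chosen, the point is that the same source record yields the same string on every run.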
When loading data, the TrueSight Capacity Optimization ETL infrastructure automatically checks to see if another system with the same DS_SYSNM is already loaded by that ETL instance. If so, TrueSight Capacity Optimization will automatically associate the new data with the existing system. If not, TrueSight Capacity Optimization will create a new entry for the system. All the systems loaded by the ETL instance are tracked in an entity catalog. You can view the entity catalog for any ETL instance in the Admin section of the console.
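The check described above can be pictured with a toy catalog. This is only a sketch of the idea, assuming an in-memory dictionary and integer entity IDs; the class and method names are hypothetical and do not reflect how the product stores its catalog.

```python
class EntityCatalog:
    """Toy per-ETL catalog that maps DS_SYSNM strings to entity IDs."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next_id = 1

    def lookup_or_create(self, ds_sysnm: str) -> tuple[int, bool]:
        """Return (entity_id, created) for the given lookup string."""
        if ds_sysnm in self._ids:
            # Seen in an earlier run: new data attaches to the existing entity.
            return self._ids[ds_sysnm], False
        # Not seen before: register a new entry in the catalog.
        entity_id = self._next_id
        self._next_id += 1
        self._ids[ds_sysnm] = entity_id
        return entity_id, True


catalog = EntityCatalog()
first_run = catalog.lookup_or_create("web01.example.com")
second_run = catalog.lookup_or_create("web01.example.com")
print(first_run, second_run)  # (1, True) (1, False)
```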
There are separate lookup tables for entities of type System, Business Driver, and Domain. Each table lists the unique identifier string for each entity of that type (for example, DS_SYSNM for Systems).
The DS_SYSNM field is different from the SYSNM field that is shown as the "Name" of the System in the TrueSight Capacity Optimization console. The latter is a convenient, recognizable name for interacting with TrueSight Capacity Optimization and need not be unique. For example, a server might have a simple, short name, while an ETL may identify it uniquely by using its fully qualified domain name (FQDN).
You can control whether TrueSight Capacity Optimization should use only the name of the host as the DS_SYSNM string for comparisons, or whether it should also consider the Internet domain name (FQDN), using an option in the "Loader configuration" section of the configuration. (See Handling ETL lookup name). By default, the Internet domain name (for example, .bmc.com) is automatically included.
Sharing lookup between different ETLs
Often, you need to import information about the same entity from two different data sources. For example, a monitoring system might have performance data about a set of computers, while an asset management database may have administrative information about the same computers.
In TrueSight Capacity Optimization, this implies that two different ETLs need to load information about the same entities. It is likely that the two ETLs run on different schedules. As a result, one of the ETLs is the first to load metrics for an entity. The other ETL then needs to look up the entity catalog before loading its metrics for the same entity.
You can configure two or more different ETLs to load the same set of entities. If you do, configure them to share lookup. When a set of connector instances is configured to share lookup tables, TrueSight Capacity Optimization considers all the systems loaded by any of them when deciding whether a system is new or already exists.
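The effect of sharing lookup can be sketched as follows. The sketch assumes that both ETLs build identical DS_SYSNM strings and models each ETL's lookup table as a dictionary; the class name and the ETL names are hypothetical.

```python
class SharedLookup:
    """Toy model of several ETL instances configured to share lookup."""

    def __init__(self) -> None:
        # ETL name -> {DS_SYSNM: entity ID}
        self._tables: dict[str, dict[str, int]] = {}
        self._next_id = 1

    def load(self, etl: str, ds_sysnm: str) -> int:
        """Return the entity ID that data from this ETL is attached to."""
        # Consider the systems loaded by any ETL in the sharing group.
        for table in self._tables.values():
            if ds_sysnm in table:
                entity_id = table[ds_sysnm]
                break
        else:
            # No ETL in the group has seen this system: it is new.
            entity_id = self._next_id
            self._next_id += 1
        self._tables.setdefault(etl, {})[ds_sysnm] = entity_id
        return entity_id


shared = SharedLookup()
perf_id = shared.load("performance_etl", "web01.example.com")
asset_id = shared.load("asset_etl", "web01.example.com")
other_id = shared.load("asset_etl", "db01.example.com")
print(perf_id == asset_id, other_id)  # True 2
```

Without sharing, the second ETL would consult only its own table and create a duplicate entity for web01.example.com.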
To share lookup between two different connectors, the two connectors need to use compatible conventions for the attributes that identify entities. TrueSight Capacity Optimization provides connector developers with several mechanisms to specify attributes to identify entities loaded earlier, and also to match entities loaded by other connectors. If the connectors are not completely compatible, duplicate entities can sometimes be loaded into TrueSight Capacity Optimization. If this happens, manual reconciliation may be needed to discard one of them.
Single lookup and multiple lookup
An entity such as a system can be identified by a string such as DS_SYSNM that establishes the name of the system. This works only if the DS_SYSNM string uniquely corresponds to the name of the system in each ETL. The ETL developer needs to construct the DS_SYSNM string using the exact same method in every ETL. Only then can shared lookup tables or entity catalogs work.
Sometimes, the data sources for the different connectors do not have the same unique naming convention to identify the entities. For example, an asset database may use an asset ID to identify systems, while a performance-monitoring system may identify the systems by their FQDN.
In this case, the two ETLs may not be able to use DS_SYSNM to uniquely identify a system. TrueSight Capacity Optimization, therefore, offers a multiple lookup option, where the ETLs look up different fields (properties) in a particular sequence to find an entity match.
For example, the VMware ETL ident
To facilitate multiple lookup, some of the out-of-the-box ETLs from TrueSight Capacity Optimization have a field called _COMPATIBILITY_, apart from the default field DS_SYSNM. If you write a custom ETL that you want to integrate with an out-of-the-box ETL such as the VMware ETL, it is recommended that you populate the _COMPATIBILITY_ field.
Developers write advanced ETLs in a way that allows for multiple ways to identify entities, so that lookup can be shared even with other connectors that don't have access to all the identifying information for their entities.
Strong and weak lookup entries
If different data sources have different sets of identity information for an entity, it makes sense to allow for this possibility by defining several different methods of identifying entities. Each method is called a "lookup entry", which is a set of named fields.
In a simple lookup, as described above, the connector specifies only a single lookup field named DS_SYSNM. For multiple lookup, however, the ETL can specify any number of named lookup fields. A set of lookup fields forms one lookup entry. An entry match is confirmed only if all of the named fields in it match the corresponding values loaded into TrueSight Capacity Optimization earlier. This scheme is backward compatible with existing ETLs that use simple lookup: the ETL specifies DS_SYSNM as the property name, and the loader translates this to a lookup entry with a single field.
There are two kinds of lookup entries that can be defined by the ETL:
- A strong lookup entry is one that is used by this ETL to identify entities it loaded earlier. These entities are looked up in its own lookup table, in a way similar to DS_SYSNM. There can be one or more strong lookup entries defined by an ETL. If there is more than one, they are defined in a sequence, and the first entry in the sequence is tried first.
- A weak lookup entry is used by an ETL, not to identify entities it loaded earlier, but to match entities loaded by other ETLs. Multiple weak lookup entries are also defined in a sequence, and they are tried in that sequence.
The following is an example of a multiple lookup entry, with strong and weak entries defined:
DS_SYSNM=HOSTNAME#vl-pun-bcm-dv20##PARENT_VCNAME#BCM-VCENTER##NAME#vl-pun-bcm-dv20##PARENT_HOSTNAME#pe-pun-bco-dv05.bmc.com##_COMPATIBILITY_#564da903-e29e-976a-6970-6164020b0d3b##UUID#564da903-e29e-976a-6970-6164020b0d3b##VMW_VMREF#vm-3093##PARENT_VCUUID#63BA5246-A098-407F-B797-89E5E1B145D4; STRONGLOOKUPFIELDS=PARENT_VCUUID&&VMW_VMREF##PARENT_VCNAME&&VMW_VMREF WEAKLOOKUPFIELDS=UUID##HOSTNAME##PARENT_HOSTNAME&&NAME##PARENT_VCNAME&&PARENT_CLUSTERNAME&&NAME##NAME
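To make the structure of this example easier to read, the following sketch splits such strings into their parts. The delimiter meanings are inferred from the example alone and are an assumption: `##` appears to separate items, `#` a field name from its value, and `&&` the fields that together form one lookup entry. The function names are hypothetical, and the sample uses a shortened version of the DS_SYSNM value above.

```python
def parse_fields(ds_sysnm: str) -> dict[str, str]:
    """Split 'NAME#value##NAME#value' into a {name: value} dictionary."""
    fields = {}
    for pair in ds_sysnm.split("##"):
        # Split on the first '#' only, so the value is kept intact.
        name, _, value = pair.partition("#")
        fields[name] = value
    return fields


def parse_entries(spec: str) -> list[tuple[str, ...]]:
    """Split 'A&&B##C' into an ordered list of entries: [('A', 'B'), ('C',)]."""
    return [tuple(entry.split("&&")) for entry in spec.split("##")]


ds_sysnm = (
    "HOSTNAME#vl-pun-bcm-dv20"
    "##UUID#564da903-e29e-976a-6970-6164020b0d3b"
    "##VMW_VMREF#vm-3093"
)
strong = "PARENT_VCUUID&&VMW_VMREF##PARENT_VCNAME&&VMW_VMREF"

print(parse_fields(ds_sysnm)["VMW_VMREF"])  # vm-3093
print(parse_entries(strong))
# [('PARENT_VCUUID', 'VMW_VMREF'), ('PARENT_VCNAME', 'VMW_VMREF')]
```

Read this way, the example defines two strong entries of two fields each, and five weak entries that are tried in the order listed.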
The loader's entity lookup process tries all the strong lookup entries first, then all the weak lookup entries. This means that if there are no other ETLs loading the same entities, only the strong lookup entries are tried.
Whenever a match is found, TrueSight Capacity Optimization maintains all the lookup entry values that are loaded by all the ETLs sharing lookup. An entry can match entities loaded by other connectors only if the other ETLs have access to the same fields and supply these fields in weak lookup entries.
"Weak" lookup entries are never tested on the ETL's own lookup table; they are used only to search the shared lookup table when no strong lookup combination has been found.
When a weak lookup entry is found in the shared lookup table, the loader has found a "tentative" match for an existing CO entity. The loader will then verify all of the strong lookup field combinations populated by the ETL again on the tentative entity, to ensure that different strong entries don't match the same CO entity. The match is positive only when all the strong entries uniquely identify the entity.
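The sequence described in this section (strong entries on the ETL's own table, then weak entries on the shared table, then verification of the tentative match) can be sketched as below. This is a simplified model, not the loader's implementation: the class, the in-memory tables, and the field name ASSET_ID are hypothetical, and the verification step is one possible reading, namely that a tentative match is rejected when the entity already holds a different value for any strong field supplied by the ETL.

```python
def make_keys(entries, record):
    """Build (names, values) keys for the entries whose fields are all present."""
    return [
        (entry, tuple(record[name] for name in entry))
        for entry in entries
        if all(name in record for name in entry)
    ]


class Loader:
    def __init__(self) -> None:
        self.tables: dict[str, dict[tuple, int]] = {}  # ETL -> {key: entity ID}
        self.values: dict[int, dict[str, str]] = {}    # entity ID -> known fields

    def resolve(self, etl: str, record: dict[str, str],
                strong: list[tuple], weak: list[tuple]) -> int:
        own = self.tables.setdefault(etl, {})
        strong_keys = make_keys(strong, record)
        weak_keys = make_keys(weak, record)

        # 1. Strong entries, in sequence, against this ETL's own table.
        entity = next((own[k] for k in strong_keys if k in own), None)

        # 2. Weak entries, in sequence, against the other ETLs' tables only.
        if entity is None:
            entity = self._weak_match(etl, record, strong, weak_keys)

        # 3. No match at all: create a new entity.
        if entity is None:
            entity = len(self.values) + 1
            self.values[entity] = {}

        # Keep every lookup value supplied for the entity.
        self.values[entity].update(record)
        for key in strong_keys + weak_keys:
            own.setdefault(key, entity)
        return entity

    def _weak_match(self, etl, record, strong, weak_keys):
        for key in weak_keys:
            for other, table in self.tables.items():
                if other != etl and key in table:
                    candidate = table[key]
                    # Tentative match: accept only if no strong field conflicts.
                    if self._consistent(candidate, record, strong):
                        return candidate
        return None

    def _consistent(self, entity, record, strong) -> bool:
        known = self.values[entity]
        return all(
            known.get(name, record[name]) == record[name]
            for entry in strong for name in entry if name in record
        )


loader = Loader()
vm = {"PARENT_VCUUID": "vc-1", "VMW_VMREF": "vm-3093", "HOSTNAME": "host-a"}
a = loader.resolve("vmware", vm, [("PARENT_VCUUID", "VMW_VMREF")], [("HOSTNAME",)])
b = loader.resolve("asset", {"ASSET_ID": "A-1", "HOSTNAME": "host-a"},
                   [("ASSET_ID",)], [("HOSTNAME",)])
c = loader.resolve("asset", {"ASSET_ID": "A-2", "HOSTNAME": "host-a"},
                   [("ASSET_ID",)], [("HOSTNAME",)])
print(a, b, c)  # 1 1 2
```

In this run, the asset ETL's first record matches the VMware entity through the weak HOSTNAME entry. Its second record finds the same tentative entity, but that entity already carries a different ASSET_ID, so the match is rejected and a new entity is created.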
This multiple lookup facility makes it possible for different ETLs to correctly identify the same entities from different data sources.