ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:
Follow these guidelines when sizing and scaling ETL Engine servers:
ETL Engine configuration
No. of connectors
Samples per day
2 CPU cores @ 2 GHz, 4 GB RAM
8 GB free
4 CPU cores @ 2 GHz, 8 GB RAM
16 GB free
For more information, refer to the following sections:
The default disk values above allow for ten days of temporary files and log files accumulated during normal day-to-day population activity, which matches the default ten-day period of the File System Cleaner system task. If you increase this period for any reason, increase the disk figures above accordingly.
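Because the disk figures cover ten days of files, one simple way to adjust them for a different cleaner period is to scale them proportionally. The bash sketch below assumes that disk usage grows linearly with the number of retained days. That is an approximation, not a documented formula, and scale_disk_gb is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: scale a baseline free-disk figure (in GB) from the sizing
# guidelines when the File System Cleaner period differs from 10 days.
# ASSUMPTION: temporary and log file usage grows linearly with retention.

# scale_disk_gb BASELINE_GB RETENTION_DAYS
scale_disk_gb() {
  local baseline_gb=$1
  local retention_days=$2
  local default_days=10
  # Integer arithmetic, rounded up so the estimate is never too small.
  echo $(( (baseline_gb * retention_days + default_days - 1) / default_days ))
}

scale_disk_gb 8 10    # → 8  (default period, no change)
scale_disk_gb 8 15    # → 12
scale_disk_gb 16 30   # → 48
```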
You need additional disk space for special activities:
For the above special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
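One way to turn this into a number is to multiply the additional samples per day by the disk footprint of each sample and by the number of days the files are retained. The guidelines do not state a per-sample footprint, so the bash sketch below takes it as a parameter; the 100-byte figure in the example is purely hypothetical, and extra_disk_mb is an illustrative helper, not a product tool.

```bash
#!/usr/bin/env bash
# Sketch: estimate extra disk space (MB) needed for a special activity.
# BYTES_PER_SAMPLE is a HYPOTHETICAL input; measure the real footprint of
# temporary and log files per sample in your own environment.

# extra_disk_mb EXTRA_SAMPLES_PER_DAY BYTES_PER_SAMPLE RETENTION_DAYS
extra_disk_mb() {
  local samples_per_day=$1
  local bytes_per_sample=$2
  local retention_days=$3
  local total_bytes=$(( samples_per_day * bytes_per_sample * retention_days ))
  # Convert bytes to MB (1 MB = 1048576 bytes), rounding up.
  echo $(( (total_bytes + 1048575) / 1048576 ))
}

# 2 million extra samples per day, an assumed 100 bytes each,
# kept for the default 10-day File System Cleaner period.
extra_disk_mb 2000000 100 10   # → 1908
```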
Follow these guidelines when scheduling connectors to avoid congestion:
You can monitor and control the internal processing of collected data by viewing statistics such as processing throughput and time in the Administration section of the BMC TrueSight Capacity Optimization Console. Data needs to be processed by the warehousing engine before it can be analyzed, modeled, and reported in the Console.
The guidelines above are for scheduled connectors, which are periodically started by the scheduler as batch jobs and run as separate processes. For service connectors, which run continuously within the same process as the scheduler, there are stricter guidelines as follows.
Unlike local ETL Engines, which load data into the data warehouse by connecting directly to the Oracle database, remote ETL Engines use a different method to load data into the data warehouse. The sizing and scalability characteristics of BMC TrueSight Capacity Optimization installations with data being loaded from remote ETL Engines are therefore different. In general, a remote ETL Engine uses additional resources on the Data hub machine. To decide when to use a remote ETL Engine, see Guidelines on using a remote ETL engine.
Remote ETL Engines use a two-step process to load data into the data warehouse:
This two-step process involves use of the following resources:
Even with sufficient resources for all of the above, the overall process takes longer to finish than it does for local ETL Engines.
If the volume of data transferred from remote ETL Engines requires more than one remote ETL Engine, provide additional resources on the Data hub.
The following table provides some guidelines for the Data hub when you use remote ETL Engines:
Data hub resource
An additional 40 IOPS for each remote ETL Engine
An additional 320 MB for each remote ETL Engine
CPU and memory
Memory: 160 MB for each remote ETL Engine, up to a maximum of 8 GB.
See also Setting the Data hub JVM heap size.
Database connection pool
Add two connections for each remote ETL Engine, up to a maximum of 60 connections in total. See the section on the database connection pool below.
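The per-engine guidelines in the table can be combined into a quick calculation. The bash sketch below makes assumptions the table does not state: that the extra memory is added to a 1024 MB baseline heap (the commented default shown below), that the two extra connections per engine are added to a baseline pool of 40 (the default max-pool-size shown below), and that the 8 GB and 60-connection caps apply to those totals. datahub_sizing is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of the per-remote-ETL-Engine Data hub guidelines.
# ASSUMPTIONS: baseline heap of 1024 MB and baseline pool of 40
# connections; the 8 GB and 60-connection caps apply to the totals.

# datahub_sizing NUMBER_OF_REMOTE_ETL_ENGINES
datahub_sizing() {
  local engines=$1
  local base_heap_mb=1024   # assumed baseline heap
  local base_pool=40        # assumed baseline max-pool-size

  local extra_iops=$(( engines * 40 ))      # 40 IOPS per engine
  local extra_disk_mb=$(( engines * 320 ))  # 320 MB per engine

  local heap_mb=$(( base_heap_mb + engines * 160 ))  # 160 MB per engine
  if (( heap_mb > 8192 )); then heap_mb=8192; fi     # 8 GB ceiling

  local pool=$(( base_pool + engines * 2 ))  # 2 connections per engine
  if (( pool > 60 )); then pool=60; fi       # 60-connection ceiling

  echo "extra_iops=${extra_iops} extra_disk_mb=${extra_disk_mb} heap_mb=${heap_mb} pool=${pool}"
}

datahub_sizing 3
# → extra_iops=120 extra_disk_mb=960 heap_mb=1504 pool=46
```

Under these assumptions the connection cap is reached at ten remote ETL Engines, well before the memory cap.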
Be careful when modifying the heap size of the JVM. Wrong settings may cause unpredictable and hard-to-diagnose failures.
Remove the # character preceding the following statements to uncomment them, replace 1024m with the new heap size, and then restart the Data hub service.
#DATAHUB_HEAP_SIZE="1024m"
#export DATAHUB_HEAP_SIZE
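For example, with an illustrative heap size of 2048 MB, the two statements would read as follows once uncommented. The 2048m value is only an example, not a recommendation; derive your own value from the Data hub guidelines above.

```bash
# Uncommented form, using an ILLUSTRATIVE 2048 MB heap size.
DATAHUB_HEAP_SIZE="2048m"
export DATAHUB_HEAP_SIZE
```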
When modifying the database connection pool size, verify that the Oracle database is also configured to allow a corresponding number of concurrent sessions.
Replace the value 40 in the <max-pool-size>40</max-pool-size> element with the new value, and then save the file.
Make sure that you edit the setting only for the CaplanDHDS data source.
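If you prefer to script this change, the edit can be limited to the CaplanDHDS data source with a sed address range. The bash sketch below assumes a JBoss-style XML layout in which each data source is a <datasource ...> ... </datasource> element whose opening tag contains its name; the file path is not given here and is passed in by you. set_caplan_pool_size is a hypothetical helper, and sed -i.bak keeps a backup copy of the original file.

```bash
#!/usr/bin/env bash
# Sketch: change <max-pool-size> only inside the CaplanDHDS data source.
# ASSUMPTION: the opening <datasource ...> tag of that data source
# contains the text CaplanDHDS and is closed by a </datasource> tag.

# set_caplan_pool_size CONFIG_FILE NEW_SIZE
set_caplan_pool_size() {
  local config_file=$1
  local new_size=$2
  case $new_size in
    ''|*[!0-9]*) echo "pool size must be a whole number" >&2; return 1 ;;
  esac
  # The address range starts at the CaplanDHDS opening tag and ends at
  # the next </datasource>, so other data sources are left untouched.
  sed -i.bak \
    "/<datasource[^>]*CaplanDHDS/,/<\/datasource>/ s|<max-pool-size>[0-9]*</max-pool-size>|<max-pool-size>${new_size}</max-pool-size>|" \
    "$config_file"
}

# Usage (path is a placeholder):
#   set_caplan_pool_size /path/to/datasource-config.xml 46
```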
The BMC TrueSight Capacity Optimization ETL Engine is a server that runs connectors, which load data from external data sources into the Data Warehouse database.
An ETL Engine can be configured in two ways: as a local ETL Engine or as a remote ETL Engine.
The same connectors can run in both types of ETL Engine. The choice between local and remote is purely a deployment choice. Selection criteria explains how to choose the appropriate configuration for a BMC TrueSight Capacity Optimization ETL Engine.
For more information, refer to the following sections:
In most cases, a local ETL Engine is recommended because it offers a higher-throughput connection to BMC TrueSight Capacity Optimization for loading high volumes of data.
A remote ETL Engine provides two advantages to connectors running in it:
However, a remote ETL Engine adds overhead to ensure robust transmission, which reduces the total throughput.
The remote ETL Engine option should be adopted only in the following cases:
A remote ETL Engine is not needed just because the data source is remote. The name "local" in a local ETL Engine does not imply that it must be on the same LAN as the BMC TrueSight Capacity Optimization database. As long as the ETL Engine can reach the BMC TrueSight Capacity Optimization LAN to populate the data (see details below), a local ETL Engine can be used.
You can always use multiple local ETL Engines over different LAN segments to drive higher volumes.
A local ETL Engine requires TCP connectivity from the ETL Engine machine to the BMC TrueSight Capacity Optimization database server and application server: specifically, TCP port 1521 for the database (the default for Oracle installations) and TCP port 8280 (the default Data hub port) for the BMC TrueSight Capacity Optimization Application Server.
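A quick way to confirm this connectivity from the ETL Engine machine is to attempt a TCP connection to each port. The following is a minimal bash sketch, assuming bash was built with /dev/tcp support and that the coreutils timeout command is available; check_port is a hypothetical helper and the host names in the usage lines are placeholders.

```bash
#!/usr/bin/env bash
# Minimal reachability check for the ports a local ETL Engine needs.
# Assumes bash with /dev/tcp support and the coreutils `timeout` command.

# check_port HOST PORT: prints "open" and returns 0 if a TCP connection
# succeeds within 5 seconds; otherwise prints "closed" and returns 1.
check_port() {
  local host=$1 port=$2
  # The connection is opened in a child bash, so it is closed again
  # as soon as that child exits.
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
    return 0
  fi
  echo "closed"
  return 1
}

# Usage (db-host and app-host are placeholder host names):
#   check_port db-host 1521    # Oracle database, default port
#   check_port app-host 8280   # Data hub, default port
```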
A remote ETL Engine can be configured to use either:
To install a remote ETL Engine, you must forward a port on your external firewall to the external communication port of your BMC TrueSight Capacity Optimization Data hub, so that the Data hub is reachable from the remote ETL Engine.
During the configuration procedure, you are asked to provide the external Data hub address and port parameters. If you plan a remote ETL Engine installation, configure both the firewall and the Data hub.