ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:
- The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity per day.
- The number of connector instances (tasks) scheduled on the ETL Engine.
Use the sizing and scalability guidelines to determine the hardware capacity for ETL Engine servers according to your environment size:
|ETL Engine configuration||Number of servers||Disk space per server||Number of tasks per server||Samples per day (up to)|| |
|2 CPU cores @ 2 GHz, 8 GB RAM|| ||8 + 50 GB||20||5 million||2.4 Mbit/s|
|4 CPU cores @ 2 GHz, 16 GB RAM|| ||8 + 100 GB|| || || |
|4 CPU cores @ 2 GHz, 16 GB RAM||3||8 + 100 GB||40||50 million||7.2 Mbit/s|
|4 CPU cores @ 2 GHz, 16 GB RAM||6||8 + 100 GB||40||100 million||14.4 Mbit/s|
|4 CPU cores @ 2 GHz, 16 GB RAM||15||8 + 100 GB||40||250 million||36 Mbit/s|
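As an illustration of how the throughput driver maps onto the sizing table, the following Python sketch multiplies the number of managed entities by the average samples per entity per day and picks the smallest table row that covers the result. The names `Tier`, `samples_per_day`, and `smallest_tier` are hypothetical helpers, not part of the product, and only the rows whose samples-per-day value is stated in the table are included.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    configuration: str
    servers: Optional[int]  # None where the table does not state a server count
    max_samples_per_day: int


# Rows taken from the sizing table; rows without a samples-per-day value are omitted.
TIERS = [
    Tier("2 CPU cores @ 2 GHz, 8 GB RAM", None, 5_000_000),
    Tier("4 CPU cores @ 2 GHz, 16 GB RAM", 3, 50_000_000),
    Tier("4 CPU cores @ 2 GHz, 16 GB RAM", 6, 100_000_000),
    Tier("4 CPU cores @ 2 GHz, 16 GB RAM", 15, 250_000_000),
]


def samples_per_day(entities: int, avg_samples_per_entity_per_day: float) -> int:
    """Required throughput: managed entities times average daily samples per entity."""
    return round(entities * avg_samples_per_entity_per_day)


def smallest_tier(required_samples_per_day: int) -> Optional[Tier]:
    """Return the first (smallest) tier that covers the requirement, or None if none does."""
    for tier in TIERS:
        if required_samples_per_day <= tier.max_samples_per_day:
            return tier
    return None


if __name__ == "__main__":
    required = samples_per_day(20_000, 2_880)  # 57,600,000 samples per day
    print(required, smallest_tier(required))
```

In this example, 20,000 entities at 2,880 samples each per day need 57.6 million samples per day, which falls in the 100 million row (six servers). A requirement above 250 million returns `None`, because the table does not cover it.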
For more information, refer to the following sections:
Guidelines for disk space
The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.
You need additional disk space for special activities:
- Bulk import of data, for example, for addition of new data sources with historical data.
- Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.
For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
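The following Python sketch shows one rough way to adjust the disk figures. It rests on two assumptions that are not stated in this topic: that the space for temporary files and logs grows linearly with the File System Cleaner period, and that the per-sample footprint can be approximated by dividing that space by the samples accumulated over ten days at the rated throughput. The function names are hypothetical, and the results are estimates only.

```python
DEFAULT_RETENTION_DAYS = 10  # default File System Cleaner period


def scaled_disk_gb(variable_gb: float, retention_days: int) -> float:
    """Scale the temporary-file and log allowance linearly with the retention period.

    Assumption: the allowance grows in proportion to the number of days retained.
    """
    return variable_gb * retention_days / DEFAULT_RETENTION_DAYS


def extra_disk_gb(extra_samples: int, variable_gb: float,
                  baseline_samples_per_day: int) -> float:
    """Estimate additional space for a bulk or recovery import.

    Assumption: the per-sample footprint equals the allowance divided by the
    samples accumulated over the default retention period at the baseline rate.
    """
    gb_per_sample = variable_gb / (baseline_samples_per_day * DEFAULT_RETENTION_DAYS)
    return extra_samples * gb_per_sample


if __name__ == "__main__":
    # A 50 GB allowance with the cleaner period raised from 10 to 15 days.
    print(scaled_disk_gb(50, 15))
    # Importing 10 million extra samples on a server rated for 5 million per day.
    print(extra_disk_gb(10_000_000, 50, 5_000_000))
```

Under these assumptions, raising the cleaner period to 15 days turns a 50 GB allowance into 75 GB, and a 10 million sample import on the smallest configuration needs roughly 10 GB more until the cleaner removes the files.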
Guidelines for scheduling connectors on ETL Engines
Follow these guidelines when scheduling connectors to avoid congestion. These scheduled connectors are periodically started by the scheduler as batch jobs and run as separate processes.
- Do not configure a single connector instance to populate more than two million samples in each run. You can configure multiple connector instances to divide the work into smaller chunks, or you can configure multiple ETL Engine servers to manage the higher volume, when the data source can support it.
- Schedule connectors so that no more than one connector instance is running on any CPU core at any given time. Take into account the amount of time required to execute each connector. For example, if an ETL Engine server is configured with two CPU cores, ensure that no more than two connectors are running at any given time.
- If you scale up by increasing the size of the ETL Engine machine, consider that certain types of connectors require significantly more memory than others. For example, connectors written in Java might require at least twice as much memory as other types of connectors.
- Avoid congestion at the warehousing engine. Do not import data in large volumes, such as when recovering historical data. Split this data into smaller chunks.
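The first two scheduling guidelines can be checked before you commit to a schedule. The Python sketch below is a planning aid under simple assumptions: each connector run is described by a start minute and an expected duration within one scheduling window, and the helper names are hypothetical. It computes how many connector instances are needed to stay under two million samples per run, and whether the peak number of simultaneous runs fits the number of CPU cores.

```python
from typing import List, Tuple

MAX_SAMPLES_PER_RUN = 2_000_000  # per-run limit for a single connector instance


def instances_needed(samples_per_run: int) -> int:
    """Number of connector instances needed to keep each run within the limit."""
    return -(-samples_per_run // MAX_SAMPLES_PER_RUN)  # ceiling division


def peak_concurrency(runs: List[Tuple[int, int]]) -> int:
    """Maximum number of runs active at once.

    Each run is (start_minute, duration_minutes). A run that ends at minute t
    is not counted as overlapping a run that starts at minute t.
    """
    events = []
    for start, duration in runs:
        events.append((start, 1))              # run starts
        events.append((start + duration, -1))  # run ends
    # At equal timestamps, process ends (-1) before starts (+1).
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def schedule_fits(runs: List[Tuple[int, int]], cpu_cores: int) -> bool:
    """True if no more than one connector runs per CPU core at any time."""
    return peak_concurrency(runs) <= cpu_cores


if __name__ == "__main__":
    runs = [(0, 30), (10, 30), (20, 30)]  # three overlapping 30-minute runs
    print(instances_needed(5_000_000), peak_concurrency(runs), schedule_fits(runs, 2))
```

Here a five million sample load needs three connector instances, and the three overlapping runs peak at three simultaneous processes, so they do not fit a two-core server unless one of them is rescheduled.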
You can monitor and control the internal processing of collected data by viewing statistics such as processing throughput and time. Data must be processed by the warehousing engine before it can be analyzed, modeled, and reported in the console.
Guidelines for service connectors
Service connectors run continuously within the same process as the scheduler, so stricter guidelines apply:
- A single ETL Engine computer can run no more than four service connectors.
- Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, running two service connectors will require the heap size to be increased to 2 GB.
- The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer. If you need to add more than two service connectors, use the larger 8 GB RAM size ETL Engine computer.
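The heap arithmetic in these guidelines can be sketched in Python as follows. The function name `required_heap_mb` is hypothetical; the constants come from the guidelines above (512 MB per service connector, at most four service connectors, heap within half of total memory).

```python
HEAP_PER_SERVICE_CONNECTOR_MB = 512
MAX_SERVICE_CONNECTORS = 4


def required_heap_mb(default_heap_mb: int, service_connectors: int,
                     total_ram_mb: int) -> int:
    """Return the scheduler heap size in MB, or raise if a guideline is violated."""
    if not 0 <= service_connectors <= MAX_SERVICE_CONNECTORS:
        raise ValueError("a single ETL Engine can run at most four service connectors")
    heap_mb = default_heap_mb + HEAP_PER_SERVICE_CONNECTOR_MB * service_connectors
    if heap_mb > total_ram_mb // 2:
        raise ValueError("scheduler heap would exceed half of the total memory")
    return heap_mb


if __name__ == "__main__":
    # Default heap of 1 GB, two service connectors, 8 GB of RAM.
    print(required_heap_mb(1024, 2, 8192))  # 2048
```

With a 1 GB default heap and two service connectors, the result is 2048 MB, which matches the example in the guidelines and stays within half of 8 GB.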
Modifying the heap size of the ETL Engine scheduler
- In the installation directory on the ETL Engine computer, open the customenvpre.sh file for editing.
- In the #SCHEDULER section, search for the following statements:
- If the statements are preceded by a '#' character, remove it from both to uncomment them, and modify the number (1024m) to the new heap size.
- Restart the scheduler.