System requirements for the Remote ETL Engine
Supported operating systems
Computers or virtual machines that host the Remote ETL Engine components must run one of the following supported operating systems on the x86_64 architecture:
Sizing and scalability guidelines for the Remote ETL Engine
When using the Gateway Server, you must install it on the same computer where the Remote ETL Engine is installed. We recommend using a dedicated Remote ETL Engine for processing data from the Gateway Server and a separate Remote ETL Engine for processing data from other on-premises ETLs. The use of separate Remote ETL Engines ensures efficient data processing.
Review the following guidelines to estimate the required disk, memory, and processor capacity:
ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:
- The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity per day (see the sketch after this list).
- The number of connector instances (tasks) scheduled on the ETL Engine.
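As a rough illustration, the following shell sketch computes the daily throughput from these two drivers. The entity and sample counts are hypothetical values for illustration, not product defaults.

  #!/bin/sh
  # Hypothetical example: estimate the samples-per-day throughput for an ETL Engine.
  MANAGED_ENTITIES=5000            # assumed number of managed entities
  AVG_SAMPLES_PER_ENTITY=288       # assumed samples per entity per day (5-minute collection interval)
  SAMPLES_PER_DAY=$((MANAGED_ENTITIES * AVG_SAMPLES_PER_ENTITY))
  echo "Estimated throughput: ${SAMPLES_PER_DAY} samples per day"

In this example, 5,000 entities collected at a 5-minute interval produce 1,440,000 samples per day, a value you can compare against the sizing guidelines for your environment size.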
Use the sizing and scalability guidelines to determine the hardware capacity for ETL Engine servers according to your environment size:
To understand the current memory requirements of your Remote ETL Engine, you can install the Helix Continuous Optimization Agent on the Remote ETL Engine server and review the REAL_MEM_UTIL metric.
Disk space guidelines
The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust the sizing and scalability numbers accordingly.
You need additional disk space for special activities:
- Bulk import of data, for example, for addition of new data sources with historical data.
- Recovery import of data when a data source stops for a day or two for any reason and the missed data must be recovered.
For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
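A minimal sketch of this estimate follows, assuming a hypothetical average on-disk footprint per sample; measure the real footprint of your own temporary files and logs before sizing.

  #!/bin/sh
  # Hypothetical example: extra disk space needed for a bulk or recovery import.
  EXTRA_SAMPLES_PER_DAY=2000000    # assumed additional samples per day during the import
  BYTES_PER_SAMPLE=200             # assumed temporary-file and log footprint per sample; measure your own
  RETENTION_DAYS=10                # File System Cleaner period (default is ten days)
  EXTRA_BYTES=$((EXTRA_SAMPLES_PER_DAY * BYTES_PER_SAMPLE * RETENTION_DAYS))
  echo "Additional disk space: approximately $((EXTRA_BYTES / 1024 / 1024 / 1024)) GB"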
Scheduling connectors on ETL Engines guidelines
Follow these guidelines when scheduling connectors to avoid congestion. These scheduled connectors are periodically started by the scheduler as batch jobs and run as separate processes.
- Do not configure a single connector instance to populate more than two million samples in each run. You can configure multiple connector instances to divide the work into smaller chunks (see the sketch after this list), or you can configure multiple ETL Engine servers to manage the higher volume, if the data source can support it.
- Schedule connectors so that no more than one connector instance is running on any CPU core at any given time. Take into account the amount of time required to execute each connector. For example, if an ETL Engine server is configured with two CPU cores, ensure that no more than two connectors are running at any given time.
- If you are scaling up by increasing the size of the ETL Engine machine, consider that certain types of connectors require significantly more memory than others. For example, connectors written in Java might require twice as much memory as other types of connectors, or more.
- Avoid congestion at the warehousing engine. Do not import data in large volumes, such as when recovering historical data. Split this data into smaller chunks.
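As an illustration of the two-million-sample limit above, this sketch computes how many connector instances (or runs) a large import should be split into. The total sample count is a hypothetical value.

  #!/bin/sh
  # Hypothetical example: number of connector instances (or runs) needed
  # to keep each run under the two-million-sample limit.
  TOTAL_SAMPLES=9000000            # assumed samples to import in one cycle
  MAX_PER_RUN=2000000              # guideline: no more than two million samples per run
  RUNS=$(( (TOTAL_SAMPLES + MAX_PER_RUN - 1) / MAX_PER_RUN ))   # ceiling division
  echo "Split the import into at least ${RUNS} connector instances or runs"

Also make sure that the resulting instances do not run concurrently on more CPU cores than the ETL Engine server has.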
Service connectors guidelines
For service connectors, which run continuously within the same process as the scheduler, stricter guidelines apply:
- A single ETL Engine computer can run no more than four service connectors.
- Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, running two service connectors requires the heap size to be increased to 2 GB (see the sketch after this list).
- The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer. If you need to add more than two service connectors, use the larger ETL Engine computer size with 8 GB of RAM.
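Combining the two rules above, the following is a minimal sketch of the heap-size check. The 1 GB default heap comes from the example above; the total memory value is an assumption for illustration, so substitute the actual memory of your ETL Engine computer.

  #!/bin/sh
  # Hypothetical example: required scheduler heap for N service connectors,
  # checked against the half-of-total-memory rule.
  DEFAULT_HEAP_MB=1024             # assumed default scheduler heap size (1 GB)
  SERVICE_CONNECTORS=2             # number of service connectors to run (no more than four)
  TOTAL_MEMORY_MB=4096             # assumed total memory of the ETL Engine computer
  REQUIRED_HEAP_MB=$((DEFAULT_HEAP_MB + 512 * SERVICE_CONNECTORS))
  if [ "$REQUIRED_HEAP_MB" -le $((TOTAL_MEMORY_MB / 2)) ]; then
      echo "Set the scheduler heap size to ${REQUIRED_HEAP_MB}m"
  else
      echo "Required heap ${REQUIRED_HEAP_MB} MB exceeds half of ${TOTAL_MEMORY_MB} MB; use a larger ETL Engine computer"
  fi

With two service connectors and 4 GB of total memory, the required 2 GB heap is exactly half the memory; a third service connector pushes the requirement past that limit, which is why more than two service connectors call for the 8 GB computer size.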
To modify the heap size of the ETL Engine scheduler
- In the installation directory on the ETL Engine computer, open the customenvpre.sh file for editing.
- In the #SCHEDULER section, search for the following statements:
  #SCHEDULER_HEAP_SIZE="1024m"
  #export SCHEDULER_HEAP_SIZE
- If the statements are preceded by a '#' character, remove it from both to uncomment them, and modify the value (1024m) to the new heap size.
- Restart the scheduler.
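For example, after uncommenting both statements and raising the heap size to 2 GB (2048m) for two service connectors, the edited lines in customenvpre.sh would look similar to the following; the 2048m value is only an example, so use the size calculated for your environment.

  SCHEDULER_HEAP_SIZE="2048m"
  export SCHEDULER_HEAP_SIZE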