Planning for data collection in the on-premises environment
This section describes how to plan for the installation of the Remote ETL Engine and the related components that are required to collect data from your environment. If you want to use Capacity Agents for data collection, also plan for the deployment of the Gateway Server.
When using the Gateway Server, you must install it on the same computer where the Remote ETL Engine is installed. BMC recommends using a dedicated Remote ETL Engine for processing data from the Gateway Server and a separate Remote ETL Engine for processing data from other on-premises ETLs. The use of separate Remote ETL Engines ensures efficient data processing.
Ensure that the Linux computer where you plan to install the Remote ETL Engine meets the following requirements:
Hardware requirements
- CPU - 2 processors
- RAM - 8 GB
- Disk - 58 GB
Supported operating systems
Computers or virtual machines that host the Remote ETL Engine components must be running one of the following supported operating systems. Only x86_64 architecture is supported for the operating systems.
Operating system | Version | Required libraries |
---|---|---|
Red Hat Enterprise Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
Red Hat Enterprise Linux | 8.0 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
Oracle Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
CentOS | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
SUSE Linux Enterprise Server | 11 SP4 | perl-5.10.0, perl-libwww-perl |
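As a quick pre-installation check, a sketch like the following can verify that the required libraries from the table above are installed on an RPM-based system (the package base names are taken from the table; the script skips the check when `rpm` is not available):

```shell
# Check that the libraries required by the Remote ETL Engine are installed.
# Base package names from the supported-OS table (the table lists perl-5.16.3,
# that is, Perl 5.16.3 or a compatible build); libXrender applies to RHEL only.
REQUIRED="perl libXtst liberation-sans-fonts libXrender"

missing=""
if command -v rpm >/dev/null 2>&1; then
    for pkg in $REQUIRED; do
        rpm -q "$pkg" >/dev/null 2>&1 || missing="$missing $pkg"
    done
    if [ -n "$missing" ]; then
        echo "Missing packages:$missing"
    else
        echo "All required packages are installed"
    fi
else
    echo "rpm not found; skipping package check"
fi
```

On a distribution that uses a different package manager, replace the `rpm -q` query accordingly.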
Review the following guidelines to estimate the required disk, memory, and processor capacity:
ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:
- The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity per day.
- The number of connector instances (tasks) scheduled on the ETL Engine.
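The throughput driver is simple arithmetic. A sketch with hypothetical numbers (2,000 managed entities, each sampled every 5 minutes, that is, 288 samples per day):

```shell
# Estimate required throughput in samples per day:
#   samples_per_day = managed_entities * samples_per_entity_per_day
ENTITIES=2000              # hypothetical number of managed entities
SAMPLES_PER_ENTITY=288     # one sample every 5 minutes, 24 hours a day
SAMPLES_PER_DAY=$((ENTITIES * SAMPLES_PER_ENTITY))
echo "Required throughput: $SAMPLES_PER_DAY samples/day"
# Required throughput: 576000 samples/day
```

In this example, 576,000 samples per day fits comfortably within the smallest configuration in the sizing table below (5 million samples per day).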
ETL Engine configuration | Number of servers | Disk space per server | Number of tasks per server | Samples per day (up to) |
---|---|---|---|---|
2 CPU cores @ 2 GHz, 8 GB RAM | 1 | 8 GB (installation) + 50 GB (data) | 20 | 5 million |
4 CPU cores @ 2 GHz, 16 GB RAM | 2 | 8 GB (installation) + 100 GB (data) | 40 | 25 million |
4 CPU cores @ 2 GHz, 16 GB RAM | 3 | 8 GB (installation) + 100 GB (data) | 40 | 50 million |
For more information, refer to the following sections:
Guidelines for disk space
The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.
You need additional disk space for special activities:
- Bulk import of data, for example, for addition of new data sources with historical data.
- Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.
For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
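The extra capacity can be estimated with a short calculation. A sketch, in which the additional sample volume and the per-sample on-disk footprint are hypothetical values you must replace with figures from your own environment:

```shell
# Estimate extra disk needed for a bulk or recovery import. Temporary files
# and logs persist for the File System Cleaner period (10 days by default),
# so size the extra capacity for that whole window.
EXTRA_SAMPLES_PER_DAY=1000000   # hypothetical additional samples per day
BYTES_PER_SAMPLE=200            # hypothetical on-disk footprint per sample
RETENTION_DAYS=10               # File System Cleaner default period

EXTRA_GB=$((EXTRA_SAMPLES_PER_DAY * BYTES_PER_SAMPLE * RETENTION_DAYS / 1024 / 1024 / 1024))
echo "Plan for roughly $EXTRA_GB GB of additional disk space"
```

Increase RETENTION_DAYS to match your File System Cleaner setting if you have changed it from the default.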
Guidelines for scheduling connectors on ETL Engines
Follow these guidelines when scheduling connectors to avoid congestion. These scheduled connectors are periodically started by the scheduler as batch jobs and run as separate processes.
- Do not configure a single connector instance to populate more than two million samples in each run. You can configure multiple connector instances to divide the work into smaller chunks, or you can configure multiple ETL Engine servers to manage the higher volume, when the data source can support it.
- Schedule connectors so that no more than one connector instance is running on any CPU core at any given time. Take into account the amount of time required to execute each connector. For example, if an ETL Engine server is configured with two CPU cores, ensure that no more than two connectors are running at any given time.
- If you are trying to scale up by increasing the size of the ETL Engine machine, then consider that certain types of connectors require significantly more memory than other connectors. For example, connectors written in Java might require twice as much memory, or more, than other types of connectors.
- Avoid congestion at the warehousing engine. Do not import data in large volumes, such as when recovering historical data. Split this data into smaller chunks.
Note
You can monitor and control the internal processing of collected data by viewing statistics such as processing throughput and processing time. Data must be processed by the warehousing engine before it can be analyzed, modeled, and reported in the console.
Guidelines for service connectors
For service connectors, which run continuously within the same process as the scheduler, the following stricter guidelines apply:
- A single ETL Engine computer can run no more than four service connectors.
- Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, running two service connectors will require the heap size to be increased to 2 GB.
- The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer. If you need to add more than two service connectors, use the larger 8 GB RAM size ETL Engine computer.
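The heap requirement from the guidelines above (default heap plus 512 MB for each service connector, kept within half the machine's RAM) can be checked with a short sketch; the connector count and RAM size here are illustrative:

```shell
# Compute the scheduler heap needed for N service connectors and verify it
# stays within half the total memory, per the guidelines above.
DEFAULT_HEAP_MB=1024    # default scheduler heap (1 GB)
PER_CONNECTOR_MB=512    # extra heap per service connector
SERVICE_CONNECTORS=2    # hypothetical count (maximum is 4 per ETL Engine)
TOTAL_RAM_MB=8192       # hypothetical machine RAM (8 GB)

HEAP_MB=$((DEFAULT_HEAP_MB + SERVICE_CONNECTORS * PER_CONNECTOR_MB))
echo "Required scheduler heap: ${HEAP_MB}m"
# Required scheduler heap: 2048m

if [ "$HEAP_MB" -gt $((TOTAL_RAM_MB / 2)) ]; then
    echo "Warning: heap exceeds half of total memory; use a larger machine"
fi
```

With two service connectors, the result matches the 2 GB example in the guidelines.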
Modifying the heap size of the ETL Engine scheduler
- In the installation directory on the ETL Engine computer, open the customenvpre.sh file for editing.
- In the #SCHEDULER section, search for the following statements:
  #SCHEDULER_HEAP_SIZE="1024m"
  #export SCHEDULER_HEAP_SIZE
- If the statements are preceded by a '#' character, remove it from both lines to uncomment them, and change the value (1024m) to the new heap size.
- Restart the scheduler.
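The edit described above can also be scripted. A sketch that demonstrates the change against a sample file (the file name suffix and the new heap value are illustrative; on a real system, run the sed command against customenvpre.sh in the ETL Engine installation directory and then restart the scheduler):

```shell
# Create a sample file with the two commented statements for demonstration.
FILE="./customenvpre.sh.sample"
cat > "$FILE" <<'EOF'
#SCHEDULER_HEAP_SIZE="1024m"
#export SCHEDULER_HEAP_SIZE
EOF

NEW_HEAP="2048m"    # illustrative new heap size

# Uncomment both statements and set the new value.
sed -i \
    -e "s|^#*SCHEDULER_HEAP_SIZE=.*|SCHEDULER_HEAP_SIZE=\"$NEW_HEAP\"|" \
    -e "s|^#*export SCHEDULER_HEAP_SIZE|export SCHEDULER_HEAP_SIZE|" \
    "$FILE"

cat "$FILE"
# SCHEDULER_HEAP_SIZE="2048m"
# export SCHEDULER_HEAP_SIZE
```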
Ensure that the Linux computer where you plan to install the Remote ETL Engine and Gateway Server meets the following requirements:
Hardware requirements
- CPU - 2 processors
- RAM - 8 GB
- Disk - 218 GB
The Gateway Server supports x86 and x86-64 architecture.
Supported operating systems
The computer that hosts the Remote ETL Engine and Gateway Server must be running one of the following operating systems. Only x86_64 architecture is supported for the operating systems.
Operating system | Version | Required libraries |
---|---|---|
Red Hat Enterprise Linux | 7.6 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
Red Hat Enterprise Linux | 8.0 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
Oracle Linux | 7.6 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
CentOS | 7.6 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
SUSE Linux Enterprise Server | 11 SP4, 12 SP4 and later, 15 SP1 and later | perl-5.10.0, perl-libwww-perl |
1 - Ensure that a compatible Korn shell is installed on a Red Hat Enterprise Linux 6.x system. The Korn shell is not installed by default.
Other requirements
- Ensure that the available temporary disk space is greater than 500 MB. The Gateway Server installer uses the following environment variables in the listed order to access this space:
- $IATEMPDIR environment variable
- /tmp
- Your home directory
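You can mirror the installer's lookup order to verify in advance that the chosen location has more than 500 MB free. A sketch, assuming a Linux system with GNU df:

```shell
# Resolve the temporary directory in the same order the installer uses:
# $IATEMPDIR, then /tmp, then the home directory.
if [ -n "$IATEMPDIR" ] && [ -d "$IATEMPDIR" ]; then
    TMP_DIR="$IATEMPDIR"
elif [ -d /tmp ]; then
    TMP_DIR=/tmp
else
    TMP_DIR="$HOME"
fi

# Free space (in MB) on the file system holding the chosen directory.
FREE_MB=$(df -Pm "$TMP_DIR" | awk 'NR==2 {print $4}')
echo "Temporary directory: $TMP_DIR ($FREE_MB MB free)"
if [ "$FREE_MB" -le 500 ]; then
    echo "Warning: less than 500 MB free; set IATEMPDIR to a larger location"
fi
```

If the default locations are too small, export IATEMPDIR before running the installer so it uses a file system with sufficient space.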
- On Linux systems, ensure that at least 2 MB of free space is reserved for the /etc file system on all the managed systems, and that you have the execute (x) permission for the /etc directory.
- Ensure that TCP/IP is installed on the Gateway Server computer and the managed systems that run Capacity Agents. The Gateway Server uses the TCP/IP protocol to communicate with these managed systems.
- Install the pcron utility, which is required to schedule simultaneous Manager runs.
If you want to use remote data repositories on a UNIX network file system, ensure that the rpc.lockd and rpc.statd NFS lock manager daemons run on both the client and server computers.
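A quick way to confirm the daemons are active is to query the portmapper: rpc.lockd registers the "nlockmgr" service and rpc.statd registers "status". A sketch, assuming the rpcinfo utility is available:

```shell
# Check that the NFS lock manager daemons (rpc.lockd and rpc.statd) are
# registered with the portmapper. Run on both client and server computers.
check_nfs_lock_services() {
    if ! command -v rpcinfo >/dev/null 2>&1; then
        echo "rpcinfo not found; cannot check NFS lock services"
        return 0
    fi
    for svc in nlockmgr status; do
        if rpcinfo -p 2>/dev/null | grep -q "$svc"; then
            echo "$svc: registered"
        else
            echo "$svc: NOT registered"
        fi
    done
}

RESULT=$(check_nfs_lock_services)
echo "$RESULT"
```

If either service is not registered, start the NFS lock services through your distribution's service manager before using a remote data repository.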
For Linux systems, ensure that the installation directory is on a standard Linux file system such as ext3 or, for high-availability (Active/Passive) deployments, a Global File System (GFS). Installation on a CIFS share mounted as a file system is not supported; you can use a CIFS share only for the shared repository.
Review the following guidelines to estimate the required disk, memory, and processor capacity:
Remote ETL Engine
ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:
- The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity per day.
- The number of connector instances (tasks) scheduled on the ETL Engine.
Guidelines for disk space
The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.
You need additional disk space for special activities:
- Bulk import of data, for example, for addition of new data sources with historical data.
- Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.
For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
Gateway Server
The following factors affect the sizing drivers:
- Size of your environment
- Number of Agents from which data is collected
- Data retention period
Data retention
The data collected by Capacity Agents (in the UDR format) is periodically transferred to the Gateway Server where it is automatically processed into hourly intervals and saved in text files called VIS files. BMC recommends the following guidelines for the retention period of UDR and VIS files:
- Retain three months of VIS files. These files are useful if you need to recover data. For more information, see Recovering data.
- For UDR and VIS files, keep three months of data. The oldest two months of this data can be compressed to save space.
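Compressing the older data can be automated. A sketch using find and gzip, where the repository path, the .udr/.vis file extensions, and the 60-day cutoff (roughly the oldest two of the three retained months) are all illustrative assumptions to adapt to your deployment:

```shell
# Compress UDR and VIS files older than about two months, keeping the most
# recent month uncompressed. Path and extensions are illustrative.
REPO_DIR="${REPO_DIR:-/opt/bmc/gateway/repository}"   # hypothetical location
CUTOFF_DAYS=60

if [ -d "$REPO_DIR" ]; then
    # gzip each matching file in place (e.g. file.vis -> file.vis.gz)
    find "$REPO_DIR" -type f \( -name "*.udr" -o -name "*.vis" \) \
        -mtime +"$CUTOFF_DAYS" -exec gzip {} \;
fi
```

Schedule a job like this to run periodically so that only the most recent month of data stays uncompressed.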
The following table provides the recommendations for the hardware capacity according to the environment size.
Size of your environment | Number of servers | Processor cores per server | RAM (4 GB/core) per server | IOPS per server | Number of tasks per server | Installation storage (GB) per server | Remote ETL Engine storage (GB) per server | Gateway Server storage (GB) per server |
---|---|---|---|---|---|---|---|---|
Small (up to 1000 servers) | 1 | 2 | 8 | 60 | 20 | 8 | 50 | 160 |
Medium (up to 5000 servers) | 2 | 4 | 16 | 150 | 40 | 8 | 100 | 395 |
Large (up to 10000 servers) | 3 | 4 | 16 | 200 | 40 | 8 | 100 | 525 |
The calculations for the Gateway Server storage requirements are based on the following assumptions:
- Metric resolution - 60 minutes
- Processing window - 4 hours
- UDR data retention - 1 month
- VIS data retention - 3 months
- UDR spill interval - 15 minutes
Review the following topics to plan for the Capacity Agent installation: