Planning for data collection in the on-premises environment

This section describes how to plan for the installation of Remote ETL Engine and related components that are required to collect data from your environment. If you want to use Capacity Agents for data collection, plan for the deployment of BMC Helix Client Gateway and Gateway Server.

When using the Gateway Server, you must install it on the same computer where the Remote ETL Engine is installed. Also, it is a best practice to install BMC Helix Client Gateway on the same computer.

BMC recommends using a dedicated Remote ETL Engine for processing data from the Gateway Server and a separate Remote ETL Engine for processing data from other on-premises ETLs. The use of separate Remote ETL Engines ensures efficient data processing.

Remote ETL Engine

Prerequisite

Ensure that the Helix system user who runs the Remote ETL Engine has permissions to use the system crontab file:

If the host has a cron.deny policy, ensure that the Helix user is not included in it.
If the host has a cron.allow policy, add the Helix user to it.
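
The following minimal sketch illustrates how the two policy files are usually evaluated. It assumes the common cron behavior in which cron.allow, when present, takes precedence over cron.deny. The function name and the user name helix are hypothetical and are not part of the product.

```python
from typing import Iterable, Optional


def cron_access_allowed(
    user: str,
    allow_entries: Optional[Iterable[str]],
    deny_entries: Optional[Iterable[str]],
) -> Optional[bool]:
    """Return True or False when a policy file decides access, else None.

    Pass None for a policy file that does not exist on the host.
    """
    def names(entries: Iterable[str]) -> set:
        # Ignore blank lines and surrounding whitespace.
        return {line.strip() for line in entries if line.strip()}

    if allow_entries is not None:
        # When cron.allow exists, only the users listed in it can use crontab.
        return user in names(allow_entries)
    if deny_entries is not None:
        # Otherwise, when cron.deny exists, the listed users are blocked.
        return user not in names(deny_entries)
    # Neither file exists: the result depends on the distribution.
    return None


# Hypothetical user name; pass the lines of each policy file that exists.
print(cron_access_allowed("helix", ["root", "helix"], None))
```

To check a real host, read the lines of the policy files (typically /etc/cron.allow and /etc/cron.deny) and pass them to the function.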

Ensure that the Linux computer where you plan to install the Remote ETL Engine meets the following requirements:

System requirements for the Remote ETL Engine

Hardware requirements

CPU - 2 processors
RAM - 8 GB
Disk - 58 GB

Supported operating systems

Computers or virtual machines that host the Remote ETL Engine components must be running one of the following supported operating systems. Only x86_64 architecture is supported for the operating systems.

Operating system	Version	Required libraries
Red Hat Enterprise Linux	6.8 and later	perl-5.10.1, libXtst, open-sans-fonts.noarch
Red Hat Enterprise Linux	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts, libXrender
Red Hat Enterprise Linux	8.0 and later	perl-5.16.3, libXtst, liberation-sans-fonts, libXrender
Oracle Linux	6.8 and later	perl-5.10.1, libXtst, dejavu-sans-fonts.noarch
Oracle Linux	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts
CentOS	6.8 and later	perl-5.10.1, libXtst, open-sans-fonts.noarch
CentOS	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts
SUSE Linux Enterprise Server	12 SP4 and later, 15 SP1 and later	perl-libwww-perl

Review the following guidelines to estimate the required disk, memory, and processor capacity:

Sizing and scalability guidelines for the Remote ETL Engine

ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:

The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity in a day.
The number of connector instances (tasks) scheduled on the ETL Engine.
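
As a worked example of the first sizing driver, the following sketch multiplies the number of managed entities by the average number of samples per entity per day. The environment figures (2,000 entities, 24 metrics, 5-minute resolution) are assumptions chosen only for illustration.

```python
def samples_per_day(managed_entities: int, avg_samples_per_entity_per_day: int) -> int:
    """Throughput = managed entities x average samples per entity per day."""
    return managed_entities * avg_samples_per_entity_per_day


# Hypothetical environment: 2,000 entities, each reporting 24 metrics
# at a 5-minute resolution (288 intervals per day).
entities = 2000
per_entity = 24 * (24 * 60 // 5)  # 24 metrics x 288 intervals = 6,912

print(samples_per_day(entities, per_entity))  # 13824000
```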

Use the sizing and scalability guidelines to determine the hardware capacity for ETL Engine servers according to your environment size:

ETL Engine configuration	Number of servers	Disk space per server	Number of tasks per server	Samples per day (up to)	Bandwidth
2 CPU cores @ 2 GHz, 8 GB RAM	1	8 + 50 GB	20	5 million	2.4 Mbit/s
4 CPU cores @ 2 GHz, 16 GB RAM	2	8 + 100 GB	40	25 million	4.8 Mbit/s
4 CPU cores @ 2 GHz, 16 GB RAM	3	8 + 100 GB	40	50 million	7.2 Mbit/s
4 CPU cores @ 2 GHz, 16 GB RAM	6	8 + 100 GB	40	100 million	14.4 Mbit/s
4 CPU cores @ 2 GHz, 16 GB RAM	15	8 + 100 GB	40	250 million	36 Mbit/s
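
The table can be read as a lookup: choose the first row whose limits cover your environment. The following sketch encodes the rows and returns the smallest matching configuration. The names Tier and pick_tier are hypothetical, and the rule that tasks can be spread evenly across all servers in a row is an assumption.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    cpu_cores: int
    ram_gb: int
    servers: int
    data_disk_gb: int  # per server, in addition to 8 GB for the installation
    tasks_per_server: int
    max_samples_per_day: int
    bandwidth_mbit_s: float


# Rows copied from the sizing table above.
TIERS = [
    Tier(2, 8, 1, 50, 20, 5_000_000, 2.4),
    Tier(4, 16, 2, 100, 40, 25_000_000, 4.8),
    Tier(4, 16, 3, 100, 40, 50_000_000, 7.2),
    Tier(4, 16, 6, 100, 40, 100_000_000, 14.4),
    Tier(4, 16, 15, 100, 40, 250_000_000, 36.0),
]


def pick_tier(samples_per_day: int, tasks: int) -> Optional[Tier]:
    """Return the first row that covers both sizing drivers, or None."""
    for tier in TIERS:
        fits_samples = samples_per_day <= tier.max_samples_per_day
        # Assumption: tasks can be spread evenly across the servers of a row.
        fits_tasks = tasks <= tier.servers * tier.tasks_per_server
        if fits_samples and fits_tasks:
            return tier
    return None


print(pick_tier(13_824_000, 30))
```

A result of None means that the environment exceeds the largest row in the table and needs a custom sizing exercise.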

For more information, refer to the following sections:

Guidelines for disk space
Guidelines for scheduling connectors on ETL Engines
Guidelines for service connectors

Guidelines for disk space

The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.

You need additional disk space for special activities:

Bulk import of data, for example, for addition of new data sources with historical data.
Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.

For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).
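
The following sketch shows one way to adjust the data disk estimate. It assumes that disk usage grows linearly with both the daily sample volume and the File System Cleaner period, which is a simplification and not a documented formula. The function name and the baseline figures in the example (50 GB for 5 million samples per day) are taken from the smallest configuration only for illustration.

```python
DEFAULT_CLEANER_PERIOD_DAYS = 10


def estimate_data_disk_gb(
    baseline_gb: float,
    baseline_samples_per_day: int,
    extra_samples_per_day: int = 0,
    cleaner_period_days: int = DEFAULT_CLEANER_PERIOD_DAYS,
) -> float:
    """Scale a baseline disk figure by sample volume and retention period.

    Assumption: temporary files and logs grow linearly with the number of
    samples kept on disk (samples per day x cleaner period in days).
    """
    gb_per_sample_day = baseline_gb / (
        baseline_samples_per_day * DEFAULT_CLEANER_PERIOD_DAYS
    )
    total_samples_per_day = baseline_samples_per_day + extra_samples_per_day
    return gb_per_sample_day * total_samples_per_day * cleaner_period_days


# Recovery import that adds 1 million samples per day to a 5 million baseline.
print(round(estimate_data_disk_gb(50, 5_000_000, extra_samples_per_day=1_000_000), 1))
```

Treat the result as a rough upper bound, because a one-time import occupies the extra space only until the cleaner period expires.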

Guidelines for scheduling connectors on ETL Engines

Follow these guidelines when scheduling connectors to avoid congestion. These scheduled connectors are periodically started by the scheduler as batch jobs and run as separate processes.

Do not configure a single connector instance to populate more than two million samples in each run. You can configure multiple connector instances to divide the work into smaller chunks, or you can configure multiple ETL Engine servers to manage the higher volume, when the data source can support it.
Schedule connectors so that no more than one connector instance is running on any CPU core at any given time. Take into account the amount of time required to execute each connector. For example, if an ETL Engine server is configured with two CPU cores, ensure that no more than two connectors are running at any given time.
If you are trying to scale up by increasing the size of the ETL Engine machine, then consider that certain types of connectors require significantly more memory than other connectors. For example, connectors written in Java might require twice as much memory, or more, than other types of connectors.
Avoid congestion at the warehousing engine. Do not import data in large volumes, such as when recovering historical data. Split this data into smaller chunks.
A single ETL Engine can run no more than four service connectors.
Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, this means that running two service connectors will require the heap size to be increased to 2 GB.
The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer.
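
The rule of one running connector per CPU core can be checked against a planned schedule. The following sketch computes the peak number of overlapping runs from planned start and end times. The function names and the schedule values are hypothetical, and run durations must come from your own measurements.

```python
from typing import List, Tuple


def max_concurrent_runs(runs: List[Tuple[int, int]]) -> int:
    """Return the peak number of overlapping runs.

    Each run is (start_minute, end_minute); the end minute is exclusive.
    """
    events = []
    for start, end in runs:
        events.append((start, 1))   # a run starts
        events.append((end, -1))    # a run ends
    # At the same minute, process ends (-1) before starts (+1) so that
    # back-to-back runs are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def schedule_fits(runs: List[Tuple[int, int]], cpu_cores: int) -> bool:
    """True when no more than one connector runs per CPU core at any time."""
    return max_concurrent_runs(runs) <= cpu_cores


# Hypothetical schedule in minutes after midnight, on a two-core ETL Engine.
planned = [(0, 30), (15, 45), (30, 60)]
print(max_concurrent_runs(planned), schedule_fits(planned, cpu_cores=2))
```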

Note

You can monitor and control the internal processing of collected data by viewing statistics such as processing throughput and time. Data needs to be processed by the warehousing engine before it can be analyzed, modeled, and reported in the console.

Guidelines for service connectors

For service connectors, which run continuously within the same process as the scheduler, there are stricter guidelines as follows.

A single ETL Engine computer can run no more than four service connectors.
Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, running two service connectors will require the heap size to be increased to 2 GB.
The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer. If you need to add more than two service connectors, use the larger 8 GB RAM size ETL Engine computer.
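
The three rules above can be combined into a single calculation. The following sketch returns the required scheduler heap size and rejects configurations that break either limit. The function name is hypothetical, and the values are expressed in MB.

```python
HEAP_PER_SERVICE_CONNECTOR_MB = 512
MAX_SERVICE_CONNECTORS = 4


def required_scheduler_heap_mb(
    default_heap_mb: int, service_connectors: int, total_ram_mb: int
) -> int:
    """Return the scheduler heap size needed for the given service connectors."""
    if service_connectors > MAX_SERVICE_CONNECTORS:
        raise ValueError("no more than four service connectors per ETL Engine")
    heap_mb = default_heap_mb + HEAP_PER_SERVICE_CONNECTOR_MB * service_connectors
    # The heap must stay within half of the total memory of the computer.
    if heap_mb > total_ram_mb // 2:
        raise ValueError("heap size exceeds half of the total memory")
    return heap_mb


# Default heap of 1 GB, two service connectors, 8 GB of RAM.
print(required_scheduler_heap_mb(1024, 2, 8192))  # 2048
```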

Modifying the heap size of the ETL Engine scheduler

In the installation directory on the ETL Engine computer, open the customenvpre.sh file for editing.
In the #SCHEDULER section, search for the following statements:
#SCHEDULER_HEAP_SIZE="1024m"
#export SCHEDULER_HEAP_SIZE
If the statements are preceded by a '#' character, remove it from both to uncomment them, and modify the number (1024m) to the new heap size.
Restart the scheduler.
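
If you prefer to script the edit, the following sketch applies the same change to the text of customenvpre.sh: it uncomments both statements and sets the new value. The function name is hypothetical, the sample text is an assumption about how the #SCHEDULER section looks, and restarting the scheduler remains a separate manual step.

```python
import re


def set_scheduler_heap(script_text: str, new_size: str) -> str:
    """Uncomment the two SCHEDULER_HEAP_SIZE statements and set the value."""
    # Replace the assignment, with or without a leading '#'.
    text = re.sub(
        r'^[ \t]*#?[ \t]*SCHEDULER_HEAP_SIZE="[^"]*"',
        lambda match: f'SCHEDULER_HEAP_SIZE="{new_size}"',
        script_text,
        flags=re.MULTILINE,
    )
    # Uncomment the export statement.
    text = re.sub(
        r"^[ \t]*#?[ \t]*export[ \t]+SCHEDULER_HEAP_SIZE[ \t]*$",
        "export SCHEDULER_HEAP_SIZE",
        text,
        flags=re.MULTILINE,
    )
    return text


# Assumed layout of the relevant section of the file.
sample = '#SCHEDULER\n#SCHEDULER_HEAP_SIZE="1024m"\n#export SCHEDULER_HEAP_SIZE\n'
print(set_scheduler_heap(sample, "2048m"))
```

Back up the file before you write the modified text to disk.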

Dedicated Remote ETL Engine and Gateway Server

Ensure that the Linux computer where you plan to install the Remote ETL Engine and Gateway Server meets the following requirements:

System requirements

Hardware requirements

CPU - 2 processors
RAM - 8 GB
Disk - 218 GB

The Gateway Server supports x86 and x86-64 architecture.

Supported operating systems

The computer that hosts the Remote ETL Engine and Gateway Server must be running one of the following operating systems. Only x86_64 architecture is supported for the operating systems.

Operating system	Version	Required libraries
Red Hat Enterprise Linux	6.8 and later	perl-5.10.1, libXtst, open-sans-fonts.noarch
Red Hat Enterprise Linux	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts, libXrender
Red Hat Enterprise Linux	8.0 and later	perl-5.16.3, libXtst, liberation-sans-fonts, libXrender
Oracle Linux	6.8 and later	perl-5.10.1, libXtst, dejavu-sans-fonts.noarch
Oracle Linux	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts
CentOS	6.8 and later	perl-5.10.1, libXtst, open-sans-fonts.noarch
CentOS	7.5 and later	perl-5.16.3, libXtst, liberation-sans-fonts
SUSE Linux Enterprise Server	12 SP4 and later, 15 SP1 and later	perl-libwww-perl

¹ - Ensure that a compatible Korn shell is installed on a Red Hat Enterprise Linux 6.x system. The Korn shell is not installed by default.

Other requirements

Ensure that the available temporary disk space is greater than 500 MB. The Gateway Server installer uses the following environment variables in the listed order to access this space:
- $IATEMPDIR environment variable
- /tmp
- Your home directory
On Linux systems, ensure that at least 2 MB of free space is reserved for the /etc file system on all the managed systems, and that you have the execute (x) permission for the /etc directory.
Ensure that TCP/IP is installed on the Gateway Server computer and the managed systems that run Capacity Agents. The Gateway Server uses the TCP/IP protocol to communicate with these managed systems.
Install the pcron utility that is required to schedule Manager runs simultaneously.
If you want to use remote data repositories on a UNIX network file system, ensure that the rpc.lockd and rpc.statd NFS lock manager daemons run on both the client and server computers.
For Linux systems, ensure that the installation directory is on a standard Linux file system such as ext3, or on a Global File System (GFS) for high availability deployments (Active/Passive servers). Installation on a CIFS share mounted as a file system is not supported. You can use a CIFS share only for the shared repository.
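
The temporary disk space requirement at the top of this list can be verified before you start the installer. The following sketch checks the three locations in the listed order and returns the first one with more than 500 MB free. The function names are hypothetical, 500 MB is interpreted as 500 x 1024 x 1024 bytes, and whether the installer itself falls back to the next location when space is short is an assumption.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

MIN_FREE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 x 1024 bytes


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the file system that contains path."""
    return shutil.disk_usage(path).free


def pick_temp_dir(
    env: Mapping[str, str],
    home: str,
    free: Callable[[str], int] = free_bytes,
) -> Optional[str]:
    """Return the first candidate with more than 500 MB free, or None."""
    candidates = [env.get("IATEMPDIR"), "/tmp", home]
    for path in candidates:
        if not path:
            continue  # IATEMPDIR is not set
        try:
            available = free(path)
        except OSError:
            continue  # the path does not exist or is not accessible
        if available > MIN_FREE_BYTES:
            return path
    return None


print(pick_temp_dir(os.environ, os.path.expanduser("~")))
```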

Review the following guidelines to estimate the required disk, memory, and processor capacity:

Sizing and scalability guidelines

Remote ETL Engine

ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:

The required data processing throughput in samples per day. This value is the product of the number of managed entities and the average number of samples collected for each entity in a day.
The number of connector instances (tasks) scheduled on the ETL Engine.

Guidelines for disk space

The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.

You need additional disk space for special activities:

Bulk import of data, for example, for addition of new data sources with historical data.
Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.

For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).

Gateway Server

The following factors affect the sizing drivers:

Size of your environment
Number of Agents from which data is collected
Data retention period

Data retention

The data collected by Capacity Agents (in the UDR format) is periodically transferred to the Gateway Server where it is automatically processed into hourly intervals and saved in text files called VIS files. BMC recommends the following guidelines for the retention period of UDR and VIS files:

Keep about three months' worth of VIS files. These files are useful if you need to "recover" data. For more information, see Recovering data.
For UDR and VIS files, keep three months of data. Within this data, the oldest two months can be compressed to save space.
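
The retention guidelines can be expressed as a simple age check for each file. The following sketch classifies a file by its date. It assumes 30-day months (90 days of retention, with the most recent 30 days left uncompressed), and the function name and labels are hypothetical. The guidelines do not say what to do with older files, so the sketch only flags them.

```python
from datetime import date

RETENTION_DAYS = 90      # assumption: three months of 30 days
UNCOMPRESSED_DAYS = 30   # assumption: the most recent month stays uncompressed


def retention_action(file_date: date, today: date) -> str:
    """Classify a UDR or VIS file by its age in days."""
    age_days = (today - file_date).days
    if age_days > RETENTION_DAYS:
        return "beyond retention"
    if age_days > UNCOMPRESSED_DAYS:
        return "compress"
    return "keep"


today = date(2024, 6, 30)
for file_date in (date(2024, 6, 15), date(2024, 5, 1), date(2024, 1, 1)):
    print(file_date, retention_action(file_date, today))
```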

The following table provides the recommendations for the hardware capacity according to the environment size.

Size of your environment	Number of servers	Processor cores per server	RAM in GB per server (4 GB/core)	IOPS per server	Number of tasks per server	Storage per server in GB: Installation	Storage per server in GB: Remote ETL Engine	Storage per server in GB: Gateway Server
Small (up to 1000 servers)	1	2	8	60	20	8	50	160
Medium (up to 5000 servers)	2	4	16	150	40	8	100	395
Large (up to 10000 servers)	3	4	16	200	40	8	100	525
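
The total disk space for each server is the sum of the three storage values in a row. The following sketch adds them up for each environment size; the dictionary and function names are hypothetical. For the small environment, the total matches the 218 GB listed in the hardware requirements.

```python
# Storage per server, in GB, copied from the last three columns of the table.
STORAGE_GB = {
    "Small": {"installation": 8, "remote_etl_engine": 50, "gateway_server": 160},
    "Medium": {"installation": 8, "remote_etl_engine": 100, "gateway_server": 395},
    "Large": {"installation": 8, "remote_etl_engine": 100, "gateway_server": 525},
}


def total_storage_gb(size: str) -> int:
    """Total disk space per server for the given environment size."""
    return sum(STORAGE_GB[size].values())


for size in STORAGE_GB:
    print(size, total_storage_gb(size))
```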

The calculations for the Gateway Server storage requirements are based on the following assumptions:

Metric resolution - 60 minutes
Processing window - 4 hours
UDR data retention - 1 month
VIS data retention - 3 months
UDR spill interval - 15 minutes

Capacity Agent

Review the following topics to plan for the Capacity Agent installation:

The following diagram shows some example configurations for the on-premises components and how they interact with BMC Helix Capacity Optimization:

Arrows in this diagram represent the direction in which the connection is made to open the ports.

Depending on the data that you are importing, you can use one or more of the three possible VM configurations (VM1, VM2, or VM3). For example, to import data through both the Remote ETL Engine and the Gateway Server, use the VM1 configuration. To import data through only the Remote ETL Engine, use the VM3 configuration.
