Planning for data collection in the on-premises environment

This section describes how to plan for the installation of Remote ETL Engine and related components that are required to collect data from your environment. If you want to use Capacity Agents for data collection, plan for the deployment of BMC Helix Client Gateway and Gateway Server.

When using the Gateway Server, you must install it on the same computer where the Remote ETL Engine is installed. Also, it is a best practice to install BMC Helix Client Gateway on the same computer. 

BMC recommends using a dedicated Remote ETL Engine for processing data from the Gateway Server and a separate Remote ETL Engine for processing data from other on-premises ETLs. The use of separate Remote ETL Engines ensures efficient data processing. 

Remote ETL Engine

Prerequisite

Ensure that the Helix system user who runs the Remote ETL Engine has permissions to use the system crontab file:

  • If the host has a cron.deny policy, ensure that the Helix user is not included in it.
  • If the host has a cron.allow policy, add the Helix user to it.
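
The two policy checks can be sketched as a small Python helper. This is only an illustration under stated assumptions: it follows the usual cron convention that cron.allow takes precedence over cron.deny, it assumes that the policy files are at /etc/cron.allow and /etc/cron.deny (the location can differ on your distribution), and the user name helix is hypothetical. Reading these files might require root privileges.

```python
def read_policy(path):
    """Return the set of user names in a policy file, or None if the file is missing."""
    try:
        with open(path) as handle:
            return {line.strip() for line in handle if line.strip()}
    except FileNotFoundError:
        return None


def cron_access(user, allow_users=None, deny_users=None):
    """Return True or False when a policy file decides, or None when neither file exists.

    allow_users and deny_users are the names listed in cron.allow and cron.deny,
    or None when the corresponding file does not exist on the host.
    """
    if allow_users is not None:
        # When cron.allow exists, only the listed users may use crontab.
        return user in allow_users
    if deny_users is not None:
        # Otherwise, when cron.deny exists, every user not listed may use crontab.
        return user not in deny_users
    # Neither file exists: the result depends on the distribution.
    return None


if __name__ == "__main__":
    # "helix" is a hypothetical user name; the paths are assumed defaults.
    allowed = cron_access(
        "helix",
        read_policy("/etc/cron.allow"),
        read_policy("/etc/cron.deny"),
    )
    print("crontab access for helix:", allowed)
```

A result of None means that neither policy file exists, in which case check the cron documentation of your distribution.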

Ensure that the Linux computer where you plan to install the Remote ETL Engine meets the following requirements:

Hardware requirements

  • CPU - 2 processors
  • RAM - 8 GB
  • Disk - 58 GB

Supported operating systems

Computers or virtual machines that host the Remote ETL Engine components must be running one of the following supported operating systems. Only x86_64 architecture is supported for the operating systems.

| Operating system | Version | Required libraries |
|---|---|---|
| Red Hat Enterprise Linux | 6.8 and later | perl-5.10.1, libXtst, open-sans-fonts.noarch |
| Red Hat Enterprise Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
| Red Hat Enterprise Linux | 8.0 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
| Oracle Linux | 6.8 and later | perl-5.10.1, libXtst, dejavu-sans-fonts.noarch |
| Oracle Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
| CentOS | 6.8 and later | perl-5.10.1, libXtst, open-sans-fonts.noarch |
| CentOS | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
| SUSE Linux Enterprise Server | 12 SP4 and later; 15 SP1 and later | perl-libwww-perl |

Review the following guidelines to estimate the required disk, memory, and processor capacity:

ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:

  • The required data processing throughput in samples per day. This value is the number of managed entities multiplied by the average number of samples collected for each entity in a day.
  • The number of connector instances (tasks) scheduled on the ETL Engine.

Use the following sizing and scalability guidelines to determine the hardware capacity for ETL Engine servers according to your environment size:

| ETL Engine configuration | Number of servers | Disk space per server | Number of tasks per server | Samples per day (up to) | Bandwidth |
|---|---|---|---|---|---|
| 2 CPU cores @ 2 GHz, 8 GB RAM | 1 | 8 + 50 GB | 20 | 5 million | 2.4 Mbit/s |
| 4 CPU cores @ 2 GHz, 16 GB RAM | 2 | 8 + 100 GB | 40 | 25 million | 4.8 Mbit/s |
| 4 CPU cores @ 2 GHz, 16 GB RAM | 3 | 8 + 100 GB | 40 | 50 million | 7.2 Mbit/s |
| 4 CPU cores @ 2 GHz, 16 GB RAM | 6 | 8 + 100 GB | 40 | 100 million | 14.4 Mbit/s |
| 4 CPU cores @ 2 GHz, 16 GB RAM | 15 | 8 + 100 GB | 40 | 250 million | 36 Mbit/s |
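
As a rough illustration of how the throughput driver maps to the sizing table above, the following Python sketch multiplies the number of managed entities by the samples per entity per day and picks the first table row that covers the result. The function names and the example figures (10,000 entities, 288 samples each) are hypothetical; only the table values come from this topic.

```python
# Rows of the sizing table: (maximum samples per day, number of servers, configuration).
SIZING_TABLE = [
    (5_000_000, 1, "2 CPU cores @ 2 GHz, 8 GB RAM"),
    (25_000_000, 2, "4 CPU cores @ 2 GHz, 16 GB RAM"),
    (50_000_000, 3, "4 CPU cores @ 2 GHz, 16 GB RAM"),
    (100_000_000, 6, "4 CPU cores @ 2 GHz, 16 GB RAM"),
    (250_000_000, 15, "4 CPU cores @ 2 GHz, 16 GB RAM"),
]


def samples_per_day(entities, samples_per_entity_per_day):
    """Throughput driver: managed entities times daily samples per entity."""
    return entities * samples_per_entity_per_day


def pick_sizing(daily_samples):
    """Return (servers, configuration) for the first row that covers the load.

    Returns None when the load exceeds the largest row of the table.
    """
    for limit, servers, configuration in SIZING_TABLE:
        if daily_samples <= limit:
            return servers, configuration
    return None


if __name__ == "__main__":
    load = samples_per_day(10_000, 288)  # hypothetical environment
    print(load, pick_sizing(load))
```

The number of scheduled connector instances is the second sizing driver and is not covered by this sketch.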

For more information, refer to the following sections:

Guidelines for disk space

The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.
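
If you change the File System Cleaner period, one simple way to adjust a disk figure is to scale it in proportion to the period. The following Python sketch assumes that temporary files and logs grow linearly with the number of retained days, which this topic does not state explicitly, so treat the result as a starting estimate only. The function name is hypothetical.

```python
DEFAULT_CLEANER_PERIOD_DAYS = 10  # default File System Cleaner period from this topic


def scaled_temp_disk_gb(default_disk_gb, cleaner_period_days):
    """Scale a default temporary-file and log disk figure to a new cleaner period.

    Assumption: disk usage grows linearly with the number of retained days.
    """
    if cleaner_period_days <= 0:
        raise ValueError("cleaner_period_days must be positive")
    return default_disk_gb * cleaner_period_days / DEFAULT_CLEANER_PERIOD_DAYS


if __name__ == "__main__":
    # Example: a 50 GB default figure with the period raised to 15 days.
    print(scaled_temp_disk_gb(50, 15))
```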

You need additional disk space for special activities:

  • Bulk import of data, for example, for addition of new data sources with historical data.
  • Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.

For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).

Guidelines for scheduling connectors on ETL Engines

Follow these guidelines when scheduling connectors to avoid congestion. These scheduled connectors are periodically started by the scheduler as batch jobs and run as separate processes.

  • Do not configure a single connector instance to populate more than two million samples in each run. You can configure multiple connector instances to divide the work into smaller chunks, or you can configure multiple ETL Engine servers to manage the higher volume, when the data source can support it.
  • Schedule connectors so that no more than one connector instance is running on any CPU core at any given time. Take into account the amount of time required to execute each connector. For example, if an ETL Engine server is configured with two CPU cores, ensure that no more than two connectors are running at any given time.
  • If you are trying to scale up by increasing the size of the ETL Engine machine, then consider that certain types of connectors require significantly more memory than other connectors. For example, connectors written in Java might require twice as much memory, or more, than other types of connectors.
  • Avoid congestion at the warehousing engine. Do not import data in large volumes, such as when recovering historical data. Split this data into smaller chunks.
  • A single ETL Engine can run no more than four service connectors.
  • Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, this means that running two service connectors will require the heap size to be increased to 2 GB.
  • The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer.
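
Two of these guidelines lend themselves to a quick check: the limit of one running connector per CPU core, and the limit of two million samples per run. The following Python sketch is a minimal illustration. It assumes that you describe each scheduled run as a start time and a duration in minutes within a single day, and the function names are hypothetical.

```python
def max_concurrent(runs):
    """Return the peak number of connector runs that overlap in time.

    runs: iterable of (start_minute, duration_minutes) tuples.
    """
    events = []
    for start, duration in runs:
        events.append((start, 1))              # a connector starts
        events.append((start + duration, -1))  # a connector ends
    # Sorting puts ends (-1) before starts (+1) at the same minute,
    # so back-to-back runs are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def fits_cores(runs, cpu_cores):
    """True when no more than one connector runs per CPU core at any time."""
    return max_concurrent(runs) <= cpu_cores


def instances_needed(samples_per_run, limit=2_000_000):
    """Connector instances needed so that none exceeds the per-run sample limit."""
    return -(-samples_per_run // limit)  # ceiling division


if __name__ == "__main__":
    schedule = [(0, 30), (15, 30), (30, 20)]  # hypothetical schedule
    print(max_concurrent(schedule), fits_cores(schedule, 2))
    print(instances_needed(5_000_000))
```

The sketch does not model recurring schedules or runs that take longer than expected, so leave some margin when you apply it to a real schedule.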

Note

You can monitor and control the internal processing of collected data by viewing statistics such as processing throughput and time. Data must be processed by the warehousing engine before it can be analyzed, modeled, and reported in the console.

Guidelines for service connectors

For service connectors, which run continuously within the same process as the scheduler, there are stricter guidelines as follows.

  • A single ETL Engine computer can run no more than four service connectors.
  • Service connectors use heap memory in the scheduler process, so increase the heap size by 512 MB for each service connector. For example, if the default heap size is 1 GB, running two service connectors will require the heap size to be increased to 2 GB.
  • The total heap size of the scheduler should remain within half the total memory of the ETL Engine computer. If you need to add more than two service connectors, use the larger 8 GB RAM size ETL Engine computer.
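
The heap arithmetic in these guidelines can be sketched in Python as follows. The 1 GB default heap is taken from the example in the text and might differ in your installation; the function names are hypothetical.

```python
DEFAULT_HEAP_MB = 1024        # default scheduler heap used in the example above
HEAP_PER_CONNECTOR_MB = 512   # extra heap for each service connector
MAX_SERVICE_CONNECTORS = 4    # limit per ETL Engine computer


def required_heap_mb(service_connectors, default_heap_mb=DEFAULT_HEAP_MB):
    """Scheduler heap needed for the given number of service connectors."""
    if not 0 <= service_connectors <= MAX_SERVICE_CONNECTORS:
        raise ValueError("an ETL Engine computer can run 0 to 4 service connectors")
    return default_heap_mb + service_connectors * HEAP_PER_CONNECTOR_MB


def heap_fits(service_connectors, total_ram_gb, default_heap_mb=DEFAULT_HEAP_MB):
    """True when the required heap stays within half of the total memory."""
    half_ram_mb = total_ram_gb * 1024 / 2
    return required_heap_mb(service_connectors, default_heap_mb) <= half_ram_mb


if __name__ == "__main__":
    print(required_heap_mb(2))  # two connectors on a 1 GB default heap
    print(heap_fits(4, 8))      # four connectors on an 8 GB computer
```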

Modifying the heap size of the ETL Engine scheduler

  1. In the installation directory on the ETL Engine computer, open the customenvpre.sh file for editing.
  2. In the #SCHEDULER section, search for the following statements:
    #SCHEDULER_HEAP_SIZE="1024m"
    #export SCHEDULER_HEAP_SIZE
  3. If the statements are preceded by a '#' character, remove it from both to uncomment them, and modify the number (1024m) to the new heap size.
  4. Restart the scheduler.
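
If you prefer to script steps 2 and 3, the following Python sketch shows one possible approach. It works on the text of customenvpre.sh, assumes that the file contains exactly one assignment and one export statement in the form shown above, and accepts only sizes written as digits followed by m or g. The function name is hypothetical, and you still need to restart the scheduler afterward.

```python
import re


def set_scheduler_heap(script_text, new_size):
    """Uncomment the two SCHEDULER_HEAP_SIZE statements and set the new size.

    new_size is a value such as 2048m. Raises ValueError when the size is
    malformed or the expected statements are not found exactly once.
    """
    if not re.fullmatch(r"\d+[mMgG]", new_size):
        raise ValueError("new_size must look like 2048m or 2g")
    # Rewrite the assignment, whether or not it is commented out.
    text, assignments = re.subn(
        r'^[ \t]*#?[ \t]*SCHEDULER_HEAP_SIZE="[^"]*"[ \t]*$',
        f'SCHEDULER_HEAP_SIZE="{new_size}"',
        script_text,
        flags=re.MULTILINE,
    )
    # Rewrite the export statement, whether or not it is commented out.
    text, exports = re.subn(
        r"^[ \t]*#?[ \t]*export[ \t]+SCHEDULER_HEAP_SIZE[ \t]*$",
        "export SCHEDULER_HEAP_SIZE",
        text,
        flags=re.MULTILINE,
    )
    if assignments != 1 or exports != 1:
        raise ValueError("expected exactly one assignment and one export statement")
    return text


if __name__ == "__main__":
    sample = '#SCHEDULER\n#SCHEDULER_HEAP_SIZE="1024m"\n#export SCHEDULER_HEAP_SIZE\n'
    print(set_scheduler_heap(sample, "2048m"))
```

Back up the file before you write the modified text to it.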

Dedicated Remote ETL Engine and Gateway Server

Ensure that the Linux computer where you plan to install the Remote ETL Engine and Gateway Server meets the following requirements:

Hardware requirements

  • CPU - 2 processors
  • RAM - 8 GB
  • Disk - 218 GB

The Gateway Server supports x86 and x86-64 architecture.

Supported operating systems

The computer that hosts the Remote ETL Engine and Gateway Server must be running one of the following operating systems. Only x86_64 architecture is supported for the operating systems.

| Operating system | Version | Required libraries |
|---|---|---|
| Red Hat Enterprise Linux | 6.8 and later | perl-5.10.1, libXtst, open-sans-fonts.noarch |
| Red Hat Enterprise Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
| Red Hat Enterprise Linux | 8.0 and later | perl-5.16.3, libXtst, liberation-sans-fonts, libXrender |
| Oracle Linux | 6.8 and later | perl-5.10.1, libXtst, dejavu-sans-fonts.noarch |
| Oracle Linux | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
| CentOS | 6.8 and later | perl-5.10.1, libXtst, open-sans-fonts.noarch |
| CentOS | 7.5 and later | perl-5.16.3, libXtst, liberation-sans-fonts |
| SUSE Linux Enterprise Server | 12 SP4 and later; 15 SP1 and later | perl-libwww-perl |

On Red Hat Enterprise Linux 6.x systems, ensure that a compatible Korn shell is installed. The Korn shell is not installed by default.

Other requirements

  • Ensure that the available temporary disk space is greater than 500 MB. The Gateway Server installer uses the following locations in the listed order to access this space:
    • $IATEMPDIR environment variable
    • /tmp
    • Your home directory
  • On Linux systems, ensure that at least 2 MB of free space is reserved for the /etc file system on all the managed systems, and that you have the execute (x) permission for the /etc directory.
  • Ensure that TCP/IP is installed on the Gateway Server computer and the managed systems that run Capacity Agents. The Gateway Server uses the TCP/IP protocol to communicate with these managed systems.
  • Install the cron utility that is required to schedule Manager runs simultaneously.
  • If you want to use remote data repositories on a UNIX network file system, ensure that the rpc.lockd and rpc.statd NFS lock manager daemons run on both the client and server computers.

  • For Linux systems, ensure that the installation directory is on a standard Linux file system such as ext3, or on a Global File System (GFS) for high availability deployments (Active/Passive servers). Installation on a CIFS share mounted as a file system is not supported. You can use a CIFS share only for the shared repository.
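
Before you run the installer, you can check the temporary space requirement with a short Python script such as the following sketch. It walks the three locations in the listed order and reports the first one with more than 500 MB free. Whether the installer itself falls back to the next location when one is too small is an assumption of this sketch, and the function names are hypothetical.

```python
import os
import shutil

MIN_FREE_BYTES = 500 * 1024 * 1024  # "greater than 500 MB" from this topic


def candidate_temp_dirs(env=None):
    """Return the candidate directories in the order listed in this topic."""
    env = os.environ if env is None else env
    candidates = []
    if env.get("IATEMPDIR"):
        candidates.append(env["IATEMPDIR"])
    candidates.append("/tmp")
    candidates.append(os.path.expanduser("~"))
    return candidates


def first_dir_with_space(candidates, free_bytes=None):
    """Return the first directory with more than 500 MB free, or None.

    free_bytes is a function that maps a path to its free space in bytes;
    it defaults to a lookup based on shutil.disk_usage.
    """
    if free_bytes is None:
        free_bytes = lambda path: shutil.disk_usage(path).free
    for path in candidates:
        try:
            if free_bytes(path) > MIN_FREE_BYTES:
                return path
        except OSError:
            continue  # the directory is missing or not accessible
    return None


if __name__ == "__main__":
    print(first_dir_with_space(candidate_temp_dirs()))
```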

Review the following guidelines to estimate the required disk, memory, and processor capacity:

Remote ETL Engine

ETL Engine servers can be scaled horizontally and vertically. The major sizing drivers for ETL Engine servers are:

  • The required data processing throughput in samples per day. This value is the number of managed entities multiplied by the average number of samples collected for each entity in a day.
  • The number of connector instances (tasks) scheduled on the ETL Engine.

Guidelines for disk space

The default values allow for ten days of temporary files and log files accumulated during the normal day-to-day population activity. The default period setting for the File System Cleaner system task is ten days. If you increase this period for any reason, adjust these numbers accordingly.

You need additional disk space for special activities:

  • Bulk import of data, for example, for addition of new data sources with historical data.
  • Recovery import of data when a data source stops for a day or two for any reason and has to be recovered.

For these special activities, estimate additional capacity using the number of anticipated additional samples per day. Temporary files and logs from these samples will remain on the disk for ten days (or whatever the File System Cleaner system task period is set to).

Gateway Server

The following factors affect the sizing drivers:

  • Size of your environment
  • Number of Agents from which data is collected
  • Data retention period

Data retention

The data collected by Capacity Agents (in the UDR format) is periodically transferred to the Gateway Server where it is automatically processed into hourly intervals and saved in text files called VIS files. BMC recommends the following guidelines for the retention period of UDR and VIS files:

  • Keep about three months' worth of VIS files. These files are useful if you need to "recover" data. For more information, see Recovering data.
  • For UDR and VIS files, keep three months of data. Of this data, the oldest two months can be compressed to save space.

The following table provides the recommendations for the hardware capacity according to the environment size.

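| Size of your environment | Number of servers | Processor cores per server | RAM per server (GB, at 4 GB/core) | IOPS per server | Number of tasks per server | Storage per server: Installation (GB) | Storage per server: Remote ETL Engine (GB) | Storage per server: Gateway Server (GB) |
|---|---|---|---|---|---|---|---|---|
| Small (up to 1000 servers) | 1 | 2 | 8 | 60 | 20 | 8 | 50 | 160 |
| Medium (up to 5000 servers) | 2 | 4 | 16 | 150 | 40 | 8 | 100 | 395 |
| Large (up to 10000 servers) | 3 | 4 | 16 | 200 | 40 | 8 | 100 | 525 |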
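
The table can also be read programmatically. The following Python sketch encodes the three rows, selects the row for a given number of managed servers, and adds up the three storage columns. For the Small row, the sum is 8 + 50 + 160 = 218 GB, which matches the disk requirement listed under Hardware requirements. The names used here are hypothetical.

```python
# Rows of the table: (maximum managed servers, size, number of servers,
# cores, RAM in GB, IOPS, tasks, storage in GB as
# (installation, Remote ETL Engine, Gateway Server)).
ENVIRONMENT_SIZES = [
    (1000, "Small", 1, 2, 8, 60, 20, (8, 50, 160)),
    (5000, "Medium", 2, 4, 16, 150, 40, (8, 100, 395)),
    (10000, "Large", 3, 4, 16, 200, 40, (8, 100, 525)),
]


def recommend(managed_servers):
    """Return the recommendation for an environment size, or None if out of range."""
    for limit, size, servers, cores, ram_gb, iops, tasks, storage in ENVIRONMENT_SIZES:
        if managed_servers <= limit:
            return {
                "size": size,
                "servers": servers,
                "cores_per_server": cores,
                "ram_gb_per_server": ram_gb,
                "iops_per_server": iops,
                "tasks_per_server": tasks,
                "storage_gb_per_server": sum(storage),
            }
    return None


if __name__ == "__main__":
    print(recommend(800))
    print(recommend(4200))
```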

The calculations for the Gateway Server storage requirements are based on the following assumptions:

  • Metric resolution - 60 minutes
  • Processing window - 4 hours
  • UDR data retention - 1 month
  • VIS data retention - 3 months
  • UDR spill interval - 15 minutes

Capacity Agent

Review the following topics to plan for the Capacity Agent installation:

The following diagram shows some example configurations for the on-premises components and how they interact with BMC Helix Capacity Optimization:

Arrows in this diagram represent the direction in which the connection is made to open the ports.

Depending on the data that you are importing, you can use one or more of the three possible VM configurations (VM1, VM2, or VM3). For example, if you want to import data for both the Remote ETL Engine and the Gateway Server, you need the VM1 configuration. To connect only to the Remote ETL Engine to import data, use the VM3 configuration.
