Sizing and scalability considerations


The sizing baselines in this topic are based on benchmark tests performed in BMC's performance labs. You can use these baselines for your on-premises BMC Helix IT Operations Management (BMC Helix ITOM) deployment.

The following applications were tested for BMC Helix ITOM sizing considerations:

  • BMC Helix Dashboards
  • BMC Helix Intelligent Automation
  • BMC Helix Developer Tools
  • BMC Helix Log Analytics
  • BMC Helix Operations Management
  • BMC Helix Portal
  • BMC Helix Service Monitoring (BMC Helix AIOps)

Important

  • If you use a combination of products such as BMC Helix Operations Management, BMC Helix Continuous Optimization, and BMC Helix IT Service Management in your environment, contact BMC Support for the sizing guidelines.

  • If you are deploying BMC Helix Operations Management in a multitenant environment, contact BMC Support for specific sizing guidelines.

BMC Helix’s performance testing is based on four different system usage profiles: compact, small, medium, and large.

| Profile | Description |
|---|---|
| Compact | Minimal footprint for small-scale environments |
| Small | Suitable for limited production workloads |
| Medium | Recommended for standard enterprise deployments |
| Large | For high-scale, high-throughput environments |

Compact is a special sizing that represents the minimum requirement for a functional BMC Helix Platform system. Compact systems are recommended only for proof-of-concept (POC) systems, where resilience and performance under load are not a consideration. All compact systems cited on this page are non-high-availability, single-replica deployments for BMC Helix Operations Management and BMC Discovery.

If your usage exceeds the maximum numbers for the large sizing, contact BMC Support for guidance on how to size your infrastructure.

| Parameters | Compact | Small | Medium | Large |
|---|---|---|---|---|
| Total number of devices (includes PATROL Agents and/or custom devices) | 100 | 3000 | 7500 | 15000 |
| Monitored instances (other sources, such as Prometheus and REST API) | 1000 | 100000 | 250000 | 500000 |
| Number of monitored instances per device | 10 | 33 | 33 | 33 |
| Monitored attributes (other sources, such as Prometheus and REST API) | 10000 | 600000 | 1500000 | 3000000 |
| Number of attributes per device | 100 | 200 | 200 | 200 |
| Events per day (alarms, anomalies, and external events) | 5000 | 30000 | 75000 | 1500000 |
| Configuration policies | 100 | 1000 | 1500 | 10000 |
| Number of policies per device | 1 | 2 | 2 | 2 |
| Number of groups (up to) | 50 | 1500 | 2500 | 4500 |
| Number of concurrent users | 5 | 50 | 100 | 150 |
| BMC Helix AIOps | | | | |
| Services | 25 | 500 | 1000 | 3000 |
| Situations | 10 | 300 | 500 | 1000 |
| BMC Helix Continuous Optimization | | | | |
| Ingestion of samples per day | 50 million | 50 million | 100 million | 500 million |
| BMC Helix Log Analytics | | | | |
| Log ingestion per day | 500 MB | 30 GB | 100 GB | 250 GB |
| Number of Logstash instances | 1 | 5 | 10 | 50 |
| Log retention (days) | 3 | 3 | 3 | 3 |

Important: Make sure that you do not exceed the number of monitored instances per device.

Important: For BMC Helix Log Analytics, BMC has certified 250 connectors for a single tenant.
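The parameter ceilings above lend themselves to a simple profile check. The following sketch picks the smallest profile whose limits cover a planned workload; the function name and the subset of parameters checked are illustrative, not part of any BMC tooling:

```python
# Per-profile ceilings taken from the sizing table above (subset of columns).
PROFILES = [
    # (name, max devices, max monitored instances, max events/day, max users)
    ("compact", 100, 1000, 5000, 5),
    ("small", 3000, 100000, 30000, 50),
    ("medium", 7500, 250000, 75000, 100),
    ("large", 15000, 500000, 1500000, 150),
]

def pick_profile(devices, instances, events_per_day, users):
    """Return the smallest profile whose ceilings cover every parameter."""
    for name, max_dev, max_inst, max_ev, max_users in PROFILES:
        if (devices <= max_dev and instances <= max_inst
                and events_per_day <= max_ev and users <= max_users):
            return name
    return "contact-support"  # beyond 'large', contact BMC Support

print(pick_profile(devices=2500, instances=80000, events_per_day=20000, users=40))
# → small
```

A workload that exceeds any single ceiling pushes you up to the next profile, which matches the guidance to size for the largest parameter rather than the average.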

  • Number of MIs (Monitored Instances) per device:
    The values shown are based on internal standard benchmarks and may appear consistent across Small, Medium, and Large, but in practice, the number of MIs per device varies from agent to agent depending on the type and complexity of the monitored device.
    As a reference, an agent can support up to 40,000 MIs. The current number (33 MIs per device) is a standardized baseline we use internally for sizing calculations, and it reflects a conservative estimate derived from real-world deployments.
  • Number of Attributes per device:
    Similar to MIs, this number is standardized based on internal data. While actual numbers may vary, 200 attributes per device is a safe average used for capacity planning purposes.
  • Configuration Policies:
    The configuration policies include:
    • Monitoring policies
    • Alarm policies
    • Event policies
    • Blackout policies
    • Multivariate policies
  • Example of a Monitored Instance (MI):
    A Monitored Instance refers to an individual metric or component being monitored, such as CPU usage, memory, a disk partition, or interface status. For example, if a device has CPU, memory, and two disks being monitored, it results in 4 MIs.
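Counting MIs for a device is therefore just counting its monitored components. A minimal sketch of the example above (the helper name and component labels are illustrative):

```python
BASELINE_MIS_PER_DEVICE = 33  # standardized sizing baseline from the table
AGENT_MAX_MIS = 40000         # upper bound a single agent can support

def count_mis(monitored_components):
    """Each monitored component (CPU, memory, a disk, ...) is one MI."""
    return len(monitored_components)

# The device from the example: CPU, memory, and two disks -> 4 MIs.
device = ["cpu", "memory", "disk_sda1", "disk_sdb1"]
print(count_mis(device))                             # → 4
print(count_mis(device) <= BASELINE_MIS_PER_DEVICE)  # → True
```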

 

Kubernetes infrastructure sizing requirements

Compute requirements are the combined CPU, RAM, and persistent volume disk requirements for the Kubernetes worker nodes.

These compute requirements are shared across all the worker nodes in your Kubernetes cluster. The worker nodes must together provide CPU and RAM that matches or exceeds the total infrastructure sizing requirement plus the per-worker-node logging requirement. This is required to support the anticipated load for the benchmark sizing category of a BMC Helix IT Operations Management deployment.

Considerations when building a Kubernetes cluster

There are several sizing considerations when building a Kubernetes cluster beyond the application requirements. The application requirements must be added on top of your other resource requirements, which include, but are not limited to:

  • Kubernetes control plane nodes
  • Kubernetes management software requirements
  • Host operating system requirements
  • Additional software (for example: monitoring software) that is deployed on the cluster

Refer to your Kubernetes distribution and software vendors to make sure these additional requirements are included in your cluster planning.

Calculate Your Deployment Sizing

  1. Select your profile size (Compact, Small, Medium, Large).
  2. Identify the product components you plan to deploy.
  3. Use the sizing tables to sum the CPU and memory requirements for each component.
  4. If deploying multiple components, add their values together.
  5. Do not attempt to deduct shared infrastructure like BMC Helix Platform Common Services and Infra services, unless you have explicit sizing data for those components.

The sizing tables for BMC Helix Operations Management and BMC Helix Continuous Optimization are designed to reflect the full load of each product, including shared services. When combining BMC Helix Operations Management and BMC Helix Continuous Optimization, the total sizing already accounts for the infrastructure needed to support both products. Therefore, reducing or subtracting shared components may result in under-provisioning and is not recommended.

In such cases:

  • Use the larger profile between the two products.
  • Add the CPU and memory values from each table without deduction.
  • If your deployment is resource-constrained or highly customized, contact BMC Support for optimization guidance.

Note: The sizing tables are intentionally conservative to ensure performance and scalability. Overestimating slightly is preferable to underestimating.

Example 1: BMC Helix Operations Management and BMC Helix AIOps Deployment 

  • Profile: Medium
  • Components:
    • BMC Helix Operations Management + Intelligent Integration + Intelligent Automation (Core Stack): 67 cores, 370 GB
    • BMC Helix AIOps + AutoAnomaly Add-On: 22 cores, 157 GB
  • Total: 89 cores, 527 GB memory

This configuration excludes Log Analytics and BMC Helix Continuous Optimization. Use this when deploying the core BMC Helix IT Operations Management stack with BMC Helix AIOps capabilities.

Example 2: Full Stack Deployment with BHCO

  • Profile: Medium
  • Components:
    • BMC Helix Operations Management + BMC Helix Intelligent Integration + BMC Helix Intelligent Automation (Core Stack): 67 cores, 370 GB
    • BMC Helix AIOps + AutoAnomaly Add-On: 22 cores, 157 GB
    • BMC Helix Continuous Optimization (Standalone): 38 cores, 180 GB
  • Total: 127 cores, 707 GB memory

When deploying BMC Helix Operations Management and BMC Helix Continuous Optimization together, do not attempt to subtract shared infrastructure (BMC Helix Platform Common Services and Infra services). The sizing tables already account for the full load of each product. Use the larger profile and sum the values conservatively to ensure performance.
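The additive rule in these examples can be sketched as a plain summation over the per-component figures. The values below are the Medium-profile numbers quoted in Examples 1 and 2; the dictionary keys are illustrative:

```python
# Medium-profile sizing per component: (CPU cores, RAM GB).
MEDIUM = {
    "core_stack": (67, 370),         # BHOM + Intelligent Integration/Automation
    "aiops_autoanomaly": (22, 157),  # BMC Helix AIOps + AutoAnomaly add-on
    "bhco_standalone": (38, 180),    # BMC Helix Continuous Optimization
}

def total_sizing(components, table=MEDIUM):
    """Sum CPU and RAM across components; no deduction for shared services."""
    cpu = sum(table[c][0] for c in components)
    ram = sum(table[c][1] for c in components)
    return cpu, ram

print(total_sizing(["core_stack", "aiops_autoanomaly"]))  # → (89, 527)
print(total_sizing(["core_stack", "aiops_autoanomaly", "bhco_standalone"]))
# → (127, 707)
```

Note that the summation deliberately does not subtract anything for shared infrastructure, in line with the guidance above.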

Kubernetes cluster requirements

The application must have specific hardware resources made available to it for successful deployment and operation. Any competing workloads (such as your Kubernetes management or monitoring software) on the cluster and host operating system requirements must be considered in addition to the BMC Helix IT Operations Management suite requirements when building your Kubernetes cluster.

The following table represents the minimum amount of computing resources that must be made available by the Kubernetes cluster to the BMC Helix IT Operations Management deployment:

Important

The total sizing does not include the requirements for BMC Discovery and BMC Helix Continuous Optimization.

| Deployment size | CPU (cores) | RAM (GB) |
|---|---|---|
| Compact | 28 | 182 |
| Small | 78 | 401 |
| Medium | 99 | 580 |
| Large | 266 | 1350 |

Core stack requirements

Important

The total core stack sizing includes BMC Helix Operations Management, BMC Helix Intelligent Automation, BMC Helix Intelligent Integration, BMC Helix Dashboards, BMC Helix Platform Common Services and Infra services.

This does not include the requirements for BMC Discovery, BMC Helix Continuous Optimization, BMC Helix AIOps, and BMC Helix Log Analytics.

| Deployment size | CPU (cores) | RAM (GB) |
|---|---|---|
| Compact | 22 | 112 |
| Small | 56 | 256 |
| Medium | 67 | 370 |
| Large | 179 | 932 |

Important

This is the core requirement of your stack. If you plan to use add-ons, add the sizing requirements from the following add-on tables.

BMC Helix AIOps and AutoAnomaly add-ons

These requirements must be added on top of the BMC Helix Operations Management deployment.

| Deployment size | CPU (cores) | RAM (GB) |
|---|---|---|
| Compact | 4 | 35 |
| Small | 12 | 98 |
| Medium | 22 | 157 |
| Large | 62 | 326 |

BMC Helix Log Analytics add-ons

BMC Helix Log Analytics can be deployed:

  • As a standalone product, or
  • As an add-on to BMC Helix Operations Management deployment.

| Deployment size | CPU (cores) | RAM (GB) |
|---|---|---|
| Compact | 2 | 35 |
| Small | 10 | 46 |
| Medium | 11 | 54 |
| Large | 25 | 92 |

Sizing requirements for BMC Helix Continuous Optimization

Important

The total sizing includes the requirements of BMC Helix Continuous Optimization, BMC Helix Platform Common Services, and Infra services.

The following table provides the sizing requirements for a BMC Helix Continuous Optimization standalone deployment. If BMC Helix Continuous Optimization is deployed alongside BMC Helix Operations Management, ensure that shared components (for example, BMC Helix Platform Common Services and Infra services) are counted only once.

| Deployment size | CPU requests (millicores) | CPU limits (millicores) | Memory requests (GB) | Memory limits (GB) |
|---|---|---|---|---|
| Compact | 20400 | 104050 | 78.6 | 205.7 |
| Small | 54280 | 180950 | 172.2 | 336.8 |
| Medium | 67780 | 263150 | 309.1 | 544.2 |
| Large | 154180 | 444300 | 780.1 | 1525.1 |


Kubernetes quotas

Quotas may be set on the cluster namespaces to enforce maximum scheduled requests and limits. Kubernetes prevents the scheduling of any workloads beyond the configured quotas, which can disrupt software operation in the namespace.

Important

To avoid issues related to scaling and consumption of microservices, it's important to follow recommended namespace quota settings based on your deployment size.

The following table shows the recommended settings to allow a BMC Helix IT Operations Management suite deployment:

Important

The total sizing does not include the requirements for BMC Discovery and BMC Helix Continuous Optimization.

| Deployment size | CPU requests (millicores) | CPU limits (millicores) | Memory requests (GB) | Memory limits (GB) |
|---|---|---|---|---|
| Compact | 29474 | 162446 | 188 | 369 |
| Small | 82274 | 301220 | 390 | 686 |
| Medium | 102864 | 404496 | 588 | 1056 |
| Large | 273684 | 717646 | 1378 | 1962 |
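One way to apply these recommendations is to render them as a standard Kubernetes ResourceQuota manifest. The sketch below builds the manifest as a dictionary; the namespace and object names are illustrative assumptions, while the quota values come from the table above:

```python
# Recommended namespace quotas per deployment size:
# (CPU requests m, CPU limits m, memory requests GB, memory limits GB).
QUOTAS = {
    "compact": (29474, 162446, 188, 369),
    "small":   (82274, 301220, 390, 686),
    "medium":  (102864, 404496, 588, 1056),
    "large":   (273684, 717646, 1378, 1962),
}

def resource_quota(size, namespace="helix-itom"):  # namespace is hypothetical
    req_cpu, lim_cpu, req_mem, lim_mem = QUOTAS[size]
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"itom-{size}-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": f"{req_cpu}m",
            "limits.cpu": f"{lim_cpu}m",
            "requests.memory": f"{req_mem}Gi",
            "limits.memory": f"{lim_mem}Gi",
        }},
    }

print(resource_quota("medium")["spec"]["hard"]["requests.cpu"])  # → 102864m
```

The `requests.cpu`, `limits.cpu`, `requests.memory`, and `limits.memory` keys are the standard ResourceQuota compute resource names; serialize the dictionary to YAML or JSON before applying it to the cluster.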

Kubernetes node requirements

Your cluster must maintain a minimum number of worker nodes to provide an HA-capable environment for the application data lakes.

To tolerate the loss of a worker node, provide an extra worker node with resources equal to your largest worker node. This way, if a worker node goes down, the cluster still has the minimum resources required to recover the application.
For example, if you have 4 nodes of 10 vCPU and 50 GB RAM each, you need a fifth node of 10 vCPU and 50 GB RAM so that the loss of one worker node does not impact recovery.
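The N+1 rule in the example can be expressed as a short calculation (the function name is illustrative):

```python
import math

def nodes_with_spare(required_cpu, required_ram_gb, node_cpu, node_ram_gb):
    """Smallest node count that covers the requirement, plus one spare node
    sized like the largest worker, so losing one node does not block recovery."""
    needed = max(math.ceil(required_cpu / node_cpu),
                 math.ceil(required_ram_gb / node_ram_gb))
    return needed + 1

# The example above: 4 nodes of 10 vCPU / 50 GB RAM need a 5th spare node.
print(nodes_with_spare(required_cpu=40, required_ram_gb=200,
                       node_cpu=10, node_ram_gb=50))  # → 5
```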

Important

The total amount of vCPU and RAM resources selected for the worker nodes must match or exceed the required vCPU and RAM specified in the Kubernetes cluster sizing requirements.

| Deployment size | Minimum worker nodes |
|---|---|
| Compact | 4 |
| Small | 6 |
| Medium | 6 |
| Large | 9 |

Worker node disk requirements

Kubernetes worker nodes require the following free disk space allocation for container images:

| Requirement | Value |
|---|---|
| Worker node system disk | At least 150 GB |

Pod specifications

The BMC Helix ITOM Pod specifications spreadsheet provides detailed information for sizing your environment. Cluster architects can use the information to help determine the node sizes and cluster width.

Consider the following resource requirements of the largest pod:

  • In a large deployment, the largest pod requires 13 CPUs and 34 GB of RAM.
  • In a medium deployment, the largest pod requires 7 CPUs and 17 GB of RAM.
  • In a small deployment, the largest pod requires 7 CPUs and 8 GB of RAM.
  • In a compact deployment, the largest pod requires 3 CPUs and 7 GB of RAM.

When reviewing the specification spreadsheet, check the large replica counts to ensure that your cluster width is sufficient.
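A quick sanity check when picking node sizes is that a node can actually host the profile's largest pod. The sketch below uses the figures above; the headroom fraction reserved for the kubelet and system daemons is an illustrative assumption, not a BMC-published value:

```python
# Largest pod per deployment size: (CPU cores, RAM GB), from the bullets above.
LARGEST_POD = {"compact": (3, 7), "small": (7, 8), "medium": (7, 17), "large": (13, 34)}

def node_fits_largest_pod(size, node_cpu, node_ram_gb, headroom=0.85):
    """True if the node can host the largest pod after reserving an
    assumed 15% of capacity for system daemons."""
    pod_cpu, pod_ram = LARGEST_POD[size]
    return pod_cpu <= node_cpu * headroom and pod_ram <= node_ram_gb * headroom

print(node_fits_largest_pod("large", node_cpu=16, node_ram_gb=64))  # → True
print(node_fits_largest_pod("large", node_cpu=8, node_ram_gb=32))   # → False
```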

Persistent volume requirements

High-performance Kubernetes persistent volume storage is essential for overall system performance. BMC supports a bring-your-own-storage-class model for Kubernetes persistent volumes.

Important

Your storage class for the Kubernetes persistent volumes must support volume expansion and dynamic provisioning.

The following tables show the disk requirements in GB:

Block storage (GB)

| Deployment size | Block storage (GB) |
|---|---|
| Compact | 2454 |
| Small | 4842 |
| Medium | 7102 |
| Large | 23242 |

Read-write-many (RWM) storage (GB)

| Deployment size | RWM storage (GB) |
|---|---|
| Compact | 91 |
| Small | 91 |
| Medium | 91 |
| Large | 91 |

 

We recommend that you use solid-state drives (SSDs) with the following specifications:

Block storage SSD recommendations

| Specification | Compact | Small | Medium | Large |
|---|---|---|---|---|
| Average latency | < 100 ms | < 100 ms | < 100 ms | < 100 ms |
| Write throughput | 20 MB/s | 150 MB/s | 165 MB/s | 200 MB/s |
| Read throughput | 100 MB/s | 800 MB/s | 1 GB/s | 1.2 GB/s |
| IOPS (write) | 1K | 3K | 3.2K | 3.5K |
| IOPS (read) | 3K | 10K | 11K | 12K |

RWM throughput and IOPS requirements:

| Specification | Value |
|---|---|
| Read throughput | 10 MB/s |
| Write throughput | 5 MB/s |
| IOPS (read) | 3K |
| IOPS (write) | 1K |

 

Sizing guidelines for BMC Discovery

| Deployment size | CPU | RAM (GB) | Disk (GB) | Number of servers per environment |
|---|---|---|---|---|
| Compact (not in high availability) | 4 | 8 | 100 | 1 |
| Small | 16 | 32 | 300 | 3 |
| Medium | 16 | 32 | 500 | 3 |
| Large | 20 | 64 | 1000 | 5 |

For BMC Discovery sizing guidelines, refer to the Sizing and scalability considerations topic in the BMC Discovery documentation.

EFK Logging requirements

Deploying the Helix logging stack adds hardware requirements that the Kubernetes cluster must provide, along with the corresponding namespace quotas.

| Deployment size | CPU (cores) | RAM (GB) | PVC (with 3-day retention) |
|---|---|---|---|
| Compact | 2 | 19 | 200 GB |
| Small | 6 | 19 | 500 GB |
| Medium | 10 | 20 | 1100 GB |
| Large | 12 | 31 | 2100 GB |

EFK Logging Quota

| Deployment size | CPU requests (millicores) | Memory requests (GB) | CPU limits (millicores) | Memory limits (GB) |
|---|---|---|---|---|
| Compact | 1600 | 9.25 | 11500 | 23 |
| Small | 1600 | 9.25 | 14500 | 23 |
| Medium | 2200 | 9.25 | 16500 | 23 |
| Large | 7280 | 30.25 | 32500 | 35 |

FluentBit Daemonset

The Helix logging stack runs FluentBit collectors as a DaemonSet to access logs on the worker nodes. The collector requirements are in addition to the previous requirements and depend on the number of worker nodes in the cluster. Use the following table to determine the per-pod requirements for your deployment size, and multiply them by the number of collector pods (one per worker node) in your cluster. Add the calculated requests to the cluster capacity.

| Deployment size | CPU requests (millicores) | CPU limits (millicores) | Memory requests (GB) | Memory limits (GB) |
|---|---|---|---|---|
| Compact | 50 | 60 | 0.15 | 0.18 |
| Small | 50 | 60 | 0.15 | 0.18 |
| Medium | 210 | 250 | 0.15 | 0.18 |
| Large | 210 | 250 | 0.28 | 0.32 |


For example, to get the total CPU request quota, multiply your worker node count by the per-node value in the FluentBit DaemonSet table, and add the corresponding value from the EFK Logging Quota table.

Assume that you have 4 worker nodes in your compact-size cluster. Your total quota calculation will be:

4 × 50 + 1600 = 1800 m
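The same calculation, written out with the CPU-request values from the two tables above (the function name is illustrative):

```python
# CPU requests in millicores, from the EFK Logging Quota and
# FluentBit DaemonSet tables above.
EFK_BASE_CPU_REQ = {"compact": 1600, "small": 1600, "medium": 2200, "large": 7280}
FLUENTBIT_CPU_REQ = {"compact": 50, "small": 50, "medium": 210, "large": 210}

def logging_cpu_requests(size, worker_nodes):
    """Total logging CPU requests: one FluentBit pod per worker node,
    plus the base EFK quota for the deployment size."""
    return worker_nodes * FLUENTBIT_CPU_REQ[size] + EFK_BASE_CPU_REQ[size]

print(logging_cpu_requests("compact", worker_nodes=4))  # → 1800 millicores
```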

Disaster recovery requirement

If you enable disaster recovery, you will need additional processor, memory, and disk space to operate successfully. The following guidance is based on using the default disaster recovery configurations. Any modification to these settings might impact the amount of disk storage that is necessary and must be recalculated.  
The following tables list the additional resources required in the Kubernetes cluster (per data center): 

| Deployment size | CPU (cores) | RAM (GB) | MinIO storage per PVC (GB) | Total MinIO storage requirement, 4 PVCs (GB) |
|---|---|---|---|---|
| Compact | 6 | 30 | 900 | 3600 |
| Small | 10 | 38 | 1050 | 4200 |
| Medium | 11 | 49 | 2225 | 8900 |
| Large | 12 | 62 | 10625 | 42501 |

Important

The values in the table represent the resources required to store data for a single day.

To retain data for more than a day, multiply the resource requirement (R) by the number of days (N) you want to retain it.

For example, if you use a compact deployment size and want to keep PVC data for three days, you will need 10800 GB (3600 × 3).
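The retention rule can be sketched as follows, using the total MinIO figures from the table above (the function name is illustrative):

```python
# Total MinIO storage (GB, 4 PVCs) for a single day of data, per the table.
MINIO_TOTAL_GB = {"compact": 3600, "small": 4200, "medium": 8900, "large": 42501}

def dr_storage_gb(size, retention_days):
    """Storage requirement R multiplied by the retention window N in days."""
    return MINIO_TOTAL_GB[size] * retention_days

print(dr_storage_gb("compact", 3))  # → 10800 GB for three days of PVC data
```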

The following tables list the additional recommendations to add to the namespace quotas (per data center):  

BMC Helix IT Operations Management Namespace Quotas (DR Additions)  

| Deployment size | CPU requests (millicores) | CPU limits (millicores) | Memory requests (GB) | Memory limits (GB) |
|---|---|---|---|---|
| Compact | 6000 | 26000 | 30 | 85 |
| Small | 10000 | 30000 | 38 | 86 |
| Medium | 11000 | 36000 | 49 | 91 |
| Large | 12000 | 55000 | 62 | 112 |

RPO and RTO measurements 

Recovery Point Objective (RPO) is the time-based measurement of tolerated data loss. Recovery Time Objective (RTO) is the targeted duration between a failure event and the point where operations resume.

The following table lists the RPO and RTO measurements:

| Deployment size | Recovery Point Objective (RPO) | Recovery Time Objective (RTO) | Loss in productivity |
|---|---|---|---|
| Compact | 24 hours | 1 hour 30 minutes | 3 hours |
| Small | 24 hours | 1 hour 30 minutes | 3 hours |
| Medium | 24 hours | 2 hours | 4 hours |

Important

Note the bootstrap timing when you start the first full backup, and use the timing of the next full backup to calculate the RPO numbers. Full backups are performed every 24 hours.

The RPO, RTO, and backup storage size might vary based on the storage size of your stack's data lake.

Disaster recovery is a new feature, and the RPO and RTO are still being measured for large deployments.

We recommend that you perform a trial run of your disaster recovery operation to get personalized expectations of how your setup and environment measure against the RPO and RTO metrics.

Sizing requirement to enable automatic generation of anomaly events

The auto anomaly services are part of BMC Helix Operations Management and BMC Helix AIOps. 
For more information, see Autoanomalies in the BMC Helix Operations Management documentation.

If you enable automatic generation of anomaly events, you will need additional processor, memory, and disk space to operate successfully. Make sure you add these resources to your cluster.

The following tables list the additional resources required to configure automatic generation of anomaly events: 

| Deployment size | CPU requests (cores) | Memory requests (GB) | CPU limits (cores) | Memory limits (GB) | PVC (GB) |
|---|---|---|---|---|---|
| Compact | 2 | 12 | 8 | 19 | 20 |
| Small | 7 | 42 | 23 | 65 | 50 |
| Medium | 11 | 77 | 37 | 128 | 150 |
| Large | 34 | 162 | 78 | 233 | 300 |

 

Sizing considerations for migrating from PostgreSQL database 15.x to 17.x

To migrate data from PostgreSQL database 15.x to 17.x, you must run the PostgreSQL migration utility.

For the migration to be successful, the following processor, memory, and storage are required in addition to the resources listed in this topic:

| Deployment size | CPU requests (cores) | Memory requests (Gi) | CPU limits (cores) | Memory limits (Gi) | PVC (Gi) |
|---|---|---|---|---|---|
| Compact | 4 | 5 | 13 | 33 | 140 |
| Small | 4 | 6 | 13 | 35 | 140 |
| Medium | 4 | 6 | 21 | 34 | 195 |
| Large | 7 | 8 | 60 | 115 | 250 |

You can reclaim the resources after the upgrade.

The following table gives information about the time to migrate data from PostgreSQL database 15.x to 17.x:

| Deployment size | Time taken (minutes) |
|---|---|
| Compact | 7 |
| Small | 11 |
| Medium | 36 |
| Large | 22 |

 

Sizing requirements to configure the Self-monitoring solution

  • To accommodate BMC Helix Monitoring Agents, the following additional resources are needed in the production cluster:
    • Memory: 7680Mi
    • CPU: 2500m
       

 
