Sizing and scalability considerations


The sizing baselines in this topic are based on benchmark test results from BMC's performance labs. You can use these baselines for your on-premises BMC Helix IT Operations Management (BMC Helix ITOM) deployment.

The following applications were tested in the BMC test labs for BMC Helix IT Operations Management sizing considerations:

  • BMC Helix Continuous Optimization
  • BMC Helix Dashboards
  • BMC Helix Intelligent Automation
  • BMC Helix Developer Tools
  • BMC Helix Log Analytics
  • BMC Helix Operations Management
  • BMC Helix Portal
  • BMC Helix Service Monitoring (BMC Helix AIOps)


Important

  • If you use a combination of products such as BMC Helix Operations Management, BMC Helix Continuous Optimization, and BMC Helix IT Service Management in your environment, contact BMC Support for sizing guidelines.

  • If you are deploying BMC Helix Operations Management in a multitenant environment, contact BMC Support for specific sizing guidelines.

BMC's performance testing is based on four system usage profiles: compact, small, medium, and large.
Compact is a special sizing that represents the minimum requirement for a functional BMC Helix Platform system. Compact systems are recommended only for proof-of-concept (POC) systems, where resilience and system performance under load are not considerations. All compact systems cited on this page are non-high-availability, single-replica deployments for BMC Helix Operations Management and BMC Discovery.

If your usage exceeds the maximum numbers for the large sizing, contact BMC Support for guidance on how to size your infrastructure.


Kubernetes infrastructure sizing requirements

Compute requirements are the combined CPU, RAM, and persistent volume disk requirements for the Kubernetes worker nodes.

These compute requirements are shared across all the worker nodes in your Kubernetes cluster. The combined CPU and RAM of the worker nodes must match or exceed the total infrastructure sizing requirement plus the per-worker-node logging requirement. This capacity is required to support the anticipated load for the benchmark sizing category of a BMC Helix IT Operations Management deployment.
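For example, the following sketch (in Python, with placeholder totals, because the actual values depend on your sizing category and logging solution) shows how to verify that the combined worker node capacity covers the total requirement plus the per-worker-node logging overhead:

```python
# Placeholder values; substitute the totals for your sizing category and the
# per-worker-node overhead of your logging solution.
required_total = {"cpu": 96, "ram_gb": 384}     # total ITOM requirement (assumed)
logging_per_node = {"cpu": 1, "ram_gb": 2}      # per-worker-node logging cost (assumed)

worker_nodes = [
    {"name": "worker-1", "cpu": 32, "ram_gb": 128},
    {"name": "worker-2", "cpu": 32, "ram_gb": 128},
    {"name": "worker-3", "cpu": 32, "ram_gb": 128},
    {"name": "worker-4", "cpu": 32, "ram_gb": 128},
]

# The logging overhead scales with the number of worker nodes.
needed_cpu = required_total["cpu"] + logging_per_node["cpu"] * len(worker_nodes)
needed_ram = required_total["ram_gb"] + logging_per_node["ram_gb"] * len(worker_nodes)

have_cpu = sum(n["cpu"] for n in worker_nodes)
have_ram = sum(n["ram_gb"] for n in worker_nodes)

print(f"CPU: have {have_cpu}, need {needed_cpu} -> {'OK' if have_cpu >= needed_cpu else 'short'}")
print(f"RAM: have {have_ram} GB, need {needed_ram} GB -> {'OK' if have_ram >= needed_ram else 'short'}")
```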


Considerations when building a Kubernetes cluster

Before you consider the application requirements, account for several other sizing factors when you build a Kubernetes cluster. The application requirements are in addition to your other resource requirements, which include but are not limited to:

  • Kubernetes control plane nodes
  • Kubernetes management software requirements
  • Host operating system requirements
  • Additional software (for example: monitoring software) that is deployed on the cluster

Refer to your Kubernetes distribution and software vendors to make sure these additional requirements are included in your cluster planning.
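The following sketch (with placeholder reservation values; use the figures documented by your Kubernetes distribution and vendors) illustrates how to estimate the per-node capacity that remains for the application after these other requirements are reserved:

```python
# Hypothetical per-node planning sketch: subtract non-application reservations
# (host OS, Kubernetes components, management and monitoring agents) from the
# raw node capacity to estimate what remains for BMC Helix ITOM workloads.
node_capacity = {"cpu": 16, "ram_gb": 64}        # raw worker node size (assumed)

reservations = {                                  # placeholder values; use your
    "host_os":    {"cpu": 1.0, "ram_gb": 2},      # vendors' documented figures
    "kubernetes": {"cpu": 0.5, "ram_gb": 1},
    "monitoring": {"cpu": 0.5, "ram_gb": 1},
}

allocatable_cpu = node_capacity["cpu"] - sum(r["cpu"] for r in reservations.values())
allocatable_ram = node_capacity["ram_gb"] - sum(r["ram_gb"] for r in reservations.values())

# Plan the application sizing against these figures, not the raw node capacity.
print(f"Capacity left for the application per node: {allocatable_cpu} vCPU, {allocatable_ram} GB RAM")
```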


Kubernetes cluster requirements

The application must have specific hardware resources made available to it for successful deployment and operation. Any competing workloads (such as your Kubernetes management or monitoring software) on the cluster and host operating system requirements must be considered in addition to the BMC Helix IT Operations Management suite requirements when building your Kubernetes cluster.

The following table represents the minimum amount of computing resources that must be made available by the Kubernetes cluster to the BMC Helix IT Operations Management deployment:

Important

The total sizing does not include the requirements for BMC Discovery.


Kubernetes quotas

Quotas can be set on the cluster namespaces to enforce maximum scheduled requests and limits. If a workload is scheduled beyond the configured quotas, Kubernetes prevents the scheduling, which can disrupt software operations in the namespace.
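The following sketch shows what such a quota can look like as a Kubernetes ResourceQuota manifest generated from Python; the namespace name and the request and limit values are placeholders, not BMC-recommended settings:

```python
import json

# Minimal ResourceQuota sketch. Replace the placeholder namespace and the
# request/limit values with the recommended quota settings for your
# deployment size.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "helix-itom-quota", "namespace": "helix-itom"},  # assumed names
    "spec": {
        "hard": {
            "requests.cpu": "100",       # total CPU requests allowed in the namespace
            "requests.memory": "400Gi",  # total memory requests allowed
            "limits.cpu": "200",         # total CPU limits allowed
            "limits.memory": "600Gi",    # total memory limits allowed
        }
    },
}

# JSON manifests are accepted by kubectl, for example: kubectl apply -f quota.json
with open("quota.json", "w") as f:
    json.dump(quota, f, indent=2)
```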

Important

To avoid issues related to scaling and consumption of microservices, it's important to follow recommended namespace quota settings based on your deployment size.

The following table shows the recommended settings to allow a BMC Helix IT Operations Management suite deployment:


Kubernetes node requirements

Your cluster must maintain a minimum number of worker nodes to provide an HA-capable environment for the application data lakes.

To support the loss of worker nodes in your cluster, provide extra worker nodes with resources equal to your largest worker node. This way, if a worker node goes down, the cluster still has the minimum resources required to recover the application.
For example, if you have four nodes with 10 vCPUs and 50 GB of RAM each, you need a fifth node with 10 vCPUs and 50 GB of RAM so that the loss of one worker node does not impact recovery.
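The following sketch applies this N+1 rule to the example above; the minimum application requirement is a placeholder:

```python
# N+1 sketch: reserve spare capacity at least equal to the largest worker node
# so the cluster can absorb the loss of one node and still recover the application.
worker_nodes = [{"cpu": 10, "ram_gb": 50}] * 5   # five identical nodes, as in the example

required = {"cpu": 40, "ram_gb": 200}            # minimum the application needs (assumed)

largest = max(worker_nodes, key=lambda n: (n["cpu"], n["ram_gb"]))
surviving_cpu = sum(n["cpu"] for n in worker_nodes) - largest["cpu"]
surviving_ram = sum(n["ram_gb"] for n in worker_nodes) - largest["ram_gb"]

ok = surviving_cpu >= required["cpu"] and surviving_ram >= required["ram_gb"]
print("Cluster still meets the minimum after losing one node:", ok)
```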

Important

The total amount of vCPU and RAM resources selected for the worker nodes must match or exceed the required vCPU and RAM specified in the Kubernetes cluster sizing requirements.


Worker node disk requirements

Kubernetes worker nodes require the following free disk space allocation for container images:

Requirement | Value
Worker node system disk | At least 150 GB


Pod specifications

The BMC Helix ITOM Pod specifications spreadsheet provides detailed information for sizing your environment. Cluster architects can use the information to help determine the node sizes and cluster width.

Consider the following resource requirements of the largest pod:

  • In a large deployment, the largest pod requires 13 CPUs and 34 GB of RAM.
  • In a medium deployment, the largest pod requires 7 CPUs and 17 GB of RAM.
  • In a small deployment, the largest pod requires 7 CPUs and 8 GB of RAM.
  • In a compact deployment, the largest pod requires 3 CPUs and 7 GB of RAM.

When you review the specification spreadsheet, check pods with large replica counts to ensure that your cluster width (the number of worker nodes) is sufficient.
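Because a pod runs on a single worker node, each node must also be large enough to host the largest pod for your deployment size. The following sketch illustrates this check; the per-node values are placeholders:

```python
# Largest pod requirements per deployment size, from the list above.
largest_pod = {
    "large":   {"cpu": 13, "ram_gb": 34},
    "medium":  {"cpu": 7,  "ram_gb": 17},
    "small":   {"cpu": 7,  "ram_gb": 8},
    "compact": {"cpu": 3,  "ram_gb": 7},
}

size = "medium"                                  # your deployment size
node_resources = {"cpu": 14, "ram_gb": 56}       # assumed resources available on one worker node

pod = largest_pod[size]
fits = node_resources["cpu"] >= pod["cpu"] and node_resources["ram_gb"] >= pod["ram_gb"]
print(f"Largest {size} pod fits on a single node:", fits)
```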


Persistent volume requirements

High-performance Kubernetes persistent volume storage is essential for the overall system performance. BMC supports a bring-your-own-storage-class model for Kubernetes persistent volumes.

Important

Your storage class for the Kubernetes persistent volumes must support volume expansion and dynamic provisioning.
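The following sketch shows a StorageClass that meets both requirements; the class name and CSI provisioner are placeholders, so substitute the driver provided by your storage vendor:

```python
import json

# StorageClass sketch: dynamic provisioning is supplied by the provisioner
# (a CSI driver), and allowVolumeExpansion enables volume expansion.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "helix-itom-ssd"},   # assumed class name
    "provisioner": "csi.example.com",         # placeholder; use your storage vendor's CSI driver
    "allowVolumeExpansion": True,             # required: supports volume expansion
    "reclaimPolicy": "Retain",
    "volumeBindingMode": "WaitForFirstConsumer",
}

print(json.dumps(storage_class, indent=2))    # apply with: kubectl apply -f <file>
```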

The following tables show the disk requirements in GB:

We recommend that you use solid-state drives (SSDs) with the following specifications:

RWM throughput and IOPS requirements:  


Sizing guidelines for BMC Discovery

Category | CPU | RAM (GB) | Disk (GB) | Number of servers per environment
Compact (not in high availability) | 4 | 8 | 100 | 1
Small | 8 | 32 | 300 | 3
Medium | 8 | 32 | 500 | 3
Large | 16 | 64 | 1,000 | 5


For BMC Discovery sizing guidelines, see the Sizing and scalability considerations topic in the BMC Discovery documentation.


Disaster recovery requirement

If you enable disaster recovery, you need additional processor, memory, and disk space to operate successfully. The following guidance is based on the default disaster recovery configurations. Any modification to these settings might change the amount of disk storage that is necessary, which must then be recalculated.
The following table lists the additional resources required in the Kubernetes cluster (per data center):

Deployment size | CPU (cores) | RAM (GB) | MinIO storage per PVC (GB) | Total MinIO storage requirement for 4 PVCs (GB)
Compact | 6 | 30 | 900 | 3,600
Small | 10 | 38 | 1,050 | 4,200
Medium | 11 | 49 | 2,225 | 8,900
Large | 12 | 62 | 10,625 | 42,501

Important

The values in the table provide the resources required to store data for a single day.

To retain data for more than a day, multiply the resource requirement (R) by the number of days (N) you want to retain it.

For example, if you use a compact deployment size and want to keep PVC data for three days, you need 10,800 GB of total MinIO storage (3,600 x 3).
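The following sketch shows this retention calculation using the total MinIO storage values from the table above; the deployment size and retention period are examples:

```python
# Daily total MinIO storage requirement (GB) per deployment size, from the table above.
daily_total_gb = {"compact": 3600, "small": 4200, "medium": 8900, "large": 42501}

size, retention_days = "compact", 3               # example inputs
required_gb = daily_total_gb[size] * retention_days
print(f"{size}: {required_gb} GB of MinIO storage for {retention_days} days of retention")
```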


The following table lists the additional recommendations to add to the namespace quotas (per data center):

BMC Helix IT Operations Management namespace quotas (DR additions)

Deployment size | CPU requests (millicores) | CPU limits (millicores) | Memory requests (GB) | Memory limits (GB)
Compact | 6,000 | 26,000 | 30 | 85
Small | 10,000 | 30,000 | 38 | 86
Medium | 11,000 | 36,000 | 49 | 91
Large | 12,000 | 55,000 | 62 | 112

RPO and RTO measurements 

Recovery Point Objective (RPO) is the time-based measure of tolerated data loss. Recovery Time Objective (RTO) is the targeted duration between a failure event and the point where operations resume.

The following table lists the RPO and RTO measurements:

Deployment size | Recovery Point Objective (RPO) | Recovery Time Objective (RTO) | Loss in productivity
Compact | 24 hours | 1 hour 30 minutes | 3 hours
Small | 24 hours | 1 hour 30 minutes | 3 hours
Medium | 24 hours | 2 hours | 4 hours


Important

The bootstrap timing is needed when you start the first full backup. Use the information from the next full backup to calculate the RPO numbers. Full backups are performed every 24 hours.

The RPO, RTO, and backup storage size might vary based on the storage size of your stack's data lake.

Disaster recovery is a new feature, and the RPO and RTO are still being measured for large deployments.

We recommend that you perform a trial run of your disaster recovery operation to get realistic expectations of the RPO and RTO metrics for your setup and environment.


Sizing requirement to enable automatic generation of anomaly events

The automatic anomaly services are part of BMC Helix Operations Management and BMC Helix AIOps.
For more information, see Autoanomalies in the BMC Helix Operations Management documentation.

If you enable automatic generation of anomaly events, you will need additional processor, memory, and disk space to operate successfully. Make sure you add these resources to your cluster.

The following table lists the additional resources required to configure automatic generation of anomaly events:

Deployment size | CPU (cores) requests | Memory (GB) requests | CPU (cores) limits | Memory (GB) limits | PVC (GB)
Compact | 1 | 4 | 6 | 16 | 20
Small | 2 | 6 | 19 | 41 | 50
Medium | 6 | 11 | 33 | 99 | 150
Large | 10 | 18 | 71 | 197 | 300
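The values in the preceding table are additions on top of the base cluster sizing. The following sketch illustrates how to combine them; the base sizing values are placeholders, so use the Kubernetes cluster sizing requirements for your deployment size:

```python
# Add the anomaly-event resources (request-level values from the table above)
# to the base cluster sizing. The base values below are placeholders.
base = {"cpu": 50, "ram_gb": 200, "pvc_gb": 2000}           # assumed base sizing

anomaly_addition = {
    "compact": {"cpu": 1,  "ram_gb": 4,  "pvc_gb": 20},
    "small":   {"cpu": 2,  "ram_gb": 6,  "pvc_gb": 50},
    "medium":  {"cpu": 6,  "ram_gb": 11, "pvc_gb": 150},
    "large":   {"cpu": 10, "ram_gb": 18, "pvc_gb": 300},
}

size = "medium"
total = {key: base[key] + anomaly_addition[size][key] for key in base}
print(f"{size} cluster with anomaly events enabled needs:", total)
```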


Sizing guidelines to upgrade BMC Helix ITOM

When you upgrade to BMC Helix ITOM version 24.3, consider the following additional resource requirements:

Sizing considerations to upgrade to OpenSearch 2.15

Starting with BMC Helix ITOM version 24.3, OpenSearch 2.15 is supported for improved security. The upgrade from OpenSearch 1.x to 2.15 is an in-place upgrade and does not require additional steps. However, it is important to back up your OpenSearch data before the upgrade.
For more information, see Backing-up-the-OpenSearch-1-x-data.
The following table lists the resources needed to back up the OpenSearch 1.x data: 

Deployment size | CPU requests | Memory requests | CPU limits | Memory limits | PVC
Compact | 250m | 256Mi | 1 | 1Gi | 100Gi
Small | 300m | 256Mi | 2 | 3Gi | 300Gi
Medium | 300m | 256Mi | 4 | 6Gi | 400Gi
Large | 300m | 256Mi | 6 | 8Gi | 800Gi

After the upgrade, you can reclaim the resources by running the backup cleanup utility.
The following table lists the time taken for the backup and migration:

Important

The backup duration depends on the PVC size of the OpenSearch components at the time of backup.

Deployment size | Backup time (minutes) | Migration time (minutes)
Compact | 30 | 53
Small | 107 | 54
Medium | 27 | 31
Large | 69 | 30


Sizing considerations for migrating from PostgreSQL database 12.9 to 15.x

To migrate data from PostgreSQL database 12.9 to 15.x, you must run the PostgreSQL migration utility.

For the migration to be successful, the following processor, memory, and storage resources are required in addition to the resources listed in this topic:

Deployment size | CPU (cores) requests | Memory (Gi) requests | CPU (cores) limits | Memory (Gi) limits | PVC (Gi)
Compact | 4 | 5 | 13 | 33 | 140
Small | 4 | 6 | 13 | 35 | 140
Medium | 4 | 6 | 21 | 34 | 195
Large | 7 | 8 | 60 | 115 | 250

You can reclaim the resources after the upgrade.

The following table gives information about the time to migrate data from PostgreSQL database 12.9 to 15.x:

Deployment size | Time taken (minutes)
Compact | 14
Small | 13
Medium | 14
Large | 15

 
