Large-scale installations

Most customer environments are too large to be managed by the two-server infrastructure described in Simple installations. Fortunately, you can add BMC Server Automation components to provide greater management capacity. This section describes the use of additional infrastructure to provide greater capacity for a BMC Server Automation installation.

Considerations for adding Application Server instances
Considerations for scaling the file server
Considerations for virtualized environments
Considerations for load balancing

Considerations for adding Application Server instances

To meet the demands of a larger data center, you can deploy additional Application Servers. Most commonly, you need to add job servers to provide support for a larger number of managed servers. In some cases, you might also need to deploy additional configuration (UI) servers to support a larger user population.

Increasing job throughput

To execute more jobs against more targets in a given period of time, it is usually necessary to increase the number of work item threads (WITs) available to execute jobs. The number of WITs is a configurable option of each job server, but the number of WITs per job server is normally limited by the amount of memory available in a single Application Server. In most cases, adding WITs means configuring another Application Server.

Frequently, a physical server has enough CPU and other resources to host several times the total number of WITs that can be run in a single Application Server. A rule of thumb is to install Application Servers on physical servers based on the assumption that each Application Server requires:

2 CPU cores
Physical memory sufficient for the Application Server process (4 GB for a 32-bit Application Server, and 8-10 GB for a 64-bit Application Server).

Under these guidelines, a typical eight-core server computer with sufficient memory can support three to four Application Servers. When figuring required RAM for the physical server, remember to allow memory for the OS and for other processes running on the computer, including the Application Server launcher.
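The rule of thumb above reduces to a small calculation: the number of Application Servers a host supports is the smaller of the CPU-bound count and the memory-bound count. The Python sketch below illustrates that calculation. The 4 GB reserve for the OS, the launcher, and other processes is a hypothetical placeholder, not a BMC figure, so substitute a value measured on your own hosts.

```python
def app_servers_per_host(cores: int, ram_gb: float, mem_per_instance_gb: float,
                         reserved_ram_gb: float = 4.0,
                         cores_per_instance: int = 2) -> int:
    """Estimate how many Application Servers one physical host can run.

    cores_per_instance and mem_per_instance_gb follow the rule of thumb
    (2 cores; 4 GB for 32-bit, 8-10 GB for 64-bit). reserved_ram_gb is a
    HYPOTHETICAL allowance for the OS, the launcher, and other processes.
    """
    by_cpu = cores // cores_per_instance
    usable_ram_gb = max(ram_gb - reserved_ram_gb, 0)
    by_memory = int(usable_ram_gb // mem_per_instance_gb)
    # The host supports whichever limit is reached first.
    return min(by_cpu, by_memory)


if __name__ == "__main__":
    # Eight-core host with 48 GB of RAM, 64-bit instances budgeted at 10 GB each.
    print(app_servers_per_host(cores=8, ram_gb=48, mem_per_instance_gb=10))  # 4
    # The same host with only 40 GB of RAM becomes memory-bound.
    print(app_servers_per_host(cores=8, ram_gb=40, mem_per_instance_gb=10))  # 3
```

The second case shows why the guideline says "three to four": on an eight-core host, memory rather than CPU often sets the final count.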

For more detailed suggestions on memory and WIT settings for job servers, see Recommendations for Application Servers of type Job.

Support for more users

For environments that support a large user population, you might need to increase the number of configuration (UI) servers in the installation. The workload required to support a user varies widely, but as a starting point, BMC recommends the following:

Install one configuration (UI) server for every 50 concurrent logged-in users.
Expect as many as 20% of total users to be logged in at any one time.

In combination, these guidelines call for one configuration (UI) server for every 250 users.
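The one-per-250 figure follows from the two guidelines: 20% of 250 users is 50 concurrent users, which is one server's worth. The Python sketch below performs the same calculation, rounding up at each step. The function names are illustrative; the defaults come from the guidelines above.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-numerator // denominator)


def ui_servers_needed(total_users: int, concurrency_pct: int = 20,
                      users_per_server: int = 50) -> int:
    """Estimate configuration (UI) servers from the total user population."""
    concurrent_users = ceil_div(total_users * concurrency_pct, 100)
    # Always keep at least one configuration (UI) server.
    return max(1, ceil_div(concurrent_users, users_per_server))


if __name__ == "__main__":
    for users in (250, 251, 1000):
        print(users, ui_servers_needed(users))
```

Rounding up matters at the boundaries: 251 users already tips the estimate to a second server.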

Limits to growth

Neither Oracle nor SQL Server has a theoretical limit on the number of database connections that a database server can support. However, the physical resources available on the database server impose a practical limit on the number of connections it can maintain. This, in turn, limits the total number of Application Servers that a particular BMC Server Automation implementation can support.

You can control the minimum and maximum number of database connections maintained by an Application Server by adjusting user-configurable settings for the various database connection pools. If you plan to establish an extremely large BMC Server Automation implementation, you should use the information in Database connection settings to estimate the total number of database connections required for the implementation. Then work with the local DBA and database vendor to ensure that the database server is capable of supporting that load.
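As a rough first pass, the worst-case connection count is the sum of every pool's maximum size across every Application Server. The Python sketch below shows only that summation. The pool names and sizes are hypothetical placeholders; take the real pool names and maximums from Database connection settings and from each server's configuration.

```python
from typing import Dict, List

# HYPOTHETICAL pool names and maximum sizes, for illustration only.
JOB_SERVER_POOLS: Dict[str, int] = {"pool_a": 100, "pool_b": 20}
UI_SERVER_POOLS: Dict[str, int] = {"pool_a": 60, "pool_b": 10}


def total_max_connections(servers: List[Dict[str, int]]) -> int:
    """Worst case: every pool on every Application Server is full at once."""
    return sum(sum(pools.values()) for pools in servers)


if __name__ == "__main__":
    # Six job servers and two configuration (UI) servers.
    fleet = [JOB_SERVER_POOLS] * 6 + [UI_SERVER_POOLS] * 2
    print(total_max_connections(fleet))  # 860
```

The resulting total is the figure to bring to the DBA and database vendor when confirming that the database server can carry the load.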

Considerations for scaling the file server

The BMC Server Automation design requires a designated file server to host the files in the BMC Server Automation Depot. The file server is simply a server running the RSCD agent.

For environments in which all Application Servers are hosted on Linux or Oracle Solaris hosts, a network-attached storage (NAS) filer using NFS can act as a kind of virtual file server, a configuration that offers several benefits in terms of performance and scalability. In this configuration, a share exported by the filer is mounted at the same mount point on each computer that hosts an Application Server, making the share appear to be local storage for each Application Server. The file server is then defined to be localhost, and the file storage path is that on which the shared storage is mounted.

This configuration can improve performance because the NFS protocol used by the filer performs better over the network than the NSH protocol does. It also allows the use of clustered NAS servers, which provide redundancy and higher performance.
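Because every Application Server must see the same share at the same path, it can be worth verifying the mounts before defining the file server as localhost. The Python sketch below assumes Linux hosts, where /proc/mounts lists one mount per line as device, mount point, file system type, and options. The /depot mount point and the filer01 export are hypothetical. The sketch reports which export is visible at the path and whether all hosts agree.

```python
from typing import Dict, Optional


def nfs_export_at(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the NFS export visible at mount_point, or None if it is not NFS.

    mounts_text is the content of /proc/mounts. A later entry for the same
    mount point shadows an earlier one, so the last match wins.
    """
    visible = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, target, fstype = fields[0], fields[1], fields[2]
        if target == mount_point:
            visible = device if fstype.startswith("nfs") else None
    return visible


def same_share_everywhere(per_host: Dict[str, Optional[str]]) -> bool:
    """True when every host sees the same NFS export at the mount point."""
    exports = set(per_host.values())
    return len(exports) == 1 and None not in exports


if __name__ == "__main__":
    sample = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "filer01:/export/depot /depot nfs4 rw,relatime 0 0\n"
    )
    print(nfs_export_at(sample, "/depot"))  # filer01:/export/depot
```

On each Application Server host, pass the contents of /proc/mounts to nfs_export_at. Then collect the per-host results into the dictionary given to same_share_everywhere.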

Considerations for virtualized environments

When BMC Server Automation Application Servers are hosted in (guest) virtual machines (VMs) in a virtualized environment, you can either deploy multiple Application Servers in a single VM or spread the Application Servers across separate VMs.

In virtualized environments, BMC recommends deploying no more than two Application Server instances in a single VM. In addition, for best performance, do the following:

Ensure that each VM that runs an Application Server has dedicated CPU and memory for its full allocation.
Ensure that the Application Server VMs run on hosts with sufficient resources and that the hosts are not oversubscribed for CPU or memory.
If possible, run multiple Application Server VMs on separate host systems instead of running multiple Application Server VMs on the same host (dependent on host resources).
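The guidelines above can be turned into a simple pre-deployment check. The Python sketch below flags two conditions: VMs that host more than two Application Server instances, and hosts whose VMs together claim more vCPUs or memory than the host physically has. The data model is hypothetical. The check is only as good as its input, so when testing for oversubscription, list every VM on each host, not just the Application Server VMs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vm:
    name: str
    host: str
    vcpus: int
    mem_gb: int
    app_servers: int = 0  # Application Server instances in this VM


def placement_problems(vms: List[Vm], host_cores: Dict[str, int],
                       host_mem_gb: Dict[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    cpu_used: Dict[str, int] = defaultdict(int)
    mem_used: Dict[str, int] = defaultdict(int)
    for vm in vms:
        if vm.app_servers > 2:
            problems.append(f"{vm.name}: {vm.app_servers} Application Servers (max 2)")
        cpu_used[vm.host] += vm.vcpus
        mem_used[vm.host] += vm.mem_gb
    for host in sorted(cpu_used):
        if cpu_used[host] > host_cores[host]:
            problems.append(f"{host}: {cpu_used[host]} vCPUs on {host_cores[host]} cores")
        if mem_used[host] > host_mem_gb[host]:
            problems.append(f"{host}: {mem_used[host]} GB allocated on {host_mem_gb[host]} GB")
    return problems


if __name__ == "__main__":
    vms = [
        Vm("appsrv-vm1", "esx-a", vcpus=4, mem_gb=20, app_servers=2),
        Vm("appsrv-vm2", "esx-b", vcpus=4, mem_gb=20, app_servers=2),
    ]
    print(placement_problems(vms, {"esx-a": 8, "esx-b": 8}, {"esx-a": 64, "esx-b": 64}))
```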

For information about Application Server performance and scalability, see Sizing and scalability factors.

Considerations for load balancing

In large deployments involving multiple instances of some or all BMC Server Automation components, you might need to provide load balancing services to ensure that the additional resources are used effectively.

Job servers effectively perform their own load balancing, scheduling jobs and work items according to availability. No additional load balancing considerations are applicable for job servers.

Two strategies for load balancing are commonly applied for configuration (UI) servers:

In cases where the user population and behaviors support it, you can achieve crude but effective load balancing simply by using round-robin DNS for the AuthServiceURL defined in the client's authenticationProfile.xml.
For more even load distribution, you must add an external load balancer to the installation and use it to distribute the load across configuration (UI) servers. The BIG-IP product by F5 is a common choice for this purpose.
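Round-robin DNS is crude because it balances logins, not load. The Python simulation below, which uses hypothetical host names, rotates through the configuration (UI) servers the way a round-robin DNS answer rotates. Each server receives the same number of logins, but nothing accounts for how heavy each user's session turns out to be. Real DNS clients may also cache an answer and reuse it.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def round_robin_assign(clients: List[str], servers: List[str]) -> Dict[str, str]:
    """Assign each client to the next server in a fixed rotation."""
    rotation = cycle(servers)
    return {client: next(rotation) for client in clients}


if __name__ == "__main__":
    servers = ["ui1.example.com", "ui2.example.com", "ui3.example.com"]
    clients = [f"user{i}" for i in range(6)]
    assignment = round_robin_assign(clients, servers)
    # Equal login counts per server, regardless of how busy each session is.
    print(Counter(assignment.values()))
```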

Load balancers for configuration (UI) servers

Introducing load balancers for configuration (UI) servers has security implications. Because the load balancer sits as a "man in the middle" in the path between the client and the configuration and Authentication Servers, the security configuration of the installation requires some adjustment.

Authentication process

In the load balancer environment for configuration (UI) servers, clients first log in by contacting an Authentication Server through the load balancer. A dialog ensues which, if completed successfully, results in the delivery of a session credential from the Authentication Server to the client.

The session credential produced by the Authentication Server includes both the IP address of the client and the URLs for the servers that the client is authorized to access. Because of the load balancer, the Authentication Server sees the load balancer's IP address as the client's address, unless the load balancer is configured to pass through the client's IP address.
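The effect of the load balancer on the recorded address can be sketched as follows. This is a hypothetical model, not the BMC credential format. It assumes, for illustration, that the credential stores the source address the Authentication Server observed and that a later connection is compared against that address. Without pass-through, the recorded address is the load balancer's, so a client that later connects from its own address no longer matches.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SessionCredential:
    """HYPOTHETICAL credential shape used only for this illustration."""
    client_ip: str
    authorized_urls: Tuple[str, ...]


def observed_source_ip(client_ip: str, lb_ip: str, passes_client_ip: bool) -> str:
    """Address the Authentication Server sees for a login through the load balancer."""
    return client_ip if passes_client_ip else lb_ip


def issue_credential(seen_ip: str, urls: Tuple[str, ...]) -> SessionCredential:
    return SessionCredential(client_ip=seen_ip, authorized_urls=urls)


def matches(cred: SessionCredential, connecting_ip: str) -> bool:
    return cred.client_ip == connecting_ip


if __name__ == "__main__":
    urls = ("https://ui1.example.com/app-service",)  # hypothetical URL
    cred = issue_credential(observed_source_ip("10.0.0.5", "10.0.0.1", False), urls)
    print(cred.client_ip, matches(cred, "10.0.0.5"))  # 10.0.0.1 False
```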

Load balancer configuration

A BMC Server Automation UI client typically authenticates (just once) to an Authentication Server and then uses the returned session credentials to initiate several related connections to the configuration (UI) server. While it is not necessary for the Authentication Server and the configuration (UI) server to be the same Application Server, or even on the same physical device, it is important that all configuration (UI) server connections that are part of the same session actually complete to the same configuration (UI) server. The configuration (UI) server maintains a certain session-specific state. Spreading a single session's connections across multiple configuration (UI) servers might result in inconsistent data being returned to the client, or in a server not having enough state to fulfill a client request.

You can configure most load balancing devices to handle this by specifying that all connections from the same client be directed to the same configuration (UI) server. Depending on the vendor, this attribute is known as connection stickiness or persistence. BMC recommends configuring persistence (stickiness) based on the client IP address.
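Persistence based on client IP address amounts to mapping each client address to one server and reusing that mapping for every connection. Load balancers implement this in their own ways. The Python sketch below uses a stable hash as one illustrative technique. It uses SHA-256 from hashlib because Python's built-in hash() of a string changes between processes. The server names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_server(client_ip: str, servers: Sequence[str]) -> str:
    """Map a client IP address to the same server on every call."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    servers = ["ui1.example.com", "ui2.example.com", "ui3.example.com"]
    # Every connection in a session comes from the same address,
    # so every connection lands on the same configuration (UI) server.
    print({pick_server("10.10.40.34", servers) for _ in range(5)})
```

A plain hash has one caveat: adding or removing a server changes len(servers) and can remap existing clients in mid-session. For that reason, a persistence table that remembers existing assignments can be preferable.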

An open connection between a UI client and a configuration (UI) server might sit idle for an extended period while some long-running operation completes. This idle connection presents no difficulty in the direct-connection case. However, a load balancer might prematurely close such a connection because it concludes that it is no longer in use. To guard against this possibility, BMC recommends configuring the load balancer's connection timeout value to be at least one hour.
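A simple model of an idle timeout shows the risk: the load balancer periodically drops any connection whose last activity is older than the timeout. The Python sketch below is a simplified, hypothetical model, not any vendor's implementation. In it, a connection that sits idle for 45 minutes while a long-running operation completes is dropped under a 5-minute timeout but survives the recommended one-hour timeout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Connection:
    client: str
    last_activity: float  # seconds since an arbitrary epoch


def sweep_idle(conns: List[Connection], now: float,
               idle_timeout: float) -> Tuple[List[Connection], List[Connection]]:
    """Split connections into (kept, closed) by how long they have been idle."""
    kept = [c for c in conns if now - c.last_activity <= idle_timeout]
    closed = [c for c in conns if now - c.last_activity > idle_timeout]
    return kept, closed


if __name__ == "__main__":
    waiting = Connection("ui-client", last_activity=0.0)
    now = 45 * 60  # idle for 45 minutes
    for timeout in (5 * 60, 60 * 60):
        kept, closed = sweep_idle([waiting], now, timeout)
        print(timeout, "closed" if closed else "kept")
```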

Also consider whether the AppServiceURL needs to be load balanced at all. Suppose initial authentication requests are balanced evenly across the configuration (UI) servers, and each server supplies an AppServiceURL that directs the client to a specific Application Server rather than back to the load balancer. The result is much the same as using session persistence with the AppServiceURL pointing to the load balancer. In that case, the AppServiceURL does not need to pass through the load balancer; only the AuthServiceURL needs to point to it.

Note that the Application Server provides no session failover. Suppose a user has an object open and is making changes when the Application Server they are connected to fails. If the user is redirected to another Application Server, they are unlikely to be able to save the object there. The outcome depends on the specific operation and on whether it required temporary files that exist only on the failed Application Server. After you decide whether to load balance the AppServiceURL, adjust the relevant Application Server parameters accordingly, as discussed in Load balancer environments.
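A small simulation can check the reasoning above. In the Python sketch below, which is hypothetical and not the BMC protocol, only authentication passes through a round-robin load balancer. Each configuration (UI) server returns an AppServiceURL naming itself. As a result, every later connection in the session goes straight to the server that handled the login, with no persistence needed.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinAuthBalancer:
    """Hypothetical load balancer that sees only authentication requests."""

    def __init__(self, servers: List[str]):
        self._rotation = cycle(servers)

    def authenticate(self, user: str) -> Dict[str, str]:
        server = next(self._rotation)
        # The chosen server hands back a direct AppServiceURL (hypothetical path).
        return {"user": user, "app_service_url": f"https://{server}/app-service"}


def session_targets(credential: Dict[str, str], connections: int) -> List[str]:
    """Every connection in the session uses the URL from the credential."""
    return [credential["app_service_url"] for _ in range(connections)]


if __name__ == "__main__":
    lb = RoundRobinAuthBalancer(["ui1.example.com", "ui2.example.com"])
    for user in ("alice", "bob"):
        print(user, set(session_targets(lb.authenticate(user), 3)))
```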

Load balancers for NSH proxy servers

You can use load balancers to spread workload among multiple NSH proxy servers. In this case, authentication is essentially the same as for configuration (UI) server balancing, but traffic after authentication is directed to NSH proxy servers rather than to configuration (UI) servers. (A given Application Server can act as both a configuration (UI) server and an NSH proxy server.) The same reasoning about using a direct connection instead of the load balancer applies to the ProxyServiceURL. After you decide whether to load balance the ProxyServiceURL, adjust the relevant Application Server parameters accordingly, as discussed in Load balancer environments.

Health check considerations

The load balancer typically checks the application service ports very frequently. This raises the following points for your consideration:

Such frequent checks can consume the available connections on the Application Server and prevent users from logging on or otherwise using the GUI. In a default configuration, a check interval of 5 seconds might cause this behavior. If it occurs, increase the check interval (try 10 or 15 seconds or more), or use the blasadmin utility to increase the MaxClientContexts and MaxWorkerThreads settings.
Health checks result in a lot of noise in the Application Server logs. The messages might look similar to the following:
[03 Aug 2010 16:37:17,277] [Nsh-Proxy-Thread-1] [WARN] [Anonymous:Anonymous:10.10.40.34] [BLSSOPROXY] Connection closed by /10.10.40.34:51135 before pre-authentication handshake could be completed.
[03 Aug 2010 16:37:17,277] [Nsh-Proxy-Thread-1] [INFO] [Anonymous:Anonymous:10.10.40.34] [BLSSOPROXY] failure establishing session with proxy service
[03 Aug 2010 16:37:17,277] [Nsh-Proxy-Thread-1] [INFO] [Anonymous:Anonymous:10.10.40.34] [BLSSOPROXY] NSH Proxy Connection closed
To suppress these messages, add the following entries to the log4j.properties file for the Application Server deployment:
# GUI Connections
log4j.logger.com.bladelogic.om.infra.auth.service.AuthSvcWorkerThread=ERROR
log4j.logger.com.bladelogic.om.infra.auth.service.AuthenticationServiceImpl=WARN
log4j.logger.com.bladelogic.om.infra.app.service.client.ClientConnectionManager=ERROR
log4j.logger.com.bladelogic.om.infra.mfw.net.BaseServerConnection=ERROR
log4j.logger.com.bladelogic.om.infra.mfw.net.ClientWorkerThread=ERROR
# NSH Proxy Connections
log4j.logger.com.bladelogic.om.infra.mfw.fw.NshProxyWorkerThread=ERROR
log4j.logger.com.bladelogic.om.infra.app.service.client.BaseNshProxyConnectionManager=ERROR
These settings make the Application Server logs more readable, but they also suppress some messages generated by normal client logons. Alternatively, consider increasing the health check interval to reduce the number of checks.
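Counting the probes shows why lengthening the interval helps. The Python arithmetic below assumes, hypothetically, a redundant pair of load balancer nodes, each probing two service ports on one Application Server. Substitute your own counts. Each probe opens a connection on the Application Server and, unless the logging is adjusted, adds log messages.

```python
def probes_per_minute(interval_seconds: float, ports: int, lb_nodes: int = 1) -> float:
    """Connections per minute that health checks open against one Application Server."""
    return ports * lb_nodes * 60 / interval_seconds


if __name__ == "__main__":
    # HYPOTHETICAL: two load balancer nodes, two monitored service ports.
    for interval in (5, 10, 15):
        per_minute = probes_per_minute(interval, ports=2, lb_nodes=2)
        print(f"{interval:>2} s interval: {per_minute:.0f}/min, {per_minute * 60 * 24:.0f}/day")
```

Going from a 5-second to a 15-second interval cuts the probe load to one third.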