Deployment scenarios for availability, scalability, and maintainability


This topic describes the network topology options for TrueSight Orchestration Platform.

The advantages and disadvantages of each topological model are described in terms of the following factors:

  • Availability
  • Performance and throughput
  • Scalability
  • Maintainability and cost

Single Host - Embedded Remedy SSO

In this network topology, a single host runs embedded Remedy Single Sign-On (RSSO, or Remedy SSO), a repository, and a configuration distribution peer (CDP).

All of these components are required, so this represents the minimal deployment to consider. RSSO and the repository do not require large system resources, which makes this topology a very good candidate for a development environment.

DeploymentScenario-SingleHost-1.png

The next model is a variation of the first. This provides high availability (HA) capability for only the authentication service.

DeploymentScenario-SingleHost-2.png

The next model is a variation of the first. Its intent is to illustrate the potential co-location of one very important component, the Operator Control Panel (OCP):

DeploymentScenario-SingleHost-3.png

The next model is a variation of the first. Its intent is to illustrate the potential co-location of the OCP and the Health and Value dashboards:

DeploymentScenario-SingleHost-4.png

The following table describes the advantages and disadvantages of using the Single host - Embedded RSSO model.

Factor: Maintainability
Advantages: This is the simplest installation of TrueSight Orchestration and is most suitable for rapid evaluation and testing of the basic functionality of the CDP and its related components. This topology is often used for a proof of concept (POC) or a demo machine; the emphasis of a single-server installation is on getting it up and running quickly.
Disadvantages: -

Factor: Performance
Advantages: Although performance is not a primary objective in this scenario, throughput is the highest of all the possible topologies. There is virtually zero latency because there are no other peers with which the CDP must communicate.
Disadvantages: This topology is targeted at development, testing, and some exceptional intranet environments. Resources are limited to a single machine and are shared by all components, which compete for CPU, memory, network, and so on. Because the components influence each other, bottlenecks or ill-behaved components can be difficult to identify and remedy. Single-peer topologies are therefore better served by a hardware platform that can be scaled up.

Factor: Availability
Advantages: Not a primary objective in this scenario.
Disadvantages: This topology is a single point of failure (SPOF). For more information, see Wikipedia - Single point of failure.

Factor: Cost
Advantages: Low
Disadvantages: -

High Availability - Embedded RSSO

Two-host models provide high availability (HA) for the authentication and CDP components.

HA can be provided in active-active or active-passive mode. All components except the repository can participate in active-active mode; the repository supports only active-passive mode. In the figure below, the RSSO and repository on host 2 are passive (cold standby).

HA for RSSO is provided by running RSSO in three active components. The OCP and Dashboards can be configured alongside the CDP or in a separate Tomcat instance.

DeploymentScenario-TwoHost-1.png
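The active-passive (cold standby) behavior described above can be illustrated with a small sketch. This is a toy model, not TSO code; the host names are hypothetical, and the point is only that the standby is promoted when the active node stops responding:

```python
class ActivePassivePair:
    """Toy model of active-passive (cold standby) failover."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def on_heartbeat_missed(self):
        """Promote the cold standby when the active node stops responding."""
        self.active, self.standby = self.standby, self.active
        return self.active

# host-1 carries the load; host-2 is the cold standby
pair = ActivePassivePair("host-1", "host-2")
promoted = pair.on_heartbeat_missed()  # host-2 becomes active
```

In an active-active arrangement, by contrast, both hosts would serve traffic continuously and no promotion step would be needed.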

A variation of a two-host HA setup is shown below. The optional components OCP and Dashboards are installed in a separate Tomcat instance. These instances of the OCP and Dashboards can be configured to communicate with the CDP on the local machine or on the second host.

DeploymentScenario-TwoHost-2.png

The following table describes the advantages and disadvantages of using the High availability - Embedded RSSO model:

Factor: Performance
Advantages: Improved system resource utilization by distributing the processing load across multiple server machines.
Disadvantages: -

Factor: Availability
Advantages: Failure of one CDP does not cause loss of any functionality.
Disadvantages: Some single points of failure (SPOF) remain; in particular, loss of the Primary CDP removes the ability to add new peers to the grid.

Factor: Maintainability
Advantages: With the grid manager, it is easy to administer both peers in a grid from a single point.
Disadvantages: There is more installation and maintenance associated with the additional machines.

Factor: Cost
Advantages: -
Disadvantages: More machines.

High Availability and Disaster Recovery

Beyond a two-host setup that supports HA requirements, the next requirement is to support disaster recovery (DR). Supporting DR assumes that there are at least two data centers and that the HA setup in one data center is replicated or mirrored in the others. In this case, you must choose between using one grid or multiple grids. The choice depends on several factors:

  1. Throughput requirements: TrueSight Orchestration components are designed for fault tolerance. There is a trade-off between being fault tolerant and being able to process a large set of jobs in a given time period. For fault tolerance, job data is replicated among all active components before jobs advance between states, which makes overall throughput lower than when there is only one component (CDP) in the grid. If the throughput requirement is more important than fault tolerance, reduce the number of components per grid and use more grids.
  2. Receiving events from external event sources: If TSO receives events to trigger jobs, deploying multiple grids can cause the same event to be processed by more than one grid at the same time. As long as event data can be partitioned, multiple grids can be used; otherwise, only one grid should be active at any point in time.
  3. Active-active or active-passive configuration: In an active-active data center configuration, all TrueSight Orchestration components are active in all data centers; in an active-passive configuration, only one data center is active at any point in time. In the active-active case, loss of one data center reduces the overall capacity of the TSO setup. In the active-passive case, loss of one data center causes the TSO components in the second data center to go live, with no loss of capacity.
  4. Geographical distance between data centers: Job data must be replicated between TrueSight Orchestration components before jobs progress between states. If TSO components are distributed across data centers, network latency decreases overall throughput.
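Point 2 above, partitioning event data across grids, can be sketched as follows. This is an illustrative sketch, not TSO functionality, and the event-source names are hypothetical: a deterministic hash routes each event source to exactly one grid, so the same event is never handled by two grids.

```python
import hashlib

def grid_for_source(event_source, num_grids):
    """Map an event source to exactly one grid, deterministically."""
    digest = hashlib.sha256(event_source.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_grids

# Hypothetical event sources partitioned across two grids
sources = ["monitoring-emea", "monitoring-amer", "itsm-change-events"]
routing = {src: grid_for_source(src, 2) for src in sources}
```

Because the mapping is deterministic, every event from a given source lands on the same grid regardless of which peer receives it first.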

Single Grid

The figure below shows an HA+DR solution in which all TSO components belong to a single grid. The CDP in the second data center is an HA-CDP, and the repository is a cold standby. Only when data center 1 is unavailable is the repository service in data center 2 started. RSSO data might require migration from the HA-CDP's RSSO instance to the RSSO instance in the repository; migration is required only if RSSO is configured to use Local User Management rather than LDAP. This configuration is most suitable when the TSO setup processes events from an event source.

  • Events can be received by two or more peers (CDP, HA-CDP, and AP), but each event is processed only once.
  • Job throughput is influenced by the network latency between data centers.

DeploymentScenario-FourHost-1.png
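The first bullet above, several peers receiving the same event while the grid processes it only once, amounts to idempotent, process-once handling keyed by an event identifier. A minimal sketch of that bookkeeping (illustrative only, not TSO internals; the event ID is hypothetical):

```python
class EventDeduplicator:
    """Track event IDs so an event delivered to several peers runs once."""

    def __init__(self):
        self._processed = set()

    def claim(self, event_id):
        """Return True for the first claim of an event, False afterwards."""
        if event_id in self._processed:
            return False
        self._processed.add(event_id)
        return True

dedup = EventDeduplicator()
# The same event arrives via the CDP, the HA-CDP, and an AP
results = [dedup.claim("evt-42"), dedup.claim("evt-42"), dedup.claim("evt-42")]
```

Only the first claim succeeds; the duplicate deliveries are recognized and skipped.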

The following table describes the advantages and disadvantages of using the High availability and disaster recovery - Single grid model:

Factor: Availability
Advantages: No SPOF.
Disadvantages: -

Factor: Maintenance
Advantages: Any grid configuration change is made only once and is propagated to all grid peers.
Disadvantages: -

Factor: Performance
Advantages: -
Disadvantages: Overall throughput decreases because of the network latency between grid peers and the need to synchronize more peers.

Factor: Cost
Advantages: -
Disadvantages: This topology uses four machines and requires additional software licenses.

Two Grids

The following figure shows a variation of the HA+DR solution. TSO components in data center 1 belong to one grid, and TSO components in data center 2 belong to a second, identical grid. In each grid, the second repository is a cold standby. Only when the first repository in a data center is unavailable is the second repository service in the same data center started. RSSO data must be migrated to the RSSO instance in the standby repository; migration is required only if RSSO is configured to use Local User Management rather than LDAP. In this configuration, it is best to configure RSSO as an external RSSO backed by an external database.

  • This configuration is most suitable when the requirement is high throughput and the grids do not receive events from the same event source.
  • Network latency between data centers is not a factor because communication stays within a single data center.
  • Incoming web service requests (ORCA, REST, or legacy) can be serviced by TSO instances in one or both data centers.
  • A load balancer can be used to distribute load between peers in the same grid and between data centers.

DeploymentScenario-FourHost-2.png
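The load-balancing bullet above can be illustrated with a simple health-aware round robin: requests rotate across peers, skipping any peer currently marked down. This is a sketch of the idea, not any specific load balancer's behavior, and the peer names are hypothetical:

```python
from itertools import cycle

def pick_peer(peers, healthy, rotation):
    """Return the next healthy peer from a round-robin rotation."""
    for _ in range(len(peers)):
        candidate = next(rotation)
        if healthy.get(candidate, False):
            return candidate
    raise RuntimeError("no healthy peers available")

# Hypothetical peers spread across two data centers
peers = ["dc1-cdp", "dc1-ap", "dc2-cdp", "dc2-ap"]
healthy = {"dc1-cdp": True, "dc1-ap": False, "dc2-cdp": True, "dc2-ap": True}
rotation = cycle(peers)

# Three consecutive requests skip the unhealthy dc1-ap peer
chosen = [pick_peer(peers, healthy, rotation) for _ in range(3)]
```

A production load balancer would refresh the health map with periodic probes; here it is a static dictionary for illustration.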

The following table describes the advantages and disadvantages of this model:

Factor: Availability
Advantages: There is no SPOF in this topology.
Disadvantages: -

Factor: Scalability
Advantages: In an active-active configuration, throughput is almost doubled.
Disadvantages: -

Factor: Maintenance
Advantages: -
Disadvantages: Any grid configuration change must be made at least twice. This disadvantage can be eliminated by using grid management workflows to automate grid changes.

Factor: Performance
Advantages: Overall throughput increases because network latency between grid peers is lower and fewer peers need to be synchronized.
Disadvantages: -

Factor: Cost
Advantages: -
Disadvantages: This topology uses four machines and requires additional software licenses.

Two Grids with External RSSO

The following figure shows a variation of the previous HA+DR solution. The RSSO component is configured as an external RSSO and is backed by Tomcat clusters. Instances of the OCP and Dashboards are not shown, for clarity.

DeploymentScenario-FourHost-3.png

The following table describes the advantages and disadvantages of using the High availability and disaster recovery - Two grids model:

Factor: Availability
Advantages: There is no SPOF in this topology.
Disadvantages: -

Factor: Performance
Advantages: Overall throughput increases because network latency between grid peers is lower and fewer peers need to be synchronized.
Disadvantages: -

Factor: Scalability
Advantages: In an active-active configuration, throughput is almost doubled.
Disadvantages: -

Factor: Maintenance
Advantages: -
Disadvantages:
  • Any grid configuration change must be made at least twice. This disadvantage can be eliminated by using grid management workflows to automate grid changes.
  • Additional configuration is required to enable the Tomcat clusters, and manual synchronization is required between the RSSO instances in different data centers.

Factor: Cost
Advantages: -
Disadvantages: This topology uses four machines and requires additional software licenses.

Three Grids - Hierarchical Topology

The figure below shows a variation of the previous HA+DR solution. A third grid is used to automate the configuration of TSO components in both data centers, which eliminates the maintenance issue of having to perform configuration tasks more than once.

Hierarchical topologies such as this can be extended to more than one level of depth. They are typically the largest and most complicated of all configurations, but they can also be the most flexible. Where TrueSight Orchestration must serve a large number of machines, particularly machines that are geographically distributed or spread across a relatively large number of network segments, one grid alone is usually inadequate. The logical alternative is to use multiple physical grids and treat them as one logical super grid.

Super grids are the archetypal example of hierarchical topologies: a parent or root grid serves as the primary customer-facing grid at the top of the hierarchy, under which some number of child grids exist, usually in remote network locations and not co-located with the parent grid. The child grids are the workhorses of the super grid and handle the vast majority of workflow processing.

Hierarchical topologies are recommended when TrueSight Orchestration is expected to service a large enterprise network with hundreds or even thousands of machines and target systems. They are also an excellent option for customers whose large networks house disparate customers of their own and who therefore require automation capabilities across their own customer domains.

The challenge that hierarchical topologies present is that there is no direct master-slave relationship that inherently controls all of the grids as one living organism. On the contrary, the grids are disjoint: the physical grids composing a hierarchical topology are unaware of one another. Because these constituent grids share no direct or indirect connection, it falls to the parent or root grid to overcome this deficiency. The parent grid does so with its own high-level workflows, specially designed to communicate with the child grids. This can be accomplished using the REST adapter or SOAP adapter deployed in the parent grid, which communicates with the CDP hosts deployed in each child grid.

Alternative Protocols

REST and SOAP are only two of the options to consider. Another viable possibility is the Java Message Service (JMS). Like SOAP, most modern JMS providers, such as ActiveMQ, can communicate through firewalls and can use HTTP as a configurable transport if necessary. JMS also has advantages that general-purpose SOAP does not, such as transactions, durable messaging, and decoupled asynchronous communication.

There is no prescribed API or protocol for stitching parent and child grids together, because what is needed depends largely on the customer's perspective and target environment. The point is that a workflow developer must build and customize these higher-order workflows for managing and controlling the super grid itself. In short, TrueSight Orchestration does not provide a super-grid capability out of the box; additional design and development are required to realize the benefits of a hierarchical topology.
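As a sketch of such a high-level workflow, a parent grid might fan a workflow-execution request out to each child grid's CDP over REST. Everything here is hypothetical: the endpoint path, payload shape, workflow name, and host names would need to match the actual child-grid API. Only the request construction is shown; no network call is made.

```python
import json

# Hypothetical child-grid CDP base URLs
CHILD_CDPS = ["https://child1-cdp.example.com", "https://child2-cdp.example.com"]

def build_requests(workflow, inputs):
    """Build one POST request description per child-grid CDP."""
    requests_out = []
    for base_url in CHILD_CDPS:
        requests_out.append({
            "method": "POST",
            # Hypothetical execution endpoint; verify against the child grid's API
            "url": base_url + "/rest/execute",
            "body": json.dumps({"workflow": workflow, "inputs": inputs}),
        })
    return requests_out

reqs = build_requests(":MyProject:StartJob", {"ticket": "CHG000123"})
```

In practice the parent grid's workflow would dispatch these requests through its REST adapter and aggregate the responses, retrying or rerouting when a child CDP is unreachable.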

The following table describes the advantages and disadvantages of using the High availability and disaster recovery - Three grids model:

Factor: Availability
Advantages: There is no SPOF in this topology.
Disadvantages: -

Factor: Performance
Advantages: Overall throughput increases because network latency between grid peers is lower and fewer peers need to be synchronized.
Disadvantages: -

Factor: Scalability
Advantages: In an active-active configuration, throughput is almost doubled.
Disadvantages: -

Factor: Maintenance
Advantages: With two levels in the hierarchy, grid configuration changes are made only once.
Disadvantages: As the depth of the hierarchy and the number of grids to be managed increase, so does the overall complexity.

Factor: Cost
Advantages: -
Disadvantages: This topology uses six machines and requires additional software licenses.

Separate Web Tier

The figure below shows the configuration of a single-host grid supporting multiple OCP services. Compared to a configuration where the CDP and the OCP are co-located on a single physical server, separating these two components can provide varying degrees of improvement in performance, process isolation, and redundancy. Additionally, the OCP can be deployed across a farm of web containers for high availability of the OCP itself. The OCP running on a single web container has been observed to support 500 or more concurrent users for a single grid.

DeploymentScenario-WebTier-1.png

The following table describes the advantages and disadvantages of using a single web tier model:

Factor: Availability
Advantages: The OCP can be deployed in a cluster of web containers on disparate machines, provided that a load balancer distributes web requests across the OCP server instances. This is a common J2EE/JEE pattern for redundant deployments of servlet and JSP applications.
Disadvantages: The single instance of the CDP is still a SPOF. Should it fail, none of the clustered OCP applications will function properly.

Factor: Maintenance
Advantages: OCP components can be reconfigured, or even replaced, without affecting the installation of the CDP on a separate machine. The converse, however, is not true.
Disadvantages: -

Factor: Performance
Advantages: By installing components on separate machines, each machine can be sized and configured to optimize the performance of its component (for example, the CDP will almost certainly require a faster and more powerful machine than the OCP system).
Disadvantages: -

 
