Remedy disaster recovery deployment
Disaster recovery is the ability to withstand the failure of an entire data center without interrupting services. High availability ensures uninterrupted services within a data center, whereas disaster recovery handles the failure of the data center as a whole. Depending on your organization's policies, you might choose to operate at a lower capacity during disaster recovery. Plan your primary site with enough redundancy to achieve high availability, so that the primary site can keep operating after a limited failure and is not forced into disaster recovery.
Disaster recovery can be achieved for the Remedy ITSM suite by replicating the production data center in another site and by using a load balancer. The two sites can be either co-located or in separate geographic locations.
Apart from replicating the databases and the servers, you must also keep the replicated databases and the servers continuously updated by data synchronization and by applying all product updates to the servers. The replicated environment must be identical to the production environment.
In case of a major failure in the production data center, all operations fail over to the disaster recovery data center. All the configurations that you make on the servers are stored in the database through centralized configuration. When you replicate the database, all data and configurations are replicated with it. To apply the replicated configurations in the disaster recovery data center, you must use exactly the same configuration names in both the production data center and the disaster recovery data center.
Use your database vendor's technology or any suitable third-party solution for database replication. Perform only one-way replication of the database. If you perform an operation in the disaster recovery data center while the production data center is up, do not replicate anything back to the production data center.
Disaster recovery deployment requirements
Consider the following requirements while planning disaster recovery for the Remedy ITSM suite:
- The disaster recovery installation must have the same product version and patch level as the production installation.
- If any configuration or file changes (such as applying patches) are made to the production data center, the same changes must be repeated on the disaster recovery data center.
- As the production system is used, all data changes must be replicated to the disaster recovery data center. These changes can be database changes or file system changes, depending on the product in use.
- Replicating data changes imposes additional demands on the resources in the production system. To keep these demands to a minimum, the replication schedule should be carefully considered. If continuous replication is needed, the production system must be given additional resources (CPU and memory) to reduce the performance impact.
- Plan how to recover from the secondary location back to the production data center after a failover event occurs.
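The version and patch-level requirement in the list above amounts to a parity check between the two sites. The following is a minimal sketch in Python, assuming you can collect each site's product version and patch level into a simple record. The `Installation` type, its fields, and the `parity_issues` function are hypothetical names used for illustration only; they are not part of any Remedy API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Installation:
    """Hypothetical record of what is installed at one site."""
    site: str
    product_version: str
    patch_level: str


def parity_issues(production: Installation, recovery: Installation) -> List[str]:
    """Return a description of every mismatch between the two sites."""
    issues = []
    if production.product_version != recovery.product_version:
        issues.append(
            f"product version differs: {production.site} has "
            f"{production.product_version}, {recovery.site} has "
            f"{recovery.product_version}"
        )
    if production.patch_level != recovery.patch_level:
        issues.append(
            f"patch level differs: {production.site} has "
            f"{production.patch_level}, {recovery.site} has "
            f"{recovery.patch_level}"
        )
    return issues
```

An empty list means the two installations match on the fields checked. A check like this could run after every patch cycle, so that drift between the sites is noticed before a failover rather than during one.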
Replication is required at different points to keep the disaster recovery data center installation up to date. Each component needs to be ready and available for the disaster recovery installation to start functioning. For example, if the production databases are replicated every three hours, transaction information for the last three hours might not yet exist in the disaster recovery databases, and might need to be copied over and restored manually by using the database transaction logs. Similarly, network changes might need to be put in place manually before the disaster recovery site can take over.
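The three-hour example can be worked through as simple arithmetic. The sketch below uses illustrative timestamps (they are assumptions, not values from a real system) to compute the span of transactions that might be missing from the disaster recovery databases when a failure occurs. The function name `unreplicated_window` is hypothetical.

```python
from datetime import datetime, timedelta


def unreplicated_window(last_replication: datetime, failure: datetime) -> timedelta:
    """Time span whose transactions may be missing at the disaster recovery site."""
    if failure < last_replication:
        raise ValueError("failure time precedes the last completed replication")
    return failure - last_replication


# Illustrative values: replication every three hours, as in the example above.
replication_interval = timedelta(hours=3)
last_replication = datetime(2024, 1, 1, 9, 0)
failure = datetime(2024, 1, 1, 11, 30)

window = unreplicated_window(last_replication, failure)
print(window)  # 2:30:00
```

If replication runs on schedule, the window never exceeds the replication interval, so the interval is a rough upper bound on the data that might have to be restored from transaction logs. This ignores the time the replication job itself takes.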
Disaster recovery sample deployment
The following is a sample deployment option to achieve disaster recovery and auto-failover of Remedy.
The example includes 10 Remedy AR System servers in two locations (sites), along with load balancers. The servers are distributed as five servers each in Site A (active site) and Site B (standby site). For database failover, the example uses the Always On availability groups feature of Microsoft SQL Server.
You must assign rankings to all servers in both your production and failover environments. Servers in the failover site must have a lower ranking, that is, a higher rank number. For example, if you have five servers in the production environment, rank them from 1 to 5, and rank the corresponding servers in the failover site from 6 to 10.
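The ranking scheme described above can be sketched as follows. This only computes the rank numbers; it does not show how the rankings are entered in Remedy. The server names and the `assign_rankings` function are hypothetical.

```python
from typing import Dict, Sequence


def assign_rankings(production: Sequence[str], failover: Sequence[str]) -> Dict[str, int]:
    """Rank production servers first (1..n), then failover servers (n+1..n+m)."""
    ranked = {}
    for rank, server in enumerate(list(production) + list(failover), start=1):
        ranked[server] = rank
    return ranked


# Hypothetical server names for a five-plus-five deployment.
site_a = [f"site-a-ar{i}" for i in range(1, 6)]
site_b = [f"site-b-ar{i}" for i in range(1, 6)]

rankings = assign_rankings(site_a, site_b)
print(rankings["site-a-ar1"], rankings["site-b-ar1"])  # 1 6
```

Listing the production servers first guarantees that every failover server receives a higher rank number than any production server, whatever the size of each site.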
The load balancers have the following rules:
- When all application servers go down at Site A, fail over all Remedy components, applications, and web services to Site B.
- When all web servers go down at Site A, fail over all Remedy components, applications, and web services to Site B.
- When web services or Smart Reporting (or both) fail at Site A, only the failing components fail over to Site B. They do not trigger the failover of applications or web components.
- When both the application servers and the web servers become available again at Site A, fail all Remedy components, applications, and web services back to Site A.
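The four rules can be modeled as a small decision function. The sketch below is only a model of the rules, not load balancer configuration; a real load balancer expresses the same logic through vendor-specific health monitors and pools. The `SiteAHealth` type, the component names, and the `route` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical component names used to illustrate the routing decision.
COMPONENTS = ("applications", "web", "web_services", "smart_reporting")


@dataclass(frozen=True)
class SiteAHealth:
    """Health of Site A as seen by the load balancer (assumed inputs)."""
    app_servers_up: bool      # at least one application server is available
    web_servers_up: bool      # at least one web server is available
    web_services_up: bool
    smart_reporting_up: bool


def route(health: SiteAHealth) -> Dict[str, str]:
    """Return the site ("A" or "B") that should serve each component."""
    # Rules 1 and 2: losing all application servers or all web servers
    # at Site A moves everything to Site B.
    if not (health.app_servers_up and health.web_servers_up):
        return {component: "B" for component in COMPONENTS}

    # Rule 4: with both tiers available, Site A serves everything by default.
    routing = {component: "A" for component in COMPONENTS}

    # Rule 3: web services and Smart Reporting fail over individually,
    # without moving applications or web components.
    if not health.web_services_up:
        routing["web_services"] = "B"
    if not health.smart_reporting_up:
        routing["smart_reporting"] = "B"
    return routing
```

Because the function depends only on the current health of Site A, failback happens as soon as both tiers report available again. A production setup might add a delay or a manual approval step before failing back, to avoid switching sites repeatedly while Site A is unstable.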
The following diagram illustrates the deployment architecture as per this example: