Recovering after a disaster
Before you begin
- Make sure that you have prepared your environment for a disaster.
See Preparing-for-a-disaster.
- Make sure that the database alias name points to the database host on the secondary cluster, which is replicated from the primary cluster database.
- Make sure that the DNS for the Service Management applications is switched to the secondary cluster.
Task 1: To restore BMC Helix Platform Common Services data on the secondary cluster
- If you scaled down the application pods on your secondary cluster, perform the following steps to scale them up:
- Go to helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-scale.
Run the following command:
./product_scale.sh up
- Go to /helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-configs.
In the disaster-recovery.config file, specify the values of the following parameters:
| Parameter | Description | Example |
|---|---|---|
| BUCKET_NAME | Specify the name of the bucket used to back up data on MinIO. | BUCKET_NAME=helixdr-backup |
| SITE_NAME | Specify the name of the site from where you want to restore data. | SITE_NAME=India |
| NAMESPACE | Specify the namespace where you have installed BMC Helix Platform Common Services. | NAMESPACE=helix-cluster3 |
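Before running the restore, it can help to confirm that all three parameters are set. The following is a minimal sketch, not part of the BMC utilities: the function name check_dr_config is hypothetical, and it assumes the file holds one KEY=VALUE pair per line, as the examples above suggest.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the deployment manager): check that
# BUCKET_NAME, SITE_NAME, and NAMESPACE each have a non-empty value.
# Assumption: the config file uses one KEY=VALUE pair per line.
check_dr_config() {
  config_file="$1"
  missing=0
  for key in BUCKET_NAME SITE_NAME NAMESPACE; do
    # Require KEY= at the start of a line, followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$config_file"; then
      echo "Missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_dr_config disaster-recovery.config from the dr-configs directory returns a nonzero status and names the parameter if any of the three values is absent.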
To restore data, run the following command:
./disaster-recovery.sh restore
After the standby site is operational:
- You can fix your production site and then fail back to it.
- You can make your standby site your new production site and create another standby site.
After data restoration, if Victoria Metrics restarts, it restores the older data. To prevent this unintended restoration, run the following commands:
export FORCE_DISABLE_VM=1
./disaster-recovery.sh disable victoriametrics
Task 2: To restore BMC Helix Service Management data on the secondary cluster
When a failure occurs in the BMC Helix Service Management primary cluster, you must scale up the secondary cluster so that platform and application components are up and can take over the workload from the primary cluster. To scale up the cluster, use the HELIX_DR pipeline scale up mode.
- Log in to BMC Deployment Engine, which is the Jenkins server.
- In your secondary cluster, copy the kubeconfig file located at HELM/<Jenkins node> to the ~/.kube folder.
- Log in to the Jenkins server by using the following URL:
http://<Jenkins server host name>:8080
- On the Jenkins server, select the HELIX_ONPREM_DEPLOYMENT pipeline.
- In the Build History, select the latest build and click Rebuild.
- In the INFRASTRUCTURE section, in the KUBECONFIG_CREDENTIAL parameter, specify the Jenkins credential ID that contains the kubeconfig file for the secondary cluster.
To find the kubeconfig credential ID, go to http://<jenkinsurl>:8080/credentials.
- In the CUSTOMER-INFO section, in the CLUSTER parameter, specify the name of your secondary cluster.
To find the cluster name, check the kubeconfig file. The current-context value in the kubeconfig file is the cluster name.
- Select the SUPPORT_ASSISTANT_TOOL and SIDECAR_SUPPORT_ASSISTANT_FPACK parameters to install the Support Assistant tool.
In the PRODUCT-DEPLOY section, select the HELIX_GENERATE_CONFIG and HELIX_DR options.
- Click Rebuild.
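The CLUSTER value used in the steps above comes from the current-context entry of the kubeconfig file. As a rough sketch, the value could be read from the file as follows. The function name kubeconfig_context is hypothetical, and the sketch assumes that current-context is a top-level key on its own line.

```shell
#!/bin/sh
# Hypothetical helper: print the current-context value from a kubeconfig file.
# Assumption: current-context is a top-level key on a single line.
# If kubectl is installed, this should give the same value:
#   kubectl --kubeconfig "$file" config current-context
kubeconfig_context() {
  # Keep only the value after the key, take the first match,
  # and strip optional single or double quotes around it.
  sed -n 's/^current-context:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'"
}
```

For example, kubeconfig_context ~/.kube/config prints the name to enter in the CLUSTER parameter, provided that the copied kubeconfig file is saved under that name.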
Task 3: To reindex Full Text Search (FTS)
Elasticsearch is used for FTS, and data is indexed on the Elasticsearch server. The data is backed up and saved in MinIO at scheduled intervals. When a failure occurs between two scheduled backups, the data created after the last backup is not indexed. To index this data, you must reindex FTS.
- Log in to Mid Tier.
- In the AR System Management Console, select AR System Server Group Console > FTS Management.
- In the FTS Configuration panel, select Reindex > Server.
- Select the Reindex After date based on the last MinIO snapshot.
For example, if a failure occurs at 2:30 PM and the last MinIO snapshot was created at 2:00 PM, select the snapshot that was created at 2:00 PM.
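The rule for choosing the Reindex After date is to take the latest snapshot that is not later than the failure. The following sketch illustrates that rule only; it does not query MinIO. The function name last_snapshot_before and the timestamps are hypothetical, and the timestamps are assumed to use the form YYYY-MM-DDTHH:MM so that they sort correctly as text.

```shell
#!/bin/sh
# Hypothetical helper: given a failure time and a list of snapshot times,
# print the latest snapshot time that is not later than the failure time.
# Assumption: all timestamps use YYYY-MM-DDTHH:MM, which sorts as plain text.
last_snapshot_before() {
  failure_time="$1"
  shift
  printf '%s\n' "$@" |
    # Appending "" forces a string comparison in awk.
    awk -v limit="$failure_time" '($0 "") <= (limit "")' |
    sort |
    tail -n 1
}

# Failure at 14:30, snapshots at 13:00, 14:00, and 15:00.
last_snapshot_before 2024-05-01T14:30 \
  2024-05-01T13:00 2024-05-01T14:00 2024-05-01T15:00
# prints 2024-05-01T14:00
```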