This documentation supports an earlier version of BMC Helix IT Service Management on-premises deployment. To view the documentation for the latest version, select 23.3.04 from the Product version picker.

Recovering after a disaster


When a disaster occurs, restore your data on the secondary cluster. Scaling up the secondary cluster lets you recover from the failure without affecting the services.

Before you begin

  • Make sure that you have prepared your environment for a disaster.
    See Preparing for a disaster.
  • Make sure that the database alias name points to the database host on the secondary cluster, which is replicated from the primary cluster database.
  • Make sure that the DNS for the Service Management applications is switched to the secondary cluster.

Task 1: To restore BMC Helix Platform Common Services data on the secondary cluster

  1. If you had scaled down the application pods on your secondary cluster, perform the following steps to scale them up:
    1. Go to helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-scale
    2. Run the following command:

       ./product_scale.sh up
  2. Go to /helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-configs
  3. In the disaster-recovery.config file, specify the values of the following parameters:

    Parameter      Description                                                                           Example
    BUCKET_NAME    Specify the name of the bucket used to back up data on MinIO.                         BUCKET_NAME=helixdr-backup
    SITE_NAME      Specify the name of the site from where you want to restore data.                     SITE_NAME=India
    NAMESPACE      Specify the namespace where you have installed BMC Helix Platform Common Services.    NAMESPACE=helix-cluster3

  4. To restore data, run the following command: 

    ./disaster-recovery.sh restore 

    After the standby site is operational: 

    • You can fix your production site and then fail back to it. 
    • You can make your standby site your new production site and create another standby site. 
  5. After the data is restored, Victoria Metrics restores the older data if it restarts. To prevent this unintended restoration, run the following commands:

    export FORCE_DISABLE_VM=1

    ./disaster-recovery.sh disable victoriametrics
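
The restore in step 4 depends on the three parameters from step 3, so it can help to confirm that they are set before you run it. The following is a minimal sketch, not part of the BMC tooling: check_dr_config is a hypothetical helper, and it assumes that disaster-recovery.config stores each parameter on its own line as KEY=value, as in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; it is not shipped with the deployment manager.
# Assumption: the config file holds one KEY=value assignment per line.
check_dr_config() {
  local config_file="$1"
  local missing=0
  local key value
  for key in BUCKET_NAME SITE_NAME NAMESPACE; do
    # Extract the value of the last assignment for this key and
    # strip trailing whitespace.
    value="$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$config_file" \
      | tail -n 1 | sed 's/[[:space:]]*$//')"
    if [ -z "$value" ]; then
      echo "Missing value for ${key} in ${config_file}" >&2
      missing=1
    fi
  done
  # Return 0 only when every required parameter has a value.
  return "$missing"
}

# Possible usage from the dr-configs directory (left commented out on purpose):
# check_dr_config disaster-recovery.config && ./disaster-recovery.sh restore
```

The helper only reports whether the values are present. It does not verify that the bucket, site, or namespace actually exist.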

Task 2: To restore BMC Helix Service Management data on the secondary cluster

When a failure occurs in the BMC Helix Service Management primary cluster, you must scale up the secondary cluster so that platform and application components are up and can take over the workload from the primary cluster. To scale up the cluster, use the HELIX_DR pipeline scale up mode.

  1. Log in to the BMC Deployment Engine, which is the Jenkins server.
  2. In your secondary cluster, copy the kubeconfig file located at HELM/<Jenkins node> to the ~/.kube folder.
  3. Log in to the Jenkins server by using the following URL:
    http://<Jenkins server host name>:8080
  4. On the Jenkins server, select the HELIX_ONPREM_DEPLOYMENT pipeline.
  5. In the Build History, select the latest build and click Rebuild.
  6. In the INFRASTRUCTURE section, in the KUBECONFIG_CREDENTIAL parameter, specify the Jenkins credential ID that contains the kubeconfig file for the secondary cluster.
    To find the kubeconfig credential ID, go to http://<jenkinsurl>:8080/credentials.
  7. In the CUSTOMER-INFO section, in the CLUSTER parameter, specify the name of your secondary cluster. To find the cluster name, open the kubeconfig file. The current-context value in the kubeconfig file is the cluster name.
  8. In the INFRA-DEPLOY section, select the SUPPORT_ASSISTANT_TOOL and SIDECAR_SUPPORT_ASSISTANT_FPACK parameters to install the Support Assistant tool.
  9. In the PRODUCT-DEPLOY section, select the HELIX_GENERATE_CONFIG and HELIX_DR options.

    Important

    Do not select the SCALE_DOWN option.

  10. Click Rebuild.
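
Step 7 uses the current-context value from the kubeconfig file as the CLUSTER value. The following is a minimal sketch for reading that value from the command line. kubeconfig_current_context is a hypothetical helper, and the default path ~/.kube/config is an assumption based on the usual kubectl convention; adjust it to the file that you copied in step 2. Where kubectl is installed, kubectl config current-context reports the same value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current-context value from a kubeconfig file.
# Assumption: the file is at ~/.kube/config unless a path is passed in.
kubeconfig_current_context() {
  local kubeconfig="${1:-$HOME/.kube/config}"
  # current-context is a top-level key, so match it only at the start of a
  # line, take the first match, and drop any surrounding quotes.
  sed -n 's/^current-context:[[:space:]]*//p' "$kubeconfig" | head -n 1 | tr -d "\"'"
}

# Possible usage (left commented out on purpose):
# kubeconfig_current_context ~/.kube/config
```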

Task 3: To reindex Full Text Search (FTS)

Elasticsearch is used for FTS, and data is indexed on the Elasticsearch server. The data is backed up and saved in MinIO at scheduled intervals. When a failure occurs between the scheduled intervals, the data for that time span is not indexed. To index this data, you must reindex FTS.

  1. Log in to Mid Tier.
  2. In the AR System Management Console, select AR System Server Group Console > FTS Management.
  3. In the FTS Configuration panel, select Reindex > Server.
  4. Select the Reindex After date based on the last MinIO snapshot.
    For example, if a failure occurs at 2:30 PM and the last MinIO snapshot was created at 2:00 PM, select the date and time of the snapshot that was created at 2:00 PM.


 
