Preparing for a disaster
Disaster recovery deployment process
The following image shows the tasks to deploy BMC Helix Service Management in a secondary cluster for disaster recovery:
Before you begin
Verify that the following prerequisites are met:
- Production and standby sites must be identical. They must have the same product version and namespace, and must be deployed with the same URLs, except for the MinIO URLs (MINIO_API_LB_HOST and MINIO_LB_HOST).
- Production and standby sites should have the same resources.
- The namespaces in your secondary cluster are the same as in your primary cluster.
- BMC Helix Service Management is installed in your primary cluster.
- A cluster for disaster recovery (your secondary cluster) is set up.
- BMC Helix Platform Common Services is installed in both your primary and secondary clusters.
- If your primary and secondary clusters use different kubeconfig files, separate kubeconfig credentials are added in Jenkins for your primary and secondary clusters.
- The database alias name is the same for the database server in the primary and secondary clusters.
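The first prerequisite (identical sites, except for the two MinIO URLs) can be spot-checked by comparing the configuration files of both sites. The following is a minimal Python sketch, not part of the product tooling. It assumes that the files being compared (for example, the infra.config file of each site) use plain KEY=VALUE lines; the function names are hypothetical.

```python
# Keys that are allowed to differ between the production and standby sites.
ALLOWED_TO_DIFFER = {"MINIO_API_LB_HOST", "MINIO_LB_HOST"}


def parse_config(text):
    """Parse KEY=VALUE lines; blank lines and # comments are skipped (assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def config_drift(primary_text, standby_text):
    """Return {key: (primary_value, standby_value)} for every unexpected difference."""
    primary = parse_config(primary_text)
    standby = parse_config(standby_text)
    drift = {}
    for key in sorted(set(primary) | set(standby)):
        if key in ALLOWED_TO_DIFFER:
            continue  # the MinIO URLs are expected to differ
        if primary.get(key) != standby.get(key):
            drift[key] = (primary.get(key), standby.get(key))
    return drift
```

An empty result means that no unexpected differences were found. A key that exists on only one site is reported with None for the other site.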
Helix disaster recovery pipeline modes
Use the disaster recovery deployment pipeline, HELIX_DR, to deploy BMC Helix Service Management in a secondary cluster. The HELIX_ONPREM_DEPLOYMENT pipeline runs the HELIX_DR pipeline during deployment.
The disaster recovery deployment pipeline, HELIX_DR, provides the following modes:
- Scale down—In this mode, the HELIX_DR pipeline deploys BMC Helix Innovation Suite and Service Management application components with zero replicas. Use this mode to synchronize the components in the primary and secondary clusters.
- Scale up—In this mode, the HELIX_DR pipeline deploys BMC Helix Innovation Suite and Service Management application components with replicas in the deployment input configuration file. Use this mode when a disaster occurs in the primary cluster.
Task 1: To add the HELIX_DR pipeline to the Jenkins server
- Log in to the Jenkins server by using the following URL:
http://<Jenkins server host name>:8080
- On the Jenkins home page, click New Item.
- In the Enter an item name field, enter HELIX_DR.
- Select Pipeline and click OK.
- Click the Pipeline tab.
- Enter the following information:
  - From the Definition list, select Pipeline script from SCM.
  - From the SCM list, select Git.
  - Enter the Repository URL as the path of your local Git repository in the format ssh://git@<jenkins_server>/<path to itsm-on-premise-installer.git>.
  Example: ssh://git@<Jenkins server host name>/home/git/Git_Repo/ITSM_REPO/itsm-on-premise-installer.git
  - Enter the Git server credentials.
  - Specify the script path as pipeline/jenkinsfile/HELIX_DR.jenkinsfile.
- Click Apply and then Save.
After the pipeline is created, make sure that the pipeline is selected from the Jenkins home page.
- Click Build Now.
The first build job fails because it needs to run once to load all the parameters of the pipeline script.
- After the build job fails, select the pipeline name again from the Jenkins home page.
The Build Now option changes to Build With Parameters.
Task 2: To set up BMC Helix Platform Common Services on the secondary cluster
To set up BMC Helix Platform Common Services on the secondary cluster, perform the following steps:
- Configure the data backup on the primary cluster.
- Configure MinIO data replication.
- Validate the data backup.
To configure data backup on the primary cluster
Always configure the first data backup during a period of low load and low activity.
- Log in to the controller from where the Kubernetes cluster is accessible.
- Go to /helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-configs.
In the disaster-recovery.config file, specify the values of the following parameters:
Parameter: BUCKET_NAME
Description: Specify the name of the bucket used to back up data on MinIO.
Important: Use the following conventions while assigning a bucket name:
- It can have 3 to 63 characters.
- It can have lowercase letters, numbers, and hyphens (-).
- It must begin and end with a letter or a number.
- It must not start with xn-- because it might be interpreted as Punycode; for example, xn--bucketname.
- It must not have uppercase letters, periods (.), or underscores (_).
- It must not have hyphens next to periods (.); for example, my-.bucket.com or my.-bucket.
- It must not end with a hyphen or -s3alias (-s3alias is reserved for the MinIO bucket access point alias name).
Example: BUCKET_NAME=helixdr-backup
Parameter: SITE_NAME
Description: Specify a name to identify the site from where you want to back up data.
Example: SITE_NAME=India
Parameter: NAMESPACE
Description: Specify the namespace where you have installed BMC Helix Platform Common Services.
Example: NAMESPACE=helix-cluster3
Parameter: DR_BACKUP_INTERVAL_IN_HOUR
Description: Specify the backup interval in hours. You can set a value from 1 to 24.
We recommend that you set the value of this parameter based on the size of your data.
The value that you specify defines the interval for backing up your data.
For example, if you set the value of this parameter to 1, data backup is performed at the start of every hour as per the cron schedule (0 */1 * * *).
If your current cluster time is 2:15 P.M. on November 2:
- The first backup will occur at 3:00 P.M. This will be a complete data backup.
- Subsequent backups will occur at 4:00 P.M., 5:00 P.M., 6:00 P.M., and so on. These will be incremental backups.
- At 3:00 P.M. on November 3, a complete data backup will occur.
For example, if you set the value of this parameter to 3, data backup is performed at the start of every third hour as per the cron schedule (0 */3 * * *).
If your current cluster time is 2:15 P.M. on November 2:
- The first backup will occur at 3:00 P.M. This will be a complete data backup.
- Subsequent backups will occur at 6:00 P.M., 9:00 P.M., and so on. These will be incremental backups.
- At 3:00 P.M. on November 3, a complete data backup will occur.
Important:
- For Victoria Metrics, the default interval is one hour. You cannot modify the default interval.
- The following application data and configurations will be lost during the backup intervals:
  - BMC Helix ITSM Insights: PPM jobs created and real-time incident correlations.
  - BMC Helix Dashboards: Reports created or modified and report schedules.
  - BMC Helix Single Sign-On: Data or configuration changes.
Example: DR_BACKUP_INTERVAL_IN_HOUR=1
Parameter: DR_MAX_BACKUP_TO_RETAIN
Description: Specify the number of days for which you want to retain the backed-up data.
Example: DR_MAX_BACKUP_TO_RETAIN=5
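Before you run the backup, it can help to check the values in the disaster-recovery.config file against the rules listed for each parameter. The following Python sketch is an illustration only and is not shipped with the product. It assumes that the file contains plain KEY=VALUE lines, and the function names are hypothetical. It checks that all five parameters are set, applies the bucket naming conventions, and checks the 1 to 24 range of the backup interval.

```python
import re

REQUIRED = ("BUCKET_NAME", "SITE_NAME", "NAMESPACE",
            "DR_BACKUP_INTERVAL_IN_HOUR", "DR_MAX_BACKUP_TO_RETAIN")

# 3 to 63 characters, only a-z, 0-9, and hyphens, alphanumeric first and last.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def bucket_name_problems(name):
    """Return a list of naming-convention violations for a bucket name."""
    problems = []
    if not BUCKET_RE.fullmatch(name):
        problems.append("must be 3-63 characters of a-z, 0-9, or '-', "
                        "beginning and ending with a letter or number")
    if name.startswith("xn--"):
        problems.append("must not start with xn--")
    if name.endswith("-s3alias"):
        problems.append("must not end with -s3alias")
    return problems


def parse_config(text):
    """Parse KEY=VALUE lines; blank lines and # comments are skipped (assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def config_problems(text):
    """Return human-readable problems found in disaster-recovery.config text."""
    values = parse_config(text)
    problems = [f"{key} is not set" for key in REQUIRED if not values.get(key)]
    if values.get("BUCKET_NAME"):
        problems += [f"BUCKET_NAME {p}"
                     for p in bucket_name_problems(values["BUCKET_NAME"])]
    interval = values.get("DR_BACKUP_INTERVAL_IN_HOUR", "")
    if interval and not (interval.isdecimal() and 1 <= int(interval) <= 24):
        problems.append("DR_BACKUP_INTERVAL_IN_HOUR must be a whole number from 1 to 24")
    return problems
```

With the example values from the table (helixdr-backup, India, helix-cluster3, 1, and 5), config_problems returns an empty list.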
To back up data, run the following command:
./disaster-recovery.sh backup
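The run times that follow from the DR_BACKUP_INTERVAL_IN_HOUR value can be previewed with a short Python sketch. It only models the cron expression 0 */N * * * quoted in the parameter description (minute 0 of every hour that is divisible by N). It does not query the cluster and does not determine whether a run is a complete or an incremental backup. The function name and the year used in the comment are arbitrary.

```python
from datetime import datetime, timedelta


def next_backup_times(now, interval_hours, count=4):
    """Return the next `count` fire times of the cron schedule `0 */N * * *`."""
    if not 1 <= interval_hours <= 24:
        raise ValueError("interval must be from 1 to 24")
    times = []
    # Start at the next full hour strictly after `now`.
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while len(times) < count:
        # */N in the hour field matches hours 0, N, 2N, ... below 24.
        if candidate.hour % interval_hours == 0:
            times.append(candidate)
        candidate += timedelta(hours=1)
    return times


# Example: cluster time 2:15 P.M. on November 2, interval of 3 hours.
# next_backup_times(datetime(2023, 11, 2, 14, 15), 3, count=3)
```

Because the hour counter restarts at midnight, an interval that does not divide 24 evenly (for example, 5) produces a shorter gap between the last run of one day and the first run of the next day.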
To configure MinIO data replication from the primary cluster to the secondary cluster
Data replication is always from the MinIO on the primary site to the MinIO on the standby site. Typically, port 443 must be open on the standby site unless you decide to configure a different port on the standby site for MinIO.
- Log on to MinIO by using the URL that you set for the parameter MINIO_LB_HOST in the infra.config file.
Use the credentials you set during the deployment of BMC Helix Platform Common Services.
- On the production site MinIO console, under Administrator, select Buckets.
- From the list of buckets, select the backup bucket.
In the disaster-recovery.config file, you must have set a name for the backup bucket by using the BUCKET_NAME parameter; for example, helixdr-backup.
- To enable versioning, perform the following steps:
  - Click the pencil icon.
  - In the Versioning on Bucket dialog box, click Enable.
  Current Status changes from Unversioned to Versioned.
- To create a restore bucket and enable versioning, perform the following steps:
  - On the standby site MinIO console, under Administrator, select Buckets.
  - At the top-right corner of the console, click Create Bucket.
  - In the Bucket Name box, type a name to identify the restore bucket; for example, helixdr-restore.
  - To enable versioning, turn on the Versioning toggle.
  - Click Create Bucket.
- Add a replication rule to replicate data from the backup bucket (on the production site MinIO) to the restore bucket (on the standby site MinIO):
  - On the production site MinIO console, under Administrator, select Buckets.
  - From the list of buckets, select the backup bucket; for example, helixdr-backup.
  - Go to the Replication tab and click the Add Replication Rule button.
  - In the Set Bucket Replication dialog box, enter the following values:
    - Target URL - The API endpoint of the MinIO on the standby site.
    This is the URL that you set for the parameter MINIO_API_LB_HOST in the infra.config file.
    - Access Key - User name to access the standby site MinIO.
    - Secret Key - Password to access the standby site MinIO.
    - Target Bucket - Name of the restore bucket; for example, helixdr-restore.
  - Leave the other values at their defaults and click Save to set the replication rule.
After you set the replication rule, data from the production site MinIO is replicated to the standby site MinIO.
Logs are saved in the helix-on-prem-deployment-manager/logs directory. You can check the logs to monitor the progress, success, or failure of the data backup process.
To validate the data backup
- Log in to the controller.
- To confirm that the cron jobs were successfully configured, run the following command:
kubectl -n <itom namespace> get cronjob | grep dr-
- After the backup jobs are completed on the controller machine, log in to the MinIO web console and go to the Object Browser.
- Go to the bucket that is configured to back up data; for example, helixdr-backup.
- Open the site folder (<SiteName>; for example, India) where you are backing up your data, and then open the backupStatus folder.
- Open the backup.log file and validate that there are no errors.
Sample output for a successful backup:
Backup Start Time : 2023-10-9 4:00.
Backup End Time : 2023-10-09 04:08:30.012419
- Download and open the last_backup.json file and validate that there are no errors.
Sample output for a successful backup:
{"EVENTES": "{\"helixdr-backupeventes-1696766450\":\"fd02977e-032c-4898-a602-fac5684a64af\"}", "LOGES": "{\"helixdr-backuploges-1696766451\":\"9a3f0ee0-4b7e-4884-84a3-e7c3ebddec95\"}", "KAFKA": "20231009-040016", "VM": "hourly/2023-10-09:03", "VMAGG": "hourly/2023-10-09:03", "PG": "20231008-080003F_20231009-040004I", "ZOOKEEPER": "20231009-040014", "K8OBJS": "20231009-040005", "DRSREPO": "20231009-040005", "MINIO": "20231009-040004"}
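The check of the last_backup.json file can also be scripted. The following Python sketch reports components that have no entry in the downloaded file. The list of component keys is taken only from the sample output above and is therefore an assumption; a real deployment might report a different set. The function names are hypothetical.

```python
import json

# Component keys as they appear in the sample last_backup.json (assumed complete).
EXPECTED_COMPONENTS = ("EVENTES", "LOGES", "KAFKA", "VM", "VMAGG", "PG",
                       "ZOOKEEPER", "K8OBJS", "DRSREPO", "MINIO")


def missing_components(json_text):
    """Return the expected components that are absent or empty in last_backup.json."""
    status = json.loads(json_text)
    return [name for name in EXPECTED_COMPONENTS if not status.get(name)]


def snapshot_ids(encoded_value):
    """Decode the EVENTES or LOGES value, which is a JSON object stored as a string."""
    return json.loads(encoded_value)
```

For the sample output, missing_components returns an empty list, and snapshot_ids applied to the EVENTES value returns a dictionary with one snapshot name mapped to its ID.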
(Optional) To scale down the standby site
To save resources, after configuring disaster recovery, you can scale down the application pods and keep only the data lake components running on the standby site.
Perform the following steps:
- Go to helix-on-prem-deployment-manager/utilities/disaster-recovery/dr-scale.
- Run the following command:
./product_scale.sh down
Data from the production site MinIO continues to be replicated to the standby site MinIO, but the applications do not run.
Task 3: To deploy BMC Helix Innovation Suite on the secondary cluster
Deploy BMC Helix Service Management on the secondary cluster by using the HELIX_DR pipeline scale down mode.
- Log in to the BMC Deployment Engine, which is the Jenkins server.
- In your secondary cluster, copy the kubeconfig file located at HELM/<Jenkins node> to the ~/.kube folder.
- Log in to the Jenkins server by using the following URL:
http://<Jenkins server host name>:8080
- On the Jenkins server, select the HELIX_ONPREM_DEPLOYMENT pipeline.
- In the Build History, select the latest build and click Rebuild.
- In the INFRASTRUCTURE section, in the KUBECONFIG_CREDENTIAL parameter, specify the Jenkins credential ID that contains the kubeconfig file for the secondary cluster.
To find the kubeconfig credential ID, go to http://<jenkinsurl>:8080/credentials.
- In the CUSTOMER-INFO section, in the CLUSTER parameter, specify the name of your secondary cluster.
Find the cluster name in the kubeconfig file. The current-context value in the kubeconfig file is the cluster name.
- Clear the SUPPORT_ASSISTANT_TOOL check box.
- In the PRODUCT-DEPLOY section, select the HELIX_GENERATE_CONFIG, HELIX_DR, and SCALE_DOWN options.
- Click Rebuild.
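The CLUSTER parameter in the steps above takes the current-context value from the kubeconfig file. Running kubectl config current-context prints that value. As an alternative for a machine without kubectl, the following Python sketch reads it with a simple line match instead of a full YAML parser, so it assumes that current-context appears as an unindented key on a single line. The function name and the cluster name in the comment are hypothetical.

```python
import re


def current_context(kubeconfig_text):
    """Return the top-level current-context value from kubeconfig text, or None."""
    match = re.search(r"^current-context:(.*)$", kubeconfig_text, re.MULTILINE)
    if not match:
        return None
    # Drop surrounding whitespace and optional quotes around the value.
    value = match.group(1).strip().strip("\"'")
    return value or None


# Example with a hypothetical cluster name:
# current_context("apiVersion: v1\ncurrent-context: dr-cluster\nkind: Config\n")
```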
Where to go from here