

Important

This documentation space contains information about the on-premises version of BMC Helix Discovery. If you are using the SaaS version of BMC Helix Discovery, see BMC Helix Discovery (SaaS).

Creating a disaster recovery cluster


The BMC Discovery Disaster Recovery (DR) solution is a lightweight utility that provides disaster recovery capabilities across clusters of supported versions of BMC Discovery. It runs on a cluster of appliances, makes incremental backups at midnight, and synchronizes those backups and a zip archive of configuration files to a standby cluster of BMC Discovery appliances. Once the transfer has started, the system checks every 15 minutes to make sure that the transferred data is still up to date. 

If the source cluster experiences a loss of service, an operator can switch to the standby cluster using the synchronized backups and restore services. 

Setting up the DR solution

The BMC Discovery DR solution was introduced with Technology Knowledge Update TKU 2025-Aug-1. The DR solution utility automates the use of external tools and consists of the following files (an upload example follows the list):

  • dr_discovery.py
  • tw_dr_discovery
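
The utility must be present on each appliance where you run it. The following is a minimal upload sketch using sftp; it assumes the destination paths used later in this topic (/usr/tideway/bin/ for tw_dr_discovery and /usr/tideway/utils/ for dr_discovery.py) and that both files are in your current directory:

# Upload the DR utility files to an appliance (repeat for each appliance, for example a1, a2, and a3).
echo 'put tw_dr_discovery' | sftp tideway@a1.company.com:/usr/tideway/bin/
echo 'put dr_discovery.py' | sftp tideway@a1.company.com:/usr/tideway/utils/
# If the utility is not executable after the upload, set the permission on the appliance:
#   chmod +x /usr/tideway/bin/tw_dr_discovery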

Setting up a DR user

The DR solution requires a dedicated BMC Discovery user with a small set of privileges.

To create it, log in to the BMC Discovery UI and create a new group with the following permissions:

  • model/datastore/admin
  • system/settings/read
  • system/settings/write

After that, create a user and assign it to the new group.

The following images show the group and user pages:

[Image: DR-user-1.png – the DR group configuration]

[Image: DR-user-2.png – the DR user configuration]

You need to log in as that user while configuring DR.

Clusters

The following diagram shows the cluster pairing that this procedure sets up: each source cluster appliance (a1.company.com, a2.company.com, and a3.company.com) is paired with a standby cluster appliance (b1.company.com, b2.company.com, and b3.company.com respectively).

[Image: DR-clusters.png – source and standby cluster pairing]

BMC Discovery Outposts

All BMC Discovery Outposts must be registered with both source and destination clusters. 

If you already use BMC Discovery Outposts, they are already connected to your source cluster, so you only need to register them with the destination cluster.

Having all BMC Discovery Outposts registered on both clusters allows them to reconnect to the destination cluster after restore and switch over.

Important

If you skip this step, the BMC Discovery Outposts cannot reconnect to the destination cluster after a restore, and all of the credentials stored on them are lost; you must re-enter them manually.

Time synchronization

Time synchronization settings and status are not included in the backup and restore process. To maintain consistency across clusters, ensure that the secondary cluster is manually configured to match the time synchronization settings of the primary cluster.
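
As a quick check, the following standard Linux commands, run on an appliance in each cluster, show whether the clock is synchronized and which time sources are configured. This is only a sketch; the availability of chrony depends on the appliance OS, and time settings are normally managed through the appliance administration UI:

timedatectl status      # shows whether NTP is enabled and the system clock is synchronized
chronyc sources -v      # lists the configured time sources, if chrony is in use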

CyberArk integration

BMC Discovery provides a user interface to assist with the deployment of the CyberArk AAM service. However, CyberArk AAM is not included in the backup and restore process.

To ensure continued functionality across clusters, make sure that CyberArk AAM is installed and configured on the secondary cluster to match the setup of the primary cluster.

Run the setup

We recommend that you use the screen utility when using any long-running terminal application on a remote host.
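
For example, you can start a named screen session, run the long-running command inside it, and reattach later if your SSH connection drops:

screen -S dr_setup      # start a named session and run the long-running command inside it
# Detach with Ctrl-a d; reattach later with:
screen -r dr_setup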
 
When the upload of the utility files is complete on all of the source cluster appliances, connect to the first appliance, a1.company.com in this example, using SSH and run the setup command:

tw_dr_discovery --setup

Edit the configuration through the setup prompts. The first time you run the setup, the elements of interest are:

  • Email SMTP
  • Email sender
  • Email recipient
  • BMC Discovery username: do not use the system user; use the user that you created in the BMC Discovery UI
  • Secondary cluster member: the hostname of your secondary cluster appliance, b1.company.com in this example
  • Save the configuration

Continue through the setup and answer "y" to each of the following prompts:

  • Test the SSH connection to the secondary member
  • Create the configuration archive
  • Display the generations
    • Change the datastore mode to generational
    • Create a new generation (that is, an incremental backup)
  • Test the email address
  • Register the cron job for the backup
  • Register the cron job for the sync

Your setup should be complete and working at this point.

If not, run tw_dr_discovery --setup again to test and adjust settings until the DR solution works as expected.

Repeat the setup on the other source cluster appliances

The previous section set up the appliance a1.company.com; now repeat the steps for a2.company.com and a3.company.com.

To do this:

  • Download the etc/dr_discovery.json file generated by the setup on a1.company.com.
  • Edit the hostname, changing it from b1.company.com to b2.company.com, then upload the JSON file to the a2.company.com appliance.
  • Change the hostname from b2.company.com to b3.company.com, and upload the JSON file to the a3.company.com appliance.

The following PowerShell example downloads the dr_discovery.json file from a1.company.com, replaces the hostname, and uploads the edited file to the remaining appliances of the source cluster.

# Download the configuration file generated by the setup on a1.company.com.
sftp tideway@a1.company.com:/usr/tideway/etc/dr_discovery.json

# Point the copy at b2.company.com and upload it to a2.company.com.
((Get-Content -Path .\dr_discovery.json) -replace 'b1.company.com','b2.company.com') | Set-Content -Path .\dr_discovery.json
echo 'put dr_discovery.json' | sftp tideway@a2.company.com:/usr/tideway/etc/

# Point the copy at b3.company.com and upload it to a3.company.com.
((Get-Content -Path .\dr_discovery.json) -replace 'b2.company.com','b3.company.com') | Set-Content -Path .\dr_discovery.json
echo 'put dr_discovery.json' | sftp tideway@a3.company.com:/usr/tideway/etc/
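
If you prefer to perform the same steps from a Linux shell, the following sketch is equivalent; it assumes the same hostnames used in this topic and that dr_discovery.json is in your current directory:

# Download the configuration file generated by the setup on a1.company.com.
sftp tideway@a1.company.com:/usr/tideway/etc/dr_discovery.json

# Point the copy at b2.company.com and upload it to a2.company.com.
sed -i 's/b1\.company\.com/b2.company.com/' dr_discovery.json
echo 'put dr_discovery.json' | sftp tideway@a2.company.com:/usr/tideway/etc/

# Point the copy at b3.company.com and upload it to a3.company.com.
sed -i 's/b2\.company\.com/b3.company.com/' dr_discovery.json
echo 'put dr_discovery.json' | sftp tideway@a3.company.com:/usr/tideway/etc/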

Now connect to a2.company.com using SSH and run the setup:

tw_dr_discovery --setup

This time, you are only interested in the following elements of the setup:

  • Test the SSH connection to the secondary member
  • Create the configuration archive
  • Register the cron job for the backup
  • Register the cron job for the sync

Now connect to a3.company.com using SSH and run the setup:

tw_dr_discovery --setup

This time, you are only interested in the following elements of the setup:

  • Test the SSH connection to the secondary member
  • Create the configuration archive
  • Register the cron job for the backup
  • Register the cron job for the sync

The setup phase is complete. The solution creates an incremental backup every night at midnight and runs a sync every 15 minutes.
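
You can confirm that the cron jobs are registered by listing the tideway user's crontab on each source cluster appliance; the setup registers one nightly backup job and one sync job that runs every 15 minutes (the exact entries are written by the setup utility):

crontab -l | grep tw_dr_discovery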

Switch over

The BMC Discovery DR solution supports a manual switch over to the secondary cluster, using the most recent data synchronized from the primary cluster, if the primary becomes unresponsive.
  

If the primary cluster is still operational

Before initiating the switch over, it is crucial to stop data synchronization from the primary to the secondary cluster to prevent conflicts or data inconsistencies.

  • On each member of the primary cluster, run the following command:
    tw_dr_discovery --setup

  • When prompted:
     Do you want to disable the sync cron? [y/N]:
    Respond with y to disable the synchronization cron job.

Restore on the secondary cluster

We recommend that you use the screen utility when using any long-running terminal application on a remote host. 

  • Connect to any appliance in the secondary cluster (e.g., b1.company.com) via SSH.
  • Run the restore command:
    tw_dr_discovery --restore

  • Follow the interactive prompts:

    • The utility will connect to other secondary appliances via SSH.
    • SSH passwords will be requested (not stored); SSH keys will be used afterward.
    • A list of available backups will be displayed (most recent at the top).
    • Confirm the backup to restore

Automated restore steps

Once the backup is selected, the following steps will be executed automatically:

  • Stop services
  • Back up the current configuration and data
  • Deploy configuration files
  • Deploy the selected backup
  • Repeat the same steps on all other appliances in the secondary cluster

Restart services

After the restore process is complete, services are restarted across the secondary cluster.

DNS update

To complete the switch over, you must update the DNS records to point the addresses from the source cluster to the newly restored cluster. This is necessary to:

  • Ensure that BMC Discovery Outposts reconnect to the new active cluster.
  • Redirect user and system traffic to the restored environment.
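
Once the DNS change has propagated, a quick lookup confirms that the source cluster hostnames now resolve to the secondary cluster appliances; the hostnames below follow the example used in this topic:

# The source cluster name should now resolve to the address of the corresponding secondary appliance.
dig +short a1.company.com
dig +short b1.company.com   # compare the two results; they should match after the DNS update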

When you have completed the switch over and confirmed that the cluster is working correctly and the restored data is satisfactory, we recommend that you clean up the /usr/tideway/var/tideway.db/data/datadir-previous directory (delete it or move it to an off-appliance archive) to reclaim disk space.
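
A minimal cleanup sketch, assuming you have verified the restored data and either want to archive the previous datastore contents off the appliance or simply remove them (choose an archive location with enough free space):

# Check how much space the previous datastore contents are using.
du -sh /usr/tideway/var/tideway.db/data/datadir-previous

# Either archive the directory before removing it (the /tmp path is only an example) ...
tar czf /tmp/datadir-previous.tar.gz -C /usr/tideway/var/tideway.db/data datadir-previous
# ... copy the archive off the appliance, then remove the directory to reclaim the disk space.
rm -rf /usr/tideway/var/tideway.db/data/datadir-previous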

Switch back procedure

After a successful switch over, the previously designated secondary cluster assumes the role of the primary cluster and begins ingesting all new data. At this point, the system treats this cluster as the active primary.

To maintain DR readiness, it is essential to re-establish a secondary cluster. This ensures continued protection and high availability.

Steps to reconfigure DR

  1. Designate a New Secondary Cluster
    Identify a cluster to serve as the new secondary. This may be:

    • The original primary cluster (prior to switch over), or
    • A newly provisioned cluster.
  2. Reconfigure the New Primary Cluster
    On each node of the new primary cluster:

    • Execute the DR setup procedure.
    • Update the node’s role to "Primary".
    • Configure the DR target by specifying the address of a node in the new secondary cluster.

This process re-establishes the DR topology and ensures that the new primary cluster is protected by a designated secondary cluster.

Limitations

This section describes known limitations.

Data integrity

The incremental backups are automatically generated every 24 hours.

The sync runs every 15 minutes.

The system only transfers read-only data, which means that the data currently being written on the source cluster is not transferred until it becomes part of the next day's incremental backup.

In a typical scenario, the data available on the secondary cluster is one day old.

The list of available backups displayed during the restore procedure might be older than the previous day, especially if an appliance from the source cluster fails to sync. Backups must be successfully synchronized to all members before they are available for restore.

Manual testing

The backup and sync commands are intended to be run by cron. However, you can run them manually for troubleshooting, or to shorten the interval between backups so that more recent data is available to restore.

tw_dr_discovery --backup

Important

A 35-minute flush period is enforced after a backup. Any backup or sync operation triggered during this window is ignored.

tw_dr_discovery --sync

If you see messages during a restore stating that expected backup generations are missing, it is likely that the sync ran before the flush completed and that a file transfer was skipped to avoid corruption.

This issue is resolved at the next sync, and the restore can run again.

Rollback

If the restore procedure encounters an error, it rolls back the data modifications to a state from which you can rerun the restore.

Important

The services do not restart if a rollback occurs. You must manually verify and rerun the restore process.

Troubleshooting

The following section describes issues that you might encounter while configuring and using the DR solution.

Log file

The utility writes a log file, /usr/tideway/log/tw_dr_discovery.log, that you can use for troubleshooting.
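
For example, to follow the log while a backup, sync, or restore is running:

tail -f /usr/tideway/log/tw_dr_discovery.log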

bad interpreter

If you uploaded the files from a Windows host and see the following error:

$ tw_dr_discovery
-bash: /usr/tideway/bin/tw_dr_discovery: /bin/sh^M: bad interpreter: No such file or directory

The files probably have Windows (CRLF) line endings. To fix them, run the following command on the appliance as the tideway user:

$ dos2unix ~/bin/tw_dr_discovery ~/utils/dr_discovery.py
dos2unix: converting file /usr/tideway/bin/tw_dr_discovery to Unix format...
dos2unix: converting file /usr/tideway/utils/dr_discovery.py to Unix format...

Then tw_dr_discovery will run correctly.

The system user is blocked

The system user becomes blocked after too many incorrect password attempts.

To check whether the system user is blocked, connect to the appliance as the tideway user using SSH, and then run the following:

$ tw_listusers
...
system:
    ...
    fullname: System User
    ...
    auth failures: 12
    user state: USER_STATE_BLOCKED
        reason: "Too many authentication failures (12 attempts) at [snip]"
    ...

The system user is blocked. To unblock it, run:

$ tw_upduser --active system
Set User State USER_STATE_ACTIVE

This action unblocks the system user, and you can log in again.

 

 
