Disaster recovery deployment
Disaster recovery overview
Component | Description |
---|---|
Primary site | The primary site is your data center, which hosts all your live systems. |
Standby site | The standby site is the disaster recovery system. You must create and maintain this site. |
Failover | Failover is the process of switching to the standby site when a disaster occurs in the primary site. |
Failback | Failback is the process of transferring control back to the primary site to take over the functionality from the standby site. |
To configure disaster recovery, you need MinIO and the disaster recovery utility, which is provided with the installer and located at helix-on-prem-deployment-manager/utilities/disaster-recovery.
You must run the disaster recovery utility with the backup option to back up data and with the restore option to restore data. You can configure the backup frequency and the data retention period.
After you configure disaster recovery, data is backed up to the MinIO instance on the primary site and then replicated to the MinIO instance on the standby site at scheduled intervals.
For more information about MinIO bucket replication, see Bucket Replication in the MinIO documentation.
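To illustrate how the backup frequency and the data retention period interact, the following minimal Python sketch lists which backups remain inside a retention window. It does not call the disaster recovery utility or MinIO. The function name `backups_to_keep`, the 2-hour frequency, and the 6-hour retention period are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta


def backups_to_keep(backup_times, retention, now):
    """Return the backups still inside the retention window, oldest first.

    Hypothetical helper for illustration; not part of the disaster
    recovery utility.
    """
    cutoff = now - retention
    return sorted(t for t in backup_times if t >= cutoff)


# Assumed schedule: one backup every 2 hours over the last 24 hours.
now = datetime(2024, 1, 2, 0, 0)
schedule = [now - timedelta(hours=2 * i) for i in range(12)]

# Assumed retention period: 6 hours.
kept = backups_to_keep(schedule, timedelta(hours=6), now)
for t in kept:
    print(t.isoformat())
```

With these assumed values, only the backups taken in the last 6 hours are kept; a shorter backup interval or a longer retention period increases the number of backups held in MinIO.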
Options to fail back to the primary site
Failback transfers control back to the primary site so that it takes over the functionality from the standby site. You can fail back in one of the following ways:
- Option 1: Bring up the primary site and fail back to it
- Option 2: Make the standby site your new primary site
- Option 3: Set up a new primary site and fail back to it
The following image depicts the options to fail back:
To fail back to the primary site
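Perform the following steps. In these steps, Site A is the original primary site, Site B is the standby site that is currently active, and Site C is a newly created site.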
- Configure data backup on Site B.
  For more information about configuring data backup, see "To configure data backup on the primary site" in the Configuring-disaster-recovery topic.
- Perform either of the following actions:
  - If you chose option 1 or 2, bring up Site A.
  - If you chose option 3, create a new site (Site C).
- Perform either of the following actions:
  - If you chose option 1 or 2, replicate data from Site B to Site A.
  - If you chose option 3, replicate data from Site B to Site C.
  For more information, see "To configure MinIO data replication from the primary site to the standby site" in the Configuring-disaster-recovery topic.
- Perform either of the following actions:
  - If you chose option 1, restore data on Site A.
  - If you chose option 3, restore data on Site C.
  For more information, see Restoring-data-on-the-standby-site.
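The branching in the steps above can be summarized in a short Python sketch that returns the ordered steps for each failback option. It is only a reading aid, not a tool shipped with the product: the function name `failback_steps` is hypothetical, and the sketch assumes that option 2 has no restore step because the procedure lists a restore action only for options 1 and 3.

```python
def failback_steps(option):
    """Return the ordered failback steps for option 1, 2, or 3.

    Hypothetical helper that mirrors the documented procedure.
    """
    if option not in (1, 2, 3):
        raise ValueError("option must be 1, 2, or 3")

    # Option 3 targets a newly created site; options 1 and 2 reuse Site A.
    target = "Site C" if option == 3 else "Site A"

    steps = ["Configure data backup on Site B"]
    if option == 3:
        steps.append("Create a new site (Site C)")
    else:
        steps.append("Bring up Site A")
    steps.append(f"Replicate data from Site B to {target}")

    # Assumption: the procedure lists no restore action for option 2.
    if option != 2:
        steps.append(f"Restore data on {target}")
    return steps


for number, step in enumerate(failback_steps(3), start=1):
    print(f"{number}. {step}")
```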
RPO and RTO measurements
Recovery Point Objective (RPO) is the time-based measurement of tolerated data loss. Recovery Time Objective (RTO) is the targeted duration between a failure event and the point where operations resume.
The default configuration sets the RPO expectation to 2 hours.
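As a rough way to reason about these measurements, the following Python sketch estimates the worst-case data loss from a backup interval and a replication interval and compares it with an RPO target. It assumes that the worst case is one full backup interval plus one full replication interval and that transfer time is negligible. The function names and the 1-hour and 30-minute intervals are hypothetical; only the 2-hour RPO comes from the default configuration described above.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_interval, replication_interval):
    """Upper bound on data loss, assuming negligible transfer time."""
    return backup_interval + replication_interval


def meets_rpo(backup_interval, replication_interval, rpo):
    """Check whether the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, replication_interval) <= rpo


def measured_rto(failure_time, resume_time):
    """Duration between the failure event and the resumption of operations."""
    return resume_time - failure_time


rpo = timedelta(hours=2)                 # default RPO expectation
backup_every = timedelta(hours=1)        # assumed backup interval
replicate_every = timedelta(minutes=30)  # assumed replication interval

print(worst_case_data_loss(backup_every, replicate_every))
print(meets_rpo(backup_every, replicate_every, rpo))
print(measured_rto(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 45)))
```

Under these assumptions, shortening either interval lowers the worst-case data loss, while the RTO depends on how quickly the standby site can be restored and brought into service.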