This documentation supports the 20.08 (12.1) version of BMC Discovery.

Using a multi-generational datastore

In use, BMC Discovery requires little downtime. However, two tasks that do require downtime are backup and database compaction. Multi-generational storage enables you to perform both of these tasks while the system is running.

Multi-generational storage requires management

Multi-generational storage is an advanced feature that requires significant maintenance and management. If you choose to use multi-generational storage, you must enable it, maintain it, and manage it using command line tools. No out-of-the-box automation is provided; you must create any automation you need using scripting or cron. Although multi-generational storage and the command line tools provided are robust and currently in use, without adequate maintenance and management your BMC Discovery system may suffer performance degradation or become inoperable.

What is a multi-generational datastore?

A multi-generational datastore is one in which data is stored in generations, or "layers" according to the age of the last write activity. The system can write to and read from the current, most recent generation. All earlier generations are read-only. 

Database compaction

Database compaction is an essential part of using a multi-generational datastore. There is a performance impact in accessing the earlier read-only generations, and without compaction, the OS might run out of file descriptors for open files. Database compaction follows this basic procedure:

  1. Write new files that combine a number of the oldest generations.

  2. Switch to use the new combined data.

  3. Delete the old generations.

Database compaction works by creating a copy of the earlier read-only generations, and to do this it requires at least 50% of the database disk to be unused. The compaction utility (tw_ds_compact) checks the available disk space and does not start compaction unless there is sufficient space. Consequently, you must ensure that your database disk is never filled to more than 50% of its capacity. If your database disk is approaching 50% capacity, you should also review your disk monitoring thresholds: there might be sufficient space for the compaction to start, but during the process the free space might drop below the baseline values at which scanning is stopped (default is 20% free) or the appliance is shut down (default is 10% free).
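For example, a simple check along the following lines could be scheduled to warn before the disk crosses that threshold. This is a sketch only: the 50% figure comes from this page, the datastore path is the one used elsewhere on this page, and the alerting method is a placeholder you should replace.

#!/bin/bash
# Sketch only: warn when the datastore disk is more than 50% used,
# because tw_ds_compact needs at least 50% of the disk to be unused.
DATADIR=/usr/tideway/var/tideway.db/data/datadir
USED=$(df --output=pcent "$DATADIR" | tail -n 1 | tr -dc '0-9')
if [ "$USED" -gt 50 ]; then
    # Replace this echo with your own alerting mechanism.
    echo "WARNING: datastore disk is ${USED}% used; online compaction needs at least 50% free" >&2
fi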

Database compaction creates its own log: tw_ds_compact.log

If you interrupt a compaction, you must continue it using the --fix-interrupted option.
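The exact invocation is not shown on this page; the example below assumes the option is passed directly to tw_ds_compact, so check the tool's help output on your version before relying on it.

$ tw_ds_compact --fix-interrupted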

Backing up

Backing up data essentially consists of copying the earlier read-only generations to a backup location. In practice, you should create a new generation, and then back up all of the read-only generations.

The Appliance/Cluster backup UIs work with a multi-generational datastore, as does tw_backup, though they still take the datastore offline for the backup operation.
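For example, an online backup might look like the following sketch. It assumes the datastore directory and the generation listing format shown in the procedure below; the backup host and destination path are placeholders, and you should adapt the script to your own archiving tools.

#!/bin/bash
# Sketch only: an online backup of the read-only generations.
# backup-host and /backups/discovery are placeholder names.
set -e
DATADIR=/usr/tideway/var/tideway.db/data/datadir

# Close the current generation so that it becomes read only.
tw_ds_generation_control --new

# The recommendations later on this page suggest waiting 35 minutes
# before archiving the generation that has just been closed.
sleep $((35 * 60))

# Copy each read-only generation directory (g* and compacted c* directories,
# identified from the generation listing) to the backup location.
for GEN in $(tw_ds_generation_control | awk '/read only/ {print $1}'); do
    rsync -a "$DATADIR/$GEN/" "backup-host:/backups/discovery/$GEN/"
done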

To use a multi-generational datastore

Enabling multi-generational mode for the datastore is not reversible. The only way you have of returning to a non-generational datastore is to restore a previous backup or revert the member to standalone.

Perform the following procedure as the tideway user.

  1. Enable the multi-generational mode for the datastore. Enter:

    $ tw_options DATASTORE_MULTI_GENERATION=True
  2. Restart the services on the appliance or across the cluster. Enter:

    $ tw_service_control --restart

    The datastore is now operating in multi-generational mode, but has just one generation.

  3. View the status of the generations. Enter:

    $ tw_ds_generation_control
    Generations:
      g000001 : current
    
    $

    Only one is present, g000001, and that is the current, in-use generation.

  4. Create a new generation. This becomes the new "current" generation and the original generation is marked "read only".

    $ tw_ds_generation_control --new
    Generations:
      g000002 : current
      g000001 : read only : 2020-03-25 12:17:37 GMT
    $

    The generation is created in a new directory, in this example g000002. You can see this in the datastore directory.

    $ ls -p var/tideway.db/data/datadir/
    __db.001  DB_CONFIG  g000001/  g000002/
    $
  5. Create another new generation. Again this becomes the new "current" generation. The original generation and the second generation are now marked "read only", and a third directory is created.

    $ tw_ds_generation_control --new
    
    Generations:
      g000003 : current
      g000002 : read only : 2020-03-25 12:22:21 GMT
      g000001 : read only : 2020-03-25 12:17:37 GMT
    $ ls -p var/tideway.db/data/datadir/
    __db.001  DB_CONFIG  g000001/  g000002/ g000003/
    $

    The purpose of multi-generational mode is to enable online compaction and backup.

  6. Compact the read only generations using the tw_ds_compact tool. Enter:

    $ tw_ds_compact --online
    ...
    2020-03-25 12:25:50,188: 2 generations from 1 to 2.
    2020-03-25 12:25:50,188: 1 later generation.
    ...
    2020-03-25 12:26:31,730: Switch to use compacted generation.
    2020-03-25 12:26:49,266: Compaction complete.

    The two read-only generations are combined and compacted into a new directory; the c prefix denotes a compacted generation.

    $ tw_ds_generation_control
    
    Generations:
      g000003 : current
      c000002 : read only : 2020-03-25 12:22:21 GMT
    
    $ ls -p var/tideway.db/data/datadir/
    c000002/  __db.001  DB_CONFIG  g000003/

Recommendations for automation

The following points are recommendations for automating the management and maintenance of a multi-generational datastore. A sketch of one possible schedule is shown after the list.

  • Use cron to schedule creation of a new generation. This could be scheduled daily, but no more frequently than that.

    • If you are using a multi-generational datastore in a cluster, creating a generation is synchronized across the cluster. Only trigger creation of a new generation on one cluster member.

  • Wait 35 minutes after the creation of a new generation completes before archiving it.

  • Having waited 35 minutes, archive the files in the just-completed generation. You might choose to:

    • rsync to another server

    • tar and scp

    • use any other standard archiving tools

    • If you are using a multi-generational datastore in a cluster, the archival must be performed on each cluster member.

  • Use cron to schedule compaction. This could be scheduled less often than creation of new generations, for example weekly.

    • If you are using a multi-generational datastore in a cluster, the compaction must be performed on each cluster member.

  • After compaction, archive the new compacted generation. You might choose to:

    • rsync to another server

    • tar and scp

    • use any other standard archiving tools

    • If you are using a multi-generational datastore in a cluster, the archival must be performed on each cluster member.
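The following crontab sketch illustrates one possible schedule along these lines. The times, the archive script name and location, and the split of entries between cluster members are placeholders: BMC Discovery does not ship the archive script, so you must write and test your own, and you should ensure the tw_ commands are on cron's PATH or use their full paths.

# Sketch only: an example schedule for the tideway user's crontab.

# Daily at 01:00, on ONE cluster member only: create a new generation.
0 1 * * * tw_ds_generation_control --new

# Daily at 01:45, on EVERY cluster member: archive the read-only generations
# (archive_generations.sh is a placeholder for your own script), allowing for
# the recommended 35-minute wait, assuming creation completes within a few minutes.
45 1 * * * /usr/tideway/scripts/archive_generations.sh

# Weekly, Sunday at 02:00, on EVERY cluster member: compact the read-only
# generations, then archive the resulting compacted generation.
0 2 * * 0 tw_ds_compact --online && /usr/tideway/scripts/archive_generations.sh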

Purging the history

Periodically you should purge the history to reduce the disk space used, and to maintain performance. You should do this in line with any organizational or regulatory policies and requirements. 

Use the tw_ds_compact tool to do this. For example, to purge history entries over a year old:

$ tw_ds_compact --history-purge-days 365

Offline compaction

A multi-generational datastore does not prevent you taking the system offline. The tw_ds_compact tool also has an offline compaction option:

$ tw_ds_compact --offline

Sample output is shown below:

$ tw_ds_compact --offline
...
2022-07-13 14:53:31,401: Transaction recovery.
2022-07-13 14:53:45,906: Start compaction.
2022-07-13 14:53:55,908: Compaction progress: 19% (1701855 / 8939555).
2022-07-13 14:54:05,908: Compaction progress: 51% (3891103 / 7483003).
2022-07-13 14:54:17,196: Compaction progress: 79% (5093923 / 6419023).
2022-07-13 14:54:27,196: Compaction progress: 93% (5537550 / 5906100).
2022-07-13 14:54:34,148: Compaction progress: 99% (5574915 / 5575815).
2022-07-13 14:54:34,259: Compaction progress: 100% (5575036 / 5575036).
2022-07-13 14:54:34,270: Total compaction time: 49 seconds.
2022-07-13 14:54:34,389: Compaction complete.
2022-07-13 14:54:34,402: Compacted from 374.5 MiB to 269.2 MiB : 71.9 %
$

From this version of BMC Discovery onwards, the completion status of the tw_ds_compact tool is logged. For example:

140578629134144: 2022-07-13 14:54:34,389: ds_compact: INFO: Move n000000/p0008_nUserEventAuditRecord_pidx -> p0008_nUserEventAuditRecord_pidx
140578629134144: 2022-07-13 14:54:34,389: ds_compact: INFO: Move n000000/p0008_nUserEventAuditRecord_rels -> p0008_nUserEventAuditRecord_rels
140578629134144: 2022-07-13 14:54:34,389: ds_compact: INFO: Move n000000/p0008_nUserEventAuditRecord_state -> p0008_nUserEventAuditRecord_state
140578629134144: 2022-07-13 14:54:34,389: ds_compact: INFO: Move n000000/tagged -> tagged
140578629134144: 2022-07-13 14:54:34,389: ds_compact: USEFUL: Compaction complete.
140578629134144: 2022-07-13 14:54:34,402: ds_compact: USEFUL: Compacted from 374.5 MiB to 269.2 MiB : 71.9 %

This command can also be used with non-multi-generational datastores, in a similar manner to tw_ds_offline_compact, but it is faster. However, tw_ds_offline_compact cannot be used with multi-generational datastores.

Reverting to previous generations and backups

When multiple generations are present, tw_ds_compact can be used to remove the latest generation, and therefore revert the datastore to a previous state. This is an offline operation so the services must be stopped across the cluster. If the latest generation is number 42:

$ tw_ds_compact --offline --delete-generation 42

After confirmation, generation 42 is completely removed, and the previous generation, 41, will continue to be written to when the system is started. Multiple generations can be deleted in this way by running the command with the --delete-generation option multiple times.

Generation deletion is not coordinated across a cluster. In a cluster, you must delete the equivalent generations on each cluster member, otherwise the cluster members will be out of sync with each other.
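For example, you might drive the deletion from one machine with a loop like the following sketch. member1 and member2 are placeholder hostnames, services must already be stopped across the cluster, and each run still prompts for confirmation on its member.

# Sketch only: delete generation 42 on each cluster member in turn.
# Use the full path to tw_ds_compact if it is not on the remote PATH.
for MEMBER in member1 member2; do
    ssh -t tideway@"$MEMBER" "tw_ds_compact --offline --delete-generation 42"
done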


To revert to backups that have been archived externally, perform the following steps on all the cluster members (a sketch of the steps as a script follows the list):

  1. Ensure all services are stopped across the cluster.
  2. Delete all the files in /usr/tideway/var/tideway.db/logs and /usr/tideway/var/tideway.db/data/datadir. (These paths are symbolic links to the correct data locations.)
  3. Restore the required generation directories into /usr/tideway/var/tideway.db/data/datadir. The first will usually be a compacted generation with a name starting with "c"; subsequent ones will start with "g".
  4. Run an offline compaction with tw_ds_compact --offline. This step is essential to ensure that log sequence numbers in the database files are correct.
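A minimal sketch of these steps on one cluster member is shown below; it assumes the archived generation directories have already been copied back to a placeholder location, /restore/discovery, on the appliance. Repeat the same restore on each cluster member before starting the services.

#!/bin/bash
# Sketch only: restore externally archived generations on one cluster member.
# Run as the tideway user, with services already stopped across the cluster.
set -e
DBDIR=/usr/tideway/var/tideway.db

# Step 2: delete the existing transaction logs and datastore files.
rm -rf "$DBDIR"/logs/* "$DBDIR"/data/datadir/*

# Step 3: restore the required generation directories from the placeholder
# archive location (usually a compacted c* directory first, then g* directories).
cp -a /restore/discovery/c* /restore/discovery/g* "$DBDIR"/data/datadir/

# Step 4: offline compaction to ensure log sequence numbers are correct.
tw_ds_compact --offline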

As with any backup / restore scheme, you should test your processes before they are required.

Troubleshooting

The following points should help with troubleshooting:

  • Generation transitions are logged in tw_svc_model.log

  • Database compaction creates its own log: tw_ds_compact.log

  • If you interrupt a compaction, you must use --fix-interrupted

    • tw_ds_compact writes new files to a directory named n<number>

    • The $TIDEWAY/var/tideway.db/data/datadir/ directory contains a database called “compact” to track progress.
