Archiving data


TrueSight IT Data Analytics collects data and stores the live data on its Indexer servers. By default, TrueSight IT Data Analytics purges live data that is older than the data retention period. Alternatively, you can archive data to storage for long-term data retention before the data is purged. You may choose to archive data for reasons such as:

  • Regulatory compliance
  • Research/Auditing
  • Long-term analytics

Archiving involves taking a snapshot of collected data each day and storing those snapshots on a separate storage device. Restoring archived data requires a dedicated Indexer configured as a restore node.


Note

Archiving is disabled by default; you can enable it when you plan for archiving. When you enable archiving, you must specify the path where you want the data to be archived. You can add more than one path, but only one path is active at a time. When you restore data, snapshots stored in inactive paths can still be restored as long as each such path remains registered as an inactive path.

What is archived

All data whose index blocks are configured for archiving is archived once it is scheduled to be purged. 

If you do not want data from certain data collectors to be archived, set the data retention period override for those data collectors to a number of days that is less than the default data retention period. This ensures that the data from those data collectors is purged before it is archived.

Prerequisites for archiving data

Ensure the following before you archive data:

  • The path where the data is to be archived must exist on the Indexers.

    Tip

    If there are multiple Indexer servers, the archive path specified should be one of the following:

    - A remote path (UNC)

    - A permanently mounted drive (Windows) or mount point (Linux). The mounted drive or the mount point must be exactly the same on all Indexers.

    It should not be a local path.

  • The user who installed the Indexer must have write permissions on the path where the data is to be archived.
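The two prerequisites above can be checked before you enable archiving. The following is a minimal Python sketch, not part of the product: the function name check_archive_path is hypothetical, and the script is assumed to be run on each Indexer as the user who installed the Indexer.

```python
import os


def check_archive_path(path):
    """Return a list of problems found with a candidate archive path.

    Hypothetical helper, not part of TrueSight IT Data Analytics.
    An empty list means the path exists and is writable by the
    user running this script.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append("path does not exist or is not a directory")
    elif not os.access(path, os.W_OK):
        problems.append("current user has no write permission")
    return problems


if __name__ == "__main__":
    import sys

    # Pass one or more candidate archive paths on the command line.
    for candidate in sys.argv[1:]:
        issues = check_archive_path(candidate)
        print(candidate, "OK" if not issues else "; ".join(issues))
```

Running the same check on every Indexer also helps confirm that a mounted drive or mount point resolves identically on all of them.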

Recommendation

To be able to restore data that you archive, ensure that you have configured a restore node. A restore node is an Indexer where archived data is restored. When you install the product, you can choose the node type of an Indexer as a restore node.

Java heap requirements for a Restore node

Restore nodes need enough Java heap to support the amount of data that you are restoring. If you have a 14-day retention period on your live nodes, your restore node should have the same amount of Java heap as your live node and should never restore more than 14 days of data. If you want to restore 28 days of data, you must double the Java heap on the restore node compared to the original Indexer node. If the maximum Java heap is already 30 GB, you must add more Indexer nodes to your cluster to support restoring larger amounts of data to the restore Indexer cluster.
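The sizing rule above can be worked through as simple arithmetic. The following Python sketch is an estimate only. It assumes that heap scales in whole multiples of the live retention period (extrapolating from the 14-day and 28-day figures in the text) and that 30 GB is the ceiling for a single node. The function name restore_heap_plan is hypothetical.

```python
import math

MAX_HEAP_GB = 30  # maximum Java heap per Indexer node mentioned in the text


def restore_heap_plan(live_heap_gb, live_retention_days, restore_days):
    """Estimate total heap and node count needed on the restore side.

    Assumption: heap grows in whole multiples of the live retention
    period, so restoring 28 days with 14-day retention needs 2x heap.
    """
    if live_heap_gb <= 0 or live_retention_days <= 0 or restore_days <= 0:
        raise ValueError("all inputs must be positive")
    # Number of retention periods covered, rounded up to be conservative.
    factor = math.ceil(restore_days / live_retention_days)
    total_heap_gb = live_heap_gb * factor
    # Nodes needed if no single node may exceed the per-node ceiling.
    nodes = math.ceil(total_heap_gb / MAX_HEAP_GB)
    return total_heap_gb, nodes


print(restore_heap_plan(15, 14, 28))  # live node with 15 GB heap
print(restore_heap_plan(30, 14, 28))  # live node already at the ceiling
```

In the second call the doubled heap no longer fits on one node, which corresponds to the case where more Indexer nodes must be added.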

The CPU overhead on the Indexer cluster is minimal during an archive.

 

To enable archiving

  1. From the Administration > System Settings tab, select the Data Archive Settings tab.
  2. Toggle the Enable Archive switch.
  3. When the Enable Archive switch is turned on, you get an option to add an archival path. Enter the path where you want the data to be archived. You can add any number of archival paths; however, only one path can be active at a time.

    Note

    - An active path is the path where the data is archived. If you delete a path that is not set as active, you cannot restore data from it until you add the path again.

  4. Select the Active Path option in front of a path to set it as the active path.
  5. Click Apply.

    Clicking Apply restarts all Indexers.

You must also set archive to on for the index block associated with the data collector from Administration > System Settings > Index Block Settings.

You can toggle the switch there to set archive to on or off. However, the settings in Administration > System Settings > Data Archive Settings override the index block settings: if you have not enabled archive in the Data Archive Settings tab, you cannot enable archive for an index block.

For more information, see Modifying index blocks.

The archive process

The following diagram explains the archive process. 

archive_process.png

Snapshots of data are taken on a daily basis and archived before the data is purged. Snapshots are sent to the archive after the retention period (the default retention period is 7 days). You can restore snapshots to the restore node by running the restoresnapshots CLI command. For information on where to find snapshots, see Finding snapshots. For information on restoring archived data, see Restoring archived data. Restored data is automatically deleted from the restore node after the specified retention days or, if none are specified, after the default of 2 days. For information on changing the default retention days, see Changing the default retention days.

Note

If archiving fails, data is not purged immediately from the Live nodes. It is retained for a default period of 7 days before it is purged, and during that time you can archive the data. To change the default period of 7 days before data gets purged:

  1. Navigate to the following location to locate the searchserviceCustomConfig.properties file.

    • Windows: %BMC_ITDA_HOME%\custom\conf\server
    • Linux: $BMC_ITDA_HOME/custom/conf/server

  2. In the searchserviceCustomConfig.properties file, add the following property:

     MAXIMUM_NUMBER_OF_BACKLOG_ARCHIVALS=<number of days>

  3. Restart the search component.
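If you prefer to script the property change rather than edit the file by hand, the following Python sketch shows one way to do it. It is not a product utility: the function name set_property is hypothetical, the value 10 is only an illustration, and the sketch assumes that a commented-out property line starts with "#", as is conventional in Java properties files.

```python
import re


def set_property(text, key, value):
    """Return properties-file text with key set to value.

    If the key is already present, commented out or not, the first such
    line is replaced; otherwise a new line is appended at the end.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in the value.
        return pattern.sub(lambda m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


# Illustrative only: keep failed-archive data for 10 days instead of 7.
print(set_property("", "MAXIMUM_NUMBER_OF_BACKLOG_ARCHIVALS", 10), end="")
```

To apply it, read searchserviceCustomConfig.properties into a string, pass it through set_property, and write the result back. The search component must still be restarted afterwards.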

Finding snapshots

You can find snapshots in the paths that you have set in the Data Archive Settings. The latest snapshots can be found in the active path. The snapshots are taken daily and arranged in month-wise folders. The format of the name of the folder containing a month's snapshots is as follows:

repo_Year_Month_Epoch_time. For example, repo_2018_Jan_1515052753098. This means that the snapshots are from the month of January 2018 at the epoch time 1515052753098.

snapshotrepo path.png

When you open the month-wise folders, you can find the snapshots inside them.
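The folder naming format described above can be parsed in a script, for example to list which months are present in an archive path. The following Python sketch is illustrative only. It assumes that the month is a three-letter abbreviation as in the example, and that the epoch value is in milliseconds, which matches the 13-digit example value. The function name parse_snapshot_folder is hypothetical.

```python
import re
from datetime import datetime, timezone

# repo_<Year>_<Month>_<Epoch time>, for example repo_2018_Jan_1515052753098
FOLDER_RE = re.compile(r"^repo_(\d{4})_([A-Za-z]{3})_(\d+)$")


def parse_snapshot_folder(name):
    """Split a month-wise snapshot folder name into its parts.

    Returns (year, month abbreviation, UTC datetime of the epoch value).
    Assumes the epoch value is expressed in milliseconds.
    """
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"not a snapshot folder name: {name!r}")
    year, month, epoch_ms = match.groups()
    created = datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)
    return int(year), month, created


year, month, created = parse_snapshot_folder("repo_2018_Jan_1515052753098")
print(year, month, created.date())
```

For the example folder name, the epoch value falls on 4 January 2018 (UTC), which is consistent with the January 2018 label in the name.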

Note

Snapshots are still taken successfully if you rename or delete a shared folder used in an archive path, as long as its parent folder has write permissions for the user who is running the Indexer.

When the snapshot path folder is renamed, a new folder is created with the old name and the next snapshot is created under it. For example, you have enabled archive with a valid shared path in pa/snap/snap1. After the snapshots are taken, you access the shared location and rename the snap1 folder to snap2. The next time snapshots are taken, they are created under a new folder pa/snap/snap1 and not under pa/snap/snap2. In this instance, the new snapshots are not available under snap2 and cannot be restored from the snap2 folder.

Restoring archived data

If you want to search or analyze data that has been archived, you must restore it. To restore snapshots back into the system, you must set up a Restore Indexer node. This is an Indexer cluster node that is set aside for housing restored snapshots. This node is a member of the cluster of Indexers, but does not participate in replicating data to the other nodes in the cluster. It is dedicated to restoring archived data to make it live again.

After you have set up an Indexer as your restore node, follow these steps to restore snapshots:

  1. Run the showdataavailability CLI command with the archive option to get details of the archived data. This enables you to confirm that the snapshots you want to restore are, in fact, in the archive.
  2. Run the restoresnapshots CLI command, giving the start date and end date of the period for which you want to restore data.

    If you are restoring a sizable amount of data, the restore may take several minutes or longer. You can monitor the progress of the ongoing restore operation using the status option in this CLI command.

  3. Run the showdataavailability CLI command with the restore option to see the details of the restore.
  4. Search the data using the TrueSight IT Data Analytics console to verify that the data has been restored.

Note

If you run the restoresnapshots CLI command more than once for different periods and there is an overlap in the time periods selected for restoring data, there is no duplication of data on the restore node.  The restore command will only restore the data that hasn’t already been restored to the restore node.
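The effect of overlapping restore periods can be illustrated with a small calculation. The following Python sketch models only the outcome described above (days already restored are skipped). It does not show how the product implements this, and the function names are hypothetical.

```python
from datetime import date, timedelta


def days_in(start, end):
    """Return the set of calendar days from start to end, inclusive."""
    return {start + timedelta(days=i) for i in range((end - start).days + 1)}


def days_still_to_restore(requested, already_restored):
    """Days in the requested (start, end) period not covered by earlier restores."""
    wanted = days_in(*requested)
    for period in already_restored:
        wanted -= days_in(*period)
    return sorted(wanted)


# First restore covered 1 to 10 January; the second request is 8 to 15 January.
earlier = [(date(2018, 1, 1), date(2018, 1, 10))]
remaining = days_still_to_restore((date(2018, 1, 8), date(2018, 1, 15)), earlier)
print(remaining[0], "to", remaining[-1], "-", len(remaining), "days")
```

In this example only 11 to 15 January would be newly restored, because 8 to 10 January is already on the restore node.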

Retention days is the number of days that restored data remains on the restore node before it is deleted, counted from the date of the restore. Restored data is retained for a default period of 2 days, after which it is automatically deleted. You can choose how long the data should remain on the restore node by using the restoresnapshots CLI command with the retentionDays option.

Deleting restored data

All restored data that is older than the retention days is deleted. If you do not specify retention days, the default of 2 days applies.
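As a worked example of the retention rule, the following Python sketch computes the date on which restored data becomes due for deletion. It assumes that deletion is due exactly the given number of calendar days after the restore date; the text states only that retention is counted from the date of restore, so the exact time of deletion may differ. The function name is hypothetical.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 2  # default stated in the text


def restored_data_expiry(restore_date, retention_days=None):
    """Return the date on which restored data becomes due for deletion.

    Assumption: retention is counted in whole calendar days starting
    from the date of the restore.
    """
    if retention_days is None:
        retention_days = DEFAULT_RETENTION_DAYS
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    return restore_date + timedelta(days=retention_days)


print(restored_data_expiry(date(2018, 1, 10)))     # default retention
print(restored_data_expiry(date(2018, 1, 10), 5))  # retention of 5 days
```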

Tip

Once restored data is deleted after the retention days have passed, you cannot search it unless you restore the data again from the archive by running the restoresnapshots CLI command. Therefore, to be on the safe side, consider specifying a higher value for retention days.

To delete data restored from the archive for a selected period, run the deleterestoreddata CLI command. For more information, see deleterestoreddata CLI command.

Changing the default retention days

To change the default retention days:

  1. Navigate to the following location to locate the searchserviceCustomConfig.properties file.
    • Windows: %BMC_ITDA_HOME%\custom\conf\server
    • Linux: $BMC_ITDA_HOME/custom/conf/server
  2. In the searchserviceCustomConfig.properties file, uncomment and change the following property:

     restore.data.retention.in.days

For example, to change the default retention days from 2 days to 3 days, uncomment the property and set its value to 3:

     restore.data.retention.in.days=3

Self-health monitoring and troubleshooting

TrueSight IT Data Analytics generates events for archiving and displays them in the destination that you set so that you can gain an early insight into likely issues. For information, see Self-health monitoring events generated for archiving.

For information on troubleshooting archiving-related issues, see Troubleshooting archiving-related issues.


 
