Bulk data request output


The output of a request to the bulk data export API is a single file in ZIP format. This file contains several data and metadata files.

The data files are compressed CSV files, one for each type of exported record:

  • object... .csv.gz
  • page... .csv.gz
  • session... .csv.gz
  • error... .csv.gz

The system creates data files only for the record types selected for export. For example, if you select only errors, the system creates just one data file.

The metadata files are:

  • exportinfo-... .xml: Describes the source of the data (data provider and request), lists all exported fields, and describes the structure of exported custom fields.
  • exportstats-... .csv: Provides statistics about the time range of an export, plus counts of available, exported, filtered (objects and errors), dropped, discarded, and not-stored records, per record type and per "bucket".
     If you make configuration changes over the course of an export operation (for example, if you change the selection of fields), the system organizes the exported data into sections, one for each configuration. In this case, the ZIP file contains an exportstats file for each section.
  • globalexportstats-... .csv: Provides statistics about the request as a whole, such as the time range of the request and the time of the earliest available data, plus counts of available, exported, filtered (objects and errors), dropped, discarded, and not-stored records, per record type, for the entire export.

Objects and errors can be filtered administratively via the web interface. If the system becomes overloaded, it might drop records.

Individual exports are not guaranteed to be self-contained: the pages for a single session, or the objects for a single page, might span multiple exports.

The system includes metadata files whether or not data files are created.

Warning

The ZIP file uses ZIP64 encoding to deal with the potential for very large gzip-compressed files in each entry. To avoid corruption during decompression, ensure that you use a utility that can properly process ZIP64 encoding.
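
As an illustration of ZIP64-safe processing, the following minimal sketch streams rows out of the gzip-compressed CSV entries with Python's standard zipfile, gzip, and csv modules (zipfile reads ZIP64 archives). The function name iter_export_rows, the idea of selecting entries by a name prefix such as "error", and the UTF-8 encoding are assumptions made for this example; they are not defined by the export API.

```python
import csv
import gzip
import posixpath
import zipfile


def iter_export_rows(zip_path, prefix):
    """Yield each data row as a dict from matching .csv.gz entries.

    `prefix` (for example "error") is matched against the entry's base
    name. Matching by prefix is an assumption of this sketch.
    """
    with zipfile.ZipFile(zip_path) as archive:  # handles ZIP64 archives
        for name in archive.namelist():
            base = posixpath.basename(name)
            if not (base.startswith(prefix) and base.endswith(".csv.gz")):
                continue
            # Stream the entry instead of extracting it, so that very
            # large files are never held in memory in full.
            with archive.open(name) as raw:
                # UTF-8 is assumed here; adjust if your data differs.
                with gzip.open(raw, mode="rt", encoding="utf-8", newline="") as text:
                    # The first CSV line holds the column headers.
                    yield from csv.DictReader(text)
```

Because the rows are produced lazily, a caller can process an export of any size with a plain for loop over iter_export_rows(path, "error"). An entry that contains only column headers simply yields no rows.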

If there is no data available for a given export request, the system still returns a ZIP file with a valid statistics file. However, the CSV files contain only column headers. You can also make requests that download only the statistics file.

Recommendations
  • Do not request overlapping data; always specify mutually exclusive time ranges.
  • Configure your firewalls and intrusion detection mechanisms to allow long connections and downloads from the system.
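
One way to follow the first recommendation is to derive every request window from a single cursor, so that each window starts exactly where the previous one ended. The sketch below is illustrative only: the function name split_time_range is hypothetical, and it treats each window as half-open (start included, end excluded). This document does not state whether the API treats the end time as inclusive, so verify that before relying on this approach.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, step):
    """Return consecutive (window_start, window_end) pairs covering start..end.

    Windows are treated as half-open, which is an assumption of this
    sketch, so adjacent windows share a boundary but no interior time.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        windows.append((cursor, window_end))
        cursor = window_end  # the next window begins where this one ended
    return windows
```

For example, calling split_time_range(datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)) produces four six-hour windows that can be requested one after another.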

Statistics file

The statistics file contains the following information:

  • A single line with the timeframe requested in the yyyy.mm.dd.HH.MM format:
    requested_time_range, <start time>, <end time>
  • If no data is available for the requested timeframe, the system adds a line with the following message:
    no data available for the requested time range

Otherwise, the system adds the following additional lines:

  • A single line with the timeframe exported in the yyyy.mm.dd.HH.MM format:
    exported_time_range, <start time>, <end time>

    Note

    If the export was prematurely terminated or if the data available does not cover the entire requested timeframe, this timeframe differs from the requested timeframe.

  • A single line of comma-separated column headers that describe the subsequent lines of statistical data.
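
The leading lines described above can be read with a few string comparisons. The following sketch is a hedged example rather than a reference parser: the function names and the dictionary keys are hypothetical, and the time values are kept as the strings found in the file.

```python
NO_DATA_MESSAGE = "no data available for the requested time range"


def parse_statistics_header(lines):
    """Collect the time-range lines and the no-data marker from a statistics file."""
    info = {"requested": None, "exported": None, "no_data": False}
    for line in lines:
        text = line.strip()
        if text == NO_DATA_MESSAGE:
            info["no_data"] = True
            continue
        parts = [part.strip() for part in text.split(",")]
        if len(parts) != 3:
            continue  # not a time-range line
        if parts[0] == "requested_time_range":
            info["requested"] = (parts[1], parts[2])
        elif parts[0] == "exported_time_range":
            info["exported"] = (parts[1], parts[2])
    return info


def covers_full_request(info):
    """Return True only when the exported range equals the requested range."""
    return info["exported"] is not None and info["exported"] == info["requested"]
```

A False result from covers_full_request suggests that the export ended early or that the available data did not span the whole request.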

Headers and descriptions of their data

 1 If you download the statistics file without requesting any data, the value for this column is always 0.
 2 The system does not currently drop records. The value for this column should always be 0.

Note

When a request only downloads the statistics file (in other words, no data was requested), this file reports what would have been downloaded if the data had been requested as well.

If there is a problem during a download, the system terminates the export and adds one of the following messages to the statistics file:

  • #export terminated: requested via UI
  • #export terminated: system rebooting
  • #export terminated: feature disabled
  • #export terminated: purging staging area
  • #export terminated: staging area rollover
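
A download script can check for these messages before it trusts the data files. The sketch below scans the lines of a statistics file for the "#export terminated:" prefix shown in the list above; the function name find_termination_reason is hypothetical.

```python
TERMINATION_PREFIX = "#export terminated:"


def find_termination_reason(lines):
    """Return the reason text of the first termination message, or None."""
    for line in lines:
        text = line.strip()
        if text.startswith(TERMINATION_PREFIX):
            # Keep only the part after the prefix, for example "system rebooting".
            return text[len(TERMINATION_PREFIX):].strip()
    return None
```

A return value of None means that no termination message was found, so the export was not cut short for one of the listed reasons.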

Time format in statistics files

The format of time-related fields in statistics files is yyyy.mm.dd.HH.MM[Szzxx] where:

 3 The system provides information about time-zone offset only when the request includes the parameter/value tz=true.
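
For scripts that need real timestamps, the format can be parsed as sketched below. The meaning of the optional [Szzxx] suffix is an assumption in this example: S is taken to be a sign (+ or -), zz the offset hours, and xx the offset minutes. Confirm this against the field descriptions for your system before using it. The function name parse_stats_time is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Base timestamp, then an optional suffix that is ASSUMED to mean
# sign, offset hours, offset minutes (for example +0530).
_TIME_RE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2}\.\d{2}\.\d{2})(?:([+-])(\d{2})(\d{2}))?$"
)


def parse_stats_time(value):
    """Parse yyyy.mm.dd.HH.MM with an optional, assumed UTC-offset suffix."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    stamp = datetime.strptime(match.group(1), "%Y.%m.%d.%H.%M")
    if match.group(2) is None:
        return stamp  # no offset present: return a naive datetime
    offset = timedelta(hours=int(match.group(3)), minutes=int(match.group(4)))
    if match.group(2) == "-":
        offset = -offset
    return stamp.replace(tzinfo=timezone(offset))
```

Values without a suffix come back as naive datetimes, so avoid comparing them directly with offset-aware values from a request that used tz=true.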