The output of a request to the bulk data export API is a single file in ZIP format. This file contains several data and metadata files.

The data files are compressed CSV files, one for each type of exported record:

  • object... .csv.gz
  • page... .csv.gz
  • session... .csv.gz
  • error... .csv.gz

The system creates data files only for the record types that you select for export. For example, if you select only errors, the system creates just one data file.

The metadata files are:

  • exportinfo-... .xml: Describes the source of the data (data provider and request), lists all exported fields, and describes the structure of exported custom fields.
  • exportstats-... .csv: Provides statistics about the time range of an export, and counts of available, exported, filtered (objects and errors only), dropped, discarded, and not stored records, per record type and per 5-minute "bucket".
    If you make configuration changes during an export operation (for example, if you change the selection of fields), the system organizes the export into sections, one for each configuration. In this case, the ZIP file contains an exportstats file for each section.
  • globalexportstats-... .csv: Provides statistics about the request as a whole, such as the requested time range and the time of the earliest available data, and counts of available, exported, filtered (objects and errors only), dropped, discarded, and not stored records, per record type, for the entire export.

Objects and errors can be filtered administratively via the web interface. If the system becomes overloaded, it might drop records.

Individual exports are not guaranteed to be self-contained. In other words, the pages for a single session, or the objects for a single page, might span multiple exports.

The system includes metadata files whether or not data files are created.

Warning

The ZIP file uses ZIP64 encoding to deal with the potential for very large gzip-compressed files in each entry. To avoid corruption during decompression, ensure that you use a utility that can properly process ZIP64 encoding.

If no data is available for a given export request, the system still returns a ZIP file with a valid statistics file. In that case, the CSV files contain only column headers. You can also make requests that download only the statistics file.
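As an illustration of reading the export safely, the following minimal sketch uses the Python standard library, whose zipfile module can read ZIP64 archives. It streams rows from each gzip-compressed CSV entry instead of loading a whole file into memory. The function name iter_export_rows and the UTF-8 encoding are assumptions made for this example; they are not part of the export API.

```python
import csv
import gzip
import zipfile


def iter_export_rows(zip_source):
    """Yield (entry_name, row) for every row of every .csv.gz entry in the export ZIP.

    zip_source is a path or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as archive:  # zipfile reads ZIP64 archives
        for name in archive.namelist():
            if not name.endswith(".csv.gz"):
                continue  # skip the metadata files (XML and plain CSV)
            # Decompress the gzip stream on the fly; UTF-8 is an assumption.
            with archive.open(name) as entry, gzip.open(
                entry, mode="rt", newline="", encoding="utf-8"
            ) as text:
                for row in csv.reader(text):
                    yield name, row
```

When no data is available, each data file yields only its header row, so a caller can treat the first row of every entry as column names.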

Recommendations

  • Do not request overlapping data; always specify mutually exclusive time ranges.
  • Configure your firewalls and intrusion detection mechanisms to allow long connections and downloads from the system.
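One way to follow the first recommendation is to derive all request windows from a single start and end time, so that each window begins exactly where the previous one ends. The sketch below is a hypothetical helper (split_time_range is not part of the export API). It assumes that the end of a window is treated as exclusive; if the API treats the end time as inclusive, shift each window end accordingly.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    window_start = start
    while window_start < end:
        # The last window is shortened so that it never runs past 'end'.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end  # next window starts where this one ended
```

For example, splitting one day with a step of timedelta(hours=1) produces 24 windows that can be requested one after another.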

Statistics file

The statistics file contains the following information:

  • A single line with the requested timeframe in the yyyy.mm.dd.HH.MM format:
    requested_time_range, <start time>, <end time>
  • If no data is available for the requested timeframe, the system adds a line with the following message:
    no data available for the requested time range

Otherwise, the system adds the following additional lines:

  • A single line with the exported timeframe in the yyyy.mm.dd.HH.MM format:
    exported_time_range, <start time>, <end time>

    Note

    If the export was prematurely terminated or if the data available does not cover the entire requested timeframe, this timeframe differs from the requested timeframe.

  • A single line of comma-separated column headers that describe the subsequent lines of statistical data.

Headers and descriptions of their data

Header                        Description

bucket_start_time             Start time of the 5-minute period, as yyyy.mm.dd.HH.MM
bucket_end_time               End time of the 5-minute period, as yyyy.mm.dd.HH.MM
available_session_records     The number of session records available for export in the staging area
available_pages               The number of pages available for export in the staging area
available_objects             The number of objects available for export in the staging area
available_errors              The number of errors available for export in the staging area
exported_session_records 1    The number of session records exported in the 5-minute period
exported_pages 1              The number of pages exported in the 5-minute period
exported_objects 1            The number of objects exported in the 5-minute period
exported_errors 1             The number of errors exported in the 5-minute period
filtered_objects              The number of objects that were blocked by object filters defined in the staging area
filtered_errors               The number of errors that were blocked because their objects were filtered in the staging area
dropped_session_records 2     The number of session records dropped (due to processing errors)
dropped_pages 2               The number of pages dropped (due to processing errors)
dropped_objects 2             The number of objects dropped (due to processing errors)
dropped_errors 2              The number of errors dropped (due to processing errors)
discarded_session_records     The number of session records discarded (because the export was terminated)
discarded_pages               The number of pages discarded (because the export was terminated)
discarded_objects             The number of objects discarded (because the export was terminated)
discarded_errors              The number of errors discarded (because the export was terminated)
session_records_not_stored    The number of session records not stored in the system (because the field is not selected for export)
pages_not_stored              The number of pages not stored in the system (because the field is not selected for export)
objects_not_stored            The number of objects not stored in the system (because the field is not selected for export)
errors_not_stored             The number of errors not stored in the system (because the field is not selected for export)

1 If you download the statistics file without requesting any data, the value for this column is always 0.
2 The system does not currently drop records. The value for this column should always be 0.

Note

When a request only downloads the statistics file (in other words, no data was requested), this file reports what would have been downloaded if the data had been requested as well.

If there is a problem during a download, the system terminates the export and adds one of the following messages to the statistics file:

  • #export terminated: requested via UI
  • #export terminated: system rebooting
  • #export terminated: feature disabled
  • #export terminated: purging staging area
  • #export terminated: staging area rollover
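The line types described in this section can be told apart by how they begin, which makes the statistics file straightforward to parse. The following is a minimal sketch under stated assumptions: parse_statistics and the keys of the returned dictionary are hypothetical names chosen for this example, fields are assumed to be separated by plain commas without quoting, and the first line that matches none of the known prefixes is taken to be the column header line.

```python
def parse_statistics(text):
    """Parse the text of a statistics file into a dictionary (illustrative layout)."""
    stats = {
        "requested_time_range": None,
        "exported_time_range": None,
        "no_data": False,
        "terminated": None,  # reason text if the export was terminated
        "buckets": [],       # one dict per line of statistical data
    }
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#export terminated:"):
            stats["terminated"] = line.split(":", 1)[1].strip()
            continue
        fields = [field.strip() for field in line.split(",")]
        if fields[0] in ("requested_time_range", "exported_time_range"):
            stats[fields[0]] = (fields[1], fields[2])
        elif fields[0].startswith("no data available"):
            stats["no_data"] = True
        elif header is None:
            header = fields  # column headers for the lines that follow
        else:
            stats["buckets"].append(dict(zip(header, fields)))
    return stats
```

Counts are returned as strings, exactly as they appear in the file. A caller that needs totals can convert them, for example with sum(int(b["exported_pages"]) for b in stats["buckets"]).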

Time format in statistics files

The format of time-related fields in statistics files is yyyy.mm.dd.HH.MM[Szzxx] where:

yyyy    is the year
mm      is the month
dd      is the day
HH      is the hour (in 24-hour notation)
MM      is the minutes
S 3     is the sign of the time-zone offset (+ or -)
zz 3    is the number of hours of the time-zone offset from UTC
xx 3    is the number of minutes of the time-zone offset from UTC

3 The system provides information about time-zone offset only when the request includes the parameter/value tz=true.
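The time format maps directly onto the strptime directives in the Python standard library. The sketch below is one possible way to parse both variants. The function name parse_stat_time is hypothetical, and the code assumes that all fields are zero-padded and that the offset, when present, follows the minutes with no separator, as the format string above suggests.

```python
from datetime import datetime

BASE_LENGTH = len("yyyy.mm.dd.HH.MM")  # 16 characters without the offset


def parse_stat_time(value):
    """Parse yyyy.mm.dd.HH.MM, optionally followed by a +zzxx or -zzxx offset."""
    value = value.strip()
    if len(value) > BASE_LENGTH:
        # Offset present (tz=true): the result is a timezone-aware datetime.
        return datetime.strptime(value, "%Y.%m.%d.%H.%M%z")
    # No offset: the result is a naive datetime.
    return datetime.strptime(value, "%Y.%m.%d.%H.%M")
```

Mixing naive and timezone-aware values in comparisons raises an error in Python, so it is simplest to use the same tz setting for every request whose statistics you intend to compare.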