Considerations when loading CSV data


LOADPLUS enables you to load data from comma-separated-value (CSV) files. (CSV files are also referred to as comma-delimited or delimited files.)

CSV data format

Ensure that your CSV data meets the following requirements:

  • Your CSV file must contain delimiters. Enclosure characters (characters that enclose each field) in your CSV file are optional.
  • Input strings should not contain the same characters as your field delimiters or enclosure characters. If they do, LOADPLUS might load the record incorrectly or reject it. To prevent this problem, complete one of the following tasks:
    • Remove these characters from your input strings.
    • Change the delimiter or enclosure character in your input file so that you do not have a conflict with embedded characters. Then, ensure that the TERMINATEDBY, ENCLOSEDBY, and AND parameters on your FORMAT CSV option are compatible with the delimiter and enclosure characters in your input file.
  • Your CSV input file should use a variable-length record format (RECFM=VB). If your input file uses a fixed-length record format, and the last field on a record does not extend to the end of that record, the field must end with a delimiter.
  • A one-to-one correspondence should exist between the input fields of your CSV data and the field specifications on your LOAD command. Otherwise, the following results occur. (For more information, see CSV field specifications.)
    • If you have more input fields than field specifications, LOADPLUS ignores the extra input file data. LOADPLUS maps field specifications to input fields in order, starting with the first specification and the first input field.
    • If you have fewer input fields than field specifications, LOADPLUS ignores the additional field specifications.
  • LOADPLUS supports loading data that was created by using the FORMAT CSV option of UNLOAD PLUS. However, when loading CSV data, LOADPLUS requires that DATE, TIME, and TIMESTAMP data be in Db2 EXTERNAL format. The control cards that UNLOAD PLUS produces do not specify an external format for these data types. Either modify these control cards or load the data by using a LOADPLUS job that provides the correct external format for these data types.

    Important

    Although supported, using the FORMAT CSV option of LOADPLUS and UNLOAD PLUS is not the most efficient method for using these two products to migrate Db2 data.

    If your table structures are almost identical, we recommend that you specify FORMAT BMCLOAD (in UNLOAD PLUS) and FORMAT BMCUNLOAD (in LOADPLUS) to migrate Db2 data. For more information, see Using-data-from-UNLOAD-PLUS-in-internal-BMC-format.

    If your table structures are not similar enough to use this method, we recommend that you use the FORMAT INTERNAL option of UNLOAD PLUS. For more information, see the 

    UNLOAD PLUS for DB2

     documentation.

Installation option considerations when loading CSV data

When you specify FORMAT CSV, LOADPLUS changes the RULES installation option to STANDARD, regardless of the value specified at installation.

FORMAT CSV restrictions

LOADPLUS terminates when either of the following conditions exists and you specify FORMAT CSV:

  • You are loading LOB or XML data.
  • You also specify CONTINUEIF.

CSV field specifications

You must include a field specification on your LOAD command for every column that you are loading.

In addition, each field specification must map to an input field in your CSV file. LOADPLUS maps field specifications to input fields in order, starting with the first specification and the first input field.

Important

Alternatively, you can elect not to include any field specifications. If you do not include any field specifications, LOADPLUS loads the data as if you had included a field specification that maps to every column in the table that you are loading.

Excluding particular input fields

If you do not want to load data from a particular input field, complete one of the following tasks:

  • If the input field that you do not want to load is the last field (or fields) in the CSV file, you do not need to include a field specification for that field. LOADPLUS ignores any extra fields at the end of the input file.
  • If the input field that you do not want to load is not the last field (or fields) in the CSV file, include a placeholder field specification on your LOAD command. In this field specification, use a field name that is not the name of a column in the Db2 table. You must also specify SKIPFIELDS YES on your LOAD command.

The CSV field specification mapping figure illustrates how LOADPLUS maps your field specifications for CSV files to your Db2 table columns. Example 2a does not supply a field specification for the employee department field. Therefore, LOADPLUS loads employee department data into the employee number column, which is not the desired result. Example 2b corrects this problem by adding a placeholder field specification.

Field specification requirements

Your field specifications must meet the following requirements:

  • For fields that you load, field names must match the corresponding column names.
  • For DATE, TIME, or TIMESTAMP data, your field specification must be an EXTERNAL data type. For DATE and TIME data, you must specify one of the delimited formats listed in the tables in Supported-external-DATE-TIME-and-TIMESTAMP-formats. For delimited TIMESTAMP data, if you do not specify a delimited format, LOADPLUS defaults to format D2E.
  • To have LOADPLUS interpret two consecutive delimiter characters as a null field, include the following NULLIF option:

    NULLIF fieldName = ''

    If you do not include the NULLIF option, LOADPLUS treats input fields denoted by two consecutive delimiter characters in the following ways:

    • For DATE, TIME, and TIMESTAMP fields where the column is nullable, LOADPLUS loads a null value.
    • For DATE, TIME, and TIMESTAMP fields where the column is defined as NOT NULL, LOADPLUS loads the default date, time, or timestamp.
    • LOADPLUS treats all other fields as VARCHAR fields with length 0.

Additional field specification considerations

Consider the following additional information about field specification options:

  • You can use NULLIF and DEFAULTIF field specifications. However, be aware that LOADPLUS changes the value of the RULES installation option to STANDARD., affecting how LOADPLUS handles NULLIF and DEFAULTIF specifications.
  • LOADPLUS ignores the following field specifications:
    • POSITION (start:end)

      LOADPLUS starts from the first position of each field. However, we recommend that you specify POSITION(*).

    • Data type, except on specifications for DATE, TIME, and TIMESTAMP fields

      LOADPLUS performs data type conversion only to DATE, TIME, and TIMESTAMP external types.

    • Length

Example field mapping

The following figure illustrates how LOADPLUS maps your field specification to the table that you are loading:


CSV_field_spec_mapping.png

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*