Editing Unicode (UTF-16) Data

Warning

File-AID/MVS can recognize data as Unicode only through a record layout. Therefore, we recommend editing Unicode data only in Formatted (FMT) or Vertical Formatted (VFMT) display mode.

In a formatted display, Unicode (UTF-16) data in a field defined with COBOL USAGE NATIONAL is converted and displayed as an EBCDIC text image.

The following table shows how the Unicode data is displayed in each mode.

Display of Unicode (UTF-16) Data in Browse and Edit

Mode	NATIONAL (ex. PIC N(3))		NATIONAL Numeric (ex. PIC S99V9 USAGE NATIONAL SIGN TRAILING SEPARATE)
	Valid Data	Invalid Data	Valid Data	Invalid Data
Sample	X'004100420043'	X'004101580043'	X'003100320033002D'	X'0031003200330041'
FMT	ABC	Browse: A.C ¹; Edit: A C	-12.3	X'0031003200330041'
VFMT HEX OFF	ABC	Browse: A.C; Edit: A C	-12.3	INVALID
VFMT HEX ON	A B C ²	A . C ³	1 2 3 -	1 2 3 A
	040404	040504	03030302	03030304
	010203	011803	0102030D	01020301

¹ : A non-displayable character is replaced by a period in Browse and in vertically formatted (VFMT) Edit with HEX ON. In other Edit displays, it is replaced by a protected blank instead of a period, so that the character cannot be overtyped.
² : The character line (first line) is protected in HEX ON display in Browse and Edit.
³ : The substitution character is always displayed as a period in VFMT HEX ON.
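To make the NATIONAL Numeric columns of the table concrete, the following Python sketch decodes a UTF-16 numeric field with a separate sign and an implied decimal point. It is an illustration only, not File-AID's logic: it assumes big-endian UTF-16, and the function name `decode_national_numeric` and its `scale`/`sign` parameters are hypothetical. It returns None for data that the table shows as invalid.

```python
from decimal import Decimal
from typing import Optional

DIGITS = set("0123456789")


def decode_national_numeric(data: bytes, scale: int,
                            sign: Optional[str] = None) -> Optional[Decimal]:
    """Hypothetical decoder for a NATIONAL numeric field (big-endian UTF-16 assumed).

    scale: number of digits to the right of the implied decimal point (V).
    sign:  None (unsigned), "leading" or "trailing" (SIGN ... SEPARATE).
    Returns None when the field is not valid numeric data.
    """
    text = data.decode("utf-16-be", errors="replace")
    sign_char = "+"
    if sign == "leading":
        sign_char, text = text[:1], text[1:]
    elif sign == "trailing":
        sign_char, text = text[-1:], text[:-1]
    # A separate sign must be '+' or '-', and every other character a digit 0-9.
    if sign_char not in ("+", "-") or not text or not set(text) <= DIGITS:
        return None
    value = Decimal(text).scaleb(-scale)  # apply the implied decimal point
    return -value if sign_char == "-" else value
```

For the table's PIC S99V9 sample, `decode_national_numeric(bytes.fromhex("003100320033002D"), 1, "trailing")` gives -12.3, while X'0031003200330041' returns None because the trailing character is "A" rather than a sign.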

If a NATIONAL field is truncated in the middle, File-AID converts the Unicode data to an EBCDIC text image if possible.
If a NATIONAL Numeric field is truncated in the middle, File-AID does not display the field, the same as for a truncated standard numeric field.
If Unicode conversion fails for any reason, File-AID displays the original Unicode data in hexadecimal format in FMT mode and displays U-INVALID in VFMT mode.
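The conversion to an EBCDIC text image can be approximated as follows. This Python sketch is not the product's implementation: it assumes big-endian UTF-16 input and CCSID 1140 (Python's `cp1140` codec) as the active code page, and `national_to_text_image` is a hypothetical name. A character with no equivalent in the target code page, or one that is not printable, is replaced by a substitution character, as in the table above.

```python
def national_to_text_image(data: bytes, codec: str = "cp1140", sub: str = ".") -> str:
    """Hypothetical sketch: map a UTF-16 (big-endian assumed) field to a text image."""
    if len(data) % 2:
        raise ValueError("a UTF-16 field must have an even byte length")
    image = []
    for i in range(0, len(data), 2):
        # Decode one 2-byte code unit; a lone surrogate becomes U+FFFD.
        ch = data[i:i + 2].decode("utf-16-be", errors="replace")
        try:
            ch.encode(codec)  # does the EBCDIC code page have this character?
        except UnicodeEncodeError:
            image.append(sub)  # no EBCDIC equivalent: substitute
            continue
        image.append(ch if ch.isprintable() else sub)
    return "".join(image)
```

With the table's samples, X'004100420043' yields "ABC" and X'004101580043' yields "A.C", because U+0158 has no code point in CCSID 1140. Passing `sub=" "` loosely mimics the Edit display, where File-AID uses a protected blank instead of a period.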

Formatted Displays

The following changes apply to the two formatted display modes, formatted (FMT) and vertically formatted (VFMT):

SHOW PICTURE

For a NATIONAL field, the Picture column shows N(nn), reflecting the data declaration. For example, a field declared as PICTURE N(5) is displayed with a Picture of N(5).

Important

When PICTURE N is defined as a DBCS field, the Picture column continues to display G(nn).

SHOW PICTURE for Unicode (UTF-16) Data

File-AID - Browse - TSOID01.NATL.DATA ------------------------------- COL 1 73
COMMAND ===>                                                  SCROLL ===> CSR
RECORD:      1                       NATL-SAMP                  LENGTH:      84
---- FIELD LEVEL/NAME ------- PICTURE- ----+----1----+----2----+----3----+----4
  5 A                           N(5)     ABC
  5 B                           N(8)     12/31/99
  5 C                           S999.99 -123.45
  5 D                           N(12)    $12,345.67
  5 E                           N(11)    +1.2346E+04
****************************** BOTTOM OF DATA *********************************

SHOW FORMAT

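The format for a NATIONAL field is nn/UT16, where nn is the length of the field in bytes (two bytes per UTF-16 character). For example, PICTURE N(5) displays as a Format of 10/UT16. A NATIONAL Numeric field displays as nn/UNUM, as field C in the following figure shows.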

When PICTURE N is defined as a DBCS field, the Format column continues to display nn/DBCS.
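The byte length in the nn/UT16 format is twice the number of characters in the PICTURE. This small Python sketch derives the label from a PIC N clause; `show_format` is a hypothetical helper, and it covers only Unicode NATIONAL fields, not PIC N fields defined as DBCS.

```python
import re


def show_format(picture: str) -> str:
    """Hypothetical: derive the SHOW FORMAT label for a Unicode PIC N field."""
    m = re.fullmatch(r"N\((\d+)\)|(N+)", picture.strip().upper())
    if not m:
        raise ValueError(f"not a PIC N picture: {picture!r}")
    chars = int(m.group(1)) if m.group(1) else len(m.group(2))
    return f"{chars * 2}/UT16"  # UTF-16 uses two bytes per character
```

For example, `show_format("N(5)")` returns "10/UT16" and `show_format("N(12)")` returns "24/UT16", matching fields A and D in the figure.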

SHOW FORMAT for Unicode (UTF-16) Data

File-AID - Browse - TSOID01.NATL.DATA ------------------------------- COL 1 73
COMMAND ===>                                                  SCROLL ===> CSR
RECORD:      1                       NATL-SAMP                  LENGTH:      84
---- FIELD LEVEL/NAME ------- -FORMAT- ----+----1----+----2----+----3----+----4
5 A                            10/UT16 ABC
5 B                            16/UT16 12/31/99
5 C                            12/UNUM -123.45
5 D                            24/UT16 $12,345.67
5 E                            22/UT16 +1.2346E+04
****************************** BOTTOM OF DATA *********************************

INIT command

When you issue the INIT command (see INIT), File-AID/MVS will initialize NATIONAL fields as follows:

NATIONAL field (ex. PIC N) is initialized with Unicode blank characters (X'0020').
NATIONAL Numeric field (ex. PIC 9 USAGE NATIONAL) is initialized with Unicode zero characters (X'0030'). If the field is signed, a plus sign (X'002B') is also included, in the leading or trailing position according to the COBOL SIGN LEADING SEPARATE or SIGN TRAILING SEPARATE clause.
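The initialized values described above can be sketched in Python. The helper names `init_national` and `init_national_numeric` are hypothetical, big-endian UTF-16 is assumed, and `digits` counts only digit positions (an implied decimal point V occupies no storage).

```python
from typing import Optional


def init_national(chars: int) -> bytes:
    """Hypothetical: PIC N(chars) initialized with Unicode blanks (X'0020')."""
    return " ".encode("utf-16-be") * chars


def init_national_numeric(digits: int, sign: Optional[str] = None) -> bytes:
    """Hypothetical: numeric NATIONAL field initialized with zeros (X'0030').

    sign: None (unsigned), "leading" or "trailing" (SIGN ... SEPARATE),
    in which case a '+' (X'002B') is added in that position.
    """
    body = "0" * digits
    if sign == "leading":
        body = "+" + body
    elif sign == "trailing":
        body = body + "+"
    elif sign is not None:
        raise ValueError("sign must be None, 'leading' or 'trailing'")
    return body.encode("utf-16-be")
```

For PIC S99V9 USAGE NATIONAL SIGN TRAILING SEPARATE, `init_national_numeric(3, "trailing").hex().upper()` gives "003000300030002B".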

FIND and CHANGE commands

When you issue the FIND (see FIND-F) or CHANGE command (see CHANGE-CHG-C) for Unicode (UTF-16) data, File-AID/MVS has these restrictions:

Only hexadecimal search strings are supported.
The FIND parameters VALID and INVALID are not supported.
In FMT mode and VFMT HEX OFF mode, the cursor does not point to the exact position of the found string.

Example:

To find the number 611 in a Unicode field, enter this FIND command:

F x'003600310031'
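Because only hexadecimal strings are accepted, you need the UTF-16 encoding of the text you are looking for. This Python sketch builds the FIND command shown above; `find_hex_command` is a hypothetical helper and assumes big-endian UTF-16 (as in the samples in this section).

```python
def find_hex_command(text: str) -> str:
    """Hypothetical: build a File-AID FIND command for text stored as UTF-16."""
    hex_str = text.encode("utf-16-be").hex().upper()  # two bytes per character
    return f"F x'{hex_str}'"
```

`find_hex_command("611")` returns "F x'003600310031'", the command in the example above.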

SORT Order

The collating sequence of Unicode is different from that of EBCDIC. The SORT command allows you to reorder the data. The SORT command always operates on the underlying data; thus, when the data is Unicode, the results may differ from those for EBCDIC data.

The following table shows the difference between Unicode order and EBCDIC order.

SORT Order for EBCDIC and Unicode UTF-16

EBCDIC		Unicode UTF-16
Order	HEX Value	Order	HEX Value
Space	X'40'	Space	X'0020'
Lowercase letters (a to z)	X'81' to X'89', X'91' to X'99', X'A2' to X'A9'	Numbers (0 to 9)	X'0030' to X'0039'
Uppercase letters (A to Z)	X'C1' to X'C9', X'D1' to X'D9', X'E2' to X'E9'	Uppercase letters (A to Z)	X'0041' to X'005A'
Numbers (0 to 9)	X'F0' to X'F9'	Lowercase letters (a to z)	X'0061' to X'007A'
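The difference in the table can be reproduced by sorting the same strings by their encoded bytes. This Python sketch assumes big-endian UTF-16 for the Unicode data and CCSID 37 (Python's `cp037` codec) for the EBCDIC data; the function names are hypothetical and the sort is a plain byte comparison, not the SORT command itself.

```python
def unicode_order(values):
    """Sort by UTF-16 (big-endian) bytes, as for Unicode data."""
    return sorted(values, key=lambda s: s.encode("utf-16-be"))


def ebcdic_order(values, codec: str = "cp037"):
    """Sort by EBCDIC bytes (CCSID 37 assumed)."""
    return sorted(values, key=lambda s: s.encode(codec))


sample = ["b", "B", "1", " "]
print(unicode_order(sample))  # space, number, uppercase, lowercase
print(ebcdic_order(sample))   # space, lowercase, uppercase, number
```

The first call orders the sample as space, "1", "B", "b"; the second as space, "b", "B", "1", following the two columns of the table.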

Character Display Line for Unicode UTF-16

File-AID/MVS recognizes Unicode data fields and displays the correct character representation of the Unicode data, based on the active code page. The data in each Unicode field is converted to the CCSID of the active code page.

After the data has been converted, normal File-AID/MVS processing determines whether the data is valid. When the data is valid, the character defined in the active code page is displayed. When a character is invalid, it is replaced by an ISPF attribute byte. The attribute byte appears as a blank on the display, but it cannot be overtyped on the character line. To change invalid data, switch to HEX mode.

Values in the character display line are the EBCDIC-based data converted from the Unicode hexadecimal values. In hexadecimal format, you cannot overtype the values in the character display line. Because Unicode data and the converted EBCDIC-based data can differ in length, each character is positioned to line up with its corresponding Unicode hexadecimal value. For example, the data 123 is 3 bytes in EBCDIC but 6 bytes (X'003100320033') in Unicode UTF-16. In vertical format, the hex value is displayed vertically, with the high-order half of each byte on the first hex line and the low-order half on the second:

1 2 3
030303
010203
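The layout of the three lines can be sketched in Python. `vertical_hex_lines` is a hypothetical helper that assumes big-endian UTF-16 with only 2-byte characters (no surrogate pairs), so each character occupies two columns, one per byte, on the character line.

```python
from typing import Tuple


def vertical_hex_lines(data: bytes) -> Tuple[str, str, str]:
    """Hypothetical: character line plus the two vertical hex lines."""
    text = data.decode("utf-16-be")
    # Each UTF-16 character spans two byte columns, so pad it with one blank.
    char_line = "".join(ch + " " for ch in text)
    zone_line = "".join(f"{b >> 4:X}" for b in data)     # high-order half of each byte
    digit_line = "".join(f"{b & 0x0F:X}" for b in data)  # low-order half of each byte
    return char_line, zone_line, digit_line
```

For X'003100320033', the hex lines are "030303" and "010203", as above. For field A in the following figure (ABC followed by two blanks), they are "0404040202" and "0102030000".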

The following figure shows Unicode (UTF-16) data displayed in Vertical Formatted display in hexadecimal format.

Unicode (UTF-16) Data in Vertical Formatted display with HEX ON

File-AID - Edit - TSOID01.NATL.DATA --------------------- COLUMNS 000001 000062
COMMAND ===>                                                  SCROLL ===> CSR
        A          B                C             D
        10/UT16    16/UT16          12/UNUM       24/UT16
        (1-10)     (11-26)          (27-38)       (39-62)
        1--------- 2--------------- 3------------ 4-----------------------
****** ***************************** TOP OF DATA ******************-CAPS OFF-*
000001 A B C      1 2 / 3 1 / 9 9 - 1 2 3 4 5   $ 1 2 , 3 4 5 . 6 7
V      0404040202 0303020303020303 020303030303 020303020303030203030202
V      0102030000 01020F03010F0909 0D0102030405 0401020C0304050E06070000
****** **************************** BOTTOM OF DATA ****************-CAPS OFF-*

File-AID/MVS Data Validation

File-AID/MVS uses internal character set tables to determine if data contains unprintable characters. Several tables for different languages are shipped with the product. For online validation, the Character Set to be used is specified under the Parameters option for System Parameters. For batch validation, the CHARSET parameter is used to identify the Character Set. See the Install section for more information.

Printing Unicode Data

When printing Unicode data, use the CCSID parameter to specify the code page. To override the default CCSID, manually add the parameter to the print JCL, for example:

$$DD01 VPRINT SHOW=F,LAYOUT=NATL,OUT=0,TRUNC=NO,
FILLER=ON,ZERO=OFF,CCSID=1140