Tutorial 2: Combine and Convert Data Files


In this tutorial, users will learn how to combine two small data files into one data file. This tutorial uses ConverterPro expressions to transform the data files before they are written to the output file.

Important

In the record layout, dragging a field to another position within a record layout changes the name of the field. For example, Position will become Position_2. If moved again, it will become Position_3. The reason this happens is to prevent duplicate field names. Duplicate field names can occur when a field at one level has the same name as a field at another level. If both fields kept the same name and were moved to the same level, this could cause data errors. However, for automap to work correctly in the tutorials, BMC Compuware recommends that users rename any fields where the name changes to the original field name.

Conversion Type: Many-to-one.

After completing this tutorial, users will be able to do the following:

  • Define data source and data target connections.
  • Import record layouts using the Record Layout Editor.
  • Relate multiple data sources using the Specify Parent-Child Relationships pane.
  • Relate fields within multiple data sources using the Specify Data Relationships pane.
  • Map the source to the target using automapping.
  • Apply expressions to target fields using the Expression Editor.
  • Save and run a conversion specification.
  • Browse source and target data.

Source and Target Data Descriptions

In this tutorial, both the source and target data types are comma-delimited files.

Data Sources

<File-AID/EX installation directory>\Samples\Source.empdata.dat

<File-AID/EX installation directory>\Samples\Source.saldata.dat

The fields in the Source.empdata.dat file include the following:

Firstname,Lastname,SSN,Address,City,State,Year

Source.empdata.dat contains the following data:

Copeland,David,458-23-6852,8300 Jollyville Rd,Austin,Tx,72
Gordan,Julia,449-53-2663,928 E. 40th #103,Austin,Tx,69
Ramirez,Victor,421-44-2255,233 Main St,New Braunfels,Tx,55
Vance,Damon,215-77-4258,3442 Pleasant Valley,Austin,Tx,71

The fields in the Source.saldata.dat include the following:

 SSN,Position,Salary

Source.saldata.dat contains the following data:

421-44-2255,V.P. Sales,75000
458-23-6852,President,120000
215-77-4258,Ex. V.P.,92000
449-53-2663,V.P. Tech Support,78000

Data Target

c:\Target.empdata_trgt.dat or <your directory path>\Target.empdata_trgt.dat

The target format defines the data to be converted. It also provides a blueprint or schema of all output data objects. If the target delimited file is converted to another database or data location, then it is necessary to set up a specific schema layout for the file.

For this tutorial, the target data file layout is as follows:

 Firstname,Lastname,SSN,YOB,Position,Salary

Within the file, the data fields have the following format:

Firstname: String
Lastname: String
SSN: 11-byte fixed-length string
YOB: Four-digit integer YOB
Position: String
Salary: Integer

Define the First Data Source

This conversion specification contains two source files. The first step is to define the first data source using ConverterPro’s Data Sources tab.

  1. Launch ConverterPro like any other application based on your operating system. The ConverterPro window appears.
  2. Click  image2021-8-26_18-38-29.png . The Data Sources tab appears.
  3. From the Connector Type list, select Delimited File. The Delimited File pane appears.
  4. In the Delimited File pane’s Connector Name field, type empdata.
  5. Click Browse, navigate to the Samples folder within the File-AID/EX installation directory, select Source.empdata.dat, and click Open. This is the first data source for this conversion specification. The path and file name appear in the File Name field.

    Important

    If the Source.empdata.dat file is not shown, be sure to select All Files in the Files of Type field.

  6. Click Next. The Record Layout Selection pane appears.
  7. Click Next. The delimited parser pane appears.

    Important

    To convert numeric fields with no delimiter to a delimited target file, the target field type must be numeric.

  8. In the Name field, type empdata.
  9. From the Header Record list, select Yes. The headers move to the column headings row.
  10. Click Next. The record layout table pane appears, and the empdata record layout is complete.
  11. Click More Sources. The project tree view updates to display the first data source (empdata) and connector (empdata). The path and file name appear below the data source.

    Define the Second Data Source

  12. From the Connector Type list, select Delimited File. The Delimited File pane appears.

  13. In the Delimited File pane’s Connector Name field, type saldata.
  14. Click Browse, navigate to the Samples folder within the File-AID/EX installation directory, select Source.saldata.dat, and click Open. The path and file name appear in the File Name field.
  15. Click Next. The Record Layout Selection pane  appears.

    Important

    Import a layout to describe the fields in saldata (just as was done for empdata). Since delimited file was selected on the previous pane, Parse Delimited File and the source file appear on this pane.

  16. Click Next. The parse delimited file settings pane appears.
  17. In the Name field, type saldata.
  18. Click Next. The record layout table pane populates with the new information. The table name (saldata) appears in the Name column. Field names appear below the table name. The Occurs/Length column displays the number of occurrences of the record or the length of the field. Field type, encoding, and comments appear in respective columns.

Change Table Information

Following are the field format requirements for saldata:

Field1 = SSN, Type = String, Encoding = Display,  Occurs/Length =  VAR

Field2 = Position, Type = String, Encoding = Display, Occurs/Length = VAR

Field3 = Salary, Type = String, Encoding = Display, Occurs/Length = VAR

  1. Double-click Field1 to make it editable, type SSN and press Enter. The remaining fields already match the requirements of the tutorial.
  2. Double-click Field2, type Position and press Enter. The remaining fields already match the requirements of the tutorial.
  3. Double-click Field3, type Salary and press Enter. The saldata record layout is complete.
  4. Click Finish Source or Next. The Data Targets tab appears. The project tree view updates to display the second data source (saldata) and  connector (saldata). The  path  and file name appear below the data source.

Define the Data Target

  1. From the Connector Type list, select Delimited File. The Delimited File pane appears.
  2. In the Delimited File pane’s Connector Name field, type empdata_target.
  3. In the File Name field, type one of the following:

    • c:\Target.empdata_trgt.dat
    • <your directory path>\Target.empdata_trgt.dat

    Important

    Ensure that the directory path specified is a valid, locally mapped drive or path to which the user has write access.

  4. From the Target/Action (Exist/Not Exist) field, select Recreate/Create.
  5. Click Next. The Record Layout Selection pane appears. The Existing Record Layouts box shows the empdata and saldata record layouts that have been defined. Next, this tutorial will modify these source record layouts to easily create the record layout for empdata_trgt.dat.
  6. Click image2021-8-26_18-44-52.png . The empdata and saldata record layouts are added to the Selected Record Layouts box.
  7. Click Next. The delimited parser pane appears.
  8. Modify the empdata_trgt record layout by doing the following:
    1. Double-click empdata, rename it to empdata_trgt and press Enter.
    2. Double-click the Year field, rename it to YOB and press Enter.
    3. In the Type column list for YOB, select Integer.
  9. Modify the saldata record layout by doing the following:
    1. Right-click the SSN field and select Delete. The SSN field is deleted.
    2. Drag the Position field below the YOB field in the empdata_trgt record layout.
    3. Drag the Salary field below the Position field in the empdata_trgt record layout.
    4. Double-click and rename any updated field names to the original field names. Automap is easier to use in this tutorial if the source and target fields names match.
    5. Delete the saldata record layout. Only the empdata_trgt record layout remains.
    6. Delete the City field.
    7. Delete the State field.
    8. Delete the Address field.
  10. Click Next. The project tree view updates to display the empdata_trgt target and empdata_target connector. The path and file name appear below the data target. The Mapping Editor tab appears.

Specify Parent-Child Relationships

The Specify Parent-Child Relationships pane of the Mapping Editor tab displays the data sources that have been selected.

The order in which the two data sources are processed is determined by which source is the parent and which source is the child.

  1. Select saldata and click Make Child. The saldata source moves a level below empdata, making it a child and making empdata the parent.
  2. Click Next. The Set Source Record Level Relationships pane appears.

Specify Source Record Level Relationships

The Set Source Record Level Relationships pane displays data sources in the hierarchical order that was specified in the Specify Parent-Child Relationships pane.

This pane is used to specify fields that relate the data sources to each other. This tutorial uses the SSN field to relate the empdata and saldata sources.

  1. Select the empdata’s SSN field and drag-and-drop this field onto the saldata’s SSN field. Source.empdata.SSN appears in the Relationship Expression column. The required relationship information to process the two sources correctly is now in place.

    Using this relationship expression, the Execution Server searches saldata until it finds an SSN to match the current record being checked in empdata. It then continues with the mapping process. If a match is not found, then the Execution Server proceeds to the next record in empdata.

  2. Click Next. The Mapping Editor tab appears.

Map the Source to the Target

The Mapping Editor tab allows users to specify source-to-target mapping. The upper pane displays the data relationships specified on the previous pane. The lower pane displays the fields from the empdata source and from the empdata_trgt target. To view the saldata source in the lower pane, select saldata from the upper pane.

 Using Automap

Use the automap feature to map each field in empdata to a corresponding field in empdata_trgt.

1. In the Source box in the lower pane, right-click empdata and select Automap > By Field Name Match. The Mapped Field column displays the mapping results in which all like-named fields are mapped and the Year field is mapped to the YOB field.

Using the Expression Editor

The Expression Editor allows users to create various expressions that reformat, transform, calculate, and validate source data before it is placed in the mapped target field.

This section of the tutorial creates an expression that results in a four-digit year of birth and applies it to the target YOB (year of birth) field.

This expression will add the numeric value 1900 to the two-digit Year field in the source, thus creating a four-digit year of birth when this source field maps to the target field.

  1. In  the  Mapped Field column  for YOB, click image2021-8-26_18-49-18.png. The Expression Editor - Mapping Customization dialog box appears. The mapped source field, STR2INT(Source.empdata.Year), appears. STR2INT specifies that the string for year is translated into an integer based on your specifying YOB as an integer earlier.
  2. Position the cursor after STR2INT(Source.empdata.Year) and click image2021-8-26_18-49-46.png. A plus sign appears at the cursor position.
  3. Type 1900 after the + sign. The expression now reads as follows:
    STR2INT(Source.empdata.Year)+1900
  4. Click OK to save the expression. The YOB field shows the expression just created.

    Important

    If the change does not appear, click the field again.

 Save and Run the Conversion Specification

Before running this conversion specification, save it to the repository.

  1. From the File menu, select Save. The Save Conversion As dialog box appears.
  2. In the Name field, type Tutorial2.
  3. In the Description field, type Combining two data sources into one data target.
  4. Select the Save with Userid/Password check box.
  5. Click OK.
  6. Click Next twice to advance to the final Mapping Editor window.
  7. Click Run. The Execution Summary Viewer displays the conversion progress.

    The conversion specification name (Tutorial2), execution submission date, and time appear in the Execution Summary Viewer. The Summary tab shows that four records were read from empdata, four records were read from saldata, and four records were written to empdata_trgt. It also shows that no records were discarded.

    The Execution Summary Viewer consists of two tabs: Summary and Details. The default view shows summary information. Select the Details tab for more information about your conversion.

    Target.empdata_trgt.dat is written to the directory path specified.

  8. Click Close.

Browse the Source and Target

To verify what was written to the target, compare the data in the Source Data Browser with the data in the Target Data Browser.

  1. Click  image2021-8-26_18-52-33.png or, from the Tools menu, select Browse Source. The Source Data Browser appears, displaying empdata and saldata content. Click either file in the pane on the left to see the desired information.
  2. Click image2021-8-26_18-52-49.png  or, from the Tools menu, select Browse Target. The Target Data Browser appears, displaying empdata_trgt contents. The fields from empdata and saldata have been merged into empdata_trgt and the year of birth has been converted.
  3. After reviewing the source and target data, close the browsers. 

Tutorial 2 is now complete.

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*