Tutorial 2: Combine and Convert Data Files
In this tutorial, users will learn how to combine two small data files into one data file. This tutorial uses ConverterPro expressions to transform the data files before they are written to the output file.
Conversion Type: Many-to-one.
After completing this tutorial, users will be able to do the following:
- Define data source and data target connections.
- Import record layouts using the Record Layout Editor.
- Relate multiple data sources using the Specify Parent-Child Relationships pane.
- Relate fields within multiple data sources using the Specify Data Relationships pane.
- Map the source to the target using automapping.
- Apply expressions to target fields using the Expression Editor.
- Save and run a conversion specification.
- Browse source and target data.
Source and Target Data Descriptions
In this tutorial, both the source and target data types are comma-delimited files.
Data Sources
<File-AID/EX installation directory>\Samples\Source.empdata.dat
<File-AID/EX installation directory>\Samples\Source.saldata.dat
The fields in the Source.empdata.dat file include the following:
Source.empdata.dat contains the following data:
Gordan,Julia,449-53-2663,928 E. 40th #103,Austin,Tx,69
Ramirez,Victor,421-44-2255,233 Main St,New Braunfels,Tx,55
Vance,Damon,215-77-4258,3442 Pleasant Valley,Austin,Tx,71
The fields in the Source.saldata.dat include the following:
Source.saldata.dat contains the following data:
458-23-6852,President,120000
215-77-4258,Ex. V.P.,92000
449-53-2663,V.P. Tech Support,78000
Data Target
c:\Target.empdata_trgt.dat or <your directory path>\Target.empdata_trgt.dat
The target format defines the data to be converted. It also provides a blueprint or schema of all output data objects. If the target delimited file is converted to another database or data location, then it is necessary to set up a specific schema layout for the file.
For this tutorial, the target data file layout is as follows:
Within the file, the data fields have the following format:
Lastname: String
SSN: 11-byte fixed-length string
YOB: Four-digit integer YOB
Position: String
Salary: Integer
Define the First Data Source
This conversion specification contains two source files. The first step is to define the first data source using ConverterPro’s Data Sources tab.
- Launch ConverterPro like any other application based on your operating system. The ConverterPro window appears.
- Click
. The Data Sources tab appears.
- From the Connector Type list, select Delimited File. The Delimited File pane appears.
- In the Delimited File pane’s Connector Name field, type empdata.
Click Browse, navigate to the Samples folder within the File-AID/EX installation directory, select Source.empdata.dat, and click Open. This is the first data source for this conversion specification. The path and file name appear in the File Name field.
- Click Next. The Record Layout Selection pane appears.
Click Next. The delimited parser pane appears.
- In the Name field, type empdata.
- From the Header Record list, select Yes. The headers move to the column headings row.
- Click Next. The record layout table pane appears, and the empdata record layout is complete.
Click More Sources. The project tree view updates to display the first data source (empdata) and connector (empdata). The path and file name appear below the data source.
Define the Second Data Source
From the Connector Type list, select Delimited File. The Delimited File pane appears.
- In the Delimited File pane’s Connector Name field, type saldata.
- Click Browse, navigate to the Samples folder within the File-AID/EX installation directory, select Source.saldata.dat, and click Open. The path and file name appear in the File Name field.
Click Next. The Record Layout Selection pane appears.
- Click Next. The parse delimited file settings pane appears.
- In the Name field, type saldata.
- Click Next. The record layout table pane populates with the new information. The table name (saldata) appears in the Name column. Field names appear below the table name. The Occurs/Length column displays the number of occurrences of the record or the length of the field. Field type, encoding, and comments appear in respective columns.
Change Table Information
Following are the field format requirements for saldata:
Field1 = SSN, Type = String, Encoding = Display, Occurs/Length = VAR
Field2 = Position, Type = String, Encoding = Display, Occurs/Length = VAR
Field3 = Salary, Type = String, Encoding = Display, Occurs/Length = VAR
- Double-click Field1 to make it editable, type SSN and press Enter. The remaining fields already match the requirements of the tutorial.
- Double-click Field2, type Position and press Enter. The remaining fields already match the requirements of the tutorial.
- Double-click Field3, type Salary and press Enter. The saldata record layout is complete.
- Click Finish Source or Next. The Data Targets tab appears. The project tree view updates to display the second data source (saldata) and connector (saldata). The path and file name appear below the data source.
Define the Data Target
- From the Connector Type list, select Delimited File. The Delimited File pane appears.
- In the Delimited File pane’s Connector Name field, type empdata_target.
In the File Name field, type one of the following:
- c:\Target.empdata_trgt.dat
- <your directory path>\Target.empdata_trgt.dat
- From the Target/Action (Exist/Not Exist) field, select Recreate/Create.
- Click Next. The Record Layout Selection pane appears. The Existing Record Layouts box shows the empdata and saldata record layouts that have been defined. Next, this tutorial will modify these source record layouts to easily create the record layout for empdata_trgt.dat.
- Click
. The empdata and saldata record layouts are added to the Selected Record Layouts box.
- Click Next. The delimited parser pane appears.
- Modify the empdata_trgt record layout by doing the following:
- Double-click empdata, rename it to empdata_trgt and press Enter.
- Double-click the Year field, rename it to YOB and press Enter.
- In the Type column list for YOB, select Integer.
- Modify the saldata record layout by doing the following:
- Right-click the SSN field and select Delete. The SSN field is deleted.
- Drag the Position field below the YOB field in the empdata_trgt record layout.
- Drag the Salary field below the Position field in the empdata_trgt record layout.
- Double-click and rename any updated field names to the original field names. Automap is easier to use in this tutorial if the source and target fields names match.
- Delete the saldata record layout. Only the empdata_trgt record layout remains.
- Delete the City field.
- Delete the State field.
- Delete the Address field.
- Click Next. The project tree view updates to display the empdata_trgt target and empdata_target connector. The path and file name appear below the data target. The Mapping Editor tab appears.
Specify Parent-Child Relationships
The Specify Parent-Child Relationships pane of the Mapping Editor tab displays the data sources that have been selected.
The order in which the two data sources are processed is determined by which source is the parent and which source is the child.
- Select saldata and click Make Child. The saldata source moves a level below empdata, making it a child and making empdata the parent.
- Click Next. The Set Source Record Level Relationships pane appears.
Specify Source Record Level Relationships
The Set Source Record Level Relationships pane displays data sources in the hierarchical order that was specified in the Specify Parent-Child Relationships pane.
This pane is used to specify fields that relate the data sources to each other. This tutorial uses the SSN field to relate the empdata and saldata sources.
Select the empdata’s SSN field and drag-and-drop this field onto the saldata’s SSN field. Source.empdata.SSN appears in the Relationship Expression column. The required relationship information to process the two sources correctly is now in place.
Using this relationship expression, the Execution Server searches saldata until it finds an SSN to match the current record being checked in empdata. It then continues with the mapping process. If a match is not found, then the Execution Server proceeds to the next record in empdata.
- Click Next. The Mapping Editor tab appears.
Map the Source to the Target
The Mapping Editor tab allows users to specify source-to-target mapping. The upper pane displays the data relationships specified on the previous pane. The lower pane displays the fields from the empdata source and from the empdata_trgt target. To view the saldata source in the lower pane, select saldata from the upper pane.
Using Automap
Use the automap feature to map each field in empdata to a corresponding field in empdata_trgt.
1. In the Source box in the lower pane, right-click empdata and select Automap > By Field Name Match. The Mapped Field column displays the mapping results in which all like-named fields are mapped and the Year field is mapped to the YOB field.
Using the Expression Editor
The Expression Editor allows users to create various expressions that reformat, transform, calculate, and validate source data before it is placed in the mapped target field.
This section of the tutorial creates an expression that results in a four-digit year of birth and applies it to the target YOB (year of birth) field.
This expression will add the numeric value 1900 to the two-digit Year field in the source, thus creating a four-digit year of birth when this source field maps to the target field.
- In the Mapped Field column for YOB, click
. The Expression Editor - Mapping Customization dialog box appears. The mapped source field, STR2INT(Source.empdata.Year), appears. STR2INT specifies that the string for year is translated into an integer based on your specifying YOB as an integer earlier.
- Position the cursor after STR2INT(Source.empdata.Year) and click
. A plus sign appears at the cursor position.
- Type 1900 after the + sign. The expression now reads as follows:
STR2INT(Source.empdata.Year)+1900 Click OK to save the expression. The YOB field shows the expression just created.
Save and Run the Conversion Specification
Before running this conversion specification, save it to the repository.
- From the File menu, select Save. The Save Conversion As dialog box appears.
- In the Name field, type Tutorial2.
- In the Description field, type Combining two data sources into one data target.
- Select the Save with Userid/Password check box.
- Click OK.
- Click Next twice to advance to the final Mapping Editor window.
Click Run. The Execution Summary Viewer displays the conversion progress.
The conversion specification name (Tutorial2), execution submission date, and time appear in the Execution Summary Viewer. The Summary tab shows that four records were read from empdata, four records were read from saldata, and four records were written to empdata_trgt. It also shows that no records were discarded.
The Execution Summary Viewer consists of two tabs: Summary and Details. The default view shows summary information. Select the Details tab for more information about your conversion.
Target.empdata_trgt.dat is written to the directory path specified.
- Click Close.
Browse the Source and Target
To verify what was written to the target, compare the data in the Source Data Browser with the data in the Target Data Browser.
- Click
or, from the Tools menu, select Browse Source. The Source Data Browser appears, displaying empdata and saldata content. Click either file in the pane on the left to see the desired information.
- Click
or, from the Tools menu, select Browse Target. The Target Data Browser appears, displaying empdata_trgt contents. The fields from empdata and saldata have been merged into empdata_trgt and the year of birth has been converted.
- After reviewing the source and target data, close the browsers.
Tutorial 2 is now complete.