Tutorial 4: Convert Delimited File to XML


In this tutorial, users will learn how to perform a conversion that takes two data files, one containing time zone information and one containing state information, and combines them into one XML file that groups the states by time zone. The record layout for the target XML file is defined by combining the source record layout definitions.

Important

Dragging a field to another position within a record layout changes the name of the field. For example, Position becomes Position_2. If moved again, it becomes Position_3. This renaming prevents duplicate field names, which can occur when a field at one level has the same name as a field at another level. If both fields kept the same name and were moved to the same level, this could cause data errors. However, for automap to work correctly in the tutorials, Compuware recommends that users rename any field whose name changes back to the original field name.

Conversion Type: Many-to-One.

After completing this tutorial, users will be able to do the following:

  • Define data source and data target connections.
  • Relate multiple data sources using the Specify Parent-Child Relationships pane.
  • Relate fields within multiple data sources using the Specify Data Relationships pane.
  • Map the source to the target using automapping.
  • Save and run the conversion specification.
  • Browse source and target data.

Source and Target Data Descriptions

Data Sources

Connector Types: Delimited Files

Location:

<File-AID/EX installation directory>\samples\Source.timezone.dat

<File-AID/EX installation directory>\samples\Source.state.dat

Source.timezone.dat contains the following data:

timezone_abbrev,timezone_name
CST,Central Standard Time
EST,Eastern Standard Time
MST,Mountain Standard Time
PST,Pacific Standard Time

The first five lines of Source.state.dat are:

state_abbrev,state_name,timezone_abbrev
AK,Alaska,PST
AL,Alabama,EST
AR,Arkansas,CST
AZ,Arizona,MST

Both data files are comma-delimited files that contain a header record describing the columns.
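As an illustration outside ConverterPro, the following minimal Python sketch shows how a header record turns each data line into named fields. It embeds the sample lines shown above rather than reading the installed files, and the helper name read_delimited is hypothetical.

```python
import csv
import io

# Sample content copied from this tutorial; the real files are in the samples directory.
TIMEZONE_DATA = """timezone_abbrev,timezone_name
CST,Central Standard Time
EST,Eastern Standard Time
MST,Mountain Standard Time
PST,Pacific Standard Time
"""

STATE_DATA = """state_abbrev,state_name,timezone_abbrev
AK,Alaska,PST
AL,Alabama,EST
AR,Arkansas,CST
AZ,Arizona,MST
"""


def read_delimited(text):
    """Parse comma-delimited text whose first line is a header record.

    Hypothetical helper: each returned row maps a header name to a value.
    """
    return list(csv.DictReader(io.StringIO(text)))


timezones = read_delimited(TIMEZONE_DATA)
states = read_delimited(STATE_DATA)

print(list(timezones[0].keys()))  # column names taken from the header record
print(states[0])                  # first data record of the state file
```

Because the header record supplies the field names, both files share the field name timezone_abbrev, which is what later relates them.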

Data Target

Connector Type: XML File

File name and location: <your directory path>/Target.state_by_tz.xml

The output XML file, Target.state_by_tz.xml, will consist of a root element with a child time zone element for each time zone in the Source.timezone.dat input file. Each time zone element contains the associated information from the Source.timezone.dat file together with a child state element for each state in the Source.state.dat file in the time zone. The time zone elements in the output XML file will be of the form:

<timezone>
    <timezone_abbrev>content</timezone_abbrev>
    <timezone_name>content</timezone_name>
    <state state_abbrev="abbrev">
        <state_name>content</state_name>
    </state>
    .
    .
    .
    <state state_abbrev="abbrev">
        <state_name>content</state_name>
    </state>
</timezone>
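The element shape above can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration of the target structure, not how ConverterPro writes the file. The function name build_timezone_element is hypothetical, and the sketch assumes state_abbrev is carried only as an attribute of each state element.

```python
import xml.etree.ElementTree as ET


def build_timezone_element(tz, states):
    """Build one <timezone> element with a child <state> per related state.

    Hypothetical helper; tz and each state are plain dicts keyed by field name.
    """
    tz_el = ET.Element("timezone")
    ET.SubElement(tz_el, "timezone_abbrev").text = tz["timezone_abbrev"]
    ET.SubElement(tz_el, "timezone_name").text = tz["timezone_name"]
    for st in states:
        # Assumption: state_abbrev is an XML attribute, state_name a child element.
        st_el = ET.SubElement(tz_el, "state", state_abbrev=st["state_abbrev"])
        ET.SubElement(st_el, "state_name").text = st["state_name"]
    return tz_el


# A single root element wraps all time zone elements.
root = ET.Element("root")
root.append(build_timezone_element(
    {"timezone_abbrev": "CST", "timezone_name": "Central Standard Time"},
    [{"state_abbrev": "AR", "state_name": "Arkansas"}],
))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```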

Define the Data Sources and Source Record Layouts

After starting ConverterPro, begin defining source data and connection information using the Data Sources tab.

  1. In the ConverterPro window, click the icon (image2021-8-27_15-28-37.png).
  2. From the Connector Type list, select Delimited File.
  3. In the Connector Name field, type timezone_delimited.
  4. Click Browse and navigate to the Samples directory.
  5. Select Source.timezone.dat and click Open.
  6. Click Next. The Record Layout Selection pane appears. Since Delimited File was selected in the previous pane, Parse Delimited File and the source file are displayed in the associated fields.
  7. Click Next. The delimited parser window appears.
  8. In the Name field, type timezone.
  8. From the Header Record list, select Yes to move the column headers to the first line on the grid.
  10. Click Next. The record layout table pane appears. This pane can be used to make changes to the file’s properties. To see detailed information about the columns in this pane, refer to the online help information about record layout.

    Important

    Clicking Next instead of More Sources opens a blank connector for the second data source. This enables users to start the new source by clicking the Sources tab and selecting the desired connector type.

  11. Click More Sources and follow the instructions in the next section, Define the Second Data Source. Clicking More Sources updates the project tree view to display the first data source (timezone) and connector (timezone). The path and file name are displayed below the data source.

Define the Second Data Source

  1. From the Data Sources tab’s Connector Type list, select Delimited File.
  2. In the Connector Name field, type state_delimited.
  3. Click Browse, navigate to the Samples directory, select Source.state.dat, and click Open.
  4. Click Next. The Record Layout Selection pane appears. Since Delimited File was selected in the previous window, Parse Delimited File and the source file are displayed in the associated fields.
  5. Click Next. The delimited parser pane appears.
  6. In the Name field, type state.
  7. From the Header Record list, select Yes to move the column headers to the first line on the grid.
  8. Click Next. The record layout table pane appears.
  9. Click Next. This updates the project tree view to display the second data source (state_delimited) and connector (state_delimited) and advances to the first pane of the Data Targets tab. The path and file name are displayed below the data source.

Define the Data Target and Target Record Layout

  1. From the Connector Type list, select XML.
  2. In the Connector Name field, type state_tz_xml.
  3. In the File Name field, type <your directory path>/Target.state_by_tz.xml, or click Browse, navigate to <your directory path>, and add the file name Target.state_by_tz.xml. The File Name field populates with the path and file name.
  4. Click Next. The Record Layout Selection pane appears. Derive Record Layout From is set to Existing Source/Target and the two source record layouts that were just defined appear in the Existing Record Layouts box.
  5. Click the icon (image2021-8-27_15-33-59.png) to add both record layouts to the Selected Record Layouts box and click Next. The target record layout pane appears. The record layout for the target is constructed from these source definitions.

    Important

    If a mistake is made while setting the record layout, click Back and select both record layouts again. Then click Next to start over.

  6. Right-click state and select Add XML Attribute to make the state_abbrev field an attribute of the state record. This adds a child record named Attributes, which has a child field named Attribute1, to the state record.
  7. Drag the state_abbrev field to the Attributes record.
  8. Right-click Attribute1 and select Delete to delete this child.
  9. Select state and drag it to timezone. This makes the state record a child of the timezone record.
  10. Expand the state record node, then right-click the timezone_abbrev field and select Delete. This deletes the timezone_abbrev field in the state record, since this information is already in the parent timezone record.
  11. Double-click Top Level Element and change its name to root, then press Enter. This creates a single root element, which XML requires.
  12. Rename any updated field names to the original field name. Automap is easier to use in this tutorial if the source and target field names match.
  13. Click Finish Target to save the target record layout definition. This updates the project tree view and advances to the Specify Parent-Child Relationships pane of the Mapping Editor tab.

Specify Mapping Information

When more than one data source is selected, the relationships between the sources need to be specified. These relationships determine how the data is read. In this situation, for each record in the timezone data source, we want to read all associated state records from the state data source.
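The read order described here behaves like a keyed parent-child lookup. The following minimal Python sketch illustrates the idea with the sample records from this tutorial. It is not ConverterPro's implementation, and the function name relate is hypothetical.

```python
from collections import defaultdict

# Sample records copied from the tutorial's source files.
timezones = [
    {"timezone_abbrev": "CST", "timezone_name": "Central Standard Time"},
    {"timezone_abbrev": "EST", "timezone_name": "Eastern Standard Time"},
    {"timezone_abbrev": "MST", "timezone_name": "Mountain Standard Time"},
    {"timezone_abbrev": "PST", "timezone_name": "Pacific Standard Time"},
]
states = [
    {"state_abbrev": "AK", "state_name": "Alaska", "timezone_abbrev": "PST"},
    {"state_abbrev": "AL", "state_name": "Alabama", "timezone_abbrev": "EST"},
    {"state_abbrev": "AR", "state_name": "Arkansas", "timezone_abbrev": "CST"},
    {"state_abbrev": "AZ", "state_name": "Arizona", "timezone_abbrev": "MST"},
]


def relate(parents, children, key):
    """Pair each parent record with the child records that share its key value.

    Hypothetical helper illustrating a parent-child relationship on one field.
    """
    by_key = defaultdict(list)
    for child in children:
        by_key[child[key]].append(child)
    # Parents keep their original order; a parent without children gets an empty list.
    return [(parent, by_key.get(parent[key], [])) for parent in parents]


related = relate(timezones, states, "timezone_abbrev")
for tz, tz_states in related:
    print(tz["timezone_abbrev"], [s["state_abbrev"] for s in tz_states])
```

Indexing the child records once by the relating field avoids rescanning the state data for every time zone record.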

To define the relationship between the time zone and state record layout definitions:

  1. Select state and click Make Child to make state a child of timezone. State moves from the same level as timezone to a level below.
  2. Click Next. The Set Source Record Level Relationships pane displays the field definitions created earlier showing the hierarchical order specified on the previous pane. This window lets users specify the field that relates timezone to state. In this case, use timezone_abbrev to relate the two files.
  3. In the timezone record, click timezone_abbrev and drag it to the Relationship Expression column associated with the timezone_abbrev field in the state record. ConverterPro now has the required relationship information to process the two files in a related manner.
  4. Click Next. The Mapping Editor appears. Its grid lets users map their data from the source files to the target file. The top pane displays the hierarchy specified on the previous panes. Notice that the relationship expression created shows under the state entry. Clicking on either timezone or state shows the respective fields for each file in the lower source mapping panel.
  5. On the tool bar, click Automap and select By Field Name Match.
  6. Click Next. The Conversion Customization pane appears and displays a visual summary of the conversion process. The data for nodes can be customized by double-clicking the node.
  7. Click Next. The conversion execution pane appears, summarizing the conversion specification.

Save and Run the Conversion Specification

  1. From the File menu, select Save to save the conversion specification.
  2. In the Name field, type Tutorial4.
  3. In the Description field, type 2 Delimited Sources to XML Target.
  4. Select the Save with Userid/Password check box.
  5. Click OK.
  6. Click Run to start the conversion. The Execution Status Viewer window appears, displaying the progress of the conversion.
  7. Click Close to close the Execution Status Viewer window and return to the ConverterPro window.

Browse the Source and Target

To verify what was written to the target, compare the data in the Source Data Browser with the data in the Target Data Browser.

  1. Click the icon (image2021-8-27_15-38-17.png). The Source Data Browser appears, displaying source.timezone.dat and source.state.dat content. Click either file in the pane on the left to see the desired information on the right.
  2. Click the icon (image2021-8-27_15-38-47.png) to open the XML file and view your data. The file is opened in your default internet browser.
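The same comparison can also be scripted. The sketch below is a hedged illustration only: it checks a small hand-written XML sample in the expected target shape, not the actual Target.state_by_tz.xml produced by the run, and the names states_by_timezone and mismatches are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the target shape; an assumption, not real conversion output.
SAMPLE_TARGET = """<root>
  <timezone>
    <timezone_abbrev>CST</timezone_abbrev>
    <timezone_name>Central Standard Time</timezone_name>
    <state state_abbrev="AR"><state_name>Arkansas</state_name></state>
  </timezone>
  <timezone>
    <timezone_abbrev>PST</timezone_abbrev>
    <timezone_name>Pacific Standard Time</timezone_name>
    <state state_abbrev="AK"><state_name>Alaska</state_name></state>
  </timezone>
</root>"""

# (state_abbrev, timezone_abbrev) pairs taken from the source state records.
SOURCE_STATES = [("AK", "PST"), ("AR", "CST")]


def states_by_timezone(xml_text):
    """Return {timezone_abbrev: [state_abbrev, ...]} read from the target XML."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for tz in root.findall("timezone"):
        abbrev = tz.findtext("timezone_abbrev")
        grouped[abbrev] = sorted(s.get("state_abbrev") for s in tz.findall("state"))
    return grouped


def mismatches(xml_text, source_states):
    """List source pairs whose state is missing from its time zone in the target."""
    grouped = states_by_timezone(xml_text)
    return [(st, tz) for st, tz in source_states if st not in grouped.get(tz, [])]


print(mismatches(SAMPLE_TARGET, SOURCE_STATES))
```

An empty list means every source state was found under its time zone in the sample.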

Tutorial 4 is now complete.

 
