Tutorial 5: Convert XML File to Delimited File Using an XML DTD


In this tutorial, users will learn how to convert an XML file containing state information by time zone into a delimited file. The record layout describing the source XML file is derived from an XML Document Type Definition (DTD), and the target record layout is built by editing a copy of the source record layout.

Important

In the record layout, dragging a field to another position within a record layout changes the name of the field. For example, Position becomes Position_2, and if it is moved again, Position_3. This prevents duplicate field names: a field at one level can have the same name as a field at another level, and if both fields kept the same name and were moved to the same level, data errors could result. However, for automap to work correctly in the tutorials, Compuware recommends renaming any field whose name changes back to its original name.
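The renaming behavior described above can be mimicked with a short sketch. It assumes, based only on the Position example, that each move appends or increments a numeric _N suffix; the functions next_moved_name and original_name are hypothetical helpers, not part of ConverterPro, and original_name would misread a field whose genuine name already ends in an underscore followed by digits.

```python
import re

# Matches a trailing "_<number>" suffix such as "_2" or "_13".
# Assumption (hypothetical): only moves add such a suffix, so a field that is
# genuinely named, for example, "line_2" would be misread by original_name().
_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<n>\d+)$")


def next_moved_name(name: str) -> str:
    """Return the name a field would get after one more move (assumed rule)."""
    match = _SUFFIX.match(name)
    if match:
        # Already moved before: increment the numeric suffix.
        return f"{match.group('base')}_{int(match.group('n')) + 1}"
    # First move: an unsuffixed name gets "_2".
    return f"{name}_2"


def original_name(name: str) -> str:
    """Strip the move suffix so the field can be renamed back for automap."""
    match = _SUFFIX.match(name)
    return match.group("base") if match else name


if __name__ == "__main__":
    name = "Position"
    for _ in range(2):
        name = next_moved_name(name)
        print(name)            # Position_2, then Position_3
    print(original_name(name))  # Position
```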

Conversion Type: One-to-One.

After completing this tutorial, users will be able to do the following:

  • Define data source and data target connections.
  • Define a source record layout using an XML DTD.
  • Define a target record layout from the source record layout.
  • Map the source to the target using automapping.
  • Save and run the conversion specification.
  • Browse source and target data.

Source and Target Data Descriptions

Data Source

Connector Type: XML File

Source Data Location:

<File-AID/EX installation directory>\samples\Source.state_by_tz.xml

Source Record Layout Location:

<File-AID/EX installation directory>\samples\Source.state_by_tz.dtd

Following are the first few lines of Source.state_by_tz.xml:

<?xml version="1.0" ?>
<!DOCTYPE root SYSTEM "Source.state_by_tz.dtd">
<root>
    <timezone>
        <timezone_abbrev>CST</timezone_abbrev>
        <timezone_name>Central Standard Time</timezone_name>
        <state state_abbrev="AR">
            <state_name>Arkansas</state_name>
        </state>
        <state state_abbrev="IA">
            <state_name>Iowa</state_name>
        </state>
        <state state_abbrev="IL">
            <state_name>Illinois</state_name>
        </state>
        <state state_abbrev="IN">
            <state_name>Indiana</state_name>
        </state>
        ...

Data Target

Connector Type: Delimited File

File name and location: <your directory path>/Target.state_w_tz.dat

The output delimited file, Target.state_w_tz.dat, is comma delimited and consists of state information for each state together with its time zone abbreviation. The data format is as follows:

state_name, state_abbrev, timezone_abbrev
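The relationship between the nested source and the flat target can be illustrated outside ConverterPro with Python's standard xml.etree.ElementTree module. This is a sketch of the data transformation only, not of how ConverterPro performs it: each state element becomes one row, and the timezone_abbrev value is copied down from the enclosing timezone element. The sample below is the first two states from Source.state_by_tz.xml, with closing tags added so that it parses; flatten_states is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# First two states of the sample file, closed off so the fragment parses.
SAMPLE = """<?xml version="1.0" ?>
<!DOCTYPE root SYSTEM "Source.state_by_tz.dtd">
<root>
    <timezone>
        <timezone_abbrev>CST</timezone_abbrev>
        <timezone_name>Central Standard Time</timezone_name>
        <state state_abbrev="AR">
            <state_name>Arkansas</state_name>
        </state>
        <state state_abbrev="IA">
            <state_name>Iowa</state_name>
        </state>
    </timezone>
</root>
"""


def flatten_states(xml_text: str) -> list[tuple[str, str, str]]:
    """Return one (state_name, state_abbrev, timezone_abbrev) row per <state>."""
    root = ET.fromstring(xml_text)
    rows = []
    for tz in root.findall("timezone"):
        tz_abbrev = tz.findtext("timezone_abbrev", default="")
        for state in tz.findall("state"):
            rows.append((
                state.findtext("state_name", default=""),  # child element
                state.get("state_abbrev", ""),              # attribute
                tz_abbrev,  # inherited from the parent <timezone>
            ))
    return rows


if __name__ == "__main__":
    for row in flatten_states(SAMPLE):
        print(", ".join(row))
```

Run against the trimmed sample, this prints one line per state (Arkansas, AR, CST and Iowa, IA, CST), matching the target data format.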

Define the Data Source and Source Record Layout

After starting ConverterPro, begin defining source data and connection information using the Data Sources tab.

  1. On the ConverterPro window, click image2021-8-27_15-58-22.png.
  2. From the Connector Type list, select XML. The XML pane appears.
  3. In the Connector Name field, type state_by_tz_xml.
  4. Click Browse, navigate to the <File-AID/EX installation directory>\samples\ directory, select the Source.state_by_tz.xml file, and click Open.
  5. Click Next. The Record Layout Selection pane appears.
  6. From the Derive Record Layout From list, select XML DTD to derive the record layout from an XML DTD.
  7. Click Browse, navigate to the <File-AID/EX installation directory>\samples\ directory, select Source.state_by_tz.dtd, and click Open. The Select File field populates with the path and file name.
  8. Click Next. The record layout table pane appears.

    Important

    Since the record layout was derived from an XML DTD on the previous pane, only the record layout derived from the XML DTD is displayed on this pane. The XML DTD shows that state is a child of timezone, which is a child of root.

    Since the source record layout is derived from an XML DTD, ConverterPro validates the source XML data against the XML DTD. This validation requires that a DOCTYPE reference appear in the XML data document.

  9. Click Next. The Data Targets tab appears. The project tree view is updated with the data source (root) and the connector (state_by_tz_xml). The filename and path appear below the data source.
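The DOCTYPE requirement in the note above can be checked before running a conversion. The following is a minimal sketch using Python's standard xml.parsers.expat module; it only reports whether a DOCTYPE declaration is present and which DTD it names. It does not validate the document, because Python's standard XML parsers are non-validating. The function name find_doctype and the two sample strings are hypothetical.

```python
import xml.parsers.expat


def find_doctype(xml_text: str):
    """Return the DOCTYPE name and system ID, or None if no DOCTYPE is present."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once by expat when it reads <!DOCTYPE ...>.
        found["name"] = name
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # Only the declaration is read; the DTD itself is not loaded or checked.
    parser.Parse(xml_text, True)
    return found or None


WITH_DOCTYPE = (
    '<?xml version="1.0" ?>\n'
    '<!DOCTYPE root SYSTEM "Source.state_by_tz.dtd">\n'
    "<root/>"
)
WITHOUT_DOCTYPE = '<?xml version="1.0" ?>\n<root/>'

if __name__ == "__main__":
    print(find_doctype(WITH_DOCTYPE))
    print(find_doctype(WITHOUT_DOCTYPE))
```

A result of None suggests the XML data document is missing the DOCTYPE reference that DTD-based validation needs.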

Define the Data Target and Target Record Layout

  1. From the Connector Type list, select Delimited File. The Delimited File pane appears.
  2. In the Connector Name field, type state_tz_delimited.
  3. In the File Name field, type <your directory path>/Target.state_w_tz.dat, or click Browse, navigate to <your directory path>, and enter the file name Target.state_w_tz.dat. The File Name field populates with the path and file name.
  4. From the Target Action (Exist/Not Exist) list, select Recreate/Create.
  5. Click Next. The Record Layout Selection pane appears.
  6. From the Derive Record Layout From list, select Existing Source/Target if it is not already selected.
  7. From the Existing Record Layouts box, select root and click image2021-8-27_16-1-59.png to copy it to the Selected Record Layouts box.
  8. Click Next. The delimited parser pane appears showing the source and target definitions in this project.

    Important

    The record layout for the target is constructed from the source definition via the steps that follow.

  9. Select the state_abbrev field and drag it to the state record. This makes state_abbrev a child of state and removes it from the Attributes record.
  10. Right-click the Attributes record and select Delete to delete it.
  11. Select the timezone_abbrev field and drag it to the state record, making it a child of the state record.
  12. Right-click the timezone_abbrev field and select Move Field > Move Down. Repeat this one time. This field is now the last field in the state record. Fields can also be dragged to the desired position.
  13. Right-click the state record and select Move Group > Make Sibling. Repeat this one time. This makes it a top-level record and puts it on the same level as root.
  14. Right-click root and select Delete to delete it.
  15. Rename any updated field names to the original field names. Automap is easier to use in this tutorial if the source and target field names match.
  16. Verify that the state record's Record Separator property is set to CR-LF and its Field Separator property is set to comma (,).
  17. Click Next. The project tree view is updated to display the data target (state) and connector (state_tz_delimited), and the first pane of the Mapping Editor tab appears.
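The separator settings verified in step 16 correspond to a simple output format: one record per line terminated by CR-LF, with fields separated by commas. As a hedged illustration (not ConverterPro's writer), Python's csv module can produce the same layout. The function name write_delimited is hypothetical, and the sketch uses the module's default minimal quoting, which may differ from what ConverterPro does when a value contains a comma.

```python
import csv
import io


def write_delimited(rows, out) -> None:
    """Write rows as comma-separated records terminated by CR-LF."""
    # delimiter mirrors the Field Separator property (comma);
    # lineterminator mirrors the Record Separator property (CR-LF).
    writer = csv.writer(out, delimiter=",", lineterminator="\r\n")
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_delimited([("Arkansas", "AR", "CST"), ("Iowa", "IA", "CST")], buffer)
    print(repr(buffer.getvalue()))
```

When writing to a real file rather than an in-memory buffer, open it with newline="", as the csv module documentation recommends, so that the CR-LF terminator is written unchanged.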

Specify the Mapping Information

The Mapping Editor lets users map data from the source XML file to the target delimited file.

  1. From the toolbar, click Automap and select By Field Name Match.
  2. Click Next. The Conversion Customization pane appears and displays a visual summary of the conversion process. The data for a node can be customized by double-clicking the node.
  3. Click Next.
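The By Field Name Match option is documented here only by its name. The sketch below assumes, hypothetically, that it pairs each target field with the source field of exactly the same name (case-sensitive) and leaves other target fields unmapped. It also shows why step 15 of the target procedure matters: a moved field still carrying a suffix, for example state_abbrev_2, finds no match. The function automap_by_name and the field lists are illustrative only.

```python
def automap_by_name(source_fields, target_fields):
    """Map each target field to the same-named source field, else None."""
    available = set(source_fields)
    # Exact, case-sensitive match is an assumption about the automap rule.
    return {t: (t if t in available else None) for t in target_fields}


SOURCE_FIELDS = ["timezone_abbrev", "timezone_name", "state_abbrev", "state_name"]
# Target layout before the renaming in step 15: one field kept its move suffix.
TARGET_FIELDS = ["state_name", "state_abbrev_2", "timezone_abbrev"]

if __name__ == "__main__":
    for target, source in automap_by_name(SOURCE_FIELDS, TARGET_FIELDS).items():
        print(f"{target:16} <- {source if source else '(unmapped)'}")
```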

Save and Run the Conversion Specification

  1. From the File menu, select Save to save the conversion specification. The Save Conversion As dialog box appears.
  2. In the Name field, type Tutorial5.
  3. In the Description field, optionally enter a description.
  4. Select the Save with Userid/Password check box.
  5. Click OK.
  6. Click Run to start the conversion. As the conversion executes, the Execution Status Viewer appears, displaying the progress of the conversion.
  7. After the conversion finishes running, click Close to close the Execution Status Viewer.

Browse the Source and Target

To verify what was written to the target, compare the data in the Source Data Browser with the data in the Target Data Browser.

  1. Click image2021-8-27_16-5-38.png. The XML data from the source appears in the default Internet browser.
  2. Click image2021-8-27_16-5-53.png to view the target data. The Target Data Browser appears, showing the fields in Target.state_w_tz.dat.
  3. After reviewing the source and target data, close the browsers.
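A quick programmatic cross-check can complement the visual comparison in the browsers: each state element in the source should produce one record in the target. The sketch below assumes the target uses one line per record, as configured in this tutorial; the helper names and the inline sample data are illustrative only.

```python
import xml.etree.ElementTree as ET


def count_source_states(xml_text: str) -> int:
    """Count <state> elements anywhere under the document root."""
    return len(ET.fromstring(xml_text).findall(".//state"))


def count_target_records(delimited_text: str) -> int:
    """Count non-blank records; splitlines() accepts CR-LF or LF endings."""
    return sum(1 for line in delimited_text.splitlines() if line.strip())


SOURCE_XML = (
    "<root><timezone><timezone_abbrev>CST</timezone_abbrev>"
    '<state state_abbrev="AR"><state_name>Arkansas</state_name></state>'
    '<state state_abbrev="IA"><state_name>Iowa</state_name></state>'
    "</timezone></root>"
)
TARGET_TEXT = "Arkansas,AR,CST\r\nIowa,IA,CST\r\n"

if __name__ == "__main__":
    src, tgt = count_source_states(SOURCE_XML), count_target_records(TARGET_TEXT)
    print("match" if src == tgt else f"mismatch: {src} states vs {tgt} records")
```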

Tutorial 5 is now complete.