Tutorial 5: Convert XML File to Delimited File Using an XML DTD
In this tutorial, users will learn how to convert an XML file containing state information by time zone into a delimited file. The source record layout is derived from an XML Document Type Definition (DTD). The target record layout is derived from the source record layout and then edited.
Conversion Type: One-to-One.
After completing this tutorial, users will be able to do the following:
- Define data source and data target connections.
- Define a source record layout using an XML DTD.
- Define a target record layout from the source record layout.
- Map the source to the target using automapping.
- Save and run the conversion specification.
- Browse source and target data.
Source and Target Data Descriptions
Data Source
Connector Type: XML File
Source Data Location:
<File-AID/EX installation directory>\samples\Source.state_by_tz.xml
Source Record Layout Location:
<File-AID/EX installation directory>\samples\Source.state_by_tz.dtd
Following are the first few lines of Source.state_by_tz.xml:
<!DOCTYPE root SYSTEM "Source.state_by_tz.dtd">
<root>
<timezone>
<timezone_abbrev>CST</timezone_abbrev>
<timezone_name>Central Standard Time</timezone_name>
<state state_abbrev="AR">
<state_name>Arkansas</state_name>
</state>
<state state_abbrev="IA">
<state_name>Iowa</state_name>
</state>
<state state_abbrev="IL">
<state_name>Illinois</state_name>
</state>
<state state_abbrev="IN">
<state_name>Indiana</state_name>
</state>
...
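The source is hierarchical: each timezone element holds its own abbreviation and name, followed by one state element per state. Within each state, state_abbrev is an XML attribute and state_name is a child element. The following Python sketch is not part of File-AID/EX. It lists every unique element and attribute path in a two-state excerpt of the sample using the standard-library xml.etree.ElementTree module, to show the structure the DTD-derived record layout describes. The helper name element_paths is hypothetical, and the DOCTYPE line is omitted from the excerpt so the sketch runs without the sample files.

```python
import xml.etree.ElementTree as ET

# Two-state excerpt of Source.state_by_tz.xml (DOCTYPE line omitted).
SAMPLE_XML = """<root>
<timezone>
<timezone_abbrev>CST</timezone_abbrev>
<timezone_name>Central Standard Time</timezone_name>
<state state_abbrev="AR">
<state_name>Arkansas</state_name>
</state>
<state state_abbrev="IA">
<state_name>Iowa</state_name>
</state>
</timezone>
</root>"""


def element_paths(elem, prefix=""):
    """Yield the path of elem, its attributes (as @name), then all descendants."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    yield path
    for attr in elem.attrib:
        yield f"{path}/@{attr}"
    for child in elem:
        yield from element_paths(child, path)


# dict.fromkeys drops repeated paths (one per state) while keeping document order.
paths = list(dict.fromkeys(element_paths(ET.fromstring(SAMPLE_XML))))
for p in paths:
    print(p)
```

The attribute path root/timezone/state/@state_abbrev appears separately from the element paths. It corresponds to the state_abbrev field that the tutorial later moves out of the Attributes record.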
Data Target
Connector Type: Delimited File
File name and location: <your directory path>/Target.state_w_tz.dat
The output delimited file, Target.state_w_tz.dat, is comma delimited and consists of state information for each state together with its time zone abbreviation. The data format is as follows:
Define the Data Source and Source Record Layout
After starting ConverterPro, begin defining source data and connection information using the Data Sources tab.
- On the ConverterPro window, click [icon].
- From the Connector Type list, select XML. The XML pane appears.
- In the Connector Name field, type state_by_tz_xml.
- Click Browse, navigate to the <File-AID/EX installation directory>\samples\ directory, select the Source.state_by_tz.xml file, and click Open.
- Click Next. The Record Layout Selection pane appears.
- From the Derive Record Layout From list, select XML DTD.
- Click Browse, navigate to the <File-AID/EX installation directory>\samples\ directory, select Source.state_by_tz.dtd, and click Open. The Select File field populates with the path and file name.
- Click Next. The record layout table pane appears.
- Click Next. The Data Targets tab appears. The project tree view is updated with the data source (root) and the connector (state_by_tz_xml). The file name and path appear below the data source.
Define the Data Target and Target Record Layout
- From the Connector Type list, select Delimited File. The Delimited File pane appears.
- In the Connector Name field, type state_tz_delimited.
- In the File Name field, type <your directory path>/Target.state_w_tz.dat, or click Browse, navigate to <your directory path>, and enter the file name Target.state_w_tz.dat. The File Name field populates with the path and file name.
- From the Target Action (Exist/Not Exist) list, select Recreate/Create.
- Click Next. The Record Layout Selection pane appears.
- From the Derive Record Layout From list, select Existing Source/Target if it is not already selected.
- From the Existing Record Layouts box, select root and click [icon] to copy it to the Selected Record Layouts box.
- Click Next. The delimited parser pane appears, showing the source and target definitions in this project.
- Select the state_abbrev field and drag it to the state record. This makes state_abbrev a child of state and removes it from the Attributes record.
- Right-click the Attributes record and select Delete to delete it.
- Select the timezone_abbrev field and drag it to the state record, making it a child of the state record.
- Right-click the timezone_abbrev field and select Move Field > Move Down. Repeat this once. The field is now the last field in the state record. Alternatively, drag fields to the desired position.
- Right-click the state record and select Move Group > Make Sibling. Repeat this once. The state record is now a top-level record at the same level as root.
- Right-click root and select Delete to delete it.
- Rename any field whose name was changed back to its original name. Automap By Field Name Match pairs source and target fields that have the same name, so every field maps automatically only if the names match.
- Verify that the state record's Record Separator property is set to CR-LF and its Field Separator property is set to comma (,).
- Click Next. The project tree view is updated to display the data target (state) and connector (state_tz_delimited) and the first pane of the Mapping Editor tab appears.
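The layout edits above flatten the hierarchy. Each state record now carries its parent time zone's abbreviation, and each record is written as one comma-separated line ending in CR-LF. The following Python sketch (not ConverterPro's implementation) performs an equivalent transformation on a two-state excerpt using xml.etree.ElementTree and the csv module. It assumes the field order state_abbrev, state_name, timezone_abbrev. The tutorial only establishes that timezone_abbrev is last, so the order of the first two fields is an assumption. The function name flatten_to_delimited is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two-state excerpt of Source.state_by_tz.xml (DOCTYPE line omitted).
SAMPLE_XML = """<root><timezone>
<timezone_abbrev>CST</timezone_abbrev>
<timezone_name>Central Standard Time</timezone_name>
<state state_abbrev="AR"><state_name>Arkansas</state_name></state>
<state state_abbrev="IA"><state_name>Iowa</state_name></state>
</timezone></root>"""


def flatten_to_delimited(xml_text):
    """Emit one comma-delimited, CR-LF-terminated record per <state> element."""
    out = io.StringIO()
    # "," mirrors the Field Separator; "\r\n" mirrors the CR-LF Record Separator.
    writer = csv.writer(out, delimiter=",", lineterminator="\r\n")
    for tz in ET.fromstring(xml_text).findall("timezone"):
        tz_abbrev = tz.findtext("timezone_abbrev")
        for state in tz.findall("state"):
            # Assumed order: state_abbrev, state_name, then timezone_abbrev last.
            writer.writerow(
                [state.get("state_abbrev"), state.findtext("state_name"), tz_abbrev]
            )
    return out.getvalue()


print(flatten_to_delimited(SAMPLE_XML), end="")
```

Under these assumptions, the excerpt yields the records AR,Arkansas,CST and IA,Iowa,CST. The timezone_name value is dropped because the edited target layout has no field for it.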
Specify the Mapping Information
The Mapping Editor lets users map data from the source XML file to the target delimited file.
- From the toolbar, click Automap and select By Field Name Match.
- Click Next. The Conversion Customization pane appears and displays a visual summary of the conversion process. To customize the data for a node, double-click the node.
- Click Next.
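Automap By Field Name Match explains why the earlier step asks you to restore the original field names. The following Python sketch models the idea of name-based matching. It is an illustration only and not ConverterPro's algorithm. Each target field is paired with the source field of the same name, and anything without a match is reported. The function name automap_by_field_name and the renamed field tz_abbrev are hypothetical.

```python
def automap_by_field_name(source_fields, target_fields):
    """Pair each target field with the source field of the same name."""
    available = set(source_fields)
    mapping = {name: name for name in target_fields if name in available}
    unmapped = [name for name in target_fields if name not in available]
    return mapping, unmapped


SOURCE_FIELDS = ["timezone_abbrev", "timezone_name", "state_abbrev", "state_name"]

# Target fields keep their original names, so everything maps.
mapping, unmapped = automap_by_field_name(
    SOURCE_FIELDS, ["state_abbrev", "state_name", "timezone_abbrev"]
)
print(sorted(mapping), unmapped)

# A renamed target field (hypothetical "tz_abbrev") no longer matches anything.
mapping2, unmapped2 = automap_by_field_name(
    SOURCE_FIELDS, ["state_abbrev", "state_name", "tz_abbrev"]
)
print(unmapped2)
```

In the second call, tz_abbrev is left unmapped. Under this model, such a field would have to be mapped by hand.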
Save and Run the Conversion Specification
- From the File menu, select Save to save the conversion specification. The Save Conversion As dialog box appears.
- In the Name field, type Tutorial5.
- In the Description field, optionally enter a description.
- Select the Save with Userid/Password check box.
- Click OK.
- Click Run to start the conversion. As the conversion executes, the Execution Status Viewer appears, displaying the progress of the conversion.
- After the conversion finishes running, click Close to close the Execution Status Viewer.
Browse the Source and Target
To verify what was written to the target, compare the data in the Source Data Browser with the data in the Target Data Browser.
- Click [icon]. The XML data from your source appears in your default Internet browser.
- Click [icon] to view the target data. The Target Data Browser appears, showing the fields in Target.state_w_tz.dat.
- After reviewing the source and target data, close the browsers.
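The visual comparison can also be checked programmatically. The following Python sketch builds a set of (state_abbrev, state_name, timezone_abbrev) tuples from the source XML and from the delimited target, then reports any differences. The target text is a hypothetical stand-in for the contents of Target.state_w_tz.dat. In practice you would read the file itself. The field order is the same assumption as in the flattening sketch. Because sets are compared, record order and duplicate records are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two-state excerpt of the source (DOCTYPE line omitted).
SAMPLE_XML = """<root><timezone>
<timezone_abbrev>CST</timezone_abbrev>
<timezone_name>Central Standard Time</timezone_name>
<state state_abbrev="AR"><state_name>Arkansas</state_name></state>
<state state_abbrev="IA"><state_name>Iowa</state_name></state>
</timezone></root>"""

# Hypothetical target contents, assuming state_abbrev, state_name, timezone_abbrev.
TARGET_DAT = "AR,Arkansas,CST\r\nIA,Iowa,CST\r\n"


def source_records(xml_text):
    """Flatten the source XML into a set of (abbrev, name, tz_abbrev) tuples."""
    root = ET.fromstring(xml_text)
    return {
        (s.get("state_abbrev"), s.findtext("state_name"), tz.findtext("timezone_abbrev"))
        for tz in root.findall("timezone")
        for s in tz.findall("state")
    }


def target_records(dat_text):
    """Parse comma-delimited, CR-LF-terminated records into a set of tuples."""
    return {tuple(row) for row in csv.reader(io.StringIO(dat_text, newline=""))}


src = source_records(SAMPLE_XML)
tgt = target_records(TARGET_DAT)
print("missing from target:", src - tgt)
print("unexpected in target:", tgt - src)
```

Both printed sets are empty when every source state was written to the target with its time zone abbreviation.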
Tutorial 5 is now complete.