Parsing XML Loops
This tutorial shows how simple it is to parse deeply nested XML loops and then flatten the data into one row per record that is much easier to read. Most of the work is done before the job itself is even created, in Talend’s XML metadata wizard, but read on to find out how it works.
Create the XML document shown in the screenshot below. The file has two repeating sections, PERSON and FRIEND. You would normally have to write XPath expressions by hand to extract these, but Talend does all of that work for you.
In the Talend repository, find “File XML” under “Metadata”. Right-click it and create a new XML file entry.
Under “File Settings”, select the XML file from wherever you saved it, and Talend will show you its basic structure.
In one of the last steps of the wizard, Talend builds the XPath values for you by asking which element you want to loop on and which fields you want to extract. The loop element determines how many rows you get: one row per occurrence. Because we want to loop at the lowest level, select FRIEND as the loop element by dragging “FRIEND” into the space underneath “Absolute XPath expression”.
After that, drag all three fields into “Fields to extract”, and their XPath expressions will also be generated automatically.
The preview at the bottom shows the output as if it were sent directly to a tLogRow component.
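If you are curious what the wizard is doing for you, the loop-and-extract step can be sketched in a few lines of Python. This is only an illustration, not what Talend runs internally. The sample XML is hypothetical: only PERSON and FRIEND come from this tutorial, while the PEOPLE root, the NAME and CITY fields, and all of the values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: PEOPLE, NAME, CITY and all values are assumed,
# only the PERSON and FRIEND element names come from the tutorial.
SAMPLE_XML = """
<PEOPLE>
  <PERSON>
    <NAME>Alice</NAME><CITY>Austin</CITY>
    <FRIEND>Bob</FRIEND><FRIEND>Carol</FRIEND><FRIEND>Dave</FRIEND>
  </PERSON>
  <PERSON>
    <NAME>Erin</NAME><CITY>Boston</CITY>
    <FRIEND>Frank</FRIEND><FRIEND>Grace</FRIEND><FRIEND>Heidi</FRIEND>
  </PERSON>
  <PERSON>
    <NAME>Ivan</NAME><CITY>Chicago</CITY>
    <FRIEND>Judy</FRIEND><FRIEND>Karl</FRIEND><FRIEND>Liam</FRIEND>
  </PERSON>
</PEOPLE>
"""


def extract_rows(xml_text):
    """Return one row per FRIEND, repeating the parent PERSON's fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    # Looping on the lowest level (FRIEND) means one output row per FRIEND.
    for person in root.findall("PERSON"):
        for friend in person.findall("FRIEND"):
            rows.append({
                "NAME": person.findtext("NAME"),  # read from the parent PERSON
                "CITY": person.findtext("CITY"),  # read from the parent PERSON
                "FRIEND": friend.text,            # the loop element itself
            })
    return rows


rows = extract_rows(SAMPLE_XML)
for row in rows:
    print("|".join([row["NAME"], row["CITY"], row["FRIEND"]]))
```

With three PERSON elements holding three FRIEND elements each, this yields nine rows, and the NAME and CITY values repeat on every row that belongs to the same person.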
Finish the wizard and your metadata will be saved in the repository.
Create a new job, and drag the metadata you created from the repository onto the canvas. All of its settings are already filled in from your previous work.
We want to make the information more readable, so add a tDenormalize component and a tLogRow component to the canvas, and connect them all as follows:
Next, select the tDenormalize component and open its component tab. Set the column to denormalize to “FRIEND” and the delimiter to “,”. This merges all rows whose other columns are identical into a single row, joining their FRIEND values with the delimiter.
If that didn’t make sense, simply run the job to see the output. Notice how instead of nine rows you now have three, each with all of the friend names combined into a single column, separated by commas.
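The denormalize step is easy to picture as a group-and-join. Here is a rough Python sketch of the idea, not Talend's actual implementation. The denormalize function is a hypothetical helper written for this example, and the NAME and CITY columns and all of the values are assumed sample data.

```python
def denormalize(rows, column, delimiter=","):
    """Merge rows that agree on every other column, joining `column` values."""
    grouped = {}
    for row in rows:
        # The group key is every column except the one being denormalized.
        key = tuple((k, v) for k, v in row.items() if k != column)
        grouped.setdefault(key, []).append(row[column])

    result = []
    for key, values in grouped.items():
        merged = dict(key)
        merged[column] = delimiter.join(values)
        result.append(merged)
    return result


# Assumed sample data: nine rows, one per FRIEND, as produced by the XML loop.
rows = [
    {"NAME": "Alice", "CITY": "Austin", "FRIEND": "Bob"},
    {"NAME": "Alice", "CITY": "Austin", "FRIEND": "Carol"},
    {"NAME": "Alice", "CITY": "Austin", "FRIEND": "Dave"},
    {"NAME": "Erin", "CITY": "Boston", "FRIEND": "Frank"},
    {"NAME": "Erin", "CITY": "Boston", "FRIEND": "Grace"},
    {"NAME": "Erin", "CITY": "Boston", "FRIEND": "Heidi"},
    {"NAME": "Ivan", "CITY": "Chicago", "FRIEND": "Judy"},
    {"NAME": "Ivan", "CITY": "Chicago", "FRIEND": "Karl"},
    {"NAME": "Ivan", "CITY": "Chicago", "FRIEND": "Liam"},
]

result = denormalize(rows, "FRIEND", ",")
for row in result:
    print("|".join([row["NAME"], row["CITY"], row["FRIEND"]]))
```

The nine input rows collapse to three, one per person, with that person's friends joined by commas in the FRIEND column.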
Now you know how easy it is to run complex XPath queries in Talend without writing any XPath expressions yourself!