Using Multischema Files
Sometimes a single file contains records that follow several different schemas. This tutorial explains how to separate those records in Talend so that each schema stays distinct. It uses the file demo_ex (see below) as an example.
Step 1: Create The File
The file used in this tutorial identifies its schemas with the codes ALPHA, BETA, and GAMMA. Each code corresponds to a different record type and tells the reader how to interpret the rest of the row. ALPHA rows contain Quidditch player information, BETA rows describe house artifacts, and GAMMA rows give a school and its location.
The best way to sort out these mixed-up rows is with a tFileInputMSDelimited component. It lets us define the file as multi-schema and choose the delimiter that separates the fields. Search for the component in the Palette and add it to the canvas.
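To make the layout concrete, here is a minimal Python sketch of what such a file might contain. The rows, values, and column counts are invented for illustration and are not the actual contents of demo_ex. The helper name schema_code is also hypothetical.

```python
# Hypothetical sample rows; the real demo_ex file may look different.
SAMPLE_LINES = [
    "ALPHA;Harry Potter;Seeker;Gryffindor",
    "BETA;Sword of Gryffindor;Gryffindor",
    "GAMMA;Hogwarts;Scotland",
    "ALPHA;Cho Chang;Seeker;Ravenclaw",
]


def schema_code(line, delimiter=";"):
    """Return the value of the first column, which identifies the row's schema."""
    return line.split(delimiter, 1)[0]


for line in SAMPLE_LINES:
    # Show which schema each mixed-up row belongs to.
    print(schema_code(line), "->", line)
```

Rows of different types are interleaved and may have different numbers of fields, which is why a single-schema input component is not enough.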
Now, select the component and under the “Component” tab, click the “…” button next to “Multi Schema Editor” to open the editing page.
The very top of the editing page will ask for the file, so you’ll need to Browse and select it.
After the file is selected, the editor previews how the fields will be separated using the default delimiter. The delimiter can be changed, but for this example the default value of “;” is fine.
By default, the first column is treated as the one holding the codes that distinguish the schemas, which is correct for this file. Click the “Fetch Codes” button on the bottom right to create a separate schema for each code.
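Conceptually, fetching the codes amounts to collecting the distinct values found in the code column. The Python sketch below illustrates that idea only; it is not Talend's implementation, and the function name fetch_codes and the sample rows are assumptions made for the example.

```python
def fetch_codes(lines, delimiter=";", code_column=0):
    """Collect distinct schema codes in order of first appearance."""
    codes = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(delimiter)
        if code_column >= len(fields):
            continue  # row is too short to contain the code column
        code = fields[code_column]
        if code not in codes:
            codes.append(code)
    return codes


# Hypothetical rows in the style described in this tutorial.
rows = [
    "ALPHA;Harry Potter;Seeker;Gryffindor",
    "BETA;Sword of Gryffindor;Gryffindor",
    "ALPHA;Cho Chang;Seeker;Ravenclaw",
    "GAMMA;Hogwarts;Scotland",
]
print(fetch_codes(rows))
```

If the codes lived in a different column, only the code_column argument would need to change, which mirrors choosing a different code column in the editor.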
It is then a good idea to change the schema names to something more descriptive, so add whatever names make sense to you, though A, B, and C are still acceptable. If you want a different delimiter for each schema, you can change that here too.
Next, rename each field to what it describes for each schema. This can be done by selecting the Schema names under “Fetch Codes”, and then either clicking the “Edit Columns” button, or scrolling to the bottom of the editor and entering the values of fields there.
Click “OK” to exit the editor.
To view the output, add three tLogRow components to the canvas. Then right-click the tFileInputMSDelimited component and, for each tLogRow, select one of the row options you created and connect it to that component.
To keep each schema’s output easy to distinguish, go to the “Component” tab of each tLogRow and change the Mode from Basic to Table.
All that’s left to do now is run the job. Open the “Run” tab and click the “Run” button under “Execution”.
Each Schema should now be separated into its own organized table.
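For readers who want to sanity-check the result outside Talend, the whole job can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the column names in SCHEMAS, the sample rows, and the helpers split_by_schema and format_table are all hypothetical, and the table layout only loosely resembles tLogRow's Table mode.

```python
import csv

# Hypothetical column names per schema code; use whatever you entered in the editor.
SCHEMAS = {
    "ALPHA": ["code", "player", "position", "house"],
    "BETA": ["code", "artifact", "house"],
    "GAMMA": ["code", "school", "location"],
}


def split_by_schema(lines, schemas, delimiter=";"):
    """Group rows by the code in their first column and name their fields."""
    grouped = {}
    for fields in csv.reader(lines, delimiter=delimiter):
        if not fields or fields[0] not in schemas:
            continue  # skip blank lines and rows with unknown codes
        names = schemas[fields[0]]
        grouped.setdefault(fields[0], []).append(dict(zip(names, fields)))
    return grouped


def format_table(names, rows):
    """Render rows as a simple fixed-width text table."""
    widths = [max([len(n)] + [len(r.get(n, "")) for r in rows]) for n in names]

    def fmt(values):
        return "|" + "|".join(v.ljust(w) for v, w in zip(values, widths)) + "|"

    return "\n".join([fmt(names)] + [fmt([r.get(n, "") for n in names]) for r in rows])


# Hypothetical input rows standing in for the real file.
lines = [
    "ALPHA;Harry Potter;Seeker;Gryffindor",
    "BETA;Sword of Gryffindor;Gryffindor",
    "GAMMA;Hogwarts;Scotland",
    "ALPHA;Cho Chang;Seeker;Ravenclaw",
]

for code, rows in split_by_schema(lines, SCHEMAS).items():
    print(format_table(SCHEMAS[code], rows))
    print()
```

Each schema gets its own list of rows and its own table, which corresponds to the three separate tLogRow outputs in the job.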