Iterate to load multiple excel files

This post is an extension of our previous post here – Import an Excel file into Access .

Today we are going to show you how to build on this job in order to iterate upon a group of excel files in a specified file directory, enabling you to automate the bulk loading of multiple excel files into a MS Access database table.

If you haven’t yet read the previous tutorial please read it here before proceeding.

Step 1: Set up the File Directory for Iteration

If you recall, the job we just built should look like this:

This job is great but it is only set up to process a specific Excel file which we designate. Our next step is to extend this job so that it will process multiple excel files of the same format from a specified directory.

Select the component tFileList from the Palette and drag it into the job design window

Click on the tFileList component and then click on the “Component” tab near the bottom center of the screen

Under “Directory” , select the directory that contains the excel files by clicking on the “…” button.

 

Step 2: Connect tFileList

Right click on the tFileList component, select Run -> Iterate

Connect this “Iterate” link to the tFileInputExcel component

 

Step 3: Edit the tExcelInput field

Select the tFileInputExcel component and then navigate to the Component tab near the bottom middle of the screen

Change the Property type from “Repository” to “Built In”

Under “File name / Stream” delete the filepath string

Type in tfilelist then press cntrl + space

Select tFileList_1.CURRENT_FILEPATH

Step 4: Run the job

Congratulations! You are now ready to run your job.

Make sure you have multiple excel files in your directory

Select the “Run” tab near the bottom middle of the screen and click the “Run” button

Now you will see all your files are processed one by one and loaded into the Access database

 

 

ETL Advisors is a leading data integration consulting firm consisting of certified Talend development staff specializing in Talend open studio and enterprise data integration development.
  1. Pingback: ETL Advisors » Import an excel file into ms access

  2. Adil Reply

    Hi,
    I’m new in Talend and I have 4 different files in the same folder. These files must be stored in 4 different tables.

    Please help me to find a solution

    • admin Reply

      Hi Adil,

      Thanks for your question. Do your 4 different files utilizing a standard naming convention? ie: 2013_04_10_Products, 2013_04_10_Accounts etc? If so then you can use this file naming convention to separate each of the files into the correct processing route. Please let me know your file naming convention and I will post a solution for you?

      Thanks!

  3. Alexis Reply

    Thanks admin. It make sense, but what happened if there is not any standard naming convention. On the other hand, how I can delete each processed file, in my case this files must be replaced with daily new data.

  4. admin Reply

    Hi Alexis, If the files in the directory have no standard naming convention. You can set a file mask in the tFileList component of “*” which should pick up each file in the directory and iterate on each one. Then by using this variable”tFileList_1.CURRENT_FILEPATH” as described in the tutorial, the file name will be dynamically generated based upon the current file from the tFileList component.

    To delete files, you can use tFileDelete.

    Does that answer your question?

  5. Ohad Reply

    Hi,
    How can I produce output files in with custom convention ?
    Suppose I’m using tBase64 component, and I need to save each binary to a location in that convention:
    CurrentDate + “_” + GUID + “_” + “name from xpath in source xml” + “.pdf”

    Please refer to this post:
    http://www.talendforge.org/forum/viewtopic.php?pid=134951#p134951

  6. admin Reply

    Hi Ohad,

    It sounds like you are trying to read multiple elements from the XML source and save each record into a separate output document. If that is correct I suggest the following approach:

    tFileInputXML (extract binary and source name for output file) -Main-> tBase64 (convert binary) -Main-> tFlowToIterate (iterate on each row) -Iterate-> tSetGlobalVar (set global variable for the name from xml source and the binary string) ->OnComponentOk->tFixedFlowInput (input name and binary) -Main->tFileOutputDelimited.

    In file name for the output file you can use (String)globalMap.get(“paramName”) or similar to get the global variable parameter from the tSetGlobalVar component.

    For information on using CurrentDate in file name, see this guide: https://help.talend.com/display/TalendOpenStudioforBigDataUserGuide520EN/10.5+TalendDate+Routines

  7. Ohad Reply

    Hi admin,
    Thanks for your kind answer. That really helps !
    I will try to implement your solution during the weekend.

    Just one more question before that : why using “tFileOuputDelimited” for a binary file ?
    I know that “tFileOutputDelimited” suppose to be used when handling csv and txt files…
    Shouldn’t I use “tFileOutputRaw” instead ?

    • admin Reply

      Hi Ohad, Yes you can use the tFileOutputRaw in the same manner as I described the tFileOutputDelimited.

  8. Ohad Reply

    I’ve also noticed that “tFixedFlowInput” requires number of lines to be generated, and that number is not constant. What I DO know that: number of input rows = number of output rows.

    • admin Reply

      The tFixedFlowInput would be used to process each record after it has been stored in the global variable, so you would be processing one row at a time. Does that resolve your question?

  9. Ohad Reply

    Hi admin, that almost resolve my question.
    I can’t figure out how to “set global variable for the name from xml source and the binary string” in the “tSetGlobalVar” shape.
    In all of the web documentation i’ve found, I didn’t see any example of setting a global var using xpath on incomming xml.

  10. Ohad Reply

    Here is the image of my process:
    http://postimg.org/image/iaenwfp77/

    I’m having trouble to understand what I should write in the properties of these components.

  11. Ohad Reply

    Hi,
    I’m new to talend and your new tutorial really helped me! Hope you’ll write more of those !
    My pdf process now works fine, but I’ve added “tMap” after “tFileInputXML” at the beginning, because “tBase64” tries to run “getBytes()” for every column it gets.
    For instance: I have an integer on my incomming xml, so the build failed because “tBase64” can’t execute “getBytes()” on it.
    As said, I’ve solved it by “tMap” (mapping only “pdf” column to “tBase64”).
    Thanks a lot.

    Are you available to occasionally questions via e-mail ?

  12. admin Reply

    Hi Ohad, glad to hear you were able to determine a solution.

    The best forum for general questions is probably in the comments section of our site or on the Talend Forge forum. Otherwise, please reach out directly via email or our contact form if you would like to hire any of our consulting resources for a specific project.

    All the best.

  13. Hanuman Reply

    i want to multiple files as input and processed with one process and need the output in multiples after completion process.

  14. Jeevitha Reply

    Hi Admin,

    1. using tfilelist i can able to pass multiple files as input. But i need all the files as seperated output through one processing. I am getting only one output file. kindly sugeest me an option.
    2. Next question is how to pass an unique id at runtime. Because i am inputting multiple files so i need to find which belongs to which id. kindly suggest me an option for this also

  15. admin Reply

    Hi Jeevitha,

    1. You should connect your components in this order: tFileList-(Iterate)->tFileInputDelimited-(Main)->tFileOutputDelimited. The file name for tFileInputDelimited should be dynamic, ie : ((String)globalMap.get(“tFileList_1.CURRENT_FILEPATH”), you should use a similar method for the tFileOutputDelimited component – using file name that contains: ((String)globalMap.get(“tFileList_1.CURRENT_FILE”).

    2. To pass variables to the job at runtime I suggest utilizing context variables. Here is further details on context variables: http://www.talendforge.org/tutorials/tutorial.php?idTuto=34

  16. Jeevitha Reply

    Thanks admin i have tried option 2 . its working perfectly. Thank you so much for the response

  17. Jeevitha Reply

    Hi Admin,

    1. I have input file name called in_123.xlsx and in_234.xlsx . output name will be out_123.xlsx if i pass a context variable value =123 during runtime. is there any option to take that 123 from the input file name itself as value to pass while in runtime without passing context value manually

  18. admin Reply

    Hi Jeevitha, in this case I would not use context variables but instead dynamically retrieve the file name from the tFileList component. You can access the file name with the global variable:

    ((String)globalMap.get(“tFileList_1_CURRENT_FILE”))

    In order to extract the _123.xlsx value from this variable you could use a tJava component with some code like this:

    String infile = ((String)globalMap.get(“tFileList_1_CURRENT_FILE”));
    int lastUnderscore = infile.lastIndexOf(“_”);
    int strLength = infile.length();
    String outfile = infile.substring(lastUnderscore, strLength);
    String outfileName = “out_”+outfile;
    globalMap.put(“outputFileName”, outfileName);

    Then in the tFileOutput component use this for the file name:
    ((String)globalMap.get(“outputFileName”))

    This code is just an example but should give you the general idea.

  19. Jeevitha Reply

    Hi Admin,

    In tjava component i have some variables and values like a=20, b=12345. in my output excel file has 1000 rows with 3 columns.I want five columns in total i want to print a value in 4th column and b value in 5th column for all the 1000 rows.can you clear me how to pass the value of a and b into output file.

  20. Jeevitha Reply

    Hi Admin,

    I have a variable in tjava component .how to access that variable and put into add column in tmap .

  21. Neha Reply

    Hi Admin,
    I have successfully iterated the files from the folder. Now suppose I want the files from this folder which have been read to be moved to another archive folder, how can I achieve so?
    Please help!

  22. sree Reply

    Hi,

    Do you have any tutorial to read dynamically from a source database to load to target database.
    Source and target are SQL server.
    There are 100’s of tables in source and Target. Instead of developing 100’s of jobs, is There a way to read table name and load the target using single job.
    Keys in target could be different from table to table. Target tables need to be loaded on incremental basis.

  23. K.C.ARATHI Reply

    I am new to talend. In my work i need to extract data from various files which will be in a folder. every day new files will get added. But how to take the newly added excel files inside the built job. each time when i trigger the job the data should get extracted even from newly added files.

Leave a Reply

*

captcha *