Sometimes integrating Talend with Excel data files can be particularly challenging, in this post we are going to walk you through the basic steps required to get started on build a job to load a MS Excel file into an Access database.
You can download the example excel file used in this tutorial from here:
Lets assume we are trying to load data from an excel file that contains the following fields:
- Salesman Name
- Salesman Id
- Account Id
- Number of Units
Step 1: Open Talend
Open Talend and create or open an existing project
Step 2: Create a new job
- Right click on Job Designs in the Repository window and select “Create job”
- Name the job “excel_load_example”
Step 3: Create a Ms Excel file repository element
Now we need to create a repository item for our excel file. To do this click the arrow next to “Metadata” in the Repository window and right click on “File Excel” and select “Create file Excel”.
Enter a name for your example file schema and click next:
Select your example file that we saw earlier (or use your own) by clicking the “Browse” button.
Then select which work sheets in the Excel file contain the data we want to use for our schema.
On the next screen select the checkbox for “Set heading row as column names”, then click “Refresh Preview”. Click Next.
Talend has now generated an estimated schema, review this schema and make any changes as you would like, then click Finish.
Step 4: Create an Access Database Table Repository Element
In this next step we are going to create a repository element for the Access database table schema so that we can use this in our job design.
Right click on “DB Connections” under “Metadata” in the Repository window and select “Create connection”
Name your database then click next:
Select DB Type as “Access” and DB Version as “Access 2007” or “Access 2003” if you are using that version.
Click on the “Browse” button and navigate to your Access database file.
Once selected then click on the “Check” button to make sure that Talend is able to connect to your DB.
If the connection works then click “Finish”
Now the DB Connection should appear under “Repository” -> “Metadata” -> “Db Connections” in the panel on the left
Right click on the DB name in this panel and and select “Retrieve schema”
On the pop up screen, click “Next”
On the following screen, select the “sales” table from our database and click “Next”
The final window will display the schema of your database table, which can now be used in the repository for future job design. Review the schema and make any changes if needed, then click “Finish”.
Step 5: Design your job
In this step we are are going to design our job to connect our MS Excel file to our MS Access DB.
Open the job we created in Step 2 by double clicking the name of the job under Job Designs in the Repository window.
In the palette window on the right hand side type “fileinputexcel” into the search box, then drag and drop the component “tFileInputExcel” into the job window in the center of the screen.
Select the tFileInputExcel Component in the job window and then select the “Component” tab near the bottom middle of the screen.
Click the drop down box next to “Property type” and select “Repository”
Then click the button with three dots that appears and select the Excel file from the repository that we created in Step 3.
If the file is in Excel 2007 format then click the check box “Read excel 2007 file format (xlsx)”
Select a tMap component from the palette on the right hand side of the screen and drag it into the job design window
Right click on the tFileInputExcel component, select Row->Main and connect a row to the tMap component.
Select a tAccessOutput component from the Palette and drag it over to the job design window.
Click on the tAccessOutput component to select it, then navigate to the Component tab in the lower middle of the screen
Under Property Type, select Repository and then select the access database from the repository that we created in Step 4
Under Schema, select Repository and then select the “sales” table from our database in the repository
Next right click the tMap component, Select “Row” then “New Output”.
Connect this new output to the tAccessOutput component
Name the output row- “output1”
A popup window will appear asking you: “Do you want to get the schema of the target component” – click “Yes”.
Now your tMap is connected to the tAccessOutput component, double click the tMap component to open up the tMap editor.
Click and drag the data fields from the “row1” panel to the left of the screen to the corresponding “output1” data fields. This tMap editor enables you to map to fields from the Excel input to the correct output fields in Access.
Click “Apply” then “OK” to save the changes and return to the Job Design window
Now your job design is complete and we just need to run the job to load the file into Access
Step 6: Run your Job
In the group of tabs in the lower middle of the screen, select the “Run” tab
Under Execution, click the “Run” button
Your job will load and then run and you will see how many rows were processed in the job design window.
Congratulations! You have now loaded your data into your Access DB table.
Check out the next tutorial in our series which will show you how to extend this job so that you can iterate over a large group of excel files and load them all into an Access db table.