MongoDB + JSON Part One

Talend Open Studio for Big Data includes a connector for the NoSQL database MongoDB, and using it is quite simple. This tutorial demonstrates the relevant components by importing data from a JSON file into MongoDB, and then exporting a JSON file from MongoDB. The tutorial is split into two parts: this part covers the import, and Part 2 covers the export.

Step 1: View the file (JSON saved as .txt)

Download and review the file cars_example; it holds the JSON data that we will import into MongoDB.
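The file itself is not reproduced here, but the steps below imply a top-level Dealership array whose items each carry six capitalized fields. A minimal Python sketch of that assumed shape follows; the values are invented for illustration, and the real cars_example file may differ.

```python
import json

# Hypothetical sample mirroring the assumed layout of cars_example.
# The field names follow the tutorial; every value here is made up.
SAMPLE_TEXT = """
{
  "Dealership": [
    {"Make": "Honda", "Model": "Civic", "Year": 2010,
     "Color": "Blue", "Miles": 45000, "Price": 9500},
    {"Make": "Ford", "Model": "Focus", "Year": 2012,
     "Color": "Red", "Miles": 30000, "Price": 11000}
  ]
}
"""

# Parse the text and pretty-print it to review the structure.
data = json.loads(SAMPLE_TEXT)
print(json.dumps(data, indent=2))
```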

Step 2: Create your job

Create a new job in Talend Big Data. The job will use the tMongoDBOutput component, which allows us to load data into MongoDB.

Step 3: Load the data from file

Create a tFileInputJSON component by searching for it in the Palette and dragging it onto your Talend canvas.

Then, in the Component view, choose the location where you saved the file, and change the mapping. You will need a column to write into, so click “Edit schema” and create a new column named “dealership”. Then set the JSONPath query to “$.Dealership[*]” to pull everything inside the Dealership array.
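To make the JSONPath query concrete, here is a rough Python equivalent of what “$.Dealership[*]” selects: every element of the top-level Dealership array. This is only a sketch using the standard json module; the function name and the sample string are hypothetical and not part of Talend.

```python
import json

def select_dealership_items(text):
    """Return every element of the top-level "Dealership" array,
    roughly what the JSONPath $.Dealership[*] selects."""
    document = json.loads(text)
    return list(document["Dealership"])

# Hypothetical two-item sample standing in for the real file.
sample = '{"Dealership": [{"Make": "Honda"}, {"Make": "Ford"}]}'

for item in select_dealership_items(sample):
    # Each item plays the role of one row in the "dealership" column.
    print(json.dumps(item))
```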

Step 4: Pull data out of the JSON file

Because the information is stored in JSON format, we can use tExtractJSONFields to pull out the individual fields. Create this component and link it from the input component with a Main row.

In its Component view, we will define the JSON fields to extract. Click “Edit schema” and add the six columns contained in each item under Dealership: make, model, year, color, miles, price. Then, save the schema.

Make sure the JSON field item is set to dealership (there shouldn’t be any other options).

Then, make sure the XPath queries are set to the names of the items in the JSON file (the capitalization must match the file exactly). The loop query should be an empty string, because the previous component has already set the path to these items.
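The mapping from schema columns to JSON item names can be sketched in Python as a simple dictionary lookup. This is an illustration, not how Talend implements the component; the capitalized key names (Make, Model, and so on) are assumed from the description above, and the sample values are invented.

```python
import json

# Schema column -> name of the item in the JSON file.
# The capitalized names are an assumption based on the tutorial text.
FIELD_QUERIES = {
    "make": "Make", "model": "Model", "year": "Year",
    "color": "Color", "miles": "Miles", "price": "Price",
}

def extract_fields(dealership_json):
    """Turn one JSON item (as a string) into a row keyed by schema column."""
    item = json.loads(dealership_json)
    # Lookups are case-sensitive: a key that does not match yields None.
    return {column: item.get(key) for column, key in FIELD_QUERIES.items()}

row = extract_fields(
    '{"Make": "Honda", "Model": "Civic", "Year": 2010, '
    '"Color": "Blue", "Miles": 45000, "Price": 9500}'
)
print(row)
```

Because the lookup is case-sensitive, an item written as "make" instead of "Make" would come through as an empty value, which is why the capitalization has to match the file.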

Step 5: Check your work

Next, add a tLogRow after your extract component to make sure your data is coming through as expected.

If you run the job now, the console should show one row per car, each with the six extracted columns.

Step 6: Load into MongoDB

Finally, add a tMongoDBOutput component, and make sure it's connected to the last component. Set the DB Version to the version of MongoDB you are currently using. This demo is set up to feed into the database Demos and the collection Cars. You should be able to sync the columns from the previous component.

Make sure MongoDB is running on your server and then run the job.
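For readers who want to see what this load amounts to outside Talend, here is a hedged sketch using the pymongo driver. The connection address, the helper names, and the sample row are assumptions for illustration; only the database name Demos and the collection name Cars come from this demo.

```python
def load_rows(collection, rows):
    """Insert the extracted rows and return how many were written.

    `collection` is expected to behave like a pymongo Collection.
    """
    if not rows:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many([dict(r) for r in rows])
    return len(result.inserted_ids)

def main():
    # Imported here so load_rows can be reused without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed address
    cars = client["Demos"]["Cars"]
    # Hypothetical row shaped like the six-column schema from Step 4.
    rows = [{"make": "Honda", "model": "Civic", "year": 2010,
             "color": "Blue", "miles": 45000, "price": 9500}]
    print(load_rows(cars, rows), "documents inserted")
```

Calling main() requires pymongo to be installed and a MongoDB server listening at the assumed address.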

Check the database and collection to make sure your data is loaded!
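One way to perform this check from a script is to count the documents and look at one of them. The sketch below assumes the pymongo driver and a server on localhost; the helper name and the address are illustrative, while Demos and Cars are the names used in this demo.

```python
def summarize(collection):
    """Return the number of documents and one sample document.

    `collection` is expected to behave like a pymongo Collection.
    """
    return collection.count_documents({}), collection.find_one()

def main():
    # Imported here so summarize can be reused without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed address
    count, sample = summarize(client["Demos"]["Cars"])
    print("documents in Demos.Cars:", count)
    print("sample document:", sample)
```

After a successful run of the job, the count should match the number of items in the Dealership array, and the sample document should carry the six columns plus the _id field that MongoDB adds.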

In Part 2 we will demonstrate how to pull some of this data out and save it to a new JSON file.

Continue to Part 2
