Tutorial for MongoDB CRUD and Join Operations in Python Script as Client CIS 612

By Asanka Kavinda Mananayaka

1. Python Script

"""
**********************************************************************
Join in MongoDB
Author: Asanka K. Mananayaka
Lab_4_2 - Part2
Due Date: April 19, 2019
**********************************************************************
"""

import os
from pymongo import MongoClient
import json

DATABASE = 'YelpData'
BUSINESS_TB = 'Business'
REVIEWS_TB = 'Reviews'
JOINED_TB = 'BusinessReviews'

def main():
    """ YELP business data extraction. """

    # Connect to MongoDB
    client = MongoClient(port=27017)

    # Create database if it doesn't exist
    db = client[DATABASE]

    # Read JSON file and store in MongoDB
    import_business_data(db)

    # Read JSON file and store in MongoDB
    import_review_data(db)

    # Join business data with review data
    join_business_and_reviews(db)

def import_business_data(db):
    """ Imports business data. """

    file_path = ("C:/Users/Kavinda/Desktop/CIS612/YelpData/"
                 "UnzippedFiles/business100ValidForm.json")

    # Return if the table already exists
    if BUSINESS_TB in db.list_collection_names():
        print(BUSINESS_TB, "table already exists!")
        return

    # Read JSON file (a single JSON object with a "Business" array)
    with open(file_path, encoding='utf8') as f:
        data = json.load(f)

    record_count = 0

    # Insert JSON records
    for business in data["Business"]:
        db[BUSINESS_TB].insert_one(business)
        record_count = record_count + 1

    print('No. of business documents inserted :', record_count)

def import_review_data(db):
    """ Imports review data. """

    file_path = ("C:/Users/Kavinda/Desktop/CIS612/YelpData/"
                 "UnzippedFiles/review.json")

    # Return if the table already exists
    if REVIEWS_TB in db.list_collection_names():
        print(REVIEWS_TB, "table already exists!")
        return

    record_count = 0

    # Insert JSON records into MongoDB (one JSON document per line)
    with open(file_path, encoding='utf8') as file:
        for line in file:
            print('Loading: ', record_count)
            review = json.loads(line)
            db[REVIEWS_TB].insert_one(review)
            record_count = record_count + 1

    print('No. of review documents inserted :', record_count)

def join_business_and_reviews(db):
    """ Join review data for each business. """

    # Return if the joined table already exists; inserting the same
    # documents again would fail with a duplicate key error on _id
    if JOINED_TB in db.list_collection_names():
        print(JOINED_TB, "table already exists!")
        return

    # Mongo join query
    mongo_query = [
        {
            "$lookup": {
                "from": REVIEWS_TB,
                "localField": "business_id",
                "foreignField": "business_id",
                "as": "Reviews"
            }
        },
        {
            "$project": {
                "business_id": 1,
                "name": 1,
                "city": 1,
                "state": 1,
                "longitude": 1,
                "latitude": 1,
                "stars": 1,
                "categories": 1,
                "review_count": 1,
                "neighborhoods": 1,
                "Reviews": {"business_id": 1, "review_id": 1,
                            "user_id": 1, "stars": 1, "text": 1}
            }
        }
    ]

    # Submit query and create new table with the join results
    for business in db[BUSINESS_TB].aggregate(mongo_query):
        db[JOINED_TB].insert_one(business)


if __name__ == "__main__":
    main()
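
The review import in the script calls insert_one once per line of the file, which sends one insert command per document. For a large review file, grouping documents and sending them with insert_many may reduce the load time. The following is a minimal sketch and not part of the original lab: the helper names batched_documents and load_reviews_in_batches and the default batch size of 1000 are assumptions made for illustration, while insert_many is the standard PyMongo collection method. The function accepts any collection-like object, so it can be tried without a running server.

```python
import json


def batched_documents(lines, batch_size):
    """Yield lists of parsed JSON documents, at most batch_size per list."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSON Lines input
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so the yielded one is not mutated
    if batch:
        yield batch  # final, possibly shorter, batch


def load_reviews_in_batches(collection, lines, batch_size=1000):
    """Insert JSON Lines records with insert_many; return the number inserted."""
    record_count = 0
    for batch in batched_documents(lines, batch_size):
        collection.insert_many(batch)
        record_count += len(batch)
    return record_count
```

Against a real connection this could be called with the open review file as the lines argument, for example load_reviews_in_batches(db[REVIEWS_TB], file) inside the existing with block.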

2. Importing Business and Review Data

After executing the functions 'import_business_data()' and 'import_review_data()', the 'YelpData' database in MongoDB should contain two collections called 'Business' and 'Reviews'.
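
One way to confirm the result of the import step is to list the collections and count their documents. The sketch below is illustrative only: the helper name collection_counts is an assumption, and it relies on the PyMongo methods list_collection_names and count_documents. It takes the database object as a parameter, so the commented usage lines at the end are the only part that needs PyMongo and a running server.

```python
def collection_counts(db, names):
    """Return {name: document count}, with None for a missing collection."""
    existing = set(db.list_collection_names())
    counts = {}
    for name in names:
        if name in existing:
            # An empty filter counts every document in the collection
            counts[name] = db[name].count_documents({})
        else:
            counts[name] = None
    return counts


# Example use against a local server (requires pymongo and a running mongod):
# from pymongo import MongoClient
# db = MongoClient(port=27017)["YelpData"]
# print(collection_counts(db, ["Business", "Reviews"]))
```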

3. Joining Business and Review Data

The current list of 100 businesses is out of date and does not contain any business that matches a review in the 'Reviews' collection. Therefore, a business id from one document in the 'Reviews' collection was copied into a document in the 'Business' collection so that the join query produces at least one match. After executing the function 'join_business_and_reviews()', a third collection called 'BusinessReviews' should appear in the database.
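
To see why a business without matching reviews still appears in the joined collection, and why copying one business id is enough to produce a non-empty join, the equality match performed by $lookup can be imitated on plain Python lists. This is a sketch for intuition, not the server's implementation: the function name lookup and all sample documents are made up for illustration.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Imitate an equality-match $lookup on lists of dicts (a left outer join)."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        # Every local document is kept; unmatched ones get an empty array
        joined.append({**doc, as_field: matches})
    return joined


# Hypothetical sample data: only the second business id occurs in the reviews
businesses = [
    {"business_id": "b1", "name": "Stale Cafe"},
    {"business_id": "r-copied", "name": "Patched Diner"},
]
reviews = [
    {"review_id": "r1", "business_id": "r-copied", "stars": 4},
    {"review_id": "r2", "business_id": "r-copied", "stars": 5},
    {"review_id": "r3", "business_id": "other", "stars": 2},
]

result = lookup(businesses, reviews, "business_id", "business_id", "Reviews")
for doc in result:
    print(doc["name"], len(doc["Reviews"]))
# prints:
# Stale Cafe 0
# Patched Diner 2
```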

The screenshot for this step shows an array of reviews appearing under the business with the business id 'NZnhc2sEQy3RmzKTZnqtwQ'.
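
The same check can be made from Python instead of a GUI client. The following sketch assumes the joined documents have the shape produced by the join query (a 'Reviews' array on each business); the helper name summarize_joined_business is hypothetical, and find_one is the standard PyMongo collection method.

```python
def summarize_joined_business(joined_collection, business_id):
    """Return the name and review ids of one joined business, or None."""
    doc = joined_collection.find_one({"business_id": business_id})
    if doc is None:
        return None
    reviews = doc.get("Reviews", [])
    return {
        "name": doc.get("name"),
        "joined_review_count": len(reviews),
        "review_ids": [r.get("review_id") for r in reviews],
    }


# Example use against a local server (requires pymongo and a running mongod):
# from pymongo import MongoClient
# db = MongoClient(port=27017)["YelpData"]
# print(summarize_joined_business(db["BusinessReviews"],
#                                 "NZnhc2sEQy3RmzKTZnqtwQ"))
```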
