CCA175 : Practice Questions and Answers


Total 61 Scenarios



Check Sample Paper

Access all question papers (paid subscription only)

About Cloudera CCA175 (Scala) : Hands-on Practice Scenarios on CDP with Spark 2.4 Certification Preparation Kit : Total 61 Solved Scenarios

Cloudera's CCA175 (Hadoop & Spark Developer) certification is one of the most popular and in-demand certifications in the BigData world. Since its inception, HadoopExam has been providing certification preparation material for the BigData world, and thanks to our technical team's hard work we are now able to provide preparation material for the new version of CCA175. This time we have separated the material into Scala and PySpark versions; this is the Scala version.

As you may know, Cloudera has changed its BigData platform: it is no longer only the Hadoop framework but integrated software that can run in the public cloud as well as in an on-prem data center. Also note that Cloudera has discontinued its QuickStart VM, so there is currently no VM available for practicing the CCA175 certification. The HadoopExam technical team has therefore created a complete guideline, provided with this preparation material, for setting up a single-node Hadoop cluster instance on Google Cloud (Google Cloud provides $300 in free credits, which is good enough for practicing the CCA175 scenarios). In this step-by-step guide you will learn how to set up the instance and the data on it for practicing the scenarios. Each and every scenario provided in the Practice Scenario Test has been tested and executed on that single-node CDP cluster. To access the scenarios you don't have to install any software; you can access the questions and answers with our browser-based simulator.

CCA175 is a hands-on exam, which candidates have to complete in two hours. You will be given 10-12 tasks and have to complete at least 70% of them to clear the exam. Being a hands-on exam, it carries more credibility in the industry. This practice set covers the entire syllabus and has been executed on the new Cloudera platform, CDP. It includes in-depth, complex scenarios, and the complexity increases as you move ahead. We are in the process of adding complementary videos for selected problems given in the online simulator. Practice and sample problems with their solutions are provided in the HadoopExam Online Simulator only. You can check the sample paper below.

Exam Instructions

- Cloudera CCA175 Hadoop (CDP) & Spark 2.4 in Scala
- Assessment-1-20
- Please accomplish all the assessments given in the next pages.
- These assessment questions are valid for the Cloudera Hadoop & Spark Developer Certification CCA175, which is based on CDP and Spark 2.4.
- There are currently a total of 20 assessments in this set.
- As you move further, the complexity of the exercises increases.
- To access the data, a separate link is provided in each exercise.
- These assessments are not time-bound, but the real exam certainly is. As of today, the real exam is 120 minutes, which includes the section below; no separate time limit is given for individual exercises.
- Assessment: in this case you have to implement the problem solution on the Cloudera CDP (single-node) platform.
- Before appearing in the real exam, please check with us or drop an email to hadoopexam@, so that if there is any update we can share it with you.
- Once you have taken the exam, please share your feedback: question patterns, what we should improve, etc., so that future learners can benefit from your feedback and we can improve and provide better material to you as well.

Exercise-1

Problem Statement: You have to create data files from the given dataset (check the Data tab to access and download the data).

- hecourses.json
- students.csv

Based on these files, please accomplish the following activities.

1. Create these two files in a local directory and then upload them to HDFS under the spark4 directory.

2. Use the inbuilt (inferred) schema for the hecourses.json file and create a new DataFrame from it.

3. Define a new schema for students.csv with the column names given below.
a. StdID
b. CourseId
c. RegistrationDate

4. Using the above schema, create a DataFrame for the "students.csv" data.

5. Using both DataFrames, find the list of courses that have not yet been subscribed to, and save the result in the "spark4/notsubscribed.json" directory.

6. Find the total fee collected by each course category. The column name of the total-fee-collected field should be "TotalFeeCollected".

7. Save the result in the "spark4/TotalFee.json" directory.

Below is the data for the exercise.

//File Contents for the hecourses.json

[{ "CourseId": 1001, "CourseFee": 7000, "Subscription": "Annual", "CourseName": "Hadoop Professional Training", "Category": "BigData", "Website": ""

}, {

"CourseId": 1002, "CourseFee": 7500, "Subscription": "Annual", "CourseName": "Spark Professional Training", "Category": "BigData", "Website": "" },{ "CourseId": 1003, "CourseFee": 7000, "Subscription": "Annual", "CourseName": "PySpark Professional Training", "Category": "BigData", "Website": "" }, { "CourseId": 1004, "CourseFee": 7000, "Subscription": "Annual", "CourseName": "Apache Hive Professional Training", "Category": "Analytics", "Website": "" },{ "CourseId": 1005, "CourseFee": 10000, "Subscription": "Annual", "CourseName": "Machine Learning Professional Training", "Category": "Data Science", "Website": "" }, { "CourseId": 1006, "CourseFee": 7000, "Subscription": "Annual", "CourseName": "SAS Base", "Category": "Analytics", "Website": "" }]

//File Contents for the students.csv

ST1,1004,20200201
ST1,1003,20200211
ST2,1002,20200206
ST2,1001,20200204
ST3,1004,20200202
ST4,1003,20200211
ST6,1004,20200207
ST7,1005,20200202
ST9,1003,20200206
ST9,1002,20200209
ST3,1001,20200208
ST2,1004,20200207
ST1,1005,20200201
ST2,1003,20200204

Solution:

Step-1: Create the two data files locally.

mkdir spark4

cd spark4

vi hecourses.json

vi students.csv

Step-2: Now upload this directory to HDFS.

//Go to home directory

cd ~

//Upload spark4 directory to hdfs

hdfs dfs -put -f spark4

//Check whether the data has been uploaded or not.

hdfs dfs -ls spark4

Step-3: Read the hecourses.json file in Spark as a DataFrame.

//Note that this is a multiline JSON file,

//so we need to provide the option accordingly.

val heCourseDF = spark.read.option("multiline","true").json("spark4/hecourses.json")

................
................
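The full worked solution for the remaining steps is provided in the HadoopExam Online Simulator. As a study aid, here is a minimal sketch of how steps 4-7 might be implemented in the same spark-shell session; the variable names (studentSchema, studentDF, notSubscribedDF, totalFeeDF) are our own, and the join and aggregation logic is our reading of the task, not the official solution.

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

//Step-4: Define the schema given in the problem statement and
//create a DataFrame for students.csv (variable names are assumed).
val studentSchema = StructType(Seq(
  StructField("StdID", StringType, true),
  StructField("CourseId", IntegerType, true),
  StructField("RegistrationDate", StringType, true)
))
val studentDF = spark.read.schema(studentSchema).csv("spark4/students.csv")

//Step-5: A left anti join keeps only the courses whose CourseId
//never appears in the students data, i.e. not yet subscribed.
val notSubscribedDF = heCourseDF.join(studentDF, Seq("CourseId"), "left_anti")
notSubscribedDF.write.json("spark4/notsubscribed.json")

//Step-6: Each students.csv row is one registration, so join back to
//the course fees and sum them per category.
val totalFeeDF = studentDF.join(heCourseDF, Seq("CourseId"))
  .groupBy("Category")
  .agg(sum("CourseFee").alias("TotalFeeCollected"))

//Step-7: Save the aggregated result.
totalFeeDF.write.json("spark4/TotalFee.json")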
