PySpark with Kafka and Databricks Content
Table of Contents:
1. What is Python?
2. Python Setup
   - Python3 Installation
   - PyCharm Installation
   - Jupyter-lab Installation
3. Data Types
   - Numbers
   - Variable Assignment
   - Strings
   - String Slicing and Indexing
   - String Properties and Methods
   - Print Formatting with Strings
   - Lists
   - Dictionaries
   - Tuples
   - Sets
   - Booleans
4. Variables
5. Comparison Operators
   - Comparison operators (<, >, <=, >=, ==, !=)
   - Chaining comparison operators
6. Python Statements
   - if, elif and else statements
   - for loops
   - while loops
   - List Comprehension
7. Methods and Functions
   - Introduction to Functions
   - Basics of Functions
   - Logic with Python Functions
   - Tuple Unpacking
   - Interactions between Python Functions
   - Lambda Expressions (map, flatmap, filter functions)
8. Object Oriented Programming
   - Introduction
   - Attributes and Class Keyword
   - Class Object, Attributes, Methods
Address: DVS Technologies, Opp Home Town, Beside Biryani Zone, Marathahalli, Bangalore Phone: 9632558585, 8892499499|E-mail : dvs.training@|dvstechnologies.in
   - Inheritance and Polymorphism
   - Magic/Dunder Methods
9. Modules and Packages
10. Error and Exception Handling
11. Advanced Python
   - Python I/O
   - Reading and Writing to Files and Folders
   - Collections Module
   - DateTime Module
   - Math and Random Modules
   - Logger Module
   - Regular Expression Module
   - Zipping and Unzipping Module
12. Internals of Python
13. Pandas Module
   - Core components of Pandas: Series and DataFrames
   - Processing data from CSV, JSON, XML, Parquet, and databases
14. Workshop
   - Multiprocessing Module (Optional/Workshop)
   - Web Components (Optional/Workshop)
     o Flask
     o Flask Socket-IO
PySpark:
1. Spark Core
   - Why Spark?
   - Bird's-Eye View of Spark Architecture
   Spark Core:
   - Abstractions in Spark
2. RDD
   - What is an RDD?
   - What are the different ways to create an RDD?
     o parallelize
     o textFile
     o wholeTextFiles
   - What are RDD partitions and their importance?
   - About RDD Parallelism
3. DAG
   - Jobs
   - Stages
   - Tasks
4. Transformations and Actions
   - What are Narrow and Wide Transformations?
   - Understanding and working with different Transformations and Actions
5. In-depth Understanding of PySpark Architecture
   - Overview of PySpark Architecture
   - Py4j Module
   - Py4j Gateway Server
   - Python Runner and Python Worker
   - Compute method
   - Understanding PySpark Serialization and Deserialization
     o Marshal
     o Pickle
7. Variables
   - Closure
   - Broadcast
   - Accumulator
8. Discussing Spark Core optimization techniques
PySpark-SQL:
1. Disadvantages of Pandas DataFrame
   - What is a Spark DataFrame?
   - Different ways of creating DataFrames
   - RDD to DF and DF to RDD
   - Working with different data sources like CSV, XML, Excel, JSON, JDBC, Parquet, and HUDI (Optional/Workshop) using different Spark SQL APIs: select, where, groupby, case, otherwise, etc.
2. Join
   - Hints
   - Broadcast
   - Sort-merge Join
   - Shuffle Hash Join
3. DataFrame Persistence/Memory Management Techniques
   - cache
   - persist
     o MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2
     o unpersist
4. Windowing Operations in Spark
   - What is a window, and the different types of windows
   - Time-based
   - Offset-based
   - Analytic functions: rank, dense rank, row number, lead, lag, etc.
   - Spark Catalyst Optimizer / Spark Query Engine
   - Parsed logical plan, Analysed logical plan, Optimized logical plan, Physical plan
   - Explain method
   - Adaptive Query Execution
   - Optimizing Skew Joins
5. Understanding Concepts of YARN
   - Deploying PySpark applications on YARN in client and cluster modes
   - Discussing Spark deployment strategies
     o Static deployment
     o Dynamic deployment
6. Spark Streaming
   - Understanding Kafka Concepts
     o Creating PyKafka producers and consumers
7. Understanding the concept of Spark Structured Streaming and integrating Kafka with Spark
8. AWS
   - S3 (Data Lake)
   - EMR
   - Lambda
   - Athena
9. Databricks
10. Final Project