Hadoop Online Tutorials



Scala, Spark & Kafka Course Contents
By Siva Kumar Bhuchipalli

  Understand the difference between Apache Spark and Hadoop
  Learn Scala and its programming implementation
  Why Scala
  Scala Installation
  Get deep insights into the functioning of Scala
  Execute Pattern Matching in Scala
  Functional Programming in Scala – Closures, Currying, Expressions, Anonymous Functions
  Know the concepts of classes in Scala
  Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton & Companion Objects
  Traits and Abstract classes in Scala
  Scala Simple Build Tool – SBT
  Building with Maven

Spark Basics
  What is Apache Spark?
  Spark Installation
  Spark Configuration
  Spark Context
  Using Spark Shell
  Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
  Functional Programming with Spark

Working with RDDs
  RDD Operations - Transformations and Actions
  Types of RDDs
  Key-Value Pair RDDs – Transformations and Actions
  MapReduce and Pair RDD Operations
  Serialization

Spark on a cluster
  Overview
  A Spark Standalone Cluster
  The Spark Standalone Web UI
  Executors & Cluster Manager
  Spark on YARN Framework

Writing Spark Applications
  Spark Applications vs. Spark Shell
  Creating the SparkContext
  Configuring Spark Properties
  Building and Running a Spark Application
  Logging
  Spark Job Anatomy

Caching and Persistence
  RDD Lineage
  Caching Overview
  Distributed Persistence

Improving Spark Performance
  Shared Variables: Broadcast Variables
  Shared Variables: Accumulators
  Per Partition Processing
  Common Performance Issues

Spark API for different File Formats & Compression Codecs
  Text
  CSV
  Sequence
  Parquet
  ORC
  Compression Techniques – Snappy, Zlib, Gzip

Spark SQL
  Spark SQL Overview
  HiveContext
  SQL Datatypes
  Dataframes vs RDDs
  Operations on DFs
  Parquet Files with Spark Sql – Read, Write, Partitioning, Merging Schema
  ORC Files
  JSON Files
  Inferring Schema programmatically
  Custom Case Classes
  Temp Tables vs Persistent Tables
  Writing UDFs
  Hive Support
  JDBC Support - Examples
  HBase Support - Examples

Spark Streaming
  Spark Streaming Overview
  Example: Streaming Word Count
  Other Streaming Operations
  Sliding Window Operations
  Developing Spark Streaming Applications – Integration with Kafka and Hbase

Kafka
  Kafka Ecosystem
  Overview
  Producer
  Consumer
  Broker
  Topics
  Partitions
  Kafka Twitter Data Setup
  Writing Producer in Scala
  Writing Consumer in Scala & Java
  Kafka Integration with Spark Streaming
  Real use case – Integration of Kafka with Spark Streaming for processing Streaming Log files and Storing results into Hbase

Total Hours – 45-50 Hours
Total Course Duration – 6 weeks

