BigDataTech.IN Spark & Hadoop Syllabus



BigData Introduction and Hadoop Fundamentals
  - Data Storage and Analysis
  - Comparison with RDBMS
  - Hadoop – A Brief History

HDFS
  - Blocks
  - NameNode (NN) & DataNode (DN)
  - HDFS Federation & High Availability
  - HDFS Clients
  - HDFS Command Line
  - HDFS CLI – File System Operations Lab
  - HDFS Web UI
  - HDFS Java Client
  - HDFS Java Client – File System Operations Lab
  - CRUD Operations Using the Java Client

YARN – Cluster Management (Hadoop 2.x)
  - How YARN Applications Run
  - YARN vs MapReduce
  - YARN Scheduling
  - Capacity Scheduler, Fair Scheduler, FIFO

MapReduce
  - MR Programming Model
  - InputFormats
  - OutputFormats
  - Compression
  - Serialization & Data Types
  - File-Based Data Structures
  - Sequence File, Map File, ORC, Parquet
  - Tuning MapReduce Jobs
  - Advanced MapReduce
  - Joins – Map-Side, Reduce-Side
  - Distributed Cache

Hive
  - Comparison with RDBMS
  - HQL
  - Data Types
  - Tables
  - Importing and Exporting
  - Partitioning and Bucketing – Advanced
  - Joins and Join Optimization
  - Functions – Built-in & User-Defined
  - Advanced Optimization of HQL
  - Storage File Formats – Advanced
  - Loading and Storing Data
  - SerDes – Advanced

Sqoop
  - Introduction
  - Import – Deep Dive
  - Export – Deep Dive
  - Sqoop Optimization – Incremental Load
  - Real-Time Scenarios

Flume
  - Configuring Flume and Importing Data
  - Architecture and Lab

Oozie
  - Different Workflow Jobs
  - Oozie Scheduler
  - Lab

HBase
  - Introduction to NoSQL Databases
  - CAP Theorem
  - HBase Architecture
  - HBase Clients – Java Client
  - Loading Data
  - Hive – HBase Integration

Monitoring the Cluster
  - Hortonworks Ambari
  - Cloudera Manager
  - MapR Control System (MCS)
  - Hue, ResourceManager (RM) UI

Real-Time Project Architecture
  - Terminology Used
  - Production Implementation

SPARK & SCALA

Scala Basics
  Lecture:
    - Scala as a Functional Language
    - Scala vs Java
  Hands-On:
    - Strings, Numbers
    - List, Array, Map, Set
    - Control Statements, Collections
    - Functions, Methods
    - Pattern Matching

Spark Overview
  Lecture:
    - The Power of Spark
    - Spark Ecosystem
    - Spark Components vs Hadoop
  Hands-On:
    - Installation & Eclipse Configuration
    - Programs in the Command-Line Interface & Eclipse
    - Processing Local and HDFS Files

RDD Fundamentals
  Lecture:
    - Purpose and Structure of RDDs
    - Transformations, Actions, and the DAG
    - Key-Value Pair RDDs
  Hands-On:
    - Creating RDDs from Data Files
    - Reshaping Data to Add Structure
    - Interactive Queries Using RDDs

SparkSQL and DataFrames
  Lecture:
    - Spark SQL and DataFrame Uses
    - DataFrame / SQL APIs
    - Catalyst Query Optimization
  Hands-On:
    - Creating DataFrames (CSV, JSON)
    - Querying with the DataFrame API and SQL
    - Caching and Reusing DataFrames
    - Processing Hive Data in Spark

Spark Streaming
  Lecture:
    - Streaming Sources
    - DStream APIs and Stateful Streams
  Hands-On:
    - Creating DStreams from Sources
    - Operating on DStream Data
  Structured Streaming

Kafka
  - Kafka Introduction
  - Installation
  - Kafka Integration with Spark
  - Integration with Flume

Labs:
  - Covers the entire certification syllabus
  - Real-time use cases and data sets: word count, weather sensor dataset, social media data sets (YouTube, Twitter data analysis)
  - Unix Basics Lab
  - SparkSQL, Hadoop, Hive, Sqoop, Oozie, HBase, and Flume installations – pseudo-distributed mode

Master Projects:
  - Real-Time Big Data EDW
  - Real-Time Streaming Application

Real-time concepts covered:
  - Spark SQL, Scala
  - Hive – Advanced Topics
  - Sqoop Import/Export
  - Oozie Scheduling
  - How Hadoop MR Is Used in a DW
  - RDBMS concepts, ETL tool concepts, integration with reporting tools
  ...
