School of Computer Engineering, KIIT UNIVERSITY Software ...



|Program(s) |B Tech [CSE] |Academic Session, Semester |Autumn, 2016, 7th Sem |

|Subject Name |Data Analytics |Subject Code |IT- 3002 |

Teachers :

Dr. Bhabani Shankar Prasad Mishra, Dr Siddharth Swarup Rautaray, Dr Manjusha Pandey

IT-3004 DATA ANALYTICS Cr-4

COURSE OBJECTIVES

– To understand the concept of data analytics

– To explore tools and practices for working with big data

– To understand how data analytics can be leveraged as a key component

– To understand how to mine the data

– To learn about stream computing

– To know about the research that requires the integration of large amounts of data

COURSE OUTCOMES

– Identify the need for data analytics for different domains

– Perform analysis of data using the R tool

– Use Hadoop and the MapReduce framework

– Apply data analytics to a given problem

– Contextually integrate and correlate large amounts of information automatically to gain faster insights

SYLLABUS

INTRODUCTION TO BIG DATA (9Hrs)

Importance of Data, Characteristics of Data, Analysis of Unstructured Data, Combining Structured and Unstructured Sources. Introduction to Big Data Platform – Challenges of conventional systems – Web data – Evolution of Analytic scalability, analytic processes and tools, Analysis vs reporting – Modern data analytic tools, Types of Data, Elements of Big Data, Big Data Analytics, Data Analytics Lifecycle. Exploring the Use of Big Data in Business Context, Use of Big Data in Social Networking, Business Intelligence, Product Design and Development.

DATA ANALYSIS  (10Hrs)

Exploring R: Exploring Basic Features of R, Programming Features, Packages, Exploring RStudio, Handling Basic Expressions in R, Basic Arithmetic in R, Mathematical Operators, Calling Functions in R, Working with Vectors, Creating and Using Objects, Handling Data in R Workspace, Creating Plots, Using Built-in Datasets in R, Reading Datasets and Exporting Data from R, Manipulating and Processing Data in R, Statistical Features – Analysis of time series: linear systems analysis, nonlinear dynamics – Rule induction – Neural networks: learning and generalization, competitive learning, principal component analysis and neural networks.

BIG DATA TECHNOLOGY FOUNDATIONS & MINING DATA STREAMS (10Hrs)

Exploring the Big Data Stack, Data Sources Layer, Ingestion Layer, Storage Layer, Physical Infrastructure Layer, Platform Management Layer, Security Layer, Monitoring Layer, Analytics Engine, Visualization Layer, Big Data Applications, Virtualization. Introduction to Streams Concepts – Stream data model and architecture – Stream Computing, Sampling data in a stream – Filtering streams, Counting distinct elements in a stream.

FREQUENT ITEMSETS AND CLUSTERING (9Hrs)

Mining Frequent itemsets – Market basket model – Apriori Algorithm – Handling large data sets in Main memory – Limited Pass Algorithm – Counting frequent itemsets in a stream – Clustering Techniques – Hierarchical – K-Means. Analytical Approaches and Tools to Analyze Data: Text Data Analysis, Graphical User Interfaces, Point Solutions.

FRAMEWORKS AND VISUALIZATION (10Hrs)

Distributed and Parallel Computing for Big Data, MapReduce – Hadoop, Hive, MapR – Hadoop YARN – Pig and Pig Latin, Jaql – ZooKeeper – HBase, Cassandra – Oozie, Lucene – Avro, Mahout. Hadoop Distributed File System – Visualizations – Visual data analysis techniques, interaction techniques; Systems and applications.

                                                                                                                 

TEXT BOOKS:

1. Big Data, Black Book, DT Editorial Services, Dreamtech Press, 2015

2. Big Data and Analytics, Seema Acharya, Subhashini Chellappan, Infosys Limited, Publication: Wiley India Private Limited, 1st Edition, 2015

REFERENCES:

1. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons, 2012.

2. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007; Pete Warden, Big Data Glossary, O’Reilly, 2011.

3. Jiawei Han, Micheline Kamber, “Data Mining Concepts and Techniques”, Second Edition, Elsevier, Reprinted 2008.

4. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data by EMC Education Services (Editor), Wiley, 2014

5. Stephan Kudyba, Thomas H. Davenport, Big Data, Mining, and Analytics, Components of Strategic Decision Making, CRC Press, Taylor & Francis Group. 2014

6. Norman Matloff, The Art of R Programming, No Starch Press, Inc., 2011.

7. Big Data For Dummies, Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, Wiley 2013

LESSON PLAN

|Chapters |Topic/Coverage |No. of lectures |Lecture serial no. |

|1. INTRODUCTION TO BIG DATA          |Data Science |4 | |

|                                     |Importance of Data | | |

| |Characteristics of Data, Analysis of Unstructured Data | | |

| |Combining Structured and Unstructured Sources. | | |

| |Big Data Platform – Challenges of conventional systems – Web data –| | |

| |Evolution of Analytic scalability, analytic processes and tools, | |1-9 |

| |Analysis vs reporting – Modern data analytic tools. | | |

| |Tutorial |1 | |

| |Data Analytics Lifecycle. |3 | |

| |Tutorial |1 | |

|2. DATA ANALYSIS  |Exploring R: |4 |10-19 |

| |Exploring Basic Features of R, | | |

| |Exploring RStudio, | | |

| |Handling Basic Expressions in R, | | |

| |Basic Arithmetic and Mathematical Operators in R, | | |

| |Calling Functions in R, | | |

| |Working with Vectors, | | |

| |Creating and Using Objects, | | |

| |Handling Data in R Workspace, | | |

| |Creating Plots, | | |

| |Reading Datasets and Exporting Data from R, | | |

| |Manipulating and Processing Data in R. | | |

| |Tutorial |1 | |

| |Statistical Features. |3 | |

| |Analysis of time series: linear systems analysis, | | |

| |nonlinear dynamics – Rule induction – | | |

| |Neural networks: learning and generalization, competitive learning, | | |

| |principal component analysis and neural networks. | | |

| |Tutorial |1 | |

|3. BIG DATA TECHNOLOGY FOUNDATIONS & MINING DATA STREAMS |Exploring the Big Data Stack, |4 |20-29 |

| |Data Sources Layer, | | |

| |Ingestion Layer, | | |

| |Storage Layer, | | |

| |Physical Infrastructure Layer, | | |

| |Platform Management Layer, | | |

| |Security Layer, | | |

| |Monitoring Layer, | | |

| |Analytics Engine, | | |

| |Visualization Layer, | | |

| |Big Data Applications, | | |

| |Virtualization. | | |

| | | | |

| |Tutorial |1 | |

| |Introduction to Streams Concepts – |3 | |

| |Stream data model and architecture – Stream Computing,  | | |

| |Sampling data in a stream – Filtering streams, | | |

| |Counting distinct elements in a stream. | | |

| | | | |

| |Tutorial |1 | |

|4. FREQUENT ITEMSETS AND CLUSTERING |Mining Frequent itemsets – |4 |30-38 |

| |Market basket model – | | |

| |Apriori Algorithm – | | |

| |Handling large data sets in Main memory | | |

| |Limited Pass Algorithm – | | |

| |Counting frequent itemsets in a stream – Clustering Techniques –  | | |

| |Hierarchical – K-Means. | | |

| | | | |

| |Tutorial |1 | |

| |Analytical Approaches |2 | |

| |Tools to Analyze Data | | |

| |Text Data Analysis, | | |

| |Graphical User Interfaces | | |

| |Tutorial |1 | |

|5. FRAMEWORKS AND VISUALIZATION      |Distributed and Parallel Computing for Big Data, |4 |39-48 |

|          |MapReduce – | | |

| |Hadoop, | | |

| |Hive, | | |

| |MapR – | | |

| |Hadoop -YARN - | | |

| |Pig and PigLatin, | | |

| |Jaql - Zookeeper - | | |

| |HBase, | | |

| |Cassandra- Oozie, Lucene- | | |

| |Avro, | | |

| |Mahout. | | |

| |Tutorial |1 | |

| |Hadoop Distributed file systems – |3 | |

| |Visualizations – | | |

| |Visual data analysis techniques, | | |

| |interaction techniques; | | |

| |Systems and applications. | | |

Course Delivery Plan:

|Duration |Topics |

|Week- 1 |Introduction to Data Science |

|Week- 2 |Elements of Big Data |

|Week- 3 |Data Analytics Lifecycle |

|Week- 4 |Exploring R programming features |

|Week- 5 |Manipulating and Processing Data in R |

|Week- 6 |Exploring the Big Data Stack |

|Week- 7 |Analytics Engine & Virtualization |

|Week- 8 |Introduction to Streams Concepts |

|Week- 9 |Mining Frequent itemsets |

|Week- 10 |Clustering Techniques |

|Week-11 |Distributed and Parallel Computing for Big Data |

|Week-12 |Hadoop Framework & Implementation |

Assessment:

Assessment will be based on quizzes, class tests and presentations.

Evaluation Scheme:

|Exam |Component |Marks |

|End Semester | |60 |

|Internal |Mid Semester |25 |

| |Assignment/Quiz |15 |

|Total | |100 |

Program Educational Objectives

PEO-1. To lead a successful career in industries or pursue higher studies or entrepreneurial endeavors.

PEO-2. To offer techno-commercially feasible and socially acceptable solutions to real life engineering problems.

PEO-3. To demonstrate effective communication skills, professional attitude and a desire to learn.

Program Outcomes

a) Ability to apply knowledge of mathematics, science, engineering, computing to solve complex problems.

b) Ability to identify, analyze and solve complex software and hardware engineering problems.

c) Ability to design, implement and evaluate various computer based systems to meet the needs of the society by considering public health, safety, cultural, societal and environmental issues.

d) Ability to design & conduct experiments and interpret data.

e) Ability to use techniques, skills and modern engineering and IT tools to various relevant engineering practices.

f) Ability to examine and understand the impact of societal, health, safety, legal and cultural concerns at local, national and international levels relevant to engineering practices.

g) Ability to recognize the sustainability and environmental impact of the computer-based engineering solutions.

h) Ability to follow prescribed norms, responsibilities and ethics in engineering practices.

i) Ability to work effectively as an individual and in a team.

j) Ability to communicate effectively through oral, written and pictorial means with engineering community and the society at large.

k) Ability to recognize the need for and to engage in life-long learning.

l) Ability to understand and apply engineering & management principles in executing projects.

Course Outcomes (CO) of Data Analytics

1. Able to understand the emergence and importance of Big Data science along with the necessity of big data analysis.

2. Able to understand the working of R-tool and write structured and well-commented scripts in R to analyze the data set with an ability to test and debug them in the laboratory.

3. Able to understand the layered Big Data stack and the concepts related to Big Data technologies and their applications.

4. Able to understand the concepts of Data mining and usage of data mining techniques for analysis of big data.

5. Able to understand the frameworks and visualization of Hadoop architecture and its applications.

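CO-PO mapping (POs a to l; ratings listed in order for each CO, column positions not shown)

|CO |Ratings |

|1 |H H M M M M H |

|2 |H M H H H M H |

|3 |H H M H H M M H |

|4 |H H M H H M M H |

|5 |H M H M H M M H |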

NOTE:

H: Contributes highly to PO

M: Contributes moderately to PO
