INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
Lecture #4 - 09/09/2021
Lecture #5 - 09/14/2021
CMSC320, Tuesdays & Thursdays, 5:00pm - 6:15pm
ANNOUNCEMENTS
Register on Piazza: umd/fall2021/cmsc320
- XXX have registered already
- Very few have not registered yet
If you were on Piazza, you'd know ...
- Project 1 will be out shortly. (Worth 10% of the grade, as is each of the four projects.)
- Link will be on the course website @ cmsc320.github.io
We've also linked some reading for the week!
- Quizzes are generally due on Tuesdays at noon; on ELMS now.
THE DATA LIFECYCLE
Data collection
Data processing
Exploratory analysis & Data viz
Analysis, hypothesis testing, & ML
Insight & Policy Decision
NEXT FEW CLASSES
1. NumPy: Python library for manipulating nD arrays
   - Multidimensional arrays, and a variety of operations including linear algebra
2. Pandas: Python library for manipulating tabular data
   - Series, Tables (also called DataFrames)
   - Many operations to manipulate and combine tables/series
3. Relational Databases
   - Tables/relations, and SQL (similar to Pandas operations)
4. Apache Spark
   - Sets of objects or key-value pairs
   - MapReduce and SQL-like operations
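To make the first three items concrete, here is a minimal sketch, not taken from the lecture: a NumPy array with a linear algebra call, a pandas join and aggregation, and the same query written in SQL against an in-memory SQLite database. The table names, column names, and values are made up for illustration, and SQLite stands in for a relational database only because it ships with Python. Spark is left out because it needs a separate installation.

```python
import sqlite3

import numpy as np
import pandas as pd

# NumPy: a 2-D array plus a linear algebra routine.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)  # solves A @ x = b

# pandas: two small tables (DataFrames) combined with a join.
# These tables are hypothetical example data.
students = pd.DataFrame({"sid": [1, 2, 3], "name": ["Ana", "Bo", "Cy"]})
grades = pd.DataFrame({"sid": [1, 2, 2], "score": [90, 75, 85]})
joined = students.merge(grades, on="sid", how="inner")
mean_by_name = joined.groupby("name")["score"].mean()

# SQL: the same join and aggregation, run in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
students.to_sql("students", conn, index=False)
grades.to_sql("grades", conn, index=False)
rows = conn.execute(
    "SELECT s.name, AVG(g.score) FROM students s "
    "JOIN grades g ON s.sid = g.sid GROUP BY s.name ORDER BY s.name"
).fetchall()
conn.close()

print(x)
print(mean_by_name)
print(rows)
```

The pandas `merge` plus `groupby` pair plays the same role as SQL's `JOIN` plus `GROUP BY`, which is the similarity the list points to.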
NUMERIC & SCIENTIFIC APPLICATIONS
A number of third-party packages are available for numerical and scientific computing. These include:
- NumPy/SciPy: numerical and scientific function libraries.
- numba: Python compiler that supports JIT compilation.
- ALGLIB: numerical analysis library.
- pandas: high-performance data structures and data analysis tools.
- pyGSL: Python interface for the GNU Scientific Library.
- ScientificPython: collection of scientific computing modules.
Many, many thanks to: FSU CIS4930
NUMPY AND FRIENDS
By far, the most commonly used packages are those in the NumPy stack. These packages include:
- NumPy: functionality similar to MATLAB
- SciPy: integrates many other packages like NumPy
- Matplotlib & Seaborn: plotting libraries
- IPython via Jupyter: interactive computing
- Pandas: data analysis library
- SymPy: symbolic computation library
[FSU]
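As a rough illustration of the MATLAB comparison, the sketch below shows a few NumPy operations with their approximate MATLAB counterparts in the comments. The variable names and values are chosen only for this example and do not come from the lecture.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 5)    # like MATLAB's linspace(0, 1, 5)
y = 2.0 * t + 1.0               # elementwise arithmetic, no explicit loop
M = np.arange(6).reshape(2, 3)  # 2x3 array holding 0..5, filled row by row
col_sums = M.sum(axis=0)        # column sums, like MATLAB's sum(M, 1)
P = M @ M.T                     # matrix product, like M * M' in MATLAB

print(y)
print(col_sums)
print(P)
```

One difference worth keeping in mind: in NumPy, `*` on arrays is elementwise and `@` is the matrix product, whereas MATLAB uses `.*` and `*` respectively.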
THE NUMPY STACK
[Figure: diagram of the NumPy stack, annotated with when each part is covered: today/next class, mid- and late semester, or later. Image from Continuum Analytics.]