DSC 201: Data Analysis & Visualization
[Pages:29]DSC 201: Data Analysis & Visualization
Data Transformation
Dr. David Koop
D. Koop, DSC 201, Fall 2016
Data Wrangling
? Data wrangling: transform raw data to a more meaningful format that can be better analyzed
? Data cleaning: getting rid of inaccurate data ? Data transformations: changing the data from one representation to
another ? Data reshaping: reorganizing the data ? Data merging: combining two datasets
D. Koop, DSC 201, Fall 2016
2
Example: Football Game Data
? Data about football games, teams, and players - Game is between two Teams - Each Team has Players
? For each game, we could specify every player and all of their information... why is this bad?
D. Koop, DSC 201, Fall 2016
3
Example: Football Game Data
? Data about football games, teams, and players
- Game is between two Teams
- Each Team has Players
? For each game, we could specify every player and all of their information... why is this bad?
? Normalization: reduce redundancy, keep information that doesn't change separate
? 3 Relations: Team, Player, Game
? Each relation only encodes the data specific to what it represents
Player
Id Name Height Weight
Team
Id Name Wins Losses
Game
Id Location Date
D. Koop, DSC 201, Fall 2016
3
Example: Football Game Data
? Have each game store the id of the home team and the id of the away team (one-to-one)
? Have each player store the id of the team he plays on (many-to-one)
Player
Id Name Height Weight TeamId
Team
Id Name Wins Losses
? What happens if a player plays on 2+ teams?
Game
Id Location Date Home Away
D. Koop, DSC 201, Fall 2016
4
Concatenation
? Take two data frames with the same columns and add more rows
? pd.concat([data-frame-1, data-frame-2, ...])
? Default is to add rows (axis=0), but can also add columns (axis=1) ? Can also concatenate Series into a data frame. ? concat preserves the index so this can be confusing if you have two
default indices (0,1,2,3...)--they will appear twice - Use ignore_index=True to get a 0,1,2...
D. Koop, DSC 201, Fall 2016
5
Merges (aka Joins)
? Need to merge data from one DataFrame with data from another DataFrame
? Example: Football game data merged with temperature data
Game
Id Location Date Home Away
0 Boston 9/2 1 15
1 Boston 9/9 1
7
2 Cleveland 9/16 12 1
3 San Diego 9/23 21 1
No data for San Diego
Weather
wId
City
0 Boston
1 Boston
...
...
7 Boston
...
...
21 Boston
...
...
36 Cleveland
Date 9/2 9/3 ... 9/9 ... 9/23 ... 9/16
Temp 72 68 ... 75 ... 54 ... 81
D. Koop, DSC 201, Fall 2016
6
Inner Strategy
Merged
Id
Location
Date Home
Away
Temp
wId
0
Boston
9/2
1
15
72
0
1
Boston
9/9
1
7
75
7
2
Cleveland
9/16
12
1
81
36
No San Diego entry
D. Koop, DSC 201, Fall 2016
7
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- with pandas f m a vectorized m a f operations cheat
- biokit read the docs
- python pandas quick guide math
- dsc 201 data analysis visualization
- easy as py a first course in python with a taste of data
- handout 10 bentley university
- data wrangling ii munging tidy data and working with
- python transpose matrix zip static
Related searches
- data analysis questions examples
- data analysis research paper example
- data analysis method
- data analysis methods examples
- data analysis methods in research
- types of data analysis methods
- data analysis in research methodology
- data analysis in research pdf
- examples of data analysis paper
- data analysis techniques for research
- data analysis and interpretation pdf
- data analysis tools