DSC 201: Data Analysis & Visualization
[Pages:42]DSC 201: Data Analysis & Visualization
Data Transformation
Dr. David Koop
D. Koop, DSC 201, Fall 2018
Example: Football Game Data
? Have each game store the id of the home team and the id of the away team (one-to-one)
? Have each player store the id of the team he plays on (many-to-one)
Player
Id Name Height Weight TeamId
Team
Id Name Wins Losses
? What happens if a player plays on 2+ teams?
Game
Id Location Date Home Away
D. Koop, DSC 201, Fall 2018
2
Concatenation in pandas
? Take two data frames with the same columns and add more rows ? Example: 2015 Game Data and 2016 Game Data
? pd.concat([data-frame-1, data-frame-2, ...])
? Default is to add rows (axis=0), but can also add columns (axis=1) ? Can also concatenate Series into a data frame. ? concat preserves the index so this can be confusing if you have two
default indices (0,1,2,3...)--they will appear twice - Use ignore_index=True to get a 0,1,2...
D. Koop, DSC 201, Fall 2018
3
Merges (aka Joins)
? Need to merge data from one DataFrame with data from another DataFrame
? Example: Football game data merged with temperature data
Game
Id Location Date Home Away
0 Boston 9/2 1 15
1 Boston 9/9 1
7
2 Cleveland 9/16 12 1
3 San Diego 9/23 21 1
No data for San Diego
Weather
wId
City
0 Boston
1 Boston
...
...
7 Boston
...
...
21 Boston
...
...
36 Cleveland
Date 9/2 9/3 ... 9/9 ... 9/23 ... 9/16
Temp 72 68 ... 75 ... 54 ... 81
D. Koop, DSC 201, Fall 2018
4
Inner Strategy
Merged
Id
Location
Date Home
Away
Temp
wId
0
Boston
9/2
1
15
72
0
1
Boston
9/9
1
7
75
7
2
Cleveland
9/16
12
1
81
36
No San Diego entry
D. Koop, DSC 201, Fall 2018
5
Outer Strategy
Merged
Id
Location
0
Boston
NaN
Boston
...
...
1
Boston
NaN
Boston
...
...
NaN Cleveland
...
...
2
Cleveland
...
...
3
San Diego
Date 9/2 9/3 ... 9/9 9/10 ... 9/2 ... 9/16 ... 9/23
Home 1
NaN ... 1 NaN ... NaN ... 12 ... 21
Away 15 NaN ... 7 NaN ... NaN ... 1 ... 1
Temp 72 68 ... 75 76 ... 61 ... 81 ... NaN
wId 0 1 ... 7 8 ... 22 ... 36 ...
NaN
D. Koop, DSC 201, Fall 2018
6
Left Strategy
Merged
Id
Location
0
Boston
1
Boston
2
Cleveland
3
San Diego
Date 9/2 9/9 9/16 9/23
Home 1 1 12 21
Away 15 7 1 1
Temp 72 75 81 NaN
wId 0 7 36
NaN
D. Koop, DSC 201, Fall 2018
7
Right Strategy
Merged
Id
Location
Date Home
Away
Temp
wId
0
Boston
9/2
1
15
72
0
NaN
Boston
9/3
NaN
NaN
68
1
...
...
...
...
...
...
...
1
Boston
9/9
1
7
75
7
NaN
Boston
9/10
NaN
NaN
76
8
...
...
...
...
...
...
...
NaN Cleveland
9/2
NaN
NaN
61
22
...
...
...
...
...
...
...
2
Cleveland
9/16
12
1
81
36
...
...
...
...
...
...
...
No San Diego entry
D. Koop, DSC 201, Fall 2018
8
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- with pandas f m a vectorized m a f operations cheat
- biokit read the docs
- python pandas quick guide math
- dsc 201 data analysis visualization
- easy as py a first course in python with a taste of data
- handout 10 bentley university
- data wrangling ii munging tidy data and working with
- python transpose matrix zip static
Related searches
- data analysis questions examples
- data analysis research paper example
- data analysis method
- data analysis methods examples
- data analysis methods in research
- types of data analysis methods
- data analysis in research methodology
- data analysis in research pdf
- examples of data analysis paper
- data analysis techniques for research
- data analysis and interpretation pdf
- data analysis tools