Python pandas quick guide - University of Utah
[Pages:11]Python pandas quick guide
Shiu-Tang Li May 12, 2016
Contents
1 Dataframe initialization / outputs
3
1.1 Load csv files into dataframe. . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Initialize a dataframe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Create a new column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Output a dataframe to csv . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Take a quick glance of a dataframe
4
2.1 Print the data frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Get the description of numerical columns . . . . . . . . . . . . . . . . . . . 4
2.3 Get the dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 Get the data type / get the filtered data by data type . . . . . . . . . . . . 4
2.5 Get the unique elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Select data from a dataframe
5
3.1 Get column names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Select a specific column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Select the sub-dataframe of a few columns . . . . . . . . . . . . . . . . . . . 5
3.4 Select rows with restrictions on columns . . . . . . . . . . . . . . . . . . . . 5
3.5 Select rows with row index . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.6 Select row index with max values in a specific column . . . . . . . . . . . . 5
3.7 Select given entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.8 Iterate rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Revise data in a dataframe
7
4.1 Revise data in a particular entry . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2 Reindex rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.3 Reindex one row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4 Rename columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1
4.5 Drop columns / rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.6 Find / drop / fill missing values . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.7 Data frame transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.8 Change types of a column . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.9 Merge data frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5 Search key words in a dataframe
9
5.1 Exact match in target column . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6 Perform operations on a dataframe
10
6.1 Sort dataframe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6.2 Rearrange dataframe - pivot table . . . . . . . . . . . . . . . . . . . . . . . 10
6.3 Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6.4 `Apply' function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Others
11
2
1 Dataframe initialization / outputs
1.1 Load csv files into dataframe.
1 import pandas
2 d a t a f r a m e = pandas . r e a d c s v ( "C : / U s e r s / Shiu-Tang L i / . . . c s v " ,
3
encoding = " ISO-8859-1" )
4 # encoding : to deal with unicodes
1.2 Initialize a dataframe
1 import pandas as pd
2 df1 = pd . DataFrame = ({ ' c1 ' : [ ' 1 ' , ' 2 ' , ' 3 ' , ' 4 ' ] , ' c2 ' : [ ' 5 ' , ' 6 ' , ' 7 ' , ' 8 ' ]})
3 df2 = pd . DataFrame ({ ' c1 ' : 2 ,
4
' c2 ' : np . a r r a y ( [ 0 ] 100 , dtype= ' i n t 3 2 ' ) ,
5
' c3 ' : ' hello ' })
1 import pandas as pd 2 l i s t o f d i c t s = [{ ' column1 ' :3 , ' column2 ' :4} , { ' column1 ' :7 , ' column2 ' :2} , { ' column1
' :6 , ' column2 ' :8}] 3 d a t a f r a m e = pd . DataFrame ( l i s t o f d i c t s , index=[ ' index1 ' , ' index2 ' ] ) 4 # construct data frame from l i s t of d i c t i o n a r i e s
1.3 Create a new column
1 d a t a f r a m e [ ' new column ' ] = L i s t OR S e r i e s 2 # will get warning message
1.4 Output a dataframe to csv
1 import pandas 2 d a t a f r a m e . t o c s v ( "C : / U s e r s / Shiu-Tang L i / . . . c s v " )
Remark. May load .csv as list of lists instead of data frames.
3
2 Take a quick glance of a dataframe
2.1 Print the data frame
1 print ( data frame . head (5) ) 2 # print f i r s t 5 rows 3 print (data frame) 4 # print the data frame , dimension information i s also attached
2.2 Get the description of numerical columns
1 print (data frame . describe () )
2.3 Get the dimension
1 dim = data frame . shape 2 number of rows = dim [0] 3 number of columns = dim [1]
2.4 Get the data type / get the filtered data by data type
1 types = data frame . dtypes 2 # types i s a S e r i e s l a b e l e d by column names , showing the data type f o r each
column . 3 i n t e g e r i n d e x = t y p e s [ t y p e s == i n t 6 4 ] . index 4 # t y p e s [ t y p e s == i n t 6 4 ] i s a f i l t e r e d S e r i e s with i n t e g e r v a l u e s . 5 print (data frame[ integer index ]) 6 # this gives us the f i l t e r e d data frame with only int64 values .
2.5 Get the unique elements
1 pr in t ( data frame [ ' column name ' ] . unique () ) 2 # w i l l return a l i s t showing d i s t i n c t elements in the column 3 pr in t ( data frame [ ' column name ' ] . value counts () ) 4 # w i l l return a table showing the counts in the column
4
3 Select data from a dataframe
3.1 Get column names
1 print ( data frame . columns) 2 # data frame . columns i s a l i s t of strings 3 first column = data frame . columns [0] 4 # p r i n t the f i s t column , which i s a s t r i n g
3.2 Select a specific column
1 column = data frame [ ' column name ' ] 2 # column i s a [ S e r i e s ] object , c o n t a i n s row index + values , both are l i s t s 3 column values = column . values 4 column index = column . index
3.3 Select the sub-dataframe of a few columns
1 data frame2 = data frame [[ ' column name1 ' , ' column name2 ' ]] 2 data frame2 = data frame [ data frame . column [0:2]] 3 # s e l e c t the f i r s t two columns in two d i f f e r e n t ways 4 data frame2 = data frame [ ' column name x ' : ' column name y ' ] 5 # s e l e c t the columns between the two columns
3.4 Select rows with restrictions on columns
1 d a t a f r a m e [ d a t a f r a m e [ ' column name ' ] == s o m e v a l u e s ]
3.5 Select rows with row index
1 data frame . iloc [ i ] 2 #i : row index 3 data frame . iloc [0:3] 4 # s e l e c t the rows with
indices
0 ,1 ,2
Remark. The difference between loc and iloc: If the index of the dataframe is 3, 7, 0, 2, . . ., iloc[0] will select the third row (true integer index), loc will select the 1st row (index by locations).
3.6 Select row index with max values in a specific column
1 data frame [ ' column name ' ] . idxmax () 2 # r e t u r n s t h e 1 s t row i n d e x t h a t has max
3.7 Select given entry
5
1 # Approach 1: i : true row index 2 data frame . i l o c [ i ][ ' column name ' ] 3 # Approach 2: i : location 4 data frame [ ' column name ' ] . values [ i ] 5 # Approach 3: i : true row index 6 data frame . ix [ i , ' column name ' ]
3.8 Iterate rows
1 f o r i , row in data frame . i t e r r o w s () : 2 # i : row i n d i c e s ; row : each row
6
4 Revise data in a dataframe
4.1 Revise data in a particular entry
1 # i : t r u e row index 2 # Approach 1 ( will get warning message) : 3 data frame . ix [ i , ' column name ' ] = new value 4 # Approach 2 ( will get warning message) : 5 data frame [ ' column name ' ][ i ] = new value 6 # Approach 3: 7 data frame . set val ue ( i , ' column name ' , new value ) 8 # Approach 4: 9 data frame . at [ i , ' column name ' ] = new value
4.2 Reindex rows
1 data frame . index = [index1 , index2 , . . . ] 2 # r e p l a c e i n d i c e s with a new l i s t 3 data frame = data frame . set index ([ ' column name ' ]) 4 # indexed by a p a r t i c u l a r column 5 d a t a f r a m e = d a t a f r a m e . r e s e t i n d e x ( drop=True ) 6 # r e i n d e x e d from 0 . drop=F a l s e : make a dataframe column with t h e o l d index
values .
4.3 Reindex one row
1 old indices = data frame . index
2 new indices = old indices . values
3 for i , item in enumerate( new indices ) :
4
i f item == ' o l d i n d e x t o b e c h a n g e d ' :
5
new indices [ i ] = ' new index '
6 data frame . index = new indices
4.4 Rename columns
1 d a t a f r a m e . rename ( columns={ ' column name1 ' : ' new column name1 ' , ' column name2 ' : ' new column name2 ' } , i n p l a c e=True )
2 # rename a few columns : w i l l get warning message 3 data frame . columns = [ ' column name1 ' , ' column name2 ' , . . . ] 4 # rename a l l columns
4.5 Drop columns / rows
1 d a t a f r a m e . drop ( ' column name ' , a x i s =1, i n p l a c e=True ) 2 # drop a column 3 data frame . drop ([ ' column name1 ' , ' column name2 ' , . . . ] , 4 # drop a few columns 5 d a t a f r a m e . drop ( ' row index1 ' , a x i s =0, i n p l a c e=True ) 6 # drop a row
a x i s =1,
i n p l a c e=True )
7
7 d a t a f r a m e . drop ( [ ' row index1 ' , ' row index2 ' , . . . ] , a x i s =0, i n p l a c e=True ) 8 # drop a few rows
4.6 Find / drop / fill missing values
1 import pandas as pd 2 i s n u l l = pd . i s n u l l ( data frame [ " column name " ]) 3 # w i l l r e t u r n a t r u e / f a l s e S e r i e s . N u l l v a l u e = NaN . 4 new data frame = data frame . dropna () 5 # drop a l l rows with missing values 6 new data frame = data frame . dropna( axis = 1) 7 # drop a l l columns with missing values 8 new data frame = d a t a f r a m e . dropna ( s u b s e t =[" column1 " , " column2 " ] ) 9 # drop a l l rows with missing values in the two columns 10 d a t a f r a m e [ " column " ] = d a t a f r a m e [ " column " ] . f i l l n a ( d a t a f r a m e [ " column " ] . median ( )
) 11 d a t a f r a m e [ " column " ] = d a t a f r a m e [ " column " ] . f i l l n a ( d a t a f r a m e [ " column " ] . mean ( ) ) 12 d a t a f r a m e [ " column " ] = d a t a f r a m e [ " column " ] . f i l l n a ( something ) 13 # f i l l NaN v a l u e s with column median / mean / or s t h e l s e
4.7 Data frame transpose
1 data frame .T
4.8 Change types of a column
1 data frame [ ' column name ' ] = data frame [ ' column name ' ] . astype ( f l o a t ) 2 # changing types to float
4.9 Merge data frames
1 import pandas as pd 2 l i s t 1 = [{ ' c1 ' :4 , ' c2 ' :3} , { ' c1 ' :2 , ' c2 ' : 3}] 3 l i s t 2 = [{ ' c1 ' :6 , ' c2 ' :9} , { ' c1 ' :8 , ' c2 ' :10}] 4 df1 = pd . DataFrame ( l i s t 1 , index =[0 ,1]) 5 df2 = pd . DataFrame ( l i s t 2 , index=[ ' two ' , ' t h r e e ' ] ) 6 df = pd . concat ([ df1 , df2 ]) 7 # combine two data frames with common columns , i n c r e a s e
contain keys of different data types
rows .
index
could
1 import pandas as pd 2 df1 = pd . DataFrame ({ ' key ' : [ ' a ' , ' b ' ] , ' c1 ' : [1 , 2] , ' c2 ' : [8 , 2]}) 3 df2 = pd . DataFrame ({ ' key ' : [ ' a ' , ' b ' ] , ' c3 ' : [3 , 5]}) 4 merged df = pd . merge ( df1 , df2 , on= ' key ' ) 5 # combine two data frames with the same data in the column ' key ' . Merged data
frame will contain 4 columns .
8
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- pandas dataframe notes university of idaho
- reading and writing data with pandas
- technical analysis library in python documentation
- python pandas quick guide university of utah
- python programming pandas
- chapter 1 data handling using pandas i pandas
- with pandas f m a vectorized m a f operations cheat sheet
- cheat sheet pandas python datacamp
- pandas data manipulation bentley university
- data analysis with pandas
Related searches
- state of utah division of finance
- python pandas apply
- python pandas string to date
- python pandas read csv
- pandas user guide pdf
- python pandas number of row
- python quick guide pdf
- state of utah medical license
- state of utah education
- state of utah finance department
- pandas quick reference
- university of utah stadium map