Part 2 : Data Frame Continued

5. Traversing DataFrame Elements using iterrows(), iteritems() and itertuples()

To iterate over the elements of a DataFrame, we can use the following functions:

    iteritems()    iterates over the (key, value) pairs, that is, (column name, Series) pairs
    iterrows()     iterates over the rows as (index, Series) pairs
    itertuples()   iterates over the rows as named tuples

Note: iteritems() was removed in pandas 2.0. On newer versions use items(), which works the same way.

Let's take an example:

    >>> import pandas as pd
    >>> import numpy as np
    >>> d={'Name':['Shalini','Varsha','Shanti','Madhu'],'Age':[23,56,54,34]}
    >>> df=pd.DataFrame(d)
    >>> print(df)
          Name  Age
    0  Shalini   23
    1   Varsha   56
    2   Shanti   54
    3    Madhu   34

5.1 Use of iterrows()

    >>> print('ITER ROWS')
    ITER ROWS
    >>> for key,values in df.iterrows():
    ...     for val in values:
    ...         print('Hello',val)
    ...
    Hello Shalini
    Hello 23
    Hello Varsha
    Hello 56
    Hello Shanti
    Hello 54
    Hello Madhu
    Hello 34

5.2 Use of iteritems()

    >>> print('ITER ITEMS')
    ITER ITEMS
    >>> for key,values in df.iteritems():
    ...     for val in values:
    ...         print('Hello',val)
    ...
    Hello Shalini
    Hello Varsha
    Hello Shanti
    Hello Madhu
    Hello 23
    Hello 56
    Hello 54
    Hello 34

5.3 Use of itertuples()

    >>> print('ITER TUPLES')
    ITER TUPLES
    >>> for rows in df.itertuples():
    ...     print(rows)
    ...
    Pandas(Index=0, Name='Shalini', Age=23)
    Pandas(Index=1, Name='Varsha', Age=56)
    Pandas(Index=2, Name='Shanti', Age=54)
    Pandas(Index=3, Name='Madhu', Age=34)

6. Binary Operations in a DataFrame (add, sub, mul, div, radd, rsub)

Let's take DataFrames with numeric data. Three data frames are created, namely dfr1, dfr2 and dfr3:

    >>> s1=[[1,2,3],[4,5,6]]
    >>> s2=[[3,2,5],[5,7,8]]
    >>> s3=[[5,5,5],[4,4,4]]
    >>> dfr1=pd.DataFrame(s1)
    >>> dfr2=pd.DataFrame(s2)
    >>> dfr3=pd.DataFrame(s3)
    >>> dfr1
       0  1  2
    0  1  2  3
    1  4  5  6
    >>> dfr2
       0  1  2
    0  3  2  5
    1  5  7  8
    >>> dfr3
       0  1  2
    0  5  5  5
    1  4  4  4

6.1 ADDITION

An individual value or a DataFrame can be added to another DataFrame.

Here 2 is added to each element of dfr1:

    >>> dfr1+2
       0  1  2
    0  3  4  5
    1  6  7  8

Corresponding elements of dfr1 and dfr2 are added:

    >>> dfr1+dfr2
       0   1   2
    0  4   4   8
    1  9  12  14

add() also adds the corresponding elements of dfr1 and dfr2 (dfr1+dfr2):

    >>> dfr1.add(dfr2)
       0   1   2
    0  4   4   8
    1  9  12  14

Here 'r' stands for reverse: radd() adds the corresponding elements of dfr2 and dfr1 in the reverse order (dfr2+dfr1):

    >>> dfr1.radd(dfr2)
       0   1   2
    0  4   4   8
    1  9  12  14

Corresponding elements of dfr1, dfr2 and dfr3 are added:

    >>> dfr3+dfr1+dfr2
        0   1   2
    0   9   9  13
    1  13  16  18

6.2 SUBTRACTION

Corresponding elements of dfr2 are subtracted from dfr1:

    >>> dfr1-dfr2
        0   1   2
    0  -2   0  -2
    1  -1  -2  -2

sub() also subtracts the corresponding elements of dfr2 from dfr1 (dfr1 - dfr2):

    >>> dfr1.sub(dfr2)
        0   1   2
    0  -2   0  -2
    1  -1  -2  -2

Here 'r' stands for reverse: rsub() subtracts the corresponding elements of dfr1 from dfr2 (dfr2 - dfr1):

    >>> dfr1.rsub(dfr2)
       0  1  2
    0  2  0  2
    1  1  2  2

Here 2 is subtracted from each element of dfr1:

    >>> dfr1-2
        0  1  2
    0  -1  0  1
    1   2  3  4

In the same way, multiplication can be done with the * operator and the mul() function, and division can be done with the / operator and the div() function.

7. Matching and Broadcasting Operations

7.1 Matching : Whenever we perform arithmetic operations on DataFrames, the data is first aligned on the basis of matching row and column indexes and then the arithmetic is performed; for non-overlapping indexes the result is NaN. This default behaviour of aligning data on the basis of matching indexes is known as MATCHING.

    import pandas as pd
    s1=[[21,52,43],[41,55,66]]
    s2=[[34,4],[4,6]]
    dfr1=pd.DataFrame(s1)
    dfr2=pd.DataFrame(s2)
    print('Data Frame 1')
    print(dfr1)
    print('Data Frame 2')
    print(dfr2)
    print('Matching is done')
    print(dfr1+dfr2)

Output

    Data Frame 1
        0   1   2
    0  21  52  43
    1  41  55  66
    Data Frame 2
        0  1
    0  34  4
    1   4  6
    Matching is done
        0   1    2
    0  55  56  NaN
    1  45  61  NaN

7.2 Broadcasting : Enlarging the smaller object in a binary operation by replicating its elements so as to match the shape of the larger object. Adding the following lines to the program above adds the list s3 to every row of dfr1:

    print('Broadcasting is done')
    s3=[3,4,5]
    print(dfr1+s3)

Output

    Broadcasting is done
        0   1   2
    0  24  56  48
    1  44  59  71

8. Handling Missing Data

As data comes in many shapes and forms, pandas aims to be flexible with regard to handling missing data. While NaN is the default missing value marker for reasons of computational speed and convenience, we need to be able to easily detect this value with data of different types: floating point, integer, boolean, and general object. In many cases, however, the Python None will arise and we wish to also consider that "missing" or "not available" or "NA".
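As a small illustration of the paragraph above, the following sketch shows pandas reporting both NaN and Python None as missing, once for floating point data and once for general object data. The variable names (s_float, s_obj and so on) are made up for this example and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# Floating point data: None is stored as NaN when the Series is built
s_float = pd.Series([1.5, np.nan, None])

# General object data: a string, None and NaN side by side
s_obj = pd.Series(['a', None, np.nan], dtype=object)

# isnull() flags both missing-value markers in either kind of data
float_missing = s_float.isnull().tolist()
obj_missing = s_obj.isnull().tolist()

print(float_missing)   # [False, True, True]
print(obj_missing)     # [False, True, True]
```

Whichever marker the data happens to contain, the same detection call can be used.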
    Function Name                  Use
    isnull()                       Returns True or False for each value in a pandas object, depending on whether it is a missing value
    notnull()                      Returns True or False for each value in a pandas object, depending on whether it is a data value
    dropna()                       Removes (drops) all the rows that contain a NaN value anywhere in the row
    dropna(how='all')              Removes only those rows in which all the values are NaN
    fillna(<dictionary values>)    Fills the missing values with the values specified

    >>> import pandas as pd
    >>> KV_shift1={'Computer':[20,25,22,50],'Projectors':[1,1,1,14],'iPad':[1,1,1,7],'AppleTv':[1,1,1,7]}
    >>> dfr1=pd.DataFrame(KV_shift1,index=['SrCompLab','SecCompLab','PriLab','Others'])
    >>> print(dfr1)
                Computer  Projectors  iPad  AppleTv
    SrCompLab         20           1     1        1
    SecCompLab        25           1     1        1
    PriLab            22           1     1        1
    Others            50          14     7        7
    >>> KV_shift2={'Computer':[20,25,22,50],'Visualizers':[1,1,1,14],'iPad':[1,1,1,7],'AppleTv':[1,1,1,7]}
    >>> dfr2=pd.DataFrame(KV_shift2,index=['SrCompLab','SecCompLab','PriLab','Others'])
    >>> print(dfr2)
                Computer  Visualizers  iPad  AppleTv
    SrCompLab         20            1     1        1
    SecCompLab        25            1     1        1
    PriLab            22            1     1        1
    Others            50           14     7        7
    >>> KV3Gwl=dfr1+dfr2
    >>> print(KV3Gwl)
                AppleTv  Computer  Projectors  Visualizers  iPad
    SrCompLab         2        40         NaN          NaN     2
    SecCompLab        2        50         NaN          NaN     2
    PriLab            2        44         NaN          NaN     2
    Others           14       100         NaN          NaN    14

8.1 Use of isnull() and notnull()

isnull() gives True if the corresponding element contains NaN. notnull() gives True if the corresponding element contains data.

    print('ISNULL ()')
    print(KV3Gwl.isnull())
    print('NOTNULL ()')
    print(KV3Gwl.notnull())

Output

    ISNULL ()
                AppleTv  Computer  Projectors  Visualizers   iPad
    SrCompLab     False     False        True         True  False
    SecCompLab    False     False        True         True  False
    PriLab        False     False        True         True  False
    Others        False     False        True         True  False
    NOTNULL ()
                AppleTv  Computer  Projectors  Visualizers  iPad
    SrCompLab      True      True       False        False  True
    SecCompLab     True      True       False        False  True
    PriLab         True      True       False        False  True
    Others         True      True       False        False  True

8.2 Use of dropna()

For this example, consider KV3Gwl holding only a few missing values:

    >>> KV3Gwl
                AppleTv  Computer  Projectors  Visualizers  iPad
    SrCompLab         2        40         6.0          2.0     2
    SecCompLab        2        50         NaN          5.0     2
    PriLab            2        44         7.0          NaN     2
    Others           14       100         5.0          2.0    14
    >>> KV3Gwl.dropna()
                AppleTv  Computer  Projectors  Visualizers  iPad
    SrCompLab         2        40         6.0          2.0     2
    Others           14       100         5.0          2.0    14

8.3 Use of fillna()

    >>> KV3Gwl.fillna({'Projectors':0,'Visualizers':0})
                AppleTv  Computer  Projectors  Visualizers  iPad
    SrCompLab         2        40         6.0          2.0     2
    SecCompLab        2        50         0.0          5.0     2
    PriLab            2        44         7.0          0.0     2
    Others           14       100         5.0          2.0    14

9. Comparison among Pandas Objects (Series, DataFrame)

We can compare pandas objects using the == operator or the equals() function. The difference between the two is that == compares each element of the first DataFrame with the corresponding element of the second DataFrame and returns a DataFrame of True/False values, in which NaN == NaN is False, whereas equals() returns a single True or False and treats NaN values in the same position as equal. Let's make this clear with the following example:

    import pandas as pd
    import numpy as np
    KV_Shift1={'Computer':[20,25,22,50],'Projectors':[1,1,np.nan,14],'iPad':[np.nan,1,2,7],'AppleTv':[1,1,1,7]}
    dfr1=pd.DataFrame(KV_Shift1,index=['SrCompLab','SecCompLab','PriLab','Others'])
    KV_Shift2={'Computer':[20,25,22,50],'Projectors':[1,1,np.nan,14],'iPad':[np.nan,1,2,7],'AppleTv':[1,1,1,7]}
    dfr2=pd.DataFrame(KV_Shift2,index=['SrCompLab','SecCompLab','PriLab','Others'])
    print('Data Frame 1 :')
    print(dfr1)
    print('Data Frame 2 :')
    print(dfr2)
    print('Checking Equality using == operator')
    print(dfr1==dfr2)
    print('Checking Equality using equals() function')
    print(dfr1.equals(dfr2))

Output

    Data Frame 1 :
                Computer  Projectors  iPad  AppleTv
    SrCompLab         20         1.0   NaN        1
    SecCompLab        25         1.0   1.0        1
    PriLab            22         NaN   2.0        1
    Others            50        14.0   7.0        7
    Data Frame 2 :
                Computer  Projectors  iPad  AppleTv
    SrCompLab         20         1.0   NaN        1
    SecCompLab        25         1.0   1.0        1
    PriLab            22         NaN   2.0        1
    Others            50        14.0   7.0        7
    Checking Equality using == operator
                Computer  Projectors   iPad  AppleTv
    SrCompLab       True        True  False     True
    SecCompLab      True        True   True     True
    PriLab          True       False   True     True
    Others          True        True   True     True
    Checking Equality using equals() function
    True

Look at the result of ==: wherever both DataFrames hold NaN, the values are reported as not equal. equals() treated both NaN values as equal.

10. Boolean Reduction

With Boolean reduction, you can get an overall result for a row or a column as a single True or False. For this purpose pandas offers the following Boolean reduction functions and attributes:

10.1 empty : Tells whether the DataFrame is empty.
10.2 any() : Returns True if any of the elements is True over the requested axis.
10.3 all() : Returns True if all the values on an axis satisfy the condition.

    import pandas as pd
    import numpy as np
    df1=pd.DataFrame()
    s1=[[2,5,8],[10,5,2]]
    df2=pd.DataFrame(s1)
    if df1.empty:
        print('Data Frame1 is Empty')
    if df2.empty:
        print('Data Frame2 is Empty')
    else:
        print('Data Frame2 is not Empty')
    print('Data Frame')
    print(df2)
    print('Used function all()')
    print((df2<5).all())
    print('Used function any()')
    print((df2<5).any())

Output

    Data Frame1 is Empty
    Data Frame2 is not Empty
    Data Frame
        0  1  2
    0   2  5  8
    1  10  5  2
    Used function all()
    0    False
    1    False
    2    False
    dtype: bool
    Used function any()
    0     True
    1    False
    2     True
    dtype: bool
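The any() and all() results above are computed column by column, which is the default axis. As a rough sketch of the "requested axis" idea, the same reductions can be made row by row with axis=1, or chained to collapse the whole DataFrame to one value. The names mask, row_any, row_all, overall_any and overall_all are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame([[2, 5, 8], [10, 5, 2]])
mask = df2 < 5   # element-wise True/False DataFrame

# axis=1 reduces across the columns, giving one result per row
row_any = mask.any(axis=1).tolist()
row_all = mask.all(axis=1).tolist()

# A second reduction collapses the per-column results to a single value
overall_any = bool(mask.any().any())
overall_all = bool(mask.all().all())

print(row_any)       # [True, True]
print(row_all)       # [False, False]
print(overall_any)   # True
print(overall_all)   # False
```

Each row contains at least one value below 5, so any(axis=1) is True for both rows, while no row has every value below 5.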