Create DataFrame

What is a DataFrame? A DataFrame is a two-dimensional (2D) data structure: data is arranged in tabular form, i.e. in rows and columns.

In other words, a pandas DataFrame is similar to an Excel sheet. Let's understand it through an example.

     Name     Age  Department        Charges  Gender
0    Arprit   62   Surgery           300      M
1    Zarina   22   ENT               250      F
2    Kareem   32   Orthopaedic       200      M
3    Arun     12   Surgery           300      M
4    Zubin    30   ENT               250      M
5    Kettaki  16   ENT               250      F
6    Ankita   29   Cardiology        800      F
7    Zareen   45                     300      F
8    Kush     19   Cardiology        800      M
9    Shilpa   23   Nuclear Medicine  400      F

The row labels (0 to 9) are known as indexes, the headings (Name, Age, Department, Charges, Gender) are known as columns, and the cell contents are the data values.

1. Create DataFrame
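The three parts labelled in the table above (indexes, columns and data values) can be inspected directly on a DataFrame. The following is a minimal sketch that uses only the first three rows of the table; the variable names are chosen here for illustration.

```python
import pandas as pd

# First three rows of the table above; pandas assigns the index 0, 1, 2.
patients = pd.DataFrame(
    {
        "Name": ["Arprit", "Zarina", "Kareem"],
        "Age": [62, 22, 32],
        "Department": ["Surgery", "ENT", "Orthopaedic"],
        "Charges": [300, 250, 200],
        "Gender": ["M", "F", "M"],
    }
)

row_labels = list(patients.index)        # the indexes (row labels)
column_labels = list(patients.columns)   # the columns (column labels)
shape = patients.shape                   # (number of rows, number of columns)

print(row_labels)
print(column_labels)
print(shape)
# A single data value is addressed by row label and column label.
print(patients.loc[1, "Department"])
```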

A pandas DataFrame can be created using the following constructor -

pandas.DataFrame(data, index, columns, dtype, copy)

The parameters of the constructor are as follows -

Sr.No   Parameter & Description

1       data: takes various forms such as ndarray, Series, map, list, dict, constants, and also another DataFrame.

2       index: row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.

3       columns: column labels for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.

4       dtype: data type of each column.

5       copy: controls whether the input data is copied. The default given here is False.
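As a rough illustration of how the five parameters fit together, the sketch below passes each of them by keyword. The array contents and the labels r1, r2, r3, a, b are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

frame = pd.DataFrame(
    data=values,                 # the input data (here a 2D ndarray)
    index=["r1", "r2", "r3"],    # row labels
    columns=["a", "b"],          # column labels
    dtype="float64",             # force every column to float64
    copy=True,                   # work on a copy of the input array
)

# Because the frame holds its own copy, changing it leaves `values` untouched.
frame.loc["r1", "a"] = 99.0

# With no index or columns given, both default to 0, 1, 2, ...
default_frame = pd.DataFrame(values)

print(frame)
print(values)
print(list(default_frame.index), list(default_frame.columns))
```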

pythonclassroomdiary. by Sangeeta M Chauhan , PGT CS KV NO.3 Gwalior

A pandas DataFrame can be created using various inputs like -

Lists

Dictionaries

Series

NumPy ndarrays

Another DataFrame

1.1 Create an Empty DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
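An empty frame can also be checked programmatically rather than by reading the printed output. A small sketch follows; the column names Name and Age in the second frame are only an illustration.

```python
import pandas as pd

empty = pd.DataFrame()
print(empty.empty)    # True when the frame holds no data
print(empty.shape)    # (rows, columns)

# A frame can have column labels and still contain no rows.
named = pd.DataFrame(columns=["Name", "Age"])
print(named.empty)
print(named.shape)
```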

1.2 Create a DataFrame from Lists

Example 1

>>> MyList = [10, 20, 30, 40]
>>> MyFrame = pd.DataFrame(MyList)
>>> MyFrame
    0
0  10
1  20
2  30
3  40

Example 2: (Nested List)

>>> Friends = [['Shraddha','Doctor'],['Shanti','Teacher'],['Monica','Engineer']]

>>> MyFrame = pd.DataFrame(Friends, columns=['Name', 'Occupation'])
>>> MyFrame
       Name Occupation
0  Shraddha     Doctor
1    Shanti    Teacher
2    Monica   Engineer
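The two list examples differ in shape: a flat list becomes a single column, while each inner list of a nested list becomes one row. The sketch below makes that explicit; the column name Marks is an assumption added for illustration.

```python
import pandas as pd

# A flat list gives one column; `columns` names it instead of the default 0.
flat = pd.DataFrame([10, 20, 30, 40], columns=["Marks"])

# In a nested list, each inner list is one row of the frame.
nested = pd.DataFrame(
    [["Shraddha", "Doctor"], ["Shanti", "Teacher"], ["Monica", "Engineer"]],
    columns=["Name", "Occupation"],
)

print(flat.shape)
print(nested.shape)
print(nested.loc[1, "Occupation"])
```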

1.3 Creation of a DataFrame from Dictionary of ndarrays / Lists

All the ndarrays (or lists) must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.
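The length rules above can be checked with a short experiment. This is a sketch with made-up index labels; it only relies on pandas raising ValueError when the lengths disagree, and does not depend on the exact error message.

```python
import pandas as pd

data = {"Name": ["Shraddha", "Shanti", "Monica"], "Age": [28, 34, 29]}

# No index passed: the row labels default to 0, 1, 2.
df_default = pd.DataFrame(data)

# Index passed with the same length as the lists.
df_labelled = pd.DataFrame(data, index=["a", "b", "c"])

# Lists of unequal length are rejected.
try:
    pd.DataFrame({"Name": ["Shraddha", "Shanti"], "Age": [28]})
    unequal_error = None
except ValueError as exc:
    unequal_error = str(exc)

# An index whose length differs from the lists is rejected too.
try:
    pd.DataFrame(data, index=["a", "b"])
    index_error = None
except ValueError as exc:
    index_error = str(exc)

print(list(df_default.index))
print(list(df_labelled.index))
print(unequal_error)
print(index_error)
```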


Example 1 (without index)

>>> data = {'Name':['Shraddha', 'Shanti', 'Monica', 'Yogita'],'Age':[28,34,29,39]}

>>> df = pd.DataFrame(data)
>>> df
       Name  Age
0  Shraddha   28
1    Shanti   34
2    Monica   29
3    Yogita   39

Example 2 (with index)

>>> data = {'Name':['Shraddha', 'Shanti', 'Monica', 'Yogita'],'Age':[28,34,29,39]}
>>> df = pd.DataFrame(data, index=['Friend1','Friend2','Relative1','Relative2'])
>>> df
               Name  Age
Friend1    Shraddha   28
Friend2      Shanti   34
Relative1    Monica   29
Relative2    Yogita   39

1.4 Create a DataFrame from List of Dictionaries

Here we pass a list of dictionaries to create a DataFrame. The dictionary keys are taken as column names by default.

Example 1:

>>> Mydict= [{'Won': 15, 'Loose': 2},{'Won': 5, 'Loose': 10}, {'Won': 8, 'Loose': 9},{'Won':4}]

>>> df = pd.DataFrame(Mydict)
>>> df
   Loose  Won
0    2.0   15
1   10.0    5
2    9.0    8
3    NaN    4

Notice that the missing value is stored as NaN (Not a Number).
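The effect of a missing key can be examined with isna() and, if desired, replaced with fillna(). The sketch below reuses the keys from the example with a shorter list; note that the order in which pandas prints the columns may differ between versions, so the code avoids relying on it.

```python
import pandas as pd

results = [{"Won": 15, "Loose": 2}, {"Won": 5, "Loose": 10}, {"Won": 4}]
df = pd.DataFrame(results)

# True wherever a dictionary had no 'Loose' key.
missing_mask = df["Loose"].isna()

# NaN is a floating-point value, so the whole 'Loose' column becomes float.
loose_dtype = df["Loose"].dtype

# Replace the missing entries with 0 in a new frame.
filled = df.fillna(0)

print(missing_mask.tolist())
print(loose_dtype)
print(filled["Loose"].tolist())
```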

Example 2:

>>> Mydict = [{'Won': 15, 'Loose': 2},{'Won': 5, 'Loose': 10},{'Won': 8, 'Loose': 9}]
>>> df = pd.DataFrame(Mydict, index=['India', 'Pakistan', 'Australia'])
>>> df
           Loose  Won
India          2   15
Pakistan      10    5
Australia      9    8


Example 3

We can also create a DataFrame by specifying a list of dictionaries, row indices, and column indices.

>>> L_dict = [{'Maths': 78, 'Chemistry': 78,'Physics':87},{'Maths': 67, 'Chemistryb': 70},{'Physics':77,'Maths':87}]

A

>>> df1 = pd.DataFrame(L_dict, index=['Student1', 'Student2','Student3'], columns=['Physics', 'Chemistry','Maths'])
>>> df1
          Physics  Chemistry  Maths
Student1     87.0       78.0     78
Student2      NaN        NaN     67
Student3     77.0        NaN     87

B

>>> df2 = pd.DataFrame(L_dict, index=['Student1', 'Student2','Student3'], columns=['Chemistry','Maths'])
>>> df2
          Chemistry  Maths
Student1       78.0     78
Student2        NaN     67
Student3        NaN     87

C

>>> df3 = pd.DataFrame(L_dict, index=['Student1', 'Student2','Student3'], columns=['English','Chemistry','Maths'])
>>> df3
          English  Chemistry  Maths
Student1      NaN       78.0     78
Student2      NaN        NaN     67
Student3      NaN        NaN     87

Observe the lines marked A, B and C above. Their output depends on the columns specified while creating the DataFrame. Where a specified column matches a dictionary key, the corresponding data is shown. Where a specified column does not match any key, NaN is displayed.
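The matching rule described above can be verified in a few lines. This sketch uses a shortened, two-student version of the marks data; the variable names are chosen here for illustration.

```python
import pandas as pd

marks = [
    {"Maths": 78, "Chemistry": 78, "Physics": 87},
    {"Maths": 67, "Chemistry": 70},
]

# Only the listed columns are kept: 'Maths' matches a key, 'English' does not.
df = pd.DataFrame(marks, index=["Student1", "Student2"], columns=["English", "Maths"])

print(list(df.columns))
print(df["English"].isna().all())   # every 'English' entry is NaN
print(df["Maths"].tolist())
print("Physics" in df.columns)      # keys not listed in `columns` are dropped
```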

2. Addition of New Column & Row

2.1 Column Addition

>>> L_dict = [{'Maths': 78, 'Chemistry': 78,'Physics':87},{'Maths': 67, 'Chemistry': 70},{'Physics':77,'Maths':87,'Chemistry':90}]

>>> df3 = pd.DataFrame(L_dict, index=['Student1', 'Student2','Student3'], columns=['English','Chemistry','Maths'])

>>> df3['Physics']=[45,56,65]


>>> df3
          English  Chemistry  Maths  Physics
Student1      NaN         78     78       45
Student2      NaN         70     67       56
Student3      NaN         90     87       65

A new column 'Physics' has been added with new data.

We can also update the data in an existing column using the same method.

>>> df3['English']=[78,98,89]

>>> df3

          English  Chemistry  Maths  Physics
Student1       78         78     78       45
Student2       98         70     67       56
Student3       89         90     87       65
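Assigning a list to df[name] adds the column when the name is new and overwrites it when the name already exists. The list must have one value per row. A minimal sketch with a smaller frame built here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"Chemistry": [78, 70, 90], "Maths": [78, 67, 87]},
    index=["Student1", "Student2", "Student3"],
)

df["Physics"] = [45, 56, 65]   # new name: a column is added
df["Maths"] = [80, 70, 90]     # existing name: the column is overwritten

# A list that is shorter than the number of rows is rejected.
try:
    df["English"] = [78, 98]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(list(df.columns))
print(df["Maths"].tolist())
print(length_error)
```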

We can also add a new column computed from data already stored in the frame.

>>> df3['Total']=df3.English+df3.Chemistry+df3.Maths+df3.Physics

>>> df3

          English  Chemistry  Maths  Physics  Total
Student1       78         78     78       45    279
Student2       98         70     67       56    291
Student3       89         90     87       65    331

Notice that a new column, Total, has been added, holding the total of the marks in the other subjects.

2.2 Row Addition

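i. To add a row by specifying its row label

The list must supply one value per column. The output below is for a df3 that holds only the English, Chemistry and Maths columns.

>>> df3.loc['Student4'] = [45, 67, 45]
>>> df3
          English  Chemistry  Maths
Student1       78         78     78
Student2       98         70     67
Student3       89         90     87
Student4       45         67     45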
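With loc, the same assignment either appends or overwrites depending on whether the label already exists. The sketch below rebuilds a three-column frame so that it runs on its own; the replacement values for Student1 are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"English": [78, 98, 89], "Chemistry": [78, 70, 90], "Maths": [78, 67, 87]},
    index=["Student1", "Student2", "Student3"],
)

df.loc["Student4"] = [45, 67, 45]   # label not present: a new row is appended
df.loc["Student1"] = [80, 80, 80]   # label present: the row is overwritten
shape_after_add = df.shape

# A new row needs one value per column; two values for three columns fails.
try:
    df.loc["Student5"] = [1, 2]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(shape_after_add)
print(df.loc["Student4"].tolist())
print(mismatch_error)
```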

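ii. To modify a row by specifying its integer position

iloc can only overwrite a row that already exists; it cannot add a new one. Here position 3 is the Student4 row added above.

>>> df3.iloc[3] = [45, 67, 45]
>>> df3
          English  Chemistry  Maths
Student1       78         78     78
Student2       98         70     67
Student3       89         90     87
Student4       45         67     45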

>>> df3.iloc[3] = [65, 77, 90]
>>> df3
          English  Chemistry  Maths
Student1       78         78     78
Student2       98         70     67
Student3       89         90     87
Student4       65         77     90
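The difference between positional and label-based assignment shows up when the position does not exist yet. A short sketch, again on a frame rebuilt here so the example is self-contained:

```python
import pandas as pd

df = pd.DataFrame(
    {"English": [78, 98, 89], "Chemistry": [78, 70, 90], "Maths": [78, 67, 87]},
    index=["Student1", "Student2", "Student3"],
)

# Position 2 exists (it is the Student3 row), so the row is overwritten.
df.iloc[2] = [65, 77, 90]

# Position 3 does not exist in a three-row frame; iloc refuses to add it.
try:
    df.iloc[3] = [45, 67, 45]
    enlarge_error = None
except IndexError as exc:
    enlarge_error = str(exc)

print(df.loc["Student3"].tolist())
print(df.shape)
print(enlarge_error)
```

To append a row, use loc with a new label as in case i.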

