U n i t C o n v e r s i o n Tr i c k

[Pages:13]9/7/2020

pandas-unit-convert-Using-Pandas

Unit Conversion Trick

We may also want to do math on pandas data. So to do this I thought we could convert our units.

In [1]:

%matplotlib inline import pandas as pd import numpy as np import matplotlib.pylab as plt from scipy import stats from matplotlib.backends.backend_pdf import PdfPages

In [2]: df_well_data=pd.read_csv('well_data.csv')

I did not talk about units except for Arsenic. But we have the following elements and their units.

'Si':ppb, 'P':ppm, 'S':ppb, 'Ca':ppb, 'Fe':ppm, 'Ba':ppb, 'Na':ppm, 'Mg':ppb, 'K':ppb, 'Mn':ppm, 'As':ppb, 'Sr':ppb, 'F':ppm, 'Cl':ppm, 'SO4':ppm, 'Br':ppm

Remember back to chemistry........ ppm=parts per million = milligrams per liter = mg/l ppb=parts per billions = micrograms per liter = ug/l we can convert between these two. 1000 ppb = 1 ppm or to convert

mg

1000u

ug

1

= 1000

l

m

l

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

1/13

9/7/2020

pandas-unit-convert-Using-Pandas

So lets convert As to ppm. It is now ppb. We could just print it first

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

2/13

9/7/2020

pandas-unit-convert-Using-Pandas

In [3]: print(df_well_data.As/1000)

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

3/13

9/7/2020

pandas-unit-convert-Using-Pandas

0

NaN

1

NaN

2

NaN

3

0.078977

4

NaN

5

NaN

6

0.028071

7

NaN

8

0.096886

9

0.080627

10

NaN

11

0.077007

12

0.039250

13

0.131249

14

0.000177

15

NaN

16

NaN

17

0.147639

18

NaN

19

0.052427

20

NaN

21

NaN

22

NaN

23

NaN

24

0.005365

25

NaN

26

NaN

27

NaN

28

0.053098

29

NaN

...

729 0.165760

730

NaN

731 0.000180

732 0.093650

733 0.101430

734

NaN

735

NaN

736 0.087900

737

NaN

738 0.001170

739 0.085260

740 0.055750

741 0.026960

742

NaN

743

NaN

744

NaN

745

NaN

746

NaN

747

NaN

748 0.000120

749 0.031690

750 0.009150

751 0.026980

752 0.021740

753 0.117820

754 0.000130

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

4/13

9/7/2020

pandas-unit-convert-Using-Pandas

755 0.017390

756 0.112370

757 0.248930

758

NaN

Name: As, Length: 759, dtype: float64

that didn't change it. We could set a new column.

In [4]: df_well_data['As-ppm']=df_well_data.As/1000

In [5]: print(df_well_data.describe())

Well_ID

Lat

Lon

Depth

Si \

count

759.000000 759.000000 759.000000 759.000000 407.000000

mean

6417.088274 23.789249 90.641199 65.554677 40101.151444

std

6695.778189 0.578493 0.578800 42.186161 10117.680290

min

2.000000 22.780000 89.610000 0.000000 12605.576700

25%

4116.000000 23.285000 90.155000 45.000000 33200.310900

50%

5928.000000 23.790000 90.650000 50.000000 40021.490000

75%

8134.500000 24.300000 91.130000 70.000000 45369.825000

max 141499.000000 24.770000 91.650000 523.000000 70304.057950

P

S

Ca

Fe

Ba \

count 407.000000 407.000000

407.000000 407.000000 407.000000

mean

0.809323 3407.292389 41129.291921 5.556200 89.078507

std

0.902860 5364.247733 20161.130827 5.153779 54.172650

min

0.008210 -41.390000 3577.160000 -0.003680 5.630000

25%

0.151957 149.635000 26996.273955 1.706806 53.423976

50%

0.507850 1220.877945 40166.830000 3.931310 79.674488

75%

1.189271 4341.695000 52976.458285 8.531585 113.711543

max

5.477616 45035.460000 116040.620000 30.192230 293.440000

...

count

...

mean

...

std

...

min

...

25%

...

50%

...

75%

...

max

...

Mg

K

Mn

As \

407.000000 343.000000 407.000000 407.000000

20.487685 5068.337278 1.309343 89.688641

11.359487 5566.741424 0.978969 101.530582

2.490000

18.855854 0.000000 0.000000

14.020000 2804.793027 0.545000 14.026849

18.250000 3563.100000 1.183136 54.400000

24.780000 4979.045000 1.850000 129.433314

104.545670 44273.150000 6.271782 700.890000

count mean std min 25% 50% 75% max

Sr 407.000000 186.770328

90.501136 34.470000 119.685000 174.617542 233.535000 681.287906

F 413.000000

0.216924 0.194702 -0.010200 0.113000 0.166700 0.239700 1.595800

Cl 411.000000

42.371061 34.464444

0.226100 18.931200 31.298100 52.962050 217.525000

SO4 397.000000

9.300930 14.287154

0.120000 0.407100 3.344400 11.973300 125.317000

Br 405.000000

0.056895 0.087262 0.005800 0.019300 0.032700 0.053800 0.994800

As-ppm 407.000000

0.089689 0.101531 0.000000 0.014027 0.054400 0.129433 0.700890

[8 rows x 21 columns]

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

5/13

9/7/2020

pandas-unit-convert-Using-Pandas

or we could have just looked at column titles

In [6]: df_well_data.columns

Out[6]: Index(['Well_ID', 'Lat', 'Lon', 'Depth', 'Drink', 'Si', 'P', 'S', 'Ca', 'Fe', 'Ba', 'Na', 'Mg', 'K', 'Mn', 'As', 'Sr', 'F', 'Cl', 'SO4', 'Br', 'As-ppm'],

dtype='object')

We could delete our new column now In [7]: del df_well_data['As-ppm']

We could also just convert the As column itself.

In [8]: df_well_data.As/=1000

In [9]: print (df_well_data.As.describe())

count 407.000000

mean

0.089689

std

0.101531

min

0.000000

25%

0.014027

50%

0.054400

75%

0.129433

max

0.700890

Name: As, dtype: float64

It worked wonderfully.

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

6/13

9/7/2020

pandas-unit-convert-Using-Pandas

We sometimes want molar concentrations. I can't cross out the math on the screen but we can use the elemental weights to convert from parts per billion to micromolar from parts per million to millimolar. for example

mg

1M ole

mM

=

= mM

l

gramsinonemole

l

The average arsenic is 0.0897 mg/l

mg

M

0.0897

= 0.001197mM

l

74.921

Now lets try on our data

In [10]: df_well_data.As/=74.921 print(df_well_data.As.describe())

count 407.000000

mean

0.001197

std

0.001355

min

0.000000

25%

0.000187

50%

0.000726

75%

0.001728

max

0.009355

Name: As, dtype: float64

It worked beautifully. But that number is too small. So I will convert to micromolar by multibplying by 1000.

In [11]: df_well_data.As*=1000 print (df_well_data.As.describe())

count 407.000000

mean

1.197109

std

1.355169

min

0.000000

25%

0.187222

50%

0.726098

75%

1.727597

max

9.355054

Name: As, dtype: float64

A big be careful!!!! If you keep rerunning a cell it will do the math over and over again!!!!

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

7/13

9/7/2020

In [ ]:

pandas-unit-convert-Using-Pandas

But we have up to 16 elements we may want to convert. Could we convert them all ot once.

I used to do this using a dict or dictionary. And it worked well. I learned this at codeacademy. But I was thinking some this year. I thought we could do it better using pandas. How do we do it? What we are really doing is tracking our "metadata".

Make an excel file we can continuously update In it list the Element, Its Formula Weight, and what I call Unit for fancy printing. I called it UnitConversion.xlsx () we can read it in with pd.read_excel and we can set the index to the Element Setting the index to the Element makes it easy to search! For the units I used a lot of different formatting to make it fancy so you can see how it works. It uses Latex

In [12]: #I am just re-reading in the data to start fresh. df_well_data=pd.read_csv('well_data.csv')

In [13]: df_units=pd.read_excel('UnitConversion.xlsx',index_col='Element')

localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false

8/13

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download