U n i t C o n v e r s i o n Tr i c k
[Pages:13]9/7/2020
pandas-unit-convert-Using-Pandas
Unit Conversion Trick
We may also want to do math on pandas data. So to do this I thought we could convert our units.
In [1]:
%matplotlib inline import pandas as pd import numpy as np import matplotlib.pylab as plt from scipy import stats from matplotlib.backends.backend_pdf import PdfPages
In [2]: df_well_data=pd.read_csv('well_data.csv')
I did not talk about units except for Arsenic. But we have the following elements and their units.
'Si':ppb, 'P':ppm, 'S':ppb, 'Ca':ppb, 'Fe':ppm, 'Ba':ppb, 'Na':ppm, 'Mg':ppb, 'K':ppb, 'Mn':ppm, 'As':ppb, 'Sr':ppb, 'F':ppm, 'Cl':ppm, 'SO4':ppm, 'Br':ppm
Remember back to chemistry........ ppm=parts per million = milligrams per liter = mg/l ppb=parts per billions = micrograms per liter = ug/l we can convert between these two. 1000 ppb = 1 ppm or to convert
mg
1000u
ug
1
= 1000
l
m
l
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
1/13
9/7/2020
pandas-unit-convert-Using-Pandas
So lets convert As to ppm. It is now ppb. We could just print it first
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
2/13
9/7/2020
pandas-unit-convert-Using-Pandas
In [3]: print(df_well_data.As/1000)
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
3/13
9/7/2020
pandas-unit-convert-Using-Pandas
0
NaN
1
NaN
2
NaN
3
0.078977
4
NaN
5
NaN
6
0.028071
7
NaN
8
0.096886
9
0.080627
10
NaN
11
0.077007
12
0.039250
13
0.131249
14
0.000177
15
NaN
16
NaN
17
0.147639
18
NaN
19
0.052427
20
NaN
21
NaN
22
NaN
23
NaN
24
0.005365
25
NaN
26
NaN
27
NaN
28
0.053098
29
NaN
...
729 0.165760
730
NaN
731 0.000180
732 0.093650
733 0.101430
734
NaN
735
NaN
736 0.087900
737
NaN
738 0.001170
739 0.085260
740 0.055750
741 0.026960
742
NaN
743
NaN
744
NaN
745
NaN
746
NaN
747
NaN
748 0.000120
749 0.031690
750 0.009150
751 0.026980
752 0.021740
753 0.117820
754 0.000130
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
4/13
9/7/2020
pandas-unit-convert-Using-Pandas
755 0.017390
756 0.112370
757 0.248930
758
NaN
Name: As, Length: 759, dtype: float64
that didn't change it. We could set a new column.
In [4]: df_well_data['As-ppm']=df_well_data.As/1000
In [5]: print(df_well_data.describe())
Well_ID
Lat
Lon
Depth
Si \
count
759.000000 759.000000 759.000000 759.000000 407.000000
mean
6417.088274 23.789249 90.641199 65.554677 40101.151444
std
6695.778189 0.578493 0.578800 42.186161 10117.680290
min
2.000000 22.780000 89.610000 0.000000 12605.576700
25%
4116.000000 23.285000 90.155000 45.000000 33200.310900
50%
5928.000000 23.790000 90.650000 50.000000 40021.490000
75%
8134.500000 24.300000 91.130000 70.000000 45369.825000
max 141499.000000 24.770000 91.650000 523.000000 70304.057950
P
S
Ca
Fe
Ba \
count 407.000000 407.000000
407.000000 407.000000 407.000000
mean
0.809323 3407.292389 41129.291921 5.556200 89.078507
std
0.902860 5364.247733 20161.130827 5.153779 54.172650
min
0.008210 -41.390000 3577.160000 -0.003680 5.630000
25%
0.151957 149.635000 26996.273955 1.706806 53.423976
50%
0.507850 1220.877945 40166.830000 3.931310 79.674488
75%
1.189271 4341.695000 52976.458285 8.531585 113.711543
max
5.477616 45035.460000 116040.620000 30.192230 293.440000
...
count
...
mean
...
std
...
min
...
25%
...
50%
...
75%
...
max
...
Mg
K
Mn
As \
407.000000 343.000000 407.000000 407.000000
20.487685 5068.337278 1.309343 89.688641
11.359487 5566.741424 0.978969 101.530582
2.490000
18.855854 0.000000 0.000000
14.020000 2804.793027 0.545000 14.026849
18.250000 3563.100000 1.183136 54.400000
24.780000 4979.045000 1.850000 129.433314
104.545670 44273.150000 6.271782 700.890000
count mean std min 25% 50% 75% max
Sr 407.000000 186.770328
90.501136 34.470000 119.685000 174.617542 233.535000 681.287906
F 413.000000
0.216924 0.194702 -0.010200 0.113000 0.166700 0.239700 1.595800
Cl 411.000000
42.371061 34.464444
0.226100 18.931200 31.298100 52.962050 217.525000
SO4 397.000000
9.300930 14.287154
0.120000 0.407100 3.344400 11.973300 125.317000
Br 405.000000
0.056895 0.087262 0.005800 0.019300 0.032700 0.053800 0.994800
As-ppm 407.000000
0.089689 0.101531 0.000000 0.014027 0.054400 0.129433 0.700890
[8 rows x 21 columns]
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
5/13
9/7/2020
pandas-unit-convert-Using-Pandas
or we could have just looked at column titles
In [6]: df_well_data.columns
Out[6]: Index(['Well_ID', 'Lat', 'Lon', 'Depth', 'Drink', 'Si', 'P', 'S', 'Ca', 'Fe', 'Ba', 'Na', 'Mg', 'K', 'Mn', 'As', 'Sr', 'F', 'Cl', 'SO4', 'Br', 'As-ppm'],
dtype='object')
We could delete our new column now In [7]: del df_well_data['As-ppm']
We could also just convert the As column itself.
In [8]: df_well_data.As/=1000
In [9]: print (df_well_data.As.describe())
count 407.000000
mean
0.089689
std
0.101531
min
0.000000
25%
0.014027
50%
0.054400
75%
0.129433
max
0.700890
Name: As, dtype: float64
It worked wonderfully.
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
6/13
9/7/2020
pandas-unit-convert-Using-Pandas
We sometimes want molar concentrations. I can't cross out the math on the screen but we can use the elemental weights to convert from parts per billion to micromolar from parts per million to millimolar. for example
mg
1M ole
mM
=
= mM
l
gramsinonemole
l
The average arsenic is 0.0897 mg/l
mg
M
0.0897
= 0.001197mM
l
74.921
Now lets try on our data
In [10]: df_well_data.As/=74.921 print(df_well_data.As.describe())
count 407.000000
mean
0.001197
std
0.001355
min
0.000000
25%
0.000187
50%
0.000726
75%
0.001728
max
0.009355
Name: As, dtype: float64
It worked beautifully. But that number is too small. So I will convert to micromolar by multibplying by 1000.
In [11]: df_well_data.As*=1000 print (df_well_data.As.describe())
count 407.000000
mean
1.197109
std
1.355169
min
0.000000
25%
0.187222
50%
0.726098
75%
1.727597
max
9.355054
Name: As, dtype: float64
A big be careful!!!! If you keep rerunning a cell it will do the math over and over again!!!!
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
7/13
9/7/2020
In [ ]:
pandas-unit-convert-Using-Pandas
But we have up to 16 elements we may want to convert. Could we convert them all ot once.
I used to do this using a dict or dictionary. And it worked well. I learned this at codeacademy. But I was thinking some this year. I thought we could do it better using pandas. How do we do it? What we are really doing is tracking our "metadata".
Make an excel file we can continuously update In it list the Element, Its Formula Weight, and what I call Unit for fancy printing. I called it UnitConversion.xlsx () we can read it in with pd.read_excel and we can set the index to the Element Setting the index to the Element makes it easy to search! For the units I used a lot of different formatting to make it fancy so you can see how it works. It uses Latex
In [12]: #I am just re-reading in the data to start fresh. df_well_data=pd.read_csv('well_data.csv')
In [13]: df_units=pd.read_excel('UnitConversion.xlsx',index_col='Element')
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandas-unit-convert-Using-Pandas.ipynb?download=false
8/13
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- pandas a foundational python library for data analysis and statistics
- using the dataiku dss python api for interfacing with sql databases
- an open source solar power forecasting tool using pvlib python
- pandas dataframe notes
- iso 8601 an international standard for date and time formats
- 56 proc of the 9th python in science conf scipy 2010 data
- u n i t c o n v e r s i o n tr i c k
- programming principles in python csci 503 490
- date timestamp guidelines
- dynamic mbuf