Pandas Time Series
9/7/2020
pandastimeseries
Today we are going to look at CO2 data and try to learn about time series. I will introduce it and then you will be in charge of analyzing CO2 data from Mauna Loa. We are going to start by looking at the data from the Scripps Pier.
In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from scipy import stats
from matplotlib.backends.backend_pdf import PdfPages
First go get the CO2 data for Scripps () and the La Jolla Pier.
I did flask daily values. Download and open the CSV. Rename the columns, calling them Date, HR, Excel, Year, Flask, Flag, CO2. Then we can skip the rows and columns we don't want.
I kept all the information in place and used skiprows to skip what we don't want. ()
Start by just reading in the data. Do you get the data? Then get more clever as you go. Here is the read_csv information: () or ()
You just need to try things and experiment to learn.
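The rename-while-reading idea above can be sketched with the `names=` argument to read_csv. This is a sketch with a tiny made-up CSV string standing in for the real fldav_ljo.csv file, and a made-up skiprows value; adjust both to match your download.

```python
import io
import pandas as pd

# Made-up stand-in for the real CSV file: one junk header line to skip,
# then one data row in the same column order as the Scripps flask file.
raw = io.StringIO(
    "some header junk we want to skip\n"
    "10/23/1968,11:10,25134.47,1968.810,0,4,369.38\n"
)

# Supply our own column names while reading, skipping the header block.
col_names = ['Date', 'HR', 'Excel', 'Year', 'Flask', 'Flag', 'CO2']
df = pd.read_csv(raw, skiprows=1, names=col_names)
print(df.columns.tolist())
# ['Date', 'HR', 'Excel', 'Year', 'Flask', 'Flag', 'CO2']
```

With `names=` you hand pandas the column labels yourself, so it does not try to use a row of the file as the header.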
In [5]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68)
Flashback: remember, I am just printing head() to save space on the printouts.
In [6]: print (df_scripps.head())

         Date     Hr     Excel      Year  Flasks  Flags     CO2
0  10/23/1968  11:10  25134.47  1968.810       0      4  369.38
1  10/31/1968  14:44  25142.61  1968.832       0      4  332.92
2   1/13/1969  15:00  25216.62  1969.035       2      0  333.00
3   1/17/1969  15:24  25220.64  1969.046       3      0  331.50
4   1/20/1969  15:42  25223.65  1969.054       0      4  326.87
localhost:8888/nbconvert/html/python/fall20/BigDataPython/pandastimeseries.ipynb?download=false
But we only want the Date, Hr, Flags, and CO2 columns. So just grab those using usecols.
I just learned a neat trick, and it is the reason why my printouts might look different than yours sometimes. Printing info() gives a compact summary of the DataFrame, which can be useful, so I am going to use this call to get us smaller descriptive outputs.
In [7]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6])
        print (df_scripps.info())
RangeIndex: 1244 entries, 0 to 1243
Data columns (total 4 columns):
Date     1244 non-null object
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 39.0+ KB
None
This does the same thing but using column names.
In [8]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=['Date','Hr','Flags','CO2'])
        print (df_scripps.info())
RangeIndex: 1244 entries, 0 to 1243
Data columns (total 4 columns):
Date     1244 non-null object
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 39.0+ KB
None
Now let's turn the Date into a datetime index. It is critical to get a datetime index. I can't stress that enough.
Remember: you need a datetime index!
Remember what I said. The datetime index is critical, and things can really mess you up if you don't have it!
Don't forget this!!!
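A quick sanity check you can run (my own habit, not from the notebook): the frame below is made-up data just to show that date-looking strings are not a DatetimeIndex until you convert them.

```python
import pandas as pd

# Made-up two-row frame with date strings as the index.
df = pd.DataFrame({'CO2': [369.38, 332.92]},
                  index=['10/23/1968', '10/31/1968'])
print(isinstance(df.index, pd.DatetimeIndex))   # False -- just strings!

# Convert the strings to real timestamps.
df.index = pd.to_datetime(df.index)
print(isinstance(df.index, pd.DatetimeIndex))   # True
```

If that isinstance check says False, none of the time-series slicing tricks later will work.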
First we are making Date the index column
In [9]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6],index_col='Date')
        print (df_scripps.info())
Index: 1244 entries, 10/23/1968 to 12/17/2014
Data columns (total 3 columns):
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 38.9+ KB
None
We now have the Dates as an index, but it is not a datetime index. We need that! parse_dates tells pandas to parse the column as dates.
CRITICAL!
In [11]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6],\
                                index_col='Date',parse_dates=True)
         print (df_scripps.info())
DatetimeIndex: 1244 entries, 1968-10-23 to 2014-12-17
Data columns (total 3 columns):
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 38.9+ KB
None
Now we have a datetime index! See how it says DatetimeIndex.
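One payoff worth seeing right away: once the index is a DatetimeIndex, you get date parts like year and month for free. The tiny frame below is made-up data just to show the idea.

```python
import pandas as pd

# Made-up two-row frame with a real DatetimeIndex.
idx = pd.to_datetime(['1968-10-23', '1969-01-13'])
df = pd.DataFrame({'CO2': [369.38, 333.00]}, index=idx)

# Date parts come straight off the index -- handy for grouping later.
print(df.index.year.tolist())    # [1968, 1969]
print(df.index.month.tolist())   # [10, 1]
```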
Now we have that. We could have done it in other ways, too. I am now showing you some different tricks to do the same thing. These might come in handy later.
In [12]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6],\
                                parse_dates=True,index_col='Date')
         print (df_scripps.info())
DatetimeIndex: 1244 entries, 1968-10-23 to 2014-12-17
Data columns (total 3 columns):
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 38.9+ KB
None
But you can also just read the data in and deal with it later. First we convert Date to a datetime, then set it as the index.
In [13]:
df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6])
df_scripps.Date=pd.to_datetime(df_scripps.Date)
df_scripps.set_index('Date',inplace=True)
print (df_scripps.info())
DatetimeIndex: 1244 entries, 1968-10-23 to 2014-12-17
Data columns (total 3 columns):
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 38.9+ KB
None
We could also do it all at once.
In [14]: df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6])
         df_scripps=df_scripps.set_index(pd.to_datetime(df_scripps.Date))
         print (df_scripps.info())
DatetimeIndex: 1244 entries, 1968-10-23 to 2014-12-17
Data columns (total 4 columns):
Date     1244 non-null object
Hr       1244 non-null object
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 48.6+ KB
None
But back to reading in. We can also combine the hours with the date, setting the index and the datetime all at once. BUT YOU NEED the double brackets on parse_dates for this to work, because you are passing a list of column lists.
In [15]:
df_scripps=pd.read_csv('fldav_ljo.csv',skiprows=68,usecols=[0,1,5,6],\
                       parse_dates=[['Date','Hr']],index_col='Date_Hr')
print (df_scripps.info())
DatetimeIndex: 1244 entries, 1968-10-23 11:10:00 to 2014-12-17 17:25:00
Data columns (total 2 columns):
Flags    1244 non-null int64
CO2      1244 non-null float64
dtypes: float64(1), int64(1)
memory usage: 29.2 KB
None
This is a fun trick. We can use Python to grab the CSV file for us right off the website! No need to download. But something is wrong with the file and we will lose the first row of data. I needed to change to 69 skiprows, and then you can see the problem with the names.
In [24]:
url=' c/daily_co2/fldav_ljo.csv'
df_scripps=pd.read_csv(url,skiprows=69,usecols=[0,1,5,6])
print (df_scripps.info())
RangeIndex: 1311 entries, 0 to 1310
Data columns (total 4 columns):
1968-10-23    1311 non-null object
11:10         1311 non-null object
4             1311 non-null int64
369.38        1311 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 41.0+ KB
None
Definitely funny.
In [26]:
url=' c/daily_co2/fldav_ljo.csv'
df_scripps=pd.read_csv(url,skiprows=69,usecols=[0,1,5,6],parse_dates=[[0,1]],index_col=[0])
print (df_scripps.info())
DatetimeIndex: 1311 entries, 1968-10-31 14:44:00 to 2018-05-03 14:49:00
Data columns (total 2 columns):
4         1311 non-null int64
369.38    1311 non-null float64
dtypes: float64(1), int64(1)
memory usage: 30.7 KB
None
Now we can rename our columns (we did lose our first data point).
In [28]:
url=' c/daily_co2/fldav_ljo.csv'
df_scripps=pd.read_csv(url,skiprows=69,usecols=[0,1,5,6],parse_dates=[[0,1]],index_col=[0])
df_scripps.columns=['Flags','CO2']
print (df_scripps.info())
DatetimeIndex: 1311 entries, 1968-10-31 14:44:00 to 2018-05-03 14:49:00
Data columns (total 2 columns):
Flags    1311 non-null int64
CO2      1311 non-null float64
dtypes: float64(1), int64(1)
memory usage: 30.7 KB
None
You are now an expert at getting data in. You can figure it out! Let's plot the data.
In [29]: df_scripps.plot()
Out[29]:
[plot: all columns vs. date]
Let's just plot the CO2 data.
In [30]: df_scripps.CO2.plot()
Out[30]:
[plot: CO2 vs. date]
There are obviously some bad data points. Go look at the CSV file header and figure out which values are good, and only keep the good data. Good data has a flag of 0. I am going to drop all the other data.
In [32]: df_scripps=df_scripps[df_scripps.Flags==0]
         df_scripps.CO2.plot()
Out[32]:
[plot: CO2 vs. date, flagged data removed]
Now that is much nicer!
Now can you plot just the data from your birth year?

In [33]: df_scripps['1972'].CO2.plot()
Out[33]:
[plot: CO2 during 1972]
Remember all your slicing? We can now slice by date, too!!! Now just plot it for the years you were in high school.
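Slicing a multi-year range works with partial date strings and .loc. This is a sketch with a tiny made-up frame and made-up high-school years; swap in df_scripps and your own years.

```python
import pandas as pd

# Made-up three-point CO2 frame with a sorted DatetimeIndex.
idx = pd.to_datetime(['1985-06-01', '1987-03-15', '1990-09-01'])
df = pd.DataFrame({'CO2': [346.0, 349.0, 354.0]}, index=idx)

# Partial-string slicing: everything from 1986 through 1989 inclusive.
hs = df.loc['1986':'1989']
print(hs)          # only the 1987-03-15 row survives
```

Note that unlike ordinary Python slicing, both endpoints of a date slice are inclusive.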