Advanced Data Management (CSCI 490/680)
Data Cleaning
Dr. David Koop
D. Koop, CSCI 680/490, Spring 2021
Comma-separated values (CSV) Format
• Comma is a field separator; newlines denote records
  - a,b,c,d,message
    1,2,3,4,hello
    5,6,7,8,world
    9,10,11,12,foo
• May have a header (a,b,c,d,message), but not required
• No type information: we do not know what the columns are (numbers, strings, floating point, etc.)
  - Default: just keep everything as a string
  - Type inference: figure out the type for each column based on its values
• What about commas in a value? Enclose the value in double quotes
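The quoting and type-inference points above can be sketched with pandas. This is a minimal illustration, not part of the original slides: the sample text (including the quoted "hello, world" value) is made up, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data: the second record's message contains a comma,
# so the whole value is enclosed in double quotes.
csv_text = 'a,b,c,d,message\n1,2,3,4,hello\n5,6,7,8,"hello, world"\n'

# Default behavior: pandas infers a type for each column from its values.
inferred = pd.read_csv(io.StringIO(csv_text))

# Passing dtype=str keeps every column as a string instead.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred.dtypes)               # numeric columns a-d, text column message
print(inferred.loc[1, "message"])    # the quoted comma stays inside one field
print(repr(as_strings.loc[1, "a"]))  # a string, not a number
```

Reading the same text twice makes the trade-off visible: inference is convenient, while dtype=str avoids surprises when a column only looks numeric.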
Reading & Writing Data in Pandas
Format Type  Data Description       Reader          Writer
text         CSV                    read_csv        to_csv
text         Fixed-Width Text File  read_fwf
text         JSON                   read_json       to_json
text         HTML                   read_html       to_html
text         Local clipboard        read_clipboard  to_clipboard
binary       MS Excel               read_excel      to_excel
binary       OpenDocument           read_excel
binary       HDF5 Format            read_hdf        to_hdf
binary       Feather Format         read_feather    to_feather
binary       Parquet Format         read_parquet    to_parquet
binary       ORC Format             read_orc
binary       Msgpack                read_msgpack    to_msgpack
binary       Stata                  read_stata      to_stata
binary       SAS                    read_sas
binary       SPSS                   read_spss
binary       Python Pickle Format   read_pickle     to_pickle
SQL          SQL                    read_sql        to_sql
SQL          Google BigQuery        read_gbq        to_gbq
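Readers and writers come in pairs, so data written with one can usually be read back with the other. The sketch below shows a round trip through JSON as one possible illustration. The DataFrame contents are made up, and the text is kept in memory rather than written to a file.

```python
import io

import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5], "message": ["hello", "world"]})

# With no path argument, to_json returns the JSON text as a string.
json_text = df.to_json()

# read_json accepts a file-like object, so wrap the string in StringIO.
restored = pd.read_json(io.StringIO(json_text))

print(restored)
```

The same pattern applies to the other pairs in the table, although binary formats need a file path or a bytes buffer, and some require optional dependencies.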
read_csv
• Convenient method to read CSV files
• Lots of different options to help get data into the desired format
• Basic: df = pd.read_csv(fname)
• Parameters:
  - path: where to read the data from
  - sep (or delimiter): the delimiter (',', ' ', '\t', '\s+')
  - header: if None, no header
  - index_col: which column to use as the row index
  - names: list of header names (e.g. if the file has no header)
  - skiprows: number of lines, or list of line numbers, to skip
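A minimal sketch of how these parameters combine, assuming a tab-separated input with one leading comment line and no header row. The data and the column names (id, value, message) are hypothetical, and io.StringIO stands in for the path.

```python
import io

import pandas as pd

# Hypothetical file contents: one comment line to skip, no header row,
# tab-separated fields.
raw = "# exported for illustration\n1\t2\thello\n3\t4\tworld\n"

df = pd.read_csv(
    io.StringIO(raw),                  # a path string would also work here
    sep="\t",                          # fields are separated by tabs
    header=None,                       # the data has no header row
    names=["id", "value", "message"],  # so supply the column names
    skiprows=1,                        # skip the leading comment line
    index_col="id",                    # use the id column as the row index
)

print(df)
```

Because index_col is set, the id column becomes the row index and only value and message remain as regular columns.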
Writing CSV data with pandas
• Basic: df.to_csv()
• Change delimiter with sep kwarg:
  - df.to_csv('example.dsv', sep='|')
• Change missing value representation:
  - df.to_csv('example.dsv', na_rep='NULL')
• Don't write row or column labels:
  - df.to_csv('example.csv', index=False, header=False)
• Series may also be written to CSV
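The options above can be tried without touching the file system, because to_csv returns the CSV text as a string when no path is given. The following is a small sketch with made-up data, not an example from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "message": ["hello", "world"]})

# Pipe-delimited output with missing values written as NULL.
pipe_text = df.to_csv(sep="|", na_rep="NULL")

# No row labels (index) and no column labels (header).
bare_text = df.to_csv(index=False, header=False)

# A single Series can be written the same way.
series_text = df["message"].to_csv()

print(pipe_text)
print(bare_text)
print(series_text)
```

Note that without na_rep the missing value is written as an empty field, which is why bare_text contains a line that starts with a comma.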