Csvcols Documentation

Release 0.3 David Ormsbee

September 11, 2012

CONTENTS

1 Recommendations
2 Warnings
3 Reference
4 Indices and tables
Python Module Index

This library takes a column-oriented approach towards CSV data. Everything is stored internally as Unicode, and everything is outwardly immutable. It has support for:

• Parsing CSV files, including some quirks of Excel-exported files
• Selecting and renaming columns
• Transforming documents by column
• Re-sorting a document by columns or rows
• Creating new documents by appending old ones together
• Merging rows

CSV files are everywhere, and every language has a library to read them row by row. But sometimes that's not the best way to look at the data. You often want to manipulate, transform, or run rule checks on certain columns. If you keep the row-by-row model, you end up trying to jam everything into a single pass over the data. Or maybe you suck everything up into a 2D data structure and edit it in several passes. But then you start having side effects, and you're not sure what changed what. Then you want to add a new rule that requires data from an earlier pass, and you start making temporary data structures to hold the values of special columns or rows.

I've had the 800 lb gorilla version of this thrown on my lap. It's a maintenance nightmare, and my frustrations with the code base inspired the creation of this library.

The library in a nutshell:

    import csvcols
    from csvcols import Column, S  # S = shorthand for Selector

    # Read Document from file. If encoding is not specified, UTF-8 is assumed.
    raw_shipping_doc = csvcols.load("shipping_orders.csv", encoding='latin-1')

    # Select a subset of the columns and make them into a new Document. While
    # we're doing this, we can rename or transform Columns.
    users_doc = raw_shipping_doc.select(
        S("email", transform=unicode.lower),
        S("BILLING_LAST", rename="last_name", transform=unicode.title),
        S("BILLING_FIRST", rename="first_name"),
        ("CUSTOM 1", "special_notes"),  # We can use tuples for renames as well
        "country"  # Or simple strings if we don't want to do any transforms
    )

    # If the email, last name, and first initial match, merge the records
    # together, and keep the longer first name. By default, this sorts as well.
    merged_doc = users_doc.merge_rows_on(
        lambda row: (row.email, row.last_name, row.first_name[0]),
        lambda r1, r2: r1 if len(r1.first_name) > len(r2.first_name) else r2
    )

    # Create a new Column based on existing data.
    is_edu_user_col = Column("Y" if s.endswith(".edu") else "N"
                             for s in merged_doc.email)

    # Append this new column to the doc (note: this creates a new doc)
    final_doc = merged_doc + ("is_edu_user", is_edu_user_col)

    print csvcols.dumps(final_doc)
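To make the column-oriented idea concrete without relying on csvcols itself, here is a minimal standard-library sketch of the same three steps: load into columns, select with rename and transform, then merge rows on a key. It is not the library's implementation. The helper names `load_columns`, `select`, and `merge_rows_on` are hypothetical stand-ins, rows are plain dicts rather than csvcols row objects, and the code targets Python 3, whereas the example above is Python 2.

```python
import csv
import io
from itertools import groupby


def load_columns(text):
    """Parse CSV text into a mapping of column name -> tuple of values.

    Assumes a header row followed by at least one data row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # zip(*body) transposes rows into columns; tuples keep each column immutable.
    return dict(zip(header, (tuple(col) for col in zip(*body))))


def select(columns, *specs):
    """Build a new mapping from (old_name, new_name, transform) specs."""
    return {new: tuple(transform(value) for value in columns[old])
            for old, new, transform in specs}


def merge_rows_on(columns, key, merge):
    """Sort rows by key, then fold each run of equal-key rows with merge."""
    names = list(columns)
    rows = [dict(zip(names, values)) for values in zip(*columns.values())]
    rows.sort(key=key)
    merged = []
    for _, group in groupby(rows, key=key):
        group = list(group)
        keep = group[0]
        for other in group[1:]:
            keep = merge(keep, other)
        merged.append(keep)
    # Return a new column mapping; the input mapping is left untouched.
    return {name: tuple(row[name] for row in merged) for name in names}


data = "EMAIL,BILLING_FIRST\nA@X.EDU,Al\na@x.edu,Alan\nb@y.com,Bo\n"
doc = load_columns(data)
users = select(
    doc,
    ("EMAIL", "email", str.lower),
    ("BILLING_FIRST", "first_name", str.strip),
)
merged = merge_rows_on(
    users,
    key=lambda r: (r["email"], r["first_name"][0]),
    merge=lambda r1, r2: r1 if len(r1["first_name"]) > len(r2["first_name"]) else r2,
)
```

Each step returns a new mapping instead of editing the old one, which is the property the introduction argues for: `doc` and `users` still hold their original values after the merge, so a later rule can read data from an earlier stage without any temporary bookkeeping.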
