Release 0.1.1 Andrew Straw, Florian Finkernagel

pydataframe Documentation


February 21, 2014

Contents

1 Core classes
2 Functions
3 Reading and writing
4 Dialects
5 Indices and tables
Python Module Index


CHAPTER 1

Core classes

class DataFrame(value_dict=None, columns_ordered=None, row_names_ordered=None) An implementation of an almost R-like DataFrame object.

Usage:

    u = DataFrame(
        {"Field1": [1, 2, 3], "Field2": ['abc', 'def', 'hgi']},
        ['Field1', 'Field2'],                # optional: column order
        ["rowOne", "rowTwo", "thirdRow"])    # optional: row names

A DataFrame is basically a table with rows and columns. Columns are named; rows are numbered (but can also be named). Both can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they are converted into a dictionary for fast access.

There is a rich subselection/slicing API, see help(DataFrame.__getitem__) (it also works for setting values). Please note that any slice gets you another DataFrame; to access individual entries, use get_row(), get_column(), or get_value().

DataFrames also understand basic arithmetic: you can add (multiply, ...) either a constant value or another DataFrame of the same size / with the same column names.

aggregate(key_vars, aggregation_function) For every value combination of the key vars, call the aggregation_function with the sub-DataFrame. The returned dicts are turned into a new DataFrame.

as2DMatrix(dtype=None) Return all columns as a 2d (nRows, nCols) numpy matrix. Please use the PEP 8 conformant as_2d_matrix() instead. Default dtype is float64. Raises a ValueError if not all columns could be converted.

as_2d_matrix(dtype=None) Return all columns as a 2d (nRows, nCols) numpy matrix. Default dtype is float64. Raises a ValueError if not all columns could be converted.

cbind_view(*others) Stack frames next to each other (column-wise). Take frames with distinct fields but identical row lengths, and stack them next to each other in order. The new DataFrame shares the values with its parents.

convert_type(column_name, value_casting_func) Cast a column into another type.

copy() Return a deep copy of the DataFrame.
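The aggregate(key_vars, aggregation_function) behaviour described above can be illustrated without the library. The following is a plain-Python sketch only: rows are modelled as dicts, and the helper names aggregate_rows and summarize are hypothetical, not part of pydataframe.

```python
def aggregate_rows(rows, key_vars, aggregation_function):
    """Group rows (dicts) by their key_vars values and aggregate each group.

    Hypothetical stand-in for DataFrame.aggregate; rows replace the sub-df.
    """
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in key_vars)
        groups.setdefault(key, []).append(row)
    # One result dict per key combination, in first-seen order (an assumption).
    return [aggregation_function(sub_rows) for sub_rows in groups.values()]


def summarize(sub_rows):
    """Example aggregation function: mean of 'value' per group."""
    values = [r["value"] for r in sub_rows]
    return {"gene": sub_rows[0]["gene"], "mean_value": sum(values) / len(values)}


rows = [
    {"gene": "a", "batch": 1, "value": 2.0},
    {"gene": "b", "batch": 1, "value": 4.0},
    {"gene": "a", "batch": 1, "value": 6.0},
]
result = aggregate_rows(rows, ["gene", "batch"], summarize)
```

Each returned dict becomes one row of the aggregated table; in the library these dicts would be collected into a new DataFrame.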


digitize_column(column_name, bins=None, no_of_bins=None, min=None, max=None) Convert a column into a number of bin-ids
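What "converting a column into bin-ids" might look like is sketched below in plain Python. The exact edge handling of digitize_column is not stated in this documentation, so the sketch assumes equal-width bins when only no_of_bins is given and right-open intervals; the function name digitize and the parameter names lo and hi (standing in for min and max) are illustrative.

```python
import bisect


def digitize(values, bins=None, no_of_bins=None, lo=None, hi=None):
    """Return a bin id for every value (assumed semantics, not the library's)."""
    if bins is None:
        lo = min(values) if lo is None else lo
        hi = max(values) if hi is None else hi
        width = (hi - lo) / no_of_bins
        # Interior cut points only: no_of_bins - 1 edges give no_of_bins bins.
        bins = [lo + width * i for i in range(1, no_of_bins)]
    # bisect_right counts how many cut points are <= the value.
    return [bisect.bisect_right(bins, v) for v in values]


equal_width_ids = digitize([0.0, 2.5, 5.0, 7.5, 10.0], no_of_bins=4)
explicit_ids = digitize([1, 5, 9], bins=[3, 6])
```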

dim() Return (rowCount, columnCount)

For R compatibility.

drop_all_columns_except(*column_names) Remove all columns except those passed as parameters.

drop_column(column_name) Remove a column from the DataFrame.

get_as_list_of_arrays_view() Return the data storage as [numpy.ma.array, numpy.ma.array,...].

get_as_list_of_lists() Return the data storage as [ [..., ],... ].

This is useful if you want to set from a DataFrame ignoring the column names:

    df_one = DataFrame({"A": (1, 2), 'B': (3, 4)})
    df_two = DataFrame({"b": (1, 2), 'D': (3, 4)})
    df_one[:, :] = df_two                           -> ValueError
    df_one[:, :] = df_two.get_as_list_of_lists()    => DataFrame({"A": (2, 4), "B": (6, 8)})

get_column(column_name) Return a column as a copy of the actual values.

->numpy array

get_column_names() Return the column names (in order).

get_column_view(column_name) Returns a column directly from the internal storage.

->numpy array
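The difference between get_column() (a copy) and get_column_view() (the internal storage itself) matters as soon as the result is modified. The sketch below models the storage with plain lists instead of numpy arrays; the storage dict and the two free functions are assumptions for illustration only.

```python
# Hypothetical stand-in for a DataFrame's internal column storage.
storage = {"A": [1, 2, 3]}


def get_column_view(name):
    return storage[name]          # the very same object as the storage


def get_column(name):
    return list(storage[name])    # an independent copy


view = get_column_view("A")
copied = get_column("A")

view[0] = 99      # writes through to the storage
copied[1] = -1    # only changes the copy
```

After these assignments the stored column is [99, 2, 3]: the write through the view is visible, the write to the copy is not.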

get_row(row_idx) Return a row as a dictionary.

get_row_as_list(row_idx) Return a row as a list in order, without row names.

get_value(row_idx, column_name) Return the value of a cell.

->int/str/object... To set, use df[row_idx, column_name] = X

groupby(column_name) Yield (value, sub_df) for all values of column column_name, where sub_df[:,'column_name'] == value for each subset. Basically, itertools.groupby for dataframes. No prior sorting necessary.
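The remark that no prior sorting is necessary is the main difference from itertools.groupby, which only groups consecutive runs. A plain-Python sketch of the idea follows, with rows modelled as dicts; the function group_rows is hypothetical, and the first-seen order of the groups is an assumption.

```python
import itertools


def group_rows(rows, column_name):
    """Yield (value, sub_rows) for every distinct value, no sorting needed."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column_name], []).append(row)
    for value, sub_rows in groups.items():
        yield value, sub_rows


rows = [{"kind": "x", "n": 1}, {"kind": "y", "n": 2}, {"kind": "x", "n": 3}]

grouped = {value: [r["n"] for r in sub] for value, sub in group_rows(rows, "kind")}

# itertools.groupby on the same unsorted input splits "x" into two runs.
runs = [key for key, _ in itertools.groupby(rows, key=lambda r: r["kind"])]
```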

has_row(row_name) Check whether a certain row name or number exists.

impose_partial_column_order(order, last_order=None) Order columns: first those in order, then everything in neither order nor last_order (alphabetically), then those in last_order.
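The three-part ordering rule of impose_partial_column_order can be written down in a few lines. This is a sketch of the rule as described, operating on a plain list of column names; the function name partial_column_order is hypothetical.

```python
def partial_column_order(columns, order, last_order=None):
    """Return column names: `order` first, the rest alphabetically, `last_order` last."""
    last_order = list(last_order or [])
    middle = sorted(c for c in columns if c not in order and c not in last_order)
    return list(order) + middle + last_order


ordered = partial_column_order(
    ["z", "id", "b", "a", "comment"], order=["id"], last_order=["comment"]
)
```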

insert_column(column_name, values, position='last') Insert a new column into your DataFrame.


iter_columns() Return an iterator over the columns (arrays).

iter_rows() Return an iterator over the rows (dicts).

iter_rows_old() Return an iterator over the rows (dicts).

iter_values_columns_first() Return an iterator over all values. Iterates first column one, first row, then column one, second row...

iter_values_rows_first() Return an iterator over all values. Iterates first row column one, first row column two...
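The two value iteration orders differ only in which loop is the outer one. A minimal sketch on a dict of lists (a stand-in for the column storage; the free functions are illustrative, not the library's methods):

```python
columns = {"A": [1, 2], "B": [3, 4]}
names = ["A", "B"]  # column order


def iter_values_columns_first():
    # Outer loop over columns: column one row one, column one row two, ...
    for name in names:
        for value in columns[name]:
            yield value


def iter_values_rows_first():
    # Outer loop over rows: row one column one, row one column two, ...
    for row_idx in range(len(columns[names[0]])):
        for name in names:
            yield columns[name][row_idx]


columns_first = list(iter_values_columns_first())
rows_first = list(iter_values_rows_first())
```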

join_columns_on(other, name_here, name_there) Join two DataFrames on a column with common values.
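The documentation does not say how join_columns_on treats values that occur in only one of the two frames. The sketch below therefore assumes an inner join (unmatched rows are dropped) and models rows as dicts; join_on is a hypothetical helper, not the library's implementation.

```python
def join_on(left_rows, right_rows, name_here, name_there):
    """Inner-join two row lists where left[name_here] == right[name_there]."""
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[name_there], []).append(row)

    joined = []
    for row in left_rows:
        for match in lookup.get(row[name_here], []):
            merged = dict(row)
            # Keep the key only once, under the left-hand name.
            merged.update({k: v for k, v in match.items() if k != name_there})
            joined.append(merged)
    return joined


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"key": 2, "y": 20}, {"key": 3, "y": 30}]
joined = join_on(left, right, "id", "key")
```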

mean(field) Return the arithmetic mean (average) of a column.

mean_and_sem(field) Return mean and standard error of the mean.

mean_and_std(field) Return mean and standard deviation.
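For reference, the quantities returned by mean_and_std and mean_and_sem are related by SEM = std / sqrt(n). Whether the library uses the population (ddof=0) or sample (ddof=1) standard deviation is not stated here, so the sketch below exposes that choice as a parameter; the function names are illustrative.

```python
import math


def mean_std(values, ddof=1):
    """Return (mean, standard deviation); ddof=1 is the sample estimate."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return mean, math.sqrt(variance)


def mean_sem(values, ddof=1):
    """Return (mean, standard error of the mean)."""
    mean, std = mean_std(values, ddof)
    return mean, std / math.sqrt(len(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, s = mean_std(data, ddof=0)
_, sem = mean_sem(data, ddof=0)
```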

rankify_column(column_name, lower_is_better=True) Turn a column into a ranked order 0..len(self)
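A possible reading of rankify_column is sketched below: each value is replaced by its position in the sorted order, starting at 0. How the library breaks ties is not documented here; this sketch gives tied values consecutive ranks in their original order. The function name rankify is hypothetical.

```python
def rankify(values, lower_is_better=True):
    """Replace each value by its 0-based rank (ties ranked in input order)."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not lower_is_better,
    )
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


ranks_low = rankify([30, 10, 20])
ranks_high = rankify([30, 10, 20], lower_is_better=False)
```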

rbind_copy(*others) Stack frames below each other (row-wise). Take frames with the same fields, and stack them 'below' each other. Gives you a copy of the data.

rename_column(old_name, new_name) Rename a column. old_name may also be the number of a column.

set_column(column_name, value) Replace a column, or create a new one. Todo: Unittests

shallow_copy() Return a shallow copy of the DataFrame, i.e. shared data columns, but differing objects.

sort_by(field_or_fields, ascending=True) Sort a DataFrame by one or more fields. Direction (ascending) needs to be specified for each field. This is not an inplace sort, but returns a new DataFrame! You can sort by one field: sort_by(my_field, ascending=False) or by multiple fields: sort_by([my_fieldA, my_fieldB], [True, False]), but then you will need to pass in the order for each one as well.
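Sorting by several fields with a separate direction for each, and returning a new object rather than sorting in place, can be done with repeated stable sorts. The sketch below works on rows modelled as dicts; sort_rows is a hypothetical helper and says nothing about how the library implements sort_by.

```python
def sort_rows(rows, fields, ascending=True):
    """Return a new list sorted by one or more fields, one direction per field."""
    if isinstance(fields, str):
        fields, ascending = [fields], [ascending]
    result = list(rows)  # not in place: the input list is left untouched
    # Stable sorts, applied from the least significant key to the most significant.
    for field, asc in reversed(list(zip(fields, ascending))):
        result.sort(key=lambda row: row[field], reverse=not asc)
    return result


rows = [{"a": 1, "b": 1}, {"a": 2, "b": 5}, {"a": 1, "b": 3}]
sorted_rows = sort_rows(rows, ["a", "b"], [True, False])
single = sort_rows(rows, "b", ascending=False)
```

Here rows are ordered by a ascending and, within equal a, by b descending.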

turn_into_character(column_name, levels=None) Convert a level column into its character values.


turn_into_level(column_name, levels=None) Convert a column into something that fits into an R factor.

types() Return a tuple of the types of the DataFrame

where(boolean_row_function) Return numpy.array([boolean_row_function(x) for x in self.iter_rows()], dtype=bool). I.e. return a truth array: iterate over all rows and call boolean_row_function on each to fill the array.
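The truth array built by where() is typically used as a row mask. A plain-Python sketch with rows as dicts and a list of booleans in place of the numpy array (row_mask is a hypothetical name):

```python
def row_mask(rows, boolean_row_function):
    """Call boolean_row_function on every row and collect the results."""
    return [bool(boolean_row_function(row)) for row in rows]


rows = [{"n": 1}, {"n": 5}, {"n": 3}]
mask = row_mask(rows, lambda row: row["n"] > 2)

# Applying the mask keeps only the rows for which the function was true.
selected = [row for row, keep in zip(rows, mask) if keep]
```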

class Factor A Factor is a numpy array constrained to a few values that each have a unique label.

map_level(level) Map a level to the appropriate stored value.

map_value(value) Map a value to a level (label).
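The two-way mapping between labels (levels) and stored values can be pictured as follows. FactorSketch is a hypothetical class written for illustration; it stores integer codes in a list and assigns them in sorted label order, which is an assumption and may differ from how Factor assigns values.

```python
class FactorSketch:
    """Labels stored as small integer codes, with a lookup in both directions."""

    def __init__(self, labels):
        self.levels = sorted(set(labels))  # assumption: codes follow sorted order
        self._level_to_value = {level: i for i, level in enumerate(self.levels)}
        self.values = [self._level_to_value[label] for label in labels]

    def map_level(self, level):
        """Map a level (label) to its stored value."""
        return self._level_to_value[level]

    def map_value(self, value):
        """Map a stored value back to its level (label)."""
        return self.levels[value]


factor = FactorSketch(["low", "high", "low", "mid"])
```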
