Release 0.1.1 Andrew Straw, Florian Finkernagel
pydataframe Documentation
February 21, 2014
Contents
1 Core classes
2 Functions
3 Reading and writing
4 Dialects
5 Indices and tables
Python Module Index
CHAPTER 1
Core classes
class DataFrame(value_dict=None, columns_ordered=None, row_names_ordered=None) An implementation of an almost R-like DataFrame object.
Usage:
    u = DataFrame(
        {"Field1": [1, 2, 3], "Field2": ['abc', 'def', 'hgi']},
        ['Field1', 'Field2'],                 # optional: columns_ordered
        ["rowOne", "rowTwo", "thirdRow"])     # optional: row_names_ordered
A DataFrame is basically a table with rows and columns. Columns are named; rows are numbered (but can also be named). Both can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they are converted into a dictionary for fast access.
There is a rich subselection/slicing API, see help(DataFrame.__getitem__) (it also works for setting values). Please note that any slice gets you another DataFrame; to access individual entries use get_row(), get_column(), or get_value().
DataFrames also understand basic arithmetic: you can add (multiply, ...) either a constant value or another DataFrame of the same size / with the same column names.
aggregate(key_vars, aggregation_function) Iterate over every value combination of the key vars and call aggregation_function with the sub-df. The returned dicts are turned into a new DataFrame.
as2DMatrix(dtype=None) Return all columns as a 2d (nRows, nCols) numpy matrix. Please use the PEP 8 conforming as_2d_matrix() instead. Default dtype is float64. Raises a ValueError if not all columns could be converted.
as_2d_matrix(dtype=None) Return all columns as a 2d (nRows, nCols) numpy matrix. Default dtype is float64. Raises a ValueError if not all columns could be converted.
cbind_view(*others) Stack frames next to each other (column-wise). Take frames with distinct fields but identical row lengths, and stack them next to each other in order. The new DataFrame shares the values with its parents.
convert_type(column_name, value_casting_func) Cast a column into another type.
copy() Return a deep copy of the DataFrame.
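The aggregate() description above can be pictured with plain Python data. The following is only a sketch of the idea, not pydataframe's implementation: rows are modelled as a list of dicts, and aggregate_rows, summarize and the sample columns are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def aggregate_rows(rows, key_vars, aggregation_function):
    """Group row dicts by the values of key_vars; collect one result dict per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in key_vars)
        groups[key].append(row)
    # Each returned dict becomes one row of the aggregated result.
    return [aggregation_function(sub_rows) for sub_rows in groups.values()]


def summarize(sub_rows):
    """Hypothetical aggregation function: keep the keys, sum the counts."""
    return {
        "strain": sub_rows[0]["strain"],
        "day": sub_rows[0]["day"],
        "total": sum(r["count"] for r in sub_rows),
    }


rows = [
    {"strain": "a", "day": 1, "count": 4},
    {"strain": "a", "day": 1, "count": 6},
    {"strain": "b", "day": 1, "count": 10},
]
result = aggregate_rows(rows, ["strain", "day"], summarize)
```

Here result holds one dict per (strain, day) combination, which is the shape the method is described as turning into a new DataFrame.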
digitize_column(column_name, bins=None, no_of_bins=None, min=None, max=None) Convert a column into a number of bin-ids
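As a rough illustration of what converting a column into bin ids means, here is a minimal stdlib sketch. It assumes equal-width bins between a lower and an upper bound; how digitize_column() itself places bin edges and treats boundary values is not stated above, so the function digitize and its parameters lo and hi are assumptions made for this example only.

```python
from bisect import bisect_right


def digitize(values, no_of_bins, lo=None, hi=None):
    """Map each value to the id of an equal-width bin between lo and hi."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / no_of_bins
    # Interior edges only; a value equal to an edge falls into the higher bin,
    # and the maximum lands in the last bin.
    edges = [lo + width * i for i in range(1, no_of_bins)]
    return [bisect_right(edges, v) for v in values]


bin_ids = digitize([0.0, 2.5, 5.0, 7.5, 10.0], no_of_bins=4)
```

With four bins over 0..10 the interior edges are 2.5, 5.0 and 7.5, so bin_ids is [0, 1, 2, 3, 3].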
dim() Return (rowCount, columnCount).
For R compatibility.
drop_all_columns_except(*column_names) Remove all columns except those passed as parameters.
drop_column(column_name) Remove a column from the DataFrame.
get_as_list_of_arrays_view() Return the data storage as [numpy.ma.array, numpy.ma.array,...].
get_as_list_of_lists() Return the data storage as [ [..., ],... ].
This is useful if you want to set from a DataFrame ignoring the column names:
    df_one = DataFrame({"A": (1, 2), 'B': (3, 4)})
    df_two = DataFrame({"b": (1, 2), 'D': (3, 4)})
    df_one[:, :] = df_two  -> ValueError
    df_one[:, :] = df_two.get_as_list_of_lists()  => DataFrame({"A": (2, 4), "B": (6, 8)})
get_column(column_name) Return a column as a copy of the actual values.
->numpy array
get_column_names() Return the column names (in order).
get_column_view(column_name) Returns a column directly from the internal storage.
->numpy array
get_row(row_idx) Return a row as a dictionary.
get_row_as_list(row_idx) Return a row as a list in order, without row names.
get_value(row_idx, column_name) Return the value of a cell.
->int/str/object... To set, use df[row_idx, column_name] = X
groupby(column_name) Yield (value, sub_df) for all values of column column_name, where sub_df[:,'column_name'] == value for each subset. Basically, itertools.groupby for dataframes. No prior sorting necessary.
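The remark that no prior sorting is necessary marks the difference from itertools.groupby, which only merges adjacent equal keys. The sketch below contrasts the two on a list of row dicts. group_rows is a hypothetical stand-in, not the library's code, and yielding groups in first-seen order is a choice of this sketch rather than documented behaviour.

```python
from itertools import groupby

rows = [
    {"kind": "x", "n": 1},
    {"kind": "y", "n": 2},
    {"kind": "x", "n": 3},
]

# itertools.groupby on unsorted input splits "x" into two separate groups.
adjacent = [(k, [r["n"] for r in g]) for k, g in groupby(rows, key=lambda r: r["kind"])]


def group_rows(rows, column_name):
    """Yield (value, sub_rows) once per distinct value, without sorting first."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column_name], []).append(row)
    yield from groups.items()


grouped = [(k, [r["n"] for r in sub]) for k, sub in group_rows(rows, "kind")]
```

adjacent contains three groups (x, y, x), while grouped contains two, with both "x" rows collected together.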
has_row(row_name) Check whether a certain row name or number exists.
impose_partial_column_order(order, last_order=None) Order the columns: first those in order, then everything in neither order nor last_order (alphabetically), then those in last_order.
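The three-part ordering described for impose_partial_column_order() can be written down in a few lines. This is a sketch of the rule on a plain list of column names, and partial_column_order is a hypothetical helper, not part of pydataframe.

```python
def partial_column_order(columns, order, last_order=None):
    """Return column names: `order` first, the rest alphabetically, `last_order` last."""
    last_order = last_order or []
    middle = sorted(c for c in columns if c not in order and c not in last_order)
    return list(order) + middle + list(last_order)


cols = partial_column_order(
    ["z", "id", "b", "note", "a"], order=["id"], last_order=["note"]
)
```

cols is ['id', 'a', 'b', 'z', 'note']: the pinned first column, the unpinned columns in alphabetical order, then the pinned last column.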
insert_column(column_name, values, position='last') Insert a new column into your DataFrame.
iter_columns() Return an iterator over the columns (arrays).
iter_rows() Return an iterator over the rows (dicts).
iter_rows_old() Return an iterator over the rows (dicts).
iter_values_columns_first() Return an iterator over all values. Iterates first column one, first row, then column one, second row...
iter_values_rows_first() Return an iterator over all values. Iterates first row column one, first row column two...
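The two iteration orders differ only in which loop is the outer one. A small sketch, using a dict of lists as a stand-in for the column storage (the names columns, names and n_rows are illustrative):

```python
columns = {"A": [1, 2], "B": [3, 4]}
names = ["A", "B"]
n_rows = 2

# Columns first: walk down column A completely, then column B.
columns_first = [columns[name][i] for name in names for i in range(n_rows)]

# Rows first: walk across row 0 completely, then row 1.
rows_first = [columns[name][i] for i in range(n_rows) for name in names]
```

columns_first is [1, 2, 3, 4] and rows_first is [1, 3, 2, 4].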
join_columns_on(other, name_here, name_there) Join two DataFrames on a column with common values.
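To make joining on a column with common values concrete, here is a sketch on lists of row dicts. The description of join_columns_on() does not say what happens to rows without a partner; this sketch assumes they are dropped (an inner join), and join_rows and the sample data are hypothetical.

```python
def join_rows(left, right, name_here, name_there):
    """Combine rows whose left[name_here] equals right[name_there]."""
    index = {}
    for r in right:
        index.setdefault(r[name_there], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[name_here], []):
            merged = dict(l)
            # The join column is kept once, under its name on the left side.
            merged.update({k: v for k, v in r.items() if k != name_there})
            joined.append(merged)
    return joined


genes = [{"gene": "g1", "length": 100}, {"gene": "g2", "length": 250}]
scores = [{"id": "g2", "score": 0.5}, {"id": "g3", "score": 0.9}]
joined = join_rows(genes, scores, "gene", "id")
```

Only "g2" appears on both sides, so joined contains a single row carrying both length and score.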
mean(field) Return the arithmetic mean (average) of a column.
mean_and_sem(field) Return mean and standard error of the mean.
mean_and_std(field) Return mean and standard deviation.
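For reference, the three statistics named above relate as follows. The sketch uses the sample standard deviation (dividing by n - 1); whether mean_and_std() and mean_and_sem() use the sample or the population form is not stated here, so treat that choice as an assumption.

```python
import math
import statistics

values = [1, 2, 3, 4, 5]

mean = statistics.mean(values)
std = statistics.stdev(values)      # sample standard deviation (n - 1), assumed
sem = std / math.sqrt(len(values))  # standard error of the mean
```

The standard error is the standard deviation divided by the square root of the number of observations, so it shrinks as the column gets longer.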
rankify_column(column_name, lower_is_better=True) Turn a column into a ranked order 0..len(self)
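A ranked order can be derived by sorting the row positions by value and numbering them from 0. The sketch below shows this on a plain list; rankify is a hypothetical helper, and giving tied values consecutive ranks in their original order is an assumption, since tie handling is not described above.

```python
def rankify(values, lower_is_better=True):
    """Return the 0-based rank of each value; rank 0 is the best."""
    order = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not lower_is_better
    )
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


ranks_low = rankify([30, 10, 20])
ranks_high = rankify([30, 10, 20], lower_is_better=False)
```

ranks_low is [2, 0, 1] (10 is best), and ranks_high is [0, 2, 1] (30 is best).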
rbind_copy(*others) Stack frames below each other (row-wise). Take frames with the same fields and stack them 'below' each other. Gives you a copy of the data.
rename_column(old_name, new_name) Rename a column. old_name may also be the number of a column.
set_column(column_name, value) Replace a column, or create a new one. Todo: Unittests
shallow_copy() Return a shallow copy of the DataFrame, i.e. shared data columns, but differing objects.
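The practical difference between copy() (deep) and shallow_copy() shows up when a column is modified in place. The sketch below uses a dict of lists and the stdlib copy module as an analogy for the column storage; it does not call pydataframe.

```python
import copy

columns = {"A": [1, 2], "B": [3, 4]}

shallow = copy.copy(columns)   # new container, same column objects
deep = copy.deepcopy(columns)  # new container, new column objects

columns["A"][0] = 99           # modify a column of the original in place

shallow_sees_change = shallow["A"][0] == 99
deep_sees_change = deep["A"][0] == 99
```

shallow_sees_change is True because the column is shared, while deep_sees_change is False because the deep copy owns its own data.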
sort_by(field_or_fields, ascending=True) Sort a DataFrame by one or more fields. Direction (ascending) needs to be specified for each field. This is not an inplace sort, but returns a new DataFrame! You can sort by one field: sort_by(my_field, ascending=False) or by multiple fields: sort_by([my_fieldA, my_fieldB], [True, False]), but then you will need to pass in the order for each one as well.
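Sorting by several fields with one direction per field can be sketched with a stable sort applied from the last field to the first. This illustrates the semantics only; sort_rows and the sample rows are hypothetical, and like sort_by() the helper returns a new list instead of sorting in place.

```python
def sort_rows(rows, fields, ascending):
    """Return a new list sorted by `fields`, each with its own direction."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant field
    # first and the most significant field last yields a multi-key sort.
    for field, asc in reversed(list(zip(fields, ascending))):
        result.sort(key=lambda r: r[field], reverse=not asc)
    return result


rows = [
    {"g": "a", "v": 1},
    {"g": "b", "v": 3},
    {"g": "a", "v": 2},
    {"g": "b", "v": 1},
]
sorted_rows = sort_rows(rows, ["g", "v"], [True, False])
pairs = [(r["g"], r["v"]) for r in sorted_rows]
```

pairs is [('a', 2), ('a', 1), ('b', 3), ('b', 1)]: g ascending, and within each g, v descending. The input list rows is left unchanged.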
turn_into_character(column_name, levels=None) Convert a level column into its character values.
turn_into_level(column_name, levels=None) Convert a column into something that fits into an R factor.
types() Return a tuple of the types of the DataFrame
where(boolean_row_function) Return numpy.array(dtype=bool, data=[boolean_row_function(x) for x in self.iter_rows()]). I.e. iterate over all rows, call boolean_row_function on each, and return the results as a truth array.
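The truth array built by where() can be mimicked with a list of booleans over row dicts. In this sketch the function where and the sample rows are stand-ins for the method and for iter_rows(); a plain list replaces the numpy array.

```python
rows = [{"A": 1, "B": "x"}, {"A": 5, "B": "y"}, {"A": 3, "B": "x"}]


def where(rows, boolean_row_function):
    """Call boolean_row_function on every row dict and collect the results."""
    return [bool(boolean_row_function(row)) for row in rows]


mask = where(rows, lambda row: row["A"] > 2)

# A truth array like this is typically used to pick the matching rows.
selected = [row for row, keep in zip(rows, mask) if keep]
```

mask is [False, True, True], and selected holds the two rows with A greater than 2.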
class Factor A Factor is a numpy array constrained to a few values that each have a unique label.
map_level(level) Map a level to the appropriate stored value.
map_value(value) Map a value to a level (label).
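The two mappings of a Factor are inverses of each other: one goes from label to stored value, the other back. The sketch below shows the idea with two dicts. Assigning integers to the alphabetically sorted labels is an assumption made for this example; how Factor actually chooses its stored values is not described above.

```python
labels = ["low", "high", "low", "mid"]

# Assumed encoding: sorted distinct labels numbered from 0.
levels = sorted(set(labels))
level_to_value = {level: i for i, level in enumerate(levels)}
value_to_level = {i: level for level, i in level_to_value.items()}

# Analogue of map_level(): label -> stored value.
codes = [level_to_value[label] for label in labels]

# Analogue of map_value(): stored value -> label.
restored = [value_to_level[code] for code in codes]
```

codes is [1, 0, 1, 2], and restored equals the original labels, showing that the round trip loses nothing.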