Data Transformation with data.table :: CHEAT SHEET

Basics

Manipulate columns with j

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R's native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

EXTRACT

SUMMARIZE

dt[, .(x = sum(a))] - create a data.table with new columns based on the summarized values of rows.

Summary functions like mean(), median(), min(), max(), etc. may be used to summarize rows.
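
As a minimal sketch of the summarize pattern above, the following assumes a small made-up table; the column names a and b and the result names (x, mean_a, max_b) are illustrative only.

```r
library(data.table)

# Hypothetical example data; a and b are illustrative column names.
dt <- data.table(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# One summary column: the result is a one-row data.table with column x.
total <- dt[, .(x = sum(a))]

# Several summaries at once, each named inside .().
stats <- dt[, .(mean_a = mean(a), max_b = max(b))]
```

Each expression inside .() becomes one column of the result, so the output has one row when no grouping is used.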

COMPUTE COLUMNS*

setDT(df)* or as.data.table(df) - convert a data frame or a list to a data.table.
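
The two conversion routes differ in whether the input is modified. A short sketch, assuming a made-up data frame df:

```r
library(data.table)

# Hypothetical data frame used only for illustration.
df <- data.frame(a = c(1, 2), b = c("a", "b"))

# as.data.table() returns a new data.table; df itself is left unchanged.
dt_copy <- as.data.table(df)
was_plain_df <- !is.data.table(df)

# setDT() converts df in place (by reference), so no assignment is needed.
setDT(df)
```

setDT() is usually the cheaper choice for large objects because it avoids copying the data.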

dt[1:2, ] - subset rows based on row numbers.

dt[a > 5, ] - subset rows based on values in one or more columns.
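
Row subsetting in i could look like the sketch below. The table and its values are invented for illustration.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(2, 6, 5, 8), b = c("w", "x", "y", "z"))

# Rows 1 and 2 by position.
first_two <- dt[1:2, ]

# Rows where column a exceeds 5; columns are referenced by bare name in i.
large_a <- dt[a > 5, ]

# Conditions on several columns can be combined with & and |.
both <- dt[a > 5 & b != "z", ]
```

Unlike a plain data frame, there is no need to write dt$a inside the brackets.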

dt[, j, keyby = .(a)] - group and simultaneously sort rows according to values in specified column(s).
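
The difference between by and keyby shows up in the order of the result. A small sketch with made-up data:

```r
library(data.table)

# Hypothetical data with groups appearing in the order y, x.
dt <- data.table(a = c("y", "x", "y", "x"), b = c(1, 2, 3, 4))

# by: groups are returned in order of first appearance (y, then x).
grouped <- dt[, .(total = sum(b)), by = .(a)]

# keyby: groups are returned sorted (x, then y) and the result is keyed on a.
sorted <- dt[, .(total = sum(b)), keyby = .(a)]
```
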

COMMON GROUPED OPERATIONS

dt[, .(c = sum(b)), by = a] - summarize rows within groups.

Create a data.table

dt[, .(b, c)] - extract column(s) by name.

Subset rows using i

dt[, j, by = .(a)] - group rows by values in specified column(s).

dt[, c(2)] - extract column(s) by number. Prefix column numbers with "-" to drop.
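
Column extraction by name and by number might be exercised as follows; the three-column table is an assumption made for the example.

```r
library(data.table)

# Hypothetical example data with three columns.
dt <- data.table(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE))

# Extract columns b and c by name; the result is a data.table.
by_name <- dt[, .(b, c)]

# Extract the second column by number.
by_number <- dt[, c(2)]

# Drop the second column by prefixing the number with "-".
dropped <- dt[, -c(2)]
```
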

Take data.table dt, subset rows using i, and manipulate columns with j, grouped according to by.
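
A single call can use all three parts at once. The sketch below assumes invented data and an illustrative result name, total.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 1, 2, 2, 3), b = c(10, 20, 30, 40, 50))

# i:  keep rows where b > 10
# j:  sum b over the remaining rows
# by: produce one result row per value of a
res <- dt[b > 10, .(total = sum(b)), by = a]
```

The row filter in i is applied first, so the first row (b = 10) does not contribute to the sum for a = 1.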

data.table(a = c(1, 2), b = c("a", "b")) - create a data.table from scratch. Analogous to data.frame().

dt[i, j, by]

data.tables are also data frames - functions that work with data frames therefore also work with data.tables.
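
A quick way to see this is to call ordinary data frame functions on a data.table. The table below is a made-up example.

```r
library(data.table)

dt <- data.table(a = c(1, 2), b = c("a", "b"))

# A data.table inherits from data.frame.
is_df <- is.data.frame(dt)

# Base R functions written for data frames accept it directly.
dims <- dim(dt)
col_names <- names(dt)
```
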

Group according to by

dt[, c := 1 + 2] - compute a column based on an expression.

dt[a == 1, c := 1 + 2] - compute a column based on an expression but only for a subset of rows.

dt[, `:=`(c = 1, d = 2)] - compute multiple columns based on separate expressions.
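
The three forms of := can be tried on a small table. In this sketch the data and the column names d, e and f are hypothetical and chosen so that each step adds a distinct column.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 2), b = c(5, 6))

# New column c with the value 3 in every row.
dt[, c := 1 + 2]

# New column d, filled only where a == 1; the other rows get NA.
dt[a == 1, d := b * 2]

# Several columns in one call using the functional form of :=.
dt[, `:=`(e = 1, f = 2)]
```

All three calls modify dt in place, so the result is not assigned back to dt.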

dt[, .SD[1], by = a] - extract first row of groups.

dt[, .SD[.N], by = a] - extract last row of groups.
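
Here .SD stands for the subset of data for the current group and .N for the number of rows in it. A sketch with invented data:

```r
library(data.table)

# Hypothetical example data: two rows for a = 1, three rows for a = 2.
dt <- data.table(a = c(1, 1, 2, 2, 2), b = c(10, 20, 30, 40, 50))

# First row of each group.
first_rows <- dt[, .SD[1], by = a]

# Last row of each group; .N is the group's row count.
last_rows <- dt[, .SD[.N], by = a]
```
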

Chaining

dt[...][...] - perform a sequence of data.table operations by chaining multiple "[]".
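
A chained call might look like the following sketch, which assumes made-up data: the first bracket summarizes by group and the second filters the summarized result.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 1, 2, 2), b = c(1, 2, 3, 4))

# Step 1: sum b within each value of a.
# Step 2: keep only the groups whose total exceeds 3.
res <- dt[, .(total = sum(b)), by = a][total > 3]
```

The second bracket operates on the output of the first, so it can refer to the new column total.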

Functions for data.tables

REORDER

DELETE COLUMN

dt[, c := NULL] - delete a column.

dt[, c := sum(b), by = a] - create a new column and compute rows within groups.
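
Combining := with by keeps every original row and repeats the group value on each of them. A sketch with invented data, which then removes the column again:

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 1, 2), b = c(2, 1, 2))

# Add column c holding the per-group sum of b; the row count is unchanged.
dt[, c := sum(b), by = a]
group_sums <- dt$c

# Remove column c again by assigning NULL.
dt[, c := NULL]
```
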

setorder(dt, a, -b) - reorder a data.table according to specified columns. Prefix column names with "-" for descending order.
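
A minimal sketch of setorder() on made-up data, sorting by a ascending and then by b descending within ties:

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 2, 1), b = c(2, 2, 1))

# Reorders dt in place; no assignment is needed.
setorder(dt, a, -b)
```

After the call the rows are (1, 2), (1, 1), (2, 2).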

* SET FUNCTIONS AND :=

LOGICAL OPERATORS TO USE IN i

<   >   ==   is.na()   !is.na()   %in%   !   |   &   %like%   %between%

CONVERT COLUMN TYPE

dt[, b := as.integer(b)] - convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.
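
Type conversion with := could be sketched as below. The column d and its date strings are hypothetical additions for the example.

```r
library(data.table)

# Hypothetical example data: a numeric column and dates stored as text.
dt <- data.table(b = c(1.5, 2.6), d = c("2024-01-15", "2024-02-20"))

# as.integer() truncates toward zero, so 1.5 becomes 1 and 2.6 becomes 2.
dt[, b := as.integer(b)]

# Convert the character column to Date.
dt[, d := as.Date(d)]
```

Note that as.integer() does not round, which matters when the fractional part is 0.5 or more.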

data.table's functions prefixed with "set" and the operator ":=" work without "...