Data Transformation with data.table :: CHEAT SHEET

Basics

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R's native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i, and manipulate columns with j, grouped according to by.

data.tables are also data frames – functions that work with data frames therefore also work with data.tables.

Create a data.table

data.table(a = c(1, 2), b = c("a", "b")) – create a data.table from scratch. Analogous to data.frame().

setDT(df)* or as.data.table(df) – convert a data frame or a list to a data.table.

Subset rows using i

dt[1:2, ] – subset rows based on row numbers.

dt[a > 5, ] – subset rows based on values in one or more columns.

LOGICAL OPERATORS TO USE IN i

<    <=    is.na()     %in%    |    %like%
>    >=    !is.na()    !       &    %between%

Manipulate columns with j

EXTRACT

dt[, c(2)] – extract column(s) by number. Prefix column numbers with "-" to drop.

dt[, .(b, c)] – extract column(s) by name.

SUMMARIZE

dt[, .(x = sum(a))] – create a data.table with new columns based on the summarized values of rows.

Summary functions like mean(), median(), min(), max(), etc. may be used to summarize rows.

COMPUTE COLUMNS*

dt[, c := 1 + 2] – compute a column based on an expression.

dt[a == 1, c := 1 + 2] – compute a column based on an expression, but only for a subset of rows. If c is a new column, rows outside the subset get NA.

dt[, `:=`(c = 1, d = 2)] – compute multiple columns based on separate expressions.

DELETE COLUMN

dt[, c := NULL] – delete a column.

CONVERT COLUMN TYPE

dt[, b := as.integer(b)] – convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.

Group according to by

dt[, j, by = .(a)] – group rows by values in specified column(s).

dt[, j, keyby = .(a)] – group and simultaneously sort rows according to values in specified column(s).

COMMON GROUPED OPERATIONS

dt[, .(c = sum(b)), by = a] – summarize rows within groups.

dt[, c := sum(b), by = a] – create a new column and compute its values within groups. Each row receives its group's result.

dt[, .SD[1], by = a] – extract first row of groups.

dt[, .SD[.N], by = a] – extract last row of groups.

Chaining

dt[…][…] – perform a sequence of data.table operations by chaining multiple "[]".

Functions for data.tables

REORDER

setorder(dt, a, -b) – reorder a data.table according to specified columns. Prefix column names with "-" for descending order.

* SET FUNCTIONS AND :=

data.table's functions prefixed with "set" and the operator ":=" work without …
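This is a minimal sketch of the Create a data.table entries. It assumes the data.table package is installed. The toy values and the names df, dt_copy and lst_dt are invented for illustration. The sketch contrasts as.data.table(), which returns a converted copy, with setDT(), which converts the object it is given in place.

```r
library(data.table)

# From scratch, analogous to data.frame()
dt <- data.table(a = c(1, 2), b = c("a", "b"))

# Hypothetical data frame to convert
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# as.data.table() returns a new data.table; df itself is not changed here
dt_copy <- as.data.table(df)

# setDT() converts df itself, so the result is not assigned with <-
setDT(df)

# A plain list converts the same way
lst_dt <- as.data.table(list(a = 1:2, b = c("p", "q")))

print(dt)
print(class(df))
```

Because a data.table still inherits from data.frame, functions written for data frames keep working on dt, df and lst_dt.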
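The row-subsetting entries and several of the logical operators for i can be checked on a small toy table. This sketch assumes data.table is installed. The data and the result names (first_two, big_a, combined, mid_range, x_like) are hypothetical.

```r
library(data.table)

dt <- data.table(a = c(2, 6, 5, 8), b = c("x", "y", "z", "x"))

first_two <- dt[1:2, ]                        # rows by position
big_a     <- dt[a > 5, ]                      # rows by value; columns are visible inside i
combined  <- dt[a > 4 & b %in% c("x", "z")]   # & and %in% combine conditions
mid_range <- dt[a %between% c(5, 6)]          # inclusive range test
x_like    <- dt[b %like% "^x"]                # regular-expression match on b

print(combined)
```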
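The next sketch covers the EXTRACT entries on a hypothetical three-column table, assuming data.table is installed. For contrast, it also shows that a bare column name in j returns a plain vector rather than a data.table. The result names are illustrative only.

```r
library(data.table)

dt <- data.table(a = 1:3, b = c(10, 20, 30), c = c("p", "q", "r"))

by_number <- dt[, c(2)]     # column 2 (b) as a one-column data.table
dropped   <- dt[, -c(2)]    # "-" drops column 2, keeping a and c
by_name   <- dt[, .(b, c)]  # .() is an alias for list() and returns a data.table
as_vector <- dt[, b]        # a bare name returns the column as a vector

print(by_name)
```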
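The SUMMARIZE entry can be illustrated as follows, assuming data.table is installed. The toy values and the extra column names mean_b and max_a are hypothetical. The second call combines i and j, so rows are filtered before they are summarized.

```r
library(data.table)

dt <- data.table(a = c(1, 4, 7), b = c(2, 2, 5))

# Each named element of .() becomes one column of a one-row result
summary_dt <- dt[, .(x = sum(a), mean_b = mean(b), max_a = max(a))]

# Filter with i first, then summarize the remaining rows in j
subset_sum <- dt[a > 2, .(x = sum(a))]

print(summary_dt)
```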
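This is a sketch of the COMPUTE COLUMNS* entries, assuming data.table is installed. The columns d, e and f are hypothetical names, chosen so that each step adds a fresh column. Creating d in the subset step shows the NA fill for rows outside the subset.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 1))

dt[, c := 1 + 2]                    # new column c, value 3 on every row, added in place
dt[a == 1, d := 1 + 2]              # only rows with a == 1 get 3; the new column is NA elsewhere
dt[, `:=`(e = a * 10, f = "set")]   # several columns from separate expressions in one call

print(dt)
```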
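This minimal sketch runs DELETE COLUMN and CONVERT COLUMN TYPE together on invented data, with data.table assumed installed. Note that as.integer() truncates toward zero rather than rounding.

```r
library(data.table)

dt <- data.table(a = 1:2, b = c(1.5, 2.6), c = c("x", "y"))

dt[, c := NULL]             # delete column c in place
dt[, b := as.integer(b)]    # replace b with its integer version: 1.5 -> 1, 2.6 -> 2

str(dt)
```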
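To see the difference between by and keyby, here is a sketch on toy data. The values and the result names are hypothetical, and data.table is assumed to be installed.

```r
library(data.table)

dt <- data.table(a = c(2, 1, 2, 1), b = c(1, 2, 3, 4))

# by: one row per group, groups in order of first appearance (2, then 1)
by_group <- dt[, .(c = sum(b)), by = a]

# keyby: same sums, but the result is sorted by a and keyed on it
sorted_by <- dt[, .(c = sum(b)), keyby = a]

print(by_group)
print(sorted_by)
```

Because by keeps groups in order of first appearance, keyby (or a later setorder()) is the safer choice when downstream code depends on a sorted result.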
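The remaining COMMON GROUPED OPERATIONS entries are sketched here on a hypothetical two-group table, with data.table assumed installed. .N holds the number of rows in the current group, which is why .SD[.N] picks the last row. The names firsts and lasts are illustrative.

```r
library(data.table)

dt <- data.table(a = c(1, 1, 2, 2, 2), b = c(5, 6, 7, 8, 9))

firsts <- dt[, .SD[1], by = a]    # first row of each group
lasts  <- dt[, .SD[.N], by = a]   # last row of each group

# Group total written back to every row of its group, in place
dt[, c := sum(b), by = a]

print(dt)
```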
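Chaining can be sketched as below, assuming data.table is installed. The column name total and the data are hypothetical. Each [] acts on the result of the previous one, so the filter total > 5 only works because the first step created total.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 1, 3), b = c(4, 5, 6, 7))

# Summarize by group, keep groups whose total exceeds 5, then sort by total, descending
result <- dt[, .(total = sum(b)), by = a][total > 5][order(-total)]

print(result)
```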
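This is a minimal sketch of setorder() on toy data, assuming data.table is installed. setorder() sorts dt in place, so its result does not need to be assigned back.

```r
library(data.table)

dt <- data.table(a = c(2, 1, 2, 1), b = c(1, 2, 3, 4))

# Ascending a; within ties on a, descending b
setorder(dt, a, -b)

print(dt)
```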
