Data Transformation with data.table :: CHEAT SHEET
Basics
Manipulate columns with j
data.table is an extremely fast and memory-efficient package
for transforming data in R. It works by converting R's native
data frame objects into data.tables with new and enhanced
functionality. The basics of working with data.tables are:
EXTRACT
SUMMARIZE
dt[, .(x = sum(a))] - create a data.table with new
columns based on the summarized values of rows.
Summary functions like mean(), median(), min(),
max(), etc. can be used to summarize rows.
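As a minimal sketch of summarizing in j, the example below builds a small table and computes several summaries at once. The data and the output column names total, average and largest are hypothetical, chosen only for illustration.

```r
library(data.table)

# Hypothetical example data; only the column name `a` follows the cheat sheet.
dt <- data.table(a = c(1, 2, 3, 4))

# Wrapping j in .() returns a new one-row data.table of summary values.
res <- dt[, .(total = sum(a), average = mean(a), largest = max(a))]
print(res)
```

Each summary function collapses the column to a single value, so the result has one row.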
COMPUTE COLUMNS*
setDT(df)* or as.data.table(df) - convert a data frame or a list to
a data.table.
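The two conversion routes differ in how they treat the original object. The sketch below assumes a small hypothetical data frame df and contrasts them: as.data.table() returns a new object, while setDT() converts its argument by reference.

```r
library(data.table)

# Hypothetical data frame used only for illustration.
df <- data.frame(a = c(1, 2), b = c("a", "b"))

# as.data.table() returns a new data.table; df itself stays a plain data frame.
dt_copy <- as.data.table(df)
is_dt_before <- is.data.table(df)

# setDT() converts df in place, so no assignment is needed.
setDT(df)
is_dt_after <- is.data.table(df)

print(c(before = is_dt_before, after = is_dt_after))
```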
dt[1:2, ] - subset rows based on row numbers.
dt[a > 5, ] - subset rows based on values in
one or more columns.
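The row filters above can be sketched on a small table as follows. The data and the result variable names are hypothetical. The last two filters are illustrative combinations that use base R's %in% and data.table's %between%.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(2, 6, 5, 8), b = c("x", "y", "x", "z"))

first_two <- dt[1:2, ]                        # rows 1 and 2
large_a   <- dt[a > 5, ]                      # rows where a exceeds 5
combined  <- dt[a > 5 & b %in% c("x", "y"), ] # conditions on two columns
in_range  <- dt[a %between% c(5, 6), ]        # bounds are inclusive by default

print(large_a)
```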
dt[, j, by = .(a)] - group rows by
values in specified columns.
dt[, j, keyby = .(a)] - group and
simultaneously sort rows by values
in specified columns.
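The difference between by and keyby shows up in the order of the result. In this sketch the data and the summary column total are hypothetical: by keeps groups in order of first appearance, while keyby sorts them and sets a key.

```r
library(data.table)

# Hypothetical example data with groups appearing out of order.
dt <- data.table(a = c(2, 1, 2, 1), b = c(10, 20, 30, 40))

# by: groups come back in order of first appearance (a = 2, then a = 1).
by_res <- dt[, .(total = sum(b)), by = .(a)]

# keyby: groups come back sorted (a = 1, then a = 2) and the result is keyed on a.
keyby_res <- dt[, .(total = sum(b)), keyby = .(a)]

print(by_res)
print(keyby_res)
```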
COMMON GROUPED OPERATIONS
dt[, .(c = sum(b)), by = a] - summarize rows within groups.
Create a data.table
dt[, .(b, c)] - extract columns by name.
Subset rows using i
dt[, c(2)] - extract columns by number. Prefix
column numbers with "-" to drop.
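A short sketch of extracting columns by name and by number, on a hypothetical three-column table. It assumes a reasonably recent data.table release, where a purely numeric j selects columns by position.

```r
library(data.table)

# Hypothetical example data with the cheat sheet's column names a, b, c.
dt <- data.table(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE))

by_name   <- dt[, .(b, c)]  # columns b and c
by_number <- dt[, c(2)]     # second column only
dropped   <- dt[, -c(2)]    # everything except the second column

print(names(dropped))
```

All three results are data.tables, not bare vectors.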
Take data.table dt,
subset rows using i
and manipulate columns with j,
grouped according to by.
data.table(a = c(1, 2), b = c("a", "b")) - create a data.table from
scratch. Analogous to data.frame().
dt[i, j, by]
data.tables are also data frames - functions that work with data
frames therefore also work with data.tables.
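To make the general form dt[i, j, by] concrete, the sketch below uses all three arguments in one call on hypothetical data, then checks that ordinary data frame functions accept the same object.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 1, 2, 2), b = c(10, 20, 30, 40))

# i: keep rows where b > 10; j: sum b; by: one result row per value of a.
res <- dt[b > 10, .(total = sum(b)), by = a]
print(res)

# A data.table also inherits from data.frame, so data frame functions apply.
is_df <- is.data.frame(dt)
n_rows <- nrow(dt)
```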
Group according to by
dt[, c := 1 + 2] - compute a column based on
an expression.
dt[a == 1, c := 1 + 2] - compute a column
based on an expression but only for a subset
of rows.
dt[, `:=`(c = 1 , d = 2)] - compute multiple
columns based on separate expressions.
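The three forms of := can be sketched on one table. The data are hypothetical, and the new columns are named c, d, e and f here (rather than reusing c and d) so that each step stays visible.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 2), b = c(5, 6))

dt[, c := 1 + 2]           # every row gets c = 3
dt[a == 1, d := 1 + 2]     # only rows with a == 1 get d = 3; others are NA
dt[, `:=`(e = 1, f = 2)]   # several columns in one call

print(dt)
```

Each call modifies dt in place, so the result does not need to be assigned back.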
dt[, c := sum(b), by = a] - create a new column whose values are
computed within groups.
dt[, .SD[1], by = a] - extract first row of groups.
dt[, .SD[.N], by = a] - extract last row of groups.
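The grouped operations above can be tried together on a small hypothetical table. The result variable names sums, first_rows and last_rows are illustrative only.

```r
library(data.table)

# Hypothetical example data: two groups of two rows each.
dt <- data.table(a = c(1, 1, 2, 2), b = c(10, 20, 30, 40))

# One summary row per group.
sums <- dt[, .(c = sum(b)), by = a]

# Add the group total to every row of dt, in place.
dt[, c := sum(b), by = a]

# .SD is the subset of data for each group; .N is the group's row count.
first_rows <- dt[, .SD[1], by = a]
last_rows  <- dt[, .SD[.N], by = a]

print(first_rows)
print(last_rows)
```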
Chaining
dt[...][...] - perform a sequence of data.table operations by
chaining multiple "[]".
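As an illustration of chaining, the sketch below filters, summarizes by group, and then sorts, each step in its own pair of brackets. The data and the column name total are hypothetical.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 1, 2, 2, 3), b = c(10, 20, 30, 40, 50))

# Step 1: keep rows with b > 10.
# Step 2: total b per group.
# Step 3: sort groups by descending total.
res <- dt[b > 10][, .(total = sum(b)), by = a][order(-total)]

print(res)
```

Each bracket receives the data.table returned by the previous one, so no intermediate variables are needed.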
Functions for data.tables
REORDER
setorder(dt, a, -b) - reorder a data.table
according to specified columns. Prefix column
names with "-" for descending order.
DELETE COLUMN
dt[, c := NULL] - delete a column.
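A brief sketch of reordering and column deletion on hypothetical data. Both operations change dt in place, so nothing is assigned.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 2, 1), b = c(2, 2, 1), c = c("x", "y", "z"))

# Sort by a ascending, then by b descending within each value of a.
setorder(dt, a, -b)

# Remove column c by assigning NULL to it.
dt[, c := NULL]

print(dt)
```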
* SET FUNCTIONS AND :=
LOGICAL OPERATORS TO USE IN i
<    <=    is.na()     %in%    |    %like%
>    >=    !is.na()    !       &    %between%
CONVERT COLUMN TYPE
dt[, b := as.integer(b)] - convert the type of a
column using as.integer(), as.numeric(),
as.character(), as.Date(), etc.
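A minimal sketch of a type conversion, using hypothetical values. Note that as.integer() truncates toward zero rather than rounding.

```r
library(data.table)

# Hypothetical example data: a numeric (double) column.
dt <- data.table(b = c(1.5, 2.6))
type_before <- class(dt$b)

# Replace the whole column with its integer version, in place.
dt[, b := as.integer(b)]
type_after <- class(dt$b)

print(dt)
```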
data.table's functions prefixed with "set" and the operator ":="
work without "...