Sebkorb.files.wordpress.com



R basics and important commands

How to set up R:
- Download R
- Install RStudio, an editor that makes it much nicer to work with R (although not quite as nice as the Matlab interface)
- When you need to install new packages, it is better to do it in R itself

How to master the basics of the R language:
- Read the manual (very long and boring)
- Have a look at this short list of the most useful R commands (very useful and highly recommended). For many of them, however, you may have to look up the help in order to use them correctly
- Use the website called Quick-R
- Use the help: type a ? followed by the name of the function to read everything about it
- Have a look at the short list of commands below
- If you are used to Matlab, watch out for some traps, e.g.:
  - Changes in vectors, matrices, etc. occur in the workspace, but not in the visual output of them (although that issue seems to be resolved in newer versions of RStudio)
  - Elements of a vector can have names, but this does not turn the vector into a 2-dimensional matrix (as if you had added a row of names)

Here is my preliminary and non-exhaustive list of some of the basic R commands (comments are preceded by #):

# get or set working directory
getwd()
setwd("/YOURDIRECTORY/")
# list contents of workspace
ls()
# see all attached packages and objects in workspace
search()
# to see the paths where the packages are stored
searchpaths()
# clear all
rm(list = ls())
# clear only specific elements from workspace
rm(element1, element2)
# Read data with headers
d <- read.table("blablabla.dat", header = TRUE, sep = "\t")
# or define a directory and indicate it when loading/saving files
wd_Offset <- "/YOURDIRECTORY/"
datamat <- read.delim(paste(wd_Offset, "Offset_all_subs.txt", sep = ""), header = TRUE) # import data
# Install new packages (better to do this directly in R instead of RStudio)
install.packages("lme4")
# Load new packages (do this only once at the beginning of each new session)
library(lme4)
# explore data
names(d) # if d is a data frame, get the
# names of its columns
some(d)     # from the car package: shows a random sample of rows
summary(d)
describe(d) # from the psych or Hmisc package
str(d)
class(d)
# Many R objects have a class attribute, a character vector giving the names of the classes from which the object inherits.
# If the object does not have a class attribute, it has an implicit class, "matrix", "array" or the result of mode(x)
# (except that integer vectors have implicit class "integer").
# determine type of variable
typeof(d)
typeof(d[, 1])
# check dimensions of a matrix (here called mat)
number_rows <- nrow(mat)
number_cols <- ncol(mat)
dim(mat) # gives number of rows and columns
# check occurrence of each Subject in the matrix
table(d$SubID)
# know number of participants (stored in variable id) in a matrix called d:
length(unique(d$id))
# get number of unique values in a variable
table(d$gender)
# but if you have several lines for each participant use:
table(unique(d[, c("id", "gender")])$gender)
# sort data by a specific variable (column)
newdata <- data[order(data$yourVariableName), ]
newdata <- data[order(-data$yourVariableName), ] # same but descending
# create vector
vec <- c(1, 2, 3, 1, 2, 3)
# replicate elements of a vector
rep(YourVector, NumberReplications)
# e.g. to replicate the vector "vec" 3 times:
rep(vec, 3)
# call sub-element of a vector
vec[5] # which is = 2
# generate factor levels
# factors are one way to store data in R. They could for example contain the
# conditions of your experiment
gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)
# where n = number of levels, k = number of replications
# for example, create a vector with 3 levels and 10 replications:
gl(3, 10, labels = c("Label1", "Label2", "Label3"))
# if you want to make a data.frame out of this factor:
data <- data.frame(gl(3, 10, labels = c("Label1", "Label2", "Label3")))
# change levels of a factor
# if you try to replace the content of a factor with a new level, R will give a warning and insert NA.
# Instead, you should operate on the levels:
# here in the column "time" of the data frame "datamat_EMG_long" I replace the level "Corr_t1" with "t1":
levels(datamat_EMG_long$time)[match("Corr_t1", levels(datamat_EMG_long$time))] <- "t1"
# change factor into something else:
R has a number of (undocumented) convenience functions for converting factors:
as.character.factor
as.data.frame.factor
as.Date.factor
as.list.factor
as.vector.factor
But annoyingly, there is nothing to handle the factor -> numeric conversion (see this link).
And according to this link: To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
# create matrix
matrix(data, nrow, ncol, byrow)
# The data argument is usually a vector of the elements that will fill the matrix. The nrow and ncol arguments specify
# the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in
# data and ncol is specified to be 4, then R will automatically calculate that there should be 5 rows and 4 columns,
# since 4*5=20. The byrow argument specifies how the matrix is to be filled. The default value for byrow is FALSE,
# which means that by default the matrix will be filled column by column.
# transform into data.frame
# data frames are a type of object in R.
# They are tables with rows for subjects and columns for variables
data <- data.frame(mat)
# create a data frame from scratch with variable names
# for example, create a data frame with 3 variables and 30 rows (i.e. NumberLevels = 3, NumberRepetitions = 10):
data <- data.frame(NameVariable1 = gl(NumberLevels, NumberRepetitions, labels = c("Label1", "Label2", "Label3")),
                   NameVariable2 = factor(rep(1:10, 3)),
                   NameVariable3 = 1:30)
# call sub-elements of a matrix/data frame
# for example, select only the rows corresponding to Condition 1
condition_blabla <- data[data[, "Condition"] == 1, ]
# or select rows with Condition == 1 and where Emotion is not Anger:
condition_blabla <- data[data[, "Condition"] == 1 & data[, "Emotion"] != "Anger", ]
# 3 ways to select a specific column, use either its name:
condition_blabla <- data[, "VariableName"]
# or its number:
condition_blabla <- data[, 3]
# or use the $ sign:
condition_blabla <- data$VariableName
# to select rows 2, 4, and 6 with all the columns:
condition_blabla <- data[c(2, 4, 6), ]
# or use the function subset()
newdata <- subset(data, REJECTED == 0) # take only trials with 0 in column REJECTED
# or use a Boolean variable
boolean.var <- data$emotion == "Happy" & data$RATINGJOIE > data$RATINGCOLERE
data[boolean.var, 20] <- 1 # put 1 into column 20 for the trials where boolean.var is TRUE
# change name of one column in a matrix or data frame
colnames(data_long_2)[7] <- "EMG"
# or, for several columns at once:
colnames(data_long_2)[35:37] <- c("Corr", "Orb", "Zygo")
# attach/detach
# instead of calling the column (variable) of a data.frame using for example the
# $ sign, you can attach the entire data frame and access a variable by just
# typing its name:
attach(data)
VariableName
# this creates an invisible copy of the variable in R.
# So be aware that whatever you change in this attached variable does not change the same variable in the
# original data frame
# when no longer needed, detach the data frame:
detach(data)
# get number of rows and columns
nrow(x)
ncol(x)
# transpose a vector, matrix, or data frame
t(data)
# sum or mean of a row/column
rowSums(x, na.rm = FALSE, dims = 1)
rowMeans(x, na.rm = FALSE, dims = 1)
colSums(x, na.rm = FALSE, dims = 1)
colMeans(x, na.rm = FALSE, dims = 1)
# x must have at least 2 dimensions, e.g.:
colMeans(data)
# compute means per group and condition
# e.g. if your column Condition contains 3 levels, and your column Emotion contains 2 levels, compute the mean for each cell
cell_means <- aggregate(data$YOURDEPENDENTVARIABLE, list(data$Condition, data$emotion), mean)
# or use ddply (from the plyr package):
summary <- ddply(data, .(factor1, factor2), summarise, val = mean(DV))
# to apply it to several columns:
summary <- ddply(data_EMG, .(factor1, factor2), colwise(mean, c("var1", "var2", "var3")))
# Centering variables
IMPORTANT: Only within-subjects IVs need to be centered, NOT the between-subjects IVs (is this true?), and NOT THE DVs!!
Categorical IVs: When a factor has more than 2 levels, lm.setContrasts is a better option to define the contrasts (and center the variable).
You can do dummy coding to define orthogonal contrasts
# center a continuous variable
d$XC <- scale(d$X, center = TRUE, scale = FALSE)
# Center a continuous within-subjects IV: center SEPARATELY for each subject (cluster-mean centering)
data_authenticity_no_bilat_clean <- ddply(data_clean, "Subject", transform,
    YOUR_WITHIN_SUB_IV_C = YOUR_WITHIN_SUB_IV - mean(YOUR_WITHIN_SUB_IV, na.rm = TRUE))
# center a categorical variable
# give +0.5 to males, and -0.5 to females:
data2$Sex_C <- ifelse(data2$Sex == "male", .5, -.5)
# ifelse() can be nested in many ways:
ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))
ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)
ifelse(<condition>, ifelse(<condition>, <yes>, <no>), ifelse(<condition>, <yes>, <no>))
ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>)))
# for example, when the variable TMS has 3 levels (NA, rather than "", keeps the result numeric):
data_long$TMS_C <- ifelse(data_long$TMS == "VTX", 0,
                   ifelse(data_long$TMS == "M1", 1,
                   ifelse(data_long$TMS == "SMS", -1, NA)))
# ifelse with multiple conditions
# concatenate multiple conditions that need to be met using '&':
data_OAS_Side$wrong_OAS_RESP <- ifelse(data_OAS_Side$OASSide == "left" &
                                       data_OAS_Side$sideRESP == "{RIGHTARROW}" &
                                       data_OAS_Side$side_laterRESP == "", 1, 0) # assumption: 0 when the conditions are not met
# for loop
for (i in 1:10) {
  whatever <- dothis
}
# if
if (p == 1) {
  dothis
  datamat <- data_temp
} else if (p == 12) {
  dothat
} else {
  dothatotherthing
}
# for loop with if (e.g.
# to concatenate data files from participants)
subjects <- 1:32
for (participant in 1:length(subjects)) {
  sub <- subjects[participant]
  if (participant == 1) {
    data_temp <- read.delim(paste(sourceWD, "results_sub_", sprintf("%02d", sub), ".txt", sep = ""), header = TRUE) # import data
    datamat <- data_temp
    # rm(data_temp)
  } else if (participant == 12) {
    # sub 12 has an 'a' and a 'b' data file
    data_temp_a <- read.delim(paste(sourceWD, "results_sub_", sprintf("%02da", sub), ".txt", sep = ""), header = TRUE) # import data a
    data_temp_b <- read.delim(paste(sourceWD, "results_sub_", sprintf("%02db", sub), ".txt", sep = ""), header = TRUE) # import data b
    datamat <- rbind.data.frame(datamat, rbind.data.frame(data_temp_a, data_temp_b))
    rm(data_temp_a, data_temp_b)
  } else if (participant == 14) {
    # sub 14 only has an 'a' part
    data_temp_a <- read.delim(paste(sourceWD, "results_sub_", sprintf("%02da", sub), ".txt", sep = ""), header = TRUE)
    datamat <- rbind.data.frame(datamat, data_temp_a)
    rm(data_temp_a)
  } else {
    # all other participants
    data_temp <- read.delim(paste(sourceWD, "results_sub_", sprintf("%02d", sub), ".txt", sep = ""), header = TRUE) # import data
    datamat <- rbind.data.frame(datamat, data_temp)
    rm(data_temp)
  }
}
# find trials of a specific subject (using %in% in a for loop)
subjects <- 1:32
for (participant in 1:length(subjects)) {
  sub <- subjects[participant]
  data_sub <- datamat_clean[datamat_clean$sub %in% sub, ]
  # …
}
# concatenate vectors
newvector <- c("X", "Y", "Z")
d[, c("X", "Y", "Z")] # extract the same variables out of d and concatenate them
# concatenate columns from matrices
data_new <- cbind(data1, data2[, 3:18])
# bind columns to matrix (without changing to double)
cbind.data.frame()
# merge 2 data-frames based on a variable (e.g.
# subjectnum)
# works even when the 2 data frames have different numbers of rows per subject
DATA <- merge(MATRIX1, MATRIX2, by = c("subjectnum", "other variable"))
# Reshape Data (very cool!)
# The reshape2 package contains two general functions for reshaping data:
# melt(), which takes multiple columns of data and transforms them into multiple rows,
# and cast(), which takes several rows and transforms them into multiple columns.
# There are two versions of cast(): acast(), which returns a vector, matrix, or array, and dcast(), which returns a data frame.
# melt
melt(data, id.vars, measure.vars, variable.name = "variable", ..., na.rm = FALSE, value.name = "value")
# A general flexible tool is the by() function, which simply applies a function according to a grouping factor
by(d[, c("X", "Y")], d$id, cor) # Within-participant correlations of X and Y
# do whatever you want to a data.frame
transform()
# apply any function to all subjects (ddply is in the plyr package)
ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .drop = TRUE, .parallel = FALSE)
# e.g.
# to center Variable1 for each participant (participant IDs are in the variable "id"):
d <- ddply(d, "id", transform, Variable1C = Variable1 - mean(Variable1, na.rm = TRUE))
# or count the number of times a column contains the word Smile or Frown, separately by participant and condition:
smiles_and_frowns <- ddply(data_clean_EMG_500ms_after_SO, .(Condition, participant), summarise,
                           Smile = sum(SubjExpression == "Smile"),
                           Frown = sum(SubjExpression == "Frown"))
# Save a matrix as a tab-delimited .txt file
write.table(mat, file = "mymatrix.txt", sep = "\t")
# w/o row names
write.table(mat, file = "mymatrix.txt", sep = "\t", row.names = FALSE)
# find outliers (> 3 SDs above or below M of each subject)
datamat_clean <- datamat
# add columns to mark outliers
datamat_clean$outlier_rt_wanting <- 0
datamat_clean$outlier_rt_liking <- 0
for (participant in 1:length(subjects)) {
  sub <- subjects[participant]
  where <- which(datamat_clean$sub == sub) # find trials of that subject
  M_rt_wanting <- mean(datamat_clean[where, "rt_wanting"], na.rm = TRUE)
  SD_rt_wanting <- sd(datamat_clean[where, "rt_wanting"], na.rm = TRUE)
  M_rt_liking <- mean(datamat_clean[where, "rt_liking"], na.rm = TRUE)
  SD_rt_liking <- sd(datamat_clean[where, "rt_liking"], na.rm = TRUE)
  # mark only this subject's trials, so that the marks of earlier subjects are not overwritten
  datamat_clean$outlier_rt_wanting[where] <- as.numeric(abs(datamat_clean$rt_wanting[where] - M_rt_wanting) >= 3 * SD_rt_wanting)
  datamat_clean$outlier_rt_liking[where] <- as.numeric(abs(datamat_clean$rt_liking[where] - M_rt_liking) >= 3 * SD_rt_liking)
  rm(where, M_rt_wanting, M_rt_liking, SD_rt_wanting, SD_rt_liking)
}
# display something in Console (e.g.
# number of outliers)
cat("number outliers rt wanting = ", sum(datamat_clean$outlier_rt_wanting, na.rm = TRUE))
# making a vector with numbers 1 to 10 increasing by one
vec <- 1:10
# making a vector with numbers 1 to 10 and increments other than one
vec <- seq(1, 10, 2) # increase by 2
vec <- seq(1, 10, 3.1) # increase by 3.1
# Graphing: create a scatter plot matrix
# use the function scatterplotMatrix() in library(car) (called scatterplot.matrix() in older versions of car), e.g.:
scatterplotMatrix(~ mpg + disp + drat + wt | cyl, data = mtcars, main = "Three Cylinder Options")
# barplot with error bars using ggplot (2 IVs and 1 DV)
library(ggplot2)
wd <- getwd()
name <- paste(wd, "/YOURFILENAME.pdf", sep = "")
pdf(name, width = 8, height = 8)
ggplot(YOURDATA, aes(x = IV1, y = DV, fill = IV2)) +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge()) + # use fun.y instead of fun in ggplot2 versions before 3.3.0
  coord_cartesian(ylim = c(mean(YOURDATA$DV, na.rm = TRUE) - 1, mean(YOURDATA$DV, na.rm = TRUE) + 1)) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position = position_dodge()) + # mean_cl_normal needs the Hmisc package
  xlab("VAR1") +
  ylab("DV") +
  ggtitle("YOUR TITLE")
dev.off()
# plot 2-way interaction with ggplot
summary <- ddply(data, .(factor1, factor2), summarise, val = sum(DV))
ggplot(data, aes(x = factor(factor1), y = DV, colour = factor2)) +
  # geom_boxplot() +
  geom_point(data = summary, aes(y = val)) +
  geom_line(data = summary, aes(y = val, group = factor2)) +
  theme_bw()
# lm() function
# To fit a GLM to a set of variables in a data frame, we use the lm() function
# Generic syntax: lm(Dependent Variable ~ Independent Variables)
# With only an intercept, this is equivalent to fitting the following model: Y = b0 + e, with Y-hat = b0
# Y-hat = Predicted Value
# b0 = Parameter Estimate (Intercept in this case)
# e = Error or residual
# The formula portion is specified a bit like you would mathematically specify a line
# It uses the following syntax: x1 + x2 + . . .
# xn
# The intercept is implied, but can be specified explicitly by adding '+ 1'
# The intercept can also be suppressed by adding '+ 0'
# One Parameter Model
m1 <- lm(dv ~ 1, data = d) # Fit a one-parameter model (m1 = model 1)
summary(m1)
# Alternative way to write the same command:
m1 <- lm(d$dv ~ 1)
# You could also use d$variableName within the lm() function, but when you have several variables it is much easier to use data = d
names(m1) # This tells us the various attributes of m1
m1$coefficients # parameter estimates (here only b0)
# DATA = MODEL + ERROR
d$dv # DATA
d$fv <- m1$fitted.values # MODEL: fitted values (Y-hat)
d$e <- m1$residuals # ERROR: residuals (error)
d[, c("dv", "fv", "e")] # data = fitted value + error
# Calculate the sample standard deviation
# Use the residuals of the model and the residual df
# sd <- sqrt(SSE/df)
SSE <- sum(m1$residuals^2) # Sum of Squared Error (SSE)
SSE
DF <- m1$df.residual # Denominator df
DF
SD <- sqrt(SSE/DF)
SD # Residual standard error (or standard error of the estimate)
# Standard Error = sd / sqrt(n), here with n = 8
SD/sqrt(8)
summary(m1)
# Calculate the t-value = Parameter Estimate / Standard Error
tValue <- mean(d$dv) / (SD / sqrt(8))
# P-Value: Test the probability that this tValue (or a more extreme one)
# would be found in a random sample of n=8 from the population
pt(abs(tValue), df = m1$df.residual, lower.tail = FALSE) * 2 # two-sided p-value
# Generate a confidence interval for the intercept
confint(m1) # 95% Confidence interval
confint(m1, level = .95) # If you want a different confidence interval you can specify the value
# fit a linear model for each participant!
olsmod <- lmList(reaction ~ deviance | id, data = d) # lmList() is available in the lme4 and nlme packages
# make predictions based on a regression
model <- lm(Y ~ X, data = d)
predict(model, newdata = data.frame(X = x_new)) # x_new = point for which you want a prediction

lmer() function from the lme4 package
# to fit an LMM on data in the long format (every line = one trial)
library(lme4) # load packages
library(lmerTest)
# define model, DV is regressed on the main
# effects of 2 fixed effects, and their interaction; include intercepts by subject and stimulus as random effects:
mod1 <- lmer(DV ~ IV1 * IV2 + (1 | sub) + (1 | stim), data = YOURDATA)
summary(mod1)
anova(mod1)
# post-hoc tests: pairwise comparisons
library(lsmeans) # load package lsmeans
# use lsmeans from the lsmeans and not from the lmerTest package
lsmeans <- lsmeans::lsmeans
# pairwise comparisons between levels of IV1 and IV2, with Bonferroni correction
leastsquare <- lsmeans(mod1, pairwise ~ IV1:IV2, adjust = "bonferroni")

Statistical functions in R

Function in R: rcorr
Package: Hmisc
Use: Computes a matrix of Pearson's r or Spearman's rho rank correlation coefficients for all possible pairs of columns of a matrix

Function in R: lme
Package: nlme
Use: Linear mixed-effects model
Theory: Mixed-effects models describe a relationship between a response variable and some of the covariates that have been measured or observed along with the response. They incorporate fixed-effects parameters (e.g. your experimental conditions) and random effects, which are unobserved random variables. In mixed-effects models at least one of the covariates is a categorical covariate (e.g. subjects, or gender). In a linear mixed model the unconditional distribution of the random effects (B) and the conditional distribution (Y | B = b) are both multivariate Gaussian (or "normal") distributions.
lmer (and lm) can be used to run a repeated measures ANOVA. For that, using lmer, you need to include all the within-subjects IVs in the random effects!!

Function in R: lmer
Package: lme4
Use: Linear mixed-effects model

Function in R: lm()
Use: General linear model
Theory: The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. The general linear model is a generalization of the multiple linear regression model to the case of more than one dependent variable. Hypothesis tests with the general linear model can be made in two ways: multivariate or as several independent univariate tests.
In multivariate tests the columns of Y are tested together, whereas in univariate tests the columns of Y are tested independently, i.e., as multiple univariate tests with the same design matrix.
Multiple linear regression is a generalization of linear regression that considers more than one independent variable, and a specific case of general linear models formed by restricting the number of dependent variables to one.
An application of the general linear model appears in the analysis of multiple brain scans in scientific experiments, where Y contains data from brain scanners and X contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as mass-univariate in this setting) and is often referred to as statistical parametric mapping.

Function in R: glm()
Use: Generalized linear model
Theory: If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax assumptions. Examples: ordinary least squares, logistic regression, Poisson regression, gamma regression.

Function in R: glmer
Package: lme4
Use: Generalized linear mixed-effects model (GLMM)
Theory: Generalized linear mixed models (or GLMMs) are an extension of linear mixed models to allow response variables from different distributions, such as binary responses. Alternatively, you could think of GLMMs as an extension of generalized linear models (e.g., logistic regression) to include both fixed and random effects (hence mixed models).

Package: lmerTest
Use: Tests for random and fixed effects for linear mixed-effects models (lmer objects of the lme4 package), i.e. get the p-values for F statistics

Function in R: stan
Package: rstan
Use: Bayesian LMMs
Theory: First write a model in a text editor and save it as .stan. In that file you can define the priors. Then fit the model to the data and check the posteriors. For info see the blog by Shravan Vasishth at Uni Potsdam. There is also a paper with the title "How to become a Bayesian in eight easy steps: An annotated reading list". This tells you which books/articles to read (see also blog).
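
Three of the idioms from the command list (the factor -> numeric conversion, cluster-mean centering, and marking outliers separately for each subject) can be tried out on a toy data set. What follows is only a minimal sketch in base R: the factor f, the data frame d and its columns id and rt are made up for illustration, and ave() is used here in place of ddply() so that no extra package is needed.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "50", "20"))
codes  <- as.numeric(f)             # internal integer codes, NOT the labels
values <- as.numeric(levels(f))[f]  # the numbers stored in the labels

# Hypothetical toy data: 2 subjects with 20 trials each;
# the last trial of subject s1 is an extreme value
d <- data.frame(
  id = rep(c("s1", "s2"), each = 20),
  rt = c(rep(c(1, 2), length.out = 19), 100, rep(c(3, 4), 10))
)

# Cluster-mean centering: subtract each subject's own mean
subject_mean <- ave(d$rt, d$id, FUN = function(x) mean(x, na.rm = TRUE))
d$rt_c <- d$rt - subject_mean

# Outliers: 3 or more SDs away from the subject's own mean
subject_sd <- ave(d$rt, d$id, FUN = function(x) sd(x, na.rm = TRUE))
d$outlier_rt <- as.numeric(abs(d$rt_c) >= 3 * subject_sd)

cat("number of outliers = ", sum(d$outlier_rt, na.rm = TRUE), "\n")
```

Because ave() returns a vector with one value per row, each trial is compared with the mean and SD of its own subject, which is the behaviour the per-subject loop aims for.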