An introduction to S-plus - Imperial College London



An introduction to S-plus

These notes are adapted from notes written by Dr. Sara Morris of the Department of Epidemiology and Public Health, Imperial College School of Medicine, and Dr Gavin Shaddick of the Department of Mathematics, University of Bath.

Table of Contents

A brief introduction

Getting started

Simple arithmetic

Simple numeric functions

Objects in S-plus

Object assignment

Managing objects

Logical values

Vectors

Sequences

Vector arithmetic

Extracting elements of a vector

Simple vector argument functions

Matrices

Creating matrices

Matrix subscripts

Direct assignment in matrices

Binding bits together

Matrix arithmetic

Functions for working with matrices

Character matrices

Reading in data

Missing data values

Dataframes

Lists and factors

Factors

Graphics

Basic plotting

Basic Statistics

Univariate data summaries

Bivariate data summaries

Traditional statistical tests

Generating random variates

Scripts

User defined functions

A brief introduction

S-plus is an integrated suite of software facilities for data manipulation, calculation and graphical display. The Windows version of S-plus that you will be using offers both a menu-based interface and a command-line interface. The command-line interface is more flexible, and allows users to write their own functions. Most of the functions you will be using during this practical have been specially written and are not supplied as standard with the software. These introductory notes focus on the command-line language; please refer to the on-line S-plus help menu for the corresponding dialog box options.

S-plus consists of a series of objects (which can contain data) on which functions (which are themselves objects) are performed. One of the simplest forms of object is the vector, which is an ordered collection of numbers or characters.

As an example, we can set up a simple vector of numbers and find the mean. To set up a vector, x, consisting of the three numbers 10, 12 and 15 we use the concatenate function, c().

> x <- c(10, 12, 15)

> mean(x)

[1] 12.33333

Simple numeric functions

> sin(3.14159) # sin of pi

[1] 2.65359e-06 # close …

> sin(pi) # pi can be used as a given constant

[1] 1.224647e-16

Other common mathematical functions are exp, log and abs.

These functions can be nested and combined as in sqrt(sin(45*pi/180)). Note the use of parentheses to explicitly determine the arguments of each function, sqrt and sin.
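As a minimal sketch of how nesting is evaluated, the calculation above can be built up one step at a time and compared with the single nested call. The object names (angle, inner, outer.value, nested) are illustrative and do not come from the original notes; the code is written to run in S-plus and should also run in R.

```r
# Illustrative names only; none of these objects are part of the original notes.
angle <- 45 * pi / 180              # 45 degrees expressed in radians
inner <- sin(angle)                 # innermost function first: about 0.7071
outer.value <- sqrt(inner)          # then the outer function: about 0.8409
nested <- sqrt(sin(45 * pi / 180))  # the same calculation as one nested call
back <- abs(log(exp(-2)))           # exp, log and abs nest in the same way: 2, up to rounding
```

Building the expression up in steps like this is a useful way to check a long nested call.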

Objects in S-plus

S-plus is what is known as ‘object-orientated’ software; the word object is synonymous with the word ‘thing’. So anything with a name in S-plus is an object: functions, vectors, matrices, lists and so on.

Object assignment

You can assign a value to an object with the <- operator:

> x <- sqrt(2)

> x

[1] 1.414214

Typing the name of an object prints its value to the screen. You can use x in arithmetic operations such as x**3 and sin(x). Object names must start with a letter and may contain letters, numbers and dots, but not underscore ‘_’ characters.

S-plus is case sensitive. If you create an object with a name already used by S-plus you will get a warning, either when you make the assignment or when you call the object.

For example, if you create an object called c, later you will see warning: looking for object of mode function, ignored one of mode numeric. This is because there is already an S-plus function named c, namely the concatenate function that puts data into a vector.
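The naming rules can be tried out with a short sketch. The object names height.cm and Height.cm are hypothetical, chosen only to illustrate dots in names and case sensitivity. In S-plus the call to c() should produce the warning described above; other dialects of S, such as R, may stay silent.

```r
# Hypothetical object names, used only to illustrate the naming rules.
height.cm <- 172                 # dots are allowed in object names
Height.cm <- 180                 # a different object: names are case sensitive
same <- height.cm == Height.cm   # F, the two objects hold different values
c <- 3                           # a numeric object sharing its name with the function c()
v <- c(1, 2, c)                  # the function c() is still found when it is called
rm(c)                            # remove the numeric object; the function c() is unaffected
```

Removing the clashing object with rm() is the simplest way to stop the warning from reappearing.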

Managing objects

When you first make an assignment, the object is stored in the _Data directory, which is created when you first start S-plus. You can list the objects in this directory from within S-plus using the ls() function.

> ls()

[1] ".Last.value" "x"

The result is a character vector of the names of the objects (those beginning with a dot are S-plus housekeeping files). To remove an object, use rm().

> rm(x)

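To make a copy, just assign it to a new object, e.g. xnew <- x.

When you refer to an object, S-plus looks for it in the directories on its search list, which you can display with the search() function:

> search()

[1] "_Data"

[2] "//splus//.Functions"

[3] "//splus//library//trellis//_Data"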

You can list the objects in any of those directories using an optional argument to the ls() function, called pos, for position:

> ls(pos=2)

[1] "%*%" "%*%.default" … # a very large vector of S-plus function names

[1273] "zs.p" "zs.s" "zs.u" "zs.xbar"

You can add new places (directories) to your search path using the attach() and library() functions. S-plus will then be able to access objects in these other directories.

Logical values

S-plus enables you to compute with Boolean or logical values. A logical value is either true or false, printed as T or F (1 or 0 if expressed in numeric terms).

> x <- 10

> x > 10 # is x strictly greater than 10?

[1] F

> x == 10 # test for equality, use ==: is x equal to 10?

[1] T

> x <- (x == 10)

> x * 1 # numeric operand will coerce T to 1, F to 0

[1] 1

> (!x) # not x

[1] F

Vectors

A vector is simply a collection of scalar objects. In S-plus, vectors are row vectors (see later for column vectors). To create a simple vector, use the c() function (for 'concatenation').

For example

> x <- c(2, 3, 5, 7, 11)

> x

[1] 2 3 5 7 11

> x <- 1:10

> x < 5

[1] T T T T F F F F F F

> sum(x < 5) # T counts as 1 and F as 0

[1] 4
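A short sketch of creating vectors with c() and testing their elements is given here. The data values and the names marks, pets, low and n.low are hypothetical and are used only for illustration; they are not taken from the original notes.

```r
# Hypothetical data, used only for illustration.
marks <- c(12, 7, 3, 15, 9)       # a numeric vector
pets <- c("cat", "dog", "rabbit") # a character vector
n.marks <- length(marks)          # number of elements: 5
low <- marks < 10                 # logical vector: F T T F T
n.low <- sum(low)                 # T counts as 1 and F as 0, so this is 3
```

Summing a logical vector in this way is a quick method of counting how many elements satisfy a condition.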

Sequences

Use the a:b operator to create sequences of numbers and the seq() function for more complicated ones.

> xx <- 100:1

> xx

[1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83

[19] 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65

[37] 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47

[55] 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29

[73] 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11

[91] 10 9 8 7 6 5 4 3 2 1

> seq(4, 14, by = 3) # from 4 to 14 with gaps of 3

[1] 4 7 10 13

> seq(4, 14, length = 3) # three numbers evenly spaced between 4 and 14

[1] 4 9 14

To make replicates of numbers, use rep as in

> rep(2, 4) # repeat 2, 4 times

[1] 2 2 2 2

> rep(c(2,6), 2) # repeat (2,6) twice

[1] 2 6 2 6

> rep(c(2,6), c(2, 4)) # repeat 2, twice and 6, 4 times

[1] 2 2 6 6 6 6

> rep(c(2,6), rep(3, 2)) # repeat 2, 3 times and 6, 3 times

[1] 2 2 2 6 6 6

Exercise:

What do you expect the following to do? Try to work it out and then try it in S-plus

i) rep(4,4)

ii) rep(1:5,5)

iii) rep(1:5,c(2,2,2,5,5))

iv) rep(1:3,3:1)

Vector arithmetic

All the mathematical operators used on numeric scalars can be used on numeric vectors. The vectors are operated on element-by-element to return a vector of the same length.

> x <- 1:10

> x * 2 # multiply each element by 2

[1] 2 4 6 8 10 12 14 16 18 20

> x * x # x squared

[1] 1 4 9 16 25 36 49 64 81 100

> y <- c(1, 2)

> x + y # shorter vector repeated to be the right length

[1] 2 4 4 6 6 8 8 10 10 12

> x + 2

[1] 3 4 5 6 7 8 9 10 11 12

Notice that when two vectors have different lengths, the shorter is repeated to match the longer one. This is called the recycling rule. If the length of the longer vector is not a multiple of the length of the shorter one, a warning is printed.
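The recycling rule can be checked with a small sketch. The values are illustrative; the names even and z are not from the original notes. The second addition is expected to print a warning, because 10 is not a multiple of 3.

```r
x <- 1:10
even <- x + c(1, 2)   # 10 is a multiple of 2: c(1, 2) is used five times, no warning
z <- x + c(1, 2, 3)   # 10 is not a multiple of 3: 1 2 3 1 2 3 1 2 3 1 is used, with a warning
z                     # 2 4 6 5 7 9 8 10 12 11
```

The result always has the length of the longer vector, whether or not a warning is printed.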

Exercise:

What do you expect the following to do? Try to work it out and then try it in S-plus.

Extracting elements of a vector

Use subscripts in square brackets to extract elements of a vector:

> xx[4:7]

[1] 97 96 95 94

> xx[c(2, 3, 5, 7, 11)]

[1] 99 98 96 94 90

> xx[c(1:3, 98:100)]

[1] 100 99 98 3 2 1

> want <- c(1, 1, 2, 2)

> xx[want]

[1] 100 100 99 99

You can use negative subscript expressions to omit elements from a vector:

> x <- 1:6

> x[-4] # x without the 4th value

[1] 1 2 3 5 6

You can use several negative values but you cannot mix negative and positive values in the same expression. When using logical subscript expressions, a T selects the element and an F omits it.
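The two kinds of subscript described above can be sketched as follows. The vector and the names dropped, keep, kept and middle are illustrative only and are not taken from the original notes.

```r
# Illustrative vector; not from the original notes.
x <- c(3, 8, 1, 9, 4)
dropped <- x[-c(1, 2)]     # several negative subscripts: omits elements 1 and 2, leaving 1 9 4
keep <- x > 3              # logical vector: F T F T T
kept <- x[keep]            # elements where keep is T: 8 9 4
middle <- x[x > 3 & x < 9] # the logical expression can be written inside the brackets: 8 4
```

The logical vector must describe the same elements as the vector being subscripted, so it is usually built from that vector itself.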

Matrices

Matrix subscripts

> x <- matrix(c(2, 3, 5, 7, 11, 13), ncol = 2)

> x[,1, drop = F] # all rows, first col and keep matrix structure

[,1]

[1,] 2

[2,] 3

[3,] 5

Exercise:

How would you create a new matrix from x, but without the second row?

Using the matrix, mat, you have created:

i) Extract the number in the middle cell

ii) Extract the value in the upper left cell and multiply it by the number in the bottom right cell

iii) Multiply all the numbers together

iv) Create row and column totals

Direct assignment in matrices

You can use subscripts to assign new values directly to a matrix:

> x[3,1]

[1] 5

> x[3,1] <- 178

> x[2,]

[1] 3 11

> x[2,] <- c(45, 99)

> x

[,1] [,2]

[1,] 2 7

[2,] 45 99

[3,] 178 13

Binding bits together

Paste extra rows and columns onto an existing matrix using the rbind and cbind functions, respectively. They are also a simple way to turn a vector into a matrix:

> rbind(1:3)

[,1] [,2] [,3]

[1,] 1 2 3

> cbind(1:3) # column vector

[,1]

[1,] 1

[2,] 2

[3,] 3

> rbind(c(10,20), x[1,]) # gives two rows

[,1] [,2]

[1,] 10 20

[2,] 2 7

> cbind(1:3, x[,1]) # gives two cols

[,1] [,2]

[1,] 1 2

[2,] 2 45

[3,] 3 178

The examples above bind vectors together, but the arguments may also be matrices, or a mix of the two.
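As a sketch of binding matrices rather than vectors, the following uses two small illustrative matrices. The names a, b, ab.rows, ab.cols and mixed are assumptions made for this example and are not from the original notes.

```r
# Illustrative matrices; not from the original notes.
a <- matrix(1:4, ncol = 2)    # 2 x 2 matrix, filled column by column
b <- matrix(5:8, ncol = 2)    # another 2 x 2 matrix
ab.rows <- rbind(a, b)        # b pasted underneath a: 4 rows, 2 columns
ab.cols <- cbind(a, b)        # b pasted to the right of a: 2 rows, 4 columns
mixed <- cbind(a, c(9, 10))   # a matrix and a vector mixed: 2 rows, 3 columns
```

When matrices are bound, the number of columns (for rbind) or rows (for cbind) must agree.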

Matrix arithmetic

In general, functions and operators work on matrices element-by-element, so x+2 adds two to each element and x*2 doubles each element. For proper matrix multiplication, use the %*% operator. Of course, matrices must be compatible for this.
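The difference between the two kinds of multiplication can be seen in a small sketch. The matrix a and the result names are illustrative and not part of the original notes.

```r
# Illustrative matrix; not from the original notes.
a <- matrix(1:4, ncol = 2)   # rows are (1, 3) and (2, 4)
elementwise <- a * a         # each element squared: rows (1, 9) and (4, 16)
product <- a %*% a           # matrix product: rows (7, 15) and (10, 22)
av <- a %*% c(1, 1)          # a 2 x 2 matrix times a vector of length 2: a 2 x 1 matrix (4, 6)
```

For a %*% b to work, the number of columns of a must equal the number of rows of b.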

Functions for working with matrices

The dimensions of a matrix are returned by the dim function and the inverse returned by the solve function. The transpose of a matrix is returned by t(). Cross- and outer products are obtained with crossprod and outer.
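A brief sketch of these functions on a small invertible matrix is given here. The matrix and all object names are illustrative assumptions, not taken from the original notes.

```r
# Illustrative matrix; not from the original notes.
a <- matrix(c(2, 1, 0, 3), ncol = 2)   # rows are (2, 0) and (1, 3)
a.dim <- dim(a)                        # number of rows and columns: 2 2
a.t <- t(a)                            # transpose: rows are (2, 1) and (0, 3)
a.inv <- solve(a)                      # inverse of a
check <- a %*% a.inv                   # the identity matrix, up to rounding error
cp <- crossprod(a)                     # the same as t(a) %*% a
op <- outer(1:3, 1:2)                  # 3 x 2 matrix whose (i, j) entry is i * j
```

Multiplying a matrix by its inverse, as in check, is a simple way to confirm that solve has worked.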

See a matrix and you might be tempted to write a loop to access its elements or to apply a function to its rows or columns. Since large loops are costly in S-plus, there is a special function, written to be more efficient than looping, called apply, which evaluates a given function on each row or column of a given matrix. Even this may be less efficient than finding some other way. For example, given a matrix x, the row sums might be obtained using (in increasing order of efficiency):

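# solution using a for loop

> rowsum <- rep(0, nrow(x))

> for (i in 1:nrow(x)){

rowsum[i] <- sum(x[i,])

}

# solution using apply

> rowsum <- apply(x, 1, sum)

# solution using matrix multiplication

> rowsum <- x %*% rep(1, ncol(x))

Dataframes

The examples below use a data frame, heart, with six rows and five columns.

> names(heart)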

[1] "district91" "age" "pop" "deaths" "depr"

> heart[[2]] # list notation

[1] 45 44 43 46 41 42

> heart$age # named list notation

[1] 45 44 43 46 41 42

> heart[,2] # matrix notation

[1] 45 44 43 46 41 42

> heart[,'age'] # name in matrix notation

[1] 45 44 43 46 41 42

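You can write new named columns by assigning to a name with the $ notation, e.g. heart$depr <- followed by a vector with one value per row.

Use attach() to make the columns of a data frame available by name:

> attach(heart) # put heart in position 2 of the search list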

> age # same as heart$age before

1 2 3 4 5 6

45 44 43 46 41 42

If you call search() now, you will see the data frame below your ‘_Data’. Therefore an object called age in your ‘_Data’ will be found first. Use detach(position_number) (in this case 2) to reverse the attach.

To add a new row to a data frame you can use rbind. Here is a silly example, but you'll see why in a minute.

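> heart <- rbind(heart, heart[1,]) # append a copy of the first row

> newheart <- heart[!duplicated(do.call("paste", c(as.list(heart), list(sep = "\t")))),] # drop duplicated rows

> dim(heart)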

[1] 7 5

> dim(newheart)

[1] 6 5

Earlier advice to build up complicated expressions works in reverse here. To understand how the expression above works, evaluate a bit of it at a time:

> c(as.list(heart), list(sep = "\t"))

> do.call("paste", c(as.list(heart), list(sep = "\t")))

> duplicated(do.call("paste", c(as.list(heart), list(sep = "\t"))))
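The same three steps can be followed on a tiny data frame where the answer is easy to check by eye. The data frame survey and its values are hypothetical, as are the names cols, keys, dup and unique.rows; this is only a sketch of the technique, not the data used in these notes.

```r
# Hypothetical data frame in which row 3 repeats row 1.
survey <- data.frame(id = c(1, 2, 1), age = c(45, 44, 45))
cols <- c(as.list(survey), list(sep = "\t"))  # the columns as a list, plus a sep argument for paste
keys <- do.call("paste", cols)                # one tab-separated string per row
dup <- duplicated(keys)                       # F F T: the third row repeats the first
unique.rows <- survey[!dup, ]                 # keep only the rows that are not duplicates
```

Pasting the columns together turns each row into a single string, so that duplicated, which works on vectors, can compare whole rows.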

If you want to assign or change the names attribute of a data frame or matrix, use the dimnames function and assign it a list of two character vectors, where the first is the vector of row names and the second is the vector of column names. The example below shows how to change the column names to the letters 'a' to 'e'.

> dimnames(heart) # what are the dimnames at present?

> dimnames(heart)[[2]] <- c("a", "b", "c", "d", "e")

Lists and factors

> list.example

[[1]]:

[1] 10 20 30

[[2]]:

function(x)

{

x …

}

> ages <- list(males = c(23, 22, 19, 23, 34), females = c(17, 22, 21))

> ages

$males:

[1] 23 22 19 23 34

$females:

[1] 17 22 21

> names(ages)

[1] "males" "females"

> length(ages)

[1] 2

To subscript a single list item you can use the names, prefixed with a $ sign, or an index, in double square brackets, as in the following

> ages$males

[1] 23 22 19 23 34

> ages[[2]] # the second item

[1] 17 22 21

Notice how the result of ages[[2]] prints out as a vector (which it is), which can itself be subscripted as usual

> ages[[2]][2]

[1] 22

> length(ages[[2]]) # length of the second item

[1] 3

If you want to find the total number of elements in all the items of ages you can use length(unlist(ages)). Arithmetical operations on the whole list don't work in the same way as on matrices, because lists can hold a mix of character and numeric elements. Of course, when the list items are vectors (as in this example) you can do arithmetic on them as normal, e.g.

> ages+1

Error in ages + 1: Non-numeric first operand

Dumped

> ages[[1]] + 1

[1] 24 23 20 24 35

> ages[[2]]*2

[1] 34 44 42

Say that after one year you want to update ages by adding one to each element. You could use

> ages2 <- list(males = ages$males + 1, females = ages$females + 1)

> ages2

$males:

[1] 24 23 20 24 35

$females:

[1] 18 23 22

Alternatively, there is an apply-type function, lapply, which applies a given function to each list item.

Suppose you want the sum of the elements in each list item:

> lapply(ages, sum)

$males:

[1] 121

$females:

[1] 60

To add one to each element you can't use lapply(ages, +1) as there is no function called '+1', but you can write a new function which adds one to a vector, e.g.

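> add1 <- function(x) { x + 1 }

List items can themselves be lists:

> ages$cats <- list(males = c(12, 5, 2), females = c(5, 2))

> ages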

$males:

[1] 23 22 19 23 34

$females:

[1] 17 22 21

$cats:

$cats$males:

[1] 12 5 2

$cats$females:

[1] 5 2

> lapply(ages, add1)

Error in x + 1: Non-numeric first operand

Dumped

The add1 function may also have optional arguments which can be passed through from lapply in the same way as in the apply example.
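A minimal sketch of passing an optional argument through lapply is shown here. The function addk, its argument k and the object names simple.ages, plus.one and plus.two are hypothetical; the list holds only the two numeric vectors used earlier, without the nested cats item, so that the arithmetic works on every item.

```r
# addk and k are hypothetical names; simple.ages holds the two numeric vectors used earlier.
simple.ages <- list(males = c(23, 22, 19, 23, 34), females = c(17, 22, 21))
addk <- function(x, k = 1) { x + k }          # adds k to every element; k defaults to 1
plus.one <- lapply(simple.ages, addk)         # default k = 1 is used for every list item
plus.two <- lapply(simple.ages, addk, k = 2)  # k = 2 is passed through lapply to addk
```

Any arguments given to lapply after the function name are handed on unchanged to that function for each list item.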

The following expressions for ages are all valid:

> ages$cats$males

[1] 12 5 2

> ages[[3]][[2]]

[1] 5 2

> ages[[3]]$females

[1] 5 2

as is direct assignment:

> ages[[3]]$females[2] <- 12

> ages[[3]]$females

[1] 5 12

The following is an example of an unnamed list, and how to add and change names:

> cats <- list(c("tabby", "siamese", "tortoiseshell"), c("lion", "lynx", "cheetah"))

> cats

[[1]]:

[1] "tabby" "siamese" "tortoiseshell"

[[2]]:

[1] "lion" "lynx" "cheetah"

> names(cats)

NULL

> names(cats) <- c(…, "scary")

> cats$scary

[1] "lion" "lynx" "cheetah"

> names(cats)[2]

[1] "scary"

> names(cats)[2] ................