CSSS 508: Intro to R - Carnegie Mellon University
3/03/06
Logistic Regression
Last week, we looked at linear regression, using independent variables to predict a continuous dependent response variable.
Very often we want to predict a binary outcome: Yes/No (Failure/Success)
For example, we may want to predict whether someone will go to college, whether they will be obese, or whether they will develop a hereditary condition.
We use logistic regression to model this scenario. The response variable is coded 0 or 1.
The formula is:
P(Y = 1) = 1 / (1 + exp[-(B0 + B1X1 + B2X2 + B3X3 + … + BpXp)])
More simply: f(z) = 1 / (1 + exp(-z)), where z = B0 + B1X1 + … + BpXp is the familiar linear predictor from linear regression.
The behavior of f(z) (or P(response = 1)) looks like:
[Figure: the logistic curve f(z), an S-shaped curve rising from 0 toward 1 as z increases]
Note that when z = 0, P(response = 1) = 0.5.
We have no information from the covariates, and so it’s essentially a coin flip.
High z, high chance of a response of 1. Low z, low chance of a response of 1.
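These properties are easy to verify directly in R. A minimal sketch (R's built-in plogis() computes the same function):

```r
# The logistic function: maps any real z to a probability in (0, 1)
f <- function(z) 1 / (1 + exp(-z))

f(0)    # 0.5: no information from the covariates, a coin flip
f(5)    # close to 1: high z, high chance of a response of 1
f(-5)   # close to 0: low z, low chance of a response of 1

# R's built-in plogis() computes the same quantity
all.equal(f(2), plogis(2))
```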
In R, we model logistic regression using generalized linear models (glm).
This function fits several different types of models, each specified by its own "family".
For us, the family is simply how we tell glm what kind of response variable we have and what kind of model we would like to fit.
help(glm)
The arguments we’ll look at today are: formula, family, and data.
The formula and data arguments work just as they did for linear regression. If you are working with a data frame, type the formula y ~ x1 + x2 + … + xp and set data = the name of your data frame. If each term in your formula is already defined as its own variable, the formula argument alone is enough.
For logistic regression, family = binomial.
Recall that a binomial distribution models the probability of trials being successes or failures (like our response variable).
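Putting the pieces together, here is a minimal sketch of a glm() call. The data are simulated and the names (mydata, x1, x2, y) are illustrative, not from any real data set:

```r
# Simulate a binary response: y depends on x1 through the logistic function
set.seed(508)
mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
mydata$y <- rbinom(100, size = 1,
                   prob = plogis(-0.5 + 1.2 * mydata$x1))

# family = binomial tells glm to fit a logistic regression
fit <- glm(y ~ x1 + x2, family = binomial, data = mydata)
summary(fit)   # coefficients are on the log-odds scale
```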
Let’s try it on the low infant birth weight data set.
library(MASS)
help(birthwt)
Our variables:
low: 0, 1 (Indicator of birth weight less than 2.5 kg)
age: mother’s age in years
lwt: mother’s weight in lbs
race: mother’s race (1 = white, 2 = black, 3 = other)
smoke: smoking status during pregnancy
ptl: no. of previous premature labors
ht: history of hypertension
ui: presence of uterine irritability
ftv: no. of physician visits during first trimester
bwt: birth weight in grams
First, attach the data set so we can access the individual variables by name:
attach(birthwt)
All variables have a natural ordering except race, which has been coded as integers. If we leave it as an integer, the model will treat race as numeric and estimate a single slope, as if a change from white to black meant the same thing as a change from black to other. We want to remove this artificial ordering from the race categories by converting race to a factor.
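The notes are truncated at this point. A sketch of the likely next step, recoding race as an unordered factor (the name race2 appears in the surviving fragment; the model formula below is an assumption, not from the original notes):

```r
library(MASS)   # provides the birthwt data set
data(birthwt)

# Convert race from integer codes to an unordered factor
birthwt$race2 <- factor(birthwt$race,
                        labels = c("white", "black", "other"))

# Logistic regression of low birth weight on mother's age, weight,
# race, and smoking status (white is the reference level for race2)
fit <- glm(low ~ age + lwt + race2 + smoke,
           family = binomial, data = birthwt)
summary(fit)
```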