BIOSTATISTICS 201B MIDTERM STUDY GUIDE:

Here are some notes and study suggestions for the midterm exam:

1) What To Bring: You should bring a writing instrument, a calculator (one capable of taking logs and exponentials; your phone is acceptable), your cheat sheets, and of course a well-rested brain! I will provide anything else you need, including the pages to write the problem solutions, scratch paper and any relevant printouts or statistical tables.

2) Cheat Sheet Guidelines: The exam is closed book, closed notes. However, you may bring two 8.5 by 11 pieces of paper with anything you want on the front and back (4 total sides of paper). I don’t care what format it is in, how small you write, etc. The purpose of making the exam closed book is not to deprive you of formulas you may need but rather to (a) force you to synthesize the material, (b) minimize the amount of rustling and page shuffling during the exam and (c) allow me to make the exam easier (

3) Time and Location: The exam will take place Friday, February 12th, from 9:00-10:50 in our usual classroom, CHS 43-105A. In other words, we are using the lecture and discussion time for the exam. There will be no computer lab on Friday.

4) Exam Review: I will do a review session before class on Wednesday, February 10th, from 8-9 in our regular lecture hall. This is still(!) pending final confirmation of the room availability, but there isn’t anyone scheduled in it at that time so it should be fine. There will be no new computer lab material during the exam week, but Nadia will be available during Thursday’s lab times to answer questions. I will have office hours Monday 10-11 and Thursday 9-10 and will be checking e-mail regularly. I may have some additional availability on Thursday; I will keep you posted.

5) What is Covered: This exam covers (surprise!) all the material since the start of the quarter up through calibration and predictive accuracy (namely the lecture from Monday, February 8th). This material matches with HWs 1-3. The major topics include classical non-parametric tests, permutation tests and the bootstrap, principles of maximum likelihood estimation and generalized linear models, the basic logistic regression model and its interpretations, and methods for evaluating model performance (e.g. plots, Hosmer-Lemeshow test, pseudo-R², sensitivity/specificity/ROC curves). The emphasis will be much more on interpretation of printouts and graphs and answering conceptual questions than on hand calculations. However, I would expect you to be able to do things like calculate an odds ratio from a contingency table, get an OR or its confidence interval from the coefficient estimates in a logistic model and vice versa, convert a log odds to a probability, and compute a likelihood ratio chi-squared statistic given the log likelihoods for two models. You should know when to use and how to evaluate each of the different types of models we have learned.
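The hand calculations listed above can be sketched in a few lines of Python. This is only an illustration: the 2x2 counts, the coefficient, its standard error and the log likelihoods are all made-up numbers, not values from any homework or old exam, and the p-value shortcut shown applies only to a test with 1 degree of freedom.

```python
import math

# Hypothetical 2x2 table (made-up counts):
#               outcome   no outcome
# exposed         a=20       b=80
# unexposed       c=10       d=90
a, b, c, d = 20, 80, 10, 90

# Odds ratio from a contingency table: the cross-product ratio.
odds_ratio = (a * d) / (b * c)

# OR and 95% Wald confidence interval from a logistic coefficient.
beta, se = 0.81, 0.30          # hypothetical estimate and standard error
z = 1.96                       # normal critical value for 95% confidence
or_from_beta = math.exp(beta)
ci_low = math.exp(beta - z * se)
ci_high = math.exp(beta + z * se)

# "Vice versa": recover the coefficient and its SE from the OR and its CI.
beta_back = math.log(or_from_beta)
se_back = (math.log(ci_high) - math.log(ci_low)) / (2 * z)

def log_odds_to_prob(log_odds):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Likelihood ratio chi-squared statistic for two nested models.
ll_reduced, ll_full = -120.5, -115.2   # hypothetical log likelihoods
lr_stat = 2 * (ll_full - ll_reduced)

# For 1 degree of freedom only, the chi-squared tail area equals a
# two-sided normal tail area, so no statistical table is needed.
p_value_1df = math.erfc(math.sqrt(lr_stat / 2))

print(odds_ratio, or_from_beta, (ci_low, ci_high))
print(log_odds_to_prob(0.0), lr_stat, p_value_1df)
```

The degrees of freedom for the LR test equal the number of extra parameters in the larger model; with more than one, compare the statistic to the appropriate chi-squared table instead.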

6) Practice Problems: I have posted the 201B midterms from 2012-2015 on the web site along with their solutions. For the 2012 midterm exam you do not need to worry about Question 2, which deals with ordinal and multinomial logistic regression; we will not have had an assignment on those topics before the test. (The topic ordering in 201A/B was a little different that year.) All of the 2013 midterm is fair game except part 2j on model adjustment. The 2014 and 2015 exams should be doable in their entirety. Beyond these old exams, the best source of additional practice is the warm-up problems provided in the homework sets. If there is something you are feeling weak on for which you would like additional practice, let me know and I’ll try to make some suggestions. Another good study technique is to get a printout for any of the models/data sets from the homework (or for that matter to make up your own data set!) and make sure you can give the appropriate real-world interpretation for EVERY number on the printout.

7) Detailed Topic List: Here is a more detailed list of things to study. I do not guarantee I haven’t missed anything—let me know if there’s something you think I should add.

• Classical Non-Parametric Statistics: The major methods we learned were Spearman’s rank correlation, the Wilcoxon rank-sum test (for comparing two independent groups), the sign test and Wilcoxon signed-rank test (for comparing two matched groups) and the Kruskal-Wallis test (for comparing three or more groups). You should understand the rationale behind tests based on ranks, which parametric tests the above tests correspond to, when these tests are more (or less) appropriate than the corresponding parametric tests, what hypotheses go with these tests and what their basic test statistics are. It is unlikely I would want you to do a full-blown calculation for such a test, but I would expect you to know the basic procedure and how to interpret the corresponding printouts.
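To make the "basic procedure" of a rank test concrete, here is a minimal sketch of the Wilcoxon rank-sum statistic: pool the two groups, rank the pooled values (ties get the average rank), and sum the ranks in one group. The data and the helper names `midranks` and `rank_sum` are hypothetical, chosen only for illustration; a statistical package would also report a p-value.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(group_a, group_b):
    """Return (W, E[W] under H0), where W is the rank sum of group_a."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    w = sum(ranks[:n_a])
    expected = n_a * (n_a + n_b + 1) / 2
    return w, expected

# Made-up measurements for two independent groups.
treated = [12.1, 15.3, 9.8, 14.0]
control = [8.2, 9.8, 7.5, 10.4, 6.9]
w, expected = rank_sum(treated, control)
print(w, expected)
```

A rank sum far from its null expectation suggests the two groups differ in location; because only ranks enter the statistic, a single extreme value cannot dominate it the way it can in a t-test.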

• Permutation Tests and the Bootstrap: Understand the basic idea behind a permutation test (generating an appropriate null distribution for your test) and in particular how this relates to rank tests. For the bootstrap you should understand the basic principle (resampling from your observed sample parallels sampling from the original population, allowing you to build up the empirical distribution of your test statistic) and how this differs from a permutation test (in that it gives you an approximation to the true sampling distribution as opposed to the null distribution). You should be able to describe the conceptual algorithm you would use to get a bootstrap estimate, standard error or confidence interval for a statistic of interest and when the bootstrap methods are particularly useful. Calculations are generally impractical without a computer except in cases with very small sample sizes.
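The two conceptual algorithms can be sketched as follows. This is one simple version under stated assumptions: the permutation test uses the difference in group means as its statistic and a Monte Carlo sample of shuffles rather than all possible permutations, and the bootstrap interval is the plain percentile interval. The function names and default settings are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed arrangement itself so the p-value is never 0.
    return (extreme + 1) / (n_perm + 1)

def bootstrap(sample, stat, n_boot=2000, seed=0):
    """Bootstrap standard error and approximate 95% percentile interval."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    se = statistics.stdev(reps)
    lo = reps[int(0.025 * n_boot)]
    hi = reps[int(0.975 * n_boot) - 1]
    return se, (lo, hi)

# Made-up data for illustration.
treated = [12.1, 15.3, 9.8, 14.0]
control = [8.2, 9.8, 7.5, 10.4, 6.9]
print(permutation_p_value(treated, control))
print(bootstrap(treated + control, statistics.median))
```

The contrast to notice is that the permutation test shuffles labels without replacement to mimic the null hypothesis, while the bootstrap resamples observations with replacement to mimic repeated sampling from the population.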

• Maximum Likelihood Principle and Generalized Linear Models: Understand conceptually what a likelihood function is and why maximizing it should lead to good parameter estimates. (I would not expect you to derive or maximize a likelihood function during an exam.) Understand the three major components of the generalized linear model (distribution of Y, link function, systematic component involving the predictors) and how we vary these to get different models.
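As a reminder of how the three components fit together, here they are written out for logistic regression, along with the likelihood that is maximized. The notation (p_i, eta_i, beta, k predictors, n observations) is an assumption made for this sketch and may differ from the lecture notes.

```latex
\begin{align*}
\text{Distribution of } Y:\quad
  & Y_i \sim \text{Bernoulli}(p_i) \\
\text{Systematic component:}\quad
  & \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} \\
\text{Link function:}\quad
  & g(p_i) = \log\!\left(\frac{p_i}{1 - p_i}\right) = \eta_i \\
\text{Likelihood:}\quad
  & L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i} \\
\text{Log likelihood:}\quad
  & \ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\bigr]
\end{align*}
```

Changing the distribution and link while keeping the same linear predictor gives other members of the family; for example, a normal distribution with the identity link gives ordinary linear regression.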

• Logistic Regression: Know what the basic model is and how to use it to obtain predicted probabilities; know how to interpret the regression coefficients and their confidence intervals on the log odds scale; know how to exponentiate them to get odds ratios and the corresponding confidence intervals and interpret these appropriately for indicator variables, continuous variables, and changes of more than one unit; understand the basic idea behind the transformation principle; know how to test the overall model (likelihood ratio chi-squared test), the individual variables (LR chi-squared test or Wald test) and how to compare two nested models (LR chi-squared test); know what is meant by the terms null and saturated models and deviance and why these concepts are important; know how to assess goodness of fit using techniques such as the Hosmer-Lemeshow test; know what is meant by calibration and predictive accuracy and how these relate to goodness of fit ideas; be familiar with pseudo-R², sensitivity, specificity and the ROC curve; know how to handle interactions and transformations in logistic regression.
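For the predictive accuracy measures, a small sketch may help fix the definitions. The outcomes and predicted probabilities below are made up, the 0.5 cutoff is an arbitrary choice, and the helper names are hypothetical. The area under the ROC curve is computed here from its pairwise interpretation: the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with ties counted as one half.

```python
def sens_spec(y_true, p_hat, cutoff=0.5):
    """Sensitivity and specificity when predicting 1 if p_hat >= cutoff."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, p_hat):
    """Area under the ROC curve via comparison of all event/non-event pairs."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up observed outcomes and fitted probabilities.
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

sensitivity, specificity = sens_spec(y, p, cutoff=0.5)
print(sensitivity, specificity, auc(y, p))
```

Moving the cutoff trades sensitivity against specificity, and the ROC curve traces that trade-off over all cutoffs; the pairwise count used for the area is the same quantity as the Wilcoxon rank-sum (Mann-Whitney) statistic rescaled to lie between 0 and 1.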

• STATA/SAS commands: I will not ask you to give me the exact command to obtain a particular kind of output. However, if I give you a printout with the command lines, I would expect you to be able to follow it and correctly interpret what the command did, and I would expect you to be able to describe in a general way how you would use one of the packages to obtain the necessary information.
