PPD 404 - University of Southern California



PPD 404x

Summer 2009

M & W 5:30-9:40 PM

RGL 215

Instructor: Hyung-woo Lee

Office Hours: M & W, 4:00-5:00 PM

e-mail: hyungwol@usc.edu

Office:

Statistics for Policy, Planning, and Development

Goals:

In this course, you will learn about the practical uses of statistics in social science, public policy, management, and everyday life. The course focuses on understanding the conditions under which various statistical techniques may be properly used as well as understanding the results of statistical analyses. Very little emphasis will be placed on learning formulas and equations per se. Since virtually all of the computation of statistics is done with computers, some time will be devoted to becoming familiar with statistical software.

On examinations, some manual calculations will be required. You will need a hand calculator with a square-root function. Two sets of computer-based exercises will be required. The exercises will be performed using SPSS. The software may be run at workstations in the university’s public user areas in Waite Phillips Hall (b-34), King Hall (Room 200), and the Leavey Library. Homework will not be assigned; however, optional exercises for reviewing each major statistical technique are available at the end of each day’s slide presentation. In order to understand how statistics are used in research, students will be required to analyze, and present to the class, an article from a scholarly journal.

At the end of this course, you will be familiar with several types of statistics and be able to interpret statistical findings. You will be able to evaluate statistics presented in scholarly journals and prepare yourself for future quantitative research projects and advanced statistical courses.

Requirements:

You are expected to attend every lecture and complete all assigned reading and homework assignments. I recommend reading the assigned chapter both before the class meets and after the lecture to reinforce the material. Remember, meeting twice a week will present us with a special challenge, specifically in the amount of information you will be exposed to in such a short period of time. Learning statistics is like learning a foreign language; both require daily attention. It is therefore expected that you will review all material presented the previous week and bring any questions to the following class session.

PowerPoint slides will be posted on the internet (BlackBoard). Please arrive and leave class on time, and keep all pagers and cell phones muted during class and exams.

Grading:

There will be three examinations, two during the semester and one during finals week. Each will consist of manual calculations and multiple-choice questions. None of the examinations will be cumulative. That is, the second examination will deal directly with only those topics covered after the first examination and the “final” examination will cover material introduced after the second midterm examination. Make-up examinations will be given only under extraordinary circumstances.

NO curve is used in determining final course grades. The university’s grading scale is employed. To receive the grade “A” you must average 90 percent on the eight graded activities; to obtain a “B” you must end the semester with an average of 80 percent; etc. NO additional work for “extra credit” is available.

Four computer exercises will be required, two using Microsoft Excel, two using SPSS. Additionally, each student will complete a research project. Dates and exact weights of all required activities are as follows:

|Task               |Due Date |Weight      |
|Exercise One Due   |6-1-09   |2.5 percent |
|First Examination  |6-3-09   |20 percent  |
|Exercise Two Due   |6-8-09   |5 percent   |
|Second Examination |6-17-09  |25 percent  |
|Exercise Three Due |6-24-09  |2.5 percent |
|Research Project   |6-24-09  |10 percent  |
|Exercise Four Due  |6-24-09  |5 percent   |
|Final Examination  |6-29-09  |30 percent  |
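
As a rough illustration of how the weights in the table above combine into a course average, here is a minimal Python sketch. Only the weights come from the table; the scores are hypothetical and the function name is made up for this example.

```python
# Weights (percent of the course grade) taken from the table above.
weights = {
    "Exercise One": 2.5,
    "First Examination": 20,
    "Exercise Two": 5,
    "Second Examination": 25,
    "Exercise Three": 2.5,
    "Research Project": 10,
    "Exercise Four": 5,
    "Final Examination": 30,
}

# Hypothetical scores (percent) for one student; not from the syllabus.
scores = {
    "Exercise One": 100,
    "First Examination": 85,
    "Exercise Two": 90,
    "Second Examination": 92,
    "Exercise Three": 100,
    "Research Project": 95,
    "Exercise Four": 80,
    "Final Examination": 90,
}


def course_average(scores, weights):
    """Weighted average: sum(score * weight) / sum(weights)."""
    total_weight = sum(weights.values())
    return sum(scores[task] * weights[task] for task in weights) / total_weight


average = course_average(scores, weights)
print(f"Course average: {average:.2f} percent")  # 90.00 for these scores
```

With these made-up scores the weighted average is exactly 90 percent, the threshold stated above for an “A”.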

No grade of Incomplete (IN) will be granted for this course except in the event of an extreme and unavoidable emergency. Please note that university policy requires that all work except the final examination must be complete before students are eligible to request permission to take an incomplete. It is YOUR responsibility to complete the necessary paperwork if you decide to drop the class.

Students with Disabilities:

Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure that this letter is in the hands of the instructor before the first midterm examination. DSP is located in STU 301. Hours are 8:30 A.M.-5:00 P.M., Monday through Friday. The phone number for DSP is (213) 740-0776.

Required Text

Sirkin, R. Mark. 2006. Statistics for the Social Sciences. Third Edition. Thousand Oaks, CA: Sage.

Optional Text (Not Required)

Field, Andy. 2009. Discovering Statistics Using SPSS. Third Edition. Thousand Oaks, CA: Sage.

Objectives:

The specific objectives of this first course in statistical analysis are:

- To develop an understanding of the basic statistical techniques

- To learn to choose a statistic appropriate for a particular analytical task

- To be able to translate the results of statistical analyses into a conclusion regarding the questions posed

- To introduce students to statistical software

- To show students the importance of statistics for their future work/school experience(s)

- To show how statistical techniques are used in scholarly journals

Overview of Research Project:

The research project is intended to give students a glimpse of how statistics are used in research and how the results are presented in scholarly journals. Students will be required to find a research article in a scholarly journal (e.g., Academy of Management, PAR), review it, and present the results to the class. The article must use a statistical technique discussed in class. Students will be required to give a 5-8 minute presentation on their findings. The presentations should be professional and may include PowerPoint and/or handouts. Since each student must use a different article, students are encouraged to sign up early. Presentations will take place after Examination 2.

Course Schedule

May 20, 2009-

Introduction to the Course

The first class session will be devoted to an overview of the course including discussions of required texts, computer exercises and data sets, examinations and grading policies, and expectations of students. An overview of research focusing on the role of statistics will also be presented. Two types of variables will be introduced: discrete and continuous. Two types of statistics, descriptive and inferential, will be distinguished. There will also be a review of fundamental algebraic operations as well as of significant digits and rules for rounding.

Sirkin, Statistics for the Social Sciences, pp. 1-27

The Distribution of Single Variables

This session introduces several techniques for understanding the configuration of single variables. These include one-way frequency distributions, bar charts, histograms, and quantiles. Techniques of exploratory data analysis (EDA) such as stem-and-leaf plots will be introduced.

Sirkin, Statistics for the Social Sciences, pp. 33-54, 63-78 and 109-113

Measures of Central Tendency

Descriptive statistics allow you to do three things: describe the “typical” value or category in a set of values or categories; describe the amount of variation in the values or across the categories of a single variable; and describe the association between two variables. We will examine the meaning and interpretation of measures of central tendency including the modal category for discrete variables and the mode, the median, and the mean for continuous variables. We will also discuss criteria for selecting the appropriate measure of central tendency under various conditions. To do this, we will need to understand the concepts of skewness and kurtosis.

Sirkin, Statistics for the Social Sciences, pp. 81-119
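
To make the three measures of central tendency concrete, the following is a minimal Python sketch. The data are hypothetical (they are not from the text) and were chosen to include one extreme value, so the effect of skewness on the mean is visible.

```python
import statistics

# Hypothetical data: household sizes reported by nine survey respondents.
household_sizes = [1, 2, 2, 2, 3, 3, 4, 5, 14]

mean = statistics.mean(household_sizes)      # pulled upward by the outlier (14)
median = statistics.median(household_sizes)  # middle value of the sorted data
mode = statistics.mode(household_sizes)      # most frequent value

print(mean, median, mode)  # 4 3 2
```

Here the mean (4) exceeds the median (3), the pattern expected in a positively skewed distribution; in such cases the median is often the better description of the “typical” value.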

May 25, 2009- No School

May 27, 2009-

Measures of Variation

In our second session on descriptive statistics, we will concentrate on measures of variation. For discrete variables, we will introduce the Index of Qualitative Variation (IQV). For continuous variables, the standard deviation and the variance will be presented. Z-scores will also be presented.

Sirkin, Statistics for the Social Sciences, pp. 125-139
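
The measures named in this session can be sketched in a few lines of Python. The data are hypothetical, and the IQV formula shown is one common textbook form, assumed here rather than quoted from Sirkin. The population standard deviation (dividing by N) is used; a sample standard deviation would divide by N - 1.

```python
import statistics


def index_of_qualitative_variation(counts):
    """IQV = K * (N^2 - sum(f^2)) / (N^2 * (K - 1)).

    0 means all cases fall in one category; 1 means cases are spread
    evenly across the K categories.
    """
    k = len(counts)
    n = sum(counts)
    sum_of_squared_counts = sum(f * f for f in counts)
    return k * (n * n - sum_of_squared_counts) / (n * n * (k - 1))


# Hypothetical frequencies for a three-category discrete variable.
category_counts = [50, 30, 20]
iqv = index_of_qualitative_variation(category_counts)

# Hypothetical scores on a continuous variable.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(scores)
variance = statistics.pvariance(scores)   # population variance
std_dev = statistics.pstdev(scores)       # population standard deviation
z_scores = [(x - mean) / std_dev for x in scores]

print(round(iqv, 2), variance, std_dev, z_scores[-1])
```

For these values the IQV is 0.93, the standard deviation is 2, and the largest score (9) has a Z-score of 2, meaning it lies two standard deviations above the mean.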

Crosstabulation

A third type of description expresses the association or relationship between two variables. When both are discrete variables, association is described through a two-way frequency distribution (or contingency table) more commonly known as a crosstabulation. One technique for such description creates expected cell frequencies and compares them with the actual cell frequencies. This approach is the basis for the Pearson Chi-Square statistic.

Sirkin, Statistics for the Social Sciences, pp. 147-162, 174-178, 383-393
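
The comparison of expected and actual cell frequencies can be sketched as follows. The 2 x 2 table is hypothetical, and the variable names are chosen only for this illustration.

```python
# Hypothetical crosstabulation (rows: group, columns: response).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell = (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson Chi-Square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected, chi_square, degrees_of_freedom)
```

Every expected frequency here is 25, each cell deviates from it by 5, and the statistic works out to 4.0 with 1 degree of freedom.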

Measures of Association for Discrete Variables

Other approaches for describing the association between two discrete variables exist, the most important being those based upon a proportional reduction of error (PRE) algorithm. To further complicate matters, discrete variables may have either ordered or non-ordered categories. Yule’s Q, Cramer’s V, Gamma, Lambda, and Somers’ d statistics are introduced. A key to understanding their application is to understand the concepts of concordant, discordant, and tied pairs.

Sirkin, Statistics for the Social Sciences, pp. 351-375
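
As a sketch of how concordant and discordant pairs feed into a measure of association, the Python below counts both kinds of pairs in a table whose rows and columns are assumed to be ordered from low to high, then computes Gamma. The table is hypothetical and the function name is invented for this example.

```python
def count_pairs(table):
    """Count concordant and discordant pairs in an ordered r x c table.

    A pair is concordant when one case is higher than the other on both
    variables, and discordant when it is higher on one and lower on the
    other. Tied pairs are ignored, as they are in Gamma.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in lower rows only
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


# Hypothetical 2 x 2 table with ordered categories (low, high).
table = [
    [30, 20],
    [20, 30],
]

concordant, discordant = count_pairs(table)
gamma = (concordant - discordant) / (concordant + discordant)

print(concordant, discordant, round(gamma, 3))  # 900 400 0.385
```

For a 2 x 2 table, Gamma computed this way is the same quantity as Yule’s Q.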

June 1, 2009-

Using SPSS

Some important functions of SPSS will be reviewed.

The Chi-Square Distribution

The second major statistical task is the making of inferences. This requires a statistic with a known sampling distribution. The Chi-Square distributions are one example of a set of sampling distributions whose properties are known. A series of new concepts is introduced including degrees of freedom, alpha levels, critical values, and p-values.

Sirkin, Statistics for the Social Sciences, pp. 196-200, 394-397, and 410-423
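
The relationship among the alpha level, the critical value, and the p-value can be sketched in Python. This is a simplified illustration that covers only 1 or 2 degrees of freedom, where the upper-tail probability has a simple closed form; in practice the values come from a table or from statistical software. The statistic of 4.0 is hypothetical.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a Chi-Square statistic (df = 1 or 2 only)."""
    if df == 1:
        # A Chi-Square with 1 df is a squared standard normal variable.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")


chi_square = 4.0        # hypothetical test statistic
df = 1
alpha = 0.05
critical_value = 3.841  # tabled critical value for df = 1, alpha = 0.05

p_value = chi_square_p_value(chi_square, df)
reject_null = chi_square > critical_value

print(round(p_value, 4), reject_null)  # 0.0455 True
```

Comparing the statistic with the critical value and comparing the p-value with alpha lead to the same decision.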

Review

A review of the main ideas to date. This is your chance to ask questions before the first examination.

FIRST EXCEL EXERCISE DUE

June 3, 2009-

FIRST MIDTERM EXAMINATION

The examination will consist of multiple-choice questions, problems requiring calculations, and some interpretation.

The Normal Distribution

The logic of inferential statistics rests on knowing the characteristics of sampling distributions. When samples that are both large and random are involved, sampling distributions are assumed to be normally distributed. Hence, we need to understand thoroughly the characteristics of normal curves. Topics to be covered include the Central Limit Theorem, the relationship between areas under the normal curve and probability levels, and the estimation of standard error in large-sample situations.

Sirkin, Statistics for the Social Sciences, pp. 237-237 and 239-247
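
The link between areas under the normal curve and probability levels, and the large-sample standard error, can be illustrated with Python's standard library (`statistics.NormalDist`, available in Python 3.8 and later). The sample values are hypothetical.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between z = -1.96 and z = +1.96.
area_within_1_96 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Standard error of the mean for a large sample: s / sqrt(n).
sample_sd = 15   # hypothetical sample standard deviation
n = 225          # hypothetical sample size
standard_error = sample_sd / math.sqrt(n)

print(round(area_within_1_96, 3), standard_error)  # 0.95 1.0
```

About 95 percent of the area lies within 1.96 standard errors of the center, which is why 1.96 appears so often in large-sample inference.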

Estimation

The logic of statistical inference is extended here to the process of estimation. Key concepts include standard errors, point estimates, and upper (UCL) and lower (LCL) confidence limits.

Sirkin, Statistics for the Social Sciences, pp. 254-258
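
A large-sample confidence interval for a mean can be sketched as below. All sample values are hypothetical, and the normal distribution is assumed to apply because the sample is large.

```python
import math
from statistics import NormalDist

sample_mean = 100.0   # point estimate (hypothetical)
sample_sd = 15.0      # hypothetical sample standard deviation
n = 225               # hypothetical sample size
confidence = 0.95

standard_error = sample_sd / math.sqrt(n)

# z value that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lcl = sample_mean - z * standard_error  # lower confidence limit
ucl = sample_mean + z * standard_error  # upper confidence limit

print(round(lcl, 2), round(ucl, 2))  # 98.04 101.96
```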

June 8, 2009-

Testing Hypotheses

The logic of statistical inference is extended again, this time to the process of testing hypotheses. Key concepts include the null and alternative hypotheses, the standard error, the critical value, and the region(s) of rejection. We will also discuss the difference between one-tailed and two-tailed significance tests and between Type I and Type II errors. Student’s t-distribution, used in testing hypotheses with small random samples, will also be described.

Sirkin, Statistics for the Social Sciences, pp. 198-200, 237-239, 247-254, 296-300, and 407-410.
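
The steps of a large-sample test of a single mean can be sketched in Python. The null value and sample figures are hypothetical; the sketch shows how the one-tailed and two-tailed p-values differ for the same test statistic.

```python
import math
from statistics import NormalDist

null_mean = 100.0     # value stated by the null hypothesis (hypothetical)
sample_mean = 102.5   # hypothetical sample result
sample_sd = 15.0
n = 225
alpha = 0.05

standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

upper_tail = 1 - NormalDist().cdf(abs(z))
p_one_tailed = upper_tail        # alternative: the mean is greater than 100
p_two_tailed = 2 * upper_tail    # alternative: the mean differs from 100

# Two-tailed critical value: the region of rejection is |z| > about 1.96.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z) > critical_value

print(z, round(p_one_tailed, 4), round(p_two_tailed, 4), reject_null)
```

Here z is 2.5, which falls in the region of rejection; rejecting a null hypothesis that is actually true would be a Type I error, and alpha is the probability of doing so.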

Testing Hypotheses about the Difference Between Two Means

One of the most common applications of hypothesis tests is in deciding whether or not the difference between the values for a statistic from two subsamples is statistically significant. Often the sample statistic involved is a mean (e.g., average GPA) from two different subsamples (e.g., female and male students) drawn from the same universe (e.g., USC undergraduates). With large random samples, the sampling distribution for the values of the difference between two means is assumed to be normal (i.e., the Central Limit Theorem is assumed to hold). The two subsample variances must be combined in order to estimate the standard error of the difference.

Sirkin, Statistics for the Social Sciences, pp. 201-218
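
Combining the two subsample variances into a standard error of the difference can be sketched as follows for the large-sample case. The GPA figures are invented for illustration.

```python
import math

# Hypothetical subsample summaries (mean GPA, standard deviation, size).
mean_1, sd_1, n_1 = 3.2, 0.5, 100
mean_2, sd_2, n_2 = 3.0, 0.5, 100

# Large-sample standard error of the difference between two means.
standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)

z = (mean_1 - mean_2) / standard_error

print(round(standard_error, 4), round(z, 3))  # 0.0707 2.828
```

A z of about 2.83 exceeds the two-tailed critical value of 1.96 at the 0.05 level, so with these made-up numbers the difference would be judged statistically significant.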

FIRST SPSS EXERCISE DUE

June 10, 2009-

The t-test: Independent and Paired (Dependent) Samples

When the random samples whose means are to be compared are small in size, the test for the significance of difference must be performed using the t-distribution. The only complication is in estimating the standard error of the difference. Because samples are small, differences in size are weighted. The resulting estimate is called the “pooled” estimate of the standard error of the difference. Another problem in the small-sample situation is the possibility of extreme differences in subsample variances. A test for detecting the presence of such extreme differences is explained, and an alternative way for calculating the t-test is described. A second t-test, appropriate for comparing paired (dependent) samples, is introduced.

Sirkin, Statistics for the Social Sciences, pp. 271-296
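
The two t-tests described in this session can be sketched in Python. The data are hypothetical and the function names are invented for this example. The sketch returns only the t statistic and its degrees of freedom; the critical value would be looked up in a t table or obtained from software.

```python
import math
import statistics


def pooled_t(sample_a, sample_b):
    """Independent-samples t using the pooled estimate of the standard error."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Each sample variance is weighted by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df


def paired_t(before, after):
    """Paired-samples t computed on the within-pair differences."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical small samples.
t_independent, df_independent = pooled_t([4, 5, 6], [1, 2, 3])
t_paired, df_paired = paired_t(before=[10, 12, 9, 11], after=[12, 15, 10, 13])

print(round(t_independent, 3), df_independent)  # 3.674 4
print(round(t_paired, 3), df_paired)            # 4.899 3
```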

Testing Hypotheses about Differences among Several Means

The techniques for testing hypotheses covered thus far are limited to comparisons of two subsamples. When the situation calls for comparison of three or more subsamples, it is preferable to make one simultaneous test rather than several two-subsample comparisons. (The reason has to do with the increased likelihood of committing Type I errors when making multiple comparisons.) The general procedure is called the Analysis of Variance (ANOVA, for short). The procedure compares two estimates of the population variance constructed using different methods.

Sirkin, Statistics for the Social Sciences, pp. 309-323

June 15, 2009-

One-Way Analysis of Variance

Two estimates of the population variance in ANOVA models are known as the mean square (i.e., variance) “between” (or “model” mean square) and the mean square (variance) “within” (or “error” mean square). We will present algorithms for estimating both. The ratio between them is the F statistic, which is the basis of the F-test. The sampling distributions for evaluating the F statistic are known collectively as the F-distributions. Having evaluated the differences among treatment groups (subsamples) in overall terms with the F-test, it is often useful to “decompose” the evaluation to find the source of any differences among the subsamples. Both a priori and post hoc comparison tests exist. One of the post hoc tests is explained.

Sirkin, Statistics for the Social Sciences, pp. 324-340
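
The two variance estimates and their ratio can be sketched in Python for a one-way design. The three treatment groups are hypothetical.

```python
import statistics

# Hypothetical scores for three treatment groups.
groups = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

all_scores = [x for group in groups for x in group]
grand_mean = statistics.mean(all_scores)
k = len(groups)        # number of groups
n = len(all_scores)    # total number of cases

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(
    len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
)
# Within-groups sum of squares: scores around their own group mean.
ss_within = sum(
    (x - statistics.mean(group)) ** 2 for group in groups for x in group
)

ms_between = ss_between / (k - 1)   # "model" mean square
ms_within = ss_within / (n - k)     # "error" mean square
f_ratio = ms_between / ms_within

print(round(ms_between, 2), round(ms_within, 2), round(f_ratio, 2))  # 13.0 1.0 13.0
```

The F of 13 would be evaluated against the F-distribution with 2 and 6 degrees of freedom.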

Analysis of Covariance

Analysis of variance makes many assumptions, most of which can only be met by gathering data under controlled experimental conditions. Where experimental designs are out of the question (as is typically the case in program evaluation and policy analysis), results of ANOVA are meaningless. One method of compensating for the absence of experimental control is the introduction of one or more continuous variables as covariates. The result is a simple extension of ANOVA called the analysis of covariance.

Sirkin, Statistics for the Social Sciences, pp. 341-344

Review

This will be a review of the main ideas since the first examination. It is your chance to ask questions before the second examination.

June 17, 2009-

SECOND MIDTERM EXAMINATION

The examination will consist of multiple-choice questions, problems requiring calculations, and some interpretation.

Linear Regression with Two Variables

This is the first of several sessions leading up to the topic of causal modeling. We begin by showing how associations between two continuous variables can be described graphically in a scatterplot. If the relationship between the two variables is a linear one, then the information displayed in the scatterplot can be stated mathematically in the form of a general linear model. The meanings of the constants in the model (i.e., the intercept and the slope) are explained and their statistical use illustrated.

Sirkin, Statistics for the Social Sciences, pp. 457-471
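
The intercept and slope of a two-variable linear model can be computed by least squares as in the sketch below. The five data points are hypothetical.

```python
import statistics

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

# Sum of cross-products and sum of squares of x around their means.
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)

slope = sum_xy / sum_xx               # change in y per one-unit change in x
intercept = mean_y - slope * mean_x   # predicted y when x = 0

predicted_at_6 = intercept + slope * 6

print(round(slope, 2), round(intercept, 2), round(predicted_at_6, 2))  # 0.6 2.2 5.8
```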

June 22, 2009-

Correlation

Other descriptions of associations can be derived from the general linear model. Two of these have special importance: the Coefficient of Determination, and the Pearson product-moment correlation coefficient. The interpretation and use of each are discussed.

Sirkin, Statistics for the Social Sciences, pp. 451-457
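
The two descriptions named in this session can be sketched with the same kind of sums used for the regression line. The data are hypothetical.

```python
import math
import statistics

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
sum_yy = sum((yi - mean_y) ** 2 for yi in y)

# Pearson product-moment correlation coefficient.
r = sum_xy / math.sqrt(sum_xx * sum_yy)
# Coefficient of Determination: proportion of variance in y explained by x.
r_squared = r ** 2

print(round(r, 4), round(r_squared, 2))  # 0.7746 0.6
```

In this made-up example, 60 percent of the variation in y is accounted for by its linear relationship with x.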

Significance Tests for Regression Models and Their Coefficients

The overall significance of the general linear model is tested using the F-distribution. Parameter estimates are tested for statistical significance using Student’s t-distributions. The meaning of both tests is explained.

Sirkin, Statistics for the Social Sciences, pp. 480-489
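
Both tests can be sketched for the two-variable case in Python. The data are hypothetical. With a single right-side variable, the F statistic for the model equals the square of the t statistic for the slope.

```python
import math
import statistics

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
sum_yy = sum((yi - mean_y) ** 2 for yi in y)

slope = sum_xy / sum_xx
ss_regression = slope * sum_xy           # variation explained by the model
ss_error = sum_yy - ss_regression        # unexplained variation

# Overall F-test: 1 and n - 2 degrees of freedom in the two-variable case.
ms_error = ss_error / (n - 2)
f_statistic = (ss_regression / 1) / ms_error

# t-test for the slope: estimate divided by its standard error.
se_slope = math.sqrt(ms_error / sum_xx)
t_statistic = slope / se_slope

print(round(f_statistic, 2), round(t_statistic, 4))  # 4.5 2.1213
```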

Multiple Contingency-Table Analysis

This is an introductory session on the mechanics of modeling causal hypotheses. We begin by characterizing “cause” and “effect” merely as labels created and used by human beings. Covariation, time order, and nonspuriousness are three criteria for determining the appropriate use of these labels.

Sirkin, Statistics for the Social Sciences, pp. 162-178

June 24, 2009-

Introduction to Multiple Regression

The process of causal modeling with continuous variables involves extending the general linear model rather than working with contingency tables. Introducing two or more “right side” variables changes the meaning of the resulting regression coefficients in an important way. Some statistical applications call for the comparison of the values of the coefficients. To do so requires that those values be standardized. The result is the standardized coefficient, commonly referred to as the Beta coefficient.

Sirkin, Statistics for the Social Sciences, pp. 525-526
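
For the case of exactly two right-side variables, the standardized (Beta) coefficients can be obtained directly from the three pairwise correlations. The sketch below uses that two-predictor formula with hypothetical correlations; with more predictors, statistical software would be used instead.

```python
# Hypothetical Pearson correlations.
r_y1 = 0.6   # between y and x1
r_y2 = 0.5   # between y and x2
r_12 = 0.3   # between x1 and x2

# Standardized coefficients for a model with two right-side variables.
beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta_2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Proportion of variance in y explained by the two variables together.
r_squared = beta_1 * r_y1 + beta_2 * r_y2

print(round(beta_1, 4), round(beta_2, 4), round(r_squared, 4))
```

Each Beta is smaller than the corresponding simple correlation because the two right-side variables are themselves correlated; this is the change in the meaning of the coefficients that the session describes. Because both are on a common standardized scale, the two Betas can be compared directly.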

Meeting the Requirements of Multiple Regression I

Extending the general linear model to include multiple right-side variables also increases the number of assumptions that need to be met in order to produce meaningful results. We will discuss several of these assumptions including linearity, additivity, absence of multicollinearity, homoscedasticity, continuous right-side variables, and the absence of specification errors. Techniques for evaluating whether or not these assumptions have been met and for adjusting the multiple regression models when they are not met are explained. Emphasis is on the regression diagnostics available with computer software.

Meeting the Requirements of Multiple Regression II

This session will review Part I of the requirements of multiple regression and finish introducing the assumptions that need to be met in order to produce meaningful results using multiple regression analysis.

SECOND EXCEL EXERCISE DUE

SECOND SPSS ASSIGNMENT DUE

RESEARCH PROJECT DUE / PRESENTATIONS

REVIEW FOR FINAL

June 29, 2009-

FINAL EXAMINATION 6pm-8pm

An examination covering the final third of the course. It consists of multiple-choice questions and interpretations.
