INTRODUCTION TO TEST THEORY

Education 252

Spring 2006

MWF 9:00-10:30, School of Education Building, room 208

Instructor: Edward Haertel haertel@stanford.edu

office: e331 650-725-1251

home: 650-494-6432

(Office hours by arrangement, individual or small group)

Teaching Assistant: Xuejun (Ina) Shen xuejuns@stanford.edu

cell: 650-387-1226

This course introduces classical test theory, including definitions and formulas for test reliability, standard error of measurement, and related statistics. Additional topics include test validity, item statistics useful in test construction, score scales and norms commonly used in educational testing, item bias and test bias, and ideas of fairness and equity in educational and psychological testing. Factor analysis is briefly introduced, as are the major extensions of and alternatives to classical test theory: generalizability theory and item response theory (latent trait theory).

This course is intended to equip students to read the literature in their own substantive areas more critically, to use tests more intelligently in research, and to pursue further studies in psychometrics. It is a prerequisite for Education 353A (Item Response Theory) and Education 353C (Generalizability Theory). Although the focus of the course is on paper-and-pencil measures of cognitive abilities and academic achievement, most of the concepts and methods developed apply equally to performance testing, the assessment of attitudes and personality constructs, ratings based on systematic observations, and other kinds of assessments of individuals or groups.

Materials

The text for the course is Introduction to Classical and Modern Test Theory, by Linda Crocker and James Algina (1986).

Students are also encouraged to pursue the readings listed in the attached bibliography. These are intended both to provide additional review and practice for students especially insecure about their statistical preparation, and to offer more thorough coverage of selected topics, for those who wish to go beyond what is provided in the text or who are concerned with specific kinds of applications of measurement theory. In previous years, optional readings have been placed on reserve. The library has found, however, that the demand for optional readings is generally low, and that the restricted loan period for materials on reserve inconveniences those students who do make use of them. For the past several years, I have followed the library's recommendation in not placing these materials on reserve, and students have not reported any difficulties. If access to or competition for these materials should pose any problem, please let me know, and I'll try to locate additional copies for you to use.

Course Meetings, Requirements, and Grading

Attendance is expected at lectures, 9:00-10:30.[1] I have found that scheduled office hours do not provide sufficient flexibility to accommodate students' varied schedules, which is why I have listed "office hours by arrangement, individual or small group." Please do not hesitate to email me to set up a time to meet if necessary. Early in the quarter, on some Fridays when there is no lecture scheduled, Xuejun and I plan to conduct optional discussion sections for students wishing extra help or practice with course material.

The course is listed as "3-4 units." For most students, this should be a four-unit course, but enrollment for three units is offered as a courtesy for those with limited tuition grants or stipends. Enrollment for 3 or 4 units makes no difference in the work expected.

The satisfactory/no credit (+/NC) option is available for this course.

Brief in-class quizzes and homework exercises will be assigned to provide practice and to check comprehension. Completion of these quizzes and exercises is required, but grades will be based solely on the take-home midterm exam and take-home final exam, with more weight on the final.

This course is not designed for students with exceptionally strong mathematical and statistical preparation, but such students may nonetheless desire a systematic introduction to measurement theory. If you fall into this category, please see me individually to arrange some special project in connection with the course, such as a brief paper on some measurement topic, a critique of measurement methods in one or more pieces of published research, a critique of the measurement of some psychological construct, or a review of one or more published tests. I will be happy to arrange one or more additional units of credit for such special projects in conjunction with the course, if you wish. Small groups of students are also welcome to undertake such projects.

Students with Documented Disabilities

Students who have a disability that may necessitate an academic accommodation or the use of auxiliary aids and services in a class must initiate the request with the Disability Resource Center (DRC). The DRC will evaluate the request with required documentation, recommend appropriate accommodations, and prepare a verification letter dated in the current academic term in which the request is being made. Please contact the DRC as soon as possible; timely notice is needed to arrange for appropriate accommodations. The DRC is located at 563 Salvatierra Walk (phone 723-1066; TDD 725-1067).

Tentative Course Schedule

Date    Topic                                                 Readings [a]

W 4/5   Introduction and Overview; Notation                   C&A 3-15
F 4/7   AERA/NCME Annual meetings: Class cancelled
M 4/10  AERA/NCME Annual meetings: Class cancelled
W 4/12  Properties of composite scores                        C&A 16-50, 60-64, 87-101; WE&C 56-73, Ch. 12
F 4/14  The classical (weak) true-score model                 C&A 36-42 (review), 105-30; WE&C 178-192; Gulliksen Chs. 2, 3
M 4/17  Estimating reliability (1)                            C&A 131-43; Feldt & Brennan 105-113; Gulliksen Chs. 6-7
W 4/19  Estimating reliability (2); stratified alpha          Feldt & Brennan 113-118; Feldt (1990)
F 4/21  Optional discussion section
M 4/24  Estimating reliability (3); alternative coefficients  Haertel draft chapter excerpt (to be distributed in class)
W 4/26  Useful applications (1)                               C&A 143-56; Gulliksen Chs. 8, 10
F 4/28  Optional discussion section
M 5/1   Useful applications (2); rel. of group means          Feldt & Brennan §§ 2.7.1, 2.8.8
W 5/3   Conditional standard errors of measurement            Feldt & Brennan § 2.8.3
F 5/5   Optional discussion section
M 5/8   Generalizability theory                               C&A 157-91; Shavelson & Webb
W 5/10  Validity (1)                                          C&A 217-42; Campbell & Fiske (1959); APA Standards
F 5/12  Optional discussion section
M 5/15  Validity (2); MIDTERM DUE                             Messick (1989, 1995)
W 5/17  Factor Analysis                                       C&A 287-308
F 5/19  No class
M 5/22  Test construction and item analysis                   C&A 66-86, 311-38; Millman & Greene; WE&C 196-203
W 5/24  Scores, scales, and norms                             C&A 399-409, 431-32, 438-55; Petersen, Kolen, & Hoover
F 5/26  No class
M 5/29  Memorial Day Observed (University Holiday)
W 5/31  Item Bias, Test Bias, & Equity                        C&A 267-78, 283-85
F 6/2   Item response theory                                  C&A 339-50, 352-54, 361-71
M 6/5   No class
W 6/7   Testing and Educational Policy
W 6/14  TAKE-HOME FINAL EXAMINATION DUE, 12:00 noon

_________________________________________________________________________

[a] "C&A" refers to Crocker and Algina (1986) and "WE&C" to Welkowitz, Ewen, and Cohen (2000). Written examinations will cover both lectures and reading assignments from C&A. References to WE&C are to provide additional explanatory material for students who lack a strong statistical background. The remaining readings treat specific topics in greater depth.

References

American Educational Research Association (AERA), American Psychological Association (APA), and the National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. Washington, D.C.: American Psychological Association.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: CBS College Publishing.

Feldt, L. S. (1990). The sampling theory for the intraclass reliability coefficient. Applied Measurement in Education, 3, 361-367.

Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105-146). New York: Macmillan.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.

Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 335-366). New York: Macmillan.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). New York: Macmillan.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

Welkowitz, J., Ewen, R. B., & Cohen, J. (2000). Introductory statistics for the behavioral sciences (5th ed.). Orlando: Harcourt Brace College.

-----------------------

[1] The course is listed as 9:00-10:50 due to constraints in Axess. Meeting time will be 9:00-10:30. Nominal meeting time for a four-unit course would be four 50-minute sessions/week. We will ordinarily have two 90-minute lectures/week, plus optional 90-minute discussion and review sessions as needed. The MWF schedule allows flexibility to make up sessions missed due to AERA and to schedule discussion sections.
