ENG 402(3) / ENG 401(3)

June 30, 2003

Iri-Noda-San (IRINODA, Katsutoshi)

Research Paper

Can you correctly answer SAT reading comprehension questions without reading the passage?

Reviewing the debate between Katz & Lautenschlager (The University of Georgia) and Freedle & Kostin (Educational Testing Service)

The author’s attitude toward children appears to be one of

A) concern for the development of their moral integrity

B) idealization of their inexperience and vulnerability

C) contempt for their inability to accept unpleasant facts

D) exaggerated sympathy for their problems in daily life

E) envy of their willingness to learn about morality

(College Board, 1983)

This is a test item from a past SAT. It comes with a particular reading passage, and test-takers are asked to choose the best answer. In the example above, however, the reading passage has been omitted entirely. Nonetheless, aren’t you likely to get the answer right just by reading the stem and the choices? Which would you choose?

In an experiment (Katz et al., 1990), 83% of examinees chose (A) without the passage, while no more than 7% chose any single incorrect choice. The correct choice is (A). I chose (A) myself, and I am quite sure you chose the same one. How could I do that? My answer: the other choices are simply implausible (1).

In the first paper for this course, I reviewed the major journals in language testing. As the field has grown over recent decades, a huge number of issues have appeared in them. As a case study, in this paper I deal with the debate in the 1990s between Stuart Katz and Gary J. Lautenschlager of the University of Georgia and Roy Freedle and Irene Kostin of Educational Testing Service (ETS). The debate centers on the validity of reading comprehension tests, and it started when the former pair criticized the SAT on the basis of their finding that many of the questions could be answered without reading the passages at all. How this can be so is of course intriguing, so the debate is worth introducing.

If we can answer the questions without reading the passage, then what is the value of the reading passage itself? Is the SAT a valid test for measuring reading comprehension? This is the very question that Katz and Lautenschlager have been studying for over a decade.

Katz, Lautenschlager, Blackburn, and Harris (1990) raise the issue in the research article “Answering reading comprehension items without passages on the SAT” in Psychological Science. They conduct two experiments in this study. In the first experiment, the examinees in the control condition (P Group) score 69.6 on average (out of 100), whereas the two groups who take the test without the reading passages score 45.8 (NP/C Group; No Passage/Coaching) and 46.6 (NP Group; No Passage) respectively. Since each RC question has five choices, the chance-level score should be one-fifth of the full score, which is 20 in this case, or less; for low levels of ability, performance on well-constructed items may fall below chance because the examinee succumbs to distractors (Donlon, 1984; Lord, 1974, 1980). Pearson product-moment correlations between the experimental scores and the examinees’ original SAT-Verbal scores are also high: r(16) = .68 (p < .005), r(15) = .56 (p < .025), and r(27) = .51 (p < .005), respectively (2). The item analysis reveals that 69 and 72 of the 100 test items in the NP/C and NP conditions, respectively, have p values (proportion correct) exceeding .3, while nearly half have p values exceeding .5. Better-than-chance performance on individual items, therefore, is a frequent occurrence when the passages are missing.
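To make the chance-level reasoning concrete, here is a minimal sketch in Python (my own illustration, not from Katz et al.) of what pure guessing would yield on a 100-item, five-choice test, and of how unlikely the NP Group’s score of 46.6 would be under guessing alone:

```python
# Pure-guessing baseline on a 100-item, five-choice test (illustration only).
from scipy import stats

n_items, n_choices = 100, 5
p_guess = 1 / n_choices                       # .2 per item
expected = n_items * p_guess                  # 20 correct on average
print(f"Expected chance score: {expected:.0f} / {n_items}")

# Upper-tail probability of scoring 46 or better by guessing alone.
p_tail = stats.binom.sf(45, n_items, p_guess)
print(f"P(score >= 46 by guessing) = {p_tail:.2e}")  # vanishingly small
```

The binomial tail probability makes plain why scores in the mid-40s without the passages cannot be dismissed as lucky guessing.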

In the second experiment, where they employ a larger number of participants whose SAT-Verbal scores are much closer to the average of the normative samples, they obtain a similar result. The examinees in the control condition score 56.8 (out of 100), while those in the NP Group score 37.6. Pearson product-moment correlations between the experimental scores and the original SAT-Verbal scores are again high: r(51) = .88 (p < .001) and r(73) = .72 (p < .001), respectively. The item analysis reveals that 61 of the 100 test items in the NP condition have p values (proportion correct) exceeding .3, and nearly a third have p values exceeding .5.

To sum up, these findings indicate that though the examinees who have access to the reading passages naturally score higher than those who do not, the latter still score about twice as high as mere chance (20 out of 100). All groups show strong correlations between their scores with the passages and their scores without. The majority of the items are answered correctly without the passages. Katz et al. conclude that “performance on the RC task, therefore, would appear to depend substantially on factors having nothing to do with understanding the prose passages normally accompanying test questions” (pp. 125-126). This is a serious problem because “ETS instructs examinees to answer questions on the basis of passage content, not personal opinion or knowledge. Despite these instructions, our findings show that proscribed information can often be used to advantage, and prescribed information often ignored without disadvantage” (p. 126).

Katz, Blackburn, and Lautenschlager (1991) is designed to supplement these findings. To test whether examinees were exploiting information from other items (“cognates”) belonging to the same passage, they compare the NP condition with a quasi-randomized (NP/QR) condition, in which the test items are presented in quasi-randomized order to prevent the use of overlapping information among cognates; no more than two items from the same passage appear in the same test battery. The adjusted means for the NP and NP/QR conditions are 35.7 and 34.1, respectively. Pearson product-moment correlations between original SAT-Verbal scores and the experimental scores are high: r(62) = .64, p < .001 for the NP condition, and r(57) = .62, p < .001 for the NP/QR condition. This result convinces them that information obtained from cognates, i.e., from test items belonging to a single passage, plays little part in overall test performance.
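To make the design concrete, the following sketch implements the stated constraint (my own illustration; the authors do not publish their assignment procedure): items are shuffled and dealt into batteries so that no battery receives more than two items from any single passage.

```python
# Illustration of the stated constraint (not the authors' actual procedure):
# shuffle items, then deal them into batteries so that no battery holds
# more than two items from any one passage.
import random
from collections import Counter

def quasi_randomize(items, n_batteries, max_per_passage=2):
    """items: list of (item_id, passage_id) pairs."""
    pool = items[:]
    random.shuffle(pool)
    batteries = [[] for _ in range(n_batteries)]
    counts = [Counter() for _ in range(n_batteries)]
    for item_id, passage_id in pool:
        # Put the item in the least-loaded battery with room for its passage.
        for b in sorted(range(n_batteries), key=lambda i: len(batteries[i])):
            if counts[b][passage_id] < max_per_passage:
                batteries[b].append(item_id)
                counts[b][passage_id] += 1
                break
    return batteries

# Example: 6 passages x 5 items each, dealt into 3 batteries.
items = [(f"P{p}-Q{q}", p) for p in range(6) for q in range(5)]
for battery in quasi_randomize(items, n_batteries=3):
    print(battery)
```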

Furthermore, Katz and Lautenschlager (1994) explore data for the SAT-I (the SAT as revised in 1994), the ACT, and the GRE. A major finding is that the GRE RC task is less vulnerable to nonpassage factors than the other two. The mean percentage correct on the GRE in the NP group is 27.1%, which exceeds chance by only about 7 percentage points; in the P group it is 46.4%. On both the SAT-I and the ACT, the previous findings are confirmed: the mean percentage correct on the SAT-I is 74.9% in the P group and 42.9% in the NP group, while on the ACT the corresponding figures are 69.5% and 48.9%. Note that chance is 25%, not 20%, on the ACT because each ACT item has four choices rather than five.

Then Freedle and Kostin (1994) of ETS challenge Katz and Lautenschlager’s findings in a reply to Katz et al. (1990) in Psychological Science. Their chief aim is to demonstrate that “some of these criticisms [of the validity of reading tests, with Katz et al. (1990) in mind] not only are too extreme, but are, in fact, incorrect” (p. 107). According to Freedle and Kostin, the results of Katz et al. reflect the prevailing reading of an influential study by Drum, Calfee, and Cook (1981), who divide several predictor variables into two broad categories: item variables and text variables. They argue that, without further critical analysis, this division has been widely cited as evidence against the construct validity of multiple-choice tests: the fact that an item variable is by far the strongest predictor of difficulty comes to be viewed as evidence against construct validity. However, there is a third category: text-by-item overlap variables.

“Suppose some words that occur throughout the reading passage are also used in one or more of the item options. One might hypothesize that people tend to choose an option if it contains more text words than some other option. This variable represents text-by-item overlap because it requires examination of both the item content and the passage content in order to enter the appropriate lexical counts” (p. 107).

So, according to their argument, the strongest predictor of difficulty on RC tests is the text-by-item overlap variable, a claim they support with their accumulated research on predicting reading comprehension item difficulty in standardized tests (Freedle & Kostin, 1991, 1992, 1993; these studies report the dominance of the text-by-item overlap variables in the SAT, the GRE, and the TOEFL, respectively). Because Katz et al. (1990) do not take this variable into account, their criticism of the validity of the SAT (and other tests) is, on this view, to be rejected.
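To see what a text-by-item overlap variable measures, here is a minimal sketch of its simplest form (the tokenization and scoring are my own simplification, not Freedle and Kostin’s actual coding scheme): count how many passage words recur in each option and, as a naive test-taking strategy, favor the option with the greatest overlap.

```python
# A naive text-by-item overlap heuristic (my simplification, not
# Freedle & Kostin's coding scheme): count passage words that recur
# in each option; a guesser might favor the option with most overlap.
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation ignored."""
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap_scores(passage: str, options: dict[str, str]) -> dict[str, int]:
    passage_words = tokens(passage)
    return {label: len(tokens(text) & passage_words)
            for label, text in options.items()}

# Invented passage fragment; the option set is from the item quoted in the notes.
passage = "The painter's early work shows deep admiration for Chinese art..."
options = {"A": "disbelief", "B": "admiration", "C": "anxiety",
           "D": "ambivalence", "E": "apathy"}

scores = overlap_scores(passage, options)
print(scores)                        # {'A': 0, 'B': 1, 'C': 0, 'D': 0, 'E': 0}
print(max(scores, key=scores.get))   # 'B', the option sharing a passage word
```

Freedle and Kostin’s actual variables are far more elaborate, but the principle is the same: item difficulty can be predicted partly from how option wording echoes passage wording.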

However, Katz and Lautenschlager (1995) are not persuaded, and they end their commentary with this sentence: “The SAT reading task and others like it indeed appear to be flawed psychometric instruments” (p. 127).

Katz and Lautenschlager (2001) pursue the question further, examining the respective contributions of passage and no-passage factors to item performance on the SAT reading task.

Katz, Johnson, and Pohl (1999) extend the no-passage paradigm to the revised SAT-I.

Powers and Wilson (1993), in a College Board report, examine the passage dependence of the new SAT reading comprehension questions.

One implication of this debate is that it epitomizes the long history of disputes between academics and ETS. Put simply, when scholars criticize the tests made by ETS (which happens very frequently), ETS defends the validity of its own tests by resorting to whatever resources are compatible with its argument. This is probably because ETS tests like the SAT are such big enterprises that ETS must try to maintain their authority. Consequently, ETS keeps asserting validity while the scholars turn away, and communication breaks down. Reviewing this particular debate is a good opportunity to reconsider these long-deteriorated relations between scholars and ETS, and to start thinking about more fruitful terms of engagement for the two parties.

NOTES

(1) Katz and Lautenschlager (1994) put it more neatly: “[t]he item just discussed is flawed because not a single incorrect choice characterizes a socially appropriate attitude toward children, whereas the correct choice does” (p. 304).

In a unique test-prep book which claims that SAT stands for the “Slimy and Atrocious Torture” made by the “Evil Testing Servant,” Abaluck et al. (2002) explicate some similar techniques for the SAT reading comprehension (hereafter RC) questions. In all the examples they examine of the “art passage” (a passage about an art: literature, painting, sculpture, crafts, music, etc.), the author has a positive attitude toward the artist or art form. The author might have some specific criticisms, but the overall point of the passage will be complimentary (p. 52). They also insist that the “minority passage” (a passage about a minority group) makes the SAT easier for everyone because such passages are incredibly predictable: ETS is going to say good things about the minority group, and that is the whole point of the passage. So, for example, a real SAT item like the one below is considered a “giveaway.”

The author’s attitude toward the Chinese achievements mentioned in lines 1-45 is best described as one of

(A) disbelief (B) admiration (C) anxiety (D) ambivalence (E) apathy

The only one of these choices that expresses a clearly positive attitude toward the Chinese is (B), so (B) is the right answer (p. 53).

(2) r(16) = .68 (p < .005), r(15) = .56 (p < .025), and r(27) = .51 (p < .005), respectively. These figures are interpreted this way. r is the Pearson product-moment correlation coefficient, which ranges from -1.00 to +1.00; r = .68 indicates a fairly strong positive linear relationship between the two sets of scores (squaring it, r² ≈ .46, tells us that about 46% of the variance in one variable is shared with the other). The figure in parentheses after r is the degrees of freedom, N - 2, with N the number of participants. p is the probability of obtaining a correlation at least this large by chance when no true relationship exists; p < .005 means that probability is smaller than 0.5%. For the detailed interpretation and use of the Pearson product-moment correlation, refer to introductory statistics textbooks (e.g., Brown, 1996, pp. 153-166).
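For readers who want to verify such figures themselves, here is a minimal sketch using Python’s scipy library; the two score lists are invented for illustration (n = 18, so df = 16, matching r(16) above).

```python
# Computing Pearson's r and its p-value on invented data (illustration only).
from scipy import stats

# Hypothetical SAT-Verbal scores and no-passage test scores for 18 examinees.
sat_verbal = [420, 450, 480, 500, 510, 530, 550, 560, 580, 590,
              600, 620, 640, 650, 670, 690, 700, 720]
no_passage = [38, 35, 41, 44, 40, 47, 45, 50, 48, 52,
              51, 55, 53, 58, 57, 60, 62, 61]

r, p = stats.pearsonr(sat_verbal, no_passage)
print(f"r({len(sat_verbal) - 2}) = {r:.2f}, p = {p:.5f}")
```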

(3) “[At] a special session of the Executive Committee of the Conference on College Composition and Communication (CCCC), to which I had been elected. Incredibly, the ETS developers of the TSWE were asking the CCCC, which represents those who teach composition courses in American colleges and universities, to approve the test, as a needed support for college writing programs. Only in retrospect does the lack of communication at that meeting seem comic; powerful emotions and strong language dominated at the time. The CCCC leaders talked of sociolinguistics and the importance of writing as critical thinking on tests and in the schools; the ETS staff talked of administrative convenience and the practical needs of writing programs. CCCC representatives talked about helping students learn, whereas ETS staff talked about inexpensive testing of standard English. My memory of just how things concluded is a bit dim, clouded not only by time but also by my own fervor and rhetoric as a participant. I think the ETS staff finally stormed out of the room, while the Executive Committee unanimously passed a strong resolution condemning the new test. Subsequently, virtually every other professional organization in English endorsed that resolution or passed one like it. Despite this opposition, the test became, and still remains, a steady thorn in the side of writing faculty, demeaning their work wherever it is used. […] To its credit, ETS has recently announced that it will phase out the TSWE, and many of us will be watching to celebrate its demise – and to see what may replace it.”

(White, 1994, pp. 277-278)

Works Cited

Abaluck, J., Berger, L., Colton, M., Mistry, M., Prabhuswamy, S., & Rossi, P. (2002). Up your score: The underground guide to the SAT (2003-2004 edition). New York: Workman Publishing Company.

Brown, J. D. (1996). Testing in language programs. Englewood Cliffs, NJ: Prentice-Hall Publishers.

College Board. (1983). 10 SATs. New York: Author.

Donlon, T. F. (1984). The College Board technical handbook for the scholastic aptitude test and achievement tests. New York: College Entrance Examination Board.

Drum, P. A., Calfee, R. C., & Cook, L. K. (1981). The effect of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly, 16, 486-514.

Freedle, R., & Kostin, I. (1991). The prediction of SAT reading comprehension item difficulty for expository prose passages (ETS Research Rep. RR-91-29). Princeton, NJ: Educational Testing Service.

Freedle, R., & Kostin, I. (1992). The prediction of GRE reading comprehension item difficulty for expository prose passages for each of three item types: Main ideas, inferences, and explicit statements (ETS Research Rep. RR-91-59). Princeton, NJ: Educational Testing Service.

Freedle, R., & Kostin, I. (1993). The prediction of TOEFL reading item difficulty: Implications for construct validity. Language Testing, 10, 133-170.

Freedle, R., & Kostin, I. (1994). Can multiple-choice reading tests be construct-valid? A reply to Katz, Lautenschlager, Blackburn, and Harris. Psychological Science, 5, 107-110.

Katz, S., Blackburn, A. B., & Lautenschlager, G. (1991). Answering reading comprehension items without passages on the SAT when items are quasi-randomized. Educational and Psychological Measurement, 51, 747-754.

Katz, S., & Lautenschlager, G. (1994). Answering reading comprehension questions without passages on the SAT-I, ACT and GRE. Educational Assessment, 2, 295-308.

Katz, S., & Lautenschlager, G. (1995). The SAT task in question: Reply to Freedle and Kostin. Psychological Science, 6, 126-127.

Katz, S., & Lautenschlager, G. (2001). The contribution of passage and no-passage factors to item performance on the SAT reading task. Educational Assessment, 7(2), 165-176.

Katz, S., Lautenschlager, G., Blackburn, A. B., & Harris, F. (1990). Answering reading comprehension items without passages on the SAT. Psychological Science, 1, 122-127.

Katz, S., Johnson, C., & Pohl, E. (1999). Answering reading comprehension items without the passages on the SAT-I. Psychological Reports, 85, 1157-1163.

Lord, F. M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247-264.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

Powers, D. E., & Wilson, S. T. (1993). Passage dependence on the new SAT reading comprehension questions (College Board Report No. 93-3). New York: The College Board.

White, E. M. (1994). The politics of assessment: Past and future. In Teaching and assessing writing: Recent advances in understanding, evaluating, and improving student performance (pp. 270-297). San Francisco: Jossey-Bass Publishers.
