Reliability of Assessments

What is test reliability?

Test reliability indicates the degree to which a test yields consistent results.

Statistical techniques are used to estimate reliability and to help ensure that a student would not earn a radically different score on a second attempt at the test with no additional learning.

No test is perfectly reliable, but proper test construction methods can do much to reduce the chances of inaccurate student placement or inaccurate measurement of student progress.

Reliability is defined as "an indication of the consistency of scores across evaluators or over time." An assessment is considered reliable when the same results occur regardless of when the assessment occurs or who does the scoring. There should be compelling evidence to show that results are consistent across raters and across scoring occasions.

Reliability (statistics)

(from Wikipedia)

In statistics, reliability is the consistency of a set of measurements or measuring instrument. Reliability does not imply validity. That is, a reliable measure is measuring something consistently, but not necessarily what it is supposed to be measuring. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance.

In experimental sciences, reliability is the extent to which the measurements of a test remain consistent over repeated tests of the same subject under identical conditions. An experiment is reliable if it yields consistent results of the same measure. It is unreliable if repeated measurements give different results.

Relation Between Validity and Reliability

What is the difference between validity and reliability?

Answer:

• Validity is the extent to which test scores mean what you say they mean. That is, are you interpreting the scores appropriately?

• Reliability is the extent to which test results are consistent over time, different versions of the test, or people scoring it. That is, how dependable are the results?

Validity is defined as "an indication of how well an assessment actually measures what it is supposed to measure." The chapter identifies three aspects of an assessment that must be evaluated for validity: tasks, extraneous interference, and consequences.

Why should we be concerned about reliability?

Answer:

• Your test can’t be valid unless it is reliable (i.e., its scores are dependable).

• In fact, a test’s criterion validity can be no higher than the square root of its reliability (see the worked example after this list).

• It is important to know how much measurement error there is in individuals’ scores (e.g., on a standardized test).
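A quick worked example of that bound: if a test’s reliability is Rxx = .64, then its correlation with any criterion can be at most √.64 = .80, no matter how well the criterion is chosen.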

Reliability: Some important points

1. there are different kinds of consistency, so there are different kinds of reliability

2. reliability requires statistical, not logical analysis (validity requires both)

3. calculating reliability requires test scores

4. reliability can be reported in three ways, which serve different purposes

a. correlations

b. standard error of measurement

c. percentage agreement

Reliability Coefficient (Rxx)

Rxx = (similarity in ranks on Forms 1 & 2) / (SD1)(SD2)

The numerator is the covariance of the scores on the two forms; SD = standard deviation. In other words, Rxx is simply the Pearson correlation between scores on the two forms.
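To make the formula concrete, here is a minimal Python sketch (the scores are invented for illustration, not from any real test) that computes Rxx as the covariance of two parallel forms divided by the product of their standard deviations:

    from statistics import fmean, pstdev

    # Hypothetical scores for the same ten students on two parallel forms.
    form1 = [12, 15, 9, 20, 14, 17, 11, 18, 13, 16]
    form2 = [13, 14, 10, 19, 15, 16, 12, 17, 12, 18]

    mean1, mean2 = fmean(form1), fmean(form2)

    # Numerator: covariance -- how much the two sets of ranks move together.
    cov = sum((x - mean1) * (y - mean2)
              for x, y in zip(form1, form2)) / len(form1)

    # Denominator: product of the (population) standard deviations.
    rxx = cov / (pstdev(form1) * pstdev(form2))
    print(f"Rxx = {rxx:.2f}")  # close to 1.0 when ranks are highly consistent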

Important points:

• Like all correlations, reliability coefficients are sensitive to variation in the sample (SD): smaller variation means lower reliabilities, all else equal.

• Why? Because tests can’t distinguish well among people who don’t differ much in knowledge or ability (SD is small). With retesting, small changes in their scores can easily change their ranks on the test—which depresses the numerator above (relative to the SDs).
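The same point in code, again with made-up numbers: restricting the sample to the middle of the ability range shrinks the SDs, lets small score shifts reshuffle the ranks, and pulls the reliability coefficient down:

    from statistics import fmean, pstdev

    def pearson(xs, ys):
        """Pearson correlation: covariance over the product of the SDs."""
        mx, my = fmean(xs), fmean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
        return cov / (pstdev(xs) * pstdev(ys))

    # Full sample: wide spread of ability.
    form1 = [3, 7, 11, 15, 17, 19, 21, 25, 29, 33]
    form2 = [4, 6, 12, 17, 15, 21, 19, 26, 28, 34]
    print(round(pearson(form1, form2), 2))            # 0.99

    # Restricted sample: only the four middle scorers (small SD).
    print(round(pearson(form1[3:7], form2[3:7]), 2))  # 0.6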

Assessing Reliability of Norm-Referenced Tests: Correlational Methods

Methods:

• test-retest—same test, different times

• equivalent forms—different forms of test, "same" time

• test-retest with equivalent forms—different forms, different time

• internal consistency—different parts of same test

a. split half

b. Kuder-Richardson and Coefficient Alpha (see the sketch after this list)

• interrater consistency—different raters/graders
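As one concrete instance of the internal-consistency approach, here is a minimal Python sketch of coefficient alpha; with 0/1 (right/wrong) items it reduces to Kuder-Richardson formula 20. The scores are invented for illustration:

    from statistics import pvariance

    def coefficient_alpha(item_scores):
        """Coefficient alpha from per-item score lists.

        item_scores[i][j] = score of student j on item i.
        """
        k = len(item_scores)                              # number of items
        totals = [sum(col) for col in zip(*item_scores)]  # each student's total
        item_var = sum(pvariance(item) for item in item_scores)
        return (k / (k - 1)) * (1 - item_var / pvariance(totals))

    # Hypothetical right/wrong scores: 4 items x 6 students.
    items = [
        [1, 1, 0, 1, 0, 1],
        [1, 1, 0, 1, 1, 1],
        [1, 0, 0, 1, 0, 1],
        [0, 1, 0, 1, 0, 1],
    ]
    print(f"alpha = {coefficient_alpha(items):.2f}")  # 0.82 for these data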


Important points:

Comparing methods

• some methods include more types of consistency than others

• some are better suited to some purposes than others

• test-retest with equivalent forms is the most useful for most purposes

Influences on reliability

• number of items (crucial, because it is something you can control; see the Spearman-Brown sketch after this list)

• spread of scores
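The standard tool for reasoning about the first influence is the Spearman-Brown prophecy formula, which projects how reliability changes when a test is lengthened or shortened (assuming the added items are comparable to the existing ones). A minimal sketch with made-up numbers:

    def spearman_brown(r, length_factor):
        """Projected reliability when test length is multiplied by length_factor."""
        return (length_factor * r) / (1 + (length_factor - 1) * r)

    r = 0.60                       # reliability of the current test (hypothetical)
    print(spearman_brown(r, 2.0))  # double the items: 0.75
    print(spearman_brown(r, 0.5))  # halve the items: about 0.43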

Assessing Reliability of Norm-Referenced Tests: Standard Error of Measurement

Definition: The amount of error (movement) in a person’s test score that we can expect from one administration to another of the same or a comparable test.

Helps answer these questions:

• If short time interval: How sure can we be that the person’s true score really is close to their observed score? (fringe of error)

• If long time interval: How likely is their score to remain roughly the same over some period of time? (stability of test scores)

Standard Error of Measurement (SEM)

Important points:

• SEM is derived directly from reliability coefficient

SEM = SD × √(1 − Rxx)   (see the worked sketch after this list)

• SEMs always depend on the spread of scores (SD) and other characteristics of a group (e.g., age)

• SEMs always refer to a specific set of test-takers; therefore, you need to judge whether estimates derived from another group really apply to your students (e.g., their age level, heterogeneity)
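To make the SEM formula concrete, a short sketch with invented numbers; it also shows the fringe of error around an observed score (under the usual assumptions, roughly 68% of retest scores fall within 1 SEM of the true score and roughly 95% within 2 SEMs):

    from math import sqrt

    sd = 15.0           # spread of scores in the norm group (hypothetical)
    reliability = 0.91  # reliability coefficient for that group (hypothetical)

    sem = sd * sqrt(1 - reliability)
    print(f"SEM = {sem:.1f}")  # 15 x sqrt(0.09) = 4.5

    observed = 100
    print(f"68% band: {observed - sem:.1f} to {observed + sem:.1f}")          # 95.5 to 104.5
    print(f"95% band: {observed - 2 * sem:.1f} to {observed + 2 * sem:.1f}")  # 91.0 to 109.0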

Differences in error of measurement

Do they really matter? How?

Would you expect all kinds of tests to be equally reliable? Why or why not?

Assessing Reliability of Criterion-Referenced Tests: Percentage Agreement

Question:

Why might we not want to use correlational methods with criterion-referenced tests?

Answer:

The aims of norm- and criterion-referenced tests are usually different. The former often sample a broader range of material and seek to differentiate among students. In contrast, criterion-referenced tests usually cover a smaller, more specific domain of tasks and are meant to assess absolute, not relative, levels of success in mastering the material.
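A minimal sketch of the percentage-agreement approach, using an invented mastery cutoff and made-up scores: it counts how often two administrations classify each student the same way (master vs. non-master):

    cutoff = 80  # hypothetical mastery cutoff (percent correct)

    # Hypothetical percent-correct scores for eight students on two occasions.
    time1 = [85, 92, 78, 81, 60, 95, 79, 88]
    time2 = [83, 90, 82, 80, 65, 97, 74, 91]

    same = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(time1, time2))
    print(f"percentage agreement = {100 * same / len(time1):.0f}%")  # 7 of 8: 88%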

Which decisions demand high test reliability?

Decisions that:

• are important

• are final

• are irreversible

• are unconfirmable

• concern individuals

• have lasting consequences

Usability of Assessments

• ease of administration

• time required for administration

• ease of interpretation and use

• availability of alternate forms

• cost of testing
