This interactive Reliability Instructional Measurement Module (RIMM) uses MS® Excel to illustrate the basics of the most commonly reported reliability coefficients. The information in this document is designed to supplement your classroom instruction. Be certain to consult your textbook or your instructor if you have specific reliability questions not addressed in this tutorial.

For copies of the Excel program and related instruction materials, visit coe.unt.edu/morrow.

You may be asked to take an assessment task or complete an evaluation form to help me make design and content improvements to the tool. Please provide as much information as possible since I’ve spent a great deal of time trying to create this student tutorial. Thank you for your help!

Table of Contents

Letter to Participants

Introduction to Excel

Troubleshooting for the Reliability Tool

Introduction to Reliability

Norm-referenced Procedures

Estimates of Reliability - Interclass Correlations

Test-Retest (Stability)

Equivalence

Split-Halves

Spearman-Brown

Notes:

*Depending on the Excel version you’re using, the screenshots in this document may look different from the file you’ll view on your computer.

*The electronic file was created in Excel version 2003. It will work best using either Excel XP (version 2002) or Excel 2003.

March 17, 2005

[pic]

Dear Participant:

Thank you for taking a moment to assist me in a research project as part of my doctoral education in the College of Education at the University of North Texas. The attached materials and the electronic Excel file (available at: coe.unt.edu/morrow) are being used to determine student receptivity to instructional technology as applied to measurement courses.

Your participation is voluntary and there are no personal benefits (unless otherwise discussed by your class instructor) in completing this exercise except the potential you have in identifying policies, procedures or environmental factors which could be modified or enhanced to improve education in measurement courses. The data collected will be used strictly for research purposes in completion of my doctoral project under the guidance of Dr. Robin Henson, Professor and faculty sponsor at the University of North Texas. This project has been reviewed and approved by the UNT Committee for the Protection of Human Subjects (940-565-3940). You may withdraw your participation at any time.

Please take a few minutes to review the Reliability Tool. Any responses submitted by you will be anonymous to the primary investigator. A brief review of the materials should take approximately 20 minutes. Further review of the materials, especially if used for study preparation, may take longer.

If you have any questions, please do not hesitate to contact me at (940) 369-7894 or Dr. Henson at (940) 369-8385.

Sincerely,

Leslie R. Odom, B.S., M.A.
Doctoral Student
University of North Texas
Denton, Texas

Dr. Robin K. Henson
Assistant Professor, Educational Research
Department of Technology & Cognition
University of North Texas
Denton, Texas

There are a few pieces of information that you’ll need to know in order to work with this reliability tool using Excel. Below is a snapshot of the tool title page. I’m going to use it to point out several aspects of working with Excel.

[pic]

There are three items labeled on this title page example:

1. This is the “Zoom” button. You can change this value to make the Excel sheet look bigger or smaller on the computer screen. The larger the value, the larger the sheet contents will appear. This value is currently set to 68% and the writing looks small. Select a larger number (say, 75%) to make the page bigger. You can also type in specific percentage values, like this: 62%, 77%, 88%, etc.

2. You can put several separate worksheets in a larger Excel workbook, or file. You can go to a new sheet by clicking on a tab. Five tabs are shown here (i.e., “Title”, “TestRetest”, “Equivalence”, “Split-Halves”, and “Spearman-Brown”). Clicking on a tab will take you to that particular worksheet.

3. You may have to scroll down or across the contents of a worksheet. You do this by clicking on the “slider” bars and moving them up/down or left/right. This works exactly like an Internet browser application (IE, Netscape, etc.) when you’re looking at web pages.

Here are a few additional notes that will help you become acquainted with the capabilities of the reliability program itself:

[pic]

4. Cells or text boxes highlighted in YELLOW can be changed. Type only numeric values (e.g., 1, 10, 56, 100) in these cells, and Excel will recalculate any related formulas automatically. Excel will allow you to type in non-numeric values (e.g., “A”, “k”, “?”, “%”), but it will not update the calculations.

5. Cells highlighted in GREEN are output cells. These cells show the results of the formulas that Excel is calculating. Look for the reliability coefficients (and other related information) in these cells.

6. There are function buttons on many of the following worksheets, like this: [pic] You can print worksheets or update data entries throughout this presentation by clicking on these buttons. Their labels tell you what they do.

7. Don't worry about "breaking" this program. If you encounter any glitches, simply close Excel and select DO NOT SAVE CHANGES if prompted!

TROUBLESHOOTING:

1. If Excel automatically disables all Macros when you attempt to open the Reliability file, try this:

a. In Excel, go to the “Tools” menu and select “Options. . .”

[pic]

b. In the Options box, select the Security tab.

c. Then click on the “Macro Security. . .” button.

[pic]

d. Click on the “Medium” Security Level option.

[pic]

e. Reopen the Reliability file. You will be given the option to “Enable” or “Disable” the macros in the file. Select “Enable” macros. From now on, Excel will ask you this question instead of automatically disabling the macros in a file you open.

2. If you click on an item and Excel highlights the border of that item (as shown in the following screen capture), try this:

a. Look to see if the Design icon is on your toolbar. It has a ruler, a pencil, and a triangle on it.

[pic]

b. If it is highlighted in orange, your version of Excel has opened in “Design Mode” and you want to get out of this mode.

c. Click on the icon to exit this mode. The buttons will now perform their function when you click on them.

3. If you accidentally select a drawing object, like these rectangles (these are different from function buttons that “do” something when you click on them), click on some white space to deselect the drawing object.

[pic]

Why do I have to understand reliability? In your class, you are being introduced to the basic concepts of tests and measures. There is a lot of information related to this subject, and many people devote their lives to understanding the processes that relate to testing and scoring. Since tests produce the scores researchers use to understand a variety of measurable characteristics of people, we have ways of determining the quality of those scores that will be used in either the most basic or the most advanced statistical analyses. Understanding reliability methods will help you determine the quality of scores for a test you’ve taken or may give to students, clients, etc.

So how do I understand this thing called reliability? This presentation and the Excel examples focus on reliability methods rooted in Classical True Score (CTS) Theory. There are other theories that have different approaches to estimating reliability values, but let’s take one thing at a time! At its most basic level, the CTS theory involves an understanding about scores that are obtained from testing administrations. It is understood, or assumed, that a person’s observed score is actually made up of two pieces – a true score (the score that represents a person’s true ability or knowledge level) and an error score (we’ll get to this in a second). When added together, these two pieces make up the observed score.

It is important to note that error and true scores are not directly measurable. First, the true score is something we can’t directly measure because the true score represents the amount of knowledge, information, skill, or ability that a person truly has. With regard to cognitive information, this is the amount of knowledge one has residing in the brain. Now, it would be very unethical for your teacher to ask for a brain sample to determine what you’ve learned in his/her class during the semester! So, the teacher has to give you a test to try to obtain an estimate of your true score value. When a person takes a test, and assuming that they have studied, the score they make on the test (the observed score) is the estimate of their true score. However, there are countless things that can impact the students when they take a test. These items will either positively or negatively impact the observed score they make on a test. These items are collectively represented by the error score, but again are not directly measurable. Let’s look at an example:

                   Amy      Chris    Bill
                   [pic]    [pic]    [pic]
True score         100      90       80
Observed score     100      85       85
Error score        0        -5       +5

These three students just took a test in their measurement class. Amy, Chris, and Bill are special in that their true scores are known – don’t ask how, it was messy! We see that each student has a different true score, but let’s compare the scores they made on the test with their true scores. Amy’s observed and true scores were the same (Yea!). Chris and Bill’s scores were different (Uh-oh). Chris scored five points lower on the test than his true score, and Bill scored five points higher than his true score. Chris is interviewed after the test and he reveals that he stayed up all night, didn’t eat breakfast, and was running late for the exam. All of these items caused him a great deal of stress and he had difficulty concentrating during the test. These sources of error negatively impacted his test performance, so his observed score underestimated his true score ability. Bill was also interviewed and he revealed that he overheard the student sitting next to him mumble the answers to questions #3, #5, and #7. He didn’t know the answers to these questions, so he wrote down what he overheard from his neighbor. For Bill this source of error positively impacted his test performance, so his observed score overestimated his true score ability.

The relationship between observed scores, true scores, and error scores is often represented by the equation: Observed score = True score + Error score, or O = T + E.
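To see the equation at work, it can be rearranged to recover each error score: E = O - T. Here is a minimal Python sketch using the scores from the Amy, Chris, and Bill example. Python is not part of the Excel tool, and the names `error_score`, `true_scores`, and `observed_scores` are illustrative only.

```python
# True and observed scores taken from the Amy, Chris, and Bill example.
true_scores = {"Amy": 100, "Chris": 90, "Bill": 80}
observed_scores = {"Amy": 100, "Chris": 85, "Bill": 85}


def error_score(observed, true):
    """Rearrange O = T + E to get E = O - T."""
    return observed - true


# Error score for each student (illustrative helper, not part of the tool).
errors = {
    name: error_score(observed_scores[name], true_scores[name])
    for name in true_scores
}

for name, e in errors.items():
    o, t = observed_scores[name], true_scores[name]
    # Confirm that the three pieces fit together as O = T + E.
    assert o == t + e
    print(f"{name}: O = {o}, T = {t}, E = {e:+d}")
```

Remember that this only works here because the three true scores are known. With real test data, only the observed scores are available.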

We’ll use this equation because observed scores are the only scores available to us when we administer a test. Figure 1 represents this equation in a graphic. After reviewing Figure 1, which student’s observed score is a better estimate of their true score? Why?

Figure 1: Two different observed score measures represented by the proportion of true score and error score information.

There are different ways to represent this relationship, but you get the idea. In the real world, the test administrator never really knows the true scores and error scores. Only the observed scores are known. Measurement specialists have developed methods that use observed scores to provide estimates of something called reliability. If observed scores are consistent (reliable) measures, they will be stable estimates of someone’s true score ability. This is what should be of interest to the informed measurement student. Observed scores that are inconsistent (unreliable) are heavily influenced by the many sources of error. These inconsistent observed scores will not be very good estimates of someone’s true score because of their instability. We want to analyze and interpret observed scores that are reliable estimates of someone’s true score. Let’s take a look at several of these methods.

Interclass Procedures ~ Interclass procedures use the Pearson product moment correlation coefficient (PPM) to provide an estimate of reliability. There are several methods, but all are based on the PPM.

Test-Retest (Stability) Reliability Estimates

Calculating the PPM for the scores from two administrations of the same test results in a reliability coefficient called a TEST-RETEST or STABILITY COEFFICIENT. The researcher is interested in the consistency of observed scores obtained from two different administrations, or trials, of a single test form. Here is a copy of the Excel worksheet that will help you calculate the Test-Retest (Stability) Reliability Coefficient. For this example, we have 26 students who will take a test on two different test occasions.

[pic]

Notice that the scores for these students appear to be very consistent across the two trials. The observed scores for Trial 1 are similar to the observed scores on Trial 2. These scores display a high level of reliability. Note the reliability estimate of .92. You might consider changing some of the entries now to see what happens to the test-retest reliability. Can you figure out why the reliability went up or down when you changed some scores?
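If you would like to see the calculation behind the worksheet, here is a minimal Python sketch of the PPM applied to two trials. The scores are made-up values for six hypothetical students (not the 26 students in the Excel file), and `pearson_r` is an illustrative helper name, so the result will not match the .92 shown in the worksheet.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical observed scores for six students on two trials.
trial_1 = [78, 85, 92, 64, 70, 88]
trial_2 = [80, 83, 95, 66, 68, 90]

test_retest = pearson_r(trial_1, trial_2)
print(f"Test-retest (stability) coefficient: {test_retest:.2f}")
```

Try changing one student's Trial 2 score to something very different from their Trial 1 score. Just as in the worksheet, the coefficient should drop.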

Equivalence Reliability Estimates

Calculating the PPM for the scores on two test forms results in a reliability coefficient called an EQUIVALENCE COEFFICIENT. The researcher is interested in the consistency of observed scores obtained from two different (but equivalent) forms of a test. When test forms are equivalent, they cover the same test material, but the test items differ from one form to the other. Since the students are tested twice over the same test content, half of the students (13 of the 26 in our example) take Test Form A first, while the other 13 students take Test Form B first. A predetermined amount of time separates the two test administrations. Scores are collected from students after they complete both versions of the test. Here is a copy of the Excel worksheet that will help you calculate the Equivalence Coefficient. Notice that this sheet looks exactly like the “TestRetest” worksheet. This should emphasize the fact that the PPM is calculated the same way for any Interclass reliability procedure. However, the resulting PPM coefficients are given different names depending on the circumstances related to test form or test trials.

[pic]

Notice that the scores for these students appear to be very consistent. Their observed scores on Form A are very similar to their observed scores on Form B. These scores display a high level of reliability. Note the reliability estimate of .97. Wow! Considering the highest value of the PPM is 1.0, these scores are highly reliable. You might consider changing some of the entries now to see what happens to the equivalence reliability. Can you figure out why the reliability went up or down when you changed some scores?

Split-Halves Reliability Estimates

The Split-Halves reliability estimate originates from a single test that has been split into two parts: the total score for the odd items and the total score for the even items. The PPM is calculated based on these two test halves. This correlation is then substituted into the Spearman-Brown Prophecy Formula to calculate the estimated reliability of the whole (original) test.

Here is a copy of the Excel worksheet that will help you calculate the Split-Halves Reliability Coefficient. For this example, we have a single test composed of ten items. The original test is divided into two parts based on the item number. You’ll notice that student responses are scored differently than in previous examples. If the student answered an item correctly, they receive a score value of “1”. If they missed the item, they get no points and receive a score of “0”. The student scores are totaled based on the odd or even item number. Adding up the scores for the odd items yields a total score for the odd items, and adding up the scores for the even items yields a total score for the even items. Note: The total score is displayed to serve as a data check.

The PPM is calculated from the students’ odd and even total scores. But we can’t stop here! The original test was ten items long, and the calculated PPM is based on halves of only five items each. Recall that longer tests are generally more reliable than shorter tests. Thus, the obtained reliability must be adjusted to the reliability expected for a ten-item test. By substituting the original reliability (i.e., the PPM) into the Spearman-Brown Prophecy Formula, we’ll get a Split-Halves reliability value that’s been estimated based on ten items.

[pic]

In this example, the reliability estimate is .985, again a very high value. You might consider changing some of the entries now to see what happens to the split-halves reliability. Can you figure out why the reliability went up or down when you changed some scores?
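The steps above (score each item 0 or 1, total the odd and even items, correlate the two halves, then step the result up to full length) can be sketched in Python. This is only an illustration under stated assumptions: the item responses are invented for six hypothetical students, the helper names are made up, and the step-up uses the standard Spearman-Brown formula for a doubled test, 2r / (1 + r), which I am assuming is what the worksheet applies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Return (half-test PPM, Spearman-Brown adjusted full-test estimate)."""
    # Items 1, 3, 5, ... sit at list positions 0, 2, 4, ...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Step up from half length to full length (length factor of 2).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical 0/1 item scores: one row per student, ten items per row.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"PPM between halves: {r_half:.3f}")
print(f"Split-halves reliability (ten items): {r_full:.3f}")
```

Notice that the adjusted value is larger than the correlation between the halves. This is the "longer tests are generally more reliable" idea in action.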

Spearman-Brown Prophecy Formula

This sheet is a calculator that you can use with any reliability estimate. There are two different types of calculators.

PART I: You can calculate a new reliability estimate by entering the values for: 1) the initial reliability for a test, 2) the number of items (questions) on that test, and 3) the item factor (k) by which you would like to change the test’s length. If you want to make the test longer, make the item factor (k) value greater than 1.0. If you want to make the test shorter, make the item factor (k) value less than 1.0 but greater than zero. The calculator for PART I provides an estimate of the reliability of the new test you will construct.

PART II: You can also use the Spearman-Brown Prophecy Formula to estimate test length changes. Depending on the initial reliability of a test, you can estimate how much longer (or shorter) the test should be in order to provide you with a desired reliability estimate.
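Both calculators can be sketched in a few lines of Python. This sketch assumes the worksheet uses the standard Spearman-Brown Prophecy Formula, new reliability = k·r / (1 + (k - 1)·r), and the same formula solved for k. The function names and the example values (a 20-item test with an initial reliability of .70) are hypothetical and are not taken from the Excel file.

```python
from math import ceil


def spearman_brown(reliability, k):
    """PART I: estimated reliability after changing test length by factor k."""
    if k <= 0:
        raise ValueError("the item factor k must be greater than zero")
    return k * reliability / (1 + (k - 1) * reliability)


def length_factor(initial, desired):
    """PART II: item factor k needed to move from initial to desired reliability."""
    if not (0 < initial < 1 and 0 < desired < 1):
        raise ValueError("reliabilities must be between 0 and 1 (exclusive)")
    return desired * (1 - initial) / (initial * (1 - desired))


# Hypothetical starting point: a 20-item test with reliability .70.
items, r_initial = 20, 0.70

# PART I: double the test (k = 2) or cut it in half (k = 0.5).
r_doubled = spearman_brown(r_initial, 2)
r_halved = spearman_brown(r_initial, 0.5)
print(f"{items * 2} items: estimated reliability {r_doubled:.3f}")
print(f"{items // 2} items: estimated reliability {r_halved:.3f}")

# PART II: how long must the test be to reach a reliability of .90?
k_needed = length_factor(r_initial, 0.90)
items_needed = ceil(items * k_needed)  # round up to whole items
print(f"k = {k_needed:.2f}, so about {items_needed} items are needed")
```

As a quick check, plugging the k from PART II back into the PART I function should return the desired reliability.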

[pic]

[Figure 1 labels: observed score for Student 1 and observed score for Student 2, each divided into true score and error score portions; percentages shown: 75%, 25%, 50%, 50%]
