Reliability and Validity




How do we use the words reliability and validity in everyday life? What do these words mean? Is there a difference between them or do they mean the same thing?

RELIABILITY

Reliability refers to the consistency of a measure. A measure is considered reliable if we get the same result repeatedly. A research method is considered reliable if we can repeat it and get the same results.

Coolican (1994) pointed out:

“Any measure we use in life should be reliable, otherwise it’s useless. You wouldn’t want your car speedometer or a thermometer to give you different readings for the same values on different occasions. This applies to psychological measures as much as any other.”

A ruler, for example, would be reliable, as the measurement could be repeated time after time and the same result would be obtained (consistency). If you measure the length of a book on Monday and your ruler tells you it’s 25 cm long, it will still tell you it’s 25 cm long on Friday.

An IQ test, however, may be unreliable if a person sits the test on Monday and scores 140, and then sits the same test on Friday and scores 90. Even though the test can be replicated, it shows low consistency and is therefore unreliable.
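One common way to put a number on this kind of consistency is to correlate the scores from two sittings of the same test. The sketch below is illustrative only: the scores are made up, and the helper name `pearson_r` is an assumption rather than anything from the course text. A correlation close to +1 suggests consistent results; a value near zero or below suggests the measure is not giving stable readings.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five people who sat the same test twice.
monday = [100, 110, 95, 140, 120]
friday_consistent = [102, 108, 97, 138, 121]    # similar scores each time
friday_inconsistent = [120, 95, 130, 90, 105]   # scores change a lot

print(round(pearson_r(monday, friday_consistent), 2))    # close to +1
print(round(pearson_r(monday, friday_inconsistent), 2))  # negative
```

How high the correlation must be before a test counts as reliable is a judgment call and is not specified here.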

Some research methods (such as laboratory studies) have high reliability as they can be replicated and the results checked for consistency. Other research methods however (such as case studies and interviews) have lower reliability as they are difficult or impossible to replicate. As they cannot be replicated, we cannot check how consistent the results are.

How can we measure reliability?

There are several different ways to estimate or improve reliability depending on the research method used. Match the method of estimating reliability to the description (pg165)

|Test-Retest reliability | |If the measure depends upon interpretation of behaviour, we can compare the results from two or more raters. | |If the results in the two halves are similar, we can assume the test is reliable |

|Split Half Reliability | |Splitting a test into two halves, and comparing the scores in both halves | |If the results on the two tests are similar, we can assume the test is reliable |

|Inter-Rater reliability | |The measure is administered to the same group of people twice | |If there is high agreement between the raters, the measure is reliable |
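Where a measure depends on observers' interpretations, "high agreement between the raters" can be quantified. The sketch below shows two simple options: the proportion of observations on which two raters agree, and Cohen's kappa, which corrects that proportion for agreement expected by chance. The behaviour codes and data are hypothetical, and the function names are illustrative assumptions.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which the two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same code at random,
    # given how often each rater used each code.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two observers watching the same ten episodes.
rater_1 = ["aggressive", "calm", "calm", "aggressive", "calm",
           "calm", "aggressive", "calm", "calm", "aggressive"]
rater_2 = ["aggressive", "calm", "calm", "calm", "calm",
           "calm", "aggressive", "calm", "aggressive", "aggressive"]

print(percent_agreement(rater_1, rater_2))
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Kappa is lower than raw agreement because some matches would occur even if both raters were guessing.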

We will look in more detail at the reliability of specific research methods throughout the course.

VALIDITY

A study may be high in reliability, but the results may still be meaningless if we don’t have validity. Validity is the extent to which a test measures what it claims to measure.

There are three main aspects of validity that we investigate in psychological research: Control, Realism and Generalisability (p138).

Control

This refers to how well the experimenter has controlled the experimental situation. Control is important because without it researchers cannot establish cause and effect relationships. In other words, without control, we cannot state that it was the independent variable (IV) which caused the change in the dependent variable (DV). The result could have been caused by another variable, called an extraneous variable (EV). These are variables which have not been controlled by the experimenter, and which may affect the DV (see below).

Realism

The whole point of psychological research is to provide information about how people behave in real life. If an experiment is too controlled, or the situation too artificial, participants may act differently than they would in real life. Therefore, the results may lack validity.

The term mundane realism is used to refer to how well an experiment reflects real life. If an experimental situation has high mundane realism (in other words, it reflects real life) it would be high in _______________________ validity

Can you see a potential conflict between control and realism?

Generalisability

The aim of psychological research is to produce results which can then be generalised beyond the setting of the experiment. If an experiment is lacking in realism we will be unable to generalise. However, even if an experiment is high in realism, we still may not be able to generalise.

For example, the participants may be all from a small group of similar people, meaning low population validity. Many experiments use white, middle class American college students as participants. What issues with generalisability can you think of?

TYPES OF VALIDITY

Experimental Validity: is the study really measuring what it intends?

INTERNAL VALIDITY refers to things that happen “inside” the study. Internal validity is concerned with whether we can be certain that it was the IV which caused the change in the DV. If aspects of the experimental situation lack validity, the results of the study are meaningless and we can draw no meaningful conclusions from them.

– Internal validity can be affected by a lack of mundane realism. This could lead the participants to act in a way which is unnatural, thus making the results less valid.

– Internal validity can also be affected by extraneous variables (see below).

|EXTRANEOUS VARIABLE |HOW DOES IT AFFECT VALIDITY? |HOW CAN IT BE OVERCOME? |

|Situational variables (anything to do with the environment of the experiment): time of day, temperature, noise levels etc. |Something about the situation of the experiment could act as an EV if it has an effect on the DV. For example, poor lighting could affect participants’ performance on a memory test. |Situational variables can be overcome by the use of standardised procedures which ensure that all participants are tested under the same conditions. |

|Participant variables (anything to do with differences in the participants): age, gender, intelligence, skill, past experience, motivation, education etc. |It may be that the differences between the participants cause the change in the DV. For example, one group may perform better on a memory test than another because they are younger, or more motivated. |Participant variables can be completely removed by using a repeated measures design (the same participants are used in each condition). Matched pairs (participants in each group are matched) could also be used. |

|Investigator effects: this refers to how the behaviour and language of the experimenter may influence the behaviour of the participants. The way in which an experimenter asks a question might act as a cue for the participant. Also known as experimenter bias. |Leading questions from the experimenter may consciously or unconsciously alter how the participant responds. For example, the experimenter may provide verbal or non-verbal encouragement when the participant behaves in a way which supports the hypothesis. |Investigator effects can be reduced by using a double blind technique. This is when neither the participants nor the person who carries out the research knows the aim of the study, for example because the research is carried out by someone other than the person who designed it. |

|Demand characteristics: participants are often searching for cues as to how to behave in an experiment. There could be something about the experimental situation or the behaviour of the experimenter (see investigator effects) which communicates to the participant what is “demanded” of them. |The structure of the experiment could lead the participant to guess the aim of the study. For example, participants may perform a memory test, be made to exercise, and then be given another memory test. This may lead the participants to guess that the study is about the effect of exercise on memory, which may cause them to change their behaviour. |When designing a study, it is important to try to create a situation where the participants will not be able to guess what the aim of the study is. |

|Participant effects: participants are aware that they are in an experiment, and so may behave unnaturally. |They may be overly helpful and want to please the experimenter. This leads to artificial behaviour. Alternatively, they may decide to go against the experimenter’s aims and deliberately act in a way which spoils the experiment. This is the “screw you” effect. |Again, by designing a study so that the participants cannot guess the aims, participant effects can be reduced. |

TASKS

A. A researcher wants to test whether people’s memories are better in the evening or in the morning. He gives a group of participants a memory test at 9am, and another test at 9pm. The researcher discovers that they scored higher in the morning. He concludes therefore that people’s memories are better in the morning.

Name the IV:_________________________ Name the DV:____________________________

Name any extraneous variables that could have altered the DV.

How could these EVs have been controlled?

B. A psychologist is interested in the effect of age on how well people cope under stressful conditions. Two groups of participants are used, one group are under 25, and another group are over 50. Both groups are asked to sit a difficult exam under timed conditions. After the exam, all of the participants are given a questionnaire to assess how much stress they felt. The older people reported more stress.

Name the IV:__________________________ Name the DV:___________________________

Name any extraneous variables that could have altered the DV.

How could these EVs have been controlled?

EXTERNAL VALIDITY

Read pg 165-166 and fill in the gaps

Assuming that our experiment has high ____________________ validity (that we can be sure that the DV was changed by the _____ and not an _____), we need to assess how well our results can be _________________________ beyond the experimental setting. Two issues here are how much ecological validity the study has, and whether it has population validity.

Ecological validity refers to how well the experimental situation reflects _________ __________, and therefore how well the results can be __________________________ to other places and settings. Ecological validity can be assessed by looking at the ________________ of the experiment. For example, a field experiment takes place in the participant’s own environment, which would lead to ____________ ecological validity, as it is more naturalistic than a _____________________ experiment. _____________ _______________ on the other hand looks at the tasks that the participants have to do and how realistic these are. If the things that the participants are asked to do in the experiment are artificial and contrived, the study would be said to have ______ _______________ ________________ and therefore _______ ecological validity.

Population validity refers to how well the ____________________ used in the experiment represent the general population. Many psychological studies use white, middle class male American students. Can we legitimately take the results from these participants and apply them to other nationalities, _______________, _______, or even different historical periods?

Validity of psychological measures: how valid is the tool we use to measure?

When designing an experiment in psychology, we need to decide upon a way to measure our variables. If what we are measuring is height, weight or time, for example, we could use a tape measure, scales or a stopwatch respectively. However, what if we want to measure something like self-esteem, intelligence, conformity or linguistic ability? These psychological concepts need to be turned into numbers that can be measured and compared. The term for this is operationalisation.

To create a measure, we first must define what it is we are measuring. For example, with intelligence, we need to decide what we mean by intelligence and what sort of things we wish to measure. We then decide upon a way to measure this (operationalising).
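As a minimal sketch of what operationalising might look like in practice, the example below turns answers to a hypothetical four-item self-esteem questionnaire into a single score. Everything here is an assumption for illustration: the 1 to 5 agreement scale, the item names, and the idea that two items are negatively worded and so are scored in reverse.

```python
def score_questionnaire(responses, reverse_items, scale_min=1, scale_max=5):
    """Total score from rating-scale answers, flipping negatively worded items."""
    total = 0
    for item, answer in responses.items():
        if not scale_min <= answer <= scale_max:
            raise ValueError(f"answer for {item!r} is outside the scale")
        if item in reverse_items:
            # On a 1-5 scale, 1 becomes 5, 2 becomes 4, and so on.
            answer = scale_min + scale_max - answer
        total += answer
    return total

# Hypothetical answers: q2 and q4 are negatively worded items
# (e.g. "I often feel I am a failure"), so low agreement means high self-esteem.
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
reverse_items = {"q2", "q4"}

print(score_questionnaire(responses, reverse_items))  # 4 + 4 + 5 + 5 = 18
```

The resulting number is only as meaningful as the definition behind the items, which is why the validity of the measure still has to be assessed.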

Examples of the types of measures used in psychology are:

– A test which is given to the participants which produces a score

– A questionnaire or interview

– A checklist where participants’ behaviour can be recorded

– A biological response (e.g. body temperature, hormone levels)

A possible issue with this is that by breaking down a concept into a numerical form, we may lose validity and end up not measuring what we intended. However, there are a number of ways we can assess the validity of a measure.

|Content validity |Does the method used actually seem to measure what you intended? For example, does an IQ test actually measure levels of intelligence, or is it measuring ability to solve puzzles? To ensure content validity, a panel of experts (on IQ for example) may be asked to assess the measure for validity. |

|Concurrent validity |How well does the measure agree with existing measures? For example, does our IQ test agree with established tests of IQ? We can assess concurrent validity by testing participants with both the new test and the established test. If our test has concurrent validity, there should be high agreement between the scores on both measures. |

|Construct validity |Is the method actually measuring all parts of what we are aiming to test? For example, if we use a maths test to test intelligence, we are missing out on other factors involved such as linguistic ability or spatial awareness. To maintain construct validity, we need to define what it is we are aiming to measure, and ensure that all parts of that definition are being measured. |

|Predictive validity |Is our measure associated with future behaviour? For example, if someone scores high on our IQ test, we would expect them to perform well in GCSE exams, or do well in their career. This is similar to concurrent validity. We can investigate predictive validity by following up our participants to see if future performance is similar to performance on our measure. |
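For concurrent and predictive validity, "agreement" between two sets of scores is often summarised with a correlation. The sketch below assumes made-up scores for six participants on a new test, an established test taken at the same time, and a later exam; the variable names and numbers are hypothetical and are not taken from any real study.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Pearson correlation computed from means and population standard deviations."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return covariance / (pstdev(xs) * pstdev(ys))

# Hypothetical data for six participants.
new_test = [95, 105, 110, 120, 130, 100]
established_test = [98, 102, 112, 118, 127, 101]  # taken at the same time
later_exam = [4, 6, 6, 8, 9, 5]                   # grades collected later

# Concurrent validity: does the new test agree with the established one?
print(round(correlation(new_test, established_test), 2))

# Predictive validity: does the new test relate to later performance?
print(round(correlation(new_test, later_exam), 2))
```

A high positive correlation in each case would be consistent with the measure having that type of validity, although it does not prove it.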

TASKS

C. A researcher is looking into the effect of alcohol consumption on self-esteem. He develops a questionnaire to assess people’s attitudes towards themselves. How could you see if this questionnaire had content validity?

D. An experimenter creates a questionnaire that measures homophobic attitudes. How would you see if this test had construct validity?

E. A researcher wants to see if people who live healthy lifestyles have better romantic relationships. He develops a checklist of what constitutes healthy behaviour. How do we know if this checklist has concurrent validity?

-----------------------

When assessing the reliability of a study, we generally need to ask two questions:

1) Can the study be replicated?

2) If so, will the results be consistent?

Key terms: Types of Validity; Experimental Validity; Internal Validity; External Validity; Extraneous variables; Situational variables; Participant variables; Investigator effects; Demand characteristics; Participant effects; Ecological validity; Mundane realism; Population validity; Validity of psychological measures; Content validity; Concurrent validity; Construct validity; Predictive validity
