Name of Test: - University of Alberta



Test Review: Peabody Picture Vocabulary Test-III (PPVT-III)

|Name of Test: Peabody Picture Vocabulary Test-III (PPVT-III) |

|Author(s): Dunn, Lloyd M. and Dunn, Leota M. |

|Publisher/Year: 1959, 1981, and 1997 |

|Forms: A and B |

|Age Range: 2 years, 6 months to 90+ years |

|Norming Sample: |

| |

|The sample was collected in 1995 and 1996. Test examiners included speech-language pathologists, school psychologists, educational diagnosticians, and graduate students supervised by field administrators. Forms A and B were presented in counterbalanced order. Concurrent reliability and validity studies were also conducted. Test item selection, stimulus word selection, picture plate preparation, field testing, final item selection, bias review, and the national tryout are described in both the examiner’s manual and the technical reference manual. |

| |

|Total Number: 2,725 |

|Number and Age: Participants were divided into 25 age groups, from 2 years, 6 months through a final group aged 61 to 90+ years, with 100 participants in each age group. Six-month intervals were used from 2 years, 6 months to 6 years, 11 months; 12-month intervals from 7 years, 0 months to 16 years; and multi-year intervals for adults. |

|Location: 240 sites nationwide |

|Demographics: Within each age group, the authors stratified the sample by gender, race/ethnicity, geographic region, and SES (parent education level). |

|Rural/Urban: The sample was “balanced across central cities, suburban and small town communities, and rural areas” (Dunn & Dunn, 1997, p. 41). |

|SES: by parent education level |

|Other: The special education categories included learning disabled, speech impaired, mentally retarded, hearing impaired, and gifted individuals, in percentages comparable to those in the U.S. population. |

|Summary Prepared By: Eleanor Stewart, May 2007 |

|Test Description/Overview: |

| |

|The test kit consists of the examiner’s manual, a norms booklet, performance record forms, a folding case containing the test book, and the technical reference manual (Williams & Wang, 1997), which contains extensive information on normative and statistical properties. A software package (Mac or PC) can be purchased separately. |

| |

|Changes in the new edition include more test items in a new format (17 sets of 12 items in each form), contemporary new pictures, new packaging for improved portability, and new, extended national norms ranging from 2 years, 6 months to 90+ years. |

| |

|Comment: Vocabulary knowledge is a well-known indicator of language ability. However, I was surprised that neither the authors of the examiner’s manual (Dunn & Dunn) nor those of the technical reference manual (Williams & Wang) included any discussion of theoretical perspectives on vocabulary knowledge or development. |

| |

|Purpose of Test: The purpose of this test is to screen verbal ability (Dunn & Dunn, 1997, p. x) and to assess receptive vocabulary knowledge and comprehension of spoken English. The authors clearly state that the test measures only one aspect of linguistic function, so test scores should not be over-interpreted. In addition to these broadly stated purposes, the authors list 11 specific purposes (p. 3 of the examiner’s manual), including “establishing rapport, testing preschool children, and screening for both giftedness and mental retardation” (p. 3). They also caution that although the PPVT-III’s simplicity is appealing, the test can be misused if its results are over-generalized as a measure of overall linguistic ability. |

| |

|Areas Tested: receptive vocabulary |

|Listening: Lexical |

| |

|Who can Administer: No professional group is identified. The authors state that because the test is simple to administer and score, examiners need only be familiar with the test and have read the examiner’s manual. An understanding of psychometrics is described as desirable, and potential examiners are encouraged to practice administering the test. For interpretation, however, the authors state that examiners should have completed formal coursework in psychological testing and statistics. They emphasize the “dangers of labeling individuals and making crucial decisions about their lives on the basis of limited and imperfect data” (Dunn & Dunn, 1997, p. 5). |

| |

|Administration Time: Administration time is estimated to be 11 to 12 minutes, “however time requirements will vary among examinees” (Dunn & Dunn, 1997, p. 7). |

|Test Administration (General and Subtests): |

| |

|The examiner is directed to set up the easel so that the examinee views the test plate with four pictures while the examiner views the instructions on the reverse page. Figure 2.1 depicts the test set-up. The procedure for calculating chronological age is specified; extra days are not considered (an example is provided). |

| |

|Administration and scoring are straightforward and have remained unchanged since the original 1959 version. Four training items are provided to orient the examinee to the task. The examiner is encouraged to teach the examinee how to respond and may do so in whatever manner suits the situation, to ensure that the examinee is prepared. |

| |

|Start points and basal and ceiling sets are specified. Once a set is begun, all 12 of its items must be presented. The examiner must establish a basal set, in which the examinee correctly identifies all or all but one of the items. The instructions for administration are given in 16 detailed points on pages 11 through 13. For example, the allowable verbal instructions are listed: “Put your finger on ___,” “Show me ___,” and “Find ___.” A critical range is sought in order to “provide maximum discrimination among individuals of similar ability” (Dunn & Dunn, 1997, p. 15). Detailed instructions for choosing a start point are provided in the examiner’s manual; the aim is to approximate the examinee’s ability so that the items are neither too easy nor too difficult. The authors state, “on average, an individual takes about five sets of 12 items each, or 60 test items out of 204 (30 percent) that most closely approximates his or her ability” (p. 15). A ceiling set is established when the highest set of items administered contains 8 or more errors. |
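The basal and ceiling rules above can be sketched in Python. This is a minimal illustration, not the publisher’s scoring procedure: it assumes each administered set has been fully presented and recorded as a count of errors, keyed by a hypothetical set number (1 to 17). It also assumes the basal is the lowest administered set with 0 or 1 errors and the ceiling is the highest administered set with 8 or more errors.

```python
from typing import Optional


def find_basal_and_ceiling(errors_by_set: dict[int, int]) -> tuple[Optional[int], Optional[int]]:
    """Return (basal_set, ceiling_set); either is None if not yet established.

    errors_by_set maps a (hypothetical) set number to the number of errors
    on that fully administered 12-item set.
    """
    administered = sorted(errors_by_set)
    # Basal: lowest administered set with all, or all but one, items correct.
    basal = next((s for s in administered if errors_by_set[s] <= 1), None)
    # Ceiling: highest administered set containing 8 or more errors.
    ceiling = next((s for s in reversed(administered) if errors_by_set[s] >= 8), None)
    return basal, ceiling


# Example: sets 7 through 11 were administered.
print(find_basal_and_ceiling({7: 1, 8: 2, 9: 4, 10: 6, 11: 9}))  # (7, 11)
# No set with 8 or more errors yet, so no ceiling has been established.
print(find_basal_and_ceiling({7: 1, 8: 2, 9: 4}))  # (7, None)
```

The sketch only classifies sets that have already been given; it does not model which set the examiner should administer next.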

| |

|There are four black-and-white line drawings per page. The examiner says the stimulus word aloud, and the examinee identifies the matching picture by pointing or by saying its number. Responses are recorded by number, and the target responses are printed on the record form. A column to the right is marked “E” for error, and the examiner draws an oblique line through the letter to indicate an error. Basal and ceiling rules are printed on the record form as reminders. No coaching is permitted, but the examiner may encourage the examinee with phrases such as “Point to the one you think it might be” or “It’s all right to guess.” Praise is allowed, and examples are provided in Point 10 (Dunn & Dunn, 1997, p. 14). The raw score is calculated by subtracting the total number of errors in the critical range (from the basal set through the ceiling set) from the ceiling item, the last item in the ceiling set. All items below the basal set are assumed to be correct, and all items above the ceiling set are assumed to be incorrect. An example is given. |

|This newest edition has changed the rules for calculating the raw score in order to reduce errors. The authors state that the new rules are “more generous than those used in previous PPVT |

|editions” (p. 19). |
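The raw-score rule can be illustrated the same way. This sketch assumes that items are numbered consecutively across 12-item sets (so set k ends at item 12k), that the ceiling item is the last item of the ceiling set, and that errors are counted from the basal set through the ceiling set. `SET_SIZE` and `raw_score` are illustrative names, not part of any published scoring software.

```python
SET_SIZE = 12  # each set contains 12 items (17 sets x 12 = 204 items per form)


def raw_score(errors_by_set: dict[int, int], basal: int, ceiling: int) -> int:
    """Ceiling item minus errors in the critical range (basal through ceiling).

    Items below the basal set are credited as correct and items above the
    ceiling set are treated as incorrect, so neither needs to be counted.
    """
    if basal > ceiling:
        raise ValueError("basal set must not be above the ceiling set")
    ceiling_item = ceiling * SET_SIZE  # number of the last item in the ceiling set
    errors = sum(n for s, n in errors_by_set.items() if basal <= s <= ceiling)
    return ceiling_item - errors


# Sets 7-11 administered; basal = set 7, ceiling = set 11.
# Ceiling item = 132, errors = 1 + 2 + 4 + 6 + 9 = 22, raw score = 110.
print(raw_score({7: 1, 8: 2, 9: 4, 10: 6, 11: 9}, basal=7, ceiling=11))  # 110
```

Counting item by item gives the same result: 72 items credited below set 7, plus 11 + 10 + 8 + 6 + 3 correct responses in sets 7 through 11, also totals 110.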

|Test Interpretation: |

| |

|Once a raw score is calculated, the examiner can convert it to standardized scores to compare the examinee’s performance with that of the norm group. Section C of the examiner’s manual explains how to convert the raw score and what each standardized score means, with examples of each type. SEMs and confidence bands are discussed along with the necessary equations, and examples are again provided. Section D provides practice scoring exercises (Dunn & Dunn, 1997, pp. 26-35). The authors are confident in the reliability of the PPVT-III and state, “the reliability of the PPVT-III is so high that the chances are great that an individual’s obtained score and true score are very similar. Furthermore, reliability confidence bands are not rectilinear; they are in the form of the normal probability curve. Therefore, it follows that the obtained score is the best single estimate of a person’s true score” (bold in manual, p. 25). |

| |

|The examiner’s manual does not provide any other type of discussion relating to interpreting the results. |

|Standardization: Age equivalent scores*, grade equivalent scores, percentiles, standard scores, stanines |

|Other: Normal curve equivalents |

| |

|*The authors of the technical manual point out that age equivalent scores are useful only during the period of vocabulary growth, which extends to age 22 years. For this reason, age equivalents are offered for the range 1 year, 9 months to 22 years. Normative data are available only from 2 years, 6 months upward; below that age, age equivalents are extrapolated and should therefore be used with caution. |
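The standardized scores listed above are read from the norms tables in the manual. As a rough illustration of how they relate to one another under a normal distribution, the sketch below converts a standard score (mean 100, SD 15) to a z score, percentile rank, normal curve equivalent (mean 50, SD 21.06), and stanine (mean 5, SD 2). These are normal-curve approximations, not the PPVT-III norms, and `convert` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def convert(standard_score: float, mean: float = 100.0, sd: float = 15.0) -> dict[str, float]:
    """Normal-curve approximations of derived scores from a standard score."""
    z = (standard_score - mean) / sd
    percentile = 100 * NormalDist().cdf(z)  # percentile rank under a normal curve
    nce = 50 + 21.06 * z                    # normal curve equivalent
    stanine = min(9, max(1, math.floor(2 * z + 5 + 0.5)))  # nearest stanine, clipped to 1-9
    return {"z": z, "percentile": percentile, "nce": nce, "stanine": stanine}


print(convert(115))  # z = 1.0, percentile about 84.1, NCE about 71.06, stanine 7
```

In practice the manual’s tables should be used, since the observed score distribution at a given age need not be exactly normal.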

| |

|Comment: Other measures of emerging vocabulary for infants and toddlers are available and widely used. For example, the MacArthur-Bates Communicative Development Inventories are based on extensive research and have normative data. Such measures rely on parent report, which has been shown to be reliable and valid for infants and toddlers. |

| |

|Comment: I consulted the technical manual (Williams & Wang, 1997) for the following information regarding reliability and validity. |

| |

|Reliability: |

|Internal consistency of items: High reliabilities (.90 and above) were reported for all 25 age groups of the norm sample for both forms, with a median reliability of .95. The authors note, “Because the computation of alpha reliabilities was based on the total test length, and the unadministered items were imputed by Rasch simulation, spurious effects could have been introduced. Users should be cautious in interpreting these coefficients” (Williams & Wang, 1997, p. 21). |
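For readers less familiar with alpha coefficients, the sketch below computes coefficient (Cronbach’s) alpha from a complete persons-by-items score matrix. It shows the textbook formula only; it does not reproduce the manual’s procedure, in which unadministered items were imputed by Rasch simulation before alphas were computed. The data are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a persons x items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    total_var = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Four examinees, three dichotomously scored items (1 = correct).
print(cronbach_alpha([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))  # about 0.75
```

Because the ratio of item variance to total variance is what matters, using sample or population variances gives the same alpha as long as both terms use the same one.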

| |

|Split-half reliability: Full-test-length reliability was estimated with the Spearman-Brown formula. Reliabilities ranged from .86 to .97 across the standardization age groups for both forms. The authors state that the split-half reliabilities were “slightly lower than the alphas. This might indicate that the spurious effects introduced in the procedure for deriving the alpha reliabilities were not introduced in the Rasch split-half procedure” (Williams & Wang, 1997, p. 21). Table 4.1 presents alpha and split-half reliabilities for the entire standardization sample. |
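The Spearman-Brown formula projects the reliability of a half-length test to the full length: r_full = 2 r_half / (1 + r_half), or more generally k r / (1 + (k - 1) r) for a test lengthened by a factor k. A minimal sketch follows; the half-test correlation of .80 is a made-up value, not a figure from the manual.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test with reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


# A hypothetical correlation of .80 between two half-tests
# implies a full-length reliability of about .89.
print(round(spearman_brown(0.80), 2))  # 0.89
```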

| |

|Test-retest: A total of 226 randomly selected participants in four age groups were retested: 2 years, 6 months to 5 years, 11 months (n=67); 6 years, 0 months to 10 years, 11 months (n=70); 12 years, 0 months to 17 years, 11 months (n=51); and 26 years, 0 months to 57 years, 11 months (n=38). The retest interval ranged from 8 to 203 days, with an average of 42 days. Sample characteristics are described by gender, race/ethnicity, SES, and region. The authors state, “Almost all retesting was done by the examiner who had administered the PPVT-III the first time” (Williams & Wang, 1997, p. 22). Corrected coefficients ranged from .91 to .94, “with very little or no difference in magnitude between the two forms.” Additionally, the average values “suggest little or no practice effect” (p. 23). |

| |

|Inter-rater: No information is present in the technical manual. |

| |

|Comment: I wonder whether the authors considered it unnecessary, given how straightforward the task and scoring are. |

| |

|Other (alternate-form reliability): Both test forms were administered to participants in the standardization sample. Correlations between the forms were high, ranging from .88 to .96 with a median of .94; the forms are therefore considered parallel. |

| |

|SEMs are based on alternate-form reliability. The authors “list an SEM of about 4 standard score units across the age range for the 68 percent level, based on a median reliability of .94. For the 90 and 95 percent level, the values are 7 and 8 standard score units, respectively, across the age range” (Williams & Wang, 1997, p. 24). The procedure for calculating the confidence interval is outlined. |
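The reported SEM values can be approximately reproduced with the classical formula SEM = SD × √(1 − r). Assuming the standard-score metric (SD = 15) and the median alternate-form reliability of .94, SEM ≈ 3.7, which rounds to the reported 4. Multiplying the rounded SEM of 4 by the usual normal-curve multipliers (1.645 for 90 percent, 1.96 for 95 percent) gives about 6.6 and 7.8, matching the reported 7 and 8 after rounding. This is one plausible reconstruction, not the manual’s stated procedure, and the function names are illustrative.

```python
import math

Z = {68: 1.0, 90: 1.645, 95: 1.96}  # two-sided normal-curve multipliers


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(score: float, sem_value: float, level: int) -> tuple[float, float]:
    """Obtained score plus or minus z * SEM at the given confidence level."""
    half_width = Z[level] * sem_value
    return score - half_width, score + half_width


print(round(sem(15, 0.94), 2))  # 3.67, reported as "about 4"
for level in (68, 90, 95):
    print(level, round(Z[level] * 4))  # 68 -> 4, 90 -> 7, 95 -> 8
print(confidence_band(100, 4, 90))  # roughly (93.42, 106.58)
```

Centering the band on the obtained score follows the authors’ view, quoted earlier, that the obtained score is the best single estimate of the true score.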

| |

|Chapter 5, “Equivalency,” addresses the equivalence of the PPVT-R and the PPVT-III, as well as of the two PPVT-III forms, IIIA and IIIB. The authors provide this information because they acknowledge that the test is used extensively in longitudinal research. |

| |

|To demonstrate the equivalence of the two editions, an equating study was conducted with 193 participants aged 4 years, 5 months through 16 years, 9 months. Sample characteristics are presented to show representativeness. The authors note that because both editions used the Rasch model, the equating followed a Rasch common-person design. The mathematical model is presented on page 25 (see the copy on file for reference). Results are reported as means and standard deviations for both editions. As predicted by Rasch model theory, the two editions have similar standard deviations. The authors state that “the difference of the means is about 6.148” (Williams & Wang, 1997, p. 25). A scatterplot of the W-ability scores is presented in Table 5.1, and the correlation between W-ability scores on the two tests is .97. The authors present an equation that allows users to convert raw scores between the PPVT-R and the PPVT-III; the converted raw scores can then be converted to standardized scores, and an example is provided. Using standard scores, the authors demonstrate overlap between the two editions for three age groups, with corrected correlations ranging from .83 to .89. They state, “as expected, these correlations are much lower than the W-ability score correlation of .97 between the two editions” (p. 26). |

| |

|To determine relative difficulty, the authors performed additional analyses, calculating mean percentage correct in the equating sample. Mean percent correct was “55.2 on PPVT-R Form L, and 59.8 and 59.6 on PPVT-III Forms IIIA and IIIB” (Williams & Wang, 1997, p. 27). In other words, participants in the equating study answered about 4.5 percentage points more items correctly on the PPVT-III than on the PPVT-R. The authors state, “This is consistent with one of the goals of the PPVT-III development plan to increase the sensitivity of the scale to young children by adding easier items. Therefore, for younger examinees (aged 4 to 16), the PPVT-III might be slightly easier than PPVT-R. Because another goal was to improve the adult norms by adding difficulty items, it is possible that the PPVT-III might be slightly harder than PPVT-R for older examinees (adult age range). More research is needed to confirm these hypotheses” (p. 27). |

| |

|A classical test theory approach was also used to address the equivalence of the two forms. Means and standard deviations of raw scores are reported for Forms IIIA and IIIB, and correlations, SEMs, and reliabilities for the standardization sample are presented in tables. Based on these data, the authors conclude that the forms are closely matched at all ages, providing evidence that they are indeed parallel. |

| |

|The PPVT-III was also correlated with a third measure, the Expressive Vocabulary Test (EVT; also reviewed in this test review series). All participants in the standardization sample also took the EVT. The results, presented in Table 5.4 (Williams & Wang, 1997), show similarities between the two measures in terms of correlation coefficients. |

| |

|Parallel content was also addressed in the technical manual. All 20 content areas are presented for both forms of the PPVT-III, showing comparable numbers of items in each content area. |

|Validity: |

| |

|Content: Item development information is provided in Chapters 1 and 2 of the technical manual. In the section on content validity in Chapter 6, “Validity,” the authors briefly state that stimulus words were selected to avoid words that tap specialized knowledge (e.g., homonyms, words derived from another language). Distractor words were limited to those that would not confuse the examinee (e.g., by avoiding words that sound alike). |

| |

|Criterion Prediction Validity: Four studies were conducted during the standardization process. |

| |

|Correlations with Cognitive Ability: |

|WISC-III: administered in counterbalanced order to 41 children aged 7 years, 11 months through 14 years, 4 months, with an average interval of eight days between tests. Sample characteristics are provided. |

|Kaufman Adolescent and Adult Intelligence Test (KAIT): administered in counterbalanced order to 28 adolescents aged 13 years, 0 months to 17 years, 8 months, on the same day or within one day. Sample characteristics are provided. |

|Kaufman Brief Intelligence Test (K-BIT): administered in counterbalanced order, with an average interval of 26 days (range: same day to 281 days). Sample characteristics are provided. |

|The authors summarize the findings in Table 6.1. The results show high correlations with WISC-III Verbal IQ (.91 and .92), KAIT Crystallized IQ (.87 and .91), and K-BIT Vocabulary (.82 and .80). As expected, correlations were lower where a test assessed nonverbal ability (ranging from .62 for K-BIT Matrices to .85 for KAIT Fluid IQ). Global scores also showed strong correlations: .90 and .90 for WISC-III Full Scale IQ, .85 and .91 for KAIT Composite IQ, and somewhat lower values for the K-BIT (.78 and .76). The authors report that these results are similar to those obtained previously in comparisons with PPVT-R Forms L and M. |

| |

|Correlations with Oral Language: |

| |

|The special population samples were matched with control groups from the norm sample by age, gender, race/ethnicity, SES, education level, and region. Paired-samples t tests were used to compare test results. Test data were collected by various examiners across the U.S. |
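A paired-samples t test compares each clinical participant with his or her matched control through the difference scores: t = mean(d) / (sd(d) / √n), with n − 1 degrees of freedom. The sketch below computes the statistic only (no p value), and the scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(clinical: list[float], control: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for matched pairs."""
    if len(clinical) != len(control) or len(clinical) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [a - b for a, b in zip(clinical, control)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Five hypothetical matched pairs of standard scores.
t, df = paired_t([90, 95, 100, 105, 110], [91, 97, 103, 109, 115])
print(round(t, 2), df)  # -4.24 4
```

Pairing removes variance shared by each clinical participant and his or her matched control, which is why matched designs like this one use the paired rather than the independent-samples test.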

| |

|Special populations (sample characteristics are provided for both clinical and control groups): |

|Speech impairment: n=50 individuals with speech (not language) problems, aged 5 years, 7 months to 13 years, 5 months; sample characteristics are reported by gender, race/ethnicity, SES (parents’ education), and region. Results indicate no significant difference between the performance of the two groups. |

|Language delay: n=39 students with identified language delays who were receiving special services, aged 3 years, 0 months to 8 years, 0 months, with a control group aged 3 years, 8 months through 7 years, 11 months. Sample characteristics are provided. Significant differences between groups were evident at p ................
................
