Beyond Zip code analyses:

What good measurement has to offer and how it can enhance the instructional delivery to all students.

Dave Heistad and Rick Spicuzza

Minneapolis Public Schools

Research, Evaluation, and Assessment

AERA 2003

Chicago, IL.

Beyond Zip code analyses: What good measurement has to offer and how it can enhance the instructional delivery to all students.

The perils of urban education are well documented. The staunch rhetoric about accountability and the poor performance of public education (especially in urban centers) is often misplaced, yet it remains at the forefront of local, state, and federal agendas[1]. Often overlooked in the hyperbole is the fact that assessments and measures of students' educational progress are far from equal. The purpose of this paper is to examine the essential criteria of a good measurement system, one that provides accurate, valid, and useful information to families, teachers, and communities about the level and progress of their students.

Unfortunately, not all assessment systems are created equal, yet teachers and administrators are often left with only the end product (e.g., test scores) on which to make critical instructional and evaluative judgments. In particular, the major challenge for states attempting to respond to federal legislation (No Child Left Behind, NCLB) is that it provides only one level of information: the percent of students proficient, reported for different groups (cohorts) of students. Thus, because of numerous confounding variables, sites that are truly making the grade or beating the odds will go unnoticed. Therefore, this presentation will briefly describe the common pitfalls associated with poor measurement systems (i.e., lack of reliable and valid scores), as well as the unfortunate mistakes made when reporting aggregate (i.e., grade, school, or system) test scores on student performance. The two most common flaws of accountability systems are (a) reporting only the mean or the percent of students at a specified level of performance, and (b) reporting cross-cohort performance (e.g., this year's third-grade students compared to last year's third-grade students). Finally, the Minneapolis Public Schools (MPS) model will be described as an illustrative example of how these common pitfalls can be minimized.

Despite the potential benefit of increased accountability and the stated advocacy for low-performing students, examining only the percent of students identified as proficient falls short in many ways. First, most states do not have, nor will they create, a vertically equated scale to allow true comparisons of student longitudinal progress. Rather, cohort models of analysis will prevail. Second, cross-cohort indicators are, at best, weak indicators of school effectiveness and should be supplemented with indicators that track student performance over time. In contrast to the state of Minnesota's model, the Minneapolis Public Schools assess grades 2-7 and 9 annually in both reading and math, reporting results against national user growth norms (NWEA item bank, 2000). In this way, each child's growth can be compared to national expectations for level of performance and expected growth. Third, changes in school programs (e.g., magnet programs, ELL sites, or busing routes) can often lead to "bounce" in test scores that obscures actual gains made by students relative to a previous cohort's average score.
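
To make the growth comparison concrete, the sketch below (in Python, with entirely hypothetical norm values and student scores; the actual NWEA norm tables are not reproduced here) shows how an individual student's fall-to-spring gain on a vertical scale might be compared with the nationally expected gain for students who started at the same level.

    # Hypothetical sketch: comparing observed growth to a national growth norm.
    # The norm values below are invented for illustration; real NWEA norms differ.
    EXPECTED_GAIN_BY_START = {   # starting scale-score band -> expected annual gain
        (150, 170): 12.0,
        (170, 190): 10.0,
        (190, 210): 8.0,
    }

    def expected_gain(start_score: float) -> float:
        """Look up the (hypothetical) nationally expected gain for a starting score."""
        for (low, high), gain in EXPECTED_GAIN_BY_START.items():
            if low <= start_score < high:
                return gain
        raise ValueError(f"No norm band defined for score {start_score}")

    def growth_index(fall: float, spring: float) -> float:
        """Observed gain minus expected gain: positive values exceed national growth."""
        return (spring - fall) - expected_gain(fall)

    # Example: a student who starts at 182 and ends at 195 gained 13 points,
    # 3 points more than the hypothetical expectation of 10.
    print(growth_index(182, 195))   # -> 3.0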

Critique of mean performance reporting.

Meyer (1996) describes four primary reasons why reporting mean performance only (an aggregated average) is insufficient for making judgments about the efficacy of instructional programs, or for holding systems accountable[2]. First, aggregated averages are contaminated by previous experience. Under the state of Minnesota's system (and those of many other states), aggregated means reported at a specific grade level contain all the learning and instruction of the students' previous years. In the case of Minnesota, where grade 3 means are reported, the test scores actually contain eight years of information (before formal schooling, preschool, kindergarten, first grade, and second grade). Who should be held accountable? Grade 3 teachers? Moreover, how is a school or district able to use this information (grade 3 mean performance) to attribute the success or failure of its programming or overall system? In the absence of entry-level information about students, it is highly probable that programs producing excellent growth in a relatively short period of time (grades 1 and 2) will be evaluated incorrectly. These types of judgments severely handicap program evaluations set up to identify programs that work.

Second, single reporting of mean performance ignores the importance of student achievement growth. Students enter schools and school systems at varying levels of competency. Thus, reporting mean performance only masks the "gain" or "growth" a student has made or the system has been able to elicit. Yet school systems are expected to accept students at their present level of performance, engage them in the educational process by meeting their individual instructional needs, and then move them to higher levels of performance; this should be the primary role of our educational system. If this accurately describes community expectations, then it is reasonable to posit that the community will want to examine actual student growth in order to render judgments about the merits of school systems, programs, or instructional techniques. Under the current Title I legislation and the proposed State of Minnesota accountability system, there is no valid attempt to examine student growth. The state-administered grade 3 and 5 MCA exams have no pretest and cannot account for the initial competency levels that students exhibit. Further, the third- and fifth-grade MCA scale scores are not configured using the same scale. The test scale scores would need to be vertically equated (placed on a common scale) so that individual student growth between grades 3 and 5 could be reported. This would be a wise thing for the state department to complete, but it is not available at this time.
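
One simple (and admittedly crude) way to place two test forms on a common scale is mean-sigma linear linking; the Python sketch below, with invented scores from a hypothetical group that took both forms, illustrates the idea. Operational vertical equating, including the Rasch-based NWEA scale described later, is considerably more involved.

    import numpy as np

    # Hypothetical sketch of mean-sigma linear linking between two test scales.
    # grade3 and grade5 are scores from a common (anchor) group that took both forms;
    # all numbers are invented for illustration.
    grade3 = np.array([1380, 1405, 1420, 1455, 1470], dtype=float)
    grade5 = np.array([1430, 1460, 1480, 1520, 1535], dtype=float)

    # Linear transformation that maps the grade-3 scale onto the grade-5 scale
    # by matching the anchor group's means and standard deviations.
    slope = grade5.std(ddof=1) / grade3.std(ddof=1)
    intercept = grade5.mean() - slope * grade3.mean()

    def to_grade5_scale(score_g3: float) -> float:
        """Express a grade-3 score on the grade-5 scale so gains can be computed."""
        return slope * score_g3 + intercept

    # A student's grade-3 score of 1420, re-expressed on the grade-5 scale,
    # can now be subtracted from the actual grade-5 score to estimate growth.
    print(round(to_grade5_scale(1420), 1))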

Third, mean performance does not account for mobility. In any school or school system in which large numbers of students move in and out (e.g., children of migrant or seasonal workers, urban settings), mobility disrupts an accountability system. Students moving into a site must be assigned to a school location but may not yet have benefited from, or been impaired by, the instruction at the present site. Furthermore, these students may carry positive or negative residual effects from the system they left; to whom should their present level of performance be attributed? Again, it is hard to localize the instructional effect in order to hold a school or district accountable. It is important that accountability systems address these difficult demands society places on schools, but true strides toward social justice will not be made unless the appropriate information is provided on which to base decisions.

Finally, there are events and characteristics beyond the purview of a school that will impact the mean performance of a student. It is widely demonstrated that the effects of poverty, location, race or ethnicity, language proficiency, and enrollment in special education produce different levels of mean performance. Given that these characteristics are not randomly distributed across the state, or even within districts, certain buildings or locations will have a mean level of performance more directly impacted by these variables than by the instructional programs they deliver.

One of these correlated variables is the percentage of students receiving free or reduced-price lunch across schools or districts. Correlations between poverty indicators and average test scores are substantial at the individual student level and at the aggregate classroom, school, or district level, as exemplified below with a graph of third-grade mean scores from the 2000 Minnesota Comprehensive Test of Reading for every school in Minnesota with at least 20 students in third grade (n = 510).

The State of Minnesota has proposed a cut score of 1420 on the MCA as "proficient" for purposes of the No Child Left Behind definition of adequate yearly progress. From the above graph it is apparent that no school with at least 76% poverty had an average reading score above 1400, while no school with less than 5% poverty had an average score below 1400. The overall correlation between percent poverty (i.e., students receiving free or reduced-price lunch) and average test score was -.818 in this analysis.
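
The school-level correlation reported above can be reproduced from a simple two-column data set of poverty rates and mean scale scores; the Python sketch below uses invented data rather than the actual 510-school file and is illustrative only.

    import numpy as np

    # Hypothetical school-level data: percent free/reduced-price lunch and
    # mean grade 3 reading scale score. Invented values for illustration only.
    pct_poverty = np.array([5, 12, 23, 35, 48, 61, 76, 88], dtype=float)
    mean_score  = np.array([1472, 1460, 1448, 1436, 1421, 1408, 1395, 1382], dtype=float)

    # Pearson correlation between poverty and mean achievement across schools.
    r = np.corrcoef(pct_poverty, mean_score)[0, 1]
    print(f"school-level correlation: {r:.3f}")   # strongly negative, as in the paper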

The MPS assessment system offers an expansive assessment model that is based, in part, on a single invariant, developmental scale[3] extending from the early grades through college-level performance and is built on the foundation of the Northwest Evaluation Association's (NWEA) model. MPS has been able to select domain-specific, calibrated items to develop assessment measures that are more closely aligned with, and sensitive to, local curricula and state standards. In this way, what is taught and what is assessed are closely linked and monitored. Further, because the system is tied to a developmental scale, each testing session and domain assessed can be reported in absolute terms, describing a student's current level of proficiency and changes over time, and in relative terms, referencing critical benchmark standards. In this way, reporting on student- and system-wide strengths and weaknesses can be made within and across years, and within and across grades. Finally, because students are "fit" to a particular level of the same assessment, a wider breadth of material deemed "just right[4]" for each student is available on which to base judgments of his/her proficiency. This broader assessment increases our confidence in interpreting student competencies relative to national norms and community standards. Moreover, there is additional information that can be used to examine both the strengths and weaknesses of instructional programs and student achievement. Each student takes a level of the assessment predicted by his/her past performance, and each sub-domain assessed is equated within the assessment so that relative strengths and weaknesses can be easily detected for individual and group reporting.[5]
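
Footnote 3 notes that the scale is built with one-parameter (Rasch) IRT scaling. As a rough illustration of what that entails, the sketch below (Python; the item difficulties and responses are invented) shows the Rasch response model and a simple maximum-likelihood estimate of a student's ability on the common logit scale.

    import numpy as np

    # Illustrative Rasch (one-parameter IRT) sketch with invented item difficulties.
    difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])   # on the logit scale
    responses    = np.array([1, 1, 1, 0, 0])                # 1 = correct, 0 = incorrect

    def p_correct(theta: float, b: np.ndarray) -> np.ndarray:
        """Rasch model: probability of a correct response given ability theta."""
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def estimate_theta(responses, b, iters=50):
        """Newton-Raphson maximum-likelihood estimate of ability."""
        theta = 0.0
        for _ in range(iters):
            p = p_correct(theta, b)
            grad = np.sum(responses - p)   # first derivative of the log-likelihood
            info = np.sum(p * (1 - p))     # test information (negative second derivative)
            theta += grad / info
        return theta

    theta_hat = estimate_theta(responses, difficulties)
    print(round(theta_hat, 2))   # ability estimate on the common logit scale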

Finally, none of this information would be useful or valid if it were imprecise in general, or associated with high rates of error for students at the top or bottom of the distribution (as is true of most "on grade level" assessments). In this system, the precision of student scores is nearly identical for low-, middle-, and high-performing students. This precision is achieved, in part, by administering multiple levels of the same assessment content to capture measures of student proficiency[6]. This increased accuracy should allow meaningful instructional groupings within the curricula for all students.
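
Under the Rasch model sketched above, the conditional standard error of measurement is roughly 1/sqrt(test information), so a form whose items are centered near a student's ability yields a smaller error than an off-level form. The comparison below uses the same invented logit scale and is purely illustrative of why leveled testing improves precision at the extremes.

    import numpy as np

    def sem(theta: float, difficulties: np.ndarray) -> float:
        """Conditional standard error of measurement under the Rasch model."""
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        information = np.sum(p * (1 - p))
        return 1.0 / np.sqrt(information)

    on_grade_form = np.linspace(-1.0, 1.0, 20)    # items clustered at grade level
    leveled_form  = np.linspace(-3.0, -1.0, 20)   # items matched to a low performer

    low_performer = -2.0   # ability of a student well below grade level (invented)
    print(round(sem(low_performer, on_grade_form), 2))   # larger error on the fixed form
    print(round(sem(low_performer, leveled_form), 2))    # smaller error on the matched form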

To this end, NWEA, in collaboration with Allentown, PA, and now other districts that use its assessment instrument, has created an additional document titled "The Learning Continuum." The Learning Continuum is a supporting document developed to help bridge the gap between what is learned about a student during assessment and what is done on a daily basis through instruction. The underlying premise is that students' skills and overall competencies within content domains are developmental. Thus, assessing student performance on a developmental scale provides evidence about where a student is performing along a developmental continuum within reading or math. In this way, educators receive test score information and are then able to translate it into the specific instructional content that a student is likely to know and have mastered. Educators can also see which areas are yet to be mastered and can tie instructional delivery directly to these next stages of learning.

For example, each content area (e.g., reading, mathematics, language arts, and science) has a corresponding section within the Learning Continuum document. Within each section, test scores are grouped into ranges of ten scale-score units. Within each scale-score band, teachers are able to locate information about specific concepts, content-specific vocabulary, symbols, and functions (within mathematics, for example) that a student should know, having obtained a specific test score. Teachers can then examine the precursor skills and content that a student should understand well, and contrast these with the skills or competencies that are likely to develop next with assistance. Further, this information about individual students may help influence the pace of instruction and the creation and maintenance of instructional groupings based on instructional needs. The primary goal is to maximize growth (i.e., skills and competencies) for each and every student.
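
A minimal sketch of that lookup, assuming invented band boundaries and skill descriptions (the actual Learning Continuum content is not reproduced here), might look like the following.

    # Hypothetical Learning Continuum lookup: scale scores grouped into ten-point bands,
    # each mapped to illustrative (invented) mathematics skill descriptions.
    CONTINUUM_BANDS = {
        (200, 210): {"mastered": "adds and subtracts two-digit numbers",
                     "developing": "multiplies single-digit numbers"},
        (210, 220): {"mastered": "multiplies single-digit numbers",
                     "developing": "interprets simple fraction models"},
        (220, 230): {"mastered": "interprets simple fraction models",
                     "developing": "solves one-step word problems with fractions"},
    }

    def continuum_entry(scale_score: int) -> dict:
        """Return the band description containing a student's scale score."""
        for (low, high), skills in CONTINUUM_BANDS.items():
            if low <= scale_score < high:
                return skills
        raise ValueError(f"Score {scale_score} is outside the defined bands")

    # A teacher with a student scoring 214 would see what the student has likely
    # mastered and what is likely to develop next with assistance.
    print(continuum_entry(214))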

In the introduction of this document NWEA articulates the following uses of the Learning Continuum document:

• Material Selection

• Sharing of Resources

• Special Programming Needs and Specialized Instructional Goals

• Curriculum Planning

• School Improvement Planning

• Monitoring of Student Progress

• Individual Educational Plans

• Parent Conferencing

It is these articulated uses that provide the integration of test score information and instructional methodologies that should enhance teachers' ability to best meet the educational needs of all students. This presentation will describe and share some of the instructional power of implementing an assessment system that relies on a unitary developmental scale on which student performance can be mapped and monitored longitudinally as a student is serially assessed across time. Examples will be presented of individual student growth charts and of classroom reports showing how students' test performance within a particular year and across time is used differentially for instructional planning to meet the needs of individual students and whole classrooms. The core elements for training teachers in "data-based" classroom decision making will be presented in the form of a course syllabus, one element of an alternative compensation model recently adopted by the MPS teachers' union. In sum, it is these unique features of the NWEA and MPS assessment model that increase the probability that teachers will make the appropriate and necessary instructional adaptations to meet the wide array of student needs on a daily basis.

The use of a measurement system that monitors the growth of the same students across time is a much more valid methodology for making inferences about instructional effectiveness, as well as about whether schools are making "adequate progress" toward high standards of performance. The NWEA assessment system is a carefully crafted measurement system that provides both standards-referenced information and level and gain information that is norm-referenced. This powerful tandem of level and growth information provides the essential elements for creating and evaluating individual and group growth curves. Disaggregated growth curves can provide meaningful feedback on the lessening of gaps between current performance and high standards of academic success (see Appendix A). Unfortunately, current guidelines around the No Child Left Behind legislation do not encourage the use of student-level longitudinal data to assess gap reduction and overall school success, and instead depend on indicators of mean performance (e.g., percent of students proficient) to assess change over time. Minneapolis Public Schools is working closely with the State of Minnesota to establish a gains-based system for identifying "schools that beat the odds." Minnesota has included individual student-level gain indicators in the state proposal for assessing Adequate Yearly Progress and proposes to vertically link assessments from grade 2 through grade 8. Once the measurement system is in place for yearly assessment, growth curves could be generated for continuously enrolled students. This information would allow evaluation of schools and/or subgroups within a school (or district) that are making accelerated progress toward established standards of grade-level proficiency. In cases where gaps are not being reduced, further investigation would be warranted.
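
As an illustration of what such growth curves involve, the sketch below (Python, with invented longitudinal scores) fits a simple linear growth rate for each continuously enrolled student and compares average growth across two hypothetical subgroups; operational analyses would typically use multilevel growth modeling rather than per-student least squares.

    import numpy as np

    # Invented longitudinal data: scale scores for continuously enrolled students
    # at three annual test occasions (years coded 0, 1, 2).
    years = np.array([0, 1, 2], dtype=float)
    students = {
        "s1": {"group": "A", "scores": [195, 204, 212]},
        "s2": {"group": "A", "scores": [188, 199, 209]},
        "s3": {"group": "B", "scores": [201, 207, 212]},
        "s4": {"group": "B", "scores": [210, 215, 221]},
    }

    def growth_rate(scores) -> float:
        """Least-squares slope: scale-score points gained per year."""
        return np.polyfit(years, np.array(scores, dtype=float), 1)[0]

    # Disaggregated average growth by subgroup: if group A starts lower but grows
    # faster, the gap between the groups is closing.
    for group in ("A", "B"):
        slopes = [growth_rate(s["scores"]) for s in students.values() if s["group"] == group]
        print(group, round(float(np.mean(slopes)), 1), "points per year")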

While efforts to provide teachers, parents, and students with instructionally relevant feedback have been generally successful in Minneapolis, there is still much to be done to ensure the usefulness of assessment information. Next steps include the following:

• Identifying and evaluating strategies for diagnosis and for the identification of effective instructional practices.

• Staff development training in the use of data for instructional decision making through a pay for performance model where teachers gain direct access to performance data and can view individual and group growth curves and diagnostic profiles in a web-based data mart.

• Studies of schools and teachers who "beat the odds," with reading and math gains greater than predicted by prior test scores, student demographics, and participation in LEP or special education services (a regression sketch follows this list).

• Emphasis on more immediate feedback through computerized testing, and on testing students as they enter, so as to include as many students as possible in growth analyses.

• Development of appropriate metrics for assessing growth in high school on multiple measures relevant to individual small high school learning environments.

• Continued establishment of links between “high stakes” tests used for accountability under the provisions of No Child Left Behind and local assessment using the Northwest Achievement Levels Tests; and

• Use of formative assessments on a more frequent basis that drive instructional changes and are tied to the overall growth scale measures.
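
A minimal sketch of the "beat the odds" analysis mentioned above, assuming invented school-level data: schools are flagged when their observed mean gain exceeds the gain predicted from prior scores and demographics. An ordinary least-squares fit stands in here for the more elaborate value-added models typically used.

    import numpy as np

    # Invented school-level data: prior mean score, percent poverty, and observed mean gain.
    prior_score = np.array([192, 201, 185, 210, 198], dtype=float)
    pct_poverty = np.array([82, 45, 90, 20, 60], dtype=float)
    observed_gain = np.array([9.5, 7.8, 10.2, 6.9, 7.1])

    # Predict gain from prior score and poverty with ordinary least squares.
    X = np.column_stack([np.ones_like(prior_score), prior_score, pct_poverty])
    coef, *_ = np.linalg.lstsq(X, observed_gain, rcond=None)
    predicted_gain = X @ coef

    # Schools with large positive residuals gained more than their composition predicts;
    # the 0.5-point flagging threshold is arbitrary and for illustration only.
    residuals = observed_gain - predicted_gain
    for i, r in enumerate(residuals):
        flag = "beats the odds" if r > 0.5 else ""
        print(f"school {i + 1}: residual {r:+.2f} {flag}")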

In conclusion, the Research, Evaluation and Assessment Department in Minneapolis Public Schools is directing its energies and resources toward training teachers and administrative leaders in the appropriate use of data to drive instructional decision making. Assessments like the Northwest Evaluation Association's levels tests allow us to measure student progress more accurately against community standards and norm-referenced peer groups. We will continue to advocate and prod state and national decision makers to consider the use of assessment tools designed to measure achievement gain in the evaluation of current educational reforms and of overall school performance. We must discontinue the identification of schools and teachers as "failing" based on demographic composition and correlated factors like home or school address, and move toward a system that rewards teachers for accelerating learning for all students.

-----------------------

[1] One need not look any further than the front page of many urban newspapers, or monitor the emerging ESEA legislation, No Child Left Behind.

[2] For further discussion in this area, see Meyer, R. H. (1996). "Value-Added Indicators of School Performance," in Eric Hanushek and Dale Jorgenson (Eds.), Improving America's Schools: The Role of Incentives. Washington, DC: National Academy Press, Chapter 10, pp. 197-223.

[3] Built with one-parameter (Rasch model) IRT scaling.

[4] The NWEA system relies on a locator procedure to place students in different levels of the same assessment. When the standard error of measurement is too high, or a student tests beyond the range of a level, the student is re-tested. Placement at the right level, as well as the re-testing sequence, ensures that all students are assessed without bias.

[5] See examples of “Teacher Information Reports” for each Minneapolis Public School at mpls.k12.mn.us/rea/

[6] Minneapolis uses paper-and-pencil versions of the Northwest Achievement Levels Tests. Many districts have the technical infrastructure to do online testing of the NWEA item bank using the Measures of Academic Progress (MAP).
