Educators often need to assess students' learning and achievement. There are multiple forms of assessment that educators use, not only to gain knowledge about a student's level of understanding but also to guide the direction of future lessons and course curriculum. This lesson will differentiate between formal and informal assessments, and between paper-pencil and performance-based assessments used in educational settings.

Introduction

'Another test? Why do we have so many tests? We should just be able to learn without feeling stressed about having to prove what we really know. Don't you agree?'

It seems like my friend is a little anxious about having so many assessments. Let's help her understand the value of assessment in educational settings.

Informal vs. Formal Assessment

First, let's define the term assessment. Assessment is the process of observing a sample of a student's behavior and drawing inferences about the student's knowledge and abilities. Yes, many synonyms exist for assessment, such as test, exam, etc. The word 'assessment' carries more positive connotations for students in the classroom. It should be used in place of terms that suggest possible failure and negative outcomes, which may cause additional anxiety among students.

Before moving on, let's discuss a few important points from the definition of assessment. First, when using assessments, teachers are looking at students' behavior. We can't see inside a student's head to determine what is going on, so we must take a sample of their behavior over time in order to make an inference about their knowledge and development. Second, the inferences that are drawn are only that - inferences. Educators must use a variety of assessment types in order to gain the most accurate picture of a student's overall progress. Educators should also keep in mind that assessments are tools, useful only to the extent that they are aligned with the circumstances in which they are used. For example, a written assessment to determine how well a student can keep a beat in a music class makes no sense and would therefore be an inappropriate tool.

There are two overarching types of assessment in educational settings: informal and formal assessments. Both types are useful when used in appropriate situations.

Informal assessments are those that result from teachers' spontaneous, day-to-day observations of how students behave and perform in class. When teachers conduct informal assessments, they don't necessarily have a specific agenda in mind, but are more likely to learn different things about students as they proceed naturally through the school day. These types of assessments offer important insight into a student's misconceptions and abilities (or inabilities) that might not be represented accurately through formal assessments. For example, a teacher might discover that a student has a misconception about other cultures and languages when she asks, 'What language do people in North Carolina speak?' Or, the teacher may wonder if Alex needs to make an appointment to have his hearing checked if he constantly says 'What?' or 'I didn't hear you.'

Formal assessments, on the other hand, are preplanned, systematic attempts by the teacher to ascertain what students have learned. The majority of assessments in educational settings are formal.
Typically, formal assessments are used in combination with goals and objectives set forth at the beginning of a lesson or the school year. Formal assessments also differ from informal assessments in that students can prepare for them ahead of time.

Paper-Pencil vs. Performance-Based Assessment

There are many types of formal assessments used in educational settings. In this lesson, we will discuss the difference between paper-pencil assessments and performance assessments. Additional types of formal assessments will be discussed in other lessons within this course.

In paper-pencil assessments, students provide written responses to written items. You have probably taken numerous paper-pencil assessments in your educational career. Assessments in which you fill out answers on the assessment form itself or on electronic forms, like Scantrons, fall under this category. Typically, paper-pencil assessments include questions to answer, topics to address through paragraph responses, problems to solve, etc.

Performance assessments, on the other hand, are assessments in which students demonstrate their knowledge and skills in a non-written fashion. These assessments are focused on demonstration rather than written responses. For example, giving oral presentations, completing physical assessments in physical education (PE) classes, performing experiments in a lab, or completing dissection activities in anatomy classes all fall under this category.

Purpose of Assessments

Assessments are used for multiple purposes, as we discussed earlier in this lesson. Specifically, assessments can be used as:

Motivators. Research shows that students study and learn more material when they are told they will be tested on it or held accountable for it. Think about this: Have you ever been in class and had the instructor tell you, 'This won't be on your test'? What about our music class example, where a paper-pencil assessment is used when a performance-based assessment would have been more appropriate? In the first example, students may feel like learning the information is not useful. In the second example, students may feel like the assessment is worthless. In both cases, the motivation to learn actually decreases.

Mechanisms for review. Assessments can also serve to promote constant review of material, which aids in moving the material from short-term to long-term memory so it can be accessed in the future.

Feedback. Assessments also provide opportunities for both the teacher and student to receive feedback. Assessments provide feedback to a teacher about the student's knowledge and also about the effectiveness of instruction. For students, assessments provide feedback about areas in which they may need to focus or areas in which they are proficient.

Choosing Appropriate Assessments

Let's look at a few assessment scenarios. Try to determine what type of assessment is being used: formal or informal.

Example 1. A kindergarten teacher sets aside the first 15 minutes of every day to allow students to explore, play or engage in any activity of their choice. During this time, the teacher walks around, listens to the students' conversations and observes which activities they typically choose to engage in. This is an example of informal assessment. The teacher uses the time to make inferences about students' likes and dislikes. The teacher may also gain understanding about misconceptions, social interactions and more.
Example 2. A middle school teacher sets aside the first 15 minutes of English class to allow the students to free write. These writings are collected each class and returned with feedback on grammar and ideas for improving the writing. This is an example of formal assessment. The students have the freedom to write about anything, but they are still expected to turn in samples each class period, and they receive feedback.

Now, let's determine which type of assessment is more appropriate for the situation: paper-pencil or performance assessment.

Example 3. A PE instructor wants to assess her students' knowledge of the scoring of a tennis match. She wants to ensure students understand terminology such as 'set,' 'match' and 'love.' Even though PE classes tend to use more performance-based assessments, this situation would be better assessed through a paper-pencil method. Terminology and concepts for which there are 'correct' or 'incorrect' responses are better assessed through paper-pencil assessments.

Example 4. Our final example: a computer teacher wants to assess his students' knowledge of the software Excel. This situation calls for a performance-based assessment. In order for the teacher to gain an accurate assessment of his students' ability to use Excel, he should offer them the opportunity to demonstrate their abilities.

Lesson Summary

In summary, assessments are an integral part of education. Informal and formal types of assessment serve as ways to provide feedback for the student and teacher. They provide the student an opportunity for review, and they can serve as motivators. Performance-based and paper-pencil forms of assessment are both useful, assuming they are aligned appropriately with the type of material to be assessed.

If you have ever attended a public school or college, you have been subjected to a form of standardized assessment. These assessments serve multiple purposes and provide valuable information regarding one's abilities, understanding and potential. This lesson will introduce you to the types of standardized assessments commonly used in schools and discuss two other types of assessments: formative and summative.

Testing Time

Student: It's testing time again. We are expected to come to school early, eat a good breakfast, and get a good night's sleep. Why do we take these assessments every year? Who creates them, and what purpose do they serve?

Standardized Assessments Defined

Expert: Standardized assessments are assessments constructed by experts and published for use in many different schools and classrooms.

Student: What are the different uses for standardized assessments?

Expert: Standardized assessments are very common and can be used for several purposes. They can be used to evaluate a student's understanding and knowledge of a particular area. They can be used to evaluate admissions potential for a student in grade school or college. They can be used to assess language and technical skill proficiency. Standardized assessments are often used to determine eligibility for psychological services. They are even used to evaluate aptitude for a career in the armed forces.

Standardized Assessment Qualities

Student: Okay, I get that they are used all the time for a variety of reasons. My next question is: how are they standardized?

Expert: Standardized assessments share several qualities that make them standardized. First, all students taking the particular assessment are given the same instructions and time limit.
Instructional manuals typically accompany the assessment so teachers or proctors know exactly what to say. Second, the assessments contain the same or very similar questions. Third, the assessments are scored, or evaluated, with the same criteria.

Standardized Assessment Types

Student: You mentioned that there are different types of standardized assessments. What are they?

Expert: Yes, there are four main types of standardized assessments used by school districts. They are achievement assessments, scholastic aptitude and intelligence assessments, specific aptitude assessments and school readiness assessments.

Achievement Assessments

Expert: Achievement assessments are designed to assess how much students have learned from classroom instruction. Assessment items typically reflect common curriculum used throughout schools across the state or nation. For example, a history assessment might contain items that focus on national history rather than history distinct to a particular state or county. One familiar achievement assessment you may have heard of is the National Assessment of Educational Progress (referred to as NAEP). There are advantages to achievement assessments. First, achievement assessments provide information regarding how much a student has learned about a subject. These assessments also provide information on how well students in one classroom compare to other students. They also provide a way to track student progress over time. There are some disadvantages to achievement assessments as well. Achievement assessments do not indicate how much a student has learned in a particular area within the subject. For example, the assessment may indicate a relative understanding of math but will not indicate whether a student knows how to use a particular equation taught in the classroom.

Scholastic Aptitude Assessments

Expert: Scholastic aptitude assessments are designed to assess a general capacity to learn and are used to predict future academic achievement. Scholastic aptitude assessments may assess what a student is presumed to have learned in the past. These assessments include vocabulary terms presumably encountered over the years and analogies intended to assess how well students can recognize similarities among well-known relationships. The most common of these assessments is the SAT; intelligence scales, such as the Stanford-Binet Intelligence Scales (used to assess IQ), also fall into this broader aptitude-and-intelligence category. The advantage of scholastic aptitude assessments is that the same test allows for comparison of multiple students across schools and states. There is also a disadvantage; some students develop test anxiety and do not perform well on standardized scholastic assessments, causing the results to be an inaccurate reflection of the student's actual or potential academic abilities.

Specific Aptitude Assessments

Expert: Specific aptitude assessments are designed to predict future ability to succeed in a particular content domain. Specific aptitude assessments may be used by school personnel to select students for specific instructional programs or remediation programs. They may also be used for counseling students about future educational plans and career choices. One commonly used assessment that evaluates aptitude for a career in the armed forces is the ASVAB. With specific aptitude assessments, one's ability to learn in a specific discipline is usually stable, and therefore these types of assessments are an effective way to identify academic tendencies and weaknesses.
The disadvantage, however, is that the use of these assessments encourages specific skill development in a few areas, as opposed to encouraging the development of skills in a wide range of academic disciplines and abilities.

School Readiness Assessments

Expert: Finally, we have the school readiness assessment. School readiness assessments are designed to assess cognitive skills important for success in a typical kindergarten or first grade curriculum. These assessments are typically given six months before a child enters school. The advantage of these assessments is that they provide information regarding developmental delays that need to be addressed immediately. There are disadvantages to school readiness assessments as well. First, the evaluation has been found to have a low correlation with the student's actual academic performance beyond the first few months of school. Second, school readiness assessments usually evaluate only cognitive development. However, social and emotional development is also critical to one's success in kindergarten and first grade.

Choosing Standardized Assessments

Student: Okay, I understand the types of standardized assessments and how they are standardized, but how does my teacher or school know which type of assessment to choose?

Expert: There are guidelines and considerations for choosing standardized assessments. The school should choose an assessment that has high validity for the particular purpose of testing; that is, if the school wants to assess its students' science comprehension, it should choose an assessment that evaluates science knowledge and skills. The school should make sure the group of students used to 'norm' the assessment is similar to the population of the school. The school should also take students' age and developmental level into account before administering any standardized assessment.

Formative and Summative Assessments

Student: So am I stuck taking assessments only one time a year?

Expert: No, actually there are other forms of assessment that are given at different times. These are referred to as formative and summative assessments. Formative assessments are ongoing assessments, reviews and observations used to evaluate students in the classroom. Teachers use these types of assessments to continually improve instructional methods and curriculum. Student feedback is also a type of formative assessment. Examples include quizzes, lab reports and chapter exams. Summative assessments are used to evaluate the effectiveness of instruction at the end of an academic year or the end of a class. Summative assessments allow educators to evaluate students' comprehensive competency and final achievement in the subject or discipline. Examples of these include final exams, statewide tests and national tests.

Teacher: Shh!

Student: I'd better get back to the assessment. Thank you for the information.

Lesson Summary

Standardized assessments are constructed by experts and published for many uses, including evaluation of academic achievement, prediction of future academic achievement and assessment of skills and aptitude. There are four types of standardized assessments used in the classroom: achievement assessments, scholastic aptitude, specific aptitude and school readiness assessments. Each of these evaluates a particular aspect of students' knowledge or skill compared to a larger population of similar students. Formative and summative assessments are also used in the classroom to allow educators to evaluate their students more frequently.
Formative assessments are ongoing, while summative assessments occur at the end of the course or school year.

Have you ever been in the middle of an assessment and thought, 'This question is unfair!' or 'This exam covers material I have never seen before!'? If so, the assessment probably did not possess the qualities that make an assessment effective. This lesson will introduce you to the qualities of good assessments: reliability, standardization, validity, and practicality.

4 Qualities of Good Assessments

'Ugh! I am so frustrated! That was the worst test I have ever had!'

'I know! I probably knew only half the answers at most, and it was like the test had material from some other book, not the one we were supposed to study!'

'And what was with that loud hammering during the test? Couldn't the repairmen have waited until after school to repair the roof?!'

'Yeah, all of that coupled with the fact that I was starving during the test ensures that I'll get a failing grade for sure.'

This was definitely not a good assessment. A good assessment is supposed to show what we have truly learned. There are four qualities of good assessments, and educators should ensure these qualities are met before assessing students. They are: reliability, standardization, validity, and practicality.

Reliability

Reliability is defined as the extent to which an assessment yields consistent information about the knowledge, skills, or abilities being assessed. An assessment is considered reliable if it yields the same results each time it is administered. For example, if we took a test in history today to assess our understanding of World War I and then took another test on World War I next week, we would expect to see similar scores on both tests. This would indicate the assessment was reliable. Reliability in an assessment is important because assessments provide information about student achievement and progress. There are many conditions that may impact reliability. They include: day-to-day changes in the student, such as energy level, motivation, emotional stress, and even hunger; the physical environment, which includes classroom temperature, outside noises, and distractions; administration of the assessment, which includes changes in test instructions and differences in how the teacher responds to questions about the test; and subjectivity of the test scorer.

Standardization

Another quality of a good assessment is standardization. We take many standardized tests in school for state or national assessment purposes, but standardization is a good quality to have in classroom assessments as well. Standardization refers to the extent to which the assessment and the procedures for administering it are similar for each student, and the assessment is scored similarly for each student. Standardized assessments share several qualities. First, all students taking the particular assessment are given the same instructions and time limit. Second, the assessments contain the same or very similar questions. And third, the assessments are scored, or evaluated, with the same criteria. Standardization in classroom assessments is beneficial for several reasons. First, standardization reduces error in scoring, especially when the error is due to subjectivity by the scorer. Second, the more standardized an assessment is, the higher its reliability will be. And finally, the assessment is more equitable, as students are assessed under similar conditions.
Validity

The third quality of a good assessment is validity. Validity refers to the accuracy of the assessment. Specifically, validity addresses the question: does the assessment accurately measure what it is intended to measure? An assessment can be reliable but not valid. For example, if you weigh yourself on a scale, the scale should give you an accurate measurement of your weight. If the scale tells you that you weigh 150 pounds every time you step on it, it is reliable. However, if you actually weigh 135 pounds, then the scale is not valid. Similar to reliability, there are factors that impact the validity of an assessment, including students' reading ability, student self-efficacy, and student test anxiety level.

Practicality

The fourth quality of a good assessment is practicality. Practicality refers to the extent to which an assessment or assessment procedure is easy to administer and score. Things to consider here are: How long will it take to develop and administer the assessment? How expensive are the assessment materials? How much time will the assessment take away from instruction?

Lesson Summary

Hey! The qualities of good assessment make up the acronym 'RSVP.' That's easy to remember! It's also important to note that of the four qualities, validity is the most important, because above all else the assessment must measure what it is intended to measure. Grades, graduation, honors, and awards are determined based on classroom assessment scores. Reliability is important because it ensures we can depend on the assessment results. Standardization is important because it enhances reliability. And practicality is considered last, once the other qualities have been accounted for.

Ensuring that an assessment measures what it is intended to measure is a critical component of education. Assessment results are used to predict future achievement and current knowledge. This lesson will define the term validity and differentiate between content, construct, and predictive validity.

Validity: Defined

The term validity has varied meanings depending on the context in which it is used. Validity generally refers to how accurately a conclusion, measurement or concept corresponds to what is being tested. For this lesson, we will focus on validity in assessments. Validity is defined as the extent to which an assessment accurately measures what it is intended to measure. Let me explain this concept through a real-world example. If you weigh yourself on a scale, the scale should give you an accurate measurement of your weight. If the scale tells you that you weigh 150 pounds and you actually weigh 135 pounds, then the scale is not valid. The same can be said for assessments used in the classroom. If an assessment intends to measure achievement and ability in a particular subject area but then measures concepts that are completely unrelated, the assessment is not valid.

Factors That Impact Validity

Before discussing how validity is measured and differentiating between the different types of validity, it is important to understand how external and internal factors impact validity. A student's reading ability can have an impact on the validity of an assessment. For example, if a student has a hard time comprehending what a question is asking, the test will not be an accurate assessment of what the student truly knows about the subject. Educators should ensure that an assessment is at the correct reading level for the student. Student self-efficacy can also impact the validity of an assessment.
If students have low self-efficacy, or weak beliefs about their abilities in the particular area in which they are being tested, they will typically perform lower. Their own doubts hinder their ability to accurately demonstrate knowledge and comprehension. Student test anxiety level is also a factor to be aware of. Students with high test anxiety will underperform due to emotional and physiological factors, such as upset stomach, sweating, and increased heart rate, which leads to a misrepresentation of student knowledge.

Measurement of Validity

Validity is measured using a coefficient. Typically, two scores from two assessments or measures are correlated to produce a number between 0 and 1. Higher coefficients indicate higher validity. Generally, assessments with a coefficient of .60 and above are considered acceptable or highly valid.
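To make the coefficient idea concrete, here is a minimal sketch in Python of how such a number might be computed as a Pearson correlation between scores on an assessment and scores on a criterion measure. The score lists and the choice of a Pearson correlation are illustrative assumptions, not something the lesson prescribes.

```python
# Minimal sketch: a validity coefficient computed as the Pearson correlation
# between scores on an assessment and scores on a criterion measure.
# All numbers here are hypothetical, invented for illustration.
from statistics import mean, stdev

assessment_scores = [72, 85, 90, 65, 78, 88, 70, 95]  # what the assessment reports
criterion_scores = [70, 82, 91, 60, 80, 85, 72, 93]   # what it is supposed to measure

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return covariance / (stdev(xs) * stdev(ys))

coefficient = pearson_r(assessment_scores, criterion_scores)
print(f"validity coefficient: {coefficient:.2f}")      # close to 1 for these made-up scores
print("acceptable" if coefficient >= 0.60 else "low")  # the lesson's .60 rule of thumb
```

Real validity studies use far larger samples and carefully chosen criterion measures; the point here is only the mechanics of the coefficient.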
Types of Validity

There are three types of validity that we should consider: content, predictive and construct validity.

Content validity refers to the extent to which an assessment represents all facets of tasks within the domain being assessed. Content validity answers the question: does the assessment cover a representative sample of the content that should be assessed? For example, if you gave your students an end-of-the-year cumulative exam but the test covered only material presented in the last three weeks of class, the exam would have low content validity, because the entire semester's worth of material would not be represented on the exam. Educators should strive for high content validity, especially for summative assessment purposes. Summative assessments are used to determine the knowledge students have gained during a specific time period. Content validity is increased when assessments require students to make use of as much of their classroom learning as possible.

The next type of validity is predictive validity, which refers to the extent to which a score on an assessment predicts future performance. Norm-referenced ability tests, such as the SAT, GRE or WISC (Wechsler Intelligence Scale for Children), are used to predict success in certain domains at a later point in time. The SAT and GRE are used to predict success in higher education. These tests compare individual student performance to the performance of a normative sample. In order to determine the predictive ability of an assessment, companies such as the College Board often administer a test to a group of people and then, months or years later, measure the same group's success or competence in the behavior being predicted. A validity coefficient is then calculated, and higher coefficients indicate greater predictive validity.

The final type of validity we will discuss is construct validity. In order to understand construct validity, we must first define the term construct. In psychology, a construct refers to an internal trait that cannot be directly observed but must be inferred from consistent behavior observed in people. Self-esteem, intelligence and motivation are all examples of constructs. Construct validity, then, refers to the extent to which an assessment accurately measures the construct. This answers the question: are we actually measuring what we think we are measuring?

The Relationship Between Validity and Reliability

Reliability, which is covered in another lesson, refers to the extent to which an assessment yields consistent information about the knowledge, skills or abilities being assessed. An assessment is considered reliable if it yields the same results each time it is administered. The relationship between reliability and validity is important to understand. An assessment can be reliable but not valid. Let's return to our original example. If you weigh yourself on a scale, the scale should give you an accurate measurement of your weight. If the scale tells you that you weigh 150 pounds every time you step on it, it's reliable. However, if you actually weigh 135 pounds, then the scale is not valid.

Lesson Summary

In summary, validity is the extent to which an assessment accurately measures what it is intended to measure. Validity is impacted by various factors, including reading ability, self-efficacy and test anxiety level. Validity is measured through a coefficient, with high validity closer to 1 and low validity closer to 0. The three types of validity for assessment purposes are content, predictive and construct validity.

How are test scores affected by day-to-day changes in a student? Do different people rate students' performances the same way? These questions are addressed through an understanding of reliability. This lesson will define reliability, explain how reliability is measured, and explore methods to enhance the reliability of assessments in the classroom.

Definition

Student One: I'm glad that is over. It's nerve-wracking to perform and be evaluated by three teachers.

Student Two: I agree. I also worry about how each individual teacher will score us. I hope they use the same criteria!

Student One: Oh, you are referring to the reliability of the scores. Do you know about reliability?

Student Two: Not really. I've never used that term before.

Student One: Oh! I'll explain! Reliability is defined as the extent to which an assessment yields consistent information about the knowledge, skills, or abilities being assessed. A reliable assessment is replicable, meaning it will produce consistent scores or observations of student performance. For example, our singing performances should result in similar scores from the three teachers. If one teacher gives us a score of 10 out of 10, and another gives us a score of 2 out of 10, the scores are not considered reliable.

Student Two: Oh, okay. So it seems like many factors could impact the reliability of a test or performance.

Student One: You are right.

Conditions That Impact Reliability

Student One: There are many conditions that impact reliability. They include: day-to-day changes in the student (such as energy level, motivation, emotional stress, and hunger); the physical environment (which includes classroom temperature, outside noises, and distractions); administration of the assessment (which includes changes in test instructions and differences in how the teacher responds to questions about the test); test length (generally, the longer the test, the higher the reliability); and subjectivity of the test scorer.

Measurement of Reliability: The Reliability Coefficient

Student Two: So, how is reliability measured?

Student One: Reliability is determined by comparing two sets of scores for a single assessment (such as two rater scores for the same person) or two scores from two tests that assess the same concept. These two scores can be derived in different ways depending on the type of reliability being assessed. Once we have two sets of scores for a group of students or observers, we can determine how similar they are by computing a statistic known as the reliability coefficient. The reliability coefficient is a numerical index of reliability, typically ranging from 0 to 1. A number closer to 1 indicates high reliability. A low reliability coefficient indicates more error in the assessment results, usually due to the temporary factors we previously discussed. Reliability is considered good or acceptable if the reliability coefficient is .80 or above.
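As a hands-on illustration, here is a minimal sketch of a reliability coefficient computed as the correlation between two raters' scores for the same ten performances. The rater scores are hypothetical, and using a simple Pearson correlation (via NumPy's corrcoef) is one common choice, not a method the lesson mandates.

```python
# Minimal sketch: an inter-rater reliability coefficient, computed as the
# correlation between two raters' scores for the same ten students.
# The scores are hypothetical.
import numpy as np

rater_one = np.array([9, 7, 8, 6, 10, 5, 7, 8, 9, 6])
rater_two = np.array([8, 7, 9, 6, 9, 5, 6, 8, 9, 7])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the two sets of scores.
reliability = np.corrcoef(rater_one, rater_two)[0, 1]
print(f"reliability coefficient: {reliability:.2f}")
print("good/acceptable" if reliability >= 0.80 else "below the .80 guideline")
```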
Types of Reliability

Student One: There are multiple types of reliability.

Inter-Rater Reliability

This type of reliability is used to assess the degree to which different observers or scorers give consistent estimates or scores. In other words, do different people score students' performances similarly? For example, we performed in front of three teachers who scored us individually. High inter-rater reliability would indicate that each teacher rated us similarly.

Test-Retest Reliability

This type is used to assess the consistency of scores on an assessment from one time to another. The construct being measured does not change - only the time at which the assessment is administered changes. For example, if we are given a test in science today and then given the same test next week, we could use those scores to determine test-retest reliability. Test-retest reliability is best used to assess things that are stable over time, such as intelligence. Reliability is typically higher when little time has passed between administrations of the assessment.

Parallel-Forms Reliability

This type of reliability is determined by comparing two different assessments that were constructed from the same content domain. For example, if our science teacher created an assessment with 100 questions that measure the same science content, she could divide it into two versions with 50 questions each and then give both versions of the test to her students. She would use a score from version 1 and a score from version 2 to assess parallel-forms reliability.

Internal Consistency Reliability

This form of reliability is used to assess the consistency of scores across items within a single test. For example, if our science teacher wants to test the internal consistency reliability of her test questions on the scientific method, she would include multiple questions on the same concept. High internal consistency would result in all of the scientific method questions being answered similarly. However, if students' answers to those questions were inconsistent, then internal consistency reliability is low.
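Building on the parallel-forms example above (a 100-question test split into two 50-question versions), here is a minimal sketch of the closely related split-half calculation. The half-test scores are hypothetical, and the Spearman-Brown correction shown is a standard psychometric step that the lesson itself does not name.

```python
# Minimal sketch: split-half reliability with the Spearman-Brown correction.
# half_a and half_b are hypothetical totals on two 50-question halves of
# one 100-question science test, for ten students.
import numpy as np

half_a = np.array([42, 38, 45, 30, 25, 40, 36, 48, 33, 29])
half_b = np.array([40, 39, 44, 28, 27, 41, 34, 47, 35, 30])

r_half = np.corrcoef(half_a, half_b)[0, 1]  # correlation between the two halves
r_full = (2 * r_half) / (1 + r_half)        # Spearman-Brown estimate for the full test
print(f"half-test correlation: {r_half:.2f}")
print(f"estimated full-test reliability: {r_full:.2f}")
```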
Increasing Reliability of Classroom Assessments

Student One: Educators can increase or enhance the reliability of their assessments. They can give several similar tasks or questions in an assessment to look for consistency of student performance. They must define each task clearly so that temporary factors, such as test instructions, do not impact performance. If possible, educators should avoid assessing students' learning and performance when students are sick or when there are external factors, such as uncontrollable noise in the classroom. One final way of increasing the reliability of classroom assessments is for educators to identify specific, concrete criteria and use a rubric with which to evaluate student performance.

Lesson Summary

Reliability ensures the consistency of scores or observations of student performance. External and internal temporary factors may impact reliability, such as day-to-day changes in the student, physical environment factors, and subjectivity of the scorer. Reliability is measured through the reliability coefficient, a numerical index ranging from 0 to 1; a coefficient near 1 indicates high reliability, while one near 0 indicates low reliability. The different types of reliability - inter-rater, test-retest, parallel-forms, and internal consistency - measure different aspects, but all use the standard reliability coefficient range. Generally, a reliability coefficient of .80 or above indicates good or acceptable reliability.

Playing a musical instrument, creating a spreadsheet and performing in a play are all activities that many of us engage in on a regular basis. These activities are also examples of ways teachers assess a student's mastery of a subject in educational settings. This lesson will define performance-based assessments and discuss the various uses of performance assessments in the classroom.

Introduction

Did you take a band or PE class in school? The tests in those classes were always a bit different. Instead of pulling out a pencil and answering questions on a piece of paper, you probably had to perform an activity for a grade. Those activities served as ways to measure your knowledge and abilities in that particular subject. Those activities are referred to as performance assessments, and we will focus on performance assessments in this lesson.

Definition

Performance assessments are assessments in which students demonstrate their knowledge and skills in a non-written fashion. These assessments are focused on demonstration rather than written response. Playing a musical instrument, identifying a chemical in a lab, creating a spreadsheet in computer class and giving an oral presentation are just a few examples of performance assessments. These types of assessments provide educators with an alternative method to assess students' knowledge and abilities, but they must be used with a specific purpose in mind. Let's discuss four concepts to aid our understanding in choosing appropriate performance-based assessment tasks.

Guidelines for Choosing Appropriate Tasks

The first guideline deals with products versus processes. A product is a tangible creation by a student that could take the form of a poster, drawing, invention, etc. Performance assessments are useful in assessing these products in order to gauge a student's level of understanding and ability. For example, asking a student to create an invention for a science class that incorporates Newton's law of gravity would be a way to assess a product and the student's knowledge of scientific principles.

Sometimes we don't have a product to assess and must assess a process. In situations with no tangible product, teachers must assess the process and the behaviors that students display. Giving an oral presentation, singing a song or demonstrating a tennis swing are examples of processes that could be assessed. When assessing a process, teachers may be interested in examining students' cognitive processes as well. Teachers can learn a great deal about a student's thinking by assigning a process task. For example, if a teacher wants to understand the thinking processes behind her students' knowledge of force and acceleration, she might assign an activity in which students perform experiments to determine how fast objects will roll down an incline. In this example, the teacher would have students make predictions first, then complete the experiments.
The student predictions allow the teacher to gauge their understanding of the scientific principles behind the experiment.

When considering performance assessments, we must also consider individual versus group performance. Teachers have the ability to assign individual or group assessments. Group performance assessments allow teachers to assign complex projects and tasks that are best accomplished by many students. For example, a geography teacher who wants to assess his students' understanding of town planning may assign a project requiring the students to collect data, make maps, predict population growth, etc. Group performance projects also allow students to assess their peers, which provides a different level of assessment for the teacher.

Some performance tasks are relatively short in duration; this is referred to as restricted performance. These are tasks that involve a one-time performance of a particular skill or activity. For example, a PE instructor asks her students to perform a push-up. She wants to assess their form for this one particular exercise.

Alternatively, other performance tasks are extended. When teachers assess extended performance, they want to determine what students are capable of over long periods of time. This method allows teachers to assess development at certain points over time. It also allows time for feedback and the opportunity for students to edit their work. For example, an English teacher might task the students with putting on a play at the end of a semester. The students may have specific goals to meet throughout the semester, which are also assessed, such as creating an outline, assigning roles, writing the script and creating props. The play at the end of the semester concludes the process. While extended performance tasks allow for a more thorough assessment of student knowledge, they are very time-consuming to administer.

Another thing to consider is static versus dynamic assessment. Static assessments, which are the most common forms of performance and paper-pencil assessments, focus on a student's existing abilities and knowledge. Dynamic assessment, on the other hand, systematically examines how a student's knowledge or reasoning may change as a result of learning or performing specific tasks. This concept is consistent with Vygotsky's concept of the zone of proximal development and provides teachers with information about what students are likely to accomplish with appropriate structure and guidance.

Guidelines for Performance Assessments

For all types of performance assessment, the following guidelines should be followed: Tasks should be defined clearly and unambiguously. The teacher must specify the scoring criteria in advance. The teacher must standardize administration procedures as much as possible. The teacher should encourage students to ask questions when tasks are not clear.

Lesson Summary

In summary, performance assessments offer an alternative to paper-pencil formats. They allow teachers to assess products and processes. Performance assessments may be assigned individually or to groups of students. They may occur once, allowing a student to demonstrate a distinct skill or ability, or occur over an extended period of time. Some performance assessments may assess a student's actual abilities, while others may assess abilities a student could demonstrate over time with the assistance of a more advanced individual.
All performance assessments should correspond to the type of situation being assessed. Finally, all performance assessments should begin with clearly defined tasks and scoring criteria.

Summarizing test results is a critical component of the assessment process. In order for results to be used effectively, they must be summarized in a way that allows educators to compare the achievement of one student to others. This lesson will describe the first step in summarizing results: understanding the basic statistics of score distributions.

Summarizing Test Results: Raw Score

'My students took these tests, but now I'm being asked by the principal to summarize their test results. How am I ever going to summarize these test results? I don't even know what that means!'

You look upset. Are you having some difficulty summarizing these results? I had problems too until I learned a few methods. I can help you! I will explain what this means, why summarizing results is important and some methods you can use to summarize these test results.

Let's start here. This test has a score of 85. This is a raw score. A raw score is the score based solely on the number of correctly answered items on the assessment. This raw score will tell you how many questions the student got right, but by itself it won't tell you much more. Let's now move on to how scores can be used to compare one student's results to the results of other students.

Normal Distribution

Test scores tend to fall along a normal distribution. A normal distribution is a pattern of educational characteristics or scores in which most scores lie in the middle range and only a few lie at either extreme. To put it simply, some scores will be low and some will be high, but most scores will be moderate. The normal distribution shows two things: the variability, or spread, of the scores, and the midpoint of the distribution. This midpoint is found by calculating the mean of all of the scores - in other words, the mathematical average of a set of scores. For example, if we had the following raw scores from your classroom - 57, 76, 89, 92, and 95 - the scores would range from a low of 57 to a high of 95. Plotting these scores along a normal distribution would show us the variability, and the midpoint of the distribution would also be illustrated.

Standard Deviation

The normal distribution curve helps us find the standard deviation of the scores. Standard deviation is a useful measure of variability: it measures the average deviation from the mean in standard units. Deviation, in this case, is defined as the amount an assessment score differs from a fixed value, such as the mean. The mean and standard deviation can be used to divide the normal distribution into several parts. The vertical line at the middle of the curve shows the mean, and the lines to either side reflect the standard deviation. A small standard deviation tells us that the scores are close together, and a large one tells us that they are spread further apart. For example, a set of classroom tests with a standard deviation of 10 tells us that the individual scores were more similar than a set of classroom tests with a standard deviation of 35. In statistics, there is a rule called the 68-95-99.7 rule. This rule states that, for a normal distribution, almost all values lie within one, two or three standard deviations of the mean.
Specifically, approximately 68% of all values lie within one standard deviation of the mean, approximately 95% of all values lie within two standard deviations of the mean, and approximately 99.7% of all values lie within three standard deviations of the mean.

Lesson Summary

Now you can see a few ways to understand and summarize assessment results. We start with the raw score, which is the score based on the number of correctly answered items. To compare the results of one student to a larger population of students, we must understand the basic statistics of test score distributions. Test scores fall along a normal distribution, which shows that the majority of scores fall in the middle of the curve, with a few falling along the upper or lower range. This distribution shows us the spread of scores and the average of a set of scores. The normal distribution enables us to find the standard deviation of test scores, which measures the average deviation from the mean in standard units. Finally, according to the 68-95-99.7 rule, nearly all scores will fall within one, two or three standard deviations of the mean.

Assessment results can yield valuable information about the individual test-taker and the larger population of test-takers. This lesson will describe how to compare test scores to a larger population by explaining standard scores, stanines, z-scores, percentile rank and cumulative percentage.

Standard Score, Stanines and Z-Score

Okay, you explained how to use a normal distribution to understand test scores. Now I still need to compare individual test scores to a larger population. Can you help me understand how to do that?

A common method to transform raw scores (the score based solely on the number of correctly answered items on an assessment) in order to make them more comparable to a larger population is to use a standard score. A standard score indicates how far a student's performance is from the mean in standard deviation units. In another lesson, we learned that standard deviation measures the average deviation from the mean in standard units, where deviation is the amount an assessment score differs from a fixed value. The standard score is calculated by subtracting the mean from the raw score and dividing by the standard deviation.

In education, we frequently use two types of standard scores: stanines and z-scores. Stanines are used to represent standardized test results by ranking student performance on an equal-interval scale of 1-9. A ranking of 5 is average, 6 is slightly above average and 4 is slightly below average. Stanines have a mean of 5 and a standard deviation of 2. Z-scores are used frequently by statisticians and have a mean of 0 and a standard deviation of 1. A z-score tells us how many standard deviations someone is above or below the mean. To calculate a z-score, subtract the mean from the raw score and divide by the standard deviation. For example, if we have a raw score of 85, a mean of 50 and a standard deviation of 10, we calculate a z-score of (85 - 50) / 10 = 3.5.
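Here is a minimal sketch of those two conversions in Python, using the lesson's numbers (raw score 85, mean 50, standard deviation 10). The stanine formula shown - double the z-score, add 5, round, and clamp to 1-9 - is one common conversion and is an assumption beyond what the lesson states.

```python
# Minimal sketch: raw score -> z-score -> stanine, using the lesson's numbers.

def z_score(raw, mean, sd):
    """How many standard deviations a raw score sits above or below the mean."""
    return (raw - mean) / sd

def stanine(z):
    """A common stanine conversion: mean 5, SD 2, clamped to the 1-9 scale."""
    return max(1, min(9, round(2 * z + 5)))

z = z_score(85, 50, 10)
print(z)           # 3.5, matching the lesson's example
print(stanine(z))  # 9 -- a z-score of 3.5 tops out the stanine scale
```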
Cumulative Percentage and Percentile Rank

Another method to convert a raw score into a meaningful comparison is through percentile ranks and cumulative percentages. Percentile rank scores indicate the percentage of peers in the norm group with raw scores less than or equal to a specific student's raw score. In this lesson, a 'norm group' is defined as a reference group used to compare one score against similar others' scores. Cumulative percentages determine placement among a group of scores; they do not determine how much greater or smaller one score is than another. Cumulative percentages are ranked on an ordinal scale and are used to determine order or rank only. Specifically, this means that the highest score in the group will be the top score no matter what that score is. For example, let's take a test score of 85 - the raw score. If 85 were the highest grade on the test, the cumulative percentage would be 100%. Since the student scored at the 100th percentile, she did better than or the same as everyone else in the class; that would mean everyone else made an 85 or lower on the test. Cumulative percentages and percentiles are ranked on a scale of 0%-100%. Changing raw scores to cumulative percentages is one way to standardize raw scores within a certain population.
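As a quick illustration of the percentile-rank definition above, here is a minimal sketch that counts the share of scores in a norm group at or below a given raw score. The norm-group scores are hypothetical.

```python
# Minimal sketch: percentile rank as defined in the lesson -- the percentage
# of scores in the norm group less than or equal to a given raw score.
# The norm-group scores are hypothetical.

norm_group = [60, 65, 70, 72, 75, 78, 80, 82, 84, 85]

def percentile_rank(score, group):
    return 100 * sum(s <= score for s in group) / len(group)

print(percentile_rank(85, norm_group))  # 100.0 -- the top score, as in the lesson's example
print(percentile_rank(75, norm_group))  # 50.0 -- at or above half of the group
```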
Lesson Summary

So you can see there are a few ways to understand and summarize assessment results. Let's recap what we discussed, and hopefully you will be able to apply these concepts in your classroom. Test scores fall along a normal distribution, which we learned about in a previous lesson. A normal distribution shows that the majority of scores fall in the middle of the curve, with a few falling along the upper or lower range. This distribution shows us the spread of scores and the average of a set of scores. The normal distribution enables us to find the standard deviation of test scores, which measures the average deviation from the mean in standard units. Standard scores indicate how far a student's performance is from the mean in standard deviation units, and there are a few types of standard scores used in education, including stanines and z-scores. Finally, we also discussed ways to represent scores by percentile and cumulative percentage rankings, which indicate the percentage of peers in the comparison group with raw scores less than or equal to a specific student's score.

When a teacher gives an exam in class, how does she decide if the test scores were good or bad? This lesson focuses on classroom assessment, specifically how to analyze the variability of scores within a given group of students. We'll discuss both standard deviation and bell curves.

Variability in Data

Imagine you're a teacher and you test your students on the United States' Revolutionary War. Once you grade the test, how do you know what the scores mean in terms of how well the students did? You might be interested in how much variability, or difference, there was in the students' scores. In other words, did all of the students get similar scores to each other? Or did some students do really well, while other students in the same class did really poorly? The purpose of this lesson is to talk about how you can learn about the variability of scores in a classroom environment and why that might matter. We're going to cover two important concepts: standard deviation of scores and how to interpret a normal distribution, also known as a bell curve.

Standard Deviation

Let's start by using an example. Imagine you teach a class with 20 students, and they take a test with 20 multiple-choice questions about the Revolutionary War. Imagine that the grades you get back from scoring their tests look like this:

Student #1: 20     Student #11: 10
Student #2: 17     Student #12: 10
Student #3: 16     Student #13: 10
Student #4: 14     Student #14: 8
Student #5: 14     Student #15: 8
Student #6: 12     Student #16: 8
Student #7: 12     Student #17: 6
Student #8: 12     Student #18: 6
Student #9: 10     Student #19: 4
Student #10: 10    Student #20: 3

Now you want to know the basic variability within the classroom. Did the students' scores clump together, meaning the students all showed about the same amount of knowledge? Or did the scores vary widely from each other, meaning some students did great whereas other students failed the test? The answer to this question can come very precisely from the standard deviation, which is a measurement that indicates how much a group of scores varies from the average. So let's look at our example from the Revolutionary War test.

Calculating Standard Deviation

The standard deviation calculation looks complicated at first, but it's really quite simple if we take it step by step. We'll start by finding the mean, or average, of all the scores. To do this, we add up all the scores and divide by the total number of scores. This gives us a mean of 10.5. The next step is to take each score, subtract the mean from it and square the difference. For example, looking at the top score of 20, we subtract 10.5 and then square the difference to get 90.25. We repeat this process for each score. Now, we add up all our squared differences and divide by the total number of scores. This gives us 353 / 20, or 17.65. The final step is to take the square root of this number, which is about 4.2. This is the standard deviation of the scores.
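The same arithmetic can be reproduced in a few lines of Python. This sketch follows the lesson's steps exactly, including dividing by the number of scores (a population standard deviation) rather than by n - 1.

```python
# The lesson's standard deviation calculation, step by step.
import math

scores = [20, 17, 16, 14, 14, 12, 12, 12, 10, 10,
          10, 10, 10, 8, 8, 8, 6, 6, 4, 3]

mean = sum(scores) / len(scores)                   # 210 / 20 = 10.5
squared_diffs = [(s - mean) ** 2 for s in scores]  # e.g. (20 - 10.5) ** 2 = 90.25
variance = sum(squared_diffs) / len(scores)        # 353 / 20 = 17.65
sd = math.sqrt(variance)                           # about 4.2

print(mean, sum(squared_diffs), variance, round(sd, 1))
```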
Why Standard Deviation Matters

Now that we have our standard deviation of 4.2, what does that mean? It gives us an idea of how much the scores on the test clumped together. To understand this better, look at the two distributions of scores on the screen. The one on the left shows scores that are all very similar to each other. Because the scores are all close together, the standard deviation is going to be very small. But the one on the right shows scores that are all pretty different from each other (lots of high scores on the test, but also lots of failing grades). For this distribution, we'd have a high number for our standard deviation.

So why do we care about standard deviation at all? A teacher would want to know this information because it might change how he or she teaches the material or constructs the test. Let's say that there's a small standard deviation because all of the scores clustered together right around the top, meaning almost all of the students got an A on the test. That would mean that the students all demonstrated mastery of the material. Or, it could mean that the test was just too easy! You could also get a small standard deviation if all of the scores clumped together on the other end, meaning most of the students failed the test. Again, this could be because the teacher did a bad job explaining the material, or it could mean that the test was too difficult. Most teachers want a relatively large standard deviation because it means that the scores on the test varied across the grade range. This would indicate that a few students did really well, a few students failed, and a lot of the students were somewhere in the middle. When you have a large standard deviation, it usually means that the students got all the different possible grades (As, Bs, Cs, Ds, and Fs). So the teacher can know that he or she taught the material correctly (because at least some of the students got an A) and that the test was neither too difficult nor too easy.

So, we can get a good idea of the pattern of variability using standard deviation. However, there's one more way to look at the pattern of scores. That's our last topic for this lesson: the bell curve, the most common type of distribution for a variable.

Bell Curves

Let's plot the test scores in a graph. The x-axis is for the score received, and the y-axis is for the number of students who got that score. Still using our same example of 20 students who took a test with 20 questions, you can see the pattern that shows up on the graph. There's a big bump in the middle, showing the five students who got the middle score of 10. Then the graph tapers off on each side, indicating that fewer students got very high or very low scores. The shape of this distribution - a large rounded peak tapering away at each end - is called a bell curve. Remember that we said most teachers will want their students' scores to look something like this. We had a lot of scores that fell in the middle (indicated by the big bump), which might correspond to a letter grade of C. We had a few students who did really well (a grade of A) and a few students who did poorly (in other words, they got an F). When you have a bell curve like this one, with a bump in the middle and small tails on each side, you know you have a normal distribution. A normal distribution has this bell shape and is called normal because it's the most common distribution that teachers see in a classroom.

Now, imagine that most of the students got an A on the test. What would that distribution look like? The bump would fall along the right side of the graph (where the higher scores are), tapering off only on the left side, showing that most students got high scores and only a few got low scores. The exact opposite would be true if most students got an F. When a distribution is not normal and is instead weighted heavily to one side like this, it's called a skewed distribution. Skewed distributions are both less common in classrooms and usually less desirable.
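To see the shape without graphing software, here is a minimal sketch that prints a text histogram of the same 20 scores; the tallest row sits at the middle score of 10 and the counts taper toward both ends, the bell-like pattern described above.

```python
# Minimal sketch: a text histogram of the 20 test scores from the lesson,
# to eyeball whether the distribution looks bell-shaped or skewed.
from collections import Counter

scores = [20, 17, 16, 14, 14, 12, 12, 12, 10, 10,
          10, 10, 10, 8, 8, 8, 6, 6, 4, 3]

counts = Counter(scores)
for score in range(min(scores), max(scores) + 1):
    print(f"{score:2d} | {'#' * counts[score]}")
# The tallest bar is at 10 (five students), tapering toward 3 and 20 --
# roughly a normal, bell-shaped pattern rather than a skewed one.
```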
This lesson will discuss ways to summarize norm-referenced assessments and criterion-referenced assessments.

Using Assessments

Teacher: Thank you for coming in today to meet with me regarding your child's progress in school. I want to provide you with information on the multiple types of assessments we take in the classroom and explain how we score and use the results for various purposes. There are many ways I summarize the results of these assessments. These summaries provide feedback regarding your child's level of mastery and understanding. These assessments also give me a way to address any areas of weakness for individual students or in the class as a whole.

Raw Scores

The most basic way to summarize an assessment is through a raw score. A raw score is the score based solely on the number of correctly answered items on an assessment. For example, this is your child's most recent math test. His raw score was a 96 because he got 96 items correct on the assessment. Raw scores are often used in teacher-constructed assessments. The potential drawback to the use of raw scores is that they may be difficult to interpret without knowing how one raw score compares to a norm group, which is a reference group used to compare one test-taker's score to the scores of similar test-takers. We'll talk about using norm-referenced scores in a moment. Raw scores may also be difficult to understand without comparing them to specific criteria, which we will discuss now.

Criterion-Referenced Scores

I want to discuss another method of scoring: criterion-referenced scoring. This refers to a score on an assessment that specifically indicates what a student is capable of or what knowledge they possess.

[Image: Student scores can be tied to an equivalent age or grade level.]

Criterion-referenced scores are most appropriate when an educator wants to assess the specific concepts or skills a student has learned through classroom instruction. Most criterion-referenced assessments have a cut score, which determines success or failure based on an established percentage correct. For example, in my class, in order for a student to successfully demonstrate their knowledge of the math concepts we discuss, they must answer at least 80% of the test questions correctly. Your child earned an 85% on his last fractions test; therefore, he demonstrated knowledge of the subject area and passed. It's important to remember that criterion-referenced scores tell us how well a student performs against an objective or standard, as opposed to against another student. For example, a learning objective in my class is 'students should be able to correctly divide fractions.' The criterion-referenced score tells me if that student meets the objective successfully. The potential drawback of criterion-referenced scores is that complex skills are difficult to assess through a single score on an assessment.

Norm-Referenced Scores

Now let's discuss the type of score that compares one student's performance on an assessment with the average performance of other peers. These are referred to as norm-referenced scores. Norm-referenced scores are useful when educators want to make comparisons across large numbers of students or when making decisions on student placement (in K-12 schools or college) and grade advancement. Some familiar examples of norm-referenced assessments are the SAT, ACT and GRE.
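Before looking at the three kinds of norm-referenced scores, here is a minimal sketch of the cut-score idea from the criterion-referenced discussion above. The 80% threshold matches the teacher's example; the function name and the 17-of-20 score are invented purely for illustration:

```python
# Minimal sketch of a criterion-referenced cut score.
CUT_SCORE = 0.80  # the established percentage correct required to pass

def criterion_referenced_result(items_correct, total_items):
    """Compare a student's percent correct against the cut score."""
    percent = items_correct / total_items
    return "pass" if percent >= CUT_SCORE else "fail"

# A student answering 17 of 20 items earns 85% and clears the 80% cut score.
print(criterion_referenced_result(17, 20))  # pass
```

In a real gradebook the cut score would come from the assessment's documentation rather than a hard-coded constant, but the comparison itself is exactly this simple.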
Age/Grade Equivalent, Percentile, Standard

[Image: Standard deviation units show the normal distribution for a set of scores.]

There are three types of norm-referenced scores. The first is the age or grade equivalent. These scores compare students by age or grade. Breaking this type down, age equivalent scores indicate the approximate age level of students to whom an individual student's performance is most similar, and grade equivalent scores indicate the approximate grade level of students to whom an individual student's performance is most similar. These scores are useful when explaining assessment results to parents or people unfamiliar with standard scores. For example, let's look at your child's raw score on a recent math standardized assessment. Looking at the chart, we see that your child's raw score of 56 places him at an 8th grade level and an approximate age of 13. The potential disadvantage of using age or grade equivalent scores is that parents and some educators misinterpret them, especially when scores indicate the student is below the expected age or grade level.

The second type of norm-referenced scoring is the percentile rank. These scores indicate the percentage of peers in the norm group with raw scores less than or equal to a specific student's raw score. Percentile rank scores can sometimes overestimate differences between students whose scores fall near the mean of the norm group and underestimate differences between students whose scores fall at the extreme lower or upper ends of the range. For example, let's look at your child's percentile score on a recent math standardized assessment. His percentile rank is 55. This means that he scored as well as or better than 55% of the other students taking the same assessment.

The final type of norm-referenced scoring is the standard score. These scores indicate how far a student's performance is from the mean with respect to the normal distribution of scores (also referred to as standard deviation units). While these scores are useful when describing a student's performance compared to a larger group, they might be confusing without a basic knowledge of statistics - which is covered in another lesson. We see here from your son's score that he falls about one standard deviation above the mean (the average score of the population that took the same assessment). This information tells us that his score is above the scores of most of the other students.

Lesson Summary

Okay, so let's recap what we have discussed in our meeting. First, there are multiple ways to score assessments. The scores tell us different things about a student's progress. Raw scores are simply the number of items correct on an assessment. Criterion-referenced scores tell us what a student is capable of because the score reflects successful demonstration of knowledge, or failure to demonstrate knowledge, in a specific area. Norm-referenced scores are a bit more complicated. These scores compare one student's score to those of other students across large groups. Scores can be compared by age and grade, referred to as age or grade equivalent scores. Scores can also represent a percentile ranking, which indicates the percentage of peers in the norm group scoring equal to or lower than a specific student, referred to as percentile scores. Finally, scores can be compared to a mean, referred to as standard scores.

How does a teacher decide what is a good exam score and what is a bad one?
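For readers who like to see the arithmetic behind those last two score types, here is a minimal sketch of how a percentile rank and a standard score can be computed. It reuses the 20 test scores from the earlier lesson as a stand-in norm group; a real norm group would contain thousands of test-takers, and this code is illustrative rather than any published scoring procedure:

```python
# Minimal sketch of percentile rank and standard (z) score.
import math

# The 20 test scores from the earlier lesson, standing in for a norm group.
norm_group = [3, 4, 6, 6, 8, 8, 8, 10, 10, 10,
              10, 10, 12, 12, 12, 14, 14, 16, 17, 20]

def percentile_rank(score):
    """Percentage of the norm group scoring less than or equal to `score`."""
    return 100 * sum(1 for s in norm_group if s <= score) / len(norm_group)

def standard_score(score):
    """How many standard deviations `score` falls from the norm-group mean."""
    mean = sum(norm_group) / len(norm_group)
    variance = sum((s - mean) ** 2 for s in norm_group) / len(norm_group)
    return (score - mean) / math.sqrt(variance)

print(percentile_rank(12))           # 75.0 -> at or above 75% of the group
print(round(standard_score(12), 2))  # 0.36 -> about a third of an SD above the mean
```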
This lesson focuses on classroom assessment, but instead of covering different types of assessment (such as essay versus true/false questions), we'll discuss statistical methods for summarizing scores on any form of testing. Specifically, this lesson covers the statistical tools known as the mean, median and mode.

The Statistics of Classroom Assessment

Imagine you're a teacher, and you give your students a test over their understanding of the United States Revolutionary War. How do you know if they really learned anything? You can look over their test answers, but what are your expectations? Do you want every student to get every single question right? Is it okay if half of the students get every question right but half of the students get zero questions right? What about if every student gets half of the questions right and half wrong? What's your threshold for feeling like the students really learned?

This lesson's focus is on how you can assess learning in a classroom environment. We're not going to talk about different testing methodologies, such as a multiple-choice test versus an essay test. While that question is really important, the focus of this lesson is how to understand the results from any form of testing - how to translate the results into an understanding of how well the class learned the information. To do this, we'll be discussing how to use some basic statistics to analyze any assessments done in the classroom. The lesson includes the concepts of mean, median and mode. In a different lesson, you'll learn about standard deviations and bell curves, two additional concepts you might need to know.

Statistics of Summary: the Mean

Let's start with the example from before. Imagine you teach a class with 20 students, and they take a test with 20 multiple-choice questions about the Revolutionary War. Imagine that the grades you get back from scoring their tests look like this:

Student #1: 20    Student #11: 10
Student #2: 17    Student #12: 10
Student #3: 16    Student #13: 10
Student #4: 14    Student #14: 8
Student #5: 14    Student #15: 8
Student #6: 12    Student #16: 8
Student #7: 12    Student #17: 6
Student #8: 12    Student #18: 6
Student #9: 10    Student #19: 4
Student #10: 10   Student #20: 3

So you can see we have student #1 through student #20, and you can see that the scores range quite a bit. Looking at these scores, you can see that one student, student #1, got a perfect score of 20 out of 20. Many students got scores somewhere in the middle, with five students getting half of the questions right - that would be a score of 10 out of 20. A few of the students did pretty well, only missing a few questions, while a few students did pretty badly but at least got a few of the questions right. How can we be more precise in analyzing these scores using statistics?

Let's say the principal of your school wants a quick summary of how your students did on their test. How would you summarize the results? The most common type of statistic, whether in the context of classroom assessment or in laboratory research projects, is the statistic of summary. There are various types of statistics of summary, but in general their purpose is to quickly give a general impression of the overall trend in results. So, just as you'd guess from the term, statistics of summary give you a ballpark idea of what happened on the test. Let's go over three different types of summary statistics.
[Image: Find the mean by adding up all the scores and dividing by the number of students.]

The most well-known statistic of summary is called the mean, which is the term we use for the arithmetic average score. When most people use the term 'average score,' what they're really referring to, technically, is what we call the mean. How do we calculate the mean? We simply add up all of the individual results to get a total, and then divide by the number of students in the class. In our example, you can see how this would look on the screen. If you add up the scores of 20 + 17 + 16 and so on through all 20 students, you get a total score of 210. You divide 210 by 20 (the number of students), and you get a mean of 10.5. You can see that this score, 10.5, is a pretty representative middle score for this class, so it works nicely as a summary.

Statistics of Summary: the Median

A different statistic of summary is called the median. A median is simply the score that falls exactly in the middle, such that half of the people had higher scores and half of the people had lower scores. To find the median, you don't have to do any actual math (hooray!). All you have to do is put the scores in numerical order from highest to lowest, figure out how many people are in each half (in this example, it's 10 in the top half and 10 in the bottom half) and count down until you're at the 10th score. The 10th score here is the student who got 10 questions right out of 20. So you can use the median as a nice way of knowing where the middle student fell.

Why would you use the median instead of the mean? In this example, the two scores are pretty similar (a mean of 10.5 versus a median of 10). So here, it doesn't really make any difference which one you pick. The difference between the mean and the median only really matters if you have extreme scores on one end or the other. Let's say you were curious to know how many state capitals the children in your kindergarten class knew. Let's say you have five kids in the class, and maybe you get scores like these:

Child 1: 1 capital
Child 2: 2 capitals
Child 3: 3 capitals
Child 4: 3 capitals
Child 5: 47 capitals

So child 1 only knew one capital, child 2 knew two capitals, and the scores stay in that range until you get to child 5, who actually knew 47 capitals. Here, one of the scores (the child who knows 47 capitals) is extremely different from the rest of the scores. When a score is extremely different from the rest of the scores in a distribution, that score is called an outlier. If an outlier exists in your data, it will have a huge effect on the mean. Here, the mean would be (1 + 2 + 3 + 3 + 47) / 5 = 56 / 5 = 11.2. So the mean of 11.2 is not a very good representation of the actual average number of state capitals the kindergartners in your class knew. This number makes it look like they are much better at naming capitals than they really are. So in the case of outliers, it's better to use the median. In our example, the median is, again, just the middle number - so the median here is 3. The number 3 is a much better representation of the basic level of the class on this particular task. So, in summary, use a median if you have outliers, because the mean might not be a good summary number in that case.

Statistics of Summary: the Mode

The third and final statistic of summary is called the mode, which is simply the score obtained by the most people in the group. Let's go back to our original example of scores on the history test for the Revolutionary War.
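Before we do, here is a minimal sketch of the outlier point we just made, using Python's built-in statistics module and the kindergarten capitals scores. The code is illustrative and not part of the original lesson:

```python
# Minimal sketch: the mean is pulled toward an outlier, the median is not.
import statistics

capitals_known = [1, 2, 3, 3, 47]  # child 5 is the outlier

print(statistics.mean(capitals_known))    # 11.2 -> dragged upward by the outlier
print(statistics.median(capitals_known))  # 3   -> a better summary of the class
```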
When you look at the test scores again, what is the most common score? The answer here is the score of 10. Five students got that score, so the mode in our example is 10. Again, in this particular example, the mode is similar to both the mean and the median. So why would you use the mode instead of the mean or median? Usually the mode is used when scores are not in numerical form. Remember, the mode is telling you what the most common answer is. So modes are good when the data involved are categorical instead of numerical.

[Image: The most common score among the group represents the mode.]

Think about baseball teams. Who won the World Series last year? Do you know which team has won the World Series most often since it began? The answer is the New York Yankees. So it's accurate to say that the mode team for winning the World Series is the Yankees, because it's the most common answer. Let's go over one more example. When you get a new car, your car insurance price is based on a lot of things, like your gender and age, but it's also based on the color of your car. You have to pay more for insurance if you drive a red car. Why is that? It's because the mode color of cars that get into accidents is red. In other words, red cars get into more accidents than cars of any other color - so red is the mode car accident color. It wouldn't make sense to try to use a mean or a median when talking about colors of cars, because there aren't any numbers involved. So for categories like colors or baseball teams, you have to use the mode if you want a statistic of summary.

Lesson Summary

In summary, classroom teachers often want to know how well an assessment went by summarizing the general trend in how well students did. We covered three different types of summary statistics. The mean is the arithmetic average score, or the number you get when you add up all the individual scores and then divide by the number of students. The median is simply the score in the middle, where half of the students did better than this score and half did worse. We use the median when there are outliers, or extreme scores that might distort the mean. Finally, the mode is simply the most common score or category. Modes are usually used when the data aren't in numerical form, which makes the mean and median impossible to use. No matter which of these statistics you use, all of them are good ways to summarize a classroom assessment with a single, simple number.

Standardized tests are used frequently in educational settings. This lesson will help you understand the advantages and disadvantages of these tests and also explore factors that impact standardized test performance.

Standardized Tests: Background Information

Student: When did we first start using standardized assessments?

Expert: Standardized assessments are defined as assessments constructed by experts and published for use in many different schools and classrooms. These assessments are used in various contexts and serve multiple purposes. Americans first began seeing standardized tests in the classroom in the early 20th century. Currently, standardized tests are widely used in grade school and are even required in most states due to the No Child Left Behind Act of 2001.

Student: Most of my test items are multiple-choice. Are all standardized tests multiple-choice?

Expert: Standardized tests may consist of different types of items, including multiple-choice, true-false, matching, essay and spoken items.
These assessments may also take the form of traditional paper-pencil tests or be administered via computer. In some instances, adaptive testing occurs when a computer is used. Adaptive testing is when a student's performance on items at the beginning of the test determines the next items to be presented.

Standardized Tests: Advantages

Student: So are all standardized tests good to use?

Expert: Well, actually, there are multiple advantages and disadvantages to these types of tests. Let's talk about the advantages first. There are many advantages of standardized testing:

- Standardized tests are practical: they're easy to administer and take less time to administer than other assessments.
- Standardized testing results are quantifiable. By quantifying students' achievements, educators can identify proficiency levels and more easily identify students in need of remediation or advancement.
- Standardized tests are scored via computer, which frees up time for the educator. Since scoring is completed by computer, it is objective and not subject to educator bias or emotions.
- Standardized testing allows educators to compare the scores of students within the same school and across schools. This information provides data not only on the individual student's abilities but also on the school as a whole. Areas of school-wide weakness and strength are more easily identifiable.
- Standardized testing provides a longitudinal report of student progress. Over time, educators are able to see a trend of growth or decline and rapidly respond to the student's educational needs.

[Image: Standardized testing allows educators to determine trends in student progress.]

Standardized Tests: Disadvantages

Expert: There are also disadvantages to standardized testing, and it is highly scrutinized. Critics cite the following disadvantages:

- Standardized test items are not parallel with typical classroom skills and behaviors. Because questions have to be generalizable to the entire population, most items assess general knowledge and understanding.
- Since general knowledge is assessed, educators cannot use standardized test results to inform their individual instruction methods. If recommendations are made, educators may begin to 'teach to the test' as opposed to teaching what is currently in the curriculum or what is based on the needs of their individual classroom.
- Standardized test items do not assess higher-level thinking skills.
- Standardized test scores are greatly influenced by non-academic factors, such as fatigue and attention.

Factors that Impact Standardized Testing by Grade

Student: Yeah, I've noticed that if I take a test on a day when I'm really tired, I'll perform lower than usual.

Expert: Yes, there are several factors that will impact standardized test scores. A big disadvantage of standardized testing is the set of non-academic factors that impact scores: test anxiety, fatigue, lack of attention - and the list goes on. Specific characteristics can be categorized by grade. In kindergarten through 2nd grade, students have short attention spans, and attention varies a great deal from student to student. There is very little motivation to do well on standardized tests in this grade range because students are unable to understand the purpose of the test. Test results are also inconsistent in this grade range. For grades 3-5, test scores are interpreted as the 'end-all, be-all' of evidence for academic ability, which causes a lot of stress and anxiety.
Students in this grade range still have a wide range of abilities and levels of understanding, which leads to wide variability in scores. In grades 6-8, there is an increase in test anxiety. Students also tend to become more skeptical about the value of standardized tests in this grade range. And then, in grades 9-12, there is more skepticism regarding the usefulness and validity of standardized tests. There is decreased motivation to perform well on tests, and many students in this grade range have deemed themselves 'poor test-takers' and stop trying.

Promoting Standardized Testing in the Classroom

Expert: There are ways to promote standardized testing in the classroom. Since standardized testing prevails in most classrooms, educators should utilize strategies to keep students motivated and to help them understand the value of the tests. Educators can remind students of the value of standardized test scores for tracking academic progress over time. Educators can encourage students to do well but also remind them that their skills and knowledge are assessed through a variety of methods, not just standardized tests. Educators should acknowledge the shortfalls of standardized tests but also promote the benefits of this type of assessment. Finally, educators should allow students to practice the test-taking skills needed for standardized testing (for example, by giving timed assignments) in order to decrease test anxiety.

Lesson Summary

Expert: Standardized tests are created by experts and used to assess understanding and skills for a variety of purposes. Standardized tests consist of different types of items, including multiple-choice and essay questions. Additionally, standardized tests can be administered via computer or traditional paper-pencil methods. There are many advantages to standardized testing, including the practicality and ease of administration and the ability to compare results across a large group of students. Disadvantages also exist, including the fact that standardized test items do not assess higher-level thinking skills and that test scores are impacted by non-academic factors such as stress and fatigue. Educators can use strategies to help students understand the academic importance of standardized tests and also prepare their students in advance to ease test anxiety.

Do high test scores equal high achievement? Many politicians and educational reformers think the answer is yes. High-stakes standardized testing has become commonplace in American schools. This lesson will define high-stakes testing and accountability and present problems associated with these types of tests.

Educational Reform and High-Stakes Testing

In America, many people, including politicians and educators, are calling for reform of education. Low achievement levels and a limited labor pool of skilled graduates (especially in the science, technology, engineering and math fields) stimulate talk of overhauling education. The solution, for some, is high-stakes testing. This lesson will define high-stakes testing and accountability and also discuss some problems associated with these tests.

High-Stakes Testing Defined

High-stakes testing is defined as the practice of basing major decisions about individual students, schools and school personnel on a single assessment. The most recent and well-known establishment of standardized high-stakes testing is the No Child Left Behind Act of 2001.
The act requires states to develop standards and assessments for basic skills (such as reading and mathematics) and to assess these skills annually. Federal school funding is tied to these assessment results. The No Child Left Behind Act of 2001 is covered in more detail in another lesson. High-stakes testing places pressure on schools and teachers to produce high test scores each year or face consequences such as reduced funding, salary restrictions and personnel termination. Administrators and teachers are held accountable for the students' performance in their classrooms and schools.

Accountability Defined

Accountability is defined as an obligation of educators to accept responsibility for students' performance on high-stakes assessments. The No Child Left Behind Act of 2001 mandates some form of accountability in all schools for grades 3-8.

Use of High-Stakes Testing Results

High-stakes testing results are regularly used:

- To determine yearly progress in meeting state-determined standards
- For promotion to the next grade level
- For awarding high school diplomas

Advantages of High-Stakes Testing

Although high-stakes testing is a controversial subject among educators, it does have certain advantages:

- Tests are based on clearly defined standards and provide important information on growth and decline in students' performance
- Tests can highlight gaps in an individual student's knowledge, classroom achievement gaps or school achievement gaps
- Tests may also motivate students to improve their performance, especially when test results are tied to high school diplomas and grade promotion

Disadvantages of High-Stakes Testing

The controversy over high-stakes testing deals with how results are used and how reliable the results really are in determining what a student knows and is capable of. Some disadvantages of high-stakes testing include:

- The tests may lead to inaccurate inferences about student performance due to non-test factors, such as the test-taker's anxiety and motivation
- Teachers and educators are burdened with more standards to teach and end up teaching to the tests (as opposed to a more individualized curriculum that meets student needs)
- High-stakes testing does not assess higher-level critical thinking skills
- Since each state can determine its own standards, different test criteria may lead to different overall conclusions about student and school achievement and performance
- There is an emphasis on punishing lower-performing schools and personnel and not enough emphasis on helping those schools improve

High-Stakes Testing Guidelines

National organizations, such as the American Psychological Association, have established guidelines for the appropriate use of high-stakes testing in U.S. schools. The guidelines were set forth in order to promote fairness and avoid unintended consequences of high-stakes testing. The American Psychological Association recommends that decisions about a student's continued education should not be based on the results of one single test but on a comprehensive set of exams and performance assessments. They also say that if the results of one single assessment are used to determine a student's continued education, such as for grade promotion or graduation, there should be evidence that the test addresses the specific content and skills that students have had the opportunity to learn. When using high-stakes testing, school districts and states should only use test results for purposes that are clearly defined.
For example, if a test is supposed to determine graduation ability, those results should only be used for that specific purpose. And finally, special accommodations should be made for students with limited English proficiency and, likewise, for students with learning disabilities.

Lesson Summary

In summary, high-stakes testing involves making major decisions based on the results of a single assessment. States are mandated to have high-stakes testing, which is tied to state education funding, due to the No Child Left Behind Act of 2001. Individual teachers and schools are accountable and have an obligation to accept responsibility for students' performance on high-stakes assessments. High-stakes testing allows educators to gain information on how well a student performs annually and allows schools to track student and school growth (or decline) over time. High-stakes testing is scrutinized because results could lead to inaccurate reflections of students' actual abilities and knowledge due to non-test-related factors. Also, when high-stakes testing is involved, teachers begin teaching to the test instead of building a curriculum around individual class needs. National organizations have established guidelines for the proper use of high-stakes testing, which include providing accommodations in special circumstances, only using results for their indicated purposes and not basing students' continued education on the results of one single assessment.

Assessments are used to gain useful information about test-takers' knowledge, skills and progress. Sometimes, however, the results of these assessments are incorrect due to biases. This lesson will differentiate and discuss types of testing bias and differences among test-takers that may lead to testing bias.

Introduction to Test Bias

School Principal: 'How can this be? These test scores range from well below average to excellent for one grade! These scores can't possibly be right. Maybe something happened during the administration of the test. Or maybe the test itself is flawed. I need to research what happened.'

A test that yields clear and systematic differences among the results of the test-takers is biased. Typically, test biases are based on group membership of the test-takers, such as gender, race and ethnicity.

'Yes, it looks like the scores of some students were much lower than the scores of others. Also, I see a difference between ethnic groups, too! This test must have some sort of bias.'

Cultural Bias

A test is not considered biased simply because some students score higher than others. A test is considered biased when the scores of one group are significantly different from, and have higher predictive validity than, the scores of another group - predictive validity being the extent to which a score on an assessment predicts future performance. Most test biases are considered cultural biases. Cultural bias is the extent to which a test offends or penalizes some students based on their ethnicity, gender or socioeconomic status.

Types of Test Bias

Researchers have identified multiple types of test bias that affect the accuracy and usability of test results.

Construct Bias

First is construct bias. Construct bias occurs when the construct measured yields significantly different results for test-takers from the original culture for which the test was developed and test-takers from a new culture. A construct refers to an internal trait that cannot be directly observed but must be inferred from consistent behavior observed in people.
Self-esteem, intelligence and motivation are all examples of constructs. Basing an intelligence test on items from American culture would create bias against test-takers from another culture.

Method Bias

Another type of testing bias is method bias. Method bias refers to factors surrounding the administration of the test that may impact the results. The testing environment, the length of the test and the assistance provided by the teacher administering the test are all factors that may lead to method bias. For example, if a student from one culture is used to, and expects to, receive assistance on standardized tests but is faced with a situation in which the teacher is unable to provide any guidance, this may lead to inaccurate test results. Additionally, if the test-taker is used to a more relaxed testing environment, such as one that includes moving around the room freely and taking breaks, then an American style of standardized test administration, where students are expected to sit quietly and work until completion, is likely to cause difficulty in performance. Again, this could yield results that are an inaccurate representation of that student's knowledge.

Item Bias

The next type of bias is item bias. Item bias refers to problems that occur with individual items on the assessment. These biases may occur because of poor use of grammar, the choice of cultural phrases and poorly written assessment items. For example, the use of phrases such as 'bury the hatchet' (to indicate making peace with someone) or 'the last straw' (to indicate the thing that makes one lose control) in test items would be difficult for a test-taker from a different culture to interpret. The incorrect interpretation of culturally biased phrases within test items would lead to inaccurate test results.

Language Differences and Test Bias

In addition to biases within the test itself, language differences also affect performance on standardized testing, which causes bias against non-native English test-takers. Non-native English test-takers may struggle with reading comprehension, which hinders their ability to understand questions and answers. They may also struggle with writing samples, which are intended to assess writing ability and levels.

Bias and Test-Taker Differences

Biases in testing also occur due to social, cognitive and behavioral differences among test-takers.

Test-takers with cognitive or academic difficulties will often:
- Have poor listening, reading and writing skills
- Perform inconsistently on tests due to off-task behaviors, such as daydreaming and doodling
- Have higher than average test anxiety

Test-takers with social or behavioral difficulties will often:
- Perform inconsistently on tests due to off-task behaviors
- Have lower than average motivation for testing

Test-takers with delays in cognitive processing will often:
- Learn and process information slowly
- Have limited reading and writing skills
- Have poor listening skills

Finally, test-takers with physical or sensory challenges will often:
- Tend to get tired during a test
- Have less developed language skills
- Have poor listening skills
- Have slower learning and cognitive processing skills

Educators should account for individual differences among test-takers when administering tests and when using results to predict future performance and success.

Lesson Summary

School Principal: 'Wow. There are many factors to consider when looking at test bias. I can clearly see that there are significant differences among my test-takers, which means the test is biased.
I know there are examples of construct bias, because the scores of our non-native English students were significantly different from those of native English speakers. There may have been method bias, due to the administration of the test. And I know there was item bias, because the test uses colloquial phrases, such as 'bury the hatchet,' within the items. I know now we should account for language differences among the test-takers and also consider social, cognitive and behavioral differences, as those may lead to test bias.'

Assessments are excellent tools in the classroom. Used properly, they provide invaluable information about student knowledge and progress. However, if misused, assessments can misrepresent the actual knowledge and learning taking place in the classroom. This lesson will discuss the use and misuse of standardized assessments.

Defining Assessment

The school principal says: Many of you and our parents have asked about the use of assessments in our school. I wanted to share with you today a little information about the use and occasional misuse of assessments in the classroom and in schools. As a review, let's define the term assessment. Assessment is the process of observing a sample of a student's behavior and drawing inferences about the student's knowledge and abilities. Today we will focus on standardized assessments. Standardized assessments are defined as assessments constructed by experts and published for use in many different schools and classrooms. These assessments are used in various contexts and serve multiple purposes, as we will discuss next.

Use of Standardized Assessments

Standardized assessments serve multiple purposes in the classroom, including:

- Showing educational accomplishments. Educators are able to track individual student and group progress from year to year. Standardized assessments allow educators to compare groups of students by reporting results by student population, such as grade level, ethnic group and gender. These results allow educators to note areas of accomplishment by groups and subgroups and quickly identify any potential area of concern or weakness.
- Serving as motivational tools. Research shows that students study and learn more material when they are told they will be tested on it or held accountable for it.
- Serving as mechanisms for review. Assessments promote constant review of material, which aids in moving the material from short-term to long-term memory so it can be accessed in the future.
- Providing feedback. Assessments provide opportunities for both the teacher and the student to receive feedback. Assessments give a teacher feedback about the student's general subject knowledge. For students, assessments provide feedback about areas they may need to focus on or areas in which they are proficient.

Misuse of Standardized Assessments

Unfortunately, standardized assessment results are sometimes misused. I want to make you aware of how these results are commonly misused in the classroom and by schools. First, evaluating schools based solely on standardized scores is a misuse of the assessment. It is common for standardized assessment results to be used by state and federal governing agencies to assess the quality and performance of schools. However, we should keep in mind that standardized assessments assess general concepts based on a common national curriculum. These assessments don't necessarily provide reliable data on what is taught in the individual classroom.
Student achievement should be assessed in order to hold a school accountable, but there are many ways of assessing achievement, and standardized assessments are only one of them. Therefore, the use of standardized assessments as the only evaluation criterion for a school is a misuse. The evaluation of individual teachers based on standardized assessment results is also considered a misuse by many researchers and educators. The students in a classroom for any given year are very diverse. Evaluating teachers based on the year-to-year progress of their classes, whose students change each year, yields unreliable information about that teacher's ability to impart knowledge to her students. Giving students a grade based on the results of standardized assessments is also considered a misuse. Standardized assessments test the general knowledge and skills of students based on a generic curriculum. These results do not provide adequate information on the knowledge and skills the student has actually learned in the classroom, where the curriculum is defined by different objectives and standards. Standardized assessment publishers even point out this fact and note that standardized tests should not replace end-of-course assessments. Finally, using standardized assessment results to make classroom instructional decisions and modify curriculum is also considered a misuse. As stated above, standardized assessments test only a sample of the knowledge and skills expected to be learned in each grade. Instructional objectives and decisions should be based on performance and non-standardized assessments in which students are assessed on specific objectives.

Lesson Summary

I'll conclude by summarizing what we have discussed. First, assessments allow us to draw inferences about students' knowledge and abilities. Standardized assessments afford the same opportunities but are created by experts based on a more general curriculum. Because of the way standardized assessments are created, the results should not serve as the sole evaluation of a school or a teacher. The use of standardized assessment results to dictate instructional practices and for grading is also considered a misuse. Fortunately, standardized assessments serve multiple purposes when used appropriately. They provide valuable information on the longitudinal progress of students and student groups, they serve to motivate students to learn, they provide feedback to students, teachers and parents, and they also serve as review mechanisms. Keeping what we have learned today in mind, standardized assessments are a valuable tool and should be used in combination with our other measures of student performance.

An ecological assessment is one type of assessment that is used to help students who have special needs. In this lesson, we discuss ecological assessments, what they entail, and how they are used.

Special Education and Ecological Assessments

This is Dave. Dave attends Cornerstone Elementary and is in the second grade. Dave's a pretty smart kid, but he has a behavioral disorder which can adversely affect his educational performance. For example, when Dave first started at Cornerstone, he sometimes acted very inappropriately in certain classes. However, now that certain adjustments have been made, Dave appears to be much happier overall and behaves more appropriately. This is partly due to the special education ecological assessment that was performed not long after Dave started.
Definition of Ecological Assessment

An ecological assessment is a comprehensive process in which data is collected about how a child functions in different environments or settings. Sometimes, students eligible for special education perform or behave well in some environments but have difficulty in others. For example, at school, a student may be calm during class time but always upset in the cafeteria. Our friend Dave was normally very well behaved during math and science class but could act very inappropriately during language and art. Other children even have school phobia, which is an irrational, persistent fear of going to school. These children seem fine at home but consistently become anxious, depressed or scared every time they have to go to school.

Information Included

Information for an ecological assessment is often obtained through observation. However, information can also be gathered through student records and interviews with the student and his or her family. Information for Dave's ecological assessment was first collected through observation by a specialist. She observed his behavior in all of his classes and during breaks at school, and she even observed him at home and at the park. Additional information was then collected through interviews with Dave, his parents and his teachers. The type of information collected in an ecological assessment includes (but is not limited to) information about the physical environment, patterns of behavior and activity, interactions between the authority figure and the child, interactions between children, and the expectations placed on the child by parents, teachers and peers.

An ecological assessment can help determine why the child functions differently in different settings. Maybe he or she misbehaves when the environment is too stimulating, or maybe the expectations of the authority figure are drastically different from one environment to the next. If you remember, Dave used to be very well behaved during math and science class but could act very inappropriately during language or art class. Dave has a behavioral disorder, and his ecological assessment revealed that he tended to act up during the same time every morning, regardless of whether he was at school or not. So, his behavior was affected by the time of day rather than a specific environment. Having this information was crucial in determining and accommodating Dave's needs.

Use in Individualized Education Programs

In fact, the completed ecological assessment was used to help create Dave's Individualized Education Program (IEP). An IEP is used in special education. It's a document that is drawn up and agreed upon by teachers, parents, specialists and (if possible) the student. The document describes the assessment results and present achievement level of the child, then specifies goals for the school year as well as any support needed to achieve those goals or accommodate any special needs. An IEP for each child is required under the Individuals with Disabilities Education Act (IDEA). In other lessons we discuss the IDEA and IEPs in more depth. For now, let's go back to Dave's IEP. Dave's behavioral disorder makes him eligible for special educational assistance, which is why he received an ecological assessment. Once his IEP was created, it helped his support team - which consists of Dave's parents, teachers and specialists - to work together to focus on things that could improve his educational performance. One goal on his IEP was to stabilize his behavior while at school.
If you remember, Dave's ecological assessment revealed that he always acted up during a certain time every morning, regardless of whether he was at school. Long story short, Dave had been taking his mood stabilizer medication in the morning, and it doesn't take effect very quickly. It was determined that he should try taking his medication before bed, and once he started doing so, his behavior during the day stabilized. Now, Dave appears to be much happier overall, behaves more appropriately at school, and has greatly improved his educational performance.

Lesson Summary

An ecological assessment is a comprehensive process in which data is collected about how a child functions in different environments or settings. It is normally used as part of a special education assessment. The data is usually collected through observation, interviews and student records, and it is used to help determine why a child may act or perform well in some environments but have difficulty in others. Ecological assessments can reveal whether the child's behavior is affected by the physical environment, interactions with other people, the expectations of an authority figure, or something else that's unrelated to the environment. This helps a student's support team work together to improve the student's educational performance.