Appendices to “Developing Instruments to Assess and Compare the Quality of Engineering Education: The Case of China and Russia”

E. Kardanova, P. Loyalka, I. Chirikov, L. Liu, G. Li, H. Wang, E. Enchikova, H. Shi, N. Johnson

Appendix A. Analytical approach: technical details

One of the intentions of the pilot study and subsequent analysis was to create shortened final tests that could be used in future studies by selecting only the items with the best psychometric properties from our pilot tests. In particular, while the pilot tests were 55 minutes each, the research team sought to cut the length of each subject test to 40 minutes for the final versions. We therefore included more items in the pilot tests than we needed for the final tests. We also gave the students more time during the pilot study so that we would be able to delete items of poor psychometric quality, that is, items that did not fit the IRT model or had low discrimination. As the discrimination index we used the correlation between examinees’ responses to the item and their ability levels. As the threshold for detecting items with low discrimination, we used a value of 0.2, which is commonly used for this purpose in similar studies (Crocker and Algina 1986). We expected the final number of items for the main study to be 35–40 for each subject test.

To measure the extent to which the data fit the Rasch model, we used the unweighted and weighted mean square statistics provided by Winsteps (in terms of Winsteps output: OUTFIT MNSQ and INFIT MNSQ, respectively). These statistics rely on standardized residuals, which represent the differences between observed responses and the responses expected under the model (Wright and Stone 1979). Generally, a criterion of 1.2 for these statistics is used to flag potential misfit.
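The item-screening rules above reduce to a few lines of computation. The following is a minimal sketch, assuming a scored 0/1 response matrix together with Rasch person and item estimates (in practice these came from Winsteps); all names and the simulated data are illustrative, not from the study.

```python
import numpy as np

def rasch_probs(theta, b):
    """Rasch model: probability of a correct response for each person-item pair."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def item_diagnostics(X, theta, b):
    """Discrimination and mean-square fit statistics for each item.

    X     : persons x items matrix of scored 0/1 responses
    theta : person ability estimates (logits)
    b     : item difficulty estimates (logits)
    """
    P = rasch_probs(theta, b)
    W = P * (1.0 - P)                     # model variance of each response
    Z2 = (X - P) ** 2 / W                 # squared standardized residuals

    outfit = Z2.mean(axis=0)              # unweighted mean square (OUTFIT MNSQ)
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)  # weighted (INFIT MNSQ)

    # Discrimination: correlation of responses to each item with ability.
    disc = np.array([np.corrcoef(X[:, i], theta)[0, 1] for i in range(X.shape[1])])

    # Screening rules from the text: discrimination below 0.2, or MNSQ above 1.2.
    flagged = (disc < 0.2) | (outfit > 1.2) | (infit > 1.2)
    return disc, outfit, infit, flagged

# Tiny simulated check (illustrative only).
rng = np.random.default_rng(0)
theta = rng.normal(size=500)                       # person abilities
b = np.linspace(-2.0, 2.0, 40)                     # item difficulties
X = (rng.random((500, 40)) < rasch_probs(theta, b)).astype(float)
disc, outfit, infit, flagged = item_diagnostics(X, theta, b)
```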
To test for DIF across countries and across grades we used the ETS approach to DIF classification (Zwick et al. 1999), which designates items as A (negligible or nonsignificant DIF), B (slight DIF), or C (large DIF) items depending on the magnitude of the Mantel-Haenszel statistic (Dorans 1989) and its statistical significance. An item was considered a C item if two conditions were satisfied: (1) the difference in item difficulty between the groups of students was more than 0.64 logits, and (2) the Mantel-Haenszel statistic had a significance level of p < .05 (Linacre 2011). Only C items were treated as items with DIF in this study.
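The classification rule can be stated compactly in code. Below is a sketch of the rule as described above; the 0.43-logit threshold for B items is the conventional ETS/Winsteps value and is an assumption here, since the text specifies only the C rule, and the example items are hypothetical.

```python
def ets_dif_class(contrast_logits: float, mh_pvalue: float) -> str:
    """ETS-style A/B/C DIF classification.

    The C rule follows the text exactly: difficulty difference above
    0.64 logits AND Mantel-Haenszel significance at p < .05. The 0.43-logit
    B threshold is assumed (conventional ETS/Winsteps value).
    """
    size = abs(contrast_logits)
    significant = mh_pvalue < 0.05
    if size > 0.64 and significant:
        return "C"  # large DIF: excluded from cross-country linking
    if size > 0.43 and significant:
        return "B"  # slight DIF
    return "A"      # negligible or nonsignificant DIF

# Hypothetical items: (DIF contrast in logits, Mantel-Haenszel p-value).
for name, (contrast, p) in {"item07": (0.81, 0.003),
                            "item12": (0.35, 0.210),
                            "item19": (-0.70, 0.010)}.items():
    print(name, ets_dif_class(contrast, p))   # C, A, C
```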
To examine the dimensionality of each scale we conducted a principal components analysis (PCA) of the standardized residuals (Linacre 1998; Smith 2002). Theoretically, if all the information in the data is explained by one latent variable, the residuals represent random noise and are independent of each other. As a consequence, correlations between the residuals should be near zero. If there is no second dimension in the data, a PCA of the standardized residuals should generate eigenvalues all near one, and the percentage of variance explained should be spread uniformly across the components (Ludlow 1985).
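This check is straightforward to express in code. A sketch under the same assumptions as the earlier one (scored 0/1 responses plus Rasch estimates; names illustrative):

```python
import numpy as np

def residual_pca_eigenvalues(X, theta, b):
    """Eigenvalues of the correlation matrix of standardized Rasch residuals.

    Under an essentially unidimensional model the residuals are close to
    independent noise, so all eigenvalues should be near 1; markedly larger
    leading eigenvalues suggest a second dimension.
    """
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    Z = (X - P) / np.sqrt(P * (1.0 - P))    # standardized residuals
    R = np.corrcoef(Z, rowvar=False)        # item-by-item residual correlations
    return np.sort(np.linalg.eigvalsh(R))[::-1]
```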
To analyze reliability, we used the person reliability and person separation indices provided by the Rasch analysis (Stone 2004). The separation index compares the distribution of student measures (estimates of ability) with their measurement errors and indicates the spread of student measures in standard error units. The index can be used to calculate the number of distinct levels, or strata separated by at least three errors of measurement, in the distribution (Wright and Stone 1979; Smith 2001). The number of strata is calculated as Strata = (4G + 1)/3, where G is the separation index. At least three distinct strata are recommended (for example, low, middle, and high ability levels).

In order to show the relative distribution of item difficulty and students’ scores on a common metric, we constructed the variable map (Wright and Stone 1979).

For equating between grades, we used a separate calibration design, anchoring items from one test when calibrating the other (Wolfe 2004). After equating, we evaluated the quality of the link between the grade 1 and grade 3 tests by calculating the item-within-link statistic (Wright and Bell 1984). Under the null hypothesis that the items fit perfectly within the link, this statistic has an expected value of 1. Link adequacy was also evaluated by examining the stability of the item difficulty estimates for the grade 3 test with and without anchoring; to do this, we calculated the correlation between the two sets of item difficulty estimates.

Appendix B. The results of the psychometric analysis for the grade 1 and grade 3 physics tests

The results for the physics tests are substantively similar to those for the mathematics tests and are presented briefly because of space limitations. Five and six items were deleted from the grade 1 and grade 3 physics tests, respectively, because of poor psychometric quality. For further analysis we thus considered sets of 40 and 39 items for the grade 1 and grade 3 tests, respectively. Of the 40 items on the grade 1 physics test, 17 demonstrated country-related DIF: 9 items in favour of China and 8 in favour of Russia. The other 23 items were DIF free and could be used for linking between the two countries. As for the grade 3 physics test, 16 items demonstrated country-related DIF: 8 in favour of China and 8 in favour of Russia, while the remaining 23 items were DIF free. As with the mathematics tests, we used the DIF-free items in each test for linking between the two countries. Items that were not considered good for linking were deleted at earlier stages, either because they did not show good psychometric qualities when analyzed for inclusion in a particular test or because they exhibited DIF on at least one test.

Further analysis showed that all items of the grade 1 and grade 3 physics tests had good psychometric characteristics, fit the model, and could be considered essentially unidimensional. The person reliability and person separation indices were 0.83 and 2.03 for the grade 1 physics test and 0.77 and 1.84 for the grade 3 physics test, indicating three statistically distinct groups of students along the continuum for each test. The variable maps for the physics tests are presented in Figures B.1 and B.2.

Figure B.1. The Physics Grade 1 Test Variable Map

Figure B.2. The Physics Grade 3 Test Variable Map
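As an illustrative check (not part of the original analysis), the strata formula from Appendix A can be applied to the separation indices reported above:

```python
def strata(G: float) -> float:
    """Number of statistically distinct ability levels: Strata = (4G + 1) / 3."""
    return (4.0 * G + 1.0) / 3.0

print(round(strata(2.03), 2))  # grade 1 physics: 3.04 -> three distinct strata
print(round(strata(1.84), 2))  # grade 3 physics: 2.79 -> about three strata
```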
Appendix C. The Content Areas of the Math and Physics Tests

The experts were asked to rate the items according to four criteria: (1) comprehensibility of wording, (2) appropriateness in measuring the content area of interest, (3) difficulty, and (4) expected time required to answer. The experts found 13.4% of the physics items and 7.5% of the math items to have problems with clarity of wording (meaning that 25% or more of the experts flagged the item for clarity issues). We reviewed all the items that experts marked as having clarity issues, rectified those with simple and obvious wording problems, and deleted the other unclear items from our test item bank. The experts rated criteria (2), (3), and (4) on numerical scales (e.g., 1–4). We gave priority to the items that experts deemed most appropriate for measuring the content of interest. We also selected items spread across a range of difficulties and avoided items that experts thought would take students too long to complete. Consequently, we selected fewer than half of the original item pool for use in the clinical pilot: out of the 179 physics items and 174 math items we initially collected, we selected 80 physics items and 80 math items. In selecting these items, we also took care to maintain balance across the weighted content areas.

Table C.1. The Content Areas of the Math Grade 1 Test

Number  Topic                                   Frequency  %
1       Derivatives and their application       7          18.9
2       Equations                               7          18.9
3       Functions and domains                   5          13.5
4       Inequalities                            3          8.1
5       Mathematical reasoning and logic        5          13.5
6       Single Variable Differentiation         4          10.9
7       Trigonometric functions and equations   6          16.2
Total                                           37         100

Table C.2. The Content Areas of the Math Grade 3 Test

Number  Topic                                   Frequency  %
1       Derivatives and their application       3          7.7
2       Equations                               1          2.6
3       Functions and domains                   1          2.6
4       Inequalities                            1          2.6
5       Linear Algebra                          5          12.8
6       Mathematical reasoning and logic        2          5.1
7       Multivariate Differentiation            6          15.4
8       Ordinary Differential Equations         1          2.6
9       Probability and statistics              3          7.7
10      Series                                  2          5.1
11      Single Variable Differentiation         7          17.9
12      Single Variable Integration             5          12.8
13      Trigonometric functions and equations   2          5.1
Total                                           39         100

Table C.3. The Content Areas of the Physics Grade 1 Test

Number  Topic                               Frequency  %
1       Circuits                            5          13
2       Electromagnetic fields              6          15
3       Electromagnetic induction           7          18
4       Mechanical energy                   1          3
5       Motion and forces                   3          8
6       Optics                              6          15
7       Oscillation and mechanical waves    6          15
8       Dynamics - Mechanics                1          3
9       Electricity and Electric Fields     2          5
10      Magnetism and Magnetic Fields       2          5
11      Waves and Oscillation               1          3
Total                                       40         100

Table C.4. The Content Areas of the Physics Grade 3 Test

Number  Topic                               Frequency  %
1       Circuits                            2          5
2       Electromagnetic fields              2          5
3       Electromagnetic induction           6          15
4       Motion and forces                   1          3
5       Optics                              7          18
6       Oscillation and mechanical waves    2          5
7       Dynamics - Mechanics                2          5
8       Electricity and Electric Fields     5          13
9       Magnetism and Magnetic Fields       6          15
10      Relativity and Quantum Physics      2          5
11      Waves and Oscillation               4          10
Total                                       39         100

Appendix D. Selection of Majors

In order to meaningfully compare learning gains across institutions and across countries in this field, it is necessary to limit the sample of students to those with overlapping course requirements. In both China and Russia, the major categories selected for inclusion in this study (Electrical Engineering and Computer Science) do not correspond to discrete majors but rather to “categories” of majors with varying course requirements. We therefore took steps to further limit our sample to students enrolled in similar majors within the categories of EE and CS, i.e. majors that are similar in terms of course requirements and curriculum. By finding overlapping courses between majors and countries, we aimed to limit our sample to EE and CS majors whose students share a common set of curricular experiences most relevant to the EE and CS categories.

Unlike in China and Russia, US doctoral-research institutions (the institutional equivalent of our Chinese and Russian sample) typically have only one major called EE and one major called CS. We collected curricular information from ten doctoral-research institutions (Stanford, Cornell, University of Washington, Vanderbilt, Virginia Tech, Boston University, Kansas State University, University of Kentucky, Wayne State University, Marquette University) and constructed a list of required courses common to all of these institutions in EE and CS respectively. While this does not fully account for the heterogeneity of the American higher education system, we believe that looking for overlap with these American majors allows us to state with much greater confidence that our assessments are relevant to EE and CS students across international contexts. In China, we also collected curricular information on all EE and CS majors from 10 Chinese universities (both elite and non-elite) and constructed a list of common required courses for each major within the country. In Russia, we obtained the national curriculum for each EE and CS major. Finally, we compared the required course lists for the U.S., China, and Russia, and dropped the majors that did not require the full list of required courses used in American CS and EE departments. Through this process, we selected EE and CS majors whose students have similar curricular experiences within and across countries. Since the common curriculum overlaps with that of the US, we can also have reasonable confidence that the curricula we used for test development bear relevance to the fields of EE and CS in general and are not just reflections of the peculiarities of the higher education systems in China and Russia.
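The winnowing procedure described above is essentially a chain of set intersections over required-course lists. A minimal sketch with hypothetical course and major names (not the study's actual lists):

```python
# Required-course sets for the single EE major at each sampled US institution
# (course names are hypothetical, for illustration only).
us_ee_requirements = [
    {"Circuits", "Signals and Systems", "Electromagnetics", "Calculus"},
    {"Circuits", "Signals and Systems", "Electromagnetics", "Calculus", "Control Theory"},
]

# Common core: courses required at every US institution in the sample.
us_core = set.intersection(*us_ee_requirements)

# Candidate EE majors within a country, each with its required courses.
candidate_majors = {
    "EE-Power":          {"Circuits", "Electromagnetics", "Calculus", "Power Systems"},
    "EE-Communications": {"Circuits", "Signals and Systems", "Electromagnetics", "Calculus"},
}

# Keep only the majors whose requirements cover the full US common core.
selected = {major for major, courses in candidate_majors.items() if us_core <= courses}
print(selected)  # {'EE-Communications'}
```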
Appendix E. Chinese Curricular Standards

Although China’s Ministry of Education does not publish official national curriculum standards for engineering education, it approves a finite list of textbooks that reflect the math and physics content that should be taught in engineering programs in Chinese universities. We compared the MOE-approved textbooks against one another and found that the main content areas were essentially the same across all of these approved textbooks. We then based our content map for China on the (almost entirely) overlapping content areas between the textbooks, as this constitutes the de facto national curriculum for engineering students in China. The full list of textbooks follows.

Zhong, X. 2013. Eleventh Five-Year National Plan General Higher Education Textbook—General Physics Curriculum: Mechanics (2nd edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai Daxue Wuli Tongyong Jiaocheng: Lixue. Peking University Press.

Liu, Y. 2013. Eleventh Five-Year National Plan General Higher Education Textbook—General Physics Curriculum: Thermodynamics (2nd edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai Daxue Wuli Tongyong Jiaocheng: Rexue. Peking University Press.

Chen, B., and J. Wang. 2012. Eleventh Five-Year National Plan General Higher Education Textbook—General Physics Curriculum: Electromagnetism (2nd edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai Daxue Wuli Tongyong Jiaocheng: Diancixue. Peking University Press.

Chen, X., and X. Zhong. 2011. Eleventh Five-Year National Plan General Higher Education Textbook—General Physics Curriculum: Optics (2nd edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai Daxue Wuli Tongyong Jiaocheng: Guangxue. Peking University Press.

Chen, X., and X. Zhong. 2011. Eleventh Five-Year National Plan General Higher Education Textbook—General Physics Curriculum: Modern Physics (2nd edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai Daxue Wuli Tongyong Jiaocheng: Jindai Wuli. Peking University Press.

Tongji University Department of Mathematics. 2007. Twelfth Five-Year National Plan General Higher Education Textbook: Advanced Mathematics (part 1) (6th edition). Shierwu Putong Gaodeng Jiaoyu Benke Guojiaji Guihua Jiaocai: Gaodeng Shuxue. Higher Education Press.

Tongji University Department of Mathematics. 2007. Twelfth Five-Year National Plan General Higher Education Textbook: Advanced Mathematics (part 2) (6th edition). Shierwu Putong Gaodeng Jiaoyu Benke Guojiaji Guihua Jiaocai: Gaodeng Shuxue. Higher Education Press.

Tongji University Department of Mathematics. 2007. Twelfth Five-Year National Plan General Higher Education Textbook: Engineering Mathematics and Linear Algebra (5th edition). Shierwu Putong Gaodeng Jiaoyu Benke Guojiaji Guihua Jiaocai: Gongcheng Shuxue Xianxing Daishu. Higher Education Press.

Sheng, J., S. Xie, and C. Pan. 2010. Eleventh Five-Year National Plan General Higher Education Textbook: Probability and Mathematical Statistics (4th edition). Putong Gaodeng Jiaoyu Shiyiwu Guojia Guihua Jiaocai: Gailvlun Yu Shuli Tongji. Higher Education Press.

Appendix F: Test Item Selection

Test items were selected from each country’s past university entrance exams (China’s Gaokao and Russia’s Unified State Exam), other standardized exams, and widely used exercise books in both countries. All of the items were multiple choice, and all were taken from sources that are used widely in each country and targeted at a national population of students similar to the students in our sampling frame.

A number of items were taken from Russian materials. In Russia, we took test items from standardized exams in math and physics provided by the Institute for Monitoring the Quality in Education, the country’s primary quality assessment agency for higher education. The items for the grade 1 tests were very similar to those used on the Russian Unified State Exam, the mandatory college entrance exam that all students seeking entry to higher education institutions must take. The items for the grade 3 tests were based on the Russian Federal State Standard in math and physics (a mandatory part of the curriculum for most Russian higher education institutions).

Additional test items were taken from Chinese materials. In China, grade 1 test items were taken from the college entrance examination (gaokao), a nationwide standardized examination of high school learning that determines college entry for the vast majority of students. Test items for grade 3 math and physics came from official Chinese exercise books that are on the list of approved curricular materials provided by the Ministry of Education for university use (see Appendix E for details).

References

Crocker, L., and J. Algina. 1986. Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart, and Winston.

Dorans, N.J. 1989. “Two New Approaches to Assessing Differential Item Functioning: Standardization and the Mantel-Haenszel Method.” Applied Measurement in Education 2 (3): 217-233.

Linacre, J.M. 1998. “Detecting Multidimensionality: Which Residual Data-Type Works Best?” Journal of Outcome Measurement 2: 266-283.

Linacre, J.M. 2011. Winsteps Rasch Measurement Computer Program User’s Guide. Beaverton, OR: Winsteps.com.

Ludlow, L.H. 1985. “A Strategy for the Graphical Representation of Rasch Model Residuals.” Educational and Psychological Measurement 45 (4): 851-859.

Smith, E.V. 2001. “Evidence for the Reliability of Measures and Validity of Measure Interpretation: A Rasch Measurement Perspective.” Journal of Applied Measurement 2: 281-311.

Smith, E.V. 2002. “Understanding Rasch Measurement: Detecting and Evaluating the Impact of Multidimensionality Using Item Fit Statistics and Principal Component Analysis of Residuals.” Journal of Applied Measurement 3 (2): 205-231.

Stone, M.H. 2004. “Substantive Scale Construction.” In Introduction to Rasch Measurement, edited by E.V. Smith and R.M. Smith, 201-225. Maple Grove, MN: JAM Press.

Wolfe, E.W. 2004. “Equating and Item Banking with the Rasch Model.” In Introduction to Rasch Measurement, edited by E.V. Smith and R.M. Smith, 366-390. Maple Grove, MN: JAM Press.

Wright, B.D., and S.R. Bell. 1984. “Item Banks: What, Why, How.” Journal of Educational Measurement 21 (4): 331-345.

Wright, B.D., and M.H. Stone. 1979. Best Test Design. Chicago: MESA Press.

Zwick, R., D.T. Thayer, and C. Lewis. 1999. “An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.” Journal of Educational Measurement 36 (1): 1-28.