Running head: EXCEL PROJECT

Excel Project: Public School Expenditures and Their Relationship to Academic Performance
Erin Brown
Seattle Pacific University
EDU 6976 Interpreting and Applying Educational Research II
Fall Quarter, 2008

Data were collected from public school districts in all 50 states, using the 1997 Digest of Education Statistics, an annual publication of the U.S. Department of Education. The data were then analyzed using several statistical measures; the results and their interpretation follow.

Part I Histograms, Box Plots, and Frequency Distributions

The data collected from the public schools of the 50 states include: current expenditures per pupil in average daily attendance, the pupil/teacher ratio, the estimated annual salary of teachers, the percentage of all eligible students taking the SAT, the average total score on the SAT, the average math score on the SAT, and the average verbal score on the SAT.

We begin with the distributions of the three continuous variables that represent the treatment, or independent, variables: that is, what the money was spent on (expenditures, pupil/teacher ratio, and teacher salary). We can analyze these distributions by putting the data into histograms and box plots.

Histograms show us the distribution of the data in a visual way. Let's take a look at Figure 1 below. This histogram shows the current expenditures per pupil. Note that the majority of states fall in the 4 to 6 bins. This means that most states spend $4,000 to $6,000 per pupil.

Figure 1 – Histogram of Expenditures

The box plot for this same expenditure variable (Figure 2) also gives us a visual display of the data, except that this time it divides the data into equal fourths, or quartiles.
Each quartile holds 25% of the data, so the four quartiles together account for 100% of the data.

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
3.656           4.88175       5.7675   6.434         7.469

Figure 2 – Box Plot of Expenditures

Figure 2 shows us that the line to the left of the box is slightly longer than the line to the right of the box. When the box is not centered within the range, the distribution is skewed. We can see that the distribution here is not quite normal. It skews slightly towards the positive, or upper, end of the distribution. If we glance back at the histogram, we can see that the reason for this is the extremely high numbers in the fourth quartile.

Figures 3 and 4 give us a picture of the average teacher salary across America. The box plot (Figure 4) shows that the distribution is again skewed. It is positively skewed, towards the upper end of the distribution, and the skewness is more prominent this time.

Figure 3 – Histogram of Salary

Figure 4 – Box Plot of Salary

The histogram (Figure 3) goes a long way toward explaining why this skewness occurs. Whereas the majority of states pay their teachers in the 30 to 40 thousand dollar range, one state throws the distribution off by offering, on average, much higher pay.

Let us move on to the ratio data, which is much more interesting. This information refers to the average number of pupils (in public elementary and secondary schools) per teacher. The box plot (Figure 5) looks positively skewed. It is not that impressive unless you take a look at the numbers.

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
13.8            15.225        16.6     17.575        20.2

Figure 5 – Box Plot of Ratio

The lower whisker is 13.8 while the upper whisker is 20.2. This means that some states average 13 students per teacher while other states average as many as 20 students per teacher.
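To make the five box plot values concrete, here is a minimal sketch of how they can be computed. It assumes Tukey-style whiskers (the most extreme values within 1.5 times the interquartile range of the hinges) and the "inclusive" quartile method; the paper does not say which convention its Excel box plots used, and the ratios listed are hypothetical, not the Digest data.

```python
import statistics


def box_plot_summary(values):
    """Return ((lower whisker, lower hinge, median, upper hinge, upper whisker), outliers)."""
    data = sorted(values)
    # The 'inclusive' method interpolates between data points and treats the
    # sample minimum and maximum as the 0th and 100th percentiles.
    lower_hinge, median, upper_hinge = statistics.quantiles(data, n=4, method="inclusive")
    # Assumption: whiskers stop at the last value within 1.5 * IQR of each hinge.
    reach = 1.5 * (upper_hinge - lower_hinge)
    lower_whisker = min(x for x in data if x >= lower_hinge - reach)
    upper_whisker = max(x for x in data if x <= upper_hinge + reach)
    outliers = [x for x in data if x < lower_whisker or x > upper_whisker]
    return (lower_whisker, lower_hinge, median, upper_hinge, upper_whisker), outliers


# Hypothetical pupil/teacher ratios for ten states (illustration only).
ratios = [13.8, 14.9, 15.3, 16.0, 16.6, 17.1, 17.5, 18.4, 20.2, 24.3]
summary, outliers = box_plot_summary(ratios)
print("five-number summary:", summary)
print("outliers beyond the whiskers:", outliers)
```

With whiskers defined this way, a single very large ratio is reported as an outlier rather than stretching the upper whisker, which is one way a box plot can top out near 20 even when a few states sit well above that.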
Now, anyone who teaches in the classroom can tell you there is a huge difference in what can be accomplished with 13 students as opposed to 20 students. Also note that the median is listed as 16.6 students.

Figure 6 – Histogram of Ratio

The histogram (Figure 6) gives an obvious reason for this skewness. Only 2 of the 50 states are in the higher classroom ratio bins. These outliers, or scores that are very high in comparison to the others, have a direct effect on the mean. Because these outliers are in the upper end of the data, they cause the distribution to be positively skewed.

Now that we have looked at the data concerning the independent variables and how the money was spent, let us take a look at the outcome, or dependent, variables. These are also continuous: the percentage of eligible students who take the SAT, the average verbal SAT score, the average math SAT score, and the total SAT score.

First, we'll look at the data for the average total SAT scores in the United States. Both the histogram (Figure 7) and the box plot (Figure 8) show a distribution slightly skewed towards the lower end of scores.

Figure 7 – Histogram of Total Average SAT

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
844             897.25        945.5    1032          1107

Figure 8 – Box Plot of Total Average SAT

This positive skew in the distribution could be due to the mode, since there are 18 states that average scores of 845 to 915. To gain more of an understanding, we need to see the breakdown of the data for each portion of the test.

Figure 9 – Histogram for Average Math SAT

Figure 10 – Histogram for Average Verbal SAT

At first glance the histogram for the average math scores (Figure 9) seems to be the reason, due to its positive skewness. The average verbal scores (Figure 10) may seem a little inconsistent, with low counts in the middle of the range, but overall they appear to show more strength than the math averages.
Looking deeper into the box plots for math SAT (Figure 11) and verbal SAT (Figure 12), we see that verbal seems to be more of a problem.

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
443             474.75        497.5    539.5         592

Figure 11 – Box Plot of Average Math SAT

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
401             427.25        448      490.25        516

Figure 12 – Box Plot of Average Verbal SAT

Though the math box plot is positively skewed, the average scores on the verbal portion are much lower than those on the math portion. The lowest math average is 443, yet the lowest verbal average is 401, a difference of a good 40 points. The same could be said for the higher scores: the highest math average is 76 points higher than the highest verbal average. Thus, the lower overall average in the verbal scores has a greater effect on the total SAT average than the skewness of the math scores.

Some of the most interesting data comes from the histogram (Figure 13) and the box plot (Figure 14) of the percentage of eligible students who take the SAT. The histogram shows us that in 22 states only 0% to 20% of the eligible students actually take the SAT. Now, is this due to the fact that only that many are eligible? Or is it due to the fact that most students in these states do not plan on extending their education past high school?

Figure 13 – Histogram for Eligible Students Taking SAT

Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
4               9             28       63            81

Figure 14 – Box Plot for Eligible Students Taking SAT

I suppose that would lead to a completely different research study. What amazes me about this information is that we can safely assume from this data that at the very least 80% of the student population in these 22 states do not attend college. And what is even more incredible, when looking at the box plot, is that the lowest percentage in the range is 4% (the lower whisker).
Plainly speaking, 96% of the student population in that particular state has no intention of officially continuing their education, at least in the collegiate world.

Figure 15 – Bar Graph of Regions

The last variable we need to consider is the categorical variable of region (Figure 15). Each of the 50 states is assigned to one of four regions: West (1), Midwest (2), South (3), and Northeast (4). Since 50 cannot be divided evenly by 4, it is safe to predict that the regions will not contain equal numbers of states. This is important to keep in mind, as it can have an effect on the next portion of this paper, where we take a closer look at how certain regions differ in expenditures and the resulting test scores.

Part II Comparisons Using ANOVA

Breaking down our sample further into the four regions (West, Midwest, South, and Northeast) may tell us more about their differences. To help us highlight these differences we use Analysis of Variance, or ANOVA.

ANOVA assesses whether the means of two or more groups are statistically different from each other. Though there are other tests designed for this, ANOVA is the best choice for our study due to its ability to analyze our four regions while controlling for Type I error.

When the expenditures per pupil are broken down into the four regions (Table 1.2), we can see noticeable differences. The p-value of less than .001 indicates that these differences are significant. When we conduct the Tukey test (Table 1.4), we can see that the differences are significant between the Northeast and West and between the Northeast and Midwest.

Comparing Expenditures by Region

Could this have something to do with the greater number of Northeastern states? After all, there are more states listed under the Northeastern region than under the West or the Midwest (Table 1.1). Though we should consider everything in this study, this doesn't seem to be the case in this particular situation.
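The ANOVA tables referred to in this part are not reproduced here, so the following is only a minimal sketch of the one-way ANOVA arithmetic behind them: the variation between the region means is compared with the variation within the regions, and their ratio is the F statistic. The function name, region labels, and values are hypothetical, not the Digest data, and converting F to a p-value (which requires the F distribution) and the Tukey follow-up test are left to a statistics package.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a dict of region -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of the region means around the grand mean
    ss_within = 0.0   # variation of the states around their own region mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Hypothetical expenditures per pupil, in thousands of dollars (illustration only).
regions = {
    "West": [4.0, 5.0, 6.0],
    "Midwest": [5.0, 6.0, 7.0],
    "South": [4.0, 5.0, 6.0],
    "Northeast": [7.0, 8.0, 9.0],
}
ss_b, ss_w, f_stat = one_way_anova(regions)
print(f"SS between = {ss_b}, SS within = {ss_w}, F = {f_stat}")
```

For these made-up numbers the between-groups sum of squares is 18, the within-groups sum of squares is 8, and F is 6. A large F means the region means differ by more than the spread inside each region would lead us to expect.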
The South has fewer states listed within its region than any of the others, yet no significant difference was found when comparing it with the Northeast.

The pupil/teacher ratio (Table 2.2) also indicates significant differences. Its p-value is lower than the recommended .05. Thus, we again continue the process by using the Tukey HSD test (Table 2.4). This tells us that differences are especially significant between the Midwest and West as well as between the Northeast and West.

Comparing Ratio by Region

If we isolate specific scores (Table 2.1), we can see that the Midwest has one extreme score, or outlier, that could be affecting the results. Also, and even more important, we can see two overly inflated numbers in the West that could be the main reason for the differences among the regions.

So far, our analysis has shown more differences in the Western portion of the country than anywhere else. Let's continue on and see whether or not this trend continues.

Comparing Salary by Region

The ANOVA table for the average teacher salary (Table 3.2) again indicates a significant difference through its low p-value. The Tukey test (Table 3.4) pinpoints the exact location of that difference: the Midwest and Northeast differ significantly in average teacher salary. Could this be due to the fact that the cost of living is lower (on average) in the Midwest? Or could it be due to the fact that the cost of living is higher in the Northeastern portion of the country? Also, if we take a look at the specific averages listed under these two regions (Table 3.1), we can see one extreme number, or outlier, in the Northeastern region. This inflated number may be the reason for the significant difference shown.

Though significant differences have been shown in all ANOVA tables thus far, several adjustments could be made by individual states to level the playing field, so to speak.
For example, if California and Utah could make an effort to lower their pupil/teacher ratios to no more than 20 students per instructor, there would be no significant differences among the four regions. These two states have the power to equalize the pupil/teacher ratio across America. We would then be better able to compare the equity in public school expenditures across America by focusing on the other, more relevant, expenditures.

Now, let's take a look at the percentage of all eligible students taking the SAT by region (Table 4.2). Again, the p-value is lower than .05, indicating significant differences. The Tukey test (Table 4.4) points out differences between the Northeast and both the West and the Midwest.

Comparing Students Eligible to take SAT by Region

Since the Northeast seems to be the common factor, we look closer at those numbers (Table 4.1). The Northeast numbers seem to be extreme in both directions. There are two high percentages (80% and 81%) and two low percentages (17% and 23%). Though not much can be done about the high percentages except to make sure they are accurate, the two states with the lowest percentages, Ohio and West Virginia, need to seriously focus on raising their percentages.

Comparing Total Average SAT Scores by Region

Last, we look at the total average SAT scores by region (Table 5.2). The p-value is once more low and points to significant differences. The Tukey test (Table 5.4) shows more differences within this comparison than in any of the previous tables. Differences between the Midwest and all the other regions (West, South, and Northeast) were significant. Taking a closer look at the Midwest (Table 5.1), which seems to be the common factor, we see that, with the exception of one state, its numbers are consistently higher.
Are students just smarter in the Midwestern states?

What is truly fascinating is to look at the average total SAT score of the Midwest region, 1048.08 (Table 5.3), and then compare it to the low percentage of eligible students taking the SAT there, a mere 12.58% (Table 4.3). Thus, fewer eligible students take the test, but those students who do take it are extremely bright (Charts 1 & 2). Why is this?

Chart 1 – Pie Chart of Total SAT Scores by Region

Chart 2 – Pie Chart of Students Eligible to take SAT by Region

All these ANOVA tables have pointed to statistically significant differences within each continuous variable when the four assigned regions are compared. But what of the practical significance? In order to answer this question we need to find the effect size. This is done by calculating eta squared, which is equal to the Sum of Squares Between Groups (SSb) divided by the Total Sum of Squares (SSt).

η² = SSb / SSt

This is the simplest way of measuring the proportion of variance explained within an ANOVA. If the effect size is small (.01), we will know that although the statistical difference may be significant, the practical difference is rather small. Likewise, an effect size that is medium (.06) or large (.14) describes the practical difference, as opposed to the statistical (p-value) difference (Table 6).

                      SSb        SSt        η²    Practical diff.
Expenditures          33.3015    91.0048    .42   large
Ratio                 107.355    251.682    .43   large
Salary                551.678    1729.63    .32   large
Eligibility for SAT   17914.1    35095.1    .51   large
Total SAT             129849     274308     .47   large

Table 6 – Effect Size

Table 6 shows a practically significant difference for every variable, and in each case the effect size is large.

There are a few assumptions to keep in mind when interpreting ANOVA data. First, we use interval data. This is true in our study. Second, samples should be randomly sampled. I think it's safe to say our data came from a reputable source, the U.S. Department of Education.
Third, the samples must be independent of one another. In our study what affects one group will not affect the others. Lastly, the data analyzed with ANOVA must be normally distributed. As we discussed earlier in Part I of this paper, several of our distributions are slightly skewed and a few are more prominently skewed. We need to keep this in mind when dissecting ANOVA data.

Part III Scatterplots and Linear Regression

Whereas correlation allows for “better than chance” predictions, regression takes this a step further. Simple linear regression is a method for making such predictions. It can sometimes point us to a possible cause. To accomplish this we use scatterplots and regression equations. Scatterplots are used to display the relationship between two variables. The slope and intercept of the regression line give us the regression equation.

Figure 16 – Scatterplot of expenditures and SAT scores

In Figure 16 we show the visual correlation between expenditures per pupil and the average total SAT score. The scatterplot lets us see the relationship between these two variables in the form of a line. This line passes through the scatterplot in such a way that the average distance of the points from the line is minimized.

Table 7 – Expenditures and SAT scores

In Figure 16 we can see that the elliptical cluster of points and the regression line are at a negative angle, but only slightly so. Table 7.1 lists the slope as -20.907 and Table 7.2 tells us the intercept is 1089.41. These numbers are used to form the regression equation.

Y = -20.907X + 1089.4

We can use this formula to find the predicted SAT score for a given amount of expenditures. Unfortunately, there is little reason to make these calculations, given the coefficient of determination (Table 7.3). Since this is 0.1448, it tells us that only 14% of the variability in Y can be explained by X.
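For readers who want to see where a slope, an intercept, and a coefficient of determination come from, the following is a minimal sketch of the standard least-squares formulas. The function name and the five data points are hypothetical, chosen only so the arithmetic is easy to follow; they are not the state data behind Table 7.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; also returns Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical paired observations (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept, r = fit_line(xs, ys)
r_squared = r ** 2  # coefficient of determination: share of variability in y explained by x
print(f"Y = {slope:.3f}X + {intercept:.3f}, r = {r:.3f}, r squared = {r_squared:.3f}")
```

Read in the other direction, a coefficient of determination of 0.1448 corresponds to a correlation of about 0.38 in absolute value, and the negative slope indicates that the correlation itself is negative.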
This seems to be an extremely weak relationship and not worth our time to explore further.

When correlating the pupil/teacher ratio with the total SAT scores, we see zero correlation, as represented by the Pearson r in Table 8.3. This means that as one variable changes, there is no related change in the other variable. In other words, as the pupil/teacher ratio changes, the average SAT scores show no related change. It is important to note that we have rounded our figures to zero, even though there is a very slight relationship.

Table 8 – Ratio and SAT scores

Again, the regression equation (Figure 17) is not much use to us. The correlation is just too slight to be of any help.

Figure 17 – Scatterplot of ratio and SAT scores

This next set of data was the most interesting of all the simple linear regressions done in this study. The average teacher salary and the current expenditures per pupil were charted on a scatterplot (Figure 18). We can clearly see the visual relationship in the form of a positively sloped elliptical cluster. With the exception of one noticeable outlier, the points are clustered closely on either side of the regression line.

Figure 18 – Scatterplot of salary and expenditures

Table 9.1 tells us the slope is 3.79. Table 9.2 points out a 12.44 intercept. But what fascinates me the most is the coefficient of correlation (0.87) and the coefficient of determination (.76) shown in Table 9.3. This correlation, according to Guilford's interpretations for values of r (Sprinthall, 2007), is strong and suggests a marked relationship. The coefficient of determination tells us that 76% of the variability in teacher salaries can be explained by expenditures per pupil.

Table 9 – Expenditures and salary

This is where the regression equation (Figure 18) becomes extremely helpful. We can use this equation to predict what a teacher's salary is most likely to be based on the amount of money spent per student (Table 9.4).
Table 9.4

Expenditures   Equation               Salary
X              3.792(X) + 12.436      Y
2.000          3.792(2) + 12.436      20.020
3.000          3.792(3) + 12.436      23.812
4.000          3.792(4) + 12.436      27.604
5.000          3.792(5) + 12.436      31.396
6.000          3.792(6) + 12.436      35.188
7.000          3.792(7) + 12.436      38.980
8.000          3.792(8) + 12.436      42.772
9.000          3.792(9) + 12.436      46.564
10.000         3.792(10) + 12.436     50.356

According to our table, if a state were to spend on average two thousand dollars per student, the average teacher salary in that state would be $20,020. Likewise, if a state were to spend on average ten thousand dollars per student, the average teacher salary would be $50,356. Though this doesn't answer our original question of whether or not school spending and academic performance are statistically related, it would be helpful to any teachers interested in relocating to another state or to college students interested in teaching as a possible career choice. But, of course, they need to keep in mind that though 76% of the variability can be explained by the expenditures, there is another 24% that cannot be explained by them.

Figure 19 – Scatterplot of ratio and salary

The last bivariate scatterplot, between pupil/teacher ratio and average teacher salary (Figure 19), is a great example of zero correlation. The regression line is nearly parallel to the x axis. With the exception of a few outliers, the points are scattered further apart and in a more circular shape.

Table 10 – Ratio and salary

The slope (Table 10.1) is -.003, the coefficient of correlation (Table 10.3) is -0.00, and the coefficient of determination is 0.00. Basically, this tells us that there is little to no relationship between these two variables.

The only linear regression that showed a strong relationship between its variables was salary and expenditures. And, though this can be useful in many situations, I don't see how it can help us prove or disprove that school spending and academic performance are statistically unrelated.
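The salary predictions in Table 9.4 follow directly from the reported regression equation, and can be reproduced with a few lines of code. This is only a sketch: the slope and intercept are the values given in the text, both variables are in thousands of dollars, and the function and constant names are made up for illustration.

```python
SLOPE = 3.792       # reported slope, rounded to 3.79 in the text
INTERCEPT = 12.436  # reported intercept, rounded to 12.44 in the text


def predicted_salary(expenditure_thousands):
    """Predicted average teacher salary (in thousands) from per-pupil spending (in thousands)."""
    return SLOPE * expenditure_thousands + INTERCEPT


# Regenerate the rows of Table 9.4 for spending of 2 to 10 thousand dollars per pupil.
for x in range(2, 11):
    print(f"{x:>6.3f}   {SLOPE}({x}) + {INTERCEPT}   {predicted_salary(x):.3f}")

# Squaring the reported correlation recovers the reported coefficient of determination.
r = 0.87
print(f"r squared = {r ** 2:.2f}")
```

The last line is a reminder that the share of variability explained is read from r squared (about .76), not from r itself (.87).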
I believe the fact that we didn't find significant correlations in the other scatterplots tells us much more.

Conclusions

None of the findings in this research study have done anything to alter the media's claim that school spending and academic performance are statistically unrelated. The histograms brought to light several other concerns, such as low verbal SAT scores across the U.S. and the alarmingly low number of students taking the SAT in the Midwest. But these concerns cannot be blamed on low expenditures. The ANOVA analysis showed statistically and practically significant differences regionally, yet wouldn't that be expected in a country so large and with so many blended cultures? The linear regression data showed no meaningful correlations concerning academic achievement (SAT scores).

If this research were to be done a second time, I would suggest that the 50 states be divided evenly into 5 regions. This would cancel out any concerns regarding disproportion. I would also suggest a different form of testing: one standardized test to be used across the country and to be mandatory for all high school seniors. This would give a truer picture of how much is being learned. The SAT is only taken by those pupils wishing to attend college, and that excludes a large portion of the student body across America. If an alternate test cannot be found, another possibility is to make the SAT mandatory for all high school seniors and waive the fee charged for taking it the first time around.

Although several suggestions have been made regarding this cross-country research, I would be more interested in seeing this study done within a single state. The Washington state WASL test could give us a wonderful standard per grade level, and the state could be broken down by regions. All students are already required to take this test, so no extra testing would be required.
All the other information (ratio, salary, etc.) is already on file, since it was collected for the nationwide study. I believe focusing this research on a smaller scale, on one individual state, would be much more effective and worthwhile.

Bibliography

National Center for Education Statistics. (1997). Digest of Education Statistics. U.S. Department of Education. Retrieved from [URL missing]

Sprinthall, R. C. (2007). Basic Statistical Analysis (8th ed.). Boston, MA: Pearson Education, Inc.