Florida Department of Education



THE STATE OF FLORIDA

Moderator: Kathy Hebda

December 4, 2012

8:30 a.m. ET

Operator: Good morning. My name is (Andrea), and I will be your conference operator today.

At this time, I would like to welcome everyone to the Student Growth – I'm sorry. I would like to welcome everyone to the Student Growth Implementation Committee Meeting. All lines have been placed on mute to prevent any background noise.

After the speakers' remarks, there will be a question-and-answer session. If you would like to ask a question during this time, simply press star then the number 1 on your telephone keypad.

I will now turn the call over to Ms. Hebda. You may begin.

Kathy Hebda: Thanks very much.

Good morning, everybody. This is Kathy Hebda with the Florida Department of Education. I'd like to welcome all of our members of the Student Growth Implementation Committee back. It's been a while since we've met together. I'd also like to welcome members of the public who may be listening to the call.

And just for everyone's information, I know we're probably used to this but I'll say it again just to be sure, all members of the committee and everybody who's presenting this morning, their lines are open and you're free to speak at any time and ask questions.

Members of the public have the ability to listen to the meeting only, so there's not a question-and-answer period for the public. This is just a committee meeting and committee work going on today. But we do appreciate everybody's interest in what goes on in this committee.

What I will do next is go over the agenda which is what you see on your screen right now. If for some – before I go to the agenda – excuse me, let me take the roll, then we'll go over the agenda and then we'll get right into our meeting work.

I would like to thank very much Ronda Bourn who has agreed to be our chair. Thanks, Ronda.

Sam Foerster was a great chair, and he is now a great deputy chancellor for School Improvement and Student Achievement with the Department of Education, so he is no longer able to serve on the committee because the committee doesn't have any members from the department on it. And Ronda has agreed to be our chair, so thanks very much, Ronda.

Ronda Bourn: You're more than welcome.

Kathy Hebda: Thanks. OK. I'm going to call the roll to make sure we have folks here.

Stephanie Hall?

Stephanie Hall: Here.

Kathy Hebda: Lisa Maxwell?

Lisa Maxwell: Here.

Kathy Hebda: Nicole Marsala?

Nicole Marsala: Here.

Kathy Hebda: Gisela Field?

Gisela Field: Here.

Kathy Hebda: Sandi Acosta?

Sandi Acosta: Here.

Kathy Hebda: Tamar Woodhouse-Young?

Tamar Woodhouse-Young: Here.

Kathy Hebda: That's Tamar. That's Tamar. I know you're on there. I talked to you earlier.

Tamar Woodhouse-Young: Present.

Kathy Hebda: Great. Thanks.

Lavetta Henderson?

Anna Brown?

Anna Brown: Here.

Kathy Hebda: (Dorothea) Edgecomb?

Doretha Edgecomb: Doretha, here.

Kathy Hebda: Doretha, excuse me. (Inaudible). Thank you.

Lori Westphal?

Lori Westphal: Here.

Kathy Hebda: Joseph Camputaro?

Joseph Camputaro: Here.

Kathy Hebda: Gina Tovine?

Stacey Frakes?

Stacey Frakes: Here.

Kathy Hebda: Latha Krishnaiyer?

Lawrence Morehouse?

Lawrence, I saw you logged in on the webinar. I don't know if you made it on the phone yet.

Lawrence Morehouse: Yes, I'm here.

Kathy Hebda: Great.

Lawrence Morehouse: Can you hear me?

Kathy Hebda: I sure can.

Lawrence Morehouse: Thank you.

Kathy Hebda: Linda Kearschner?

Linda Kearschner: Here.

Kathy Hebda: Maria Noya?

Maria Noya: Here.

Kathy Hebda: Lance is not able to make it today. And then Jeff Murphy?

Jeff Murphy: Good.

Kathy Hebda: And we have a new member. Brandon McKelvey has replaced Sam Foerster as a district representative. And Brandon is from Seminole County so we welcome Brandon this morning as a new member of Student Growth Implementation Committee.

OK. And, Christy, are you on the line? Christy Hovanetz?

Christy Hovanetz: Yes, I am here, Kathy.

Kathy Hebda: And Harold Doran?

Christy Hovanetz: I just talked to him. He'll be here shortly.

Kathy Hebda: OK. All right. Well, Harold doesn't go first. That's all right.

Harold Doran: I am here. I was just taking a minute to dial in.

Kathy Hebda: Oh, very good. Good morning, Harold and Christy.

OK. Members, let's look at your agenda, which is up on your screen. If you're not logged into the WebEx for some reason – if you're traveling and using the PowerPoint that we've sent to you – it's the second PowerPoint, the second page of the second PowerPoint.

Now that we've done the roll call, we're going to review the agenda. As you can see, the first thing we're starting with is a couple of status reports. Since we haven't met in a few months, we wanted to give you an update on how things have gone for '11-'12 before we launch into the next phase of the work, and that's what the first set of reports is.

And the second thing we'll do is talk about the optional VAMs – the optional growth models that AIR has been working on so far based on your last direction to them – and the end-of-course models, and what the progress has been in both of those areas.

One of the things that you'll talk about some today, members, as you look at the impact data from '11-'12 on the FCAT value-added model, and as you start to explore the results of the optional models being created for districts to decide whether or not to use for some of the local assessments and other statewide end-of-course assessments, is that the models are not necessarily going to be identical. Not every model is going to look just like the FCAT model.

And one of the things that you'll have a chance to look at and discuss is why that might be, and why it might well make sense for you to consider selecting a model that's not identical to the FCAT model, since it may work better with a different assessment, a different set of data, and a different set of teaching circumstances. So don't let that bother you today as we go through these different kinds of things.

We expected this in the very beginning. One of the things we learned from AIR in that very first meeting was that your model needs to fit your assessment and your circumstances, so you're going to be presented with information today that will let you explore those kinds of things.

So are there any questions before we begin with the first presentation? OK. Hearing none, I'm going to queue up that presentation, and I'm going to turn it over to Juan Copa.

Juan Copa: Good morning. I'm glad to be with you all again. It's been a while since we last met.

We're spending the first part of today's meeting recapping the impact analyses that were run in conjunction with the release of the 2011-12 value-added results for the FCAT model that you all were very instrumental in developing, beginning in the spring of 2011.

We're on the PowerPoint. If you're not following on the WebEx, we're on the PowerPoint that relates to impact data. That PowerPoint has roughly 16 slides – just to orient you to which PowerPoint we're looking at. And we will be discussing, starting on slide 3, three different analyses that were done. The first looks at the model fit of the 2011-12 run versus the prior year run.

This was done because, remember, when we did all our development work beginning in the spring of 2011, we were doing that development work based on the data we had at the time, which was the original version of FCAT data.

As many of you know, we moved to FCAT 2.0 beginning with the 2010-11 assessment and now into '11-'12, so '11-'12 is the first year of results where we are using FCAT 2.0 for both the current year score and the most immediate prior year score.

So one of the first things that AIR did was to look at the model fit to make sure the model was behaving basically the same way with the FCAT 2.0 data as it did with the FCAT data. We had every expectation it would, given that they are very similar exams in terms of their structure. They're both comprehensive exams administered from grade level to grade level each year.

We spent a lot of time, beginning in the spring of 2011, identifying variables that we believed would have an impact on student learning on those assessments, and we had every reason to believe those same types of variables would impact FCAT 2.0 just as they impacted FCAT. That was our hypothesis going in, and we ran analyses to confirm that, in fact, the model would behave the same way with the FCAT 2.0 data.

And we also ran analyses comparing the results by grade to determine whether or not there is any advantage in the scores by grade level. Of course, you remember the model is run specific to subject and grade, so we'll look at those analyses as well. And then we'll close with what I think are the most powerful analyses that we ran, which really show whether or not value-added analysis is operating as we had hoped it would, given that we know this has been a very challenging, very complicated model that we started talking about back in the spring of 2011.

But we went this route. This committee chose to go that route as well because of the promise of value-added analysis – that it does level the playing field as best as possible given the different characteristics of classrooms and schools across our state and within our districts. And so this last analysis really gets at whether or not that is, in fact, the case, looking at the value-added scores in relation to various classroom characteristics.

We're now onto slide 4. Slides 4 and 5 basically show the R-squared, if you remember back to our discussion last spring. This is the statistic that looks at the model fit – what percentage of the result is explained by the model itself?

And as you can see here, we have a series of bars, where the green bar represents the R-squared on the 2012 model and the other bar – a mustard yellow, orange, tan, some sort of odd color – is the 2011 model R-squared. And as you can see, basically for both slide 4 and slide 5 – slide 4 is Reading – the R-squares are comparable. They are basically the same.

And if anything, with the model on the 2012 data, the R-squares are a bit higher in some cases, indicating nominally a better fit. Whether that's a statistically significantly better fit, I don't think so; nominally, it's basically the same result. The model is behaving as expected with FCAT 2.0 just as it behaved with FCAT, the original FCAT version 1.
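
As a rough illustration of the model-fit comparison described here – not the department's or AIR's actual code – the sketch below fits the same kind of covariate regression to hypothetical 2011 and 2012 student files and compares the R-squared values. Every file and column name is an assumption made up for the example.

```python
# A rough sketch, assuming hypothetical file and column names, of the
# model-fit comparison on slides 4 and 5: fit the same covariate
# regression to the 2011 and 2012 student files and compare R-squared.
import pandas as pd
import statsmodels.formula.api as smf

def r_squared(df: pd.DataFrame) -> float:
    # Predict the current-year score from the prior-year score plus
    # student-level covariates like the ones the committee approved.
    fit = smf.ols(
        "current_score ~ prior_score + ell + swd + gifted + attendance",
        data=df,
    ).fit()
    return fit.rsquared

fcat_2011 = pd.read_csv("fcat_2011.csv")    # hypothetical 2011 extract
fcat2_2012 = pd.read_csv("fcat2_2012.csv")  # hypothetical FCAT 2.0 extract
print(f"2011 R-squared: {r_squared(fcat_2011):.3f}")
print(f"2012 R-squared: {r_squared(fcat2_2012):.3f}")
```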

Now the next slides, slides 6 and 7, look at the results by grade level. These are those box-and-whisker charts that we saw when looking at the (precision) of the different model options back in the spring. What we see here is that the models behave fairly similarly across grade levels, both for Reading and, on the next slide, for Math. There does not appear to be any particular advantage in VAM scores based on grade level.

The next series of slides are really the ones that we feel are very important, especially in terms of, as I mentioned, whether or not value-added is behaving as we would expect. We have gone down this road of a statistical method where the promise is leveling the playing field as best as possible. Now we want to see whether or not that is indeed the case.

And so the next series of slides, beginning with the scatter plots showing the relationship between the VAM score and the percent of students with disabilities within a teacher's classroom, starts that analysis. Let me walk you through these scatter plots: each dot on the plot represents a teacher VAM score, and this is statewide data.

On the vertical axis you see the VAM scores – zero meaning basically typical performance, a positive score being performance above typical, and a negative score indicating performance below typical. And on the horizontal axis, you see the percent of a teacher's classroom that is made up of students with disabilities.

And as you can see on the scatter plot – you can make it out – there is, pretty much along the zero line, a straight flat line. That flat line indicates the relationship between the VAM scores themselves and the percent of a teacher's classroom that is students with disabilities. As you can see, there's basically no relationship.

In other words, regardless of the percentage of the teacher's classroom that has disabilities, the teacher has basically an equal opportunity of receiving a high value-added score or a low value-added score. There is basically a zero correlation.

Next slide. Same type of analysis, but this is now in relation to the percent of English language learners within a teacher's classroom. And as we can make out from the line for both Reading and Mathematics, it's again basically a flat line. Regardless of the percent of a teacher's classroom that is English language learners, there is basically an equal opportunity of earning a high value-added score or a low value-added score.

The next slide is a very important slide because this is, remember, the factor that we could not control for explicitly within the model given the statutory parameters. This is the percent of the teacher's classroom that is on free or reduced-price lunch. This is an indicator of socioeconomic status that was strictly prohibited from inclusion within the model based on the statute.

And of course, though the law restricted us from including that explicitly in the model, it didn't restrict us from performing analyses to see whether or not it makes an impact. And AIR did perform those analyses, and again the result is the same as what we shared for students with disabilities and English language learners. Basically, there is no relationship between the percent of a teacher's classroom that is on free or reduced-price lunch and the teacher's VAM score. Again, there is basically that flat line.

Next slide is percent gifted, basically the same results.

And the next slide here is another factor that was strictly prohibited from inclusion within the model by the statute. This is race/ethnicity – the percent of a teacher's classroom that is non-white. And as we can see, the results are very similar to what we've seen on the other slides. The scatter plot looks very similar to the previous lunch status chart. And as you can see, the flat line again shows basically no correlation between the percent of a teacher's classroom that is minority and the teacher's VAM score.

And the last piece here is the mean prior performance of a classroom – basically the entering capabilities of the students based on their test scores – versus the VAM score. And again, here you see basically a zero correlation.

If we had seen a correlation with any of these characteristics, you would have seen the scatter plot tilted in one direction or the other. This particular chart shows the scatter plot. Again, this is a bit of a psychological exam, but we can look at this like ink blots, and this looks to me like a football – we just had Monday night football, so that's in my mind.

And so you have here a scatter plot that looks like a football. If there was indeed a relationship, this football would be tilted in one direction or the other. Basically, you don't see that here. You see a flat football, reflected by a flat line.

Next slide.

And this – for those that are familiar with statistics, these are the actual correlations behind those scatter plots. A zero correlation indicates basically no relationship. A positive one would be a perfect correlation in a positive direction; a negative one would be a perfect correlation in a negative direction, meaning, for example, that a high VAM score would be related to low socioeconomic status. That would be a perfect negative correlation. We do not see that with these analyses. We see correlations basically hovering around zero, reflecting the lack of a relationship between the teacher VAM scores – in this analysis, on FCAT – and those various classroom characteristics as listed.
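
For readers who want to see how correlations like the ones on this slide could be computed, here is a minimal sketch. It is not the department's analysis; the file and column names are hypothetical placeholders.

```python
# A minimal sketch, with hypothetical column names, of the correlations
# behind the scatter plots: Pearson correlations between teacher VAM
# scores and classroom composition variables.
import pandas as pd

teachers = pd.read_csv("teacher_vam.csv")   # one row per teacher (hypothetical)
characteristics = [
    "pct_swd",           # percent students with disabilities
    "pct_ell",           # percent English language learners
    "pct_frl",           # percent free or reduced-price lunch
    "pct_gifted",        # percent gifted
    "pct_minority",      # percent non-white
    "mean_prior_score",  # mean prior performance of the classroom
]

for col in characteristics:
    r = teachers["vam_score"].corr(teachers[col])
    print(f"corr(VAM, {col}) = {r:+.3f}")
# Correlations hovering near zero, as on the slide, indicate no
# systematic advantage or disadvantage tied to classroom composition.
```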

So we believe these slides are very powerful. This is evidence that the route we took, with all its complicated statistical nature, does – at least in this instance, with the FCAT model – show evidence of the promise of value-added analysis: that leveling of the playing field across the diversity of our classrooms in relation to the scores themselves.

And with that, that concludes the review of the impact analysis. I'll now turn it over to Kathy to continue through the agenda. Thank you again for serving on this committee, for being here again on short notice as we change to a virtual meeting as opposed to a face-to-face meeting, never optimal, but I really appreciate it.

As Kathy mentioned, I've now transitioned into a new role within the department. And my schedule just didn't allow for a face-to-face meeting. I have to go before the House of Representatives later this afternoon within my new – within my new role. So again, I appreciate your flexibility as we meet here this morning virtually and, of course, look forward to having a face-to-face meeting in the near future. Thank you again.

Kathy Hebda: Members, this is Kathy. Before we go to the other two status reports, does anybody have any questions about the things they have just seen or do you need to see (inaudible) those slides again?

Lisa Maxwell: Kathy? (Inaudible).

Kathy Hebda: Can you speak out just a little bit for me?

Lisa Maxwell: Yes, this is Lisa Maxwell.

Kathy Hebda: Yes, OK.

Lisa Maxwell: (Inaudible) an experience that we are having here in Broward County. Can you hear me?

Kathy Hebda: I can. You're a little bit muffled, but I can hear you.

Lisa Maxwell: Well, we do see some anomalies here in the county, especially with (level five). I'll give you – and it's significant. We're getting teachers who are teaching gifted, who have a significant number of (inaudible). There's a ceiling effect that's happening in the model. We have, just as one example, a teacher who was Teacher of the Year twice and has a needs improvement based on the VAM score.

And that's essentially because her kids are operating at the top level. So, you know, I see these slides. They show, you know, that there seems to be no correlation. However, when you run the model on a statewide basis, it probably has sort of a muting effect.

When you break it down to the district level, our experience is different. So maybe you address that or have him address that. And how are we going to not disincentivize our top teachers by giving them a needs improvement because they've got the top kids in the class?

Kathy Hebda: Lisa, thanks. Those are great questions. They actually lead into the status reports that I have to provide.

I'm going to hold your thought just for a second just to make sure that there isn't anybody who has a question about any particular slide in here. And if not, I'm going to respond to what you're talking about because it does lead into my next segment.

Are there any other questions just on the slides that we saw here, Members?

Male: Evaluation systems.

Kathy Hebda: I'm sorry. Was there a question?

Male: If you have…

Kathy Hebda: Who's speaking please?

Male: You have volume. What does it sound like?

Kathy Hebda: Go ahead.

Juan Copa: I'm sorry. We heard somebody attempting to answer – ask a question, we believe. If there is somebody trying to ask a question, please identify yourself and – or Kathy will move on to answer Lisa's question.

Hearing none, I think we'll move on to Lisa's question.

Kathy Hebda: OK. And it could be an issue just getting on and off mute.

So if you – operator, I want to make sure all the member lines are open. They are open, correct?

Gina Tovine: Kathy?

Operator: All member lines are open.

Kathy Hebda: OK, very good.

Gina Tovine: Kathy, this is Gina Tovine. How are you?

Kathy Hebda: Hi, Gina.

Gina Tovine: Hey, I have a question for you, something that we experience with the results and something just to be aware of is that when we ran our VAM scores, the FCAT results using the same students and then the non-FCAT teacher results were like even anywhere from a tenth to a hundredth of a point difference using the exact same students. So I'm not sure if you're aware of that or what your thoughts are.

Kathy Hebda: Right. Thanks very much. To me, that's even in the same category as what Lisa was asking about even though it's a different sort of a detail.

All of those things center around a couple of things. The first is that the slides Juan just presented are based on those teachers who taught Reading, Language Arts, and Mathematics courses that are associated with the FCAT. That's where these numbers come from. That's how you worked on developing the model. That's how you ended up recommending the model.

And as you can see, that model has a great fit for those instances.

Gina Tovine: Right.

Kathy Hebda: Remember – and this is for everybody – that looking at student performance and student growth using this model is one piece of a much larger evaluation system that each district has developed and we have approved. And there are teachers who don't teach Reading, Language Arts, and Mathematics, and for whom a district has not selected an assessment for the courses that they do teach, who are using a variety of things that districts have determined would be their student growth measures, working within the parameters of the law but also making various choices.

At the same time, even for teachers who teach those courses associated with this model – Reading, Language Arts and Mathematics in the appropriate grades – as Lisa was bringing up in Broward County, there are a handful of districts who do not actually use the VAM score.

Remember – you may remember committee members that when the value-added model predicts a score for a student and then we look to see how that student actually performed, there are students who will have exceeded the expected score and students who will meet the expected score and students who will not quite make the expected score. And one of the things that you can see from the model, one of the results that you can get before you provide the score is the number and percent of those students assigned to a teacher who did meet or exceed that expectation.

There were a few districts – Broward included, Nassau was another, Escambia is another, and there are a few more – that determined that for them, for '11-'12, that was the best metric to use from the model for their evaluation system. That occurred for a number of reasons, some of which included that it was easier for people to understand than a score. It was very similar to what they felt they were used to, which was a learning gains number that we've used for school grades for years. And for a variety of reasons, that's what they chose to use.

That number doesn't necessarily provide you all the statistical parameters you can employ to use a value-added score. So in some instances, things that may have happened such as Lisa was bringing up where you have students who may have hit a ceiling effect, that may be exaggerated by only using the percent meeting expectation as opposed to the score itself – the value-added score itself.
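
To make the distinction concrete, here is a simplified sketch – under assumed column names, not any district's actual procedure – of how a "percent meeting or exceeding the expected score" metric could be derived from the model's student-level predictions.

```python
# A simplified sketch, under assumed column names, of the alternative
# metric some districts chose: the percent of each teacher's students
# who met or exceeded the score the model predicted for them.
import pandas as pd

students = pd.read_csv("student_predictions.csv")  # hypothetical file with
# columns: teacher_id, actual_score, predicted_score

students["met_expectation"] = students["actual_score"] >= students["predicted_score"]
pct_meeting = (
    students.groupby("teacher_id")["met_expectation"]
    .mean()
    .mul(100)
    .rename("pct_meeting_expectation")
)
print(pct_meeting.head())
# Unlike the VAM score, this percentage ignores how far above or below
# the prediction each student landed, which is one reason ceiling
# effects can look more severe with this metric.
```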

I say that not to say that districts have made bad decisions, because they have not made bad decisions. Everybody made the decisions that they felt were the best decisions they could make in their evaluation system as they began, and that depends on a lot of things that are really unique to each district – how they negotiate their systems, their capacity to run data, where they were instructionally as a district, lots of different things. All kinds of factors went into those local decisions about their evaluation systems. And the law provided that flexibility to use a number of different things.

What you see happening now is that you have districts who, after getting the results and after seeing how it affected some of their teachers – particularly the teachers that are on the temporary measures – are making adjustments. We really haven't heard very much from people who are using the model for whom it was intended, those Reading, Language Arts, and Mathematics teachers. But we have heard from people who are using different kinds of model results for other teachers – as Gina was bringing up, those who don't teach those courses, but their students take the FCAT and the district doesn't yet have an assessment it wants to use for those courses.

And particularly, districts who use the instructional team option and put teachers maybe who are teaching 11th and 12th grade on a school-wide team and assigned them school results…

Male: (Inaudible).

Gina Tovine: …those are – operator, can you determine who that is?

Operator, can you mute that line?

Male: Operator, can you close that line?

Kathy Hebda: (Andrea)?

Operator: Yes, sir.

Yes, ma'am.

Kathy Hebda: Can you please mute that line?

Operator: I'm sorry. That was from the line of…

Kathy Hebda: I'm sorry, (Andrea). What did you say?

Operator: That was the line of Lawrence Morehouse. I'm sorry.

Kathy Hebda: OK. All right. Thank you for muting that for us.

Yes, Members, try not to put us on hold if you can so we don't hear your radio music.

Anyway, we're talking about the people who are on temporary measures, particularly school-wide ones. And when districts saw how that data worked out when they were assigning those percentages, some of them made adjustments to those results – to their cut points and those sorts of things – which districts have authority to do.

In some cases, for districts using those temporary measures, where they set their cut points worked out really well, and they stayed with what they had. Others made adjustments. And I think that's one of the things that has been most discussed recently in the news about the model, with a lot of focus on the value-added model itself.

I think what we have is districts doing the best that they can with folks on temporary measures – before they feel comfortable using, for evaluation purposes, the local assessments that they're using now in their classes – having to make adjustments until they feel like their systems are fully developed for those teachers.

And, Lisa, I don't know if that completely answered your question, but there are some things that Broward, as other districts can do to mitigate those results for those teachers.

Lisa Maxwell: Yes, Kathy, thank you so much. That does (inaudible) quite a bit. And I think that knowing that there's such flexibility with the state in allowing us to make those adjustments will go a long way in dealing with, you know, (inaudible) and things like that. And just in general, we don't want to be in a position where we're giving that message to our top teachers. That's really bad, so I'm glad we can make those adjustments and that the department is going to work with us to do that.

Kathy Hebda: Thanks, Lisa.

And, Gina, I think Juan had a couple of extra comments he was going to make about your particular situation with the FCAT and the non-FCAT when it was the same kids.

Gina Tovine: OK.

Juan Copa: Gina, we're in conversations with Anne on that issue. Again, that non-FCAT attribution, the non-FCAT model that was done falls in that category with temporary measures that Kathy was talking about at paragraph 7(e) within the statute.

We received that note from Vickie in your district – Vickie Cartwright – about that issue. I think we can discuss different options on how to deal with that. Again, I think that goes in the realm of flexibility, as Kathy was talking about, because there is a statistical explanation for that, but when it gets down to the ground in terms of practical application within an evaluation system, again I think we need to discuss different options on what you can do in those situations.

Gina Tovine: OK. That's great. Thanks, Juan.

Kathy Hebda: Sure. And as I've said, we have been working a lot recently with districts who have made adjustments to their '11-'12 results.

And my next report is just to give you a status on where districts are with the reporting of those results. You may remember that the law requires the commissioner to begin reporting by December 1st – which this year fell on a Saturday – evaluation results by groups of personnel across the state, by district, and by school. Evaluation results meaning percentages of highly effective, effective, needs improvement or developing, and unsatisfactory across the state, just like we've been reporting in the past, but the past reporting was always just satisfactory and unsatisfactory.

Most districts have submitted their data. We get a lot of questions about this from the media, as you might well imagine, so we notified the media last week that we were going to report all the data we had so far to meet the December deadline. So we anticipate that tomorrow the first cut of that data will be provided, and it will include the data that's been reported by districts so far.

There are about seven or eight districts that have not been able to report any data yet because they're still working on things for a number of reasons, some of which have only to do with technology and not their evaluation results. And other districts have reported classes of personnel. Some districts have reported just their administrators, some districts have reported just their classroom teachers and they are working on their administrators. So when the data come out, you're going to see that people are in a variety of stages of reporting.

We'll produce a final report that will have everybody's data in the final results in January. I will update this report next month, after the holidays, when the last few districts have been able to finish their reporting.

And as I mentioned too, as we were just discussing, a couple of districts have adjusted a cut point for some of their teachers on school-wide measures and other things like that that they have chosen, so some of those percentages, even for districts that have reported, may be adjusted slightly by the time we produce the January final report. That report will also go to the State Board of Education; we'll be discussing it at their January meeting. And you can also expect that we will be asked to provide updates to the legislature as they have committee weeks after the first of the year, leading up to the legislative session.

One thing I want to make sure that everybody understands about the data that we report is that they are reported in the aggregate, meaning we do not report individual evaluation results by teacher or by principal. There is still a law on the books that says a teacher's evaluation is not public until the year after it's made. And so we provide these data in groups, as I said – by school, by district, and then by category.

We will have some analyses in January based on things like Title I and non-Title I schools and all the things that we normally report for equitable distribution – high minority, low minority schools, high poverty schools, low poverty schools, those kinds of things. We'll also have some analyses after the first of the year. I don't know if it will be in the January report but after the first of the year that we'll include what we've agreed to look at under Race to the Top, which is looking at value-added results and those kinds of things as well as the overall evaluation results. So that will begin after the first of the year.

We've also been in discussion with school districts about having some local informal meetings with groups of them in December and January. And also, you can expect to see the Department of Education rules – the State Board of Education rules that pertain to district evaluation systems and the value-added model – that language being posted this month and then rule workshops happening in January as well. So there's going to be a lot going on, a lot of talking, a lot of opportunities for input from people.

And then what we will try to do for the committee – for you all is each time we meet provide you updates on what we've learned, what people are telling us, and any other status reports of things that are happening out there so that you stay informed of what's happening around the state.

Other questions about those reports?

Gisela Field: Kathy, this is Gisela from Miami-Dade County. I was wondering if you intended to also break out the charter school teachers versus the non-charter. I know we transmitted last week the evaluation results for the few charter schools in our district that are in Race to the Top, but I have no idea how they utilized the VAM, how they applied it. And I would hate for their results to be lumped with ours, because I know, in fact, that it's a different plan.

Kathy Hebda: Because the data will be reported by school, people will be able to see school results. We will, of course, leave out schools that have less than 10 teachers with value-added scores. But otherwise, all school results will be reported as well so people will be able to see which schools are associated with what kinds of results.

And for those of you that are not familiar with that part of Race to the Top, charter schools were allowed to join districts who are participating in Race to the Top. And so there are some charter schools that, because they were participating in Race to the Top, are included in the '11-'12 data. The majority of charter schools were not participants in Race to the Top, so their first evaluation results will be provided for '12-'13.

So the district – to answer your question, Gisela, we'll be reporting by district results which would include your few charter schools that are Race to the Top, but then we also report by school so folks can see which schools have which results.

Gisela Field: OK. Thank you very much.

Kathy Hebda: Sure.

Other questions about the report or anything else before we go to the (inaudible).

Lisa Maxwell: Yes.

Kathy Hebda: Yes, ma'am.

Lisa Maxwell: Kathy, it's Lisa Maxwell again. One more question. For those districts (inaudible) really rumbling, but maybe (inaudible) union, et cetera, or other mitigation, and then looking at not using them at all, what, from your vantage point, would be the consequences for Race to the Top and DRE or just making a decision to (abandon them)?

Kathy Hebda: That has come up. It came up with a couple of districts a couple of months ago, a couple of the smaller districts. And one of the things that participating districts in Race to the Top agreed to do is, beginning in '11-'12, use the adopted growth model for state assessments for those courses and subjects that it applied to.

So for the teachers that we're talking about here that actually taught Reading, Language Arts, and Mathematics, the results using the statewide value-added model would apply to them, and that's something the districts who are participating in Race to the Top agreed to do for '11-'12. So a district who throws out all their value-added data, including for those teachers for whom the measure was designed and applied – who taught those courses – would be exiting from Race to the Top, because they would have broken faith with the phase two MOU that they agreed to implement.

It also, of course, would violate the law, because the law requires that that data be used for those teachers in '11-'12. What the law does not require is that the value-added results – the value-added scores themselves – be used for teachers on temporary measures.

There are things in the law about using state assessment results for teachers when the district doesn't want to use the student assessments (inaudible) other grades and subjects, but it doesn't say in those cases that you have to use the value-added results. If districts wanted to use learning gains or something else for those teachers that made more sense, they could, and in some cases it might. But to remain in Race to the Top, a district would have to continue to apply those results to teachers of the Reading, Language Arts, and Mathematics courses that are associated with FCAT, because that's the one model that was completed for '11-'12.

Lisa Maxwell: Thank you, Kathy.

What do you mean by temporary measures? What does that mean exactly?

Kathy Hebda: It means the paragraph in section 1012.34 – paragraph (7)(e) – that Juan was talking about. Those are classroom teachers who teach courses where the district hasn't declared that the assessment they use in that course right now is going to be their official assessment for that course, because we know kids take those assessments all the time, but there still are places where districts are not comfortable using those same assessment results for evaluation purposes.

It's like the seventh grade social studies teacher, for example. Their students certainly take social studies assessments throughout the year, but the district has not determined that those assessments are going to be used for evaluation purposes.

So in that case, the districts then have to use state assessment results for the students assigned to that teacher by law. That's what's in that paragraph. And that paragraph expires in 2015 because by 2014-15, it's anticipated and it's required in the law that the district has chosen assessments that measure the content of those courses for students to take, to actually measure the mastery of the content of the course, and then select from those the ones they want to use for evaluation purposes by that year.

So that provision, and the provision for using learning targets for other teachers who maybe teach 11th and 12th grade and don't have any students who take a state assessment, including the instructional team provision that we talked about – which most districts interpret as school-wide data – also expire in 2015. So these are measures used as approximations of student performance for a teacher's evaluation that would be combined with their instructional practice to produce the final summative rating.

Lisa Maxwell: Got it. Thank you.

Kathy Hebda: Sure. Other questions?

Gina Tovine: Kathy, this is Gina again. I have a question on the slide that Juan went over. Are we able to go back to that or (inaudible)?

Kathy Hebda: Which slide number, Gina? Do you know?

Gina Tovine: It's actually all the correlations. It's just a general question. If he can give an explanation, I know he did. I just maybe need to hear it a second time.

I know when we were going through the process of identifying the model when we selected the covariate model, we identified specific characteristics to include mainly looking at their effect size, looking at ones that had a high effect, I believe. And so obviously some of the same characteristics are here like ELL and students with disabilities and E.D.

So when I'm seeing the results here and seeing a zero correlation, I'm scratching my head trying to figure out the relationship between the two. Is the zero correlation indicating that the model is doing what it's supposed to do? And can you give another explanation on that to help me understand it?

Juan Copa: You know, that's exactly it. It's an indication that the model is doing what it's supposed to do. We explicitly control within the model for ELL, students with disabilities, and gifted status. Those factors are controlled for in the model, and that results in what we see – there is basically no correlation. Those factors are not driving the VAM score.

Now, I think it's an empirical question, because we didn't control for minority status or (inaudible) status – again because the law prohibited those controls from being explicitly in the model – yet those didn't have an impact either, which leads us to believe that those factors are indeed already factored into the prior year score, the prior year performance, which was our hypothesis going in.

Gina Tovine: Right.

Juan Copa: So I think it's an open question what would happen if the model didn't control for students with disabilities and ELL.

I would venture to hypothesize that we probably would see about the same results, because we know from the model development that it was the prior performance indicator that was by far driving the prediction. The other factors provided marginal information. So it's possible that even if we didn't control for students with disabilities status and English language learner status, you probably would see the same near-zero correlation, because you could make an argument that those factors also factor into the prior performance. So I think that's an empirical question. I don't have that data in front of me, so I can't answer it definitively.

But basically what these – what these charts are showing us is that the model is behaving as we hope it would behave in terms of leveling the playing field based on different characteristics.

Gina Tovine: OK. Thank you. It helped.

Kathy Hebda: Anything else, Members, before we go to the next part of the agenda?

OK. I'm going to go to the other PowerPoint now and queue up Harold.

Harold, are you ready?

Harold Doran: I am ready. Good morning, everyone. How are you? And can you all hear me?

Kathy, can you hear me?

Kathy Hebda: I can hear you. Yes, Harold.

Harold Doran: OK. Excellent. So I would like to let folks on the phone know that I'm joined by Eric Larsen who is a new member of our value-added team, and he is going to do some of the work in the presentation with me here this morning.

Kathy, you're going to advance the slides for us. Is that right?

Kathy Hebda: That's correct.

Harold Doran: All right. Excellent.

All right. So good morning, everyone. Kathy, I'm starting – right now, I'm looking at slide number three.

Kathy Hebda: That's where we are.

Harold Doran: All right. Excellent.

So we're going to present to you results from some optional VAMs (inaudible) as well as from one of the EOC results, and then talk to you later about some other work that's been done.

Because we're talking about different value-added models and different results, we're trying to structure the presentation in exactly the same way each time, even though we're talking about different models, just so we can keep our sanity.

So essentially, we're trying to answer three questions with the results that we'll present to you today. The first thing we're going to look at or the question that we're trying to answer is, are the input data accurate and sensible? That is, are the scores that we're using in the model, do they seem like they represent the population in the way we expect them to. And we're looking at what we call descriptive statistics and we'll present those to you. That's typically the place where we might find red flags, anomalies, issues.

The second question that we're trying to answer is, are the models behaving as expected? Value-added models are expected to behave in a certain way. There are certain statistics that should look a certain way. And so before we look at the results of the model, we actually know what some of those things should be.

Once we have the results, then we can compare them to what our belief about the model should be and see if they conform to our typical expectations.

As Juan showed you a moment ago, the FCAT results conform almost perfectly to how people in the world of value-added modeling would think a value-added model should behave – the impact results, the correlations and so on present a very nice picture of how well the model behaves as a value-added model.

The last question we'll try and answer is: do the results suggest any advantages or disadvantages for certain teacher groups? So one of the questions that came up earlier was whether or not there was a ceiling effect. If there's a ceiling effect, we would see that in the impact data. We would see a very large correlation: teachers who received kids that were very high performing on average would systematically have lower value-added scores because those kids can't show growth. That would be an indication that the model is giving a disadvantage to a certain group.

And so those are the three questions that we will try and answer as we go through this presentation. And then within each of those, we have various indicators that we will use to present to you today.

Moving on to slide number 4. This is just a reminder. I think it was Gina, a moment ago, who was talking about the covariates and recalled very nicely how we went through and made decisions about all of the covariates that were included in the FCAT model.

This is just a reminder slide. We don't want to just randomly choose covariates or just pull things out of the air. They have to be intentionally chosen to help the value-added model work, and there are typically a few indicators that we look for, or criteria that we would want a predictor variable to meet.

The first one would be, we want the predictor variables to have a high correlation with the outcome variable. That’s a statistical correlation.

The second would be, we want it to have a curricular relationship in terms of the test score. So for example, Math is a better predictor of Math than Science is a predictor of Math. That just generally is true. There may be instances where it may not be true, but it's typically true.

It should be correlated with factors that contribute to student learning but are not in the control of the teacher. Remember, we're trying to level the playing field here, and that's what that indicator is helping with.

And a high correlation with the unobserved process of how students are sorted in the classes. Remember, if we could randomly assign kids into classrooms, we wouldn’t need a value-added model.

But, of course, you can't randomly assign kids to classrooms and you can't randomly assign kids to schools. People choose their schools. Kids are sorted into classrooms for intentional reasons, and so we have to control for that statistically.

That's what the value-added model is trying to do. So if we want to level the playing field so we can say, "Well, this teacher doesn't have an advantage over here because he or she taught these kinds of kids," then we have to control for those kinds of kids statistically.

Now, if any of the predictors failed to do that, or if all of the predictors failed to do that, then in the end we just wouldn't have a good value-added model and we might have results that are partially biased.

So our goal is to identify predictors that meet these criteria to the best degree possible, knowing there are no perfect predictors. There are some that are good and some that may not be as good. All right.
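
As a toy example of the first criterion – checking each candidate predictor's correlation with the outcome – the sketch below screens a hypothetical student file. It is not AIR's selection procedure, and every name in it is a placeholder.

```python
# A toy screening pass, not AIR's procedure, illustrating the first
# criterion: how strongly each candidate predictor correlates with the
# outcome score. Every file and variable name here is a placeholder.
import pandas as pd

df = pd.read_csv("candidate_covariates.csv")  # hypothetical student file
candidates = ["prior_math", "prior_reading", "attendance",
              "mobility", "class_size"]

screen = df[candidates].corrwith(df["current_score"]).sort_values(
    key=abs, ascending=False
)
print(screen)
# A high correlation with the outcome is necessary but not sufficient;
# the curricular relationship, the "outside the teacher's control"
# criterion, and the sorting criterion still have to be weighed.
```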

Moving on to slide number 5. This list should be very, very familiar to the SGIC. These are the variables that are used in the FCAT model, more or less, and we have tried to be as consistent as possible to honor the direction that the SGIC gave the department and AIR in the past.

There may be a need to make some different decisions, but as a starting point, we always use the information we gathered from the SGIC, because that was where you gave your best guidance.

So this list of variables is the covariates that are found in the model that we will present today. I won't go through them one by one unless anyone has a question on any of these covariates on slide number 5.

All right, moving on to slide number 6. My colleague here, Eric, will jump in and share a couple of the results.

Going into this, I want to share with you and remind you of something very important that Kathy said at the beginning. This is a different test than the FCAT. It's in different grades than the FCAT.

It is not based on the entire state population. It is based on a few selected districts, and so the results are different – or the results might be different. And there is no expectation by AIR or the department – there should not be an expectation – that the exact same things that caused the FCAT model to work well or not well would apply in the exact same way to any of the other value-added models.

So one of the things that we’re doing as this presentation unfolds that you’ll see is we present the results showing 50 percent of the school component added in and results that do not include 50 percent of the school component.

As we go through the presentation, you'll see why – it will become obvious – and there may be a key decision that the SGIC would like to weigh in on. But I want to just remind you that there are many things that cause these models to be different: different kids, different districts, different populations, different tests, different grades.

And so we need to go into this with the framework that some of the things we decided for FCAT probably should remain, but there's no expectation that everything should remain in exactly the same way.

With that said, we’re looking at slide number 7 and I’m going to turn it over to Eric.

Eric Larsen: Good morning, everyone. This is Eric. Can you hear me OK?

Harold Doran: Yes.

Eric Larsen: Great. So the first slide here, I guess it's slide 7, is just a little background on the SAT-10 – Harold already provided a little bit. So this is the Stanford Achievement Test, 10th edition; students in some schools are tested on it in Grade 1 and Grade 2.

So when we're running our models, our VAM models for the SAT-10, we're using the Grade 1 SAT-10 scores as predictor scores for the Grade 2 outcome scores. So we're actually using the same test as both the predictor and the outcome.

The estimates might not be as precise as they could be because we don't have standard errors of measurement for the SAT-10. If we did have them – if they were available or made available to us – we could provide a more precise estimate. But because we don't have those standard errors of measurement, these estimates won't be as precise as we'd like them to be.

And basically the model we’re estimating here for SAT-10 is the same as the FCAT, just with some different test predictor variables and obviously different outcome variables and the sample is different because not all students take, you know, the SAT-10.

Any questions about slide 7 before we go to slide 8?

Harold Doran: Let me just add one thing about the measurement error. Here, if districts were choosing to implement this model and if they have the standard errors to account for measurement error which is a really important concept, the framework allows for them to do that.

The issue is that when we received these particular SAT (inaudible), the standard errors just weren't available to AIR at the time that the analysis was run. If they were, we could have included them. But any district that would use this optional VAM, if they have the standard errors, could use them as they wished.
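
A bare-bones sketch of the setup described here may help: grade 1 SAT-10 as the predictor, grade 2 SAT-10 as the outcome, with FCAT-style covariates. The teacher and school components of the real model are omitted, there is no measurement-error adjustment since the standard errors were unavailable, and all file and column names are hypothetical.

```python
# A bare-bones sketch of the SAT-10 setup: grade 1 SAT-10 predicts
# grade 2 SAT-10 with FCAT-style covariates. The teacher and school
# components of the real model are omitted, and no measurement-error
# adjustment is applied. All names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

sat10 = pd.read_csv("sat10_grade2.csv")  # hypothetical district extract
fit = smf.ols(
    "sat10_grade2 ~ sat10_grade1 + ell + swd + gifted + attendance",
    data=sat10,
).fit()
print(fit.params["sat10_grade1"])  # weight on the prior-year score
print(fit.rsquared)                # overall model fit
```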

All right, moving onto slide 8.

Eric Larsen: OK, on slide 8 here, we see the outcome scores, the second grade SAT-10 scores. And we put the slide here for a number of reasons. First, it's just to make sure that the data look OK. So this (inaudible) shows you the distribution of scores, both overall for all students and then by subgroup.

And so we look at this and we ask, OK, do the data look as we expect? And we'd say that students who are not English language learners seem to do better on average than English language learners. Students who don't have disabilities seem to do a little better.

So that's what we're checking – OK, the data look like we expect them to. It's also a reminder of why we need to do value-added modeling in the first place. There are average differences between groups of students in how well they score, and students are not randomly distributed across schools and teachers.

You know, different teachers have very different classrooms and this is why we need to do value-added modeling. If students were randomly distributed, we could just compare student outcome scores and use that to evaluate teachers.

And that is something you need to keep in mind also when we look at the impact scores because student characteristics might not only be associated with the levels of scores but also the growth rate of scores.

So this is slide 8. And on slide 9, we just see the prior SAT scores, the first grade SAT-10 scores. And we see a very similar picture to what we saw on slide 8 for the second grade. These scores are all a little lower on average because the first grade SAT-10 and the second grade SAT-10 are on the same scale, so obviously students are going to score a little lower in first grade than in second grade.

Any questions about these two slides?

Harold Doran: Let me just note real quickly, these are the same trends that we observed in the disaggregated subgroups with FCAT. So the fact that we see differences between groups – for example, looking at slide 9, the data tell us that students who are not ELL on average tend to perform slightly better than students who are identified as ELL.

Students who are not special ed tend to have higher scores on these tests than students who are special ed. These are the same patterns that we observed with FCAT by these subgroups. They're also the same patterns that we've observed in virtually every state we've ever worked in with these data.

So none of these identify any red flags or anomalies with the input data. So if we just circle back to the first question, are the input data sensible. From the AIR’s perspective, everything conforms to a typical pattern of how scores look when you just aggregate them by subgroup.

Now, that’s not a statement that these discrepancies are good. Of course, we don’t want those discrepancies to be there, but they are and there’s nothing that’s shocking to us when we compare this to FCAT or our work in any other states.

Juan Copa: And Harold, let me just interject here. This is Juan again. Just to reiterate so it’s clear to everybody, what this slide is showing is simply looking at the SAT-10 scores of the students by these characteristics.

This is not the result of the – any proposed model yet. This is just looking at the distribution of scores on the SAT-10 by these student characteristics.

Harold Doran: All right, going on to slide 10.

Eric Larsen: Slide 10 just summarizes what Harold and I just talked about: what we see here for the outcome scores, the student SAT-10 scores, is what you'd expect and what we also saw with the FCAT. The average differences between groups of students seem to be normal.

One thing that we didn't mention, which is not surprising: there is also a correlation between the students' first grade SAT-10 score and their second grade SAT-10 score, which is what you'd expect. The correlation is about 0.77, and that's, again, what we would expect.

Questions about slide 10? Now, let’s move onto slide 11.

So here we look at the variance of teachers and schools, just to tell us whether the model is behaving as we would expect it to behave. We would expect the teacher variance – the variance between teachers – to be greater than the variance between schools. Just when you think about it, there are more teachers, and their effectiveness in teaching probably varies a lot more than the averages between schools do. So we expect the teacher variance to be greater than the school variance, and it looks like that. OK, let's move on to the next one, to slide 12.

And the model seems to be behaving exactly as we would expect. That's a good sign: the bar on the left shows the variance in the teacher effect, and it is larger than the school effect. So that's a good sign; it tells us that the model is doing what we expect it to do.

Harold Doran: So just circling back to that second question in terms of the diagnostics – again, linking this back to the FCAT – this pattern of the standard deviation between teachers compared to the standard deviation between schools is exactly the same as what we see in the FCAT.

And if one were to ask, well, what is that number – we see the teacher standard deviation is about 10. What that means is that the value-added score, before we add in the school component, has an average of zero and a standard deviation of 10. That’s all it means.

But we’re trying to answer the question: does this look like what a value-added model should look like? And the answer to that question is yes if the teacher bar is bigger than the school bar, and the answer here is yes. From this picture, it’s looking like a value-added model should look, and that’s good.
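
[To make the teacher-versus-school variance check concrete, a minimal sketch assuming the teacher and school components have already been estimated; the arrays are simulated, not real results.]

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical value-added components, centered at zero as described:
    # a teacher spread of about 10, and a smaller spread between schools.
    teacher_effects = rng.normal(0.0, 10.0, size=500)
    school_effects = rng.normal(0.0, 4.0, size=80)

    # The diagnostic: in a value-added model, the spread between teachers should
    # exceed the spread between schools ("the teacher bar taller than the school bar").
    print("teacher SD:", round(teacher_effects.std(), 1))
    print("school SD:", round(school_effects.std(), 1))
    print("teacher > school:", teacher_effects.std() > school_effects.std())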

Eric Larsen: So there are some other things we want to look at when we examine the results of our model and see if it’s behaving as we expect, and one is the R-squared. If we look at students’ outcome scores, like SAT-10 scores, students are obviously not all going to score the same; they are going to score differently.

So we want to see how much of the difference between students’ scores can be explained by the model. If the R-squared is one, that tells us that we can perfectly predict every single student’s score; our model tells us everything we need to know about how students will score on the SAT-10.

If the R-squared is zero, then the covariates we include in the model can’t be used to learn anything about how a student will score on the SAT-10. And the R-squared for the SAT-10 model is 0.62, which is good.

That’s what we saw with (inaudible) FCAT in that range, and for most VAM models, that’s about (inaudible). We can explain, you know, about 60 to 65 percent of the variation between students just by what we include in the model.
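
[A minimal sketch of the R-squared diagnostic being described, using a plain linear regression as a stand-in for the covariate adjustment model; scikit-learn and the simulated data are assumptions, not the actual AIR implementation.]

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)

    # Hypothetical student data: one prior score plus a covariate predicting the outcome.
    prior = rng.normal(500, 40, 1000)
    swd = rng.integers(0, 2, 1000)
    outcome = 0.8 * prior - 5 * swd + rng.normal(0, 25, 1000)

    X = np.column_stack([prior, swd])
    fit = LinearRegression().fit(X, outcome)

    # R-squared: the share of the variation in outcome scores explained by the model.
    # Near 1 means near-perfect prediction; near 0, the covariates tell us little.
    # The SAT-10 model discussed here lands around 0.62.
    print(round(fit.score(X, outcome), 2))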

OK, now, the impact data results – let’s stop and talk a little about why we’re even looking at these. So students learn at different rates. Conditional on student scores when they come into the classroom, some students are going to learn at a slower rate and some students are going to learn at a faster rate.

One reason students might learn at a slower or faster rate is the teacher and the school, and that’s the effect we want to capture with our VAM models. We want to say, OK, some students are learning quickly, some are learning more slowly. How much of that is due to the teacher? How much of that is due to the school? We want to capture that in our VAM estimate. But there are other reasons that students might learn more quickly or more slowly.

Now, if the reasons students learn at different rates are such that students are randomly distributed across schools and across teachers, that’s not a problem. But – what am I trying to say?

The reasons that students learn at a slower or faster rate might be associated with the process for sorting students into schools and sorting students into classrooms, and that might contaminate the teacher VAM effect.

So when we get a teacher VAM estimate, we might not be capturing only the teacher’s effectiveness; we might be capturing other factors that affect the rate at which students learn, factors that are out of the school’s control, and that’s what we worry about.

We only want to capture the rate of student learning that’s due to the teacher and school. We don’t want that to be contaminated by other factors that affect the student’s rate of learning that are out of teacher’s control or the school’s control.

And some of these factors might be related to observable student characteristics – disability status, socioeconomic status – and they might also be associated with other factors that we just can’t observe in the data.

Harold Doran: So, just to add – going into this, with the impact data we expect the correlation between the classroom characteristics and the VAM score to be zero. Let’s use an example; let’s just say that there is an advantage.

Say teachers have a high proportion of – excuse me – you know, a high proportion of gifted kids, and let’s just say there is an advantage to that. What we would see in the impact data if that’s true is that teachers who have gifted kids on average would tend to have higher VAM scores. OK, that means the correlation would be positive and close to one.

If that were true, then you would have to struggle with the question: does the teacher look good because of what they did, or does the teacher look good simply because of who they taught? They could have been a bad teacher who was lucky in that they received good kids – a high proportion of gifted kids.

So that’s why we want those correlations to be close to zero. If the correlations are not zero, then we have to go through those questions and have the discussion of whether this reflects something in the real world that we believe is true or indicates a problem with the model. And you can never know the answer to that; you can only hypothesize.
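
[A minimal sketch of the impact-data check being described: correlating teacher VAM scores against a classroom characteristic and looking for values near zero. The numbers are invented for illustration.]

    import numpy as np

    # Hypothetical teacher-level impact data: each teacher's VAM score and the
    # share of gifted students in that teacher's class.
    vam_score = np.array([4.2, -1.3, 0.5, 2.1, -3.0, 1.8, -0.7, 0.2])
    pct_gifted = np.array([0.10, 0.05, 0.20, 0.15, 0.00, 0.25, 0.08, 0.12])

    # If this correlation is near zero, the VAM score is not simply rewarding or
    # penalizing teachers for the kinds of students they were assigned.
    print(round(np.corrcoef(vam_score, pct_gifted)[0, 1], 2))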

All right, moving on to slide number 15.

Eric Larsen: So slide number 15, what we see here on the Y-axis is the estimated, you know, teacher effect or school effect. And I think we have both here, right, the first…

Harold Doran: Yes. So the left panel, the one that says, with school component, what we’ve done is we’ve – on that left panel, that includes 50 percent of the school component in exactly the same way the FCAT model does.

The right-hand side that reads no school component does not include any of the school component at all. It’s just the teacher component all by itself. So the X-axis is just showing the percentage of kids who are SWD in a class, and the Y-axis is the teacher’s score, or what we might call a VAM score.

And then it’s presented in two ways – with the school component and without. As you can see, the difference is in the slope of that regression line; we want that line ideally to be flat, or as close to flat as possible.
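
[A minimal sketch of the two reporting choices being compared in these panels: the teacher component alone versus the teacher component plus 50 percent of the school component, as the FCAT model does. The component values are hypothetical.]

    import numpy as np

    # Hypothetical estimated components for five teachers and their schools.
    teacher_effect = np.array([3.0, -2.0, 1.5, 0.0, -1.0])
    school_effect = np.array([-4.0, -4.0, 2.0, 2.0, 5.0])

    vam_no_school = teacher_effect                           # "no school component" panel
    vam_with_school = teacher_effect + 0.5 * school_effect   # "with school component" panel

    print(vam_no_school)
    print(vam_with_school)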

Eric Larsen: And so what we see here in slide 15 is that it is very flat, so that’s a good sign for us. We can interpret this pretty easily. It seems to be the case that the teacher VAM estimate is not being biased by factors that might be associated with the share of students with disabilities.

This is a good time, so let me just say, OK, so this is good. Let’s go on now to slide 16. Same thing here: on the Y-axis is the teacher VAM score with or without the school component, and on the X-axis is the share of students who are English language learners in the class.

And what you see is that when we include the school component in the teacher VAM, the negative correlation between the teacher VAM score and the share of students who are ELL in the class gets larger, or stays more negative.

Now, we don’t know if this – well, I guess what this is suggesting is we don’t know if this has something to do with the sorting of students into the schools or if it’s just something about the effectiveness of the schools with high shares of students who are English language learners.

But there is something to think about here, and it’s how much of the school component we want to include in the teacher VAM – how much of this is in the teacher’s control and how much of this is out of the teacher’s control.

Moving on here, again, the teacher VAM score with and without the school component against the share of students who were gifted – small correlations here, not really much to worry about, I don’t think.

Now, here, with the share of students who are socioeconomically disadvantaged – it looks like we’re on slide 18.

Harold Doran: Eighteen, yes.

Eric Larsen: We see again that when you include the school component – that’s the left-hand panel – the negative correlation is much greater in magnitude than when we don’t include the school component.

So we don’t know if this is because schools where there are high shares of students with low socioeconomic status are just less effective than schools with lower shares of low-SES students, or if there is something about having large shares of low-SES students in a school that causes students to learn at a lower rate but is unrelated to true teacher effectiveness.

Harold Doran: So this one is interesting. And just to be clear on exactly what the interpretation is here, the closer you get to 100, what that is saying is, teachers who have larger proportions of kids who are identified as economically disadvantaged, on average, they tend to have lower VAM scores when you have the school component in.

When you don’t have the school component in, it doesn’t seem to impact the results that much. There are tiny impacts, but not as much as when you have the school component in. Now, we’re building a story; we’ll answer the why question in a little bit, and I know someone is itching to ask that question, but give us just a moment, we’ll get there.

Now, OK, moving on to slide 19.

Eric Larsen: And we see a similar story as we saw with the share of students who are low-SES. When you add the school component, it appears that schools that have a larger share of – or teachers at schools that have a larger share of low-SES students – I’m sorry, of non-white students have lower VAM scores.

Ideally, we’d like to see no slope there but it does seem to be that there is an association between the non-white population of the school and the teacher VAM score when you include the school component. And again, with the school component, the slope is much more negative than when you don’t include the school component. (Inaudible) with low-SES.

And so finally – not finally, but finally for the impact slides – we’re on slide 20. And here we see again a similar story, but now what we’re looking at is the average first grade SAT-10 score of the classroom.

And so here we see a positive association: teachers whose students’ average incoming SAT-10 test score is high tend to have higher VAM scores, and this is particularly the case when you include the school component.

But then – I don’t know if you were talking about the ceiling effect that came up earlier here, but this might be sort of an argument against the ceiling effect. It turns out that teachers who have higher average incoming test scores have higher VAM scores in the case of the SAT-10.

So what we’re showing in this (inaudible) is sort of an estimate – we’re getting a sense of the correlation between the classroom and school characteristics and the teacher VAM score. And then on the next slide, we just present the actual correlations between the factors.

So what we have here is the correlation between the teacher VAM score and various classroom characteristics. In the first column, the school component is not included in the teacher VAM score.

In the right-hand column, we see the correlation between those classroom characteristics and the teacher VAM score when the school component is included – that is, half of the school estimate plus the teacher estimate – and it’s reflecting what we just saw in the pictures.

The mean prior score is positively associated with the teacher VAM score, particularly when the school component is included; for the share of students with disabilities and the share of students who were gifted, the correlations are small.

But again, for the share of students who are low-SES, there is a large negative correlation with the teacher VAM score, particularly when you include the school component. ELL, near the bottom there, shows smaller correlations. And again, the share of students who are non-white has a fairly significant negative correlation with the teacher VAM score, particularly when you include the school component.

Harold Doran: Let me point one thing out and offer some thoughts. The first column, no school component – from my position as a person who has been doing value-added work for a long time, those correlations would not alarm me. They’re very small.

The minus 0.12 is noticeable, but it’s not alarming; it’s not so big that it would cause me to be worried. Setting that minus 0.12 aside, all the other correlations are about the same as what we saw in the impact data slides that Juan presented earlier, and we would give all of that a thumbs-up, right?

Remember, the number doesn’t have to be exactly zero to be zero. Now, the 0.12 – minus 0.12, there’s a tiny trend there but it’s small, it’s not alarming but there is something there. But when you go to the school component, when the VAM score includes 50 percent of the school component as it does with the FCAT, you can see that those correlations in a couple of cases for economically disadvantaged, ED, non-white and mean prior essentially double in size and become large enough where it is something that should be a very big conversation.

Whether or not it indicates a problem with the model or something else, one can never know that, and we’ll share some thoughts with you in a moment. But I would say to you that these correlations get very big and very noticeable and there’s a big conversation that one would need to have if you were to report the VAM score with 50 percent of the school component.

So let’s move on to slide 22 and summarize what we think is actually happening here. The impact data here show, as stated, that the correlations are small when there’s no school component but larger when the school component is added in. They are not correlations of one, or one in absolute value, but in a couple of cases they’re not zero. They’re clearly not zero.

So let me answer the why question and let me put a couple of thoughts out there in terms of what could be happening. I don’t presume to know the answer but I’m going to hypothesize on some things from my experience.

Now, think about a level score analysis alone – suppose we computed the differences between schools and the differences between teachers using not a covariate model but just regular one-year test scores.

What we would see is that the schools appear to be more different than the teachers; that is, schools would have more variance than teachers. The reason that would be true – and the reason it is true – is because schools are stratified by the kinds of kids they teach.

Urban schools are very different from suburban schools. Urban schools tend to have larger proportions of economically disadvantaged students, larger proportions of non-white students, and so on. They tend to receive students who have lower test scores on average. Now, that’s not universally true, but it’s generally true.

Rural or suburban schools tend to have fewer minority students. They tend to have students who on average have higher test scores, fewer students who are economically disadvantaged, and so on, and that gets reflected in their test scores.

So the test scores, just the average single-year test score makes schools look very, very different. That’s a bias, OK? It’s not because the schools, in fact, are better or worse, it’s a reflection of the populations they teach. That’s the limitation of AYP kinds of analysis and why people have moved to growth models because in the growth model, we can presumably control for that.

All right, now, let me take that and connect it to what’s happening here. The school component in this value-added model is capturing some of those systematic differences between the schools. In the value-added model, the school component shows that schools still appear to be different, but as a function of who they teach, not necessarily what they did.

And so what seems to be happening is that when that school component gets added back into the teacher effect, it adds in some bias, and we can see that reflected in the impact data – it causes those results to have a little bit more bias.

That’s argument number one for what is happening in the data, from my perspective. Possible argument number two is this: adding the school component back in could be perfectly fine to do. We know in the real world there is not an equal distribution of teaching ability in the population. We’re dealing with this in another state right now, and there’s some research to support this.

The teachers who are able to produce higher growth – the better teachers – might intentionally choose to work in school systems that have students who on average tend to perform higher, fewer minority students, and so on, OK.

There is not an equal distribution of teaching ability in the population. There may even be district policies in some cases. I don’t know if this is true in Florida, but it was true in the district I worked in, where the more veteran (inaudible) and experienced teachers could choose which schools they wanted to go to, and as a principal, before I could hire a new teacher, I had to accept what we called district-initiated transfers.

So it could be reflecting the fact that there is either some bias in the results or it could be partially reflecting the fact that there’s an unequal distribution of teaching ability in the population. We cannot know the answer with certainty, we could only hypothesize.

So let me stop talking before we move on because the next part is the Algebra. And I think the key decision here is whether or not the SGIC thinks the value-added model in this particular instance, for SAT-10, should include 50 percent of the school component or not.

If it includes 50 percent of the school component, you would be making a new decision knowing that it has a disproportionate impact on the teachers and so your conversation would have to defend why that is true.

If your position is that it’s an unequal distribution of teachers, that’s possible. If it’s that there’s bias in the results, then that’s not a position you could defend, and you wouldn’t want to add the 50 percent in. So Kathy, let me turn this back over to you.

Kathy Hebda: Thanks, Harold, very much. Before I give it to Ronda to lead the committee through a discussion, there are a couple of other things that I want to add. Remember that in addition to teachers earning value-added scores – or potentially earning scores from this model – the principal earns a value-added score for the school as well.

And so when you think about the performance of the school and who earns the value-added score related to these results, it would also include the principal as well on a whole school result.

The other thing I would say, too, for the committee’s benefit, just to keep in mind: the SAT-10 is similar to FCAT in that it’s an annual assessment that advances grade by grade, from Grade 1 to Grade 2 and things like that, and it has a lot of really heavyweight statistical properties.

It is, in fact, only offered in some districts. So as opposed to the FCAT assessment, which we know is statewide – and the committee included a portion of the school component because you want to make sure you still had a statewide comparison – it’s not possible to do a statewide comparison on the SAT-10, because it really functions like a local district’s chosen assessment.

So in that manner, while – as you can see from these data – a covariate adjustment model seems to work well in most respects with this assessment, there is one difference, and that is that it’s only offered in certain districts.

And so we don’t know if that exacerbates the school component, but (inaudible) not talking about all schools across all elements (inaudible) and, you know, certain numbers of high schools across the state – only certain districts. But it is something for the committee to keep in mind: again, this would be an optional model for a district to use if the district is using SAT-10.

Ronda, do you have any questions or anything that you want to ask of us before you lead the committee through the discussion?

Ronda Bourn: Yes, I want to know if the end result for the committee is that you are expecting a decision and a recommendation?

Kathy Hebda: I think to the extent that the committee can make one, we will absolutely accept it. If the committee can't come to a recommendation today – can't come to a recommendation to either accept or not accept the model – the committee could come to a recommendation to, you know, do further study or (inaudible) more questions or something like that.

So since the committee is only a recommending body, I think you can work the committee through the discussion and see where it leads you.

Ronda Bourn: Okie dokie. So committee members, we’ll now open it up for questions, comments and discussions.

Stephanie Hall: Ronda, this is Stephanie Hall from the Brevard.

Ronda Bourn: Hi, Stephanie.

Stephanie Hall: Hi. I – one of my questions – I have a couple – one of my questions is about the sampling that was taken. I know that they said they could only get a little bit of – they can't get everybody in the State of Florida, and I get that, but how big was the sample that was taken?

Harold Doran: We’re going to get those – stand by, it’s going to take us a moment. We’re going to pull those numbers as we talk to you here, OK?

Stephanie Hall: OK, then I’ll pause. My other concern I guess is that when we look at the R-squared value for the FCAT, we are looking at an R-squared value for Reading and Math generally in the 70s on slide – on the other presentation.

We are looking in the 70s. Fourth grade was the one where it was at 0.66 and 0.63. But then when we look at the SAT-10, we have an R-squared value of 0.62, so again, that’s going down even lower, and I can't remember who said that it was a good score.

But can you talk a little bit more about, you know, a great score versus a good score? I know that they said as they get closer to 50, that’s not so good, so I’d like a little further discussion on that.

And my other concern is that even when there is no school component, even though it’s a slight decrease, it still is a slight decrease, and you’re dealing with first and second graders. And so even without the school component factored in, it looks like there is a disadvantage for the teacher with that value-added score.

Harold Doran: OK, so two (inaudible). One, the N size, the number of students that was included in the analysis that we did for SAT-10 is 10,913, so essentially 11,000 kids.

Stephanie Hall: OK.

Harold Doran: So smaller than what we would have for an FCAT on grade analysis.

In terms of your second question, excellent, excellent question. You noticed that the – that the R-squared in the presentation Juan presented was typically around 0.7 or so, but you’ll also notice that the Grade 4 model had an R-squared of about 0.63, 0.66.

First, let’s be clear, what’s the difference between Grade 4 and all of the other grades? Well, with the FCAT, what made Grade 4 different than all the other grades is it has only one predictor variable. All of the other grades have two predictive variables.

Remember in Grade 5, we can use the Grade 4 and the Grade 3 predictive variable and that’s true in all the other grades. But in Grade 4, we can only use the Grade 3 predictive variable. That is the exact same thing happening with the SAT-10.

Here, we have only one available predictive variable; that is, we have the Grade 2 score as the outcome and the Grade 1 score as the predictive variable. So the most valid and direct comparison is comparing the FCAT R-squared in Grade 4 to the SAT-10 R-squared.

And the very small difference in the second decimal point is nothing. These are on the same order of magnitude, they’re both in the 0.6 region and that is what we see with the FCAT too and when we have only one predictive variable.

If we were to have a different test score, a second predictive variable, it would only increase the R-squared. That’s exactly what happens with FCAT, and that’s the exact same thing that’s happening here.
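
[A minimal sketch of why a second prior score raises R-squared: with nested linear models, adding a predictor can only explain the same variance or more. The simulated data and scikit-learn are assumptions here.]

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)

    # Hypothetical scores: two prior-year tests and a current-year outcome.
    prior_1 = rng.normal(500, 40, 2000)                    # most recent prior score
    prior_2 = 0.9 * prior_1 + rng.normal(0, 20, 2000)      # an earlier prior score
    outcome = 0.7 * prior_1 + 0.2 * prior_2 + rng.normal(0, 30, 2000)

    X1 = prior_1.reshape(-1, 1)
    X2 = np.column_stack([prior_1, prior_2])

    r2_one = LinearRegression().fit(X1, outcome).score(X1, outcome)
    r2_two = LinearRegression().fit(X2, outcome).score(X2, outcome)

    # The two-predictor model never explains less than the one-predictor model.
    print(round(r2_one, 2), round(r2_two, 2))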

In terms of your third question, when the school component is added in or not added in, there’s a little bit of a disadvantage. I’m not sure exactly which subgroup you’re looking at, but I would assume it’s either economically disadvantaged or non-white. The mean prior is pretty small; that doesn’t concern me at all, and in fact, it’s in an acceptable direction for the mean prior, by the way.

Folks, if anyone were to ask whether there’s a ceiling effect here – even though this correlation is so close to zero, we could say no, there’s not – but if you were to say anything at all, you would say that, in fact, if you’re a teacher who received high-performing kids on average, you tend to have a teeny-tiny advantage, and not a disadvantage, with these particular results.

But looking at the ED and non-white, you’re right, that correlation of 0.12 is a little bit. It’s not big.

Stephanie Hall: Can you…

Harold Doran: It’s not an alarming correlation. But it’s big enough to where you might have to say, well, am I comfortable with it, that is where it has to fall in your (inaudible), are you comfortable with it? In my position, yes, it doesn’t alarm me but I’m only offering my position and I don’t live within the State of Florida.

Stephanie Hall: Can you, again, speak to the R-value, the closer we get to one, the better predictor? And then when you said that a 0.62 is a good score, what would not be an – what’s your range that you’re talking about?

Harold Doran: Yes, so the range, in all of the states we work in, is that if we see anything less than 0.6, we are immediately concerned. If we see anything 0.6 or higher, then we start to feel a little bit more comfortable. There is no exact threshold; in the value-added literature, there is no number that says anything below this is a problem. But as our own heuristic, as our own gauge, we use 0.6.

We are particularly comfortable with this number because the model that we can compare this to most directly is the FCAT Grade 4. With the SAT-10, the R-squared is 0.62 and with the FCAT, it is also in the 0.6 region. So comparing this with the FCAT model, we are on par, we are right there, and that makes us very comfortable.

Stephanie Hall: Thank you.

Harold Doran: Great question.

Kathy Hebda: Do we have any other questions or comments?

Sandi Acosta: Yes, this is Sandi Acosta from Miami. I have a question probably for Harold. OK, I’m having trouble not with – you know, I understand this is what the data show but I’m having trouble understanding why it would be different, so different for the SAT versus the FCAT because other than sample size, the systemic problems you mentioned about rural schools and all that, (inaudible) the same issues?

Harold Doran: Hi, Sandi, how are you first of all?

Male: (Inaudible).

Harold Doran: Well, you’re in Florida, so could things be that? Hey, look, you know, I think – I think, you know, in terms of behavior of the model, we want to see the same patterns but clearly, we’re dealing with a very, very small population of kids here, 10,000 in a particular district.

And Kathy and Christy, you can speak to which districts this mainly involves – off the top of my head, I don’t know. I think it was mainly Hillsborough and a couple of other small districts. So it’s not representative of the state in the way the FCAT analysis is.

And so knowing that, it does cause me to think some of these differences could simply be because we have a different population of kids. In general, we want to see the same patterns, the same zero correlations and so on.

But if we don’t and if you are asking me, Harold, why do you think that’s true…

Sandi Acosta: Yes.

Harold Doran: … I would point you a couple of things. One, it’s a different dataset. It’s a different test. It’s in different grades. And how much those things add up to – well, how – you know, you get a little bit of a difference because it’s a different grade, a little bit of a difference because it’s a different test, a little bit of a difference because it’s different subset of kids.

You know, I don’t know how much that all adds up to but those things and there’s probably other differences that you all are more familiar with than I am. Those things cause me to say, all right, I should expect differences here somewhere.

Now, are these differences that I’m looking at in terms of the impact data so large that I would be horribly concerned and think that it puts you in a compromising position to recommend this model?

Well, with the school component, I think you’d have to come to a good reason why you would include the school component here. You would have to have a good rationale, and one that’s defensible.

Sandi Acosta: I agree with you there, I’m just trying to find my rationale for making it…

Harold Doran: Yes, the other…

Male: (Inaudible).

Harold Doran: Without the school component, Sandi…

Sandi Acosta: Right.

Harold Doran: … I think – you know, I think if we say, “Well, hey, look, let’s chalk this up to different kids, different tests, different grades.” And by the way, those correlations aren’t so huge that there’s a clear systematic advantage. Other than that, I’m not sure I know what else would be causing those small differences.

Sandi Acosta: OK, fair enough. Thank you.

Kathy Hebda: I think I heard someone else had a question.

Gisela Field: Yes, this is Gisela Field from Miami-Dade.

Kathy Hebda: Hi, Gisela.

Gisela Field: Hi. The first question: I thought that Miami-Dade was part of the sample, so if someone could check, because we tested our K, 1 and 2 on Stanford. Obviously, if Miami-Dade wasn’t part of the sample, then I think, Harold, you’re right in terms of the distribution of who is included – you may not have a good representation of the ELL and the ED kids.

But regardless, we obviously agree that your data indicate that we shouldn’t include the school component, and that’s how I will vote. My only concern is that now we have different models for different teachers.

You know, under the traditional VAM we do include the school component; under a different indicator, we do not. And so it just opens the door to teachers being measured a little bit differently, and we will have to explain that and, you know, defend why these decisions were made, and teachers may or may not understand whether it helped or didn’t help them. So it’s just politically a little bit harder, is all I’m saying.

Harold Doran: We’re going to (inaudible). We may not be able to answer the question by the end of this call but we will certainly find out which districts were included in the analysis. And you’re exactly right, if some districts were included and they have a very small percentage of ED or non – you know, non-white kids then these results just simply might reflect the smaller sample sizes and are not really true of what’s happening if this model has been run with larger districts. We’ll have an answer for you later today, if not, sooner.

Juan Copa: So this is Juan. I’d just like to thank you again. You’re correct, you did provide us with data. I can't say off the top of my head – I know we received data from various districts with varying levels of completeness, and so there were some decisions made between AIR and the department in terms of having a complete data set. So again, we’ll get back to you in terms of whether or not the Dade data made it into this analysis.

Sandi Acosta: OK, thank you.

Gisela Field: Let me open up a general question. If we have found that the school component offers a negative impact regarding Stanford, does this open the door for us to reconsider including the school component on the FCAT?

Harold Doran: I like the question, Gisela. I can answer pretty – you know, in a pretty straightforward way. If you go to Juan’s presentation and you look at the impact data, those correlations were all pretty much zero, if not exactly zero. And those results do include the school component.

So there is absolutely no reason to be concerned about adding the school component in, because it does not add any bias at all, as the impact data seem to tell us. So then one might ask the question: what if you took that school component out – would the results look any different? Well, the answer here is no, the results would not look any different at all, because they would only get closer to zero. The results with the school component are already basically zero.

So with the FCAT, it creates no particular problem. Now, whether or not the SGIC wants to circle back and have that conversation is up to you all, but from a data perspective, adding the school component into the FCAT does not cause, or appear to cause, any problems at all, and the impact data are very clear there.

Brandon McKelvey: Could I respond to that?

Kathy Hebda: Yes.

Brandon McKelvey: This is Brandon McKelvey with Seminole County. So from a model perspective, it might not, but it does impact teachers in how they’re classified depending on how districts make their classification results.

One of the things that the school component influences is how high- or low-performing teachers at high- or low-performing schools are impacted by classification decisions. So even if it wouldn’t impact the model from the state perspective, it would make a substantive difference in the teachers that would be identified in the rating categories in our district systems.

Harold Doran: That’s a very good point. Let me just remind the SGIC of a conversation that was had, I guess a year ago, a year and a half ago. That’s true; we have heard from some districts, well, this teacher was affected negatively because when they had the school component in, their score was lower, and that affects their classification and so on.

But the reason that the school component was added in, at least 50 percent of it, was because it was believed that teachers partake in the development of what happens at the school school-wide, so they should receive some credit for that.

So if a school is – or a teacher has a lower value-added score because they’re in a low-performing school, it’s because as part of their collective efforts, that school is lower performing. So again, whether or not that’s the right thing to do a year after seeing the real world and so on, those are conversations for Juan, Kathy, Ronda and the SGIC, but that’s just a reminder of where we were a year and a half ago.

Ronda Bourn: So it seems to me we have several different things kind of popping around here that we need to systematically tackle. Juan and Kathy, are you willing for us to discuss revisiting the school component and covariate adjustment model that has already been approved?

Kathy Hebda: Ronda, this is Kathy. I think the first thing I would say is perhaps if the SGIC were to do that that you might do that at another meeting or after you’ve dealt with the SAT-10 discussion in today’s agenda since that wasn’t part of today’s agenda. I think that it might be well to deal with these questions first.

Ronda Bourn: OK. So maybe add that to the January face to face agenda?

Kathy Hebda: You can talk about anything you want to talk about. So if you’d like us to add that to the agenda, we certainly will, and we can have that as part of the discussion for January.

Linda Kearschner: Excuse me, this is Linda Kearschner.

Ronda Bourn: Yes, Linda?

Linda Kearschner: I – just before we move off of this particular topic, I also wanted to refresh everyone’s memory, or reflect on the fact, that part of our discussion a year ago was that even though it was advised – and some of the sample data showed us – that we would not see that variation, in other words, that we would keep right to that zero line.

Part of the reason for including that school component information was for teachers, for people, to see that it had that minor effect – in other words, to give assurance about the data by having that additional information included in there and seeing the results of that. So each of those particular components was important for people’s understanding of the model.

Ronda Bourn: OK, I guess – is there any more discussion on the reasoning behind the initial inclusion of 50 percent school effect in the FCAT covariate adjustment model? Hearing none. Does anybody want to make a motion to include this as a discussion item in January at our face to face meeting or shall we simply let it drop?

Anna Brown: This is Anna from Hillsborough, and I would like for it to be placed – I’d like to make a motion that it be placed on the agenda so that we can at least have a thorough discussion. I also would like to point out that not only did we have a discussion about the inclusion, we also had heavy debate about the weight of the inclusion, and I think we should revisit those topics.

Ronda Bourn: There’s a motion on the table for us to revisit the – those – the inclusion of school effect and/or teacher effect whichever way we want to talk about it and the weighting of that in our January face to face meeting. Is there a second?

Gisela Field: This is Gisela Field, and I second the motion.

Ronda Bourn: So motioned and seconded. Any more discussion? Those in favor of placing this as an agenda item in January, please signify by saying aye.

Female: Aye.

Male: Aye.

Female: Aye.

Female: Aye.

Female: Aye.

Female: Aye.

Female: Aye.

Female: Aye.

Ronda Bourn: Those opposed? The motion passes.

Now, coming back to the information on the slides, it seems to me that we have two things to make decisions about. One is, do we feel comfortable even making a recommendation about the SAT-10 model now, or do we need more information? And then, once we’ve decided that, I think the second element is the inclusion or exclusion of the school component for the SAT-10 model.

Gina Tovine: Ronda, this is Gina. I had a clarifying question first for Harold, please.

Ronda Bourn: Go for it, Gina.

Gina Tovine: OK, on the slides that Juan showed with the impact data – the way he explained it, those were results from the covariate model (inaudible) FCAT results, showing each characteristic and its impact on the teacher VAM scores. So now, moving forward to the SAT-10 discussion that we just had with Harold, we were looking at the same type of idea: we were looking at the final VAM score and the impact of these characteristics, or the correlation rather, on the teacher VAM scores.

So my question is really basic: what model did he use? I mean, I understand the inclusion or the exclusion of the school effect, but what model was actually used to run those results? Was that the same covariate model, or did he use something different? Because I know when we went through that process for FCAT, I think we went through like six or eight different models before we made the determination of the one we use.

Harold Doran: Great, good questions, Gina. The results that Juan showed were of the final model chosen by the SGIC, and those impact results were based on the real first- and second-year operational results.

The results that we’re showing you with the SAT-10, the statistical model is the same exact statistical model as used for the FCAT. The only difference is that we’re using SAT-10 scores as predictor and outcome variables and not FCAT scores.

All the other covariates are exactly the same as those also used in the FCAT. So we have tried to be as consistent as possible with prior work and decisions made by the SGIC.

Gina Tovine: OK, thank you.

Ronda Bourn: Any other questions?

Lisa Maxwell: This is Lisa Maxwell. I just have one last question on this. Given this complexity of adding and layering these tests and I’m thinking particularly elementary, it’s getting more and more complicated to kind of aggregate the scores. Are you planning on coming out with some guidance for the districts on this? What’s kind of the thinking or the game plan for that?

Kathy Hebda: Lisa, this is Kathy. Are you asking whether or not we’re going to provide districts with guidance in addition – let’s pretend that you decide on the SAT-10 model today or any other day, and this becomes an optional growth model available to districts that have also chosen to use SAT-10; that’s the first choice the districts would have to make, whether they want to use SAT-10 or not.

Are you asking if we would also provide guidance on ways to classify performance in this area or how to use (inaudible) result?

Lisa Maxwell: Yes, it’s more about, you know, taking all of these tests and aggregating them across school sort of (inaudible) example.

Kathy Hebda: And there are – when we’ve had some of those discussions in the past, the districts are just using the VAM model. I imagine as we go forward and talk with districts who want to use SAT-10 and other kinds of things that we’ll have lots of discussion about aggregation method.

But I think one of the things I should say too is that that’s not something that the DOE has all the information on. We – a lot of that discussion comes from people like Brandon and others that serve in school districts and use this data all the time and find really good methods to use that we can share amongst districts.

Lisa Maxwell: Thank you.

Gisela Field: Kathy, let me make sure I understand the whole Stanford component. I guess what we’re doing here is that AIR has generated some models, and we as a committee will possibly take some votes on things like the school component. Those models will then be provided to the districts who use and give the Stanford, to apply ourselves?

Kathy Hebda: That’s correct. These are the optional – that we keep calling them optional VAMs, the (inaudible) being VAMs exactly but…

Gisela Field: Correct.

Kathy Hebda: This is one of the things that the department has to do – provide guidance, an option, for school districts.

Gisela Field: So we will…

Male: (Inaudible).

Gisela Field: Right, so we’ll have to run the models ourselves so the data will be relative to whatever happens only, let’s say, in Miami-Dade County.

Kathy Hebda: That’s a good point, Gisela. And in fact, it could even relate to why you would or would not want to use the school component, because we’re not talking about schools across the state anymore. We’re really only talking about a district and its chosen assessment for Grade 1 or Grade 2 or something like that.

Gisela Field: Right, right, and it also gets compounded, of course, as we move towards adding models for AP and IB. You could have multiple school components for a teacher who is teaching, you know, 10th grade, AP English as well as the kids taking the FCAT.

The only question I had is whether technically districts that do have Stanford will be able to have the capacity to run these models or would the state be willing to run them where the districts are able to provide the data to the state and give us – using the same (roster) verification process that you do for FCAT, obviously, you’re doing it for all grade levels.

I would think it would be better if whatever models are available, if the state receive the data in a format that’s acceptable to them then they would run the same protocol using the different models and the different tests. It’s just a recommendation for you all to think about.

Kathy Hebda: Thanks. We appreciate that recommendation. We could certainly consider that. Some of the things we thought about, too, were that it’s one thing for a district the size of Hillsborough or Dade to run their own SAT-10 data, but if you have a lot of small districts using SAT-10, they’re going to need a bigger N size than what is in their own small districts.

So there can be a lot of reasons to want to combine across districts or anybody else who’s using the SAT-10 across the state.

So, that's definitely part of the discussion going forward. Thanks for bringing that up.

Ronda Bourn: Any other discussion? Then does anyone want to make a motion on whether or not we want to make a recommendation to use this model or do we want more information?

Female: I’d like – this is Miami-Dade. I’d like to make a recommendation to use the model and also to take the advice of AIR and not include the school component.

Ronda Bourn: OK. I think we need two separate motions for that. I may be wrong.

Female: OK. I’ll do the first motion: I recommend, on behalf of Miami-Dade County, that we accept the model that AIR has proposed for the Stanford Achievement Test.

Female: Any second?

Anna Brown: This is Anna. I second.

Ronda Bourn: Any further discussions? All those in favor of accepting the model as proposed by AIR.

Female: Aye.

Ronda Bourn: Please signify by saying aye.

Female: Aye.

Male: Aye.

Female: Aye.

Female: Aye.

Ronda Bourn: Anybody opposed? OK, the motion carries. Now, can we entertain motions for the inclusion or exclusion of a school component given the data that AIR has shared?

Gisela Field: This is Gisela Field again. I propose that we exclude – do not include – the school component in the model.

Stephanie Hall: I second that. This is Stephanie Hall.

Ronda Bourn: Hey, Stephanie. Any further discussion? Those in favor of excluding the school component from this model, please signify by saying aye.

Female: Aye.

Female: Aye.

Male: Aye.

Female: Aye.

Male: Aye.

Female: Aye.

Ronda Bourn: Anyone opposed? The motion carried. Kathy, what do you need from us next?

It looks like we are on to EOCs.

Kathy Hebda: It doesn’t come through.

Harold Doran: All right. Kathy, would you like me to start?

Kathy Hebda: Go right ahead.

Harold Doran: We’re now on slide number 23. And, of course, there are multiple EOC, or end-of-course, tests, and the one that we will talk about today is Algebra I. We’re on slide number 24.

A little bit of background. The 2010-11 school year was the first time the EOC for Algebra was administered. In addition to that, students took the comprehensive math FCAT in grades 9 and 10, and those are no longer administered in those grades.

Slide 25: all students who were enrolled in the course were required to take the exam. According to state law, the EOC for Algebra must count for at least 30 percent of the student’s course grade for students entering grade 9 in 2010-11. At the state level, there are no stakes attached for any of the test takers.

There may have been local stakes, but not at the state level. And there may be student effort that differs by grade, which could impact the VAM scores. Going to slide number 26, we have the N sizes here.

I’m also going to use this as an opportunity to introduce some work AIR did this year. Now, in total, the all-students model has 182,889 kids. Then you see grade 8 and below, grade 9, and grade 10 and above: the 65,000, 99,000 and 18,000.

So, first off, if you were to add up the grade 8 and below, the grade 9, the grade 10; you would get 182,000. What we did is we ran the EOC model in four different ways. In version one, we call it the all-students model.

What that means is every student who took the EOC was included in one big analysis, all at once. Then we ran a second model, called grade 8 and below: we identified those students who took the EOC in grade 8 or below and included them in a model all by themselves. Then we identified those students who were in grade 9 and ran a model alone for them, and then another model for grade 10 and above.

My understanding is that the typical grade, the expected grade, is grade 9, and you can see that reflected in the sample sizes: about 100,000 kids took the algebra test in grade 9, and the N sizes are smaller in the other grades. Moving on to slide number 27, here we have to offer a grade-by-grade definition of what we mean by the term prior score. With the SAT-10, the definition of prior score was always the same.

All the kids were in grade two and so their prior scores will be grade one. But for the algebra EOC, we have different prior scores depending on what grade they took the test, the algebra test in. So, for example, if the student was in grade 8 when they took the EOC – yes, the EOC, we would use the grade 7 and grade 6 FCAT scores.

We would not use – if a student took the test in grade eight, we would not use as a predictive variable their grade 8 FCAT score. You cannot use as a predictive variable a second test that was taken at the same time. You have to use something that comes from at least the prior test.

So, for 12th grade students, 11th grade students and 10th grade students – it turns out, all the high school kids – we always use 8th grade and 7th grade scores as predictive variables. The grade 9 and 10 FCATs will not be administered in the future, and so they were dropped for this analysis.

And then you can see, for grades 8, 7 and 6, the scores that were used as predictive variables. Now, we always use the FCAT as the predictive variable, in addition to those other covariates we introduced at the beginning, but they come from different grades depending on what grade the student is currently in.
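
[To keep the prior-score definitions straight, a small lookup following the description on slide 27; the exact grade-to-grade mapping here is reconstructed from the discussion and should be treated as illustrative.]

    # FCAT grades used as predictor scores, keyed by the grade in which the student
    # took the Algebra EOC (the grade 6 entry is assumed; the transcript notes that
    # sample is very small).
    PRIOR_SCORE_GRADES = {
        6: [5],
        7: [6],
        8: [7, 6],
        9: [8, 7],
        10: [8, 7],   # all high school test takers fall back to grades 8 and 7,
        11: [8, 7],   # since the grade 9 and 10 FCAT is no longer administered
        12: [8, 7],
    }

    def prior_grades(current_grade: int) -> list:
        """Return the FCAT grades used as predictors for a student tested in current_grade."""
        return PRIOR_SCORE_GRADES[current_grade]

    print(prior_grades(12))  # [8, 7]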

All right, slide number 28: we’re about to show you some descriptive statistics. And we’ll also note that we see some issues in the VAM scores; we will circle back to that later.

But let me go on slide 29. Similar to what you saw before, these are boxes that reflect – this is aggregated by grade. The dot in the center is the median.

And what we see – the grade 6 sample size is very small; let me just point that out. So what we see is that the earlier a student takes this algebra test, the higher their EOC scores are.

So grade 7 students on average performed the highest on the algebra EOC. Grade 8 students performed higher on average than students who take it in high school. Students who take it in grades 11 and 12 tend to have the lowest average scores.

That makes sense. We understand that students who take it in those later grades are probably either students who have remedial instruction and just simply weren’t prepared to take the algebra course early on. They may be new to the state and, hence, not as familiar with the curriculum or they may be retakes.

So, this follows a pattern that appears to be sensible and reflects what is plausible in the real world. The grade 8 students and the grade 7 students are probably very advanced. That’s why they’re taking it so early and their scores reflect that.

The next slide on slide 30 is the same exact type of plot. But, here, what we’re showing you is their prior year FCAT scores. The first predictive variable that's used.

So, again, we see…

Gisela Field: Harold, I’m sorry to interrupt, but they haven’t moved the slide. You’re still on 29.

Harold Doran: Oh, thank you, Gisela. I’ll wait until it’s there. Someone let me know when you're all there.

Kathy Hebda: Slides now on 30.

Gisela Field: It’s there.

Harold Doran: OK. Thanks, Kathy. So, now, looking at slide 30, same exact type of plot, but this is plotting the prior FCAT score.

The story to tell is that students in grades 7 and 8 who took the algebra test had the highest prior year FCAT scores relative to all the other kids in grades 9, 10, 11 and 12. So kids in grades 11 and 12 have lower prior FCAT scores than students who took the algebra test in grades 7 and 8. So, again, we see two patterns here on slides 29 and 30.

The later a student takes the algebra test, the lower their algebra scores and also the lower their prior FCAT scores. Moving on to the next slide, slide 31. Here is the exact same kind of plot, but disaggregated by grade and by gifted status.

And so, we can see for example just looking at one of those plots, the grade 8 panel, we see that students who are gifted in grade 8 performed better on average than students who are not gifted. That’s also true in grade 7, grade 9, 10, 11 and 12. Again, just like we showed you with the Stanford 10, we see a very similar pattern here.

Students who are gifted on average tend to do better on the algebra test than students who are not. No anomalies there. Next slide, slide 32.

Students who are identified as SWD on average tend to have lower algebra scores than students who are not SWD. Same as we would expect, and same as we see with FCAT.

No anomalies here. This is just on the algebra score, not on the value-added scores. Slide number 33, this is a very important slide.

Let me remind you of something from the very beginning of today’s conversation. One of the things we indicated makes for a good predictor is that we want the prior test score to have a high correlation with the outcome score. What this slide, slide 33, shows you is the correlation between the FCAT score and the algebra score.

The grade 6 N size, again, is very small, so let me move on to grade 7. You see that in grades 7, 8 and 9, the correlation between the prior FCAT score and the EOC is always above 0.6.

You see in grades 10, 11 and 12 that is always about 0.5. Why is that? Remember, for students in grade 7, 8 and 9; we have an FCAT score that is available just last year.

For grade 9 kids, we use their grade 8 score. For grade 8 kids, we use their grade 7 score and for grade 7 kids, we use their grade 6 score. But for students who are in grade 12, the prior scores, their 8th grade score that was collected four years ago.

A lot of learning has transpired between 8th grade and 12th grade. And so there is not an expectation here that the FCAT score would in fact be a good predictor of their algebra score, and that is reflected in these correlations.

And that’s why. Grade 10, their prior score is 8th grade. Two years ago.

A lot has happened in those two years. And you can see that these correlations when the time lag is more than a year, the correlation drops. Moving on to slide 34.

Just the summary here. We see that these are just the things I pointed out to you as I walked through these slides. We see large differences between groups.

But it’s exactly as we would expect; nothing is alarming. The reason we look at this is to answer the question: do the data follow the same patterns we would expect? The answer here is yes.

The only initial concern – because we have to answer the red flag question; remember, that’s the second thing we’re looking at with the descriptives – is the correlation between the prior score and the algebra score for students in grades 10, 11 and 12. Those correlations are low.

Lower than what we would hope for. That is one possible red flag. And so, now, we’re going to move on to the results and see if those red flags caused for there to be any issues.

I'm looking at slide number 35. And just like we showed you with the SAT-10 and we’ve shown you before with the FCAT, we’re looking at the teacher and school standard deviations. Remember, one of the things that we showed you before or one of the indicators that we describe before is that the teacher bar should be taller than the school bar.

There’s a reason for that. We expect for teachers to be more different from each other than schools are more different from each other. If we were to do a level score analysis, not a VAM, no covariates, just take this year score and do an analysis and look at the differences between teachers and the differences between schools, we would see larger differences between schools than we would between teachers.

In that world, that would be – that would be expected. But in a value-added growth world, you expect the opposite. You want the teacher bar to be larger than the school bar.

So, now, moving on to slide number 36, what we see is – I’m sorry about the labels – the yellow or green bar here is the school bar, and the red bar is the teacher bar. Other than in grade 9, the school bar is always larger than the teacher bar.

And that is the opposite of what we saw with SAT-10. It’s the opposite of what we saw with FCAT in all grades and subjects. The difference in grade 9 is also very, very small.

The data are very, very similar. So, that the – even though, the bar for grade 9 teachers is a little bit bigger, those numbers are pretty close to being similar. So, this is one red flag.

Some thoughts on the variance components. This is just describing the level score analysis: if we did the level score analysis as before – if you take the level scores and look at the differences between schools and the differences between teachers – this is what you see. We see larger differences between schools than we do between teachers.

Let me move on to slide number 38, and let’s look at the second diagnostic, the R-squared. The R-squared is an indicator of model fit. At the time we wrote these slides, the R-squared on some of the other FCAT results was about 0.6 to 0.64.

We’ve seen larger ones with the newer data sometimes up in the 0.7 range. Moving on to slide number 39, here are the R-square values for the algebra EOC. There are four R-squares, one for each of the models that we run.

Here, we see 0.59 for all kids, all grades. 0.51 for grade 8 and below, 0.49 for grade 9 and 0.31 for grade 10 and above. These are very low R-squared values.
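(To make the R-square diagnostic concrete, here is a small, self-contained Python sketch with synthetic data; the 0.7 coefficient is invented purely to show that a weaker prior-score relationship pulls the R-square down, and is not taken from the actual models.)

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    prior = rng.normal(size=500)                          # synthetic prior scores
    eoc = 0.7 * prior + rng.normal(scale=0.7, size=500)   # synthetic EOC scores

    X = prior.reshape(-1, 1)
    r2 = LinearRegression().fit(X, eoc).score(X, eoc)     # R-square of the fit
    print(round(r2, 2))  # shrinking the 0.7 relationship would shrink this value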

Now, let’s move to slide number 40. This is a box plot. And what I did is I took the teacher VAM scores and I disaggregated them by grade.

And one of the things we see is that the grade 7 and 8 teachers, on average, have the highest VAM scores. And, again, the teachers in grades 11 and 12, on average, have the lowest VAM scores. Let me just tell you how I defined a teacher’s grade level here.

There are no, or very few, teachers who have classes that are entirely grade 10 or entirely grade 9. So, in order to produce this plot, I call a teacher a grade 9 teacher if most of their kids were in grade 9, a grade 11 teacher if most of their kids were in grade 11, and so on. This is just a heuristic to get a sense of what’s happening; a small sketch of it follows.
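(The grade labeling heuristic can be sketched like this in Python, assuming a hypothetical roster table; the toy values are only to show the mechanics.)

    import pandas as pd

    # Hypothetical roster: one row per student on a teacher's Algebra I roster.
    roster = pd.DataFrame({
        "teacher_id": [1, 1, 1, 2, 2, 2],
        "grade":      [9, 9, 10, 11, 12, 11],
    })

    # Label each teacher with the most common grade among their students.
    teacher_grade = roster.groupby("teacher_id")["grade"].agg(lambda g: g.mode().iloc[0])
    print(teacher_grade)  # teacher 1 -> grade 9, teacher 2 -> grade 11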

It’s imperfect because there are no classes that are perfectly homogenous with respect to grade. But there’s an important story here. Remember early on, we saw that students in grades 7 and 8 have the highest algebra scores and students in grades 11 and 12 have the lowest average algebra scores.

What we see here is that the teachers in grades 7 and 8 have the highest VAM scores based on the tests, and the teachers in grades 11 and 12 have the lowest VAM scores on average. So, the VAM scores are following the same pattern as what we would call the level scores. This is a little bit of a concern.

It goes to the question: if we were to ask, do teachers look good because of who they taught or because of what they did? You would have to come up with an answer to that.

The grade 8 teachers here appear to have the highest VAM scores. But we also know they received the kids who are performing the highest. So, is this really a reflection of the teacher or is it a reflection of who they taught?

The other question is: maybe it’s true in the State of Florida, and I don’t know, that the teachers who are the best at teaching algebra are in the middle schools and not in the high schools. And that could be the reason we see this pattern. Again, these are just hypotheses.

There’s no way to know with certainty which is true, but we have to at least explore why these trends seem to be true. Let’s leave slide number 40; I'm going to skip 41 and move directly to slide number 42.

Slide 41 just introduces the impact data; we had that slide earlier. Let’s look at the impact data.

Here on slide 42, we see the relationship with the percent SWD in a class. And, by the way, you see one panel at the top called grade 8 and below, mean prior. What we did early on, and I'm sorry I didn’t mention this before, is we saw that the average prior score seemed to have an impact on value-added estimates.

And so, we ran one model where we control for the average prior score of the class. Now, for the most part, these regression lines were pretty flat. The grade 8 and below line looks a little less flat.

Male: (Inaudible).

Harold Doran: But that’s only because it’s being pulled down by the 100s all the way to the right hand side. I’ll show you the correlations in a little bit. The correlation there is really small.

Stephanie Hall: Harold?

Harold Doran: But back to the relationship between the VAM score and the SWD status here.

Stephanie Hall: Harold, I'm sorry. I have a question. This is Stephanie Hall.

I – when you say that you controlled for the prior year data, what is that – can you explain that?

Harold Doran: Yes. So, we have the variables in the model, like prior test scores and SWD status and so on. Those are our covariates.

What we do here in this model is we also use the average prior score as another covariate, and we’re trying to use that as another way to help us level the playing field. In other words, we know that teachers differ because they receive kids who, on average, performed differently. And so, in order to try and level the playing field, we use the average score.

So, a grade 9 teacher receives a group of kids with 8th grade scores, and we use the average 8th grade performance of that class as a covariate. That’s an attempt to level the playing field, to account for the differences in who teachers received. Another way to think about it: statisticians sometimes like to talk about peer effects, and this is one way to try and control for what’s called the peer effect.
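(A sketch of that adjustment, assuming a hypothetical student file with a class or section identifier; the real model adds this as one covariate among several.)

    import pandas as pd

    df = pd.read_csv("algebra_students.csv")  # assumed columns: class_id, prior_score, eoc_score

    # Attach each class's average prior score to every student in that class,
    # so it can be used as an extra covariate (a peer-effect style adjustment).
    df["class_mean_prior"] = df.groupby("class_id")["prior_score"].transform("mean")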

Moving on to slide number 43, same thing here. We see almost no relationship, very, very small, between the VAM score and the percentage of kids who were ELL in a class.

The grade 8 slope was a little bit different than the others, and that’s why I noted here that the correlation is only minus 0.08. That’s tiny. The way this graphic is created makes the slope look bigger than it actually is.

The correlation here is very small. Moving on to slide 44. This is the relationship with the percent economically disadvantaged in the class.

And, here, the slope is always going down, meaning the relationship, the correlation, is negative. And in grade 8, we see that the correlation is minus 0.34.

And in a few slides I'll actually show you the data, so you’ll see what the correlations are across the board. But minus 0.34, what does that mean? That means that as teachers have more students who are identified as economically disadvantaged, they, on average, tend to have lower VAM scores.
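(The impact data check itself is just a correlation at the teacher level. A sketch, assuming a hypothetical teacher-level table with these column names:)

    import pandas as pd

    teachers = pd.read_csv("teacher_vam.csv")  # assumed columns: vam_score, pct_frl

    # Correlation between a teacher's VAM score and the share of their students
    # identified as economically disadvantaged; values near zero are the goal.
    print(teachers["pct_frl"].corr(teachers["vam_score"]))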

In this world, a correlation of 0.34 is very big. Moving on to slide 46, this is the correlation with gifted. Now, we’re controlling for percent gifted in the model.

Kathy Hebda: Harold, did we lose you?

Harold Doran: No, I'm still here. But I wasn’t sure if you all could hear me because of the buzzing. All right.

It sounds like you can hear me.

Kathy Hebda: We can. Yes.

Harold Doran: All right. So, looking at slide number 46, we see the relationship with the percent gifted. It’s not shown here as a correlation, but there is a slightly positive correlation.

Again, I’ll show you what the number is in just a few slides. But it’s an upward trend. Meaning, that teachers who have more gifted students in their classes tend to have larger VAM scores on average.

The next slide is the percent nonwhite. Here, there is almost no relationship at all. There is no particular advantage or disadvantage to any teacher given who they teach, from that perspective.

Now, let’s go to slide 47. And slide 47 shows the relationship between the VAM score and the average prior achievement. Now, in some respect, we already showed you this.

We showed you the average prior FCAT achievement for the algebra kids, and we showed you that the grade 7 and 8 kids tend to have the highest prior scores and that teachers in grades 7 and 8 tend to have the highest VAM scores. Here, it’s just showing the same thing, that the relationship is positive.

Meaning, that teachers who receive higher performing students on the FCAT tend to have higher algebra VAM scores. The correlation is about 0.39 in the grade 8 and below model. Let’s go to slide 48 and let’s look at these correlations.

Now, just to interpret these correlations for you, let’s look at the all column and go down. The all column shows that gifted is 0.15, positive, and as a reminder, that’s the model that includes all of the kids in all grades. For FRL, free or reduced lunch, that is, economically disadvantaged.

It’s negative. It’s minus 0.29 in the all model. Now, if you go across the row, you see what that correlation is in the different models when it includes different groups of kids.

That’s how you interpret this correlation table. So, we see in the prior score column that the correlation is pretty large. It’s 0.3, 0.39, 0.13 and then it gets very small in grade 10 and above, 0.04.

That reflects the fact that the prior scores are not good predictors of the outcome on the test. That’s why there is just not much of a relationship there. But in the early grades, grades 8 and below, there is a strong relationship.

Let me move to slide number 49 and just summarize what’s happening there. Essentially, slide 49 is saying this.

There is an advantage that exists for certain groups of teachers. In particular, the correlation with the mean prior achievement is indicating that teachers, not just in grades 7 and 8 but in all grades, who receive higher performing kids on average tend to have higher VAM scores. Why that is true is hard to know. Is it because of what the teachers did?

They're good teachers. They just happen to also receive good kids. Or is it because of who they received?

In other words, they may not be great teachers, but they were lucky in that they received very motivated and high performing students, and their VAM scores reflect that.

Anna Brown: Harold, this is Anna from Hillsborough. I have a question. May I ask it now?

Harold Doran: Yes. Hi, Anna. How are you?

Go ahead.

Anna Brown: I'm good. I wanted to just comment and I want to not delay or belabor your presentation. But I want to ask this.

In our model, you know, we’ve worked through some of this. We did find some of the same issues that you're finding with our algebra EOC models. And as a result, what we found worked very positively for us is that we know, by placement data and so on, that the artifact we’re seeing is really truly related to the varying kinds of students that take algebra in 7th and 8th grade versus the students who take algebra in 9th grade and beyond.

And in order to have a more equitable distribution and a more equitable model, we actually divided them and ran them separately. We and our consultants decided that we could not compare, in the same model, teachers who teach algebra in middle school and teachers who teach algebra in high school. So, we ended up running them separately.

So, we have a model that provides residuals for teachers who teach in middle school and a separate model that provides residuals for teachers who teach in high school. And we found that to be much more correlated with all of the other measures as we go through. So, my question then is, have you done any of those statistics and is that included here, or are we looking again only across the board?

Harold Doran: Anna, that's exactly what we’re looking at, at slide 48 for the exact same reason. We call it chunking. We ran different models depending on the grade.

So, we run one model called all that includes all students in all grades.

Anna Brown: Yes.

Harold Doran: For the same reason you noted, we ran a separate model called grade 8 and below, because some of our team and some committee members and so on thought maybe it would be sensible, because those kids are maybe not comparable to high school kids. Maybe they're different.

Let’s run a separate model. And so, that’s what we did with grade 8 and below, then grade 9, and then grade 10 and above.

So, it sounds like we did something very similar to what Hillsborough did. Now, when we look at the impact data, even just for grade 8 and below, we still see that the correlation between the prior score and the VAM score is still positive at 0.39. So, even if we have a model which is grade 8 and below, and you compare the grade 8 teachers and grade 7 teachers to each other, there is still that correlation.

Now, again, whether or not that correlation means something bad, I don’t know. But we did something very similar to what you did, if not the exact same thing, for the same reason.
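(The chunking idea described here amounts to fitting the same model separately within grade bands. A simplified sketch, assuming a hypothetical student file and using only the prior score as a covariate:)

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("algebra_students.csv")  # assumed columns: grade, prior_score, eoc_score

    # Fit one model per grade band instead of pooling middle and high school together.
    bands = {
        "grade 8 and below": df[df["grade"] <= 8],
        "grade 9": df[df["grade"] == 9],
        "grade 10 and above": df[df["grade"] >= 10],
    }
    for name, chunk in bands.items():
        X = chunk[["prior_score"]]
        fit = LinearRegression().fit(X, chunk["eoc_score"])
        print(name, round(fit.score(X, chunk["eoc_score"]), 2))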

Anna Brown: OK. Thank you.

Harold Doran: You're welcome.

Gisela Field: Harold, let me ask, this is Gisela Field. Were the data you used the 11-12 algebra scores with the prior year predictor, or was it the 10-11 baseline algebra after the cut scores were set?

Harold Doran: Christy, do you remember, did we use the 10-11? Is that right?

Christy Honavitz: Yes.

Gisela Field: Yes?

Harold Doran: Yes. 10, 11, Gisela.

Gisela Field: OK. So, the 10-11 scores were the ones taken without any real ramifications. Meaning, it didn’t really matter. I know it counted for 30 percent, right?

It didn’t really matter whether they passed or didn’t pass their test.

Christy Honavitz: But then we did re-run it with the 11-12 data. So, we initially modeled it, made some decisions about the model, and then re-ran it with the 11-12 data.

Gisela Field: The reason I'm asking is because when you run it with the 11-12 data, the issue you're going to have is that you’ve got two different tests, right? The 12th graders' prior scores go back to grade 8 or 9, taken on the old FCAT. The 9th and 8th graders would have been taking the new FCAT, with an equipercentile-linked score.

And so, now, you're talking about very different prior year data, a different test, whether or not you equipercentile-linked it. You're talking about a whole different test being thrown into the model depending on what grade level you were in.

Harold Doran: Oh, thanks for pointing that out. I'm going to confirm this after we hang up. But I think the data we’re presenting to you here is based on the reanalysis in the second year, and for the students who have the new FCAT, we used that as a predictor.

The FCAT 2.0. And for students for whom that score was not available, we used the FCAT 1.0. (Juan), does that strike you as right?

Male: I think it would have to be the case if they model any 11th or 12th grade scores, for example, or even 10th grade scores. But I think Gisela is on to a good point. I think we talked about this last time, and it is one of the potential hypotheses that, you know, we changed standards pretty significantly in Florida in math.

There is a difference between FCAT and FCAT 2.0 when it comes to math. And so, perhaps the new standards, the 2.0, may be a better predictor of algebra performance than the original FCAT. I know we had talked about exploring that as a potential issue, and I’m looking at the grade 9 model here on the slides in front of me.

And you see that’s probably the least worrisome when it comes to the correlation. But you would think the grade 8 and below model would also be a case where you're using the FCAT 2.0 as a predictor, and the correlation is already headed in that direction there. So.

Gisela Field: The question I have, and exactly what I was thinking of, is that there is a mismatch here of assessments. But if you looked at the 11-12 algebra data and only used one prior year as the predictor, which would put you on the FCAT 2.0, using a model similar to the Stanford one, would that give you different results? And, really, I want to say we can almost ignore grades 10 through 12 right now because there are no repercussions for those kids, right?

I mean, if you’ve got a 12th grader last year that’s still taking algebra, he sat for the EOC, and it was an irrelevant thing for that student. It didn’t matter. So, the real question is: is the model stronger if you only use the new FCAT 2.0 to predict the algebra?

Harold Doran: I think that...

Gisela Field: What I hear off the record – what I hear from teachers is that the new FCAT is highly correlated to the algebra and geometry. Much more highly correlated than the old FCAT. Again, that’s teachers’ comments.

But I’d be curious whether or not that changes the statistics.

Male: That's a good empirical question. We could just limit this to only students who had prior year FCAT 2.0 data. And so, that would essentially take out the grades 10 and above.

But, as you mentioned, maybe that’s the tack to take, and perhaps come up with other alternatives for those folks in the upper grades.

Gisela Field: I guess what I'm saying is I don’t want to conclude that the FCAT is a poor predictor, because we have such a wide variety here. In combination, you also have retakers, you know, on the FCAT grade 10. You’ve got the retakers, necessarily, with all the cut scores and passing and all of that. So, you have such a mix of data that I would hate to simply say that the FCAT here is not a true predictor when it may be a little different. That’s all.

Male: I think scores are going in.

Male: I had a question about the grade 8 and below model, the one with the more recent grade 8 priors. Some of these correlations begin to look like the ones that we saw with the SAT-10 model.

Is it possible that we could get those FRL and prior score correlations closer to zero with the removal of the school component?

Harold Doran: So, we were betting that someone would ask that question. I actually have the data here. The answer is yes.

I will provide these to (Juan) and Kathy, and they can share this with the SGIC. Let me give you an example.

The answer is absolutely yes. Let’s just take the all-students model. And for those of you taking notes, I'm going to give you some correlations for the results that do not include the school component.

So, for example, let’s look at a couple of the larger correlations. Look at the mean prior. If the school component is not added in, then the correlation between the VAM score and the average prior is 0.19, right?

So, it is much smaller. I’m flipping between my slides here. It’s much smaller than the 0.30 that we have in the results showing here now, OK?

That’s 0.19. Now, 0.19 is still pretty big, right? I mean, it’s not – it’s not huge, but it’s noticeable, right?

If we look at the percent economically disadvantaged, this is for the all-student model again: the correlation that you have in front of you, the FRL, that’s minus 0.29. But if I were to exclude the school component and run that correlation, it would be minus 0.17.

So, good question. What does that mean? It means the same thing could be true in this data as we discussed with the SAT-10.

The school component is capturing some bias; it’s pulling it out. But when you add it back into the teacher component for this model, it could be adding back in a little bit of bias.

So, I can answer this question just generally: if the school component were not added in, all of the correlations would in fact be smaller.
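(What is being done here can be sketched as recomputing the impact correlations under two versions of the teacher score, with and without the school component added back. The table and column names are assumptions for illustration:)

    import pandas as pd

    teachers = pd.read_csv("teacher_vam.csv")  # assumed: teacher_effect, school_effect, mean_prior, pct_frl

    combined = teachers["teacher_effect"] + teachers["school_effect"]  # school added back
    teacher_only = teachers["teacher_effect"]                          # school left out

    # Compare impact-data correlations under each choice; the point above is that
    # adding the school component back tends to make these correlations larger.
    for label, score in [("with school", combined), ("teacher only", teacher_only)]:
        print(label,
              round(teachers["mean_prior"].corr(score), 2),
              round(teachers["pct_frl"].corr(score), 2))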

Gisela Field: Harold, that now makes a lot of sense, because when you're talking about middle school, you may have only had one teacher teaching algebra, right? So, now, you're looking at a school component for one teacher as opposed to, you know, eight or nine.

Harold Doran: We did look at that.

Gisela Field: You know, I think that's going to be especially prominent in the middle school, because not every child was enrolled in algebra at that time, right? There are more kids enrolled now. But I can see why it would be a negative correlation for the school component.

Harold Doran: Let me try and summarize just a couple of things. Sorry to flip back to my slides. I'm trying to summarize where I think we are.

Looking at slide number 50. The correlation with the percent gifted decreased. What I mean by that is we initially gave Florida some results that did not include the percent gifted in the class, and we saw that to have a large correlation.

So, then we ran a model where that was included as a covariate. And when it’s included as a covariate, the impact correlation goes down. The correlation with economically disadvantaged goes down, right?

Now, remember, that’s one variable that cannot be controlled for statistically in the model under Florida law. The variance components have a reversed pattern: the school bar is larger than the teacher bar.

The R-square values show you that the R-squares are relatively low. Those R-squares are low because the correlations with the prior scores are also low, lower than what we see with the FCAT. So, in this particular instance, there are a few red flags in the descriptive statistics.

There are a few red flags in the disaggregated VAM scores. There are a few red flags in the impact data. Now, what would resolve those issues could be changing the test scores used, as Gisela has noted.

It could be that these results are fine and there are plausible explanations for why we see what we do. It could be removing the school component from this model, and also using different prior scores. So, there could be different solutions.

It looks like we may still need to do some work on this model. The results we have here do not, on their own, look like as strong a model as what we see with the SAT-10 or FCAT.

Stacey Frakes: This is Stacey Frakes in Madison County. May I speak?

Harold Doran: Yes, go ahead.

Stacey Frakes: I'm in a small rural county and I work at the only middle school, and my concern with these results, especially in terms of school components, is that last year we only had 21 students at the middle school level who took the algebra I EOC. These were our best and our brightest. So, is this enough to run the model and to determine whether the school effect would help or hurt our school?

At this point, I don’t think there is enough information. This year, all eligible students are taking algebra I, and, you know, I think running the model may look very different. Like I said, we’re a small rural county with a high population of economically disadvantaged and minority students.

Harold Doran: All right. Let me – let me answer that question. Let me be very clear about something because this is an important point for the SGIC to remember.

Now, we looked at this. There are always some schools that have only one or two teachers within the school who taught the subject. In large part, that is not necessarily true across the state.

It is true in some instances, particularly in those small rural areas, as you know. Now, if we run the value-added model and we drop the school component altogether, this is very important: it is equivalent to adding 100 percent of the school component into the teacher effect.

So, there are enough teachers in the state and in schools that we can actually estimate and pull that thing out. I don’t think it would be a good idea to run a model where you drop the school component entirely, because all you're doing when you do that is de facto adding 100 percent of the school component to the teacher component. That has to go somewhere.

The model says, well, it’s not here, so I'm going to push it into the teacher component. The thing that you could do is we could still run the model that includes a teacher and school component.

There are enough teachers in the state to estimate that parameter, but then we just don’t add the school component back to the teacher component, and those are important distinctions. Now, on the other hand, if it were true that in every single school in the state there was one teacher, and one teacher only, who taught the subject, then we could not estimate both a teacher and school component. And in that case, my recommendation would be different.

But that is not what we have here.

Male: This is also because if we removed the school component completely, that would dramatically reduce the model fit, right, as well?

Harold Doran: I don’t know if it would reduce the model fit. That I don’t know. But it is equivalent to adding 100 percent of the school component to the teacher component.

And as you can see with the impact data from our conversation, we know that would cause the correlations in the impact data to get larger, not smaller.

Male: So, the teacher component standard deviation would increase in a model without the school component, almost equivalently, in the parlance of slide 36, to taking that green bar and adding most of it to the red bar.

Harold Doran: Exactly right. All of that green bar would just get pushed into the red bar, because the statistical model would say, well, I'm not estimating this school thing anymore, I don’t know where to put it, so let me put it into the teacher component.

It’s exactly right.

Gisela Field: Harold, this is Gisela. So, how is – how is this different from the Stanford scenario?

Harold Doran: It’s not. In the Stanford scenario, the model is exactly the same. There is a teacher component and a school component that are estimated.

And what the SGIC decided on earlier was, let’s just not add the school component to the teacher component.

Gisela Field: OK.

Harold Doran: In that particular instance, what I'm saying is the model should be the same. It should separately estimate a teacher component and separately estimate a school component, and you might consider doing the same thing you did with the SAT-10: just don’t add the school component to the teacher component.

Gisela Field: OK. I'm sorry that I misunderstood. OK.

That's what I thought we wanted to hear. OK. OK, thanks.

Stephanie Hall: Harold, this is Stephanie from Brevard County. I'm concerned about the red flag for grades 10, 11 and 12 on slide 33, with their values being 0.51, 0.52 and 0.51, respectively. I would be in favor of taking the school component away rather than adding 100 percent of the school component in, because I'm concerned about that prior year data; in grades 10, 11 and 12, their prior year data is so far away, it seems like that is what has dropped that number down.

Harold Doran: You're exactly right. The prior year data was so far away for those kids, and so much has happened in the years in between, that that prior score, that covariate, is just not as good a predictor as it is for students whose FCAT score was just last year.

Stephanie Hall: But is that going to be enough to make those numbers, 0.51, 0.52, 0.51, increase to a point where we feel that would be a good value or a good predictor, like with the other grade levels at 0.62 to 0.7?

Harold Doran: Well, there are only two things to look at. I think I heard (Juan) and Gisela say that the FCAT 2.0 is a better predictor of algebra performance. These particular kids don’t have FCAT 2.0 scores, and I don’t know when they would become available.

I don’t know when the FCAT 2.0 was first administered. Only time will tell. Once they have FCAT 2.0 scores, then we could rerun the correlation and see if it is in fact higher.

So, I think what Gisela was recommending is a policy decision. If it is true that the FCAT 2.0 is in fact a better predictor, then you could subset the model to only students who have those scores. Right now, that wouldn’t include students in grades 11 and 12, I don’t think.

I need to check that. But over time, it would include those kids, and then the correlation would be higher, we’re hoping, because it’s a better predictor.

Gisela Field: I think, from your answer, Harold, what’s going to eventually happen is that when the 10th graders who were 9th graders last year, who took the algebra and didn’t pass it, retake it and/or move on to geometry, then you're going to start to see better data for grades 10, 11 and 12. But the bottom line, to me, is that there is a line that divides this analysis: grade 9 and below versus grade 10 and above, because it’s different prior year data and different requirements.

So, it’s almost not worth looking at, at this point. It may not be a true picture of what the data would show.

Stephanie Hall: I agree.

Harold Doran: OK. So, my presentation on the algebra is done. Ronda and (Juan) and Kathy, I’ll turn it back over to you.

I, of course, will continue to answer questions as you need them.

Ronda Bourn: Further comments, questions, concerns about the algebra I EOC model? Hearing none, I think our job now is to determine whether or not we want more information about the algebra I EOC model before we make any motions.

And if there – if that is true, what information do we want?

Female: From Miami-Dade County, I think what we would like to see is to run the model again using only students who have FCAT 2.0 data, even if it boils it down to only one prior year of data, to see whether or not the model is stronger.

Ronda Bourn: Harold, can we do that?

Harold Doran: Yes. I mean, technically, it’s all possible. And so, the way it works is, (Juan) and Kathy, you tell me what to do.

Kathy Hebda: This is Kathy. The answer to that would then be yes.

Harold Doran: Good.

Anna Brown: This is Anna. I have another question if it’s OK.

Ronda Bourn: Go ahead, Anna.

Anna Brown: I’m just wondering, and my brain is not remembering well, so we may have already discussed this at a previous meeting. But one of the things that we found useful in Hillsborough for most of our models, when we’re talking about 11th and 12th grade students, because there are in many cases going to be gaps in the pre-measures being available, is that we have combined and used several pre-measures.

So, we use not just FCAT, and I’m not sure if this is available in a state database for everyone, but we also use PSAT scores as a pre-measure when they’re available. We also use SAT and ACT, those types of college readiness scores, when those are available, and we found that by including multiple pre-measures, and/or one when the other is not available, we get a much stronger model and a little bit better prediction for our 11th and 12th grade students.

And we also have the ability because we’re using so many pre-measures, sometimes there is the ability to impute a score for a missing score. And I'm just curious as to what Harold thinks about that.

Harold Doran: I agree. I’d like (Juan Cal) to speak to what data are available. But if other prior scores, the ACT, PSAT, some other measure, were available and were more recent, it’s tractable and we could answer the question, is it a good predictor, because we could certainly test it out.

But whether or not they’re available in the state data warehouse, I would defer to (Juan) and Kathy.
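(One simple way to implement the multiple pre-measure suggestion, sketched under the assumption that the extra measures exist as columns in the student file; whether PSAT or PLAN data are actually available statewide is the open question being discussed.)

    import pandas as pd

    df = pd.read_csv("algebra_students.csv")  # assumed columns: fcat2_prior, psat_math, act_math

    # Prefer the most recent FCAT 2.0 prior; fall back to PSAT, then ACT, when it is missing.
    df["prior_measure"] = (
        df["fcat2_prior"]
        .fillna(df["psat_math"])
        .fillna(df["act_math"])
    )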

Kathy Hebda: This is Kathy. (Juan) is trying to check on that right now.

(Juan Cal): I will say that I know we certainly have student level ACT and SAT data. I do not think we have student level PSAT or PLAN data that can be linked to the other data. But I'm going to confirm that.

Anna Brown: Yes. (Juan), you should have PSATs for every 10th grader in the district because it is a state required assessment.

(Juan Cal): It…

Gisela Field: In other word to this point…

(Juan Cal): It’s available. PSAT or PLAN is available to all 10th graders. But in terms of getting that data from the College Board at an identifiable level…

Anna Brown: I get it. I see.

(Juan Cal): Actually...

Anna Brown: And we get that from the College Board. We receive a data tape. But you're right, you may not get that from them for the entire state.

(Juan Cal): Right.

Anna Brown: But we can provide it to you from our county.

Ronda Bourn: Any further information that we want about the algebra I EOC model? Are there any motions to be made in terms of this, or is the motion to bring back more information and examine this more in a subsequent meeting?

Anna Brown: Miami-Dade recommends that we do wait for additional data and then discuss, you know, the model again in a subsequent meeting.

Ronda Bourn: Is that a formal motion?

Anna Brown: Well, I’ll make it a formal one if it needs to be made. I just don’t want to put us in a position where there’s no need.

Ronda Bourn: Well, we’re going to vote on it. So – all right.

Anna Brown: So, I’ll make a motion that we hold off on the decision on the algebra until we receive additional data analysis that we can discuss.

Ronda Bourn: Any second?

Stephanie Hall: I second.

Ronda Bourn: Will you please identify yourself?

Stephanie Hall: Stephanie Hall from Brevard.

Ronda Bourn: Hey, Stephanie. The motion is seconded. Is there any further discussion?

All those in favor, please signify by saying aye.

Anna Brown: Aye.

Stephanie Hall: Aye.

Female: Aye.

Female: Aye.

Ronda Bourn: Any opposed, please say nay. And the motion carried. Kathy, (Juan)?

Kathy Hebda: OK. I just want to make sure that you were ready for us. I think the only thing we have left right now would be to make sure we know, at least in general terms, what you're interested in.

I have notes on a couple of things if you want me to read those back to you and then you can tell me if that’s correct or not.

Female: I think that would be great.

Kathy Hebda: OK, the first thing will be to run the model with only students who have an FCAT 2.0 prior score.

Ronda Bourn: Right.

Harold Doran: And that would be FCAT 2.0 exclusively. So, for example, if we have a student whose immediate prior year score is FCAT 2.0, but the score from two years removed is the old FCAT, we don’t want to include the old FCAT. We want to keep prior performance pure to FCAT 2.0, correct?

Gisela Field: Yes. That’s what I was recommending. This is Gisela.

Kathy Hebda: OK. That's number one. Number two was the discussion about running with additional predictors.

We were talking about ACT and SAT, and we’re going to check on PLAN or PSAT data to see what we have. So, to the extent that we can include any or all of those, we can do that if that’s the committee’s decision.

Ronda Bourn: And knowing that that would only apply to the high school students.

Harold Doran: Correct.

Kathy Hebda: Correct.

(Juan Cal): That's a good point, Ronda, because then with the first condition, the FCAT 2.0 limitation, those students would likely drop out of the model to begin with.

Kathy Hebda: So, perhaps we’re really talking about running two different analyses. One using only the students who have the 2.0 FCAT as the predictor variable and then a separate one with students who don’t but have other predictive variables that we could examine.

Ronda Bourn: That is what I was recommending.

Kathy Hebda: OK. We got that. And the dividing line may be between 9th and 10th grade, the way I think one of the committee members brought up on slide 33.

It looks like it might be. We’ll see how that works with the predictors. The other thing I understood you're asking was whether we, as Harold was already talking about, provide you the impact data when we run that model the same way we do now, where we estimate the teacher component and the school component, but in fact leave all of the school component out and don’t put 50 percent back in, to see if the impact data changes.

Was that something else you wanted done?

Ronda Bourn: I think yes. We all wanted to see that as well and see if it has the same effect that the SAT-10 did.

Female: Yes, I agree with that too.

Kathy Hebda: OK. So, those are the three things I took notes on. Is there something else, Ronda, that the committee would like us to do?

Ronda Bourn: I think there is also the question, and I just want to clarify this, that in my personal opinion, and I think I heard this from the committee as well during the discussion portion, you only use data from 11-12 forward, knowing that 10-11 may not have been the best effort of most students, considering that it wasn’t high stakes at that point.

Kathy Hebda: OK.

Ronda Bourn: So, to clarify, you mean, using the EOC data from 11, 12 forward?

Kathy Hebda: Yes. Correct.

Anna Brown: Miami-Dade agrees on that also.

Ronda Bourn: OK.

Harold Doran: And, actually, the first point, that’s an excellent suggestion. But the first limitation, on the FCAT 2.0 prior year, pretty much…

Ronda Bourn: Would get at that.

Harold Doran: Adds that as a requirement.

Ronda Bourn: Right.

Harold Doran: Eleven, twelve on EOC.

Kathy Hebda: OK. Anything else, Ronda, from the committee on the algebra I?

Ronda Bourn: I think that summarizes all the things that we had talked about.

Kathy Hebda: OK. Ronda, can you call me in my office for that? OK.

Probably because we’ve hit 11:28, Ronda, is it OK with you and the members if we jump to talking about the January meeting?

Ronda Bourn: I absolutely think that it is, especially because I think the geometry discussions are going to be long and protracted ones.

Kathy Hebda: They are, and some of the preliminary data starts to look just like the algebra I data. So, we’ll see if we can’t do some work on those in the meantime as well. Members, you can see that we are trying to schedule a January meeting face-to-face in Orlando, at our usual spot at UCF, to go over a lot of these data and, of course, other kinds of things that really lend themselves to a face-to-face meeting.

We’re looking at the end of January. We’ll follow up with some optional meeting dates for people to respond to. But if you could just give us an idea now: is the last week in January out for anybody?

Ronda Bourn: We’re going to need a second to check calendars.

Kathy Hebda: OK.

Gisela Field: Gisela Field at Miami-Dade is fine for the last week in January.

Kathy Hebda: OK. Is there anybody who has to rule out the last week in January?

Ronda Bourn: Kathy, this is Ronda. I have to rule out two days of it.

Kathy Hebda: OK. What two days are those?

Ronda Bourn: The 29th and the 31st.

Kathy Hebda: OK. Anybody else who has to rule out dates in the last week of January?

Tamar Woodhouse-Young: Tamar Young, the same dates.

Kathy Hebda: OK. I’ll give everybody a second to look at their calendars.

Lori Westphal: Kathy, this is Lori Westphal.

Kathy Hebda: Yes.

Lori Westphal: And I need to rule out the 30th.

Kathy Hebda: OK. OK, so, we’re going to send you a notice to mark on your calendars, to see which is best, the Monday or the Friday, and then we’ll set up a meeting and go from there. Ronda, is there anything else that you want to cover today before we close?

Ronda Bourn: No. I think we’re all set in that sense.

Kathy Hebda: OK. Well, members, I can’t say thank you enough for your participation today, as always, and I appreciate your patience with my giving you four or five different numbers to call in on today and finding the right one. I appreciate that.

And thanks to all for their presentations, and we will be in touch soon.

Harold Doran: Thank you, everybody.

Ronda Bourn: Thank you.

Female: Thank you.

Kathy Hebda: Hey, operator, we’re done with the call.

Operator: Thank you, ladies and gentlemen, this concludes today’s conference call. You may now disconnect.

END
