Less technical post about VAM: What ‘value-added’ is and is not

Yesterday, The New York Times released the value-added data on 18,000 New York City teachers collected between 2007 and 2010. Though teachers are irate and various newspapers, The New York Post in particular, are gleeful, I have mixed feelings.

For sure the ‘reformers’ have won a battle and have unfairly humiliated thousands of teachers who got inaccurate poor ratings. But I am optimistic that this will be looked at as one of the turning points in this fight. Up until now, independent researchers like me were unable to support all our claims about how crude a tool value-added metrics still are, even though they have been around for nearly twenty years. But with the release of the data, I have been able to test many of my suspicions about value-added. Now I have definitive and indisputable proof, which I plan to write about over at least my next five blog posts.

The tricky part about determining the accuracy of these value-added calculations is that there is nothing to compare them to. So a teacher gets an 80 out of 100 on her value-added — what does this mean? Does it mean that the teacher would rank 80 out of 100 on some metric that took into account everything that teacher did? As there is no way, at present, to do this, we can’t really determine if the 80 was the ‘right’ score. All we can say is that according to this formula, this teacher got an 80 out of 100. So what we need, in order to ‘check’ how good a measure these statistics are, is some ‘objective’ truths about teachers. I will describe three, and we will see whether the value-added measures support them.

On its website, The New York Times chose to post a limited amount of data: the 2010 rating for each teacher and that teacher’s career rating. These two pieces of data fail to show the year-to-year variability of these value-added ratings.

I analyzed the data to see if it would support three things I think every person would agree upon:

1) A teacher’s quality does not change by a huge amount in one year. Maybe they get better or maybe they get worse, but they don’t change by that much each year.

2) Teachers generally improve each year. As we tweak our lessons and learn from our mistakes, we improve. Perhaps we slow down when we are very close to retirement, but, in general, we should get better each year.

3) A teacher in her second year is way better than that teacher was in her first year. Anyone who has taught will admit that they managed to teach way more in their second year. Without expending so much time and energy on classroom management, and also by not having to make all their lesson plans from scratch, second year teachers are significantly better than they were in their first year.

Maybe you disagree with my #2. You may even disagree with #1, but you would have to be crazy to disagree with my #3.

Though the Times only showed the data from the 2009-2010 school year, there were actually three files released, 2009-2010, 2008-2009, and 2007-2008. So what I did was ‘merge’ the 2010 and 2009 files. Of the 18,000 teachers in the 2009-2010 data I found that about 13,000 of them also had ratings from 2008-2009.
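The merge of the two years’ files can be sketched with pandas. This is a minimal illustration with toy data; the actual released spreadsheets have their own column headers, so `teacher_id` and the score column names here are assumptions, not the real field names.

```python
import pandas as pd

# Toy stand-ins for the two released files (column names are assumed,
# not the actual headers in the NYC data).
ratings_2009 = pd.DataFrame({"teacher_id": [1, 2, 3],
                             "score_2009": [40, 85, 60]})
ratings_2010 = pd.DataFrame({"teacher_id": [2, 3, 4],
                             "score_2010": [30, 90, 70]})

# An inner join keeps only teachers rated in both years -- in the real
# data, about 13,000 of the 18,000 teachers from 2009-2010.
merged = ratings_2010.merge(ratings_2009, on="teacher_id", how="inner")
print(merged)
```

With real files you would load each spreadsheet first and then merge on whatever unique teacher identifier the files share.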

Looking over the data, I found that half of the teachers had a ‘swing’ of at least 21 points one way or the other. There were even teachers who had gone up or down by as much as 80 points. The average change was 25 points. I also noticed that 49% of the teachers got a lower value-added score in 2010 than they did in 2009, contrary to my experience that most teachers improve from year to year.
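The swing statistics are simple to compute once the two years sit side by side in one table. A sketch, again with toy numbers and assumed column names:

```python
import pandas as pd

# Toy merged table; the score columns are assumed names.
merged = pd.DataFrame({
    "score_2009": [40, 85, 60, 55, 70],
    "score_2010": [61, 50, 62, 90, 45],
})

# Absolute year-to-year change ("swing") for each teacher.
swing = (merged["score_2010"] - merged["score_2009"]).abs()
print("median swing:", swing.median())
print("mean swing:  ", swing.mean())

# Fraction of teachers whose rating dropped from 2009 to 2010.
went_down = (merged["score_2010"] < merged["score_2009"]).mean()
print("went down:   ", went_down)
```

On the real merged data, these three numbers are what the post reports: a median swing of 21 points, an average of 25, and about 49% of teachers dropping.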

I made a scatter plot with each of these 13,000 teachers’ 2008-2009 scores on the x-axis and their 2009-2010 scores on the y-axis. If the ratings were consistent, one would expect some kind of correlation, with points clustered along an upward-sloping line. Instead, I got:

With a correlation coefficient of .35 (and even that is inflated, for reasons I won’t get into right now), the scatter plot shows that teachers are not consistent from year to year, contrary to my #1, nor do a good number of them go up, contrary to my #2. (You might argue that 51% go up, which is technically ‘most,’ but I’d say you’d get about 50% with a random number generator — which is basically what this is.)

But this may not sway you, since you may think a teacher’s ability can change drastically in one year, and you may also think that teachers get stale with age, so you are not surprised that about half went down.

Then I ran the data again. This time, though, I used only the 707 teachers who were first-year teachers in 2008-2009 and who stayed for a second year in 2009-2010. Just looking at the numbers, I saw that they were similar to the numbers for the whole group. The median amount of change (one way or the other) was still 21 points. The average change was still 25 points. But the amazing thing, which definitely proves how inaccurate these measures are, is that the percent of first-year teachers who ‘improved’ on this metric in their second year was just 52%, contrary to what every teacher in the world knows — that nearly every second-year teacher is better than she was in her first year. The scatter plot for teachers who were new in 2008-2009 has the same characteristics as the scatter plot for all 13,000 teachers. Just like the graph above, the x-axis is the value-added score for the first-year teacher in 2008-2009 while the y-axis is the value-added score for the same teacher in her second year during 2009-2010.
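Restricting the analysis to first-year teachers is just a filter on the merged table. A sketch with toy data; the `first_year_2009` flag is an assumed column standing in for however teacher experience is identified in the real files:

```python
import pandas as pd

# Toy merged table with an assumed flag for teachers who were in their
# first year during 2008-2009.
merged = pd.DataFrame({
    "score_2009":      [40, 85, 60, 55],
    "score_2010":      [61, 50, 62, 90],
    "first_year_2009": [True, True, False, True],
})

# Keep only the rookies, then ask what fraction "improved" in year two.
rookies = merged[merged["first_year_2009"]]
improved = (rookies["score_2010"] > rookies["score_2009"]).mean()
print(improved)
```

On the real data this fraction is the 52% figure: barely better than a coin flip, for a group that everyone agrees improves almost universally.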

Reformers beware. I’m just getting started.

Continued in part 2 …
