Nov 07 2011

Does Value-Added Correlate With Principal Evaluations?

Perhaps the most controversial issue in Ed Reform is whether it is fair to tie teacher evaluations to ‘performance,’ which reformers define as how a teacher’s students do on standardized exams.  Since even reformers acknowledge that teachers aren’t able to take students from a low starting score to some absolute target of high performance, they have devised something intended to be fair.  It is known as ‘value-added.’

The idea, which has been around for about 30 years, is that there could be a way to compare how a teacher’s students do on some test with how those same students would have done in a parallel universe where they had an ‘average’ teacher instead.  If it were possible to make such a measurement, it would determine that teacher’s individual contribution to his students’ ‘learning.’
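To make the idea concrete, here is a minimal sketch in Python with invented numbers (real value-added models are far more elaborate, controlling for demographics, multiple prior years of scores, and more; the ‘district-wide growth pattern’ below is my own made-up stand-in for that machinery):

```python
import numpy as np

# Hypothetical data: each student's prior-year score and current-year score
# for one teacher's class (all numbers invented for illustration).
prior = np.array([62.0, 71.0, 55.0, 80.0, 68.0])
current = np.array([66.0, 74.0, 57.0, 85.0, 70.0])

# A simple "expected" score: pretend the district-wide pattern is that
# students typically score their prior score plus 3 points.
predicted = 1.0 * prior + 3.0

# The teacher's "value added" is the average gap between what the students
# actually scored and what the model predicted for them.
value_added = np.mean(current - predicted)
print(round(value_added, 2))  # 0.2
```

A positive number is read as the teacher beating the ‘average teacher’ in the parallel universe; a negative number, as falling short of it.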

To someone who is not a teacher, this sounds reasonable enough.  When you’ve spent time in schools, though, you know some of the basic problems with standardized tests.  For one, if you’ve ever watched a class of students take a standardized math test, you know they don’t seem to take it very seriously.  The multiple-choice format leads students to skip scrap work when one answer seems to ‘jump out’ at them.  Also, since this test doesn’t ‘count’ for a grade, students might not do their best.  But since students in other teachers’ classes are doing the same thing, it shouldn’t matter; every teacher is operating with the same handicap.  Another issue, especially for math, is that the test is like a comprehensive final exam.  Maybe you did a good job teaching and your students did well on the individual unit tests throughout the year, but they simply ‘forgot’ the earlier material, or got overwhelmed by having to recall all these different topics.

This ‘Value-Added’ metric is the number one issue for the Ed Reformers.  They believe that teachers are lazy and will be forced to work harder when they know they will be judged on how their students do on these tests.  Race To The Top applicants had to change their laws so that teacher evaluations would be tied, in part, to ‘performance’ in this way.  Now, for No Child Left Behind waivers, states will also have to work this into their laws.  Already, many states have incorporated it.  Washington D.C. has it count for 50% of some teachers’ evaluations.  Colorado is working on a way to make it about half.  New York has passed a law making it 40%.

The biggest problem with ‘Value-Added,’ though not everyone knows it, is that this type of metric, even after 20 years of development, is extremely inaccurate.  Mathematica Policy Research, the firm that does the value-added calculations for Washington D.C., published a report for the Department of Education called “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains,” which on page 31 estimates the error rate at over 33%, meaning that more than a third of the time it will give an effective teacher an ineffective rating, or vice versa.
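To see how error rates like that can come out of a noisy estimate, here is a toy simulation.  The noise level is my own assumption, chosen to be somewhat larger than the underlying signal; it is not a figure taken from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical: each teacher has a true effect, but we only observe a noisy
# one-year estimate of it. The noise scale (1.5x the signal) is an assumption.
true_effect = rng.normal(0.0, 1.0, n)
estimate = true_effect + rng.normal(0.0, 1.5, n)

# Label teachers "above average" vs "below average" by the noisy estimate,
# then check how often that label contradicts the true effect's sign.
misclassified = np.mean(np.sign(estimate) != np.sign(true_effect))
print(f"{misclassified:.0%}")
```

With noise of this rough size, the simulated misclassification rate lands in the neighborhood of the one-in-three figure the report describes.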

This is why on page 35 they advise against using this type of calculation in as strong language as they can, considering they make a lot of money doing these calculations for D.C. schools.

Now many teachers are concerned that when value-added becomes a significant part of teacher evaluations, its high error rate will cause many teachers to be unfairly fired. That actually is not my biggest concern with value-added. Since even Michelle Rhee admits that these scores shouldn’t be the ‘sole’ determinant of teacher evaluation, it seems that 50% is about the most anyone wants to use them for. As I’ll demonstrate later in this post, the variation among these value-added scores is so random and small that they probably won’t cause anyone to get fired, and may even save some ineffective teachers who happened, by no action of their own, to add a lot of value.

No, the real danger of value-added is that it is currently being used as a way to judge and shut down schools. In New York City, 85% of a school’s report card grade is based on these value-added calculations. When a school gets an ‘F’ on this basis, the shut-down machine gets fired up, as recently happened to 47 schools in New York City. That such inaccurate measures are being misused for such drastic decisions is sad.

I like, from time to time, to read a research paper that is cited by the corporate reformers. One of the gurus of Value Added is Tom Kane from Harvard. He was Michelle Rhee’s advisor there. Well, he co-wrote an influential paper in 2006 called ‘Identifying Effective Teachers Using Performance On The Job,’ in which he argues that Value Added is accurate enough to prove that alternatively certified teachers are as effective as traditionally certified teachers. On page 19 he cites a different report which, he says, proved that the Value Added evaluations correlate with standard principal evaluations. This intrigued me, since I had always wondered whether anyone had checked that. If they correlate strongly, then why do we even need them? Why not just use the principal evaluations we already have and save all the money and stress it takes to do it the other way? If they don’t correlate, then we have to wonder how accurate they are. Is the problem really that principals don’t have the ability to accurately assess their teachers?  The stats he cites, though, do not sound very convincing.

So I looked up that 2005 report “Principals as Agents: Subjective Performance Measurement in Education” by Jacob and Lefgren. In this paper they claim that there is a significant correlation between the value added statistic and the principal evaluation statistic. Yet, when I looked at the appendix and saw their own scatter plot, I found that there is essentially no correlation between the two.

Notice that they distort the plot by having the principal evaluations run from -3 to +3 standard deviations while the value-added runs only from about -1.5 to +1.  Also, see that pretty much everyone falls between -.5 and +.5 on the Value Added scale.  Had this chart been drawn ‘to scale,’ it would be clearer that everyone gets about the same value-added score.  There is little difference between the value added for teachers who had poor principal evaluations and for teachers who had good ones.  Notice that the sample with the lowest principal rating actually has a higher value added than the two people the principal rated highest.
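One nice property of the correlation coefficient is that, unlike a scatter plot, it cannot be ‘stretched’ by the choice of axes.  Here is a toy illustration with invented data shaped roughly like the plot described above: principal ratings spread across six standard deviations, value-added clumped in a narrow band with only a faint relationship between the two:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data mimicking the scatter plot's shape: principal ratings spread
# from -3 to +3 SD, while value-added clumps in a band around zero.
principal = rng.uniform(-3, 3, 200)
value_added = 0.02 * principal + rng.normal(0, 0.25, 200)

# Pearson correlation is invariant to rescaling either axis, so however the
# chart is stretched, r still exposes how weak the relationship is.
r = np.corrcoef(principal, value_added)[0, 1]
print(round(r, 2))
```

The plot of this data can be made to look like an upward-sloping cloud by compressing the vertical axis, but the correlation stays near zero either way.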

This is why the main danger with value added is the school ratings that use them for 85%, and not the teacher evaluations that use them for 50%. Before, when just principal evaluations were used, teachers were in one of two categories: effective or ineffective. Now, with this random stat factored in, there will be four categories: effective / high value added, effective / low value added, ineffective / high value added, and ineffective / low value added. So they will have to keep everyone except the ineffective / low value added. In essence, they have to keep more teachers than they would with the old system. They won’t be able to fire the ineffective teachers who happen to score high on the value added component.
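The retention logic described above can be written out as a toy table (the names and category labels are mine):

```python
# Toy sketch of the combined-evaluation logic: under the new system, only
# teachers rated ineffective on BOTH measures can be let go.
teachers = [
    {"name": "A", "principal": "effective",   "value_added": "high"},
    {"name": "B", "principal": "effective",   "value_added": "low"},
    {"name": "C", "principal": "ineffective", "value_added": "high"},
    {"name": "D", "principal": "ineffective", "value_added": "low"},
]

kept = [t["name"] for t in teachers
        if not (t["principal"] == "ineffective" and t["value_added"] == "low")]
print(kept)  # ['A', 'B', 'C'] -- only D is dismissed
```

Under principal evaluations alone, both C and D would have been candidates for dismissal; adding the second, noisy criterion shields C.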

12 Responses

  1. Michael Fiorillo

    Aside from its lack of statistical validity, value added (“the difference between the sale price and production cost of a product”) fantasies have a far more insidious quality: they help institutionalize the perception of students as commodities.

  2. Cal

    “They won’t be able to fire the ineffective teachers who happen to score high on the value added component.”

    Wouldn’t this mean that the principal didn’t like some aspect of the teacher that had nothing to do with classroom performance? It seems to me that this one is the reason to have value added–to protect against administrator bias.

  3. Lynn

    I appreciate Michael Fiorillo’s insight about the dangerous perception of students as commodities. The other term we have bandied around at my school is widgets–students are not widgets–they are humans. And despite their test scores, each one of them has an array of unique skills/talents/interests that a high quality educational system should 1. notice, 2. value and 3. nurture. It saddens me to no end when I see students (often dually served as ELL/SPED) with passion and talent for art or music completely down-trodden by this factory system of education that forces them to take 2 hours of Math and 2 hours of reading, Science, half a year of PE (which breaks the federal law) and half a year of SS. They HATE school–it does nothing to help them discover their strengths–it spends all day forcing them to do the things they struggle with and hate most. I suppose I should qualify all of this by saying YES I am a teacher and YES I know everyone needs to be proficient readers and mathematicians. I am a teacher because I love students and I love the learning process. BUT these kids often drop out because they HATE school and get to spend no time doing what they are good at–and skills that society needs and values. Okay, rant is done.

  4. Tom

    I’ve been reading your blog for a while. I think that you are probably the most educated and least sensational of the anti-TFA crowd. That being said, I had a major problem with this part of your post:

    “No, the real danger of value-added is that it is currently being used as a way to judge and shut down schools. In New York City, 85% of a school’s report card grade is based on these value-added calculations. When a school gets an ‘F’ on this basis, the shut-down machine gets fired up, as recently happened to 47 schools in New York City. That such inaccurate measures are being misused for such drastic decisions is sad.”

    Without going into the actual standard deviations from the report, it is clear that the error rate for an entire school (many classrooms) will be far, far lower than the error rate for any individual classroom.

    • Brian Ford

      Not if there is a systematic bias. That bias could come from socio-economic status, or it could come from feeder schools. You might remember that last year two elementary schools in Brooklyn were accused of improper coaching that resulted in higher-than-deserved test scores. When those students go on to the next school, they are recorded as starting at a higher level, and the teachers at the school they feed into are thought less adequate because their students are recorded as having made less progress.

      This is not an isolated incident. In the incentive structure created by high-stakes testing being applied to teacher and school assessment, cheating is endemic. Those who cheat help themselves at the expense of the next teacher down the line.

      We might also think that those that teach to the test and do not produce conceptual understanding do the same thing.

  5. Randy Traweek

    VAM was invented by Dr. William Sanders, a statistician working in the field of agricultural genetics at the University of Tennessee in the 1980s. He was, quite literally, a bean counter. He believed he could use the statistical models he had used to produce plump, ripe tomatoes (and probably beans) to evaluate teaching. Then-Governor Lamar Alexander said, “Go for it.” Children are neither tomatoes nor beans, and teaching is not agriculture.

  6. Ms. Math

    I went to an interesting statistics talk; they looked at various ways that teachers could change the “value-added” statistics by asking certain kids to be absent, moving kids to other teachers, asking kids not to do well on the first test, or not giving them enough time to finish it.

    They found that because so much of the statistical variation in student performance was due to factors like socioeconomic status and mother’s education level, small changes could dramatically change which teachers looked like they were performing well. The impression I got was that this statistician didn’t think value-added was a good model for making serious decisions.

  7. Ray

    Maybe you did a good job teaching and your students did well on the individual unit tests throughout the year, but they simply ‘forgot’ the earlier material, or got overwhelmed by having to recall all these different topics.

    Do you have any idea how bad that statement makes you look? Doesn’t it occur to you that a good teacher would take the time to review material throughout the year? This is very basic. I am shocked that you would suggest it is OK for a teacher to let end of year mastery slide like that.

  8. Hi Gary,
    The best piece I’ve ever read about value-added is by John Ewing, the president of Math for America. He does a very detailed analysis of value-added. It’s available here:

  9. Brian Ford

    I’m glad you pointed out a few things, especially the way most teachers clump together in the middle.
    But I do have a criticism of many of the critiques, especially those that label VAMs as junk science. Since I just published a book that is highly critical of value-added measures, that might seem surprising, but they do have applications, just not in education.
    The reason many people respond positively to VAMs is that they have a good reputation: value-added methods and measurements have their origin not with Tennessee bean counters, but in micro- and, especially, macroeconomics. The analysis focuses on how much value was added to the final product at each stage of production. Often this is as much an art as a science, requiring judgment on the part of the analyst as to how to divide the known value added by the entire production process among the different phases of production. After all, how much is a good design that produces ease of use worth, compared to the diligent production regimes that result in reliable performance? How can you isolate one from the other? And isn’t the end value of the product arrived at not merely by adding the two values, but by multiplying them? If good design rates a 10 out of 10 but diligent performance results in 1-out-of-10 reliability, then I would probably prefer more reliability combined with clunker design.

    Of course, now that these methods are also being utilized in education as part of a national movement towards teacher evaluation, accountability, and evisceration, the joke is that not only is this problem of isolation ignored, but there is no accurate measure of the value added. They use a proxy (test scores), treat it as if it were accurate, and ignore the negative effects of testing as if it were a neutral enterprise.

    This type of measure mistakes the true end of teaching (informing an individual for a lifetime, not preparing them to bubble in an answer sheet), yet defiantly calls itself Value Added Modeling or Measures when applied to teaching. I think we would be better off calling it joke science rather than junk science.

    Why? Because you sound like an old crank if you call something that is regularly used in economics junk science. I much prefer to say, “VAMs, when applied to teaching, are an elaborate joke, thus the term ‘joke science.’”

About this Blog

By a somewhat frustrated 1991 alum
