Sep 27 2011

Indisputable proof that NYC school closings are based on statistically invalid metrics

I knew that if I had enough patience, the corporate reformers would eventually let slip some data that would prove, once and for all, how unscientific the metrics they’ve been using to shut down schools really are.

That day came earlier this week. I encourage anyone to recheck my calculations, just in case, but if I’ve found what I think I’ve found, it will be the ‘death blow’ to the New York City ‘value-added’ model they use to rate and close down schools.

Schools are shut down for getting multiple years of poor progress reports. The progress reports are what give schools their letter grades: A, B, C, D, or F. The way the progress reports are calculated is as follows: 15% is based on school environment, 25% on student performance, and the remaining 60% on something called student progress.

This is defined by the DOE in the guidebook as

I. Student Progress (60 points): measures how individual students’ proficiency on state ELA and math exams has changed in the past year, as they move from one grade to the next. The Progress Report measures individual students’ growth on state English and Math tests using growth percentiles, which compare a student’s growth to the growth of all students in the City who started at the same level of proficiency the year before. A student’s growth percentile is a number between 0 and 100, which represents the percentage of students with the same score on last year’s test who scored the same or lower than the student on this year’s test. To evaluate the school, the Progress Report uses the median adjusted growth percentile. The metric is calculated for all students and for students in each school’s lowest third, in both ELA and mathematics. Each of these four metrics counts for 15 points.

The premise is that since it is unfair to blame a school for getting kids with low starting scores, they want to measure ‘growth,’ or how much the school ‘moves’ its students. So for each student they take his starting score and ending score. Then they look at all the other students in the city who had the same starting score and calculate what percent of those students this student did better than on the test a year later. They then take the median over all of a school’s students (they do this for math and for English, and again for the lowest third of students in math and English), and that becomes the school’s student progress score. This score makes up 60% of the progress report, which determines whether the school gets an A, B, C, D, or F, and which can lead to the school being shut down.
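To make the mechanics concrete, here is a minimal sketch of how a growth percentile like this could be computed from a table of student scores. The column names and the tie-handling rule (counting peers who scored the same or lower) are my assumptions for illustration, and the ‘adjusted’ part of the DOE’s metric involves further adjustments that are not shown here.

```python
# Minimal sketch of a growth-percentile calculation, assuming one row per
# student and placeholder column names. The DOE's adjustments to the
# percentile are not modeled.
import pandas as pd

def add_growth_percentiles(students: pd.DataFrame) -> pd.DataFrame:
    """For each student, compute the percent of students citywide with the
    same prior-year score who scored the same or lower this year."""
    def percentile_within_group(scores: pd.Series) -> pd.Series:
        # rank(method="max") counts how many students in the group scored
        # the same or lower, including the student himself.
        return 100.0 * scores.rank(method="max") / len(scores)

    out = students.copy()
    out["growth_percentile"] = (
        out.groupby("prior_year_score")["current_year_score"]
           .transform(percentile_within_group)
    )
    return out

def school_progress_scores(students: pd.DataFrame) -> pd.Series:
    """The school-level metric: the median growth percentile of its students."""
    return students.groupby("school")["growth_percentile"].median()
```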

New York City just released the progress report database for the 2010 to 2011 school year. To see how good a statistic this ‘progress’ metric is, I thought I’d compare how elementary and middle schools did this year with how they were scored in the 2009 to 2010 school year. Both files, if you want to recheck my calculations, are available here.

Now, I always suspected that this number didn’t really measure much. When I got the two databases, I sorted the 1,100 schools by this progress score from lowest to highest for both years. Then I combined the databases to see how the schools had changed relative position in one year’s time. If this metric were at all reliable, there would be some kind of correlation between the two numbers: a school that was 100th from the bottom in 2009-2010 would probably be pretty close to that position in 2010-2011. After all, they’ve got mostly the same students and mostly the same teachers, so there shouldn’t be a major difference.
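For anyone who wants to reproduce the comparison, the steps look roughly like the sketch below, assuming the two progress-report files have been exported to CSV. The file names and column names (“DBN” for the school identifier, “progress_score” for the student progress score) are placeholders; the actual spreadsheets use different headers and would need a small renaming step.

```python
# Sketch of the year-over-year rank comparison, with placeholder file and
# column names.
import pandas as pd
import matplotlib.pyplot as plt

y1 = pd.read_csv("progress_2009_2010.csv").rename(columns={"progress_score": "score_2010"})
y2 = pd.read_csv("progress_2010_2011.csv").rename(columns={"progress_score": "score_2011"})

# Rank schools by their student progress score within each year (1 = lowest).
y1["rank_2010"] = y1["score_2010"].rank(method="first")
y2["rank_2011"] = y2["score_2011"].rank(method="first")

# Join the two years on the school identifier and measure how far each
# school moved in the ordering.
both = y1[["DBN", "score_2010", "rank_2010"]].merge(
    y2[["DBN", "score_2011", "rank_2011"]], on="DBN")
both["moved"] = (both["rank_2011"] - both["rank_2010"]).abs()

# Scatter plot of the two years' rank positions.
plt.scatter(both["rank_2010"], both["rank_2011"], s=5)
plt.xlabel("Rank on student progress, 2009-10")
plt.ylabel("Rank on student progress, 2010-11")
plt.show()
```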

So after I got all my data sorted out, I made my scatter plot and instead of getting the linear correlation that one would expect, I got this:

As anyone can see, there is at best a very ‘weak’ correlation between the two years.

A summary of some of the main results:

Out of 1,100 schools

266 moved under 100 spots.

218 moved between 100 and 200 spots.

164 moved between 200 and 300 spots.

127 moved between 300 and 400 spots.

96 moved between 400 and 500 spots.

84 moved between 500 and 600 spots.

75 moved between 600 and 700 spots.

40 moved between 700 and 800 spots.

24 moved between 800 and 900 spots.

8 moved between 900 and 1000 spots.

6 moved between 1000 and 1100 spots.

So over 60% of the schools moved over 200 spots in one year!
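A tally like the one above can be produced from the merged table in the earlier sketch, roughly as follows (same placeholder column names; these helper functions are illustrative):

```python
# Count schools by how far they moved, in 100-spot buckets, and report the
# share that moved more than a given number of spots. Expects the merged
# table ("both") from the earlier sketch.
import pandas as pd

def movement_buckets(both: pd.DataFrame) -> pd.Series:
    buckets = pd.cut(both["moved"], bins=list(range(0, 1200, 100)), right=False)
    return buckets.value_counts().sort_index()

def share_moving_more_than(both: pd.DataFrame, spots: int = 200) -> float:
    return (both["moved"] > spots).mean()
```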

To me, this is the most rock-solid proof that this metric is completely unreliable. Schools just don’t get that much better or that much worse in one school year.

My hope is that some people will independently confirm my calculations. I checked the individual school progress reports for some of the outliers, just to make sure that I hadn’t made some horrible error that you can only make with a computer. All the data is right there on their website.

If I’m correct in all my calculations, this would mean that the entire progress report system is a farce, that many schools have been unnecessarily shut down, and that communities have had to suffer the shame that goes with a school being shut down.

In my next post, which you can read here, I discuss what sorts of conclusions about charter schools in NYC can be derived from the progress report database for 2010-2011 under the assumption that the progress metric is valid.

7 Responses

  1. Michael Fiorillo

    Thank you for your important work. Hopefully, as the pseudo-science of value-added evaluations (a gross, reductive and revealing term to apply to children in the first place) is revealed, it will be thrown on the dung heap with every other false educational panacea.

    However, while from a strictly statistical point of view the metrics are a farce, no one should kid themselves that it might just be the typical incompetent or misguided efforts of educrats. It is a far more malevolent process than that, involving highly capitalized foundations, think tanks, and specialized academic programs (Harvard’s PEPG, for example) established for precisely these purposes, in which these ideologically-driven and manipulated measures are created and implemented as a lever to shut down, and ultimately privatize, neighborhood public schools, and transform teaching into temporary, at-will employment.

    A few years ago there was an (apparently) minor betting scandal in the NBA, where a ref was caught betting on games he worked. In the course of news coverage, it was revealed that the league keeps detailed statistics on every call refs make in every game, which are tabulated and used to rate and manage them. Sound familiar? As the head of the referees union said, “They control the information, and they use the information to control us.”

  2. Sean

    I don’t think it makes sense to use the rank of the progress scores as the variables. Rank of schools on progress is not used to determine which schools are closed. If you use the actual scores the schools received, there’s a clear positive correlation between years.

    • Gary Rubinstein

      Sean, This is a good point. When the data is plotted that way, there is a much higher correlation. But I disagree with you that ranks are not used to determine which schools are closed. They pick a percentage ahead of time, I think it is 5%, and all the schools in that percentile get ‘F.’ Also the whole premise of this stat is that students are ranked against one another. If a student gets a lower score than most of the students who had the same starting score, the school is penalized. It doesn’t matter if that student made a reasonable amount of progress. All that matters is what percent of similar students that student ‘beat’ in the new test.

      I will put a scatter plot of what you describe in my next post which will discuss what sorts of things we can learn about schools if we do accept that this model is statistically valid. (spoiler alert: it won’t be good for charter schools!)

      Gary

      • They use rank throughout the progress report system, really.

  3. Michael Markowitz

    One more facet that SHOULD add stability, but does NOT:

    Schools are not truly graded along a citywide curve, in that their scores are a WEIGHTED MIX of citywide (25%) and “peer group” (75%). The composite score is then curved to a letter.

    Not only are the grades on a curve, but the curve is on a curve, depending on the school. I call that one screwball curve.
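To make the distinction in the exchange with Sean above concrete, the year-to-year correlation can be computed both ways from the merged table in the earlier sketch: once on the raw progress scores and once on the rank positions (correlating the ranks is essentially a Spearman correlation). This is only a sketch with the same placeholder column names.

```python
# Compare correlation computed on the raw progress scores (as Sean suggests)
# with correlation computed on the rank positions (as in the scatter plot
# above). Expects the merged table ("both") from the earlier sketch.
import pandas as pd

def year_to_year_correlations(both: pd.DataFrame) -> dict:
    return {
        "pearson_on_scores": both["score_2010"].corr(both["score_2011"]),
        "correlation_of_ranks": both["rank_2010"].corr(both["rank_2011"]),
    }
```

And to illustrate the weighting Michael Markowitz describes, the citywide and peer-group pieces would blend roughly like this; the cut scores that then turn composite points into a letter grade are set separately by the DOE and are not modeled.

```python
def blended_score(citywide: float, peer_group: float) -> float:
    """Illustration of the 25% citywide / 75% peer-group blend described in
    the comment above. The curve from composite points to a letter grade is
    a separate step and is not shown."""
    return 0.25 * citywide + 0.75 * peer_group
```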

