GothamSchools — daily independent reporting on NYC public schools

Eye on Education

Randomness is Not a Fluke

“I think there’s nothing wrong with anything.”  So spoke Chancellor Joel Klein at yesterday’s release of the 2009 elementary and middle school progress reports.  As Anna Phillips reported, 84% of the schools received a letter grade of A, and an additional 13% received a B.  Only two schools out of 1,058 received an F, and just five more were awarded a D.

The letter grades were driven by the remarkable/suspicious gains in 2009 on the state’s ELA and math tests.  Schools weren’t actually compared to one another on their performance this year to derive the letter grades.  Rather, they were compared to last year’s peer and citywide benchmarks.  To use a football metaphor, because test scores rose across the board, virtually all schools moved up the field, but the goalposts didn’t move.  I wasn’t sure that the progress report letter grades could actually be less useful this year than last, but Chancellor Klein’s administration has achieved that dubious feat.  When 84% of the schools receive an A—the top grade, which everyone understands to signify excellence—what useful information about the school’s relative performance is being conveyed to parents, students, educators, and others with a stake in our schools?  Not much, in my view.

Last year, my blogging partner Jennifer Jennings (who, for those keeping score at home, is now Dr. J) and I were sharply critical of the 2008 school progress reports.  Writing on Jennifer’s eduwonkette site, we demonstrated that student achievement growth over the past year—which makes up 60% of the overall progress report letter grade—was highly unreliable.  Schools that demonstrated high gains in student achievement from 2006 to 2007 were no more likely to show gains from 2007 to 2008 than schools that showed low gains in 2006 to 2007.  We concluded that the measure of student progress making up 60% of the overall progress report grade was picking up chance fluctuations from year to year.  And if 60% of the score is random, there’s not much genuine information about school performance in the progress report grade.

It wasn’t a fluke.

I’ve now replicated our analysis using the 2009 progress report grades.  Many measures of school performance—its environment, based on parent and teacher surveys, its attendance, even the percentage of students achieving Level 3 or Level 4 on the state ELA and math tests—are fairly stable from year to year.  But once again, the student progress measure is wholly unpredictable.  The schools that showed big gains in student achievement from 2007 to 2008 were no more likely to show big gains from 2008 to 2009 than those schools that had small gains from 2007 to 2008. 

Here are some simple graphical representations, using the roughly 600 elementary schools in New York City that received progress report grades in 2008 and 2009.  The first graph plots a school’s academic expectations in 2009, based on surveys, with that school’s academic expectations in 2008.  There is a strong positive correlation of .75, which indicates a great deal of stability from 2008 to 2009 in the relative position of a school compared to other schools.

[Graph: academic expectations in 2009 plotted against academic expectations in 2008]

The second graph plots a school’s attendance rate in 2009 with that school’s attendance rate in 2008.  Attendance is highly stable from year to year:  the correlation between 2008 attendance and 2009 attendance is .95.  There’s almost as much consistency in the percentage of students scoring at levels 3 or 4 on the state ELA test over these two years.  The correlation between the 2008 percentage and the 2009 percentage is .94, which is a very strong association indicating that schools that are relatively high in 2008 are likely to be relatively high in 2009.    

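[Graph: attendance rate in 2009 plotted against attendance rate in 2008]
[Graph: percentage of students at Levels 3 or 4 on the state ELA test in 2009 plotted against the 2008 percentage]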

As you look at the three graphs above, you can see a clear pattern in the data:  elementary schools that are relatively low compared to other schools in 2008 are also relatively low compared to their peers in 2009.  Similarly, elementary schools that are relatively high compared to other schools in 2008 are likely to be relatively high in comparison to other schools in 2009.  Now, look at the graph below, which plots each school’s student progress from 2008 to 2009 by that school’s progress from 2007 to 2008.  Do you see a pattern?

[Graph: ELA student progress from 2008 to 2009 plotted against progress from 2007 to 2008]

If you do, there’s probably a psychologist somewhere salivating at the idea of administering a Rorschach inkblot test to you.  The correlation between student progress from 2007 to 2008 and student progress from 2008 to 2009 is -.02, which is indistinguishable from zero.  This means that there really is no pattern to the results, and certainly not a pattern that demonstrates consistency or stability from one year to the next.
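
For readers who want to run the same kind of stability check on their own data, here is a minimal sketch in Python of a year-to-year correlation. The function name `pearson` and the sample numbers are hypothetical, made up for illustration; they are not the progress report data, and this is not necessarily the exact procedure behind the graphs.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each school from the average in each year.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical attendance rates (percent) for five schools in two years.
attendance_2008 = [88.0, 91.5, 93.0, 94.5, 96.0]
attendance_2009 = [88.5, 91.0, 93.5, 94.0, 96.5]

r = pearson(attendance_2008, attendance_2009)
print(f"year-to-year correlation: {r:.2f}")
```

A value near 1 means schools keep roughly the same relative position from one year to the next. A value near 0, like the -.02 for student progress, means last year's result tells you almost nothing about this year's.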

It’s no surprise to see this, since it follows from what we know about the instability of test score gains from one year to the next in schools across the country.  As was true last year, there’s a bit more consistency in progress in math, but the state ELA and math tests on which these scores are based are coming under increasing scrutiny for their predictability, easiness, and poor content coverage.  The New York City Department of Education has hitched its accountability wagon to a runaway horse.

When it comes to annual test score gains, randomness is not a fluke.

  • Fort Tryon Teacher

    Great analysis, Aaron. The DOE has chosen an unreliable metric (if that’s the right word) for measuring student progress, and has weighted it far too strongly. This is a very helpful addition to my own analysis, which is pretty much, “C’mon, Joel, you kidding me?”

    But we need to measure student progress somehow, right? As a teacher, I want my students to make a year’s worth of progress in a year in my class, and I want the students in my school to demonstrate that sort of progress in all their classes. How can we do better measurement? Are there different criteria we can use?

    I also notice that this links with Daniel Koretz’s work on the unreliability of value-added measures. These failed progress reports show that you can’t judge a school on the basis of year-to-year changes in test scores, even when you put together the work of dozens of teachers. If this is so, how could you possibly evaluate individual teachers on that basis?

    The “accountability” geeks have got a lot of work to do. Hope they’re listening to what you’ve got to say.

  • Michael M.

    Huzzahs!
    “A” for skoolboy!

    I get the “fluke.” I do NOT get why the DOE is resistant to FIXING it. Specifically, I have called on DOE to REDUCE the weight on the flawed progress metric. Instead, between Yr 1 and Yr 2, they increased it from 55% to 60% at the expense of the performance metric, which went from 30% to 25%.

    Dr. P (or Dr. J),
    I need some professional help.
    Sincerely, please look at my comments on related stories.

    1) What is the impact of negative percentiles, or over-100th percentiles on the resulting scores? You touch on this above with your reference to the lack of resetting the benchmarks. That intuitively contributes not only to inflation, but I believe to the undercutting of the internal roll-up math as well.

    2) Clearly, no one at DOE understands calculus, specifically derivatives — rates of change OF rates of change can get pretty tricky.

  • http://www.grand-rounds.blogspot.com Jennifer

    Which measure of student performance did you use? Scale score, level or the NYC values?

  • Michael M.

    J,
    Not sure if you’re addressing skoolboy or me.

    I was reading the negative and over-100 percentiles from within the 2008-9 School Progress Reports — off a mix of the various performance and progress sub-metrics — for the only two schools that got an “F” this year and first year reported; from the top UWS school that went D-A-A over the last three years; and from PS8 in Brooklyn that went C-F-A. In order:

    Manhattan’s Washington Heights Academy
    Manhattan’s Harlem Link Charter School
    Manhattan’s MS243
    Brooklyn’s PS8

    Email me (witzeroo at yahoo dot com) if you want the links via the DOE’s school data portals. I don’t want to get stuck in moderation.

  • http://www.grand-rounds.blogspot.com Jennifer

    Sorry for the confusion Michael – I was wondering about the values Skoolboy used in creating the chart.

  • Michael M.

    Just for you Eduwonkie:

    There once was a young educator,
    Said, “‘Progress Reports’ are simply inflators!”
    But Chancellor Joel Klein
    Says, “They’re doing quite fine…
    We’ll bump the other 16% later.”

  • Aaron Pallas

    Michael M: There are two different uses of % in the Progress Reports. The first, mentioned by Fort Tryon Teacher in a comment on another post, pertains to the location of a school in the distribution of all other schools citywide, based on the overall score. That % is in fact a percentile.

    The second %, which is used for calculations of scores for each of the components of the Progress Reports, is not a percentile. Instead, it represents the location of a given school in relation to its peer group or in relation to schools citywide. Consider attendance. Each school has a peer group of 40 similar schools, each of which has an average attendance rate. Suppose that the lowest of the peer schools is 83%, and the highest is 96%. (There’s some way of tossing out outliers that are judged to be unusually low or high, but that process isn’t explained in detail.) The “peer horizon” thus spans the range from 83% to 96%. Now suppose that School X had an attendance rate of 86%. School X’s attendance covers 3 percentage points of the 13-percentage-point range from 83 to 96. 3/13 is .23, or 23%. So School X has traveled 23% of the distance of the peer horizon.

    Now, suppose that School X has an attendance rate of 97%–it actually exceeds the attendance rates of all 40 of the schools in its peer group. School X has covered 14 percentage points above the peer minimum, which exceeds the 13 percentage points of the peer horizon range. 14/13 is 1.08, or 108%. So School X’s attendance is 108% of the peer horizon range. But it’s not a percentile. Similar logic if School X were to have a value that’s below the minimum of the peer horizon range; that’ll result in a negative percentage of the amount of the range that’s covered by School X’s value.

    Jennifer: I’m using the percentage of students making at least one year of progress, as defined by the DOE. This uses the so-called Proficiency Ratings, which are the DOE’s transformations of the state’s four proficiency levels. No reputable psychometrician I know approves of this peculiar transformation, but it’s probably not at the root of the problem of unstable year-to-year student “progress”.
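
The peer-horizon arithmetic in the attendance example above can be sketched in a few lines of Python. This is an illustration only, not the DOE's actual code: the function name `percent_of_range` is hypothetical, the outlier-trimming step mentioned in the comment is left out because it isn't explained, and the 80% attendance figure is an invented value added to show the below-minimum case.

```python
def percent_of_range(value, low, high):
    """Share of the peer horizon (low..high) covered by a school's value.

    This is a position within a range, not a percentile, so it can fall
    below 0 or above 1 when the school is outside the peer range.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)

# Peer horizon from the attendance example: 83% to 96%.
# 86 and 97 come from the example; 80 is a hypothetical below-minimum case.
for attendance in (86, 97, 80):
    share = percent_of_range(attendance, 83, 96)
    print(f"attendance {attendance}% -> {share:.0%} of the peer horizon")
```

The first two results reproduce the 23% and 108% worked out in the comment, and the third comes out negative, which is how a school can land below 0% of the range.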

  • http://www.grand-rounds.blogspot.com Jennifer

    Thanks for clarifying – it’s a metric that seems to be used ONLY in New York City.

  • Michael M.

    Aaron,
    Thanks.

    However… DOE then uses the negative percentawhatevers to pull down — below zero — what should be re-normed to zero. Seems like a double-hit of sorts.

    Next, look at how non-intuitive a number of the so-called “progress” result bars are in the only two failing schools.

    Repeating myself from a prior post’s comment: How does it pass the sniff test that one school can be *simultaneously* above 100% of the range (average change in ELA proficiency for kids at Level 3 or 4) and below 0% of the range (the same measure for kids at Level 1 or 2)? That’s some dichotomy.

    And yet… the actual change in proficiency was 0.10 and 0.09 respectively. That’s some dead heat.

    Above true for both peer and citywide spectra.

    This happened at Harlem Link Charter, one of the two that got an “F.”

    Time to bring back the predictor monkey, only this time get the monkey drunk first.

  • Michael M.

    How is it that the kids get a 1, 2, 3, or 4 on their home report cards (though everyone kibitzes about being a “high 3” or a “low 4” etc)… but the School Progress Reports say proficiency is measured on a scale of 1.00 to 4.50? I’m ok with averages between integers, but that last 0.5?

  • http://www.specialeducationmuckraker.com Dee Alpert

    Might be interesting to see the no. and %age of kids in each school, or the schools which reportedly made great progress over the prior year, to see how many kids were reported as “not tested” or “absent” for this batch of scores vs. last year’s test results.

    Then there’s the no. and %age of kids to whom principals, in their sole, unreviewable discretion, could grant testing modifications and/or accommodations, such as extended time, etc., because the principals decided the kids had experienced temporary medical disabilities … . I understand that NYCDOE’s %age of kids given these mods and accoms on the NAEP tests was off the charts. Why would one assume it was just limited to NAEP testing?

    And how do scores which (allegedly) show whether a kid has attained standards for one grade tell whether the kid has made a year’s worth of progress measured by whether the kid has answered questions relating to a totally different set of content standards the following year? It seems to me that this is like comparing apples and oranges.

    It’s not as though one must master every area of the ELA curriculum in 4th grade for one to master every area of the ELA curriculum in 5th grade, is it?

    If these were diagnostic reading and math tests, comparison of year-to-year scores would show something significant. But if fractions aren’t in both the 4th and 5th year math curricula, and if knowing 4th grade fractions isn’t necessary in order to master the 5th grade math curriculum, or at least score well on most, but not all, of the 5th grade math questions … progress? I don’t think so, but I’m not a psychometrician.

    But then, I don’t recall that Joel Klein-Bloomberg is, either.

  • ahb

    Great work! I will be sharing this analysis at my school. I was wondering if anyone here has analyzed the Value Added models that are the foundation of DOE’s Teacher Report cards?

  • http://ljohnson562@charter.net Linda/Retired Teacher

    I don’t understand all this technical stuff, but I know from experience that a huge fraud is being perpetrated on the American people with all these testing “miracles.” Thank you, Aaron and Jennifer, for persisting in bringing this to everyone’s attention.

  • Michael M.

    Hi Linda,

    It’s all about “regression to the mean.” While focusing on the streaks.

    Derek Jeter just tied Lou Gehrig for most hits by a Yankee. Ever. Last week, he went 0 for 12.

    Same thing.

    Next week, odds are he’ll go 1 for 3 at any given at bat, regardless.

    Klein would give him an F, then an A. Or send him to a charter and give Girardi a C.
