GothamSchools — daily independent reporting on NYC public schools

skoolboy
Aaron Pallas

Comparing Small Apples to Large Apples

I’m not sure how much credibility the Progress Reports at the heart of the NYC Department of Education’s accountability system have left.  The elementary and middle school Reports issued earlier this fall were ridiculed for their inability to distinguish one school from another, since 97% of the schools received A’s or B’s (and 84% received A’s).  Moreover, I showed that the student progress measures that make up 60% of a school’s overall score were highly unreliable from one year to the next.  As long as these reports are tied to year-to-year changes in state test scores, they’re likely to be fatally flawed.

On Monday, the Department released the 2008-09 Progress Reports for high schools.  Anna Phillips reported that Chancellor Joel Klein said that the high school Progress Reports were more stable and accurate than those for elementary and middle schools because they’re based on multiple measures.  Huh?  Welcome to the party, Chancellor Klein.  I hate to tell you that measures such as credit accumulation are not necessarily accurate measures of a school’s contribution to student learning and development. 

But the high school Progress Reports have a bigger problem.  Three-quarters of a school’s score comes from a school’s location in relation to a group of 40 peer schools.  The idea of comparing a school to peer schools is to create an “apples to apples” comparison.  It’s actually a good feature of the Progress Reports that they seek to compare a given school to how schools across the city are doing as well as to how schools that serve similar students are performing.

But it only works if the right criteria are used to determine a school’s peer schools.  Wednesday, Jenny Medina and Robert Gebeloff broke a story in the New York Times that high schools with higher percentages of poor, black and Hispanic students received lower grades on the Progress Reports.  In 2009, they wrote, the high schools which received A’s enrolled an average of 77% black and Hispanic students.  In contrast, the high schools which received C’s, D’s and F’s enrolled an average of 91% black and Hispanic students.  This pattern, found in 2007 and 2008 as well, suggested that the school grading system doesn’t adequately adjust for racial and ethnic differences among schools.

A high school’s peer index is based primarily on its students’ average eighth-grade scores on the state ELA and math exams (using the peculiar metric the DOE has developed for converting the exam’s scale scores into a 1.0 to 4.5 proficiency scale), minus two times the percentage of special education students and minus the percentage of overage students.  A high school with an average proficiency of 3.10, 6% special education students, and 12% overage students would have a peer index of 2.86.  One with an average proficiency of 3.70, 2% special education students, and 5% overage students would have a peer index of 3.61.
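The arithmetic in the two examples above can be checked with a short sketch. The function name `peer_index` and its parameter names are hypothetical; the formula is the one described in this post (average proficiency, minus two times the special education share, minus the overage share, with percentages converted to fractions), not the DOE's own code.

```python
def peer_index(avg_proficiency, pct_special_ed, pct_overage):
    """Peer index as described in the post (hypothetical helper).

    avg_proficiency: average eighth-grade proficiency on the 1.0-4.5 scale
    pct_special_ed, pct_overage: percentages, e.g. 6 for 6%
    """
    return avg_proficiency - 2 * (pct_special_ed / 100) - (pct_overage / 100)


# The two schools used as examples in the text.
print(round(peer_index(3.10, 6, 12), 2))  # 2.86
print(round(peer_index(3.70, 2, 5), 2))   # 3.61
```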

Although the formula tries to take special education and overage status into account, I suspect that its designers were unaware that it is dominated by the average proficiency value, because there is far more variance from school to school in average proficiency than in special education and overage status.  But a larger question is, why these factors and not others?  Why not the percentage of English Language Learners (ELL’s)?  Why not the percentage of students eligible for a free or reduced-price lunch?  Why not the racial/ethnic make-up of the school?  (And when is the DOE going to wise up that it can’t treat black students as equivalent to Hispanic students, and Asian students as equivalent to white students?  These groups have different learning trajectories.)
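One rough way to see why the proficiency term can dominate is to compare how much each term of the index varies across schools. The sketch below uses five invented schools purely for illustration (they are not DOE data), and the helper name `term_variances` is hypothetical. It also ignores covariances among the three components, so it is only a first approximation.

```python
from statistics import pvariance

# Hypothetical schools, invented for illustration only (not DOE data):
# (average proficiency, % special education, % overage)
schools = [
    (2.60, 8, 14),
    (2.90, 6, 12),
    (3.10, 6, 10),
    (3.40, 4, 7),
    (3.70, 2, 5),
]


def term_variances(rows):
    """Variance across schools of each term as it enters the peer index."""
    proficiency = [p for p, _, _ in rows]
    special_ed = [2 * s / 100 for _, s, _ in rows]  # enters with weight 2
    overage = [o / 100 for _, _, o in rows]
    return {
        "proficiency": pvariance(proficiency),
        "special_ed": pvariance(special_ed),
        "overage": pvariance(overage),
    }


for name, var in term_variances(schools).items():
    print(f"{name:12s} {var:.5f}")
```

With these made-up numbers the proficiency term varies far more than the other two, because proficiency is measured on a multi-point scale while the other terms are fractions of enrollment. Whether the real data behave the same way depends on the actual distributions.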

And why stop there?  If the goal is to try to isolate the impact of the school on student performance and progress, then logic would dictate that we should seek to control for all factors that are prior to selection into one school versus another, and potentially related to students’ outcomes.  That includes a range of demographic criteria, to be sure.  But there are at least two other factors that I think ought to be taken into account.  The first is school size.  Schools in New York City generally have little control over their size, and if small schools provide certain advantages for students, then we should compare small schools to small schools and large schools to large schools.  The second is per-pupil expenditures.  Even in the Fair Student Funding era, there are disparities in per-pupil expenditures across schools that are not accounted for by demographic differences in the students attending different schools.  I’ve spoken to principals who are indignant that their peer schools have higher expenditures, and yet they are being held to the same performance criteria.

Does all this matter?  You bet.  Let’s look at just one of the many measures in the high school Progress Reports:  the percentage of second-year students accumulating ten or more credits.  (The pattern I’m going to describe is found for many of the performance and progress measures in the Progress Reports.)   Citywide, the 2009 average was 72%, with a standard deviation of 15%.  Schools are compared to their “peer range,” a school’s location in relation to its lowest and highest peers.  Citywide, schools were, on average, 59% of the distance between the lowest and highest peers on their percentages of second-year students accumulating ten or more credits.
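Read literally, the “peer range” position described above is the share of the distance from the lowest peer to the highest peer. A minimal sketch, assuming that reading: the function name `percent_of_peer_range` and the peer values are hypothetical, and the DOE's actual calculation may define the range differently.

```python
def percent_of_peer_range(school_value, peer_values):
    """Share of the distance from the lowest to the highest peer, in percent."""
    low, high = min(peer_values), max(peer_values)
    if high == low:
        raise ValueError("peer range has zero width")
    return 100 * (school_value - low) / (high - low)


# Hypothetical peer group (invented numbers): percentage of second-year
# students accumulating ten or more credits at each peer school.
peers = [50, 58, 66, 72, 81, 90]
print(percent_of_peer_range(74, peers))  # 60.0
```

A school at 74% in this invented group sits 60% of the way from its lowest peer to its highest, close to the citywide average position of 59% reported above.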

But some schools were advantaged in these calculations, and others disadvantaged, even though the peer horizon scores are explicitly designed to compare “apples to apples.”  The figure below compares schools in the lowest quarter of a given demographic feature to schools in the top quarter.  Schools with high concentrations of black and Hispanic students;  large schools;  schools with a higher proportion of special education students;  and schools with more English Language Learners all score lower relative to their “peer” schools than do other schools. 

[Figure: apples-to-apples]

What these figures suggest is that New York City’s high school Progress Reports systematically penalize some schools and reward others.  So when you see the DOE touting the superiority of the progress made by the small schools opened during the Bloomberg/Klein era, remember that it’s no accident:  it’s built into the accountability system.

12 Comments

  1. Gideon

    Wouldn’t throwing all of the possible factors into the mix be a problem since many of them are correlated with each other?

  2. Aaron Pallas

    Gideon,

    Correlated factors actually make things easier, because the correlations reduce the number of dimensions on which schools must be compared. The approach would be quite a bit more complicated than the peer index that the DOE uses, but generally the idea is to use the observed factors to classify schools into a set of groups. A technique known as latent class analysis would sort schools into groups in which the schools are as similar as possible within a given group, based on these observed measures, but with each group as different as possible from other groups.

  3. Leonie Haimson

    How about taking into account critical factors like class size and overcrowding? The large HS have class sizes about 5 students larger on average than the small HS, and tend to be far more overcrowded, which tends to lead to low attendance, more disciplinary problems and less opportunity for after-school tutoring, etc. Indeed, class size is said to be a factor in the teacher data reports, by which the DOE is evaluating individual teacher effectiveness, thus implicitly acknowledging that the larger the class, the less likely a teacher will be effective. There’s no reason that this factor should be excluded from the formula to determine school grades. Is a school that has class sizes of 34 on average, and is at 150% of its capacity, really on a level playing field with a school that has class sizes of 25 or fewer, and is only 75% full?

  4. Is any sort of unidimensional peer index really going to work for these purposes?

    When different factors are put together like this, can’t they end up offsetting and counteracting in unexpected ways? I know that propensity scores compress many attributes into one dimension, but is that what we want to do here? I mean, do we want to create propensity scores for doing well on one test or the other?

    If we did that, wouldn’t we need multiple peer groups for the different factors we want to look at? Like a peer group for attendance, a peer group for teacher experience, a peer group for test scores, etc.

    While methodologically more rigorous and more useful for those who want to be careful about meaningful comparisons, that might work. But that doesn’t seem to be the goal of this DOE or such public reporting generally.

    So, what is the alternative? A single peer index, like this? How good can such an approach ever really be?

  5. Aaron Pallas

    Leonie,

    Class size is a good example of some of the challenges involved in doing this kind of matching. The idea of this kind of matching is to control for factors that are logically prior to a student’s entry into a school, but *not* to control for factors that are under the school’s control as a matter of school policy or practice. The DOE might argue that average class size is a choice made by principals about how to allocate the resources under their control. On the other hand, you might argue that class size reflects a school’s enrollment, the amount of physical space available (i.e., capacity) and a school’s budget, all of which are outside of the school’s (and principal’s) control, because they’re imposed on a school by the DOE. Adding class size as a factor wouldn’t add any new information if it really is highly determined by enrollment, capacity and budget, but if there were differences across schools in their average class size even when enrollment, capacity and budget were equal, then we might think of that variation as something under the school’s control, and hence not to be matched on in constructing peer groups.

    I strongly believe that, if we are to make inferences about the relative performance of schools, we should take account of the different resources available to different schools, and these certainly include enrollment, capacity and budget measures.

    Incidentally, the use of class size in the teacher data reports is less ambiguous, because even if class size is under the control of the school and principal, teachers generally don’t have the ability to choose the size of their classes. That’s one of the limits of the teacher value-added approach: if teachers are assigned to classes of different sizes in a nonrandom way (i.e., a principal wants to reward or punish a teacher with a small or large class, or just thinks that a particular teacher might do better with a smaller or larger class), then simply controlling for class size may not adequately explain why a teacher winds up with one class versus another.

  6. Aaron Pallas

    Alexander,

    I understand the tradeoff that the DOE faces in constructing comparisons that are simultaneously meaningful and easily communicated to diverse audiences, some of which are sophisticated about research and statistical analysis, and others of which are not. Value-added approaches are challenging to explain; the DOE’s peer index, for better or worse, is not. But the evidence is clear that the peer index does not adequately adjust for real differences in a school’s resources, and therefore results in a distorted picture of how schools are faring head-to-head.

    I’m not sure I understand your point about propensity scores, which generally work best when there are two clearly-defined treatment and control groups, and the researcher models the propensity for a case to be selected into the treatment condition. Generally, the propensity model is indifferent to the particular outcomes to be examined, because those outcomes are observed only after selection into treatment or control condition has occurred. You wouldn’t have separate propensity models for different outcomes.

    But it’s true that the attributes on which we would want to match schools might be multidimensional. That’s why I’m suggesting a latent-class approach, in which schools are classified into latent classes that are as homogeneous as possible on the set of attributes.

  7. Leonie Haimson

    Class size is highly determined by resources and space, whether the principal can cap enrollment at an acceptable level, as well as the relative numbers and composition of the student body. I.e., if you have a lot of special ed students who require smaller classes by law, that limits how many classrooms you have in ways that are not taken account of in the space formula in the blue book. Our principal survey found that 86% of principals said that they were unable to provide a quality education because of excessive class sizes, and they were limited most of all by lack of space and an inability to control enrollment, and third, by lack of funding. There is another problem with the peer indexes which relates to competition vs. collaboration. Liebman always said that the system was designed so that principals could learn from each other in the schools that were similarly situated, i.e., from their peers, to see what they were doing differently that might give them better student outcomes. In ARIS, a big part of its design was supposed to facilitate collaboration between teachers in helping to devise lesson plans, etc. (though it’s not clear if that is happening or not). But principals have understandably admitted that they are reluctant to help out other principals in their peer groups, since they are being judged in competition with them. Wonder what you think of this problem.

  8. [...] parents and the public. As Aaron Pallas calls to our attention on the Gotham Schools blog (“Comparing Small Apples to Large Apples“) the tool itself does not pass muster.   … the student progress measures that make [...]

  9. Diane Ravitch

    Aaron, good analysis. Remind me again, what is the point of the school grades? I can’t remember.

    Diane Ravitch

  10. Smith

    Diane, there is one purpose for high school grades: to get principals and teachers to produce numbers that will justify positive press about Klein’s tenure. “Achievement” levels are raised by pressuring schools to award more credits and diplomas. Meanwhile, Klein’s new small schools are made to look successful by comparing them to larger schools with more difficult populations.

  11. Leonie,

    It sounds from your analysis that all those principals would be better off running charter schools.

    They could cap their enrollments, balance their own budgets to produce the class sizes they want, and generally remove the impediments that you describe.

  12. Leonie Haimson

    KS, if all NYC principals could cap enrollment at any level they liked, there would literally be hundreds of thousands of kids on the street, with no place to attend public school. Is this the sort of future you want?
