GothamSchools — daily independent reporting on NYC public schools

Eye on Education

What Counts as a Big Effect? (II)

On Friday, I began talking about what counts as a big effect.  Turns out I’m reinventing the wheel, as there is an excellent paper by Carolyn Hill and her colleagues at Manpower Demonstration Research Corporation on this topic, entitled “Empirical Benchmarks for Interpreting Effect Sizes in Research.”  But I’ll press onward nevertheless.

Last month, the federal Institute of Education Sciences released the third-year report on the evaluation of the DC Opportunity Scholarship Program, which provides vouchers for K-12 children and youth in the DC Public Schools who win a lottery to attend a private school.  The key outcomes in the study were scale scores on the Stanford Achievement Test (SAT-9) in reading and mathematics.  (Scale scores are converted from “raw” scores based on the number of correct responses to the test.)  The evaluators found that, after three years, students who were offered a voucher scored 4.46 points higher on the SAT-9 reading test, which represented an effect size of .13.  This effect was statistically different from zero.  Interestingly, the impact of being offered a voucher on reading scores was not reliably different from zero for male students.  In mathematics, there was no evidence of a positive effect of being offered a voucher:  after three years, students offered vouchers scored .81 points higher on the SAT-9 math test, an effect that was not statistically different from zero, and which corresponded to an effect size of .03.

Based on how these effect sizes equate with percentile changes, these are pretty small effects, and the presence of an asterisk denoting statistical significance for the effect of being offered a voucher on reading scores for girls alone, and no effects on math scores for either boys or girls, doesn’t justify the political spectacle that surrounds the program.  After three years, the net movement in reading for voucher students starting at around the 34th percentile nationally is about five percentiles;  in math, it’s about one percentile.  Anyone who thinks that effects of this size are altering the life trajectories of DC children is kidding himself.   
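The percentile arithmetic can be roughly sketched as follows. This is an illustration only, assuming test scores are approximately normally distributed; the evaluation report may have used a different conversion. The function name `percentile_shift` is hypothetical, while the starting percentile (34) and the effect sizes (.13 and .03) are the figures quoted above.

```python
from statistics import NormalDist

def percentile_shift(start_percentile, effect_size):
    """Percentile points gained by a student who starts at start_percentile
    and improves by effect_size standard deviations, assuming normal scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_start = std_normal.inv_cdf(start_percentile / 100)
    z_end = z_start + effect_size
    return std_normal.cdf(z_end) * 100 - start_percentile

reading_shift = percentile_shift(34, 0.13)  # reading effect size from the report
math_shift = percentile_shift(34, 0.03)     # math effect size from the report
print(f"reading: {reading_shift:.1f} percentile points")
print(f"math: {math_shift:.1f} percentile points")
```

Under the normality assumption, the results land close to the five-percentile and one-percentile movements described above.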

Part of the hoopla stems from another way in which the size of the voucher effect is being reported:  months and years of additional learning.  The overall effect of 4.5 scale score points in reading is reported as equivalent to 3.1 months of additional learning for members of the treatment group, and the 5.3 point scale score gain for those who actually used the voucher is reported as 3.7 additional months of learning.  The Wall Street Journal’s op-ed page, always good with math, rounded this up to “Children attending private schools with the aid of the scholarships are reading nearly a half-grade ahead of their peers who did not receive vouchers.”

Where do numbers like this come from?  They hinge on the fact that the SAT-9 is vertically-equated across grades K-12, which means that a common scale is used for the forms of the test that are administered at different grades.  Using the same scale across grades facilitates the measurement of growth over time.  Although a given scale score is supposed to represent the same level of proficiency regardless of what grade a student is in, the reality is that the skills tested at widely-differing grade levels don’t overlap much, so that a given scale score in the third grade may represent a different set of content skills than that same scale score in the seventh grade.  (It’s for this reason that the oft-cited claim that, based on the National Assessment of Educational Progress, white students in the 12th grade are, on average, four years ahead of their African American peers is unsupportable.  Although there is a single NAEP scale, a given score represents different competencies in eighth grade than that same score does in the 12th grade.)  Vertically-equated scale scores in adjacent grades are much more credible than scores in grades that are far apart.

The DC evaluation report states that the conversion to months of learning is based on dividing the impact effect size by the effect size of the weighted average annual increase in scale scores for the control group.  In other words, if control group students gain 10 points a year, on average, on the SAT-9 reading test, and the group using a voucher scored 5 points higher than the control group, then the voucher group is 5/10 = .5 years, or 4.5 months of a nine-month school year, ahead of the control group.

 What this implies is that if a test shows relatively small gains in performance from one year to the next, then a given effect will look like a larger difference, in terms of months or years of learning gains, than if that test shows relatively large changes over time.  Hill and her colleagues show that, for most nationally-normed tests, the largest changes over time occur in the earliest elementary grades, and get progressively smaller as students move into secondary school.  This could mean that students simply learn less in high school than they do in elementary school.  But it might also mean that tests with a common scale aren’t very good at picking up changes over time in the content of what is taught or learned.  The reason that the effects of using the voucher in the DC study appear relatively large in terms of months or years of learning is that there wasn’t much evidence of learning in the control group population—much less learning than is implied by the national norms on the SAT-9 test or students’ scores on DC’s own Comprehensive Assessment System (DC-CAS).     
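The dependence on the control group's growth can be sketched in a few lines. The conversion follows the report's description (the effect divided by the average annual gain), worked here in scale score points. Two things are assumptions for illustration: a nine-month school year, and made-up annual gains of 5, 10, and 20 points. The 5-point effect is the hypothetical from the example above, not a figure from the evaluation, and the function name `months_of_learning` is likewise hypothetical.

```python
MONTHS_PER_SCHOOL_YEAR = 9  # assumption: a nine-month school year

def months_of_learning(effect_points, annual_gain_points):
    """Express a score difference as months of learning, given the
    average yearly gain of the comparison group on the same scale."""
    return effect_points / annual_gain_points * MONTHS_PER_SCHOOL_YEAR

effect = 5  # hypothetical treatment-control difference, in scale score points
for annual_gain in (5, 10, 20):  # hypothetical control-group gains per year
    months = months_of_learning(effect, annual_gain)
    print(f"annual gain {annual_gain:>2} points -> {months:.2f} months ahead")
```

The same 5-point effect reads as 9 months when the control group gains 5 points a year, 4.5 months at 10 points a year, and 2.25 months at 20 points a year: the slower the measured growth, the bigger the effect looks.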

The moral to the story:  when the effects of an intervention are reported in terms of months or years of learning gains, treat the numbers with a healthy dose of skepticism.  The magnitude of an effect size has to be placed into a meaningful context, which includes knowledge of what Hill et al. refer to as the “natural growth for its target population.”

Tomorrow I’ll have a few more things to say about the DC study, and some anomalies in the scores that I find troubling.

  • http://franzimmerman.com Frances O’Neill Zimmerman

    Say what? I wish data crunchers would master the English language so that our eyes would not glaze over when we try to read what they allegedly have to say. Is there no way to translate? Help!

  • Elizabeth Green

    Fran,

    Aaron should correct me if I have this wrong, but I think the main point is to be skeptical of a certain kind of statistical conclusion: The kind where researchers declare that a certain policy gives a student a boost of X number of learning years. E.g., in the Wall Street Journal, “Children attending private schools with the aid of the scholarships are reading nearly a half-grade ahead of their peers who did not receive vouchers.”

    I spot two reasons to be skeptical. The first is that the statements may simply not be correct because they are grounded in tests that can’t support that kind of a conclusion. For instance, tests may not have been designed to compare how much children know across grade levels; what a child scores on the eighth grade test cannot be compared to what another scores on a 12th grade test. So there’s no way to look at trends of data across years and spot patterns.

    The second reason to be skeptical has to do with a numbers trick. The more a test shows large gains in learning, the less a specific policy will seem to have a big effect. That has to do with an arithmetic trick, which Aaron explains this way: “What this implies is that if a test shows relatively small gains in performance from one year to the next, then a given effect will look like a larger difference, in terms of months or years of learning gains, than if that test shows relatively large changes over time.”

    Aaron, hope I have this right. Please correct me and amplify what I’m not explaining?
    Elizabeth

  • Michael M.

    Re “Hill and her colleagues show that, for most nationally-normed tests, the largest changes over time occur in the earliest elementary grades, and get progressively smaller as students move into secondary school. This could mean that… ”

    I am reminded of the difference between horsepower and torque, as affected by weight, and as applied at different speeds. Then again, I’m an engineer and a car guy — not an edumetrician. ;-)

    Seriously, a high school kid may know a lot more than a kindergartener, and is probably capable of learning “more” in a given year, but it’s still easier to make a bigger relative increase in the knowledge of a kindergartener over the course of a school year. Something like that?

  • Aaron Pallas

    Elizabeth, thanks for the English translation.
    Michael, I’m not a car guy, so am not sure about the horsepower and torque analogy. It may be that our measuring instruments aren’t very sensitive to the kinds of learning that take place in high school or college, so that year-to-year gains don’t look very large. In contrast, think about the volume of new vocabulary words that children can learn in the early grades, and how large the annual gains in vocabulary might look.

  • Matthew

    In other words, perhaps less helpful to Fran but certainly in Michael’s vocab, some gains may be asymptotic

    http://en.wikipedia.org/wiki/Asymptote

  • http://stuartbuck.blogspot.com Stuart Buck

    Anyone who thinks that effects of this size are altering the life trajectories of DC children is kidding himself.

    This seems rather one-sided in its mockery . . . it should be balanced by noting that anyone who wants to kill a program with effects of this size (which are rarely found for any educational intervention) is acting in bad faith.

  • Aaron Pallas

    Stuart, what counts as a life-changing impact for a social or educational program is surely a matter of judgment. In my view, few programs have impacts of that magnitude. For the reasons that I’ve noted in several posts, I judge the measured impact of the DC Opportunity Scholarship Program to be small. You’re entitled to your own opinion.

  • http://stuartbuck.blogspot.com Stuart Buck

    Did anybody say “life changing”? That’s a straw man.

    The proven positive impact may be small — and this is assuming that the ONLY purpose of education is higher test scores (an assumption that you might dispute in any other context besides vouchers). Even so, my point still stands: The people trying to get of an educational intervention with even a small positive impact are acting deplorably.

  • http://stuartbuck.blogspot.com Stuart Buck

    trying to get RID of . . .

  • Aaron Pallas

    Stuart: “straw man”? Do you think I make this stuff up? Please consider the italicized text below.

    “We hope that President Obama, Education Secretary Arne Duncan, Sen. Richard J. Durbin (D-Ill.) and others who have questioned the effectiveness of school vouchers were tuned in to yesterday’s Senate hearing on the District’s program. They would have heard moving testimonials from students whose lives have been changed by their ability to get an appropriate education…” Washington Post editorial, 5/14/09

    “The National Education Association, a teachers union allied with the Democrats, has been on a mission to end the program as soon as possible, with no regard for the children whose lives would be affected.” Washington Post editorial, 5/7/09

    “It is about being given an education that makes D.C. students free to choose their future for themselves or one that often condemns them to the life of poverty many were born into.” Marta Mossburg, Washington Examiner, 5/8/09

    “Five years later, some lawmakers are ready to rescind this lifeline for some very deserving District families. Are we letting partisanship interfere with our children’s futures and the success of our country?” Bruce Stewart, Senate testimony, 5/13/09

    “The precipitate cause of this rally is the Democrats’ passage of an amendment tucked into the omnibus spending bill. Sponsored by Sen. Richard Durbin (D., Ill.), the amendment effectively ended the Opportunity Scholarship Program, a lifeline now used by more than 1,700 schoolchildren to escape one of America’s most miserable public school systems.” Wall Street Journal editorial, 5/5/09

    “Just the sort of sneaky maneuver that’s so handy when you don’t want inner-city moms and dads to catch on that you are cutting one of their lifelines.” Wall Street Journal editorial, 3/3/09

    “I cannot imagine trying to explain to my daughters why one had a chance to soar and the other didn’t.” Margaret Spellings, Washington Post, 5/10/09

    “How many more have to go without a proper education and give up their dreams before we say, ‘Enough’? … The D.C. Opportunity Scholarship Program is a necessary means of educating children who otherwise would be lost; it must be maintained and allowed to flourish.” Anthony A. Williams & Kevin P. Chavous, Washington Post, 4/14/09

    “a program that offers a way out of D.C.’s hope-killing factories and into other schools…Yet it didn’t think it fit to spend an infinitesimal $14 million on a thriving program that makes a palpable difference in the lives of children desperately in need of help… [Obama’s] daughters’ futures are too precious to be sacrificed on the altar of politics. But, evidently, the futures of other children are not.” Shikha Dalmia, Forbes Magazine, 4/17/09

    “Father Moore said the program ‘is life-changing and in some cases life-saving. Students are able to flourish in a positive … Catholic setting.’” Catholic Standard, 4/14/09

    “Lives have been changed,” “life-changing,” “lifeline,” “otherwise would be lost” … these are bold claims for a program that has shown modest effects on reading test scores, and no reliable effects on math test scores. The superheated rhetoric about the effects of the DC Opportunity Scholarship Program didn’t originate with me. I don’t think there’s anything deplorable about pointing out the mismatch between the rhetoric and reality surrounding this program.

  • Socrates

    You don’t think a kid who was zoned for a horrible school but wound up at Sidwell had her life changed? The fact is that good schools can, and do, change lives. The problem is that the data about a) what constitutes a changed life and b) which schools change lives will always be murky. So, Aaron, you’ll probably need to visit the schools that people say “change lives” as well as those that really don’t. A visit to a couple of these schools, in my experience, is what is required to fully understand the myriad unmeasurable – or at least unmeasured – ways that great schools can dramatically alter the arc of a student’s life. I don’t think that the DC Voucher experiment changed a ton of lives, because it didn’t make it possible for enough kids to go to life-changing schools. But that some students’ lives were changed by it is undeniable.

  • http://stuartbuck.blogspot.com Stuart Buck

    “Lives have been changed,” “life-changing,” “lifeline,” “otherwise would be lost” … these are bold claims for a program that has shown modest effects on reading test scores, and no reliable effects on math test scores.

    First of all, those aren’t my statements. I was responding to the claim that I’m “entitled to my opinion” about the life-changing impact.

    Second, are you really saying that the only way in which people’s lives are ever changed is by having higher test scores? I mean, even if we were talking about the most fanatical supporters of NCLB-type testing, it would be a caricature to say that they think of test scores as the ONLY value in life. What is it with vouchers that makes people who normally acknowledge the broader import of schools suddenly act as if test scores are the only thing that anyone cares about in life?

  • Aaron Pallas

    Socrates, let me see if I understand your argument. (1) Great schools can alter children’s lives in unmeasurable or unmeasured ways. (2) Sidwell Friends is a great school. (3) Therefore, Sidwell Friends changed the lives of the students who attended via the DC Opportunity Scholarship Program. (4) Therefore, it is undeniable that some students’ lives were changed by the DC OSP.

    Sorry, but I don’t buy it. You have presented no evidence that Sidwell Friends changed the lives of the handful of students who attended via the voucher program, and neither do any of the evaluation reports. For that matter, you haven’t presented any evidence that Sidwell Friends is a great school. To be sure, it’s an expensive school that recruits affluent families and their high-achieving children, and many people would infer from this that it’s a great school. But if we are to define a great school as one in which students have better outcomes than if they had attended other schools, then there’s not much to go on in claiming that Sidwell Friends is a great school, or that it has changed the lives of the students attending via the DC OSP.

    Even if we were to identify a small fraction of students whose lives were changed via participation in the DC Opportunity Scholarship Program, it would be a peculiar basis for arguing for the continuation of the program. The statutory mandate for the evaluation of the OSP calls for an assessment of the impact of the program on the academic achievement of participating eligible students. The evaluators, with the full support of the Department of Education, operationalized this as the difference in the average test performance of voucher recipients (and the subset who used the vouchers) and those who sought to participate in the scholarship program but were not selected. This is standard practice in program evaluation. The impact of program participation on individual students is irrelevant to this assessment except in the way that it contributes to the average difference between the treatment group and the control group.

  • Aaron Pallas

    Stuart,
    You asked, “Did anybody say ‘life changing’? That’s a straw man.” I responded with evidence that lots of people are saying the equivalent of “life changing,” including your prolific colleague Socrates.

    I certainly do not think that higher test scores are the only way in which people’s lives are changed. I believe that a careful reading of my comments here at GothamSchools and at the eduwonkette site will reveal that I believe that the purpose of schooling is much broader than raising test scores. But I think it’s disingenuous on your part to presume that it’s those who question the efficacy of vouchers who are acting as if test scores are all that matters. In fact, the Wall Street Journal, Washington Post, relentless bloggers, and other outlets all are emphasizing the effects of the Opportunity Scholarship Program on test scores as a rationale for resurrecting the program. Perhaps this is because the evaluation has not found effects on students’ satisfaction or reports of safety. It’s also the case that 57% of program participants who used their vouchers were entering grades K-5, and only 6% were entering high school, so some of the longer-term outcomes we might wish to observe simply aren’t available yet.

    For an earlier account of this from my point of view, see http://blogs.edweek.org/edweek/eduwonkette/2008/07/the_rhetoric_of_reform_does_re.html

  • Socrates

    Aaron,

    If you were a DCPS parent who had the chance to send your kids to Sidwell instead, would you do it, and do you think it would have an important effect?

  • http://stuartbuck.blogspot.com Stuart Buck

    QUOTE: But I think it’s disingenuous on your part to presume that it’s those who question the efficacy of vouchers who are acting as if test scores are all that matters.

    Well, as I see it, the voucher supporters, as a whole, are saying “Look, there are at least modest increases in test scores; plus parents are happier; plus it’s a good thing to give poor parents and children more opportunities and autonomy over their own lives; plus there can be competitive pressure on the public schools, as several other voucher studies show; plus some kids end up in a schooling environment more to their tastes or values or capabilities, and this is good even if it doesn’t show up in test scores.”

    To which voucher opponents say, “But the test scores didn’t go up that much. Therefore vouchers don’t change anyone’s life. QED.” Which is what you are arguing here, and which assumes that test scores are the only possible way in which anyone’s life can be changed.
