GothamSchools — daily independent reporting on NYC public schools

skoolboy
Aaron Pallas

What Counts as a Big Effect? (I)

I woke up yesterday morning to read Norm Scott’s post on Education Notes Online about a new study of the effects of charter schools on achievement in New York City.  The study, by economists Caroline Hoxby and Sonali Murarka, finds a charter school effect of .09 standard deviations per year of treatment in math and .04 standard deviations per year in reading.  I haven’t read the study closely yet, but I was struck by Norm’s headline:  “Study Shows NO Improvement in NYC Charters Over Public Schools.”  The effects that Hoxby and Murarka report are statistically significant, which means that we can reject the hypothesis that the true effects are zero.  But are they big?  That’s a surprisingly complicated question. I’m going to argue that the answer hinges on “compared to what?”
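Statistical significance and size are separate questions. Below is a minimal sketch: a simple two-group z-test with hypothetical sample sizes, assuming scores with a standard deviation of one in each group. It is not the design used in the Hoxby and Murarka study, and `two_sided_p` is an illustrative name. The same small effect can be far from significant in a small sample and highly significant in a large one.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in group means of effect_sd,
    assuming both groups have SD 1 and n_per_group students (z-test)."""
    standard_error = sqrt(2 / n_per_group)  # SE of a difference in two means
    z = effect_sd / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same .04 SD effect with different (hypothetical) sample sizes.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {two_sided_p(0.04, n):.3f}")
```

The effect size never changes in this loop; only the sample size does, and with it the p-value.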

The standard deviation is a basic measure of how spread out a given attribute, such as a test score, is in a population.  When scores are widely spread out from the average, the standard deviation is large;  when the scores are narrowly bunched around the average, the standard deviation is small.  Many distributions, whether in nature or by design, take on the shape of a bell curve.  Distributions with this shape are called normal distributions, and they have some properties that are really useful for making sense of a given effect.
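A small illustration with made-up scores: both lists below average 70, but one is spread out and one is bunched. Python's `statistics.pstdev` computes the population standard deviation, treating each list as the whole population of interest.

```python
from statistics import mean, pstdev

# Two hypothetical sets of test scores with the same average (70)
# but very different spreads.
spread_out = [40, 55, 70, 85, 100]
bunched = [66, 68, 70, 72, 74]

for label, scores in [("spread out", spread_out), ("bunched", bunched)]:
    # pstdev: square root of the average squared distance from the mean.
    print(f"{label}: mean = {mean(scores)}, SD = {pstdev(scores):.1f}")
```

The spread-out list has an SD of about 21.2 points; the bunched list, about 2.8.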

The figure below shows a standard normal distribution, with a mean of zero and a standard deviation of one.  A standard normal distribution is symmetric, with 50% of the cases above the mean and 50% below the mean.  About 34% of the cases are between the mean and one standard deviation above the mean, and a similar fraction is between the mean and one standard deviation below the mean.  An additional 13.6% or so on each side are between one and two standard deviations away from the mean, and about 2.3% on each side are more than two standard deviations away from the mean.
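These shares come from the standard normal cumulative distribution function (CDF), which gives the fraction of cases at or below a given number of standard deviations from the mean. A minimal sketch using Python's `statistics.NormalDist` (Python 3.8 or later) computes the upper half; by symmetry, the lower half matches.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
Phi = std_normal.cdf  # share of cases at or below a given z

within_1 = Phi(1) - Phi(0)     # mean to +1 SD
between_1_2 = Phi(2) - Phi(1)  # +1 SD to +2 SD
beyond_2 = 1 - Phi(2)          # above +2 SD

print(f"mean to 1 SD: {within_1:.1%}")
print(f"1 to 2 SD:    {between_1_2:.1%}")
print(f"beyond 2 SD:  {beyond_2:.1%}")
```

This prints 34.1%, 13.6%, and 2.3%, which sum to 50% for one side of the curve.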
[Figure: a standard normal distribution, with a mean of zero and a standard deviation of one]
What this means is that we can use the standard deviation to express the distance between two groups in terms of where each group’s typical member falls in the population’s percentile ranking.  For example, the average difference between Blacks and whites on many standardized tests is about one standard deviation.  This means that, regardless of the scale of the test, if the scores take on the shape of a bell curve, and if the typical white student is scoring at the 50th percentile, then the typical Black student is scoring 34 percentiles below that, at around the 16th percentile.  That seems like a very large difference, and we can see the impact of differences of this magnitude in the underrepresentation of Blacks in settings where access is based on standardized test performance.
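The conversion from a gap in standard deviations to a percentile can be sketched the same way, assuming normally distributed scores. `percentile_after_gap` is a hypothetical helper: it locates the reference group’s percentile on the standard normal curve, moves by the gap, and reads off the new percentile.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def percentile_after_gap(reference_pct: float, gap_sd: float) -> float:
    """Percentile of a group whose typical member sits gap_sd standard
    deviations from a reference point at reference_pct (negative = below)."""
    z_ref = std_normal.inv_cdf(reference_pct / 100)  # percentile -> z-score
    return 100 * std_normal.cdf(z_ref + gap_sd)      # shifted z -> percentile


# Reference group at the 50th percentile; comparison group 1 SD below.
print(round(percentile_after_gap(50, -1.0), 1))
```

A one standard deviation gap below the 50th percentile lands at about the 15.9th percentile.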

So it’s easy to see that a one standard deviation difference is a big difference.  What about differences of the magnitude reported by Hoxby and Murarka?  Are these large or small?  The figure below helps us judge.  This figure shows the percentile differences associated with particular differences between two groups:  .05 standard deviations, .10 standard deviations, .20 standard deviations, and .50 standard deviations.  What’s a little tricky is that the percentile differences depend on where in the distribution we start;  they’ll be largest if we start at the middle, and smaller if we start further away from the middle.
[Figure: percentile differences for gaps of .05, .10, .20, and .50 standard deviations between two groups]
In the figure, .05 standard deviations is represented as the distance between the blue column and the red column.  If we start away from the middle, at one standard deviation below the mean, a .05 standard deviation difference equals the difference between the 15.9th percentile and the 17.1st percentile, a 1.2 percentile shift.  If we start at the middle, a .05 standard deviation difference is the difference between the 50th percentile and the 48th percentile, a 2 percentile shift.  A .10 standard deviation difference (the distance from the blue column to the green one) is the difference between the 15.9th percentile and the 18.4th percentile, a 2.5 percentile shift.  Starting at the middle, a .10 standard deviation difference corresponds to the difference between the 50th percentile and the 46th percentile, a 4 percentile shift.

Larger effect sizes correspond to larger percentile differences.  A .20 standard deviation difference, represented in the figure as the distance from the blue column to the yellow column, ranges from 5.3 percentiles to 7.9 percentiles, depending on where we start.  A .50 standard deviation difference (half of the performance difference between Black and white children and youth on many standardized tests) is 15 to 19 percentiles.
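The percentile shifts in the last two paragraphs all follow from the standard normal CDF. A minimal sketch, assuming normally distributed scores; `percentile_shift` is a hypothetical helper name. Each shift is measured upward from the starting point. By symmetry, a shift starting at the mean is the same size in either direction (50th to 48th is the same as 50th to 52nd).

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def percentile_shift(start_z: float, effect_sd: float) -> float:
    """Percentile points gained by moving from start_z to start_z + effect_sd."""
    return 100 * (Phi(start_z + effect_sd) - Phi(start_z))


starts = {"1 SD below mean": -1.0, "at the mean": 0.0}
for effect in (0.05, 0.10, 0.20, 0.50):
    shifts = ", ".join(
        f"{label}: {percentile_shift(z, effect):.1f}" for label, z in starts.items()
    )
    print(f"{effect:.2f} SD -> {shifts}")
```

The shift is always larger when starting at the mean, because that is where the bell curve is tallest and the most students are packed into each slice of the scale.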

Subjectively, it seems to me that differences smaller than .10 standard deviations are pretty small.  Moving a group of students up 2.5 to four percentiles in a year may be a challenging accomplishment, but it’s not a big move.  On the other hand, moving a group up three or four percentiles a year for several years in a row seems like a bigger deal.  Even so, five years in a row of an effect of .10 standard deviations would move a typical Black student from the 16th percentile of the population distribution to around the 30th percentile, still well behind the performance of a typical white student.
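The five-year arithmetic can be checked directly. This sketch makes two simplifying assumptions: each year’s .10 standard deviation effect adds on in full, with no fade-out, and the population distribution the student is compared against stays fixed.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

start_z = -1.0        # typical Black student, about the 16th percentile
yearly_effect = 0.10  # effect size per year, in SD units
years = 5

# Assumption: effects add linearly year after year (no fade-out).
end_z = start_z + years * yearly_effect
print(f"{100 * Phi(start_z):.1f}th -> {100 * Phi(end_z):.1f}th percentile")
# prints: 15.9th -> 30.9th percentile
```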

And we need to be especially cautious about claims regarding the cumulative effects of an intervention over time.  It’s tempting to extrapolate from the effects observed in a particular year to what we would see if those effects accumulated over several years.  But in most social and educational interventions, effects “fade out” over time, diminishing in intensity as time goes on.  If the largest effect is observed in the first year of an intervention, it can be substantially misleading to assume that similar effects will be seen in subsequent years.
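To see how much fade-out can matter, here is a hypothetical model, not one taken from the post or the study: each year’s gain keeps only a fraction `retention` of its size in every later year. A `retention` of 1.0 means no fade-out; the values 0.75 and 0.5 are illustrative, not estimates. `cumulative_effect` is a hypothetical helper name.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def cumulative_effect(yearly_effect: float, years: int, retention: float) -> float:
    """Total effect (SD units) after `years`, if each year's gain is multiplied
    by `retention` once for every later year (1.0 = no fade-out)."""
    return sum(yearly_effect * retention ** (years - 1 - y) for y in range(years))


start_z = -1.0  # about the 16th percentile
for retention in (1.0, 0.75, 0.5):
    total = cumulative_effect(0.10, 5, retention)
    print(f"retention {retention:.2f}: total {total:.3f} SD, "
          f"{100 * Phi(start_z + total):.1f}th percentile")
```

Under this toy model, halving each gain every year cuts the five-year total from .50 to under .20 standard deviations, leaving the student near the 21st percentile instead of the 31st.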

I’ll have more to say about what counts as a big effect next week.

3 Comments


  1. ceolaf

    Aaron,

    This is a great explanation, and I’d like to add a couple of things.

    1) To get a sense of scale, the test that we all know best is probably the SATs, whose standard deviation is supposed to be 100 points on each test (i.e. verbal or math). That means that this latest Hoxby study shows effect sizes equivalent to an average of 9 more points on the SAT math test (i.e. 0.09 * 100) and 4 more points on the SAT verbal test (i.e. 0.04 * 100).

    2) Is that a lot? I’ll leave that for you to get into a bit more next week. As you said, however, it *is* statistically significant, which is not a comment on effect size. I want to say that again: statistical significance is *not* a comment on effect size. So, a lot of studies show “small but [statistically] significant” effects.

    3) I very much like that you pointed out that a given difference as measured in standard deviations means different things in terms of percentiles at different points along the distribution. However, studies like this recent one from Hoxby are about differences from the mean or between the means of different populations. She is writing about the average scores of all NYC charter students compared to the average scores of all NYC non-charter public students, or should be. These kinds of studies are looking near the mean (i.e. the highest point of the bell curve).

  2. eduwonkette

    Fabulous figures. Thank you!!!

  3. [...] are “significant.” Aaron Pallas, at Gotham Schools is instructing us, a sort of Statistics 101.   As the evidence mounts we can conclude, not surprisingly, that “all teachers are not [...]
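The first commenter’s two conversions can be sketched together for the study’s per-year effects. Multiplying an effect size by a test’s standard deviation gives scale points; this assumes, as the comment does, an SAT section SD of about 100 points. Applying the normal CDF at the mean gives the percentile a student starting at the 50th would reach, assuming normally distributed scores.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Per-year effect sizes reported in the post, in standard deviation units.
effects_sd = {"math": 0.09, "reading": 0.04}
# Assumption taken from the comment: each SAT section has an SD of about 100 points.
SAT_SD_POINTS = 100

for subject, d in effects_sd.items():
    points = d * SAT_SD_POINTS  # effect size times the scale's SD = scale points
    pct = 100 * Phi(d)          # percentile reached starting from the mean (z = 0)
    print(f"{subject}: about {points:.0f} SAT points; 50th -> {pct:.1f}th percentile")
```

Near the mean, the .09 math effect works out to roughly the 53.6th percentile and the .04 reading effect to roughly the 51.6th.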
