It’s not news to report that statistics can be deceptive. But when a new set of test scores comes out, it’s worth repeating nonetheless.
Teachers College sociologist Aaron Pallas tackles the subject in the Community section of GothamSchools today, by taking a closer look at two middle schools that the Post has recently highlighted for exceptional performance and finding that both schools admit their students selectively. He writes:
Due to their selective admissions, IS 187 and, to a lesser extent, IS 364 were born on third base. The New York Post thinks they hit a triple.
Some schools might have hit something closer to a home run. Manhattan’s citywide Anderson School, for instance, admitted every single one of its students in grades 3-8 on the basis of their scores on an IQ test and an in-person interview. Not a single student at Anderson failed the math test, and in fact it was the only school citywide where a clean 100 percent of students in a single grade, the sixth, scored at the very highest level.
Not all successful schools handpick their students. (more…)
Schools Chancellor Joel Klein has for years been a proponent of paying some teachers more based on their performance, and he has made some headway in introducing merit pay in the city schools. But the policy has plenty of critics, from teachers who say merit pay divides them to statisticians who point simply to flaws in the measures on which pay calculations are based.
In the video above, University of Virginia psychology professor Dan Willingham gives six reasons in three minutes why paying teachers based on their students’ test scores isn’t statistically sound. But Willingham doesn’t totally rule out the prospect of paying better teachers more: “Merit pay can’t work until there’s a way to measure teacher performance that’s fair,” he concludes.
Columbia sociologists Jennifer Jennings and Aaron Pallas (also known as Eduwonkette and her sidekick, skoolboy) take a long, hard, statistical look at this year’s progress report grades and find that the mayor and chancellor’s claim that the new, higher grades reflect school improvement is way overblown. They conclude:
At best, there is no correlation over time in the DOE’s reports of which schools are good at inducing growth in ELA achievement. At worst, the DOE’s system finds that the schools that were better than average in 2007 were actually worse than average in 2008.
Last year, the first round of progress reports attracted anger and ridicule. Perhaps because far fewer schools received low grades, the response this year has been more muted, making room for measured, evidence-based discussion of the DOE’s methodology in constructing the reports.
Over at Eduwonkette, Harvard education professor Daniel Koretz offers a lengthy critique of the progress report methodology. He notes that test scores alone are not a legitimate way to evaluate schools; New York State’s tests were not designed to be used in “value-added” analysis like that behind the progress reports; and the progress reports, like all accountability systems, place pressure on school administrators that likely leads to score inflation. In addition, he writes that the DOE’s formula does not take into account “interval scaling,” or the reality that different amounts of “value” are required to move students from one proficiency level to the next at different points on the proficiency spectrum. (In June, I wrote about how interval scaling might contribute to the finding that No Child Left Behind has helped high-performing students less than their low-performing peers.)
But those problems exist in many test-based, value-added accountability systems. Koretz writes that New York’s progress report system also has its own set of errors. The tremendous variation in schools’ grades from last year to this year probably has less to do with school improvement than with sampling and measurement error, he writes.
Here’s an illustration of the effect of error. I first calculated the variation in schools’ grades between last year and this year and then graphed it against their enrollments. (more…)
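To see why enrollment matters here, a rough simulation helps. This is only a sketch with made-up numbers, not the DOE’s data or my actual calculation: the function names, the 650-point average, and the 40-point student spread are all assumptions. Every simulated school has exactly the same true quality in both years, so any year-to-year change is pure sampling error.

```python
import random
import statistics


def simulate_school_change(enrollment, rng, student_sd=40.0):
    """Change in a school's mean score between two years when nothing real changed.

    The 650-point mean and 40-point spread are hypothetical values.
    """
    year1 = statistics.fmean([rng.gauss(650.0, student_sd) for _ in range(enrollment)])
    year2 = statistics.fmean([rng.gauss(650.0, student_sd) for _ in range(enrollment)])
    return year2 - year1


def spread_of_changes(enrollment, n_schools=200, seed=0):
    """Standard deviation of year-to-year changes across many same-sized schools."""
    rng = random.Random(seed)
    changes = [simulate_school_change(enrollment, rng) for _ in range(n_schools)]
    return statistics.pstdev(changes)


if __name__ == "__main__":
    for size in (50, 200, 800):
        print(size, round(spread_of_changes(size), 2))
```

Under these assumptions the spread of changes shrinks roughly with the square root of enrollment, so small schools swing much more from one year to the next even though no school actually got better or worse.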
When the state released graduation figures earlier this month, I wondered what the city’s old formula for determining graduation rates would have said about the class of 2007. Yesterday, Edwize pointed us to a 276-page report available on the DOE’s website that includes the answer to that question and much, much more.
Although the state’s graduation figure of 52 percent is the official one thanks to an agreement between the city and state last year, the DOE still calculated the graduation rate for the class of 2007 using its old formula, which gave credit for students graduating in August and for students completing a GED or IEP diploma rather than a local or Regents diploma. According to this formula, 62 percent of students entering the city’s high schools in the fall of 2003 graduated on time, an improvement of 2.3 percentage points over the class of 2006. (more…)
The internet has seen a flurry of activity recently over the DOE’s claim that it has reduced the achievement gap between black and Hispanic students and their white and Asian peers. Testing that claim, the New York Sun submitted the ELA and math scale score data for students in grades 3-8 to three independent analysts, who concluded that the gap has decreased in ELA, but has stayed flat since 2002 in mathematics, confirming much of Eduwonkette’s analysis.
The new analysis emphasizes the difference between closing the proficiency gap, measured by comparing the percentage of students who score at a level 3 or 4 on state tests, and closing the achievement gap, measured by comparing mean scale scores.
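A toy example shows how the two measures can come apart. The scores and the 650-point cutoff below are invented for illustration and are not the state’s actual scale or data. Both groups gain the same five points, so the gap in mean scores does not move, yet the proficiency gap shrinks because more students in the lower-scoring group cross the cutoff.

```python
import statistics

PROFICIENCY_CUTOFF = 650  # hypothetical scale score needed for level 3


def percent_proficient(scores):
    """Percentage of students scoring at or above the cutoff."""
    return 100.0 * sum(s >= PROFICIENCY_CUTOFF for s in scores) / len(scores)


def gaps(group_a, group_b):
    """Return (proficiency gap in percentage points, gap in mean scale scores)."""
    proficiency_gap = percent_proficient(group_a) - percent_proficient(group_b)
    achievement_gap = statistics.fmean(group_a) - statistics.fmean(group_b)
    return proficiency_gap, achievement_gap


# Invented scores for two groups of four students each.
before_a = [660, 670, 680, 690]
before_b = [640, 645, 648, 655]

# Every student in both groups gains five points.
after_a = [s + 5 for s in before_a]
after_b = [s + 5 for s in before_b]

if __name__ == "__main__":
    print("before:", gaps(before_a, before_b))  # (75.0, 28.0)
    print("after: ", gaps(after_a, after_b))    # (25.0, 28.0)
```

In this made-up case the proficiency gap falls from 75 to 25 percentage points while the mean-score gap stays at 28 points.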
I have to take issue with Klein’s dismissal of statistical significance, as reported by the Sun:
The National Center for Education Statistics also concludes that upward trends in the reading scores of black and Hispanic fourth-graders lauded by Mr. Klein are not statistically significant.
Mr. Klein criticized the National Center on Education Statistics analysis.
“Those are just confidence levels. Nobody is saying this is a science,” Mr. Klein said. He added: “If three points is flat, and four points is statistically significant, then what you’re doing is, you’re playing something of a game.”
Um, Chancellor Klein? Statistical significance isn’t game-playing. It’s a standard measure of how unlikely a result would be if nothing but random chance were at work. Did test scores rise because of DOE policies, or random chance? Statisticians don’t just draw a line wherever they want; they use widely accepted significance tests that you learn in any basic stats class. Or perhaps you’re suggesting that the NCES plays with confidence levels to downplay New York’s progress?
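For anyone who skipped that stats class, here is roughly what such a test looks like. This is a simplified two-sample z-test, not the procedure NCES actually uses, and the standard errors are hypothetical numbers I picked for illustration. The point is that whether a gain counts as significant depends on how precisely each year’s average was measured, not only on the size of the gain.

```python
import math
from statistics import NormalDist


def two_sided_p_value(change, se_before, se_after):
    """P-value for a change in mean scores, given each year's standard error.

    Simplified z-test that treats the two years as independent samples.
    """
    se_diff = math.sqrt(se_before ** 2 + se_after ** 2)
    z = change / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical standard errors, not NCES figures.
    print(round(two_sided_p_value(3, 1.5, 1.5), 3))  # three-point gain, noisier estimates
    print(round(two_sided_p_value(4, 1.0, 1.0), 3))  # four-point gain, tighter estimates
```

With these assumed inputs, the three-point gain falls short of the conventional 0.05 threshold and the four-point gain clears it, which is a consequence of the arithmetic rather than of anyone moving the line.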