GothamSchools — daily independent reporting on NYC public schools

state wobegon

Looking for the culprits behind tests’ dropping standards

What does it mean for tests to get easier? And is that really what happened to New York’s tests?

The analysis that has spurred that idea in the last few weeks actually found something slightly different. The tests aren’t necessarily easier, in the way that a kindergarten spelling bee is easier than the SAT. Instead, between 2007 and 2009, students who hadn’t learned much came out looking like they had.

This is an important distinction because it points to a different culprit behind the dropping standards than simply the individual test items themselves. Instead, Harvard professor Daniel Koretz, the lead author of the analysis commissioned by the state education department, names two possible causes: a phenomenon called “score inflation” and a possible psychometric error tied to an obscure state law.

The actual questions on the test play a role in both, but just as important is the practice of prepping students extensively for tests. Another key is a state law that forces New York to release all test items publicly, making it easier for teachers to practice test prep and making it harder for officials to keep tests consistent over time.

What Koretz found: A dropping hurdle

The question that motivated this week’s scrutiny of the state tests was: Is the increasing number of New York students passing the tests a sign that they know more — or is it a mirage?

In other words, imagine that the passing score of Level 3 out of 4 is a hurdle. Koretz wanted to figure out if more students were leaping over it because more of them could actually jump higher or, alternatively, because the bar had somehow been tugged down.

Maintaining a “proficiency” bar at the same height over time is harder than you might think, because unlike physical height, academic performance is abstract. An entire field of statistics called “psychometrics” exists just to keep the bars at the same height over time.

Likewise, it was a challenge for Koretz to test whether the Level 3 bar for the exam he studied first — the eighth-grade math test — stood at the same height in 2009 as it did in 2007. To compare two abstract things, Koretz needed a stable measurement of students’ raw competence. How much competence did it take to score a Level 3 in 2009 versus 2007? If the hurdle had stuck at the same height, the knowledge needed to clear it would be exactly the same.

To approximate raw competency, Koretz used the NAEP exam, which is the most respected national test and on which (conveniently for Koretz’s purposes) performance was relatively stable overall between 2007 and 2009. Using a mix of national and state test results, he could estimate the rough percentile rank on the NAEP that students had to get to achieve a Level 3 on the New York State test.

The move was like saying, If you gave the New York test to students nationally, what percentage would fail? (Scoring at the 80th percentile on a test means that you have reached a level that 80 percent of people couldn’t.) The national failure rate to match New York’s Level 3 was a rough way of knowing the New York “proficient” students’ raw competency.
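The percentile idea can be sketched in a few lines of Python. This is only an illustration: the function name `percent_below`, the ten sample scores, and the cut score are all made up for the example and are not taken from the NAEP or from Koretz’s memo.

```python
def percent_below(scores, cut):
    """Return the percentage of scores that fall strictly below the cut."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < cut)
    return 100.0 * below / len(scores)


# Hypothetical national sample of ten scores (NOT real NAEP data).
national_scores = [210, 225, 240, 250, 260, 270, 280, 290, 300, 320]

# Hypothetical national-scale score matching the state's passing bar.
cut = 265

# Share of the national sample that would fail to clear the bar.
share_failing = percent_below(national_scores, cut)
print(share_failing)
```

If the bar stays at the same height and national performance is stable, this share should come out roughly the same from one year to the next.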

If the number stayed the same between 2007 and 2009, then the bar must have stayed put. If raw competency dropped, Level 3 must have sunk, too.

As Koretz put it, “If people have to jump over a similar hurdle, the proportion failing to get there shouldn’t have changed dramatically – because NAEP scores didn’t change very much.”

But this is not what he found. “In fact,” he said this week, “the hurdle had been dropped so much that almost no kids would have failed to jump it.”

In 2007, 12 percent of students nationwide failed to reach the NAEP level equivalent to a Level 2 on the math exam. In 2009, the percentage had dropped to 2. For Level 3, the percentage dropped from 36 in 2007 to 19 in 2009.
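To make the size of the drop concrete, here is the arithmetic on the four figures above, written as a short Python sketch. The variable and function names are illustrative; the numbers are the ones reported in this article for the eighth-grade math test.

```python
# Percent of students nationwide who would fall below the NAEP equivalent
# of each New York level, as reported in the article.
failing = {
    "Level 2": {2007: 12, 2009: 2},
    "Level 3": {2007: 36, 2009: 19},
}


def drop_in_points(by_year):
    """Percentage-point drop in the national failure rate, 2007 to 2009."""
    return by_year[2007] - by_year[2009]


drops = {level: drop_in_points(by_year) for level, by_year in failing.items()}

for level, points in drops.items():
    print(f"{level}: down {points} percentage points")
```

By this measure the Level 2 bar fell 10 percentage points and the Level 3 bar fell 17, over a period when NAEP performance was relatively stable.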

Why?

Koretz says he can’t yet be certain why the Level 3 hurdle dropped over time, but he has two guesses. The first — and the one he suspects most strongly — is a phenomenon called “score inflation.”

Score inflation’s primary cause, Koretz told me, is what he calls “inappropriate test prep”: coaching students on material that teachers know will be covered on the test, to the exclusion of other material that is covered by state standards but that, for a range of reasons, doesn’t get tested. It can also be caused by deliberate attempts to game the tests, such as barring certain students from taking them.

The result is that students get better at scoring high on tests over time, but they don’t learn more.

The other possible explanation Koretz cites has to do with the test’s makers, who are charged with “linking” tests from one year to another so that a Level 3 holds the same meaning over time.

In New York, linking is especially challenging because of a law we first wrote about last year that requires the state to release all its test items publicly. That prevents the state from following the industry-standard method of linking, which is to hide secret test questions from one year to the next, and use them as benchmarks that stay constant between years. New York instead has to use a less-reliable method called field testing, in which the state gives separate tests each year that aren’t attached to high stakes.

“The problem,” Koretz explained, “is kids know it’s a field test.” They don’t take it as seriously as they take the state test, and the results, therefore, are compromised.

A failure to “link” properly doesn’t mean that McGraw-Hill, the company that makes the state tests, broke rules. But, said Koretz, “Even though the process that the contractor used was kosher, it doesn’t mean it worked.”
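The intuition behind those secret benchmark questions can be shown with a deliberately simplified Python sketch. It is not the procedure McGraw-Hill or the state uses, and real equating is far more involved. The function name and all the percent-correct figures below are hypothetical.

```python
from statistics import mean


def form_difficulty_shift(anchor_y1, anchor_y2, total_y1, total_y2):
    """Estimate how much of a score change comes from the test form itself.

    All inputs are lists of percent-correct scores. The change on the
    shared (anchor) questions is treated as the real change in what
    students know. Whatever remains of the overall change is attributed
    to the new form being easier (positive) or harder (negative).
    """
    ability_change = mean(anchor_y2) - mean(anchor_y1)
    total_change = mean(total_y2) - mean(total_y1)
    return total_change - ability_change


# Hypothetical percent-correct scores for three students in each year.
anchor_2007 = [50, 60, 70]   # on the unchanged, unreleased questions
anchor_2009 = [52, 62, 72]
total_2007 = [55, 65, 75]    # on the whole test
total_2009 = [63, 73, 83]

shift = form_difficulty_shift(anchor_2007, anchor_2009, total_2007, total_2009)
print(shift)
```

In this made-up case the whole-test average rises 8 points while the anchor questions rise only 2, suggesting the newer form is about 6 points easier. The sketch only works if the anchor questions stay secret and students take them seriously, which is the assumption that public release and low-stakes field tests undermine.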

Moving forward

How do you fix score inflation and bad linking? Koretz said it’s not enough simply to raise the score that equates to “proficient.” But he said that, so far, state education officials are taking the right steps to do more.

Though they haven’t yet decreed a ban on test prep (something that would be hard to do), they have asked McGraw-Hill to redesign the tests so that they are less easily gamed. That includes trying to test a broader set of subjects within math and reading, as happened with this year’s math (but not English Language Arts) tests. It also includes making the test less predictable from year to year. (See a story we ran last year showing how the annual math tests repeat themselves.)

Koretz also said that McGraw-Hill has performed “complicated psychometric work to reduce the effect score inflation might have on the linking.”

And the tests will be entirely re-written when the national common core standards effort to re-make assessments is completed in the next few years.

It’s all a big departure from what New York State was saying just two years ago, when Koretz first requested permission to analyze the state’s tests. Then, a spokesman for the State Education Department told me:

“All of New York’s tests are checked many times to be sure that a score this year means the same next year… The only way for a student to improve performance is by learning the curriculum — reading, writing, and math.”

The full Koretz five-page memo summarizing his findings so far:

Memo: Evidence about the leniency of 8th-grade standards

  • Michael Fiorillo

    While the mechanics of this fiasco (at best)/fraud (at worst) are useful to know, why no mention of the political dynamics at work during this period?

    When it comes to spurious test scores in recent years, cui bono?

  • http://www.classsizematters.org Leonie Haimson

    I don’t get this. What about the fact that a student could randomly guess the answers to multiple choice questions in several exams and get a level two? That’s fairly obvious evidence of test score inflation, unrelated to either test prep or the difficulty of equating over time.

    Moreover, I have heard that it’s common practice for test contractors to put experimental, unscored sample questions into exams, which they do not release to the public, in order to make it easier to equate the exams over time. Did CTB/McGraw and/or NY state ever do this? And if not, why not?

  • A Teacher

    Guys, they made the tests and grading easier so that Bloomberg could show improvement. He got re-elected. Now they are setting things back so that in a few years they can show improvement again. When Cuomo becomes governor he’ll be inheriting some nice low scores that he can also improve. The problem with basing everything on tests scores is that, in reality, they can only be improved so much. Politicians like them though because they provide them with statistics which can be used and manipulated to show anything. In a few years they will be talking about how much higher the scores have gotten under Cuomo/Bloomberg and no one will really remember or care about how the scores went down that one year.

  • http://www.classsizematters.org Leonie Haimson

    And another thing; the field tests themselves showed the tests were getting easier, something that the SED and their technical advisory committee should have noticed and probably did. Doesn’t anyone remember Erin Einhorn’s article on this from the Daily News in 2007?

    On our blog, we reproduced the News’ chart showing the changing “p scores” on the field tests, which showed that the questions were getting easier, and closely predicted the rise in the test scores. For those who’d like to see it, the posting is dated Wednesday, September 5, 2007, in a column by Steve Koss, called “Here are the numbers. Everything is improving. Trust us.” The original Daily News article seems to be off the web now, but thankfully we kept the chart.

    Susan Ohanian’s blog at the time also reprinted Erin’s article which has the following quote: “There’s less than a 5% chance that [the results] are due to chance,” said James Corter, a professor of education and statistics at Columbia University’s Teachers College. “The fact that it’s a small sample is not at all important. I think you’ve got really good evidence here … that the 2005 test was easier.”

    and: “In 2005, the happy news that city fourth-graders had dusted their predecessors and showed record gains on the state exam helped buoy Mayor Bloomberg’s reelection drive just weeks before Election Day.”

    “Did Bloomberg’s leadership boost the scores? Not if you ask 9-year-old Kirwin Seger. When he participated in The News’ experiment, he was among several students who said he finished the 2005 test faster than the 2002 version. ‘The 2002 questions were more complicated than in 2005,’ he said. ‘In 2005, they kept it short, simple and sweet.’”

    And here we are, three years later, and we’re supposed to believe that the state made an honest mistake because they believed the field tests were actually harder than they were? The revisionist history on this is rather startling, to put it mildly.

  • http://ednotesonline.blogspot.com/ norm

    “The question that motivated this week’s scrutiny of the state tests was: Is the increasing number of New York students passing the tests a sign that they know more — or is it a mirage?”

    Gee, this is a simple one to accomplish. Pick a few hundred kids randomly and go in and test them – in late September. Then really see the fur fly.

    Why not issue a ban on all test prep and test prep materials? Oh, poor McGraw Hill and all the other vultures- so don’t ever expect this to happen.

    Not that teachers – motivated by tiny little threats of being fired – how fast we moved from paying cash bonuses to entire schools – wouldn’t find a way to do test prep.

  • Akademos

    I’m sorry but this is absolutely unacceptable negligence at the very least. Even people who admittedly don’t know curricula or pedagogy shouldn’t need a rundown on using tests to prepare for tests, especially if they are running a school system, using scores to rate and close schools, and intend to use scores to rate and dismiss teachers.

  • Pogue

    A strong condemnation of the ease of these tests can also be found in Diana Senechal’s fantastic GS article, “Guessing My Way To Promotion.” (8/17/09) A real eye-opener.

  • Jeff S

    If you’re looking for a culprit in this, you can start with Jim Kadamus, who wrote a letter to the editor in today’s NY Times defending the whole examination procedure in New York State and talking about standards and how the NYS testing program was based on the most up to date (1996) research on testing. Of course Mr. Kadamus is the same person who, after the 2003 Math A and Physics Exam fiascos (where even suburban kids were failing, so it couldn’t just be thrown out as a product of the poor educational systems in the big 5 cities of New York State), blamed the teachers for not adhering to the standards and would not accept the fact that both exams were idiotic and messed up (which his boss, the infamous Richard Mills, tried to defend also).

  • philip nobile

    The current scandal over the integrity of test results should not obscure the other fraud factor: extensive principal and teacher cheating. No matter where the cut score is set, as long as teachers grade the exams and principals wink, there will always be tampering. Neither the DOE nor the UFT has any interest in stopping the cheating that masks the true measure of the achievement gap. What a pity that nobody has clean hands in this mess, except the blessed whistleblowers who seem to be running scared these days without support from the UFT. I know a teacher who appealed to Mulgrew and NYSUT lawyers, personally and repeatedly, to help him make a dicey corruption report about phony credit recovery in a school. It’s been three weeks now without a response to his phone call and follow-up emails.

  • http://www.parentadvocates.orgnycteachers.com Polo Colon

    The current FAILURE REPORT of Chancellor Bloomberg, “The Education Mayor” and his Robot, Klein only proves that the numbers had been fudged, the tests were dumbed down and the stats were juked!

    TEN years, approaching 200 BILLION DOLLARS, several “REORGANIZATIONS”, decimation of the Teaching Corps with administration and what we have here, is a result of the tremendous inane and insane ineptitude of administrative terrorists that know nothing about how to educate children in NYC!

    Bloomberg & Klein would serve the world better as the glorified CPAs they are, having little to no teaching experience in the trenches! Their incompetence is only exceeded by their arrogance, ego and the snowjob they’ve done to New York!

    The lack of accountability on their part is murderous! Their spin has been horrendous! The lack of public outcry and apathy is SCARY!

    So, apparently, the current state of the American conscience is summed up by the credo: “money talks & bs walks”!

    Sycophants, brown-nosers and mercenaries of the gangs of the Banana Republic of New York at the New Tammany Hall, Tweed, you must be proud of yourselves, but the chickens will all soon come home to roost!

    Everyone in the schools was commenting about how EASY the tests were this year! And yet, THE SCORES STILL WERE DOWN!!!

    FIRE BLOOMBERG & KLEIN!
