GothamSchools — daily independent reporting on NYC public schools


Calls for investigation into test credibility go unanswered

State Board of Regents Chancellor Merryl Tisch is calling for state exams to be more “defensible,” but a study of test score credibility, requested a year ago by the state’s testing oversight board, has still not received a go-ahead.

The committee first formally asked the state education department to join an academic study on the state tests in the fall of 2008, said chair Howard Everson. The education department declined but did not rule out future participation. Since then, Everson has received no requests to revisit the idea, he said in an interview yesterday.

“It’s hard to trust the data right now,” said Everson, a psychometrician who is also a senior fellow at the City University of New York. Everson’s committee, the state Technical Advisory Group, is charged with monitoring the state testing process.

The study, which Everson is developing with Harvard education professor Daniel Koretz, would investigate a phenomenon called score inflation. Score inflation happens when rises in test scores reflect something other than actual learning — for instance, bending the test rules (e.g., giving students more time) or even cheating. Tests may also become so predictable that teachers learn to coach students on how to ace them, experts say.

The researchers propose measuring what test scores really represent by creating a “test within a test” that would assess the same skills but in a variety of formats. If students score worse on the new test than on the regular test, Everson said, that would suggest that score inflation is at work.

The key is to give both tests to the same group of students at around the same time. This would eliminate variables that may account for existing differences in results between state exams and alternative assessments such as the National Assessment of Educational Progress (NAEP).

If such a study happened, Everson said, it would be the first time that New York State accountability tests have been rigorously examined for possible score inflation.

A state education department spokesman did not return requests for comment on whether a study of score inflation is being considered.

The research study designed by Everson and Koretz is still in its preliminary stages, Everson said, and the researchers are in early discussions with several states about participating.

New York would be an excellent site for research because of the high stakes attached to the exams, Koretz said in an interview earlier this year.

“In the old days, when the pressure to raise scores wasn’t so high, it mattered less that tests were somewhat predictable,” he said. “But when the pressure is as high as it is now – everywhere, but particularly in New York with value-added measures for teachers and so on – people have every incentive to look for predictable patterns and to narrow their instruction to focus on those patterns.”

Columbia University sociology researcher Jennifer Jennings has found that almost identical questions have appeared on each state math exam since 2006, making it easier for schools to teach to the test.

Koretz said that with increased attention to high-stakes testing in New York, he hopes that the pressure for a thorough look into score inflation will mount.

“I hope that New York can be what finally breaks the ice,” he said.

Everson was quoted in the Times as saying that the state tests are “about as good as we can build them.” In an interview with GothamSchools, Everson elaborated on that appraisal, saying that the technical operations of state testing (how the tests are administered and scored) are strong. But he drew a distinction between the way the tests are given and the way the results are interpreted.

“It’s just a hypothesis, but we worry that there would be more score inflation,” Everson said. “We think there would be more teaching to the test.” But there is no way to be sure without more rigorous examination, he said.

UPDATE: The New York State Department of Education is currently researching the possibility of using an “audit mechanism,” like the “test within a test” that Everson described, to guard against score inflation in its standardized tests, department spokesman Jonathan Burman said.

9 Comments

  1. Maybe the new Commissioner will order steroid tests for tests?

  2. David

    Why don’t u whip up a request for some of the Obama Race to the Top dollars, a mere million or so, to fund such research … Arne, after all, is data driven …

  3. Michael M.

    We need to make a distinction as to the two kinds of inflation here:
    1) Raw scores (then converted to scale); and
    2) Cut scores (based on scale scores).

    Techies are welcome to correct my abuse of these terms, but I think you all get my point.

    Regardless of whether there is raw score inflation, that there is cut score inflation is at the heart of the Mayor’s PR onslaught that the number of “proficient” kids is soaring, not only on his watch but due to his policies.

    If it turns out there is ALSO scale score inflation, it would seem the two effects compound each other.

    Per DB above, it’s easier to hit it out of the park when they move in the fences, even if you’re not on steroids.

  4. You write above, “Everson was quoted in the Times as saying that the state tests are ‘about as good as we can build them.’ In an interview with GothamSchools, Everson elaborated on that appraisal, saying that the technical operations of state testing–how the tests are administered and scored–are strong. But he made a distinction between the way the tests are being given and the way the results are being interpreted.”

    What? First of all, if he believes that the areas tested may have narrowed over time, as implied above, that is a problem with the way the tests are constructed (or built).

    Not only is there evidence to suggest that the way the tests are built is flawed, with repetition of the same types of questions year after year, but also the way in which they are scored — with consistently lower cut scores for proficiency.

  5. Michael M.

    There once was a psychometrician,
    Who, informed of his sagging position,
    Was heard to exclaim,
    Though most found it lame,
    “I’m a head-tester — not a magician.”

  7. One of the most dramatic changes in the last few years is technology … as soon as test scores are available principals download test results from NYSTART, the SED testing site, disaggregate the data, create an error matrix, and provide teachers w/ student specific data re student weaknesses … teachers create lessons to address the student deficiencies … is this “gaming the test,” or simply using data to drive instruction … some teachers/schools are skillful, and others create deadening “drill and kill” lessons …

    a careful analysis of the entire testing system is long overdue, and I have real confidence in Daniel Koretz as chief investigator.

  8. Three issues - ways to see what’s what now, without waiting for a big study:

    1. Somebody FOIL the reports to NYSED from its outside regrading firm (for the grades 3-8 ELA and math exams) to see what they say. It’s never a good idea to rely on what NYSED says reports contain when it hasn’t written them.

    And then there’s the issue of what actually happens, or happened, to a school or district when and if the regrading firm found that grades had been inappropriately inflated. NYSED took the latter material off its web site (no surprise there) a year or two ago, but from what I recall, the answer to “what happens if a school or district is found to have inflated test grades?” is “nothing.” As in, a letter is written to somebody saying “tch, tch.” No scores are changed. No entire school or district tests are regraded under NCLB reporting. Nothing to do with accountability is altered in any way. It’s just “bad boy, now don’t do that again or we’ll smack you with a wet noodle. So there!”

    I think I’m trying to say that the entire NYSED testing program is probably permeated with one kind of fraud or another. Where a district wishes to tamper with test grades to its advantage, there is nothing in the NYSED system to stop it from doing so. Again and again and again. And when test score numbers get to be weapons in political campaigns, the first casualty is … truth.

    The devil’s in the details, my friends. Let me commend them to you. The boring ones you find in NYSED’s manuals when you have insomnia and it’s 3 a.m. and you figure poring over its web site is better than tossing and turning for another hour or three. Been there; done that. Many times.

    2. Get the NYCDOE’s enrollment-by-grade figures and compare them with the number of kids tested in each grade. A little birdie tells me that the NYCDOE doesn’t test all the kids it should.

    When you get to the Regents exam level, NYSED used to put out yearly reports showing how many kids with disabilities, statewide and separately for the NYCDOE, should have taken each Regents exam but were “not tested.” The percentages of “not tested” for the NYCDOE averaged about 50%. Yes! 50%. I have some of these archived if anyone wants to see them.

    Now, these were the mildly disabled kids whose IEPs said they should take the regular Regents courses and regular Regents exams at the end of these high school courses. So how did NYCDOE get away with a 50% “not tested” rate? It should have automatically failed every high school Adequate Yearly Progress measure, as should each of the community and other “districts” - not to mention individual high schools - for which scores were reported. And there should have been a big stink about this. Not!

    I see absolutely no reason why the NYCDOE would “not test” about 50% of the kids with disabilities who should have been tested at the high school level but still test almost all, or all, of the kids with disabilities who should have taken the grades 3-8 ELA and math tests because their disabilities were mild.

    And please remember: kids w/disabilities are close to 15% of the NYCDOE’s enrollment. So if a fat chunk of those kids should have taken the tests but did not, and the NYCDOE has a truly miserable and incompetent special ed. program, you’re looking at a great way to ensure that reported scores get higher and higher and higher and higher, because you’ve “not tested” the kids most likely to get horrific scores. Right out of the gate.

    Since the accession of Michael Bloomberg to the throne, NYSED has increasingly tampered with the data it released to keep the really bad numbers under some rock. The “not tested” numbers for kids w/disabilities who should have been tested are just one example. There are many more.

    Since Regent Merryl Tisch appeared to be such good friends with Joel Klein, et al., is there any reason to believe that Merryl Tisch, Chair of the Board of Regents, is really going to sing a different tune … and do a different dance?

    3. NYSED publishes a big analysis, yearly, of the grades 3-8 ELA and math tests’ scores. I recall that these are from the tests’ publisher(s). All kinds of statistics in them and frankly, most make my eyes glaze. But then there was always, until maybe 2 years ago, a direct comparison of grading of these exams by NYCDOE teachers v. NYC private school teachers. Guess who graded higher? I sent out comments to a few well-placed folks and … NYSED stopped publishing this specific bit of information when it published these lengthy analytical reports.

    As with everything else at NYSED, when information that was previously reported stops being reported, you can, and should, assume that they’re trying to hide something embarrassing, humiliating, or occasionally, downright nasty. I think all of these are in play in terms of why NYSED stopped making this information available to the public.

    So, friends … dig deeper. Five will get you ten that if you do FOIL these materials, NYSED will stall you off till the cows come home. Or else claim, probably falsely, that it doesn’t collect or receive this information anymore.

    Believe that … and … hey, there’s this great bridge for sale, just down the road apiece … . Wanna buy a piece of it?

    Dee Alpert, Publisher
    SpecialEducationMuckraker.com

  9. Philip Nobile

    It amazes me that discussions of test inflation almost never deal with the real issue: principal and teacher cheating. Even this thread hardly touches the topic whereas all of us know that the crime of tampering is universal in our schools and will not die until grading is totally blind. Tweed reflexively pooh-poohs cheating and the UFT does not dare admit its members routinely fiddle with scores. Research should begin with a survey. This no-brainer was adopted by the Chicago Teachers Union last year. Unfortunately, the Unity caucus lacks the nerve and the ethics to follow suit. How can I be sure? I suggested a survey to President Mulgrew. His response via flack Peter Kropotkin: no time in his schedule for a yes or no answer.
