GothamSchools — daily independent reporting on NYC public schools

counterintuition

Higher pass rates could be due to tougher tests, expert says

The number of correct answers needed to pass state exams is falling — but the head of the state’s testing oversight board says that’s because the tests are actually getting more difficult.

Critics charge that the tests have become so easy that students can guess their way through them. But there might be a good reason for the shift, said Howard Everson, chair of the state body that oversees the testing process: As the individual exam questions have gotten harder, students need to answer fewer of them correctly to earn the same score.

“The idea you have to remove from your head is that a test has a certain number of questions and all of those questions have the same weight every year,” Everson said.

Instead, he said, the state has asked CTB/McGraw-Hill, the company that publishes the exams, to make test questions slightly harder every year. The publisher then adjusts the scale that calculates a student’s final score from the number of correct answers according to the difficulty level of that year’s questions.

The modifications ensure that the test is scored fairly from year to year, Everson said, so that a student correctly answering seven relatively easy multiple choice questions one year would not receive the same final score on an exam as a student correctly answering seven harder questions a different year.

But a side effect is that students have to answer fewer questions correctly each year to pass the tests.
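The mechanism Everson describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the linear conversion, the per-question point values, and the cut score of 650 are hypothetical, and the state's actual raw-to-scale tables are not linear and are not reproduced in this article. The sketch only shows that if each correct answer on a harder test is worth more scale points, the number of correct answers needed to reach a fixed cut score goes down.

```python
# Hypothetical illustration only: none of these numbers come from the state's
# real conversion tables, which are not linear.

def linear_table(max_raw, base, points_per_item):
    """Build a made-up raw-score -> scale-score conversion table."""
    return {raw: base + points_per_item * raw for raw in range(max_raw + 1)}


def min_raw_for_cut(conversion, cut_scale):
    """Return the smallest raw score whose scale score meets the cut, or None."""
    for raw in sorted(conversion):
        if conversion[raw] >= cut_scale:
            return raw
    return None


# Assumed values: on the "harder" test each correct answer earns more scale points.
easier_year = linear_table(50, base=500, points_per_item=5)
harder_year = linear_table(50, base=500, points_per_item=6)

CUT_SCALE = 650  # hypothetical cut score, held fixed across both years

print(min_raw_for_cut(easier_year, CUT_SCALE))  # 30
print(min_raw_for_cut(harder_year, CUT_SCALE))  # 25
```

The cut score stays the same in both years; only the conversion changes. Whether the real questions have in fact gotten harder is the point the critics dispute.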

In 2006, for example, a seventh-grade student needed to earn 28 out of 50 possible correct answers on a combination of multiple choice and open-ended questions to score a Level 3 on the math exam, indicating that the student met state learning standards. In 2009, a seventh-grader needed only 22 of the 50 correct, a decline of 12 percentage points.

Similarly, fifth-graders in 2009 needed half the total number of correct answers on their math exam for a Level 3 score, down more than 8 percent from 2006.
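The seventh-grade drop can be measured two ways, and a short Python calculation shows both. It uses only the figures quoted above (28 and 22 correct answers out of 50); the function name is ours and purely illustrative.

```python
def cut_score_change(old_needed, new_needed, total):
    """Compare two raw cut scores on a test with `total` possible correct answers."""
    old_share = 100 * old_needed / total  # percent of answers needed before
    new_share = 100 * new_needed / total  # percent of answers needed after
    point_drop = 100 * (old_needed - new_needed) / total  # percentage points
    relative_drop = 100 * (old_needed - new_needed) / old_needed  # percent change
    return old_share, new_share, point_drop, relative_drop


old_share, new_share, point_drop, relative_drop = cut_score_change(28, 22, 50)
print(f"2006: {old_share:.0f}% of answers; 2009: {new_share:.0f}%")
print(f"Drop: {point_drop:.0f} percentage points, "
      f"or {relative_drop:.1f}% fewer correct answers")
```

Measured against the whole test, the requirement fell from 56 percent to 44 percent of the answers. Measured against the 2006 requirement, a student needed about a fifth fewer correct answers.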

If the state wanted to head off this trend, it could modify the scale and raise the “cut scores” that separate proficiency levels, Everson said. (The cut scores are based on a complicated point-to-score conversion process detailed here.) But the scales and cut scores have not been reviewed since 2004, he said.

The trend is especially relevant in New York City, where students in grades 3-8 must now score at least a Level 2 on the exams to be promoted to the next grade.

The ease with which students can hit the Level 2 mark may account for the dramatic reduction in the number of failing students in New York City. Only a tiny proportion of city students now score at the lowest level on the state tests.

Some critics dispute Everson’s assertion that test questions are getting harder. They point to a recent study that revealed that some test questions are reused year after year in virtually identical form.

Everson said his committee determined that the tests were technically sound. But if student performance is truly improving, he said, the way the tests are graded should change.

“It’s certainly time for another assessment of the assessments,” he said. “We do want to make adjustments if we’re testing a higher-ability population in 2010 than we were in 2000.”

Everson emphasized that the elements that influence test score results are complex. Without an updated review of whether higher scores truly reflect greater learning, he said, it is difficult to know how to interpret exam results. Everson has been calling for a review but so far has not persuaded the state to undertake one.

“That’s the question that goes begging at the moment,” he said. “Are the abilities of the children really improving? And if they are, what does that imply for the testing program?”

  • Michael M.

    Reminds me of the old line: “What this country needs is more one-handed economists.” Or in this case, test assessors.

    I get the suggestion that there are competing rates of change. But are the kids getting brighter faster than the tests are (allegedly) getting harder? (This despite recent articles showing:
    a) rerun questions from prior years; b) active coaching to the rerun questions; and c) random-plus-one “earns” a Level 2.)

  • canwetalk

    If a student only needs to answer a few questions because of the difficulty of the test to get to level 2, then make it super-easy so that a student can answer ALL of the questions to get to level 4. Using this approach every student will meet the NCLB mandates by 2014. This way every test publishing company, politician, and educrat can celebrate their accomplishment. Of course, prepare yourselves for the devastating economic repercussion. Countries that take pride in their education system will be preparing their students for the great demand of outsourcing coming from the U.S. Congratulations to our politicians and educrats for bringing down the economy in the future – No Economic Recovery Found in Sight!

  • http://www.classsizematters.org Leonie Haimson

    I don’t get this posting at all. Even assuming Everson is right and the questions have gotten harder (with no evidence provided of this assertion) what difference does it make if the cut scores are set so low that a student can reach a level two by random guessing?

    Did you ask Everson this question? If not, can you?

  • Diana Senechal

    I agree that there are many factors involved in analyzing test scores. But some figures are startling nonetheless. For instance, isn’t it a little odd that only 0.2 percent of NYC sixth graders scored at level 1 in ELA this year? One out of every five hundred? In any case, this calls for more investigation, and I am glad to hear that Everson has been calling for a review.

  • Michael M.

    I dare say that by this point, the folks following this topic “get” the difference between raw scores and scale scores and cut scores.

    Whether the questions are getting harder or easier might surely affect the raw scores. (Even a Level 2 parent like myself can get that one.) But that should NOT affect the *converted* scale scores, let alone the *judgement-call* cut scores.

    Now let’s revisit the above Everson quote:
    “The idea you have to remove from your head is that a test has a certain number of questions and all of those questions have the same weight every year.”

    Ouch! Not sure which is more objectionable — the ‘tude or the straw man.

    P.S. to CWT: I suggest it’s time to take out the Test Reformers Assessing Student Heuristics. Agreed?

  • Michael M.

    I took a test,
    ‘Cause tests are fun.
    I rolled the dice,
    And got a ONE.

    My parents said,
    “That just won’t do.
    With just one right,
    You’d get a TWO.”

    And then I heard,
    From DOE,
    “Proficient means,
    You’ll need a THREE.”

    I took a test,
    I got a score.
    The score I got,
    Was number FOUR.

    I must progress!
    My school’s in fear!
    I’m praying for a FIVE!
    Next year.

  • http://edintheapple peter

    These are questions for psychometricians … or whatever is the appropriate term for test creating/ analysis experts … maybe a task for the Independent Budget Office (IBO) now that they are the statutory organization for audit/analysis of DOE data … anecdotal comments from grades 3-8 teachers say the questions are fair and appropriate but the “scaling” of the tests pushes down the bar … Merryl Tisch? David Steiner? Your first big challenge …

  • http://jd2718.wordpress.com Jonathan

    “Everson, chair of the state body that oversees the testing process” Could you identify the name of the board and Everson’s actual title?

  • Diane Ravitch

    These comments are a hoot. Is this a satire?
    Diana Senechal reached level 2 by guessing at random, and the tests are harder? The number of students at level 1 has dropped dramatically in the past three years, and the tests are harder. C’mon, this is funny. Everson has a great sense of humor!

  • Michael M.

    DR,
    Thank you for cutting to the essence.

    And who cares if the questions are harder if the cut scores have been lowered to the point of ludicrosity?

    What “Level” would a lab mouse score on a lucky day?

    Would having an unlucky lab mouse repeat the curriculum ensure its success the following year? Statistically… yes.

    And thank gawd we have Mayor Mike and mayoral control to thank for it.

  • Michael M.

    There once was a psychometrician,
    Who tested a bunch of magicians.
    Much to his surprise,
    They all quickly got wise,
    As they told him just what he was wishin’.

  • http://edintheapple peter

    Michael

    I nominate u for poet laureate … u only need eight votes on the PEP …

  • http://jd2718.wordpress.com Jonathan

    “the head of the state’s testing oversight board”
    I’m still trying to figure out who he is. It looks like maybe he doesn’t really work for the state? Some sort of academic?
    If you have an article that’s based on the judgment of some sort of expert, it would be nice to know who he was.

  • http://www.gothamschools.org Philissa Cramer

    Jonathan, Howard Everson is the chair of the state’s Technical Advisory Group, an oversight body that monitors state testing. He is also a psychology professor at Fordham, focusing on psychometrics. His CV is here: http://www.fordham.edu/academics/programs_at_fordham_/psychology/everson/index.shtml

  • http://jd2718.wordpress.com Jonathan

    This appears to be a narrow-charge committee (in which case membership should not serve as a credential). “Oversees” is misleading. I hope he did not misrepresent himself to you.

  • Howard Everson

    Thanks to Philissa for providing some background, and clarifying my role. The Technical Advisory Group (TAG) referred to in earlier posts to this site does not provide “oversight” or policy advice. Nor does the TAG make policy with respect to proficiency standards and what they are intended to mean. We, the TAG, do provide technical advice to the NYSED on the psychometric characteristics of the State’s accountability tests. Virtually all the states convene technical advisory committees to help their education staffs and other policymakers understand the measurement issues that underpin their use and reliance on standardized tests for accountability under the NCLB Act.

    There has been much discussion of late about the question of chance-level scoring on the State’s tests. This is a well-known issue in educational measurement, particularly when a test includes a large proportion of multiple-choice questions. The average chance score is defined as the number of items divided by the number of alternatives. So for a 32-item multiple-choice test with 4-choice (A,B,C,D) response options, the average random score will be 8. Knowing this, the test sponsor (the State) and the test user (the schools or school districts) have to make decisions (policy) about what those chance-level scores mean.

    In the case of NY, the chance-level scores that have been discussed in the press and on the web are interpreted, for the most part, as indicating that the student is unable to demonstrate basic understanding of the content and/or concepts of the academic domain being assessed. This is not an uncommon interpretation of chance-level scores. There are, however, some school districts and jurisdictions within states that add other “layers” of interpretation onto a state’s proficiency scale. When this happens the validity of the score scale and the standards may be compromised. But, again, when these issues surface, as they do in many states, the measurement issues begin to intersect with policy issues. And again, this, too, is not an uncommon problem when measurement methods are used within a larger social or public policy context. We see this, for example, when metrics are developed to inform health policy, public safety, economic policy decisions and many other societal issues. Educational reform and accountability are vulnerable to many of the same shortcomings.

    The debate over test scores and what they mean is a good one and it needs to continue, if we are to make progress in improving the productivity of our schools.
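
The chance-level arithmetic in Everson's comment above (number of items divided by number of alternatives) is easy to check. The Python sketch below is an illustration, not anything the state or the TAG publishes; the function name is ours, and the formula applies only to multiple-choice items, since open-ended questions cannot be answered by picking a letter at random.

```python
from fractions import Fraction


def average_chance_score(num_items, num_alternatives):
    """Expected number of correct answers when every item is guessed at random.

    Each guess is right with probability 1 / num_alternatives, so the
    expected total is num_items / num_alternatives.
    """
    return Fraction(num_items, num_alternatives)


# The example from the comment: 32 items, four choices (A, B, C, D) each.
print(average_chance_score(32, 4))  # 8
```

This is an average: an individual student guessing at random will sometimes land above 8 and sometimes below it.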
