GothamSchools — daily independent reporting on NYC public schools

skoolboy
Aaron Pallas

Why NAEP Matters

NYC Chancellor Joel Klein’s response in Wednesday’s New York Times to Diane Ravitch’s op-ed last week provides a lot to chew on. Today, I’m focusing on his comments about the National Assessment of Educational Progress (NAEP), which is also known as the Nation’s Report Card. NAEP began collecting data in 1969, and remains the only federal assessment designed to report on trends in the academic performance of U.S. children and youth. All 50 states and the District of Columbia participate in NAEP, as do New York City and an increasing number of other urban school districts. NAEP has an operating budget of more than $130 million per year, which represents a significant share of federal investments in education research. Though not an expert on testing and assessment, Diane Ravitch has a long-standing interest in NAEP: she was appointed to the bipartisan National Assessment Governing Board (NAGB), which oversees NAEP, during President Bill Clinton’s second term, and remained on the board until 2004.

One of the ways that NAEP differs from many other standardized tests is that NAEP is designed to yield a much wider picture of the subject-matter knowledge the test is intended to measure. Many standardized tests are designed to provide an accurate picture of a particular child’s performance. It’s efficient to do so by having all test-takers respond to the same set of test items. If a group of fourth-graders all answer the same 45 items in a 90-minute math exam, we can learn a lot about performance on those particular items, which are chosen to represent a content domain (such as fourth-grade math). But such a test would tell us little about student performance on other items that might have a different format, or address different fourth-grade math skills. NAEP addresses this problem by having many more test items, but no child answers all of the items, because that would take hours and hours of testing time. Instead, each child responds to a sample of the items, and the performance on these items is combined across children to yield a picture of the performance of children in general. Testing experts such as Dan Koretz at Harvard believe that assessments such as NAEP are less vulnerable to score inflation than state assessments because it’s more challenging to engage in inappropriate test preparation when there are so many potential test items a student might respond to. But the tradeoff is that NAEP is not designed to provide a reliable and accurate measure of performance for a particular child.
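For readers who like to see the idea in action, here is a toy simulation of the item-sampling approach described above. It is only a sketch: the pool size, the number of students, the items per student, and the item difficulties are all made-up assumptions, and the actual NAEP design and scoring are considerably more sophisticated than a simple average.

```python
import random

def simulate_matrix_sampling(num_students=2000, pool_size=150,
                             items_per_student=15, seed=0):
    """Toy model: each student answers a random subset of a large item pool."""
    rng = random.Random(seed)
    # Hypothetical item difficulties: the chance a typical student answers correctly.
    p_correct = [rng.uniform(0.3, 0.9) for _ in range(pool_size)]
    attempts = [0] * pool_size
    correct = [0] * pool_size
    for _ in range(num_students):
        # No student sees the whole pool, only a small random sample of items.
        for item in rng.sample(range(pool_size), items_per_student):
            attempts[item] += 1
            if rng.random() < p_correct[item]:
                correct[item] += 1
    # Combine responses across students: observed fraction correct per item,
    # then averaged over every item that was attempted at least once.
    observed = [c / a for c, a in zip(correct, attempts) if a > 0]
    estimated = sum(observed) / len(observed)
    true_mean = sum(p_correct) / pool_size
    return estimated, true_mean, min(attempts)

estimated, true_mean, fewest_responses = simulate_matrix_sampling()
print(f"group estimate: {estimated:.3f}, true average: {true_mean:.3f}")
print(f"fewest responses to any single item: {fewest_responses}")
```

In this toy setup each student answers only 15 of 150 items, which is far too few to say much about that student, yet every item still collects many responses, so the group-level estimate tracks the pool-wide average closely.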

Let’s look at what the Chancellor had to say about NAEP:

“The national tests [Ravitch] cites are not the measure of federal accountability, are given only to a small sample of schools, and are not aligned with New York State standards and therefore with what we teach in our classrooms. (That said, our fourth-grade scores on those tests are strong.)”

Not the measure of federal accountability. The No Child Left Behind Act delegated to states the responsibility of developing systems of learning standards and assessments designed to measure progress towards universal student proficiency by 2014. It’s true that the tests that are used to assess the performance of the New York City schools for NCLB purposes are state assessments, not NAEP. But it is misleading to say that NAEP is not a measure of federal accountability. The tests administered by the 50 states vary considerably in their difficulty, with some states reporting much higher rates of student proficiency than are indicated by student performance on the NAEP assessment. In New York City, 56% of fourth-graders in 2007 were judged proficient on the New York state English Language Arts test, whereas only 25% reached proficiency on the NAEP reading assessment. New York City and New York State are by no means distinctive in finding much higher rates of proficiency on state tests than on NAEP (many states have even larger disparities), but the unevenness of the proficiency standards across states, and the fact that state tests change frequently over time, have led Congress and the U.S. Department of Education to rely on NAEP as the primary measure of trends in the performance of American schoolchildren over time. Moreover, Education Secretary Arne Duncan has recently advised state superintendents that they should report state NAEP performance in their state and district report cards documenting performance under NCLB. In these ways, NAEP is very much a measure of federal accountability.

Given only to a small sample of schools.  For the life of me, I can’t figure out why the Chancellor thinks this is relevant.  A well-designed sample will yield estimates of student performance that are unbiased and accurate, and the New York City sample is designed by leading statisticians to be representative of the population of New York City students and large enough to detect meaningful differences between New York City and other jurisdictions, as well as meaningful differences over time.  
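A back-of-the-envelope calculation shows why a sample can be enough. The sketch below uses the textbook margin of error for a proportion from a simple random sample. That is an assumption made for illustration: NAEP samples schools and then students, so its actual standard errors are computed differently and will be somewhat larger. The sample sizes are hypothetical, and the 25% figure is borrowed from the NAEP reading proficiency rate cited above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    a simple random sample of n students (z=1.96 is the usual normal multiplier)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, from a handful of students to a few thousand.
for n in (5, 100, 2500):
    moe_points = 100 * margin_of_error(0.25, n)
    print(f"n = {n:>4}: about +/- {moe_points:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, so a sample of a few thousand students can pin down a citywide proficiency rate quite tightly without testing every child.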

Not aligned with New York State standards and therefore with what we teach in our classrooms.  It would seem unfair for New York City schoolchildren to spend the year studying Shakespeare, and then be assessed on their knowledge of contemporary American fiction.  In reality, the curricular content of NAEP and the New York State assessments doesn’t diverge that much.  For example, in eighth-grade mathematics, the state specifies 104 distinct standards in the arenas of problem-solving, reasoning and proof, communication, connections, representation, number sense and operations, algebra, geometry, and measurement.  (Keep in mind that these 104 standards are assessed via only 45 test items.)  The NAEP framework allocates test items to number properties and operations (20%), measurement (15%), geometry (20%), data analysis and probability (15%), and algebra (30%).  I’m not going to do a detailed comparison, but I invite readers to look at the NAEP standards and see if they represent content that you think is unimportant for eighth-graders to know.      

Our fourth-grade scores on those tests are strong.  Surely the Chancellor must know that, when a test is administered in both the fourth and eighth grade, and he claims that the fourth-grade results are “strong,” and says nothing about the eighth grade, a reasonable person might wonder about the eighth-grade results.  In fact, there have been no statistically significant gains in eighth-grade performance in New York City in either reading or math between 2003 and 2007 on the NAEP assessment, and no gains in fourth-grade reading either.  Fourth-grade scores in New York City are “strong” only in the sense that there were significant gains in fourth-grade math performance from 2003 to 2007. 

A final note:  New York City has been participating voluntarily in the NAEP Trial Urban District Assessment since 2002, so presumably the Chancellor believes that there is something to be learned from the performance of New York City’s children on the NAEP assessments.  And the Department of Education’s press office has had no qualms about crowing about NAEP results when the Department believes there is good news to share.  But a Department, and a Chancellor, truly committed to transparency would be willing to acknowledge the bad with the good, and present a balanced picture of successes and failures.  Writing off NAEP as if it doesn’t matter fails to meet that standard.

19 Comments


  1. ceolaf

    Great explanation, Aaron!

    Anytime someone defends their own performance or analysis by attacking the NAEP, it is clear that either they generally don’t know what they are talking about or are hiding something. Either way, nothing else they say on the matter on which they are defending themselves is credible. (The same is true for statistical sampling. Attacking both is a double-whammy to credibility.)

    This, of course, is not to say that sampling and the NAEP are invulnerable to criticism. But they are so well established and lauded that such criticism should stand on its own, so that it can get full hearing.

  2. Thanks for that lucid explanation of the NAEP. I always wondered why NCLB didn’t simply replace state benchmark assessments with a test like the NAEP; it would seem that the nifty test design not only precludes individual student performance comparison, but due to sample size it wouldn’t tell you much about school performance either.

  3. Aaron Pallas

    That’s right, KitchenSink. NCLB was designed to hold individual school buildings accountable for student performance, and that would require a complex and costly sampling design to obtain reasonably precise measures of performance on the NAEP assessments for every school. The subgroup reporting provisions of NCLB make it even more challenging to contemplate.

  4. ceolaf

    There also are serious political issues at play here.

    Conservatives generally have been wary of federal standards, which is a big reason why NCLB has states each set their own standards and tests. Liberals are a bit more likely to go for federal standards, but they haven’t had a lot of faith in the kind of testing regime that NCLB has mandated.

    So, there was no natural constituency for linking a federal test to accountability provisions.

    However, after all these years of NCLB, things are shifting around a little bit. There are more calls for national standards from the right, and more calls for better testing from the left. The left does not automatically reject the idea of testing as much as it used to, and the right is more accepting of a bigger role for the DOE in Washington. What used to be inconceivable now just seems really unlikely.

    This doesn’t mean that the NAEP can be used — but most people with bully pulpits don’t understand that the NAEP doesn’t work like other tests. So, there has been more of a call for its adaptation than is warranted.

    I, for one, have no idea where this is going in the next 8 years.

  5. qm

    Roughly how much greater would the cost of implementing a NAEP-like assessment in NYS be than the cost of the current system? And what’s your stance on national standards and assessments?

  6. Aaron Pallas

    I don’t know enough to estimate the cost of implementing a NAEP-style assessment as a replacement for the current assessments in New York State. Many of the provisions of No Child Left Behind could not be met with a NAEP-style system, because NAEP does not yield precise scores for individual students, and currently doesn’t provide precise measures of the performance of individual schools. To further the goal of all students reaching proficiency by 2014, NCLB requires near-universal participation in annual assessments, and a determination of whether each student is proficient or not in English and math. NCLB also seeks to hold individual schools and districts accountable through an assessment of whether they are making Adequate Yearly Progress (AYP). Doing so requires assessing AYP for subgroups of students, including students in major racial/ethnic groups, economically disadvantaged students, limited English proficient students, and students with disabilities. NAEP cannot do this with enough precision to satisfy the NCLB provisions.

    What’s my stance on national standards and assessments? Ambivalence. The lack of national standards, and assessments of whether those standards have been met, results in an ambiguity about what a high school diploma means. Given the variability across states in standards and assessments, one can graduate from high school without knowing anything in particular. Requiring students to meet a common set of standards would remedy this.

    But I’m not sure who I trust to come up with national standards. In a democracy, the content of what is to be taught and learned in public schools is something in which all citizens have a legitimate stake. I’m wary of delegating the authority to develop national standards to elected politicians, whose interests and experiences may not mirror the remarkable heterogeneity of American society.

  7. ceolaf

    NAEP-style assessment is simply incompatible with even the most laudable goals of NCLB.

    1) Everyone loves the idea of disaggregation of student data. Rather than looking at whole-school or whole-grade averages, NCLB requires that subgroups be independently reported (i.e. all students, black, Hispanic, white, Asian, American Indian, English language learners, migrant students, low income and special ed, as I recall). Of course, there have been issues with this approach, and certain amounts of controversy. But the idea/goal has received, as best I can tell, universal acclaim.

    Unfortunately, there is a technical problem with this approach. For statistical purposes, the more kids/subgroup, the more accurate the reporting. A subgroup with just 5 students, for example, has a HUGE margin of error. The real statistical need for subgroups of at least a certain size is at odds with the moral goal of making sure that schools pay attention to their local minority groups. It’s a tension without an ideal answer, leading to compromises that satisfy neither the testing/statistical expert nor the other advocates.

    2) There are a few aspects to the NAEP that make it such a high quality project. One of them is that the folks who run it know that they are looking at the status of very large numbers of students as a group, not as individuals. So, even in the smallest states, there are thousands of kids in each grade.

    The NAEP does not give each kid the whole test, but rather, as Aaron explains above, just gives a fraction of the kids a fraction of the questions each. Because there are so many kids in the first place, each question is answered by many, many children, even though no one child gets that many questions. This allows for a much larger set of questions overall, and therefore better content coverage and greater reliability (in the statistical sense).

    3) Well, this NAEP approach makes it even harder to do sub-group reporting. The NAEP requires that many more students be available to take part in testing because no one kid does every question. That would blow up the minimum subgroup size needed for valid reporting, which is already set too small in most states for it to be statistically reliable while simultaneously being so large that millions of minority students’ scores are not reported as part of subgroups. (i.e. If a school’s subgroup size does not meet a certain state-dictated threshold, it is not independently reported.) This is already a problem in small schools, even without NAEP style testing.

    In other words, a NAEP style test would virtually eliminate subgroup reporting on the school level, and no amount of money can fix that.

  8. crusader

    If the NAEP is so important then why doesn’t the federal government administer it every year? I know why, they read the Constitution just like I do. There is no federal role in education. Our forefathers left that responsibility to the states thankfully. So while we talk about “national standard” hoo ha it’s all just the construct of some worthless education bureaucrats down in D.C. Who cares about the NAEP? I’d rather look at the assessments my child gets every year, and those are the ones New York State gives.

  9. ceolaf

    Crusader,

    Assuming that you are being serious….

    * If the feds read the Constitution the way you do in this regard — and you are right, there is no mention of “education,” and “promote the general welfare” might not include a federal role for education — there wouldn’t be any NAEP at all. Doing it less often is no answer to unconstitutionality. So, clearly they do not read it the way you do. In fact, only a tiny minority reads it the way you do, and they are not in a position to make your reading binding.

    * There are decades of jurisprudence that leverage the interstate commerce clause into a wide variety of areas. Perhaps you think that it overreaches, but you must acknowledge that it has some applications.

    * Your objection certainly is strongest when applied to federal mandates. However, the federal government does rather little with mandates in education. Usually, it is done with conditional moneys. That is, if states want federal money for Title I (for example), they have to do certain things. But they can avoid doing those things if they simply forgo those funds. By this mechanism, the Feds are not exerting power over the states, and therefore, so the theory goes, not violating the Constitution, particularly not violating the 10th amendment.

    * This does raise significant questions about national standards, no doubt. How would the feds enforce national standards? Would they continue to link them to Title I funding? Put too many strings on Title I funding, and a large enough coalition will form to turn it down.

    * Penultimately, I think you are mistaken about the purpose of the NAEP. As a parent, the NAEP does not help you with the education of your current children. There is no way to get NAEP scores for a single child or even a single school. The reporting and analysis takes a long time, and continues for years. The kind of information you might want, as a parent, is not there in the NAEP. Teachers concerned about their students or trying to improve their own pedagogy are similarly out of luck, with regard to the NAEP.

    On the other hand, policy-makers, researchers and other education professionals often look for different kinds of information. If you believe that *serious* reform (e.g. policy change, curricular change and even pedagogical change) takes years to implement, the relative infrequency is not so much a problem, which is quite different from a parent’s concern about his/her child’s individual progress. More importantly, having a common high quality test across states is critical. NCLB (which obviously you don’t support) assumes that different states will have slightly different approaches. Learning from this diversity requires some kind of common measure. The same can be said of charter school legislation and any number of other state-level educational policy initiatives. Very few states have multiple large urban centers, so comparing different approaches in those contexts is much aided by a common national test.

    * And last, whether or not you depend on the annual evaluations you receive from your local state should be informed by the validity and reliability of those measures. If the NAEP tells us that state testing is reporting inflated gains, that *does* apply to your children. This does not mean that the NAEP tells you about your child. Rather, it suggests that you might instead look to other assessments. No doubt, your child’s school offers multiple reports a year about his/her progress. They might prove more useful to you than the state’s reports.

  10. crusader

    Ceolaf, this is where there is a disconnect between parents and educrats. I simply don’t care about national standards or how one state’s tests compare to another state’s. As far as my child is concerned, I only care about how he’s meeting New York’s standards in a way I can see each year. If my child was ever asked to take that stupid NAEP assessment I would refuse. This NAEP is something to give educrats something to talk about. It does not help kids.

  11. ceolaf

    Crusader,

    While I am sympathetic to your meaning, I think that your words could hardly be more wrong.

    The NAEP *does* help kids. It doesn’t help the kids who take it, I understand. But it *does* help kids that will follow them. The kids who took the NAEP in the past help today’s kids. It is an ongoing thing.

    This gets at that important distinction between private good and public good to which I keep referring. I understand that your child taking part in NAEP testing does not contribute to that child’s private good, but it does contribute to the larger public good. I understand that you — when thinking about your child and your child’s time — need to be primarily concerned with your child’s private good, but you also need to consider the public good. I am not saying “either/or.” I am saying “both/and,” with room for individuals to balance them differently.

    I believe — and I think that I’ve written about this on this site — that we need to think carefully about how much time/effort we demand of students for things that just make it easier for the adults. But that doesn’t mean that we don’t ask of the kids anything that does not benefit them directly. It means that we need to be careful about how often and how much.

    However, it is not reasonable to expect that your child should be exempt from contributing to the betterment of our schools.

    *********************************

    Another problem with what you have written is how you equate state standards and state testing. Have you looked at the state standards and the state tests? Are you sure that the latter is a good reflection of the former? I support your interest in seeing your child progress and advance on state standards, certainly. But how do you check the tests?

    The NAEP is one way to do that. That is, the NAEP can help all of us tell whether the state’s tests are shifting in difficulty over time. It can tell us if gains on state tests are reflective of teachers getting better at preparing students for the particular kinds of items on the state’s tests, rather than at teaching the underlying constructs that those items are supposed to test.

    To do that, we need NAEP results over time, which means we need kids in the past, present and future to take the NAEP, or another similar test. It is in your child’s interest that we be able to do this, even if s/he does not benefit directly from his/her own participation in the test.

    **********************************

    If you are happy with the state of our schools, we probably have nothing to talk about. But if you think that they should be better — not just your child’s school but schools generally — then we need ways to measure progress. Perhaps the NAEP is not the way to do it, but your objections would apply to any substitute for the NAEP.

    **********************************

    And last, what do you mean by “educrats”?

  12. crusader

    Yes, ceolaf, I am quite happy with my son’s school. I do not care about anyone else’s. This is not an unreasonable outlook. Parents only have to care about the education of their children — not anyone else’s. I’m only responsible for my children’s future. Let other parents be responsible for their own kids.

  13. ceolaf

    Crusader,

    This is your view as a parent, and whether or not it is unreasonable, it certainly is understandable.

    But what about your view as a citizen and/or a member of a larger community? Do you have any responsibility there, or is it all only about what you and yours can get out of the system?

    And what about kids who don’t have parents as active and involved as your own are lucky enough to have? As a citizen and/or member of the community, do you have any responsibility to look out for them at all?

  14. Have the demographics and family income levels of New York City students changed significantly between 2001 and now?

  15. Sen. Huppenthal,

    Isn’t that a political question? I mean, the Thompson campaign is claiming that the costs of living in the city have gone up far faster than wages. How do you want to correct for that?

    I point this out to get at the important distinction between proxies and the real underlying constructs. We would expect to adjust income for inflation, but in the short or medium run that will be misleading in this kind of calculation.

    Think about it this way: Your socioeconomic status (or the things we lump in with it) has indirectly helped your kids to do well in school, right? You are able to give them certain advantages, including simply speaking to them with your large vocabulary, providing a quiet place for them to do their homework and demonstrating the importance of education without even having to lecture them about it. These are good things. But if your income dropped in half for five or eight years, you’d still provide virtually all of the same advantages.

    However, someone whose income was always half of yours would probably not have been providing all of the advantages.

    You see, short term change in income does not really match long term socio-economic status (a vague, though important construct). Or, you can think of the Jane Austen novels, in which the highest SES families married the highest income families (i.e. the new rich) to restore some income to their recently impoverished families.

    I think that you mean to ask if the composition of the city’s public schools has changed sufficiently in the last eight years to have an impact on expected test scores, right? I don’t know the answer to that question, and serious researchers would probably have trouble figuring out how to answer it in a truly valid and rigorous fashion.

  16. Michael M.

    Sen JH and ceolaf,

    How would we sort out the change in socioeconomics of any given family, or collection of families…. and the flat-out change in who lives here?

    But it wouldn’t surprise me if the “Bloomberg bump” were to prove out to be a combination of rising tide floating all boats (i.e. statewide rising test scores, even in cities WITHOUT mayoral control let alone Mayor Mike), and, people filling up new luxo-towers have kids who do better on tests than the previous average New Yorker’s kids.

  17. Michael M.,

    You are getting at a slightly different question than Sen. Huppenthal asked, though probably what he was trying to get at.

    That is, are different families sending their kids to NYC public schools — or in different proportions — than eight years ago? (The difference is that the senator asked about the overall demographics, whereas this question asks about comparisons between individual families.)

    Well, that’s hard to know. We’d have to be tracking those who left the system (i.e. transfers, dropouts, graduations) and those coming in (i.e. transfers and matriculaters). We’d also want to compare cohorts, to see if there are changes in different grade levels, but that poses the difficulty I described above, in addition to many others.

    My point is that either way it is a hard question to answer. Are most people better off than eight years ago, the recent market collapse notwithstanding? It depends on your approach to how you would answer the question.

  18. Michael M.

    Sen JH and ceolaf,

    And another thing… Manhattan saw a spike in applicants to Kindergarten (not sure about other grades) from families who might in other years have afforded private. Sort of a subset of ceolaf’s just prior comment at 4:21 pm.

  19. My question is actually fairly important for the whole nation. New York City is being cited as a model for national school reform. From 2001 to 2008 the budget went up 70% from $12.5 B to $21 B. They implemented Everyday Mathematics and Balanced Literacy. They opened up over 65 charter schools with a stringent filter. They created a test score and graduation rate based merit pay system with some degree of individual performance evaluation integrated as at least a possibility. They brought in E.D. Hirsch’s theories on Knowledge in at least ten campuses. They opened up 200 small high schools under the Gates Foundation small schools theories. They spent a ton of money on teacher development. Essentially, they implemented every cutting edge, state of the art, central command and control theory of how to improve our education system. According to the raw NAEP scores, the gold standard, they have nothing to show for it, nothing at all. But, to scientifically evaluate the experiment, you have to adjust the scores for demographics and income. It could be that their demographics were deteriorating so fast that just staying even would be a significant accomplishment. I am a little skeptical that staying even for $8.5 billion a year can be described, in any way, as any kind of accomplishment, but I always want to be comprehensive in my analysis.
