GothamSchools — daily independent reporting on NYC public schools

the wide view

City’s value-added initiative early entrant to evolving landscape

New York City schools erupted in controversy last week when the school district released its “value-added” teacher scores to the public after a yearlong battle with the local teachers union. The city cautioned that the scores had large margins of error, and many education leaders around the country believe that publishing teachers’ names alongside their ratings is a bad idea.

Still, a growing number of states are now using evaluation systems based on students’ standardized test scores in decisions about teacher tenure, dismissal, and compensation. So how does the city’s formula stack up against methods used elsewhere?

The Hechinger Report has spent the past 14 months reporting on teacher-effectiveness reforms around the country and has examined value-added models in several states. New York City’s formula, which was designed by researchers at the University of Wisconsin-Madison, has elements that make it more accurate than other models in some respects, but it also has elements that experts say might increase errors — a major concern for teachers whose job security is tied to their value-added ratings.

“There’s a lot of debate about what the best model is,” said Douglas Harris, an expert on value-added modeling at the University of Wisconsin-Madison who was not involved in the design of New York’s statistical formula. The city used the formula from 2007 to 2010 before discontinuing it, in part because New York State announced plans to incorporate a different formula into its teacher evaluation system.

Value-added models use complex mathematics to predict how well a student can be expected to perform on an end-of-the-year test based on several characteristics, such as the student’s attendance and past performance on tests. Teachers with students who take standardized math and English tests (usually fewer than half of the total number of teachers in a district) are held accountable for getting students to reach this mark. If a teacher’s students, on average, fall short of their predicted test scores, the teacher is generally labeled ineffective, whereas if they do as well as or better than anticipated, the teacher is deemed effective or highly effective.
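To make the idea concrete, here is a minimal sketch of the predict-then-compare logic described above. It is not the city’s formula or any state’s: it assumes a single predictor (last year’s score), a plain least-squares line, and invented teachers and scores, all of which are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one predictor only)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def value_added(records):
    """records: list of (teacher, prior_score, actual_score).

    Returns each teacher's average gap between actual and predicted scores.
    Real models use many more variables; this is a toy version.
    """
    priors = [r[1] for r in records]
    actuals = [r[2] for r in records]
    a, b = fit_line(priors, actuals)

    gaps = defaultdict(list)
    for teacher, prior, actual in records:
        predicted = a + b * prior          # expected end-of-year score
        gaps[teacher].append(actual - predicted)
    return {teacher: mean(g) for teacher, g in gaps.items()}


# Hypothetical data: two teachers, three students each.
records = [
    ("Teacher A", 50, 58), ("Teacher A", 60, 67), ("Teacher A", 70, 78),
    ("Teacher B", 50, 52), ("Teacher B", 60, 61), ("Teacher B", 70, 72),
]
scores = value_added(records)
```

With this made-up data, Teacher A’s students land about three points above their predicted scores on average and Teacher B’s about three points below, so a model of this shape would rate A above B.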

A number of states and districts across the country already tie student performance on standardized tests to teacher evaluations; others have plans to do so. Many education reformers, including those in the Obama administration, commend the practice. States were awarded points in the federal Race to the Top grant competition for creating policies that tie student academic growth to teacher evaluations.

In Florida, by 2014, all districts must use value-added ratings for at least half of a teacher’s total evaluation score. Ohio districts will start doing so in 2013. This year in Tennessee, student test-score data will count for 35 percent of each teacher’s evaluation. Value-added ratings make up 20 to 25 percent of New York’s new teacher evaluation framework. And politicians in Nebraska and Louisiana are pushing for these measures to be included in new teacher-evaluation systems.

The new evaluations, which will generally use test scores as one of multiple measures, including classroom observations, are increasingly being used in decisions about compensation, retention and tenure.

Advocacy groups like The New Teacher Project, now known as TNTP, and the National Council on Teacher Quality have cheered the inclusion of value-added scores in teacher-evaluation systems. In the past, most teachers were rated based on infrequent, “drive-by” principal observations that resulted in satisfactory ratings for up to 99 percent of teachers. But skeptics, including teachers unions and researchers, say that value-added models have reliability problems.

Depending on which variables are included in a value-added model, the ratings for teachers can vary dramatically, critics say. As an example, researchers at the University of Colorado examined the formula that an economist hired by the Los Angeles Times created to rate teachers there (the economist’s work was funded in part by the Hechinger Institute on Education and the Media). The University of Colorado researchers found that more than a third of L.A. Unified teachers would have had different scores if a slightly different formula had been used.

A 2010 study by Mathematica Policy Research found that the error rate for value-added scores based on three years of data was 25 percent. In other words, a three-year model would rate one out of every four teachers incorrectly. The error rate jumped to 35 percent with only one year of data. The report cautioned against using value-added models for personnel decisions, a position that other experts have echoed.

In New York City, some of the teachers whose scores were published last week received ratings based on multiple years of data, according to a 23-page technical report describing the city’s statistical formula. But other New York City teachers — a spokesperson for the city education department was unable to say exactly how many — were rated based on only one year of data.

Washington, D.C., also uses just one year of student test scores in its statistical model. But the system that Bill Sanders, a researcher known as the “grandfather” of value-added measurement, designed for Tennessee uses five years of data in creating a score for each teacher. To ensure that elementary teachers aren’t judged based on just one or two years of test-score data, the Tennessee model takes into account a student’s performance in later years, Sanders says. For example, third-grade teachers are rated based in part on how their students do in subsequent grades.

“When any one student takes a math test, on any one day, there is a huge uncertainty around that score,” Sanders told The Hechinger Report in an interview last year. “It could be the kid got lucky this year, and guessed two or three right questions. Or the kid this morning could not have been feeling well. Consequently that score on any one day is not necessarily a good reflection of a kid’s attainment level.”
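A rough simulation can illustrate why more years of data tend to steady the ratings. The sketch below is not the Tennessee model or the Mathematica analysis: the teacher effects, the noise level, and the above-or-below-average cutoff are all hypothetical assumptions, so the rates it produces should not be compared with the published error rates.

```python
import random


def misclassification_rate(years, n_teachers=20000, noise_sd=1.0, seed=0):
    """Share of simulated teachers placed on the wrong side of average.

    Each teacher has a hypothetical 'true' effect; each year's measured
    score is that effect plus random noise. The rating is the average of
    the yearly measurements.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, 1.0)
        yearly = [true_effect + rng.gauss(0.0, noise_sd) for _ in range(years)]
        observed = sum(yearly) / years
        # Misclassified if the rating and the true effect disagree in sign.
        if (true_effect > 0) != (observed > 0):
            wrong += 1
    return wrong / n_teachers


one_year = misclassification_rate(1)
five_years = misclassification_rate(5)
```

Under these assumptions the five-year rate comes out lower than the one-year rate, because averaging several noisy measurements cancels part of the noise.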

Another question that educators and researchers have debated is whether the statistical models should account for student characteristics that are linked to achievement — for example, poverty, English ability and special education status. In places like Florida and Washington, D.C., value-added models have accounted for such factors, in part because of the limitations of using fewer years of test-score data.

New York City’s model does as well. Variables include race, gender, socio-economic status, and even whole-class characteristics like the size of the class and how many students are new to the city.

Many researchers argue that adjusting for student demographic characteristics is unnecessary because the growth scores are calculated by comparing students against themselves. Sanders and others say that including student characteristics could bias the scores by making it easier for teachers of disadvantaged students to be rated more highly.

A black student, for example, might be expected to do worse than a white student in such a model, an assumption that Sanders says lowers expectations for the black student, along with the teacher who has that student in class.

In New York, high-rated teachers are evenly spread across both low-performing and high-performing schools, which experts say is partly a result of the formula’s adjustments for student demographics. Teachers with demographically similar students, whether those students are low-income, minority, or have special needs, are ranked relative to one another, not the entire teaching force.
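Ranking teachers only against peers with similar students can be sketched as follows. The group labels, names, and scores are invented, and the percentile rule used here (share of peers with a lower score) is an assumption for illustration, not the city’s documented method.

```python
from collections import defaultdict


def percentile_within_group(teachers):
    """teachers: list of (name, peer_group, raw_score).

    Returns name -> percentile (0-100), computed only against teachers
    in the same peer group.
    """
    groups = defaultdict(list)
    for name, group, score in teachers:
        groups[group].append((name, score))

    result = {}
    for members in groups.values():
        scores = [s for _, s in members]
        for name, score in members:
            below = sum(1 for s in scores if s < score)
            result[name] = 100.0 * below / len(scores)
    return result


# Hypothetical teachers in two peer groups.
teachers = [
    ("P", "high-need", -0.5), ("Q", "high-need", -1.5),
    ("R", "low-need", 1.0), ("S", "low-need", 2.0),
]
ranks = percentile_within_group(teachers)
```

Here teacher P, whose raw score is lower than either low-need teacher’s, gets the same within-group percentile as teacher S, which is how peer-group ranking can spread high ratings across very different schools.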

Other researchers have argued that factors like student poverty should be taken into account, however, because concentrated poverty, for example, is linked to lower student performance, suggesting that a student’s peers may affect how that student does in school and on tests. That is, a teacher who has a large number of disadvantaged students in class may have a more difficult job getting a higher rating than teachers with fewer disadvantaged students.

In an attempt to settle the question, Mathematica, the research group, is currently examining the effects of whole-class characteristics on teacher value-added ratings in a study of 30 districts across the country.

Although it gets much less attention, one of the biggest problems with value-added modeling, according to many experts, is that the ratings cover only a fraction of teachers — those whose students take standardized tests in math and English, typically in grades three through eight. As new teacher-evaluation systems go into effect in more districts and states in the next two years, many, including New York City, will be grappling with how to rate everyone else.

Rhode Island is using teacher-created goals on classroom work and tests. Colorado is planning to use off-the-shelf assessments and school-generated methods to gauge how teachers in subjects like physical education and music are performing. In Tennessee, teachers without value-added ratings are graded in part on how the teachers who do receive ratings in their school perform. And Florida is creating more tests, one for every subject and grade level down through kindergarten.

Harris calls Florida “an example of what not to do.” Given the problems with value-added modeling, no matter which formula is used, he suggests that the best uses of the ratings might not be to make decisions about hiring, firing and tenure. Instead, they can be used to give low-rated teachers more training or principal observations, rather than pink slips.

This story was produced by The Hechinger Report, a nonprofit, nonpartisan education news outlet affiliated with Teachers College, Columbia University.

  • nuff said

    And when all was said and done the margin of error ranged as high as 75-87%. No business could run with this data and certainly no school should be subjected to this nonsense. Of course you can expect academia and the Administration to defend the data because they spent millions and millions for the formula, the study, gathering the data and disseminating it. Millions and soon Billions!!!

  • bee

    I think the educational landscape shaped by know-nothing “reformers” (Bloomberg among them) would best be described as devolving. If the Bloomberg landscape is to be considered “evolving,” then it is evolving into an intellectual wasteland, not a “higher” or more righteous landscape.

  • Leonie Haimson

    Thanks for this summary; though there are many other problems w/ the NYC VAM that you did not mention. 

    Can you explain why the Hechinger Report funded the LAT study that was found to be so erroneous? Isn’t that a somewhat strange role for a news gathering outfit, to pay for a study done for another media outlet, estimating individual teachers’ scores by name, whose results were later released to the public to great controversy?

  • SickofBloomberg

    This article uses a lot of words to say something simple (a growing and disturbing trend in education): The value added model is a glorified guess based on theoretical mathematics. In middle school, we teach the difference between experimental and theoretical probability. Evidently, the education officials and others supporting using VAM to evaluate teachers believe that theory is a reliable method for determining whether someone keeps their livelihood. I wonder, should we apply the VAM to all the politicians backing its use to see if they keep their jobs and benefits?

  • Sand_dela

    Why did the NY unions (UFT, NYSUT) agree to this in the first place? Why didn’t teachers rise up in protest then (May, 2010)? Why did UFT teachers accept that their leader went to Albany, did an amazing 180 on his previous stance of opposing this hokey, harmful practice, and came back with a pig covered in lipstick to sell to the members? All for a Race to the Top booby prize. Well, Wall Street’s Hedge Fund operatives certainly got what they wanted with this RttT win, which they pushed for with propaganda and millions of dollars.

    Why do members still believe in this leadership? 

  • Vote NO!

    The unions agreed to it, because they don’t fear that their members will do anything. They’re very comfortable, and not worried about being voted out of office. Why should they care if teachers get fired? New teachers will be hired, and they will be forced to join the union, and pay dues.

  • jteach

    This value added and any way that you rate teachers on test scores is a crap shoot. Unfortunately the reformers like Bloomberg will call it a victory when they are able to fire teachers in 2-3 years, whether they are good or bad. I hope that we can all survive till retirement. Teaching has become a demoralizing profession where society will forever continue to blame us as long as Bloomberg and other rich snobs blame them for the earnings gap in the U.S.
