7:17 p.m. | Updated

After a long legal battle and amid much anguish among teachers and other educators, the New York City Education Department released individual performance rankings of 18,000 public school teachers on Friday, while admonishing the news media not to use the scores to label or pillory teachers.
The reports, which name teachers as well as their schools, rank teachers based on their students’ gains on the state’s math and English exams over five years, through the 2009-10 school year. The city released the reports after the United Federation of Teachers exhausted all legal remedies to block their public disclosure.
The reports are now available on SchoolBook, posted on the individual pages for the elementary and middle schools whose teachers’ ratings were released.
At a briefing on Friday morning, an Education Department official said that over the five years, 521 teachers were rated in the bottom 5 percent for two or more years, and 696 were repeatedly in the top 5 percent.
But citing the wide margin of error (on average, a teacher’s math score could be off by 35 percentage points, and an English score by 53) and the small sample sizes (some teachers are being judged on as few as 10 students), city education officials said their confidence in the data varied widely from case to case.
“The purpose of these reports is not to look at any individual score in isolation ever,” said the Education Department’s chief academic officer, Shael Polakow-Suransky. “No principal would ever make a decision on this score alone and we would never invite anyone — parents, reporters, principals, teachers — to draw a conclusion based on this score alone.”
Chancellor Dennis M. Walcott also underscored the need to use the individual rankings cautiously.
“I don’t want our teachers disparaged in any way, and I don’t want our teachers denigrated based on this information,” Mr. Walcott said. “This is very rich data that has evolved over the years. As Shael has indicated, it is old data and it’s just one piece of information. And so I don’t want our teachers characterized in a certain way based on this very complex rich tool that we have available to us.”
Nevertheless, the data is ripe for analysis. One fact shared by the Education Department: Many of the teachers included in the database are no longer working in city schools.
Officials said 77 percent of the 18,000 teachers who received reports were still employed by the Education Department, but many of those had since moved into administrative jobs or now teach subjects or grade levels that the reports do not cover.
For example, the teacher who was rated most highly, based on his scores for the 2009-10 school year, is now an assistant principal at another school, according to his online profile. His rating encompassed only one year of data and was based on 32 students’ test scores.
The data was handed to the news media on CDs, which contain spreadsheets listing teachers’ scores for the 2007-08, 2008-09 and 2009-10 school years. Roughly 12,000 teachers were given teacher data reports each year.
Charter school and special education teachers were not included; the city says it will likely release their rankings on Tuesday.
The teacher rankings began four years ago as a pilot program to improve instruction in 140 city schools. The effort has grown into the most controversial set of public school statistics released by the Bloomberg administration: individual rankings for roughly 18,000 math and English public school teachers from fourth through eighth grades.
The teachers have already seen their reports, which have been distributed for the past several years. But now parents and the rest of the public can learn how individual teachers performed, as measured by how much they raised their students’ test scores. With that, parents will begin the difficult and emotional reckoning of comparing what they personally know about a teacher with what a set of statistics tells them.
In the larger picture, the ratings could bolster or undermine a school’s reputation, and validate or discredit convictions long espoused by Mayor Michael R. Bloomberg, like his belief that small schools are better than large ones and that length of service is not necessarily a predictor of strong performance in the classroom.
The rankings, known as teacher data reports, are meant to place teachers along a performance scale. They are at the core of a national effort to assess, compensate and dismiss teachers based in part on their students’ test results.
In some school districts, the rankings have become an important component in decision-making about teachers. New York City principals have been using the rankings to help make tenure decisions. Houston gave bonuses based on rankings, though the district eventually restructured that program. In the Washington school district, poorly rated teachers have lost their jobs.
New York has become only the second city in the country where teachers’ names and ratings have been publicized. In 2010, The Los Angeles Times published its own set of ratings, in spite of fierce opposition from the local teachers’ union. Thousands of people flocked to the newspaper’s Web site to check the rankings, though, and Arne Duncan, the United States education secretary, praised the effort, saying, “silence is not an option.”
The rankings stem from a desire by policy makers to find an objective way to distinguish between effective and ineffective teachers, untainted by the subjective judgment of individual evaluators, like school principals.
Yet there is considerable uncertainty about the reliability of the data. That is one reason the public release of these scores — which will mark thousands of teachers with a label that they fear will become shorthand for their performance as a whole — is so controversial.
The rankings are also known as value-added assessments. In simple terms, value-added models use mathematical formulas to look at the past and forecast the future. A computer predicts how a group of students will do on next year’s tests using their scores from the previous year and accounting for several factors, like race, gender and income level. If the students surpass the expectations, their teacher is ranked at the top of the scale, as “above average” or “high” under different models used in New York City. If they fall short, the teacher receives a rating of “below average” or “low.”
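To make the idea concrete, here is a minimal Python sketch of the general value-added approach; it is not the city’s actual model. It assumes a single predictor (each student’s prior-year score) and leaves out the demographic factors the city’s model accounted for, and the function names and scores are hypothetical. It fits a straight line predicting this year’s score from last year’s, then averages, for each teacher, how far that teacher’s students landed above or below the prediction.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prior scores must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns each teacher's average residual: how far that teacher's
    students scored above (+) or below (-) the predicted score.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = a + b * prior
        residuals[teacher].append(current - predicted)
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Hypothetical example: two classes that started at the same levels.
records = [("A", 600, 610), ("A", 650, 660), ("B", 600, 590), ("B", 650, 640)]
print(value_added(records))  # {'A': 10.0, 'B': -10.0}
```

Both hypothetical classes began with identical prior scores, so the difference in outcomes is attributed to the teacher: A’s students beat the prediction by 10 points and B’s fell 10 points short. That attribution is exactly the step critics question, since anything else that differed between the two classrooms also ends up in those averages.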
The rankings can be produced only for grades in which state exams are given, so they leave out teachers who do not teach fourth through eighth graders and anyone who teaches a subject other than math or English.
In New York City, a curve dictated that each year 50 percent of teachers were ranked “average,” 20 percent each “above average” and “below average,” and 5 percent each “high” and “low.” Teachers received separate reports for math and English. Principals also received a general report placing teachers’ names in a graphic according to their performance rankings.
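The curve amounts to cutting the percentile scale at the 5th, 25th, 75th and 95th percentiles. The Python sketch below applies that cut; the function names are hypothetical, and it assumes that the 35-point average math margin of error cited earlier is measured in percentile points. Under that assumption, a teacher ranked at the 50th percentile could plausibly fall anywhere from “below average” to “above average.”

```python
def curve_label(percentile):
    """Map a percentile rank (0-100) to one of five labels using the
    5/20/50/20/5 split. Placing each cutoff at the start of the higher
    band is an assumption of this sketch."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 5:
        return "low"
    if percentile < 25:
        return "below average"
    if percentile < 75:
        return "average"
    if percentile < 95:
        return "above average"
    return "high"


def label_range(percentile, margin):
    """Labels at the bottom and top of percentile +/- margin, clipped to 0-100."""
    return (curve_label(max(0, percentile - margin)),
            curve_label(min(100, percentile + margin)))


# A teacher at the 50th percentile with a 35-point margin of error:
print(label_range(50, 35))  # ('below average', 'above average')
```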
Critics of the rankings point to their many deficiencies and caveats. One is that the higher teachers rank one year, the harder it is for them to keep that ranking the next year, because their new students must again show significant progress.
The data are also more than a year old and based on test scores that have been somewhat discredited.
Critics also say there are aspects of a child’s life — or distractions on test day — that the numbers cannot capture: supportive parents, a talented principal, the help of a tutor, allergies or a relentlessly barking dog outside the classroom.
Students can also change classes during the year, and teachers who have them in their classroom for less than a full year can nevertheless be assessed on those students’ performances. Then there are schools where students are taught by multiple teachers, making it difficult to gauge each teacher’s individual contribution.
Statisticians try to account for these uncertainties by assigning wide margins of error to teachers’ scores, as much as 54 out of 100 points in the city’s reports. Even then, they warned, they can be only 95 percent confident that a teacher’s true score falls within that range.
The release of the individual rankings has even been controversial among the scientists who designed them. Douglas N. Harris, an economist at the University of Wisconsin, where the city’s rankings were developed, said the reports could be useful if combined with other information about teacher performance. But because value-added research is so new, he said, “we know very little about it.” Releasing the data to the public at this point, Dr. Harris added, “strikes me as at best unwise, at worst, absurd.”
And the United Federation of Teachers has pointed out numerous mistakes made by the city in individual rankings. In one case, a teacher received a ranking for a semester when she was on maternity leave. In other cases, teachers who taught English were ranked for teaching math.
At the briefing on Friday, Mr. Polakow-Suransky, the chief academic officer, said that last year the city created a Web site where teachers could verify that the reports contained information based on students and classes they actually taught. About 37 percent of teachers with reports logged on to the site and reviewed three years’ worth of class rosters. City officials said 3 percent of these teachers discovered that their reports were based on classes they never taught. And on average, one correction was made per report.
Most teachers who received reports did not try to correct them, and officials said it was possible that errors remained on their teacher data reports.
Officials also warned that because of the statistical model the city used, teachers’ scores were less reliable if they were assigned to schools where students were extremely high performing or low performing. Because the state’s math and English exams were designed to distinguish a proficient student from one below grade level, Mr. Polakow-Suransky said, they are too blunt a measure to track small amounts of progress made by students at the very ends of the spectrum.
In one example he gave, a teacher whose students score well above average on the state’s English test two years in a row but get a few more questions wrong the second year could see her value-added score drop by 50 percentage points.
The State Education Department is taking some of the reports’ problems into account when designing its own value-added measurement, he said.
“We do know how to fix this, but these data reports and the models they use were created before that fix was identified,” he said.
In remarks leading up to the reports’ release, Mr. Walcott expressed concern that the individual rankings, once made public, would be used to highlight individual teachers and hold some up to ridicule or shame — a point also made by Bill Gates in an opinion article in The New York Times on Thursday.
But Mr. Walcott and his predecessor, Joel I. Klein, have defended the value of the ratings system, saying the data give administrators a more objective look at teacher performance. The rankings, they have said, are not meant to be used in isolation but in combination with other kinds of evaluations, like principal observations.
In that way, the release of the rankings offers a peek at the future: under the agreement reached this month between the state, the city and the teachers’ unions, the state’s new evaluation system will base 20 percent of teachers’ ratings on the rise — or fall — of students’ test scores. And if teachers are ranked “ineffective” on that portion of their assessment, they must be rated “ineffective” over all.
The push to release the individual rankings began in August 2010, when New York City education officials contacted the reporters who most closely cover the city’s public schools and encouraged them to submit Freedom of Information Act requests for the teachers’ rankings. Until then, the city had refused to release the names with the rankings, citing issues of privacy.
On the eve of the rankings’ release, the teachers’ union filed a lawsuit. The city has acknowledged the reports are not perfect, but one of the judges who ruled on the case as it made its way to the state’s highest court said imperfection was no reason to hide them.
Last week, after the union lost its last appeal, the city announced the rankings’ release.
The teachers’ union president, Michael Mulgrew, said that teachers and parents “deserve more than judgments based on bad tests, incorrect data and flawed methodology.” The union has warned that the consequences will be sweeping: good teachers steering clear of grades with standardized tests, parents trying to move their children to other classrooms, low morale among teachers and worse.
Proponents, however, believe that even with all their flaws, the reports have value because they can highlight effective teachers and put poor teachers on notice.
SchoolBook is publishing the rankings as part of a list of teachers at the affected city schools. Readers will be able to enter school names and teachers’ names into online databases to see how individual teachers scored.
We have invited teachers to post responses to their own ratings, which we will publish alongside the rankings, and the teachers’ union has encouraged teachers to do so, “particularly if they contain significant errors.”
You can learn more about the rankings, as well as SchoolBook’s use of them, in our FAQ.
SchoolBook will post updates throughout the day with analysis of the overall data, focusing on, among other areas, differences among teachers in the best- and worst-performing schools, across experience levels and in various neighborhoods.
Anna M. Phillips of SchoolBook contributed reporting and writing. Beth Fertig of WNYC and Robert Gebeloff of The Times’s computer-assisted reporting group contributed reporting.
Correction: An earlier version of this post included incorrect information about the percentage that state tests will contribute to teachers’ evaluations under the new state evaluation system agreed to earlier this month.