What are we looking at here?
Thousands of New York City public-school teachers received a percentile ranking, plus a range of uncertainty for that ranking. In the example shown, the score of 50 indicates that the teacher is average, but the teacher’s true ranking could fall anywhere from the 35th percentile (below average) to the 70th percentile (above average).
What is being measured?
The ratings on this page reflect the city’s effort to isolate the effect of individual teachers on student performance. In this case, the measurement is based on math and English scores on New York State standardized tests. Each teacher was assigned an “expected” score based on the past performance and demographics of his or her students. This expected score was then compared to the students’ actual test results. The difference is considered the “value added” by the teacher.
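As a rough illustration of the comparison described above, the sketch below averages the gap between actual and expected scores for one teacher’s students. It is a simplification, not the city’s formula, which is more complex and is not spelled out here; the function name value_added and all of the scores are hypothetical.

```python
from statistics import mean


def value_added(expected_scores, actual_scores):
    """Average gap between actual and expected scores for one teacher's students.

    Hypothetical simplification: the real model is more complex than a plain
    average of differences.
    """
    if len(expected_scores) != len(actual_scores):
        raise ValueError("each student needs one expected and one actual score")
    return mean(a - e for a, e in zip(actual_scores, expected_scores))


# Made-up scores for four students (not real data).
expected = [650, 660, 670, 680]
actual = [655, 658, 680, 687]

# A positive result means the students beat the prediction on average.
print(value_added(expected, actual))
```

A negative result would mean the students scored below what the model predicted for them.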
Does a low rating mean my child has a bad teacher?
Not necessarily. These Teacher Data Reports were one component of several the city used to evaluate teachers during these years. The state is now overhauling its evaluation system: 40 percent will be based on test scores, including some value-added assessment, while 60 percent will be based on more subjective methods, such as principal observations.
Why are some ratings more precise than others?
The scores here are estimates based on a complex formula, and the estimates carry with them a margin of error. The more test scores that are incorporated in the formula, the smaller the error margin, which is why single-year ratings have a bigger margin than those calculated for a teacher’s career. Error margins, which are called the “confidence interval” in these reports, can range widely.
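To see why more test scores shrink the error margin, consider the textbook margin of error for an average, which falls with the square root of the number of scores. This is only a sketch under that assumption; the reports’ own confidence intervals come from the city’s formula, and the spread of 20 points and the counts of 25 and 100 scores are invented for illustration.

```python
import math


def margin_of_error(std_dev, n_scores, z=1.96):
    """Textbook 95 percent margin of error for the mean of n_scores values.

    Assumption: independent scores with a known spread (std_dev). This is a
    standard statistical formula, not the one used in the Teacher Data Reports.
    """
    return z * std_dev / math.sqrt(n_scores)


# Hypothetical numbers: one year of scores versus four years' worth.
single_year = margin_of_error(std_dev=20.0, n_scores=25)
multi_year = margin_of_error(std_dev=20.0, n_scores=100)

print(single_year, multi_year)
```

Under these assumptions, quadrupling the number of scores cuts the margin in half, which mirrors why career ratings are tighter than single-year ones.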
Which teachers and schools received ratings?
Approximately 12,000 teachers each year who teach math or English in
grades 4 through 8 in New York City public schools. In all, over the
three years the ratings were distributed, that worked out to about
18,000 teachers. Some charter school teachers were also rated, but
that data was not released by the city.
Why did SchoolBook decide to publish these evaluations?
The New York Times and WNYC, which jointly publish SchoolBook, believe that the public has the right to know how the Department of
Education is evaluating our teachers. Since the value-added
assessments were being used for tenure and other high-stakes
decisions, we sued for access to the reports. While we share some
critics’ concerns about the high margins of error and other flaws in
the system, we believe it is our responsibility to provide the
information, along with appropriate caveats and context, for readers
to evaluate.
Do these evaluations consider circumstances such as poverty or
learning disabilities?
Yes. The value-added score is calculated by comparing how well a
teacher’s students perform on a standardized test with how well a
computer model predicted they would do, given their past performance
and demographics. The teacher’s “value-added” score is the difference
between the predicted score of the students and the actual score. The
gender, income level, special education status and English Language
Learner status of the students are all factored into the prediction,
as are other factors.
Why are teachers ranked from “Low” to “High”?
The city groups teachers into five categories — Low, Below Average,
Average, Above Average and High — depending on their percentile
value-added score for a given year of test data and over their
careers. Using five categories is not mandatory; the system used by
the city in the 2007-08 school year, for example, had only three
categories, Low, Medium and High.
Why would a teacher’s score fluctuate?
In general, teachers are being rated according to the performance of
a statistically small number of students, as few as 10 in some cases.
So if a single student does poorly on a given day, it can have a big
effect on a teacher’s score. In addition, the teacher cannot control
many of the factors that go into a child’s performance, such as
whether he decided to study, or whether she slept well the night before.
When three or more years of data are available for the teacher, the
rating becomes more reliable, though it retains considerable
uncertainty.
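The effect of one student’s bad day can be illustrated with simple arithmetic. The sketch below compares how far a class average moves when a single score drops in a class of 10 versus a class of 30. The scores and the helper name class_average_shift are hypothetical, and a plain class average stands in for the far more involved rating formula.

```python
from statistics import mean


def class_average_shift(scores, index, bad_day_score):
    """Change in the class average when one student's score is replaced."""
    changed = list(scores)
    changed[index] = bad_day_score
    return mean(changed) - mean(scores)


# Made-up classes in which every student would normally score 70.
small_class = [70] * 10
large_class = [70] * 30

# One student scores 40 instead of 70 in each class.
shift_small = class_average_shift(small_class, 0, 40)
shift_large = class_average_shift(large_class, 0, 40)

print(shift_small, shift_large)
```

The same 30-point slip moves the small class’s average three times as far as the large class’s, which is the sense in which a small number of students makes a rating volatile.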
How did the city use these reports?
The city used these so-called “value-added” ratings, which cover the
school years of 2007-8, 2008-9 and 2009-10, as part of its yearly
evaluations of teacher performance, and over time, began to ask
principals to consider them in tenure decisions. Last year, the city
calculated the ratings but did not use them. This year, the state took
over the controversial task of analyzing teacher performance in terms
of test scores, as part of a new statewide teacher evaluation system
recently agreed on.
How can teachers respond?
The Times and WNYC invite any teacher who was rated to provide her or
his response or explanation, using this form; we will display the response alongside the
numbers so readers can consider them together. If there were special
circumstances that compromise the credibility of the ratings in
particular cases, we want to know. Michael Mulgrew, president of the
United Federation of Teachers, said in an e-mail message: “The UFT
encourages teachers to comment on their reports, particularly if they
contain significant errors.” Teachers — and parents — are also invited to comment generally on the ratings system and their release, joining the conversation here.
Why did the city implement these ratings?
In 2006-07, the city began experimenting with value-added rankings of
teachers at 140 schools. The union agreed to the pilot program because
the city promised that the rankings would not be used in high-stakes
decisions and would be kept confidential. Joel I. Klein, the former
chancellor, wanted to try value-added to keep the city at the
cutting edge of the developing science of analyzing teacher
performance. As the years passed, he told his principals to begin
consulting the data in tenure decisions and evaluations, in
what the union considered a breach of faith.
Why did the teachers’ union sue to keep the names confidential?
The union argued that releasing the data would unfairly result in the
public shaming of teachers who did poorly. The union documented many inaccuracies,
including teachers who were rated for subjects they did not teach, and
said that the mathematical formula used to determine the scores was
inherently flawed. The teachers’ union also said that the city had
breached an agreement to fight any release of this information to the
public.
Is this data similar to evaluation systems in other cities and states?
Value-added modeling is a growth industry. It was favored by the White
House in its Race to the Top competition, so many states have adopted
some version of value-added as part of their teacher
evaluation systems. However, the mathematical technique used to
calculate the scores differs among states, as does the weight given to
the rankings compared to classroom observations and other subjective
measures. As New York State overhauls its evaluation system, value-added scores will be 20 percent of a teacher’s annual review. Other states use them for up to 50 percent.
