If test scores matter less in Georgia teacher evaluations, what will matter more?

Sandi Jacobs, the National Council on Teacher Quality's senior vice president for state and district policy, discusses Georgia's debate over how much student test scores should matter in teacher evaluations.

Her view: Student growth — as measured by test scores — provides reliable insights into teacher performance and ought to remain a key component in how Georgia evaluates teachers.

By Sandi Jacobs

In Georgia, a debate is raging over how much a student’s academic growth should factor into teacher evaluations. The state Legislature is currently considering a bill to lower the 50 percent student growth requirement to 30 percent.

No doubt, this is a meaningful discussion to have. Some course adjusting seems reasonable for states that have been working to incorporate student growth into teacher evaluations, so long as the intent behind measuring student academic growth remains: effective teachers clearly have a positive impact on student learning and achievement.

Unfortunately, the singular focus on 50 percent versus 30 percent draws attention away from equally important issues. More critical than the precise weight for student growth is whether all the evaluation components add up to a full picture of teacher performance. If the weight of student growth is lowered, what will be measured in its place? The current proposal offers only vague answers.
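To make that arithmetic concrete, here is a minimal sketch of how component weights combine into a composite evaluation score. The component names, weights, and scores here are hypothetical, invented purely for illustration; they are not Georgia's actual evaluation rubric.

```python
# Hypothetical illustration: how a composite evaluation score shifts
# when the student-growth weight drops from 50 percent to 30 percent.
# Component names, weights, and scores are invented for this sketch.

def composite(scores, weights):
    """Weighted average of component scores (each on a 0-100 scale)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[k] * weights[k] for k in weights)

scores = {"student_growth": 62, "observations": 85, "surveys": 78}

current  = {"student_growth": 0.50, "observations": 0.40, "surveys": 0.10}
proposed = {"student_growth": 0.30, "observations": 0.50, "surveys": 0.20}

print(composite(scores, current))   # 72.8
print(composite(scores, proposed))  # 76.7
```

The sketch makes the open question visible: the 20 points taken from student growth must be reassigned somewhere, and whichever components absorb them will move every teacher's composite score.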

The sole focus on the weight of testing has split the teacher evaluation debate into two camps: those who still argue for doing away with measuring student growth altogether, and those who believe we should no longer be asking, "Should student growth be part of teacher evaluations?" but rather, "How can student growth best be incorporated into teacher evaluations?"

For those who continue to challenge the use of student growth, the perfect should not be the enemy of the good. Researchers have never claimed that student growth measured by statistical tools like the student growth percentile model (used in Georgia) or value-added models (VAM) is perfect. But the information these models add to a multiple-measure system can strengthen the feedback teacher evaluations provide.

Notably, after the American Statistical Association issued a report critical of the use of VAM, economists Raj Chetty, John Friedman and Jonah Rockoff drafted a point-by-point response.

Chetty, Friedman and Rockoff point to studies that have shown these models directly measure teacher contributions toward student outcomes. Students of teachers with high value-added estimates were more likely to experience several positive outcomes in adulthood, like attending college and earning higher wages. Further, other studies found that when the model controls for students’ prior test scores, it does indeed capture teachers’ impacts on students.

In response to the argument that VAM scores can change substantially when a different model or test is used, Chetty and his research partners argue that when models account for students' prior achievement, they produce similar results. The researchers readily admit that value-added measures are not perfectly reliable, but also make the point that no measure is. When used along with other measures, value-added estimates are more reliable than traditional, far more subjective components of teacher evaluations, such as observations.
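As a rough illustration of the mechanism the researchers describe, here is a minimal sketch of a value-added estimate on simulated data: control for prior achievement, then average each teacher's residuals. This is a toy model under invented assumptions, not Chetty, Friedman and Rockoff's method and not Georgia's student growth percentile model; real systems add many more controls and statistical shrinkage.

```python
# Toy value-added sketch on simulated data: regress current scores on
# prior scores, then treat each teacher's mean residual as their
# value-added estimate. Not a production model.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 3000, 100
teacher = rng.integers(0, n_teachers, n_students)   # student -> teacher
true_effect = rng.normal(0, 3, n_teachers)          # latent teacher quality
prior = rng.normal(70, 10, n_students)              # last year's scores
score = 0.8 * prior + true_effect[teacher] + rng.normal(0, 8, n_students)

# Control for prior achievement with a simple one-variable OLS fit.
slope, intercept = np.polyfit(prior, score, 1)
residual = score - (slope * prior + intercept)

# A teacher's estimate is the mean residual across their students.
vam = np.array([residual[teacher == t].mean() for t in range(n_teachers)])

# With roughly 30 students per teacher and this noise level, the
# estimates correlate strongly, but not perfectly, with true quality.
print(round(np.corrcoef(vam, true_effect)[0, 1], 2))
```

Because each estimate averages a limited number of noisy scores, it is informative but imperfect, which is exactly the point above: useful signal, best combined with other measures.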

As Georgia debates the nuances of 50 percent versus 30 percent weight on student growth, my hope is that the Legislature also discusses these other, equally important questions, and that the conversation, in Georgia and nationally, further shifts from "should" to "how" when it comes to ensuring that student growth and achievement remain an integral part of the teacher evaluation process.

Reader Comments
jerryeads

Very, very, very, very clearly Sandi Jacobs doesn't know much about testing. Like most of us, she's so incredibly gullible as to believe the tests we build are accurate at measuring human performance. The paper she cites is by economists, fer pete's sake, not measurement people. The VAST majority of refereed research (the paper she cites is not refereed) finds VAM to be virtually useless.  Even the best tests in the world - the SAT and ACT are good examples - are absolutely terrible at doing the one thing they're supposed to do, which is predict first year college survival. The low-bid junk that testing companies cobble together every year for states is far, far less capable - and I'm being very kind.

But Sandi, you go ahead and keep believing against overwhelming data that a test is every bit as accurate as a thermometer. Just shove it in that kid and we'll know exactly how well taught they are.

Reducing the weight of testing to that 10 percent in the Ligons bill would have been more appropriate. I'm particularly hopeful that the publicity of the 'opt-out' section in that bill will encourage parents to keep their kids out of testing. Then regardless of what the state tries to do the data will be totally worthless anyway.

proudparent01

There is a vast difference between our state Milestones tests and the state-mandated but district-created SLOs. The SLOs are poorly designed, and growth on these assessments has very little statistical validity. It doesn't matter if there is a 50% threshold or a 30% threshold, the enormous number of tests remains the same. Further, even if you remove some Milestones tests, there won't be less testing, because for many classrooms they will be replaced by SLOs.


Yes, there is some value in using growth for evaluations, but the SLOs are a huge burden for our schools and there is very little evidence that they are good tools for anything other than wasting classroom time.

billiebrown41

@ajc Try academics! Classroom lectures! Reading assignments! Group discussions! Team projects! DUH!

BrieT

Why can't people understand that a major issue with these evaluation plans is very real problems with the validity of the tests? If the tests 1) do not measure what they are supposed to measure, 2) were developed too quickly and not properly vetted, and 3) have growth targets that are neither developed thoughtfully nor adjusted when huge numbers of great teachers fail to meet them, then how is it possibly fair to use them as a measure of effective teaching?


I'm a music teacher. I helped develop some of the tests for my district and for the state, because I didn't want to passively accept whatever was handed down to us, and I wanted to try to be a voice of reason among the chaos. But because of mandates from the state and the district we HAD to create a bubble test, which could not possibly measure what we do the most: singing, moving, playing instruments, improvising, composing, relating music to other arts/history and culture, listening and describing.


So we're left with a test that covers a very narrow portion of a vast subject matter.  Meanwhile, any extras that have to do with the performance aspect of our jobs (ensembles, school and community performances, honor chorus/band/orchestra, Allstate, GHP, plays/shows, LGPE, and Literary) do not count toward our evaluation at all. 


Last year my students had to take tests in: Art, Music, PE, Band, Orchestra, Spanish, Milestones for ELA, Math, Science, & Social Studies, or SLOs for K-2, Challenge (gifted). They were so sick of tests by the end of the year (and we spent week after week rotating them through all of the tests and make-ups for the kids who had missed them) that I am certain they didn't even bother to try. Plus we lost many, many weeks of instruction, which is brutal when some subjects like Art and Music only have the students once a week.



These are the 1994 National Standards for Music Education, upon which the Georgia Music Standards have been based since 2010. There is a new set of standards for the Arts, but Georgia has not yet adopted them.

1. Singing, alone and with others, a varied repertoire of music.
2. Performing on instruments, alone and with others, a varied repertoire of music.
3. Improvising melodies, variations, and accompaniments.
4. Composing and arranging music within specified guidelines.
5. Reading and notating music.
6. Listening to, analyzing, and describing music.
7. Evaluating music and music performances.
8. Understanding relationships between music, the other arts, and disciplines outside the arts.
9. Understanding music in relation to history and culture.
10. Moving to music, alone and with others.

BG927

Maybe Ms. Jacobs can explain how I, as an 8th grade teacher of physical science (essentially very basic chemistry and physics), can be evaluated for "growth" when my students' scores are being compared to their 7th grade life science (basic biology) scores. Anyone who has had all three knows that there is very little continuity in the content. All science teachers and social studies teachers in GA have a similar problem. Not to mention, once in high school, some teachers are evaluated by state tests that we don't see and get no feedback on. Other teachers get evaluated by district-created SLOs, which are given in a pre- and post-test fashion. How is this any kind of valuable comparison or evidence of growth? One final issue: readers may not be aware that teachers have YET to get the results from LAST YEAR'S tests. I have no idea how my students' growth looks (though I do know that I had a 100% passing rate - that does not mean my growth scores will be good - also infuriating).

To those who think this is against accountability, I would say that I can't think of anyone who would be pleased with this sham of an accountability system. You wonder why enrollment is down at ed schools? This is part of the reason why.

AvgGeorgian

Follow the money and research the claims.


http://www.dailykos.com/story/2012/6/1/1096454/-NCTQ-Unmasked

"To demonstrate that enough funding can buy exclusive rights to publish propaganda as research in the mainstream media, see thisteacher bashing piece below from the AP, which treats NCTQ as a legitimate research organization, rather than as an advocacy group in support of charter schools and the corporate attack on the teacher preparation, teacher quality, and state teacher credentialing systems[emphasis added]." 

Dunstann

It would be foolish to lower the amount student academic growth counts in teacher evaluations. Yet there is a small but stubborn faction which will seemingly continue to fight accountability no matter what.

athensareateacher

@Dunstann Foolish is a great word to describe the current use of standardized tests in TKES and LKES scores. BrieT's comments above outline just some of the problems.


Previous legislators put the cart before the horse when they included standardized test results in TKES and LKES scores. Even if research showed that using standardized test scores as part of a teacher's evaluation was positive (and it is certainly mixed)...those standardized tests should be valid measurements. This just isn't the case in Georgia. SLOs and even the Georgia Milestones have not been tested for validity.


Teachers don't mind being held accountable. Multiple measures of student growth, which could include portfolios, student observations and yes....formal written assessments, should be considered. I'm not sure how you would create an evaluation tool to fairly and accurately measure teachers across all subject areas. Authentic student growth in all subject areas cannot be measured with a series of bubble tests.


Using standardized tests alone is foolish. I'm excited our legislators are discussing this issue.

AvgGeorgian

@Dunstann


Tell you what.


You get a team of thirty 21-year-old people who live closest to your work location as your team. You are evaluated on their productivity. You can't pay them, fire them, make them come to work, or make them work.


Their productivity is 50% of your evaluation. Get back to us on how that works out for you.