This is a lengthy account of Georgia’s new teacher evaluation system by a teacher in a Georgia school system. Rated as highly effective in the past, the teacher helped develop the End of Course Test in his discipline.
In this piece, the teacher reports on a recent training session in which the presenter said teacher evaluations last year were too high and administrators were being taught how to downgrade their ratings. The teacher also shares his experience with the new Milestones test.
He provides a fascinating and detailed look at an evaluation system that seems inconsistent, arbitrary and ultimately self-defeating. This approach does not seem designed to improve Georgia teachers, but to drive them out the door.
I am sharing it at a longer-than-usual blog length as it helps explain why teachers are concerned about linking their evaluations to student test scores, as sought by Gov. Nathan Deal.
Here is the teacher’s account:
Recently, my colleagues and I were introduced to a “Deeper Look at Instructional Practice and TKES.” The two-hour presentation started with a short history of TKES and how it originated with House Bill 244 during the 2013 legislative session.
The law creates a single, statewide evaluation system for teachers of record (the teacher whose name appears on the transcript as the teacher of a particular class). The law requires student achievement be tied to the teacher’s professional evaluation. Student achievement must contribute at least 50 percent to the teacher’s Teacher Effectiveness Measure or TEM score. The law requires that student achievement be weighted more heavily than observations. The purpose is to optimize student learning and growth, improve the quality of classroom instruction, and support the continuous growth of teachers.
This diagram was provided to explain how a TEM score is calculated. Teachers are scored at one of four levels, I, II, III or IV, with IV being exemplary:
A quick glance at the chart shows there are only seven boxes in which a teacher is considered “proficient” or “exemplary.” There are nine boxes in which a teacher might be considered “needs development” or “ineffective.”
A question was immediately raised by a teacher: “How are we supposed to succeed when there are more opportunities to do poorly than to do well?”
The response of the presenter: “I advise you to contact your legislator, as I am only the messenger.”
The conversation quickly moved on to the real reason for our meeting that day. There was a general lack of consistency for the 2014-2015 school year between ratings for teachers and ratings for student achievement. Teachers were consistently rated at Level 3 or proficient. But student performance on standardized tests was on average at Level 2 or needs development.
The state Department of Education wants teacher ratings and student ratings to align. Therefore, the solution was to teach administrators how to rate teachers lower. Rather than address the reasons why students do poorly on tests, the quick fix is to downgrade the teachers who clearly failed to do their jobs.
Administrators were trained over a three-day period earlier in the school year, so it was only fair that teachers be informed of what to expect. “It wasn’t economically feasible to train every teacher in the state of Georgia on the TKES platform, so the state decided to teach the raters (administrators) first and worry about the teachers later,” said the presenter.
To quote the presenter, “We are going to build the plane while we fly it.”
And finally, “The state wants to make sure we get this right, so we pushed back the full implementation of TKES until 2016-2017. Your results won’t begin to count against you until 2017.”
The point is that raters (administrators) need to give more 2s in their ratings of teachers and fewer 3s. Teachers should shoot for Level 3 in their practice, but the reality is that you can’t be there all the time. Teachers shouldn’t take it personally when they are given 2s on their observations, but rather see it as an opportunity to grow in professional practice.
One teacher asked, “Is it possible to get a Level 4 (exemplary rating)?”
The presenter responded: “The language of the standards states a Level 3 teacher consistently performs a certain action, and that a Level 4 teacher continually performs that action. So consider a medical analogy: If you take a vitamin daily, but you forget to take it on vacation or the occasional Sunday morning, then it’s safe to say you take that vitamin consistently, but the only real way to take it continually is to be hooked up to an IV bag.”
Several other questions were asked but the answers were all similar: No system is perfect; they don’t really count against you yet; contact your legislator.
The remainder of the meeting was divided into two parts. There was an in-depth presentation on TKES standard 4: Differentiated Instruction, and standard 3: Instructional Strategies. Finally, the presentation addressed how raters would collect evidence of teacher effectiveness and how administrators were trained to share the results with teachers through “mediated thinking” strategies.
The keys to addressing TKES standards 3 and 4, according to the presenter, were in lesson planning.
A lesson plan form was shared with us that would be the delight of any college professor. In short, we needed to write into daily plans how the lesson was differentiated for the different ability levels in our classroom and provide administrators with a “look for” guide as to our efforts to address all learners. If the administrator is not provided with a “look for” list, then the highest a teacher could be scored is Level 2. The thinking was that if you can’t show it up front, you don’t do it. All this is in addition to the requirement that lesson plans reference GPS standards, lesson content, etc.
Also, all lessons will now be required to “engage students cognitively.” The presenter explained the difference between “compliance” and “cognitive engagement” as this: “Students sitting in desks in rows, with books on their desks, doing seat work, following the lead of the teacher are simply engaged in a compliance activity and are learning nothing. This can be scored no higher than Level 2. Students who are active, talking, moving around the classroom, working in groups, or engaged in hands-on activities are the ones who are cognitively engaged.”
That seemed plausible to me, but I kept thinking back to the earlier analogy regarding a level 4 rating. A teacher asked, “What about cognitively engaging students who stayed up all night, came to school on drugs, or just don’t care about school?” The answer was, “If you teach at the highest possible level, then the test scores and everything else will take care of itself, you won’t reach every student.”
The collection of evidence didn’t seem out of the ordinary to me. Just like the college professors who observed me as a student teacher back in the 1900s, the administrators are asked to script what they see throughout a lesson and to remove bias from their observations. Removing bias seems difficult, if not impossible, but the state DOE has ordered it, so it shall be done.
The final few moments of the meeting dealt with how an administrator is supposed to kindly share with a teacher the downgraded marks they are to receive. The process is called “mediated thinking.” The process involves several steps: The administrator will remain neutral during the evaluation meeting. Collected evidence is to be used as the “third point of conversation.” Use this as an opportunity to “shine a light” on the teacher’s thinking about instruction and evaluation and teach them the proper way to view them. Push teachers to self-reflect on their practice and move themselves from (Needs Development) to (Proficient) and from (Good) to (Great).
As a teacher, I would like to offer some thoughts of my own on what I saw that day. My first exposure to TKES came in July of 2014, when all teachers were given an orientation by the system director of curriculum. We were informed that student growth was paramount to any individual teacher’s evaluation by an administrator. Teachers were going to be graded on a four-point system.
A Level 1 will get you fired. A Level 2 will get you put on probation and another Level 2 after that will get you fired.
Most teachers live in Level 3 land and sometimes they visit Level 4. Do not expect 4s and shoot for 3s. Hopefully, this will go away before anyone gets fired. I survived year one with all 3s and 4s.
Fast forward to this year and the message seems to be that there were too many 3s and not enough 2s and 4s. Schools can be rewarded for giving teachers 4s if student achievement matches that rating, but we won’t know that until after the fact. It sounds like the new normal is to expect 2s. But when I think back to the summer of 2014, the message was that too many 2s will get you fired.
Student achievement tied to test scores is a real problem. I know a bit about the state’s End of Course Tests. I worked on the creation of my subject area’s EOCT. I worked as a representative of the Georgia Department of Education with Pearson Educational Measurement and attended more than 20 meetings in Atlanta that usually lasted three days at a time. We developed the testing specifications, divided the testing domains, and looked over 8,000 multiple-choice questions word by word.
We looked at data from questions that were field-tested and kept or rejected questions based on that data. Finally, we normed test data and gave Pearson a benchmark by which to score every student in the state. This was an eight-year process beginning to end. I was there every step of the way, usually serving on teacher committees of 12 members or fewer. I used what I learned to my advantage, and I consistently had the highest EOCT scores in my county from 2004 until 2014.
I’m not saying I used information illegally or unethically, but when a teacher knows the test inside and out it’s pretty easy to teach a student how to play the game. Sometime in 2014, Pearson Educational Measurement lost its Georgia contract and was replaced by CTB/McGraw-Hill.
I was immediately chosen by them to serve on the same types of committees I had served on before. I learned that for the first few years the new Milestones test would be identical to the tests Pearson had given, because McGraw-Hill didn’t have field-tested items yet and couldn’t create its own version of the test.
The EOCT created by Pearson was property of the state of Georgia so they would use those questions for the 2014-2015 EOC and just grade it to a new set of criteria. I was thinking no problem, same test, same questions, my students should do well.
When last year’s scores finally came out in November, I was shocked: My pass rate had dropped from the usual 90 percent or higher to 29 percent. I had changed schools in between, within the same county, but I didn’t feel like my new students were that much worse off than any other group I had ever taught, and surely my teaching ability didn’t decline that much in one year.
What I noticed was that in order to pass the Milestones at a proficient level a student had to score 80 or higher. Students who scored in the 70s were considered “needs development” or “Level II.”
In the past, students were rated as Exceeds Expectations, Meets Expectations, or Does Not Meet Expectations. Now students are rated on a 4-point scale just like teachers, and, just like teachers, a Level 2 is not a good thing. The real problem with the test scores is that a student can pass my course and receive the Carnegie unit for a score of 70 or higher. But my rating as a teacher will suffer if the same student doesn’t score 80 or higher.
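The gap between the two cutoffs can be made concrete with a minimal Python sketch. It uses only the two thresholds stated in the account (70 or higher to earn course credit, 80 or higher to count as proficient); the function names and the sample scores are hypothetical, chosen purely for illustration.

```python
# Thresholds taken from the teacher's account; everything else is illustrative.
COURSE_PASS_MIN = 70   # 70 or higher earns the Carnegie unit
PROFICIENT_MIN = 80    # 80 or higher counts as proficient on the Milestones


def earns_credit(score: int) -> bool:
    """True if the score is high enough to pass the course."""
    return score >= COURSE_PASS_MIN


def counts_as_proficient(score: int) -> bool:
    """True if the score is high enough to be rated proficient."""
    return score >= PROFICIENT_MIN


def in_gap(score: int) -> bool:
    """True if the student passes the course but is not rated proficient."""
    return earns_credit(score) and not counts_as_proficient(score)


# Hypothetical scores, not real student data.
sample_scores = [65, 70, 79, 80, 92]
gap_scores = [s for s in sample_scores if in_gap(s)]
print(gap_scores)  # [70, 79]
```

Any score from 70 through 79 lands in this gap: It is a success for the student and a mark against the teacher.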
I had one student in particular who came in early every day for two weeks leading up to the EOC last year. She was a solid B student but she was worried about the EOC. I worked with her and she pulled out a 79 on the EOC.
The EOC, which counts as 20 percent of her final average, was waived since scores did not come back in time, but even without the waiver she would have kept her B average in my class with a 79 on the EOC. The result: The student is happy she kept her B, she performed as well as she was ever going to on the EOC, and the teacher receives a rating of Level 2 because she didn’t score 80 or higher. It’s scary to think that 50 percent or more of my evaluation depends on that number, considering it was the same test students have been taking for a decade. It’s just graded much tougher now.
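To see why a 79 would not have cost this student her B, here is a rough sketch of the weighted average. The 20 percent EOC weight and the 79 come from the account; the coursework average of 85 and the 80-to-89 range for a B are assumptions made only for illustration, since the account gives neither.

```python
EOC_WEIGHT = 0.20          # from the account: EOC counts as 20 percent
EOC_SCORE = 79             # from the account
COURSEWORK_AVERAGE = 85    # assumption: a "solid B" before the EOC
B_MIN, B_MAX = 80, 89      # assumption: a conventional B range


def final_average(coursework: float, eoc: float, eoc_weight: float) -> float:
    """Weighted average of coursework and the end-of-course test."""
    return (1 - eoc_weight) * coursework + eoc_weight * eoc


final = final_average(COURSEWORK_AVERAGE, EOC_SCORE, EOC_WEIGHT)
keeps_b = B_MIN <= final <= B_MAX
print(round(final, 1), keeps_b)
```

Under these assumptions the final average comes out near 83.8, still a B, even though the same 79 falls short of the proficiency cutoff.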
I’m no expert in mathematics, but statistically I know most people are average, and I know the bell curve says that most students will score around 75 on any valid exam. It seems unfair to move students ahead in the system for being average, but to punish the teacher when an average student doesn’t score above average on a standardized test.