With teachers going to jail, we’ve raised stakes in testing. But we haven’t raised quality.

Dr. Angelika Pohl is founder and president of the Atlanta-based Better Testing & Evaluations.

She formerly worked with the state Department of Education as a senior research and evaluation associate. She also worked with a large testing company developing tests.

By Angelika Pohl

High-stakes tests have indeed turned out to have high stakes:  they have caused seasoned educators to be sent to prison.

How could this happen? The main problem with high-stakes tests is that they are shrouded in secrecy.


Do we have any assurances our high-stakes tests are good tests?  (AJC File)

Teachers are not allowed to see the tests they have to give their students, the very tests that have serious consequences for these teachers and their students.

Of course, let’s not kid ourselves: teachers do look at the tests to see what their students are being asked — and they are all too often appalled. They see test questions with poor grammar, confusing wording, and misspelled words; they see trivial and ambiguous questions; they see flawed graphs and visually confusing charts; they see many questions about facts and concepts that they have not taught; they see questions testing reading skills that bear no resemblance to any authentic texts that the student might have read in real life. And the list of flaws goes on.

When teachers see these flaws, they cannot speak up because that would reveal that they violated test security regulations.  A few years ago a teacher in a metro Atlanta school system dared to publicize and critique a test question; he was punished for breaching test security — never mind questioning test quality.

No wonder teachers have doubts about the validity of these tests, which have recently become high-consequence measures of their competence as teachers. No wonder that some teachers treat the tests in as sloppy a way as the tests appear to have been written.

Tests are not inherently bad. It is quite possible to write test questions and answer choices that most people would agree are fair measures of what a student has learned. It is possible to write questions that have none of the flaws mentioned above, nor any others. But it costs money. And expertise.

I did not grow up wanting to be a testing expert, but it happened. After finishing my doctorate I got a job with a big test development company in Massachusetts that produced teacher certification tests. They hired me because I was an eclectic type who seemed to know a little about a wide range of fields and enjoyed everything from philosophy to statistics.

It was impressed upon me from day one that questions on these tests had to be absolutely flawless, so that they would stand up in a legal challenge to their validity. Teachers could be denied a livelihood on the basis of these tests, so the tests had to withstand the closest scrutiny.

Before a question (usually termed an “item”) could appear on a test, it was subjected to numerous reviews. Three or four editors would tweak and fine-tune the wording, and a copy editor would further subject the item to various tests of factual and linguistic accuracy.

Once items were deemed flawless, they were presented to experienced teachers for careful scrutiny and extensive discussion. As a result, many items were thrown out or revised. The approved new items were then slipped into official tests as unscored trial items (“dummies”) to determine whether they met certain statistical criteria for validity and reliability.

Only if they met these criteria would they appear on a test to actually count toward a final score. To produce a single valid test question was an elaborate, lengthy, and very expensive process. But these tests were a recognized measure of whether or not a teacher candidate had sufficient knowledge of a given content area to be certified to teach it.

This is not the process that test questions go through before they appear on a test that has high stakes for students and now, indirectly, for teachers.

When I moved to Georgia, I began work with the Georgia Department of Education and was given responsibility for implementing the then-new high school graduation tests. The department had contracted with a test development firm to write the test questions.

When the contractor submitted tests for my approval, I was appalled.  Items had all the flaws listed above. I would send items back for revision, but that rarely resulted in great improvement. It became clear to me that we were receiving first drafts, rather than carefully edited items.

My years of training for precision and clarity would not let me accept these items, so I spent truly countless hours editing them myself. This effort was not appreciated at the department. My colleagues and my director were of the opinion that items could just be thrown into a pilot test, administered to tens of thousands of hapless students across Georgia, and then checked for statistical results.

If the stats met established criteria, the item was a go. Not human readers but psychometrics had the last word. To be sure, statistical validity is a necessary criterion, but it is by no means a sufficient one.

The statistics do not tell you whether a test question is written clearly, without grammatical or usage flaws, whether it makes sense, or whether it has any educational value whatsoever. (Yes, teacher committees also look at and approve test questions, but it is a quick, pro forma process. Teachers are hurried through and not invited to engage in any discussion of the questions. Rarely is an item revised or rejected.)

Most of the state-level standardized tests given to students these days are poorly constructed. Contractors don’t want to spend the money to develop carefully constructed items; bureaucrats are intimidated by or enamored of the psychometrics and lack editorial and pedagogical sensitivity.

As a result, the tests are crude measures of learning and do not invite the respect of students or teachers.  No wonder they are dismissed as unfair and a waste of time. High-quality tests are possible, and sorely needed, but the higher-ups need to know and care enough to insist on them.

 

 


82 comments
Batters Box

It wasn't poorly designed tests that led to this scandal, any more than bad security is justification for a bank robbery. These were acts of greed for profit. The RICO laws address this behavior correctly; the jury deliberated and gave its verdicts. They used a legitimate organization for their illegal acts, turning it into a criminal enterprise for personal profit.

The ones who didn't plead guilty and accept responsibility after being found guilty must think everyone is stupid, and they deserve the extra time. These are no longer teachers; they are now rightfully convicted felons. Adult education classes have now begun.

BPorter

BB, I understand your anger at the teachers and administrators who put their careers ahead of duty, but with respect I think the bank robbery/security analogy is flawed.

Bank robbers elect to steal for profit. The APS personnel had a testing system thrust upon them. Results to be delivered or pay the consequences. A moral and ethical failure for sure, but more akin to a classical tragedy than a tawdry robbery. If the flawed testing hadn't been employed, fewer would have paid the consequences.

One line of ethical reasoning is: a good thing that they got tried with temptation. Weeded out the bad apples. However, another view is: under better circumstances more of those apples might have productively ripened and wouldn't have fallen prey. I honestly don't have a decisive answer to this. Do you? I lean toward the latter but suspect the truth is mixed and complex.

"Bad apples" calls to mind explanations of Abu Ghraib. That was the official line at the outset. Then it became apparent that systemic failures allowed those apples free rein. The easiest thing is to blame a few individuals and close the book. I think that would be a big mistake in this case also. You have to address the underlying problems or continue to live with consequences.

Regardless, I think interjecting the APS trial is a bit of a red herring. Dr Pohl's thrust was the content and quality of the testing employed. I don't think she was bashing tests per se as a few have implied. I know she wasn't justifying the actions of cheaters. The APS scandal should be a wake-up call to address the underlying problems, not feel satisfied that it's all fixed now that a few teachers may (after years of appeals) serve jail time. The students deserve better.

DrPohl

Allow me to clarify: I do not defend the teachers who cheated. I think their cynical systematic falsification of student records is shocking and deeply distressing. They deserved to be punished, though we might disagree on the wisest way to punish them.

My intent was to point out that seriously flawed tests generate a disrespect that can lead SOME educators to make unethical choices. But it was still their choice.

BPorter

In response to several who have emphatically stated that high-stakes testing did not lead to cheating by teachers, let me invite you to reconsider your logic.

Suppose that I visit a dangerous neighborhood, drive an expensive car, flash a wad of cash, and wear a Rolex watch. And I get mugged. Did my actions lead in any way to the consequences? Or does responsibility fall solely upon those with criminal intent?

Regardless of what some would prefer to believe, reality is seldom simple and multiple causative factors are the norm. When laws are badly written and unfairly enforced, contempt for the law becomes more widespread - and more people are willing to break the law. Similarly, poorly designed tests, badly administered, with results that bear significant consequences for teachers and administrators *will* lead to a greater number willing to game the system or cheat the students.

Is that an excuse for the actions of those individuals? Hardly. But simply jailing the perpetrators and walking away with a feeling of satisfaction will not solve the problem. Changes are needed at many levels, and the validity of the testing procedure appears to be significant. Hence it was partly responsible for the sentences imposed on APS personnel. That is not an evasion of responsibility, rather a fact that needs recognition. Which was clearly Dr. Pohl's point.

PJ25

This article is the typical liberal response to any question regarding accountability: it's everyone's fault but my own.

smithmc

@Andy123  And your response is the typical conservative response to factual information.  Attack the messenger and ignore the substance.

Batters Box

@smithmc @Andy123 Andy, you have no substance. This article had no substance; all I heard was the same ole song... the neck bone connected to the shoulder bone, the shoulder bone connected to the arm bone... and blah blah blah. They got what they deserved. As the judge said, you made your beds, now lay in them.

newsphile

These teachers are going to jail because of their actions. No matter how bad the tests, no matter who else did what, these individuals were found guilty by a jury of their peers who listened to evidence for months. They were sentenced by a judge who gave them one more last-ditch opportunity to plead. Let it go!!!!!

EdJohnson

“With teachers going to jail, we’ve raised stakes in testing. But we haven’t raised quality.”

That says it all in a nutshell.

Still, to add…  Perhaps now will come the understanding that Bush’s and Obama’s testing and "accountability" and "governance" and “career and college ready” ideology does not substitute for the need to continually improve the quality of our nation’s public education systems.  But, on the other hand, the new APS superintendent Meria Carstarphen and mostly new school board, with TFA’er Courtney English as board chairman, will soon turn the district into a Charter System that will reinvent the whole testing and "accountability" and "governance" mess all over again, much like simply pouring old wine into a new bottle and calling it a “New Direction.”

Turning APS into a Charter System will not do a damn thing to improve public education quality in Atlanta.  Turning APS into a Charter System simply means the mostly Atlanta "African American" community wants to keep on allowing and enabling the latest "African American" savior of APS to keep on robbing the mostly "African American" children of development with a quality education.  The children will develop; that is inevitable.  But the children’s development with a quality education is not inevitable.  Who will go to jail for that immeasurable outcome?

Prediction: Atlanta will be last in the nation to wake to and move away from Bush’s NCLB and Obama’s “Race to the Top Competition.”

Batters Box

@EdJohnson Until the parents themselves lead lives by example, not just lip service to the value of education, these children will never benefit. Success breeds success; losers make more losers.

staylor649

Thank you for saying what needs to be said about tests in Georgia. I've noticed test flaws for years, including questions so ambiguous I couldn't even answer them, much less my fourth graders. The respect and credence these tests get is way beyond what they deserve. In addition, there is a bias against poor children, because of the background knowledge required to answer many of the questions. With Milestones coming tomorrow, I'm prepared for my annual outrage at the whole process. It's defeating.

Batters Box

@staylor649 There are many children around the world who would love a chance to have an education, and some have died in various parts of the world trying to do just that. Put a lot of the blame where it belongs: on the parents themselves and the culture of stupid.

Happy Hippie

I worked for many years as a professional proofreader and one publisher I worked for published standardized tests for several states. The errors were horrible, but any corrections I made for anything other than grammar were ignored. Questions were poorly worded and difficult to understand. Some didn't make sense. Some didn't even have a correct answer listed (I remember one question that asked what 3.14 was and the only choice that was remotely close was "pie." My correction of "pi" was not accepted and it was still incorrect on the final draft.) It is extremely upsetting to me that the people developing these tests care so little about the resulting product. They are deeply flawed.

DrPohl

I can relate to what you're describing. There was the time I pointed out a clear error but it would not be corrected because "that would mess up the statistics."

DawgDadII

"High-stakes tests have indeed turned out to have high stakes:  they have caused seasoned educators to be sent to prison."


I am sorry, but I stopped reading RIGHT THERE. "High-stakes tests" did NOT repeat NOT repeat NOT cause these people to go to jail; dishonest behavior (on steroids, figuratively speaking) landed these people in jail.


If this is the lead-in to the article, the article isn't worth my or anyone else's time.

Quidocetdiscit

As a teacher, I use the "sample test questions" provided by the state to help prepare my students.  Oddly enough, many of the "sample questions" provided for the MILESTONE tests are the same as the ones that used to be provided for the CRCT, even though this is supposed to be a "new" test...


In addition, there are several errors in the questions, which is always amusing because the students tend to find them.  It is embarrassing to explain that the "state" is apparently not "Smarter than a Third Grader".  And these are the same people who are in charge of creating the "high stakes" tests that evaluate our schools, our students, and ourselves!

ScienceTeacher671

@Quidocetdiscit


Quid, back last summer when they were still saying the tests would count for this year, I asked a GaDOE representative (whom I will not name here) how they were going to give a test that hadn't been field tested or validated, and I was told, "We have plenty of field-tested questions from the old tests."


My conclusion was that the tests will be pretty much the same, but the cut scores will be changed to make the tests APPEAR "more rigorous."  


You can draw your own conclusions.

MaryElizabethSings

"And the list of flaws (of high-stakes tests) goes on."

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


This is true.  I once served on a committee to proof and edit new test items, so I know from experience.


In addition, as I have often stated, standardized tests cannot compare - with validity - the progress (or lack thereof) of the students of one teacher with another teacher's, nor one school's score results with another school's, nor one school system's scores with another school system's scores, because the variable of different student populations will forever be present, which keeps valid comparisons from being made.


Moreover, standardized testing should ONLY be used to diagnose where individual students are functioning, where they individually should be placed, and on what level of curriculum instruction each should be taught irrespective of grade level demarcations.

HollyJones

Let's play a game.  We are going to change the way everyone in this country is held accountable for their jobs.  Let's say you are a sales manager.  You have 25 sales people in your division.  Your boss picks a random day, say June 16, and looks at how much your sales force sold on that one day, and ONLY that one day. Not what YOU did on that one day, but what 25 other people did. If your sales group doesn't meet whatever "standard" is determined by the higher-ups (and you don't know what that standard is), you will be demoted, or take a pay cut, or be fired.  Now, how many folks want to work under those conditions?  I would guess not many, because it's a stupid way to assess anyone in any field. 


But, we have been told over and over again that it is an effective way to hold teachers and schools "accountable." Thanks, NCLB.    If your child's teacher told you s/he would be grading your child on only one test for the entire year, would you be happy about that?  Of course not. Then why are we allowing it to be done to our teachers and schools?  And who really believes it is an accurate depiction of any school or teacher?


Yes, testing is necessary.  Yes, it is here to stay.  But it is being used in totally inappropriate ways.  No test was ever intended to determine if a teacher is good at his/her job.  That's what classroom evaluations are for.  But, those take a lot of time and if a teacher is found lacking, the paperwork and follow up for the "professional development plan" was so daunting, that most admins I knew were loath to give a "needs improvement" to anyone.  I think you might almost need a full time evaluator to do that job really effectively.  But, I'm sure many here would call that "throwing more money at the problem."  And we mustn't do that, apparently.  

booful98

@HollyJones A number of years ago, I worked at the Rich's corporate office here in Atlanta. I was a financial analyst. As such, I had ZERO control over sales. And yet, 10% of my performance evaluation every year was based on meeting sales expectations. The only thing I could personally do to affect sales was to spend money myself. I wasn't in marketing, I wasn't in advertising, I wasn't a merchant buyer, I didn't work in any store ever. I was a bean counter. And yet, there it was: 10% of my evaluation was based on sales.


I am sorry, but I cannot find much sympathy for teachers these days.

Quidocetdiscit

Rather like those teachers whose evaluations are based upon students they do not even teach...and 10%?  Try 50% written into Georgia law and then include "evaluation surveys" done by seven year olds about how well you do your job. 

Quidocetdiscit

@booful98 @Quidocetdiscit


Although you may not discuss results or worry much about them, I can assure you the schools and teachers do... that is because half of our evaluations are now tied to those test scores.  For our principals, it is even worse.  I believe it is 70% of their evaluation. (Any administrator want to chime in?) Schools are judged by those scores within the system... so although YOU may not be aware of it, district superintendents put pressure on principals over lower than expected scores, and they in turn put pressure on teachers.  There is also federal funding tied to those scores.  The use of test scores to evaluate teachers so heavily is new, thanks to the current state government - and there are those pushing to have our pay tied to those scores as well.  


In areas like yours, one wonders WHY there is even a need for such testing, but it is still being pushed.   The costs to your local schools are in the thousands.   That is your tax money being used to "fix" a problem that does not exist.  People need to start fighting back!  

booful98

@Quidocetdiscit I think the real issue here is we all have to deal with being evaluated by things that are not under our control.

I'll tell you one thing: around where I live, most people don't really worry about testing because this is a middle-upper class area and most of the kids do well. We don't discuss results, our schools do well enough. We have no idea how any particular teacher got "graded". But we ALL know who the good ones are and we ALL know who the bad ones are. It is always obvious and testing has nothing to do with it.

JBBrown1968

Papatart, 


As smart as you are, you should use your real name as some of the rest of us do. What are you really hiding?  I AM NOT ASHAMED THAT MY TYPING AND GRAMMAR SKILLS SHOULD BE BETTER. IT IS MY FAULT! My teachers were hard-working, kind people. I was not engaged as a young man. What the hell is your excuse for all the teacher bashing?


JBBrown1968

Udunut,


Teddy K is your moral compass.....So tell me please! What is worse?   Cheating or killing? Mary Jo Kopechne ring a bell? Teddy is your kind of man for sure! You are very disturbing! My name is on this post. Feel free to post yours troll.

straker

Who is going to pay these teachers' legal bills for their appeals?

Starik

@straker The taxpayers, if these "educators" claim to be indigent.

BCW1

Great article and she is absolutely correct!!

MaureenDowney moderator

From a teacher involved in editing test questions:


I had the opportunity to revise and edit one of the benchmark tests that DeKalb paid Pearson to write a few years ago ($1.8 million in RTTT funds for all of the tests combined). Between the other teacher and myself, we rejected about 40% of the questions for not meeting the GPS, being worded very poorly, having confusing graphics, etc.

Some of the corrections were made, some weren't. It was the first time I ever saw questions written by a company that make such an impact on my job, and it bothered me. I don't know if McGraw-Hill is any better, but somehow, I doubt it.

MaureenDowney moderator

I wanted to add that a neighbor was also involved in this process for DeKalb and said the questions were not aligned. This task ended up dominating her work life as there was so much to do.

jerryeads

Bravo, Angelika.

A kindred spirit, but I'll sadly note we never got to work together. Angelika was cursed with both competence and a conscience, a fatal combination these days when working in testing in a state department. There are VERY few people who actually have training in measurement, and they are FAR better rewarded in the private than in the public sector. Hence, few people with said measurement expertise actually ever end up in state departments. Angelika, prepared (as was I) in a number of areas in addition to measurement, was a rare bird indeed.

As I oft remind my students in my policy classes, state testing has one thing in common with the space program. LOW BID. BKendall, rest assured that reality is no different for Milestones. To some extent that's not "the fault" of the staff downtown - as was my staff in Virginia, "We were asked so often to do more with less it eventually came to pass that we were expected to do everything with nothing."

Let's hope that the new state superintendent, as he "restructures" his agency, will pay most attention to (1) whether those in the testing shop (not to mention all others) actually care about kids and how their actions affect them and (2) that the folks (who care about kids) actually have demonstrable formal expertise in measurement. As Angelika suggests, it takes expertise AND dedication AND a conscience to create decent tests. Let's also hope said superintendent and his board have paid attention to the mountain of evidence over many decades showing that whenever a measure is used to punish or reward people, that measure becomes meaningless (e.g., APS scandal).

BKendall

I appreciate Dr. Angelika Pohl’s comments about the high school graduation tests. 

I do wish she had addressed the low assessment standards for the now extinct GHSGT. Could the standards have been low to compensate for the issues she addresses?

It would also be interesting to read her thoughts about the psychometrics of Georgia Milestones, profiling students statewide to determine teacher effectiveness.

DrPohl

The "cut scores" (what students had to attain to pass the tests) were ridiculously low. Cut scores were set by teacher committees -- and of course they didn't want their own students to fail the tests.

liberal4life

The state should release all test items after each administration - this should be required of all tests, whether it is a state-wide standardized test or a teacher-made test given in a local school.

DrPohl

The state of Texas does (or did - I haven't checked into it lately) publish all its tests. It makes test development costly, but it buys credibility.

popacorn

In honor of all students, past and present, who have been 'taught' by enabled/cheating/incompetent teachers, I propose that this blog be renamed: 

Get Fooled

dg417s

I had an instructor when I was at Georgia State who also worked for DOE. He was telling about a test question on one CRCT that was a field test item (fortunately) asking about the shape of a football. It went to show how biased some questions can be. That was just one example, but it was one example too many. Think back also to the CRCTs that were thrown out in Social Studies a few years ago - one question asked which image was a picture of currency from Chad. Frankly, who cares? How is that important to the students or the world in general? Which Georgia Performance Standard would that satisfy?

RealLurker

I am not denying that there are issues with the tests that students were taking.  However, the first sentence is blatantly false.  The tests did not "cause" anyone to go to prison.  Many thousands of teachers and administrators were using the same tests and DID NOT BREAK THE LAW.


The article shows good evidence that the tests were faulty, but does not in any way excuse the bad behavior of people.

dg417s

@RealLurker Not condoning cheating, but something needs to be done about the testing process overall.

LookbeforeIleap

@RealLurker 

Almost word for word what I had intended to post.

It bears repeating: the educators who were sentenced to jail yesterday were sentenced for THEIR bad acts, not any global issues with the testing process.

If the tests and the process for developing them are flawed, then you change the tests and the process, not the answers that are given by the students.

LookbeforeIleap

@dg417s @LookbeforeIleap @RealLurker 

That would involve changing the process.

The reason they don't show the tests to the teachers is to preserve the integrity of the testing and to prevent the teachers from "teaching the test".

dg417s

@LookbeforeIleap @RealLurker Again, not condoning, but how do we change the tests when we aren't allowed to know what's on the test or make sure that it matches our standards?

RealLurker

@dg417s You do pose a good question.  The tests should be scrutinized and should at least do a decent job of indicating proficiency in the materials.


My question would be about the "Again, not condoning" part of your statement.  This article could have stated the questions about the tests without leading with "High-stakes tests have indeed turned out to have high stakes:  they have caused seasoned educators to be sent to prison."  The "tests" did not cause anyone to go to prison.  Their bad acts did.  The headline of the article is "With teachers going to jail, we’ve raised stakes in testing".  The headline sounds like the teachers went to prison because their students did poorly on the tests, not because they broke the law.


There are valid points in the article regarding validity of previous tests.  However the thesis of the article is that teachers went to prison because the tests are invalid.  It is going to be difficult to have a productive discussion about the tests, if the starting point is that those APS personnel who were convicted are the victims, so the tests should be changed.

dg417s

@RealLurker @dg417s I don't think it was right for the teachers to do what they did and they should be punished accordingly (I think the sentences here were a bit extreme, but that's beside the point). I also don't agree about "teaching to the test" but if the test matches the standards, and teachers teach students the standards, then there shouldn't be a problem. The problem is that if the tests don't match the standards, how is it fair to hold either students or teachers accountable to the results?

RealLurker

@dg417s I agree with you on all of those points.  I would even add some:  How is it fair to hold a teacher accountable if the students were socially promoted and are not ready for the material that they should be learning?  How is it fair to hold a school accountable if the parents and community refuse to support education?  These and many more should be taken into account.


My main issue is the way that such issues are usually presented.  Many times people rail against objective assessments.  If money is accepted from State and Federal agencies, objective assessments are necessary.  Such assessments should be used to compare schools and systems.  In schools and systems that are not keeping up, there should be an effort to find the reasons and attempt to correct them.  It might be community reasons outside of the school.  If so, that should be made public and there should be an attempt to remedy it.  That would be the correct use of such tests.  In this case, the headline and thesis of the article blame the tests for the conviction of the APS personnel.  It is not the fault of the test that these people broke the law.  The attempt to use their conviction to argue against the tests will backfire with many people.


IF the goal of the tests is to better education, they should be an accurate indicator of how students are progressing.  They should be one of many indicators used to identify systems, schools, and teachers who are and are not effective.  Once ineffective systems, schools, and teachers are identified, there should be an effort to improve them, but it should be through a systematic manner.

RafeHollister

So, it is the test's fault or the fault of the test creators?  Teachers' ethics can be excused because the test was poorly constructed?


So, if the bag boy at the supermarket crushes your fresh loaf of bread, you can just go get another one off the shelf and walk out of the store.  If caught, it's not your fault?