Dr. Angelika Pohl is founder and president of the Atlanta-based Better Testing & Evaluations.
She formerly worked with the state Department of Education as a senior research and evaluation associate. She also worked with a large testing company developing tests.
High-stakes tests have indeed turned out to have high stakes: they have caused seasoned educators to be sent to prison.
How could this happen? The main problem with high-stakes tests is that they are shrouded in secrecy.
Teachers are not allowed to see the tests they have to give their students, the very tests that have serious consequences for these teachers and their students.
Of course, let’s not kid ourselves: teachers do look at the tests to see what their students are being asked — and they are all too often appalled. They see test questions with poor grammar, confusing wording, and misspelled words; they see trivial and ambiguous questions; they see flawed graphs and visually confusing charts; they see many questions about facts and concepts that they have not taught; they see questions testing reading skills that bear no resemblance to any authentic texts that the student might have read in real life. And the list of flaws goes on.
When teachers see these flaws, they cannot speak up because that would reveal that they violated test security regulations. A few years ago a teacher in a metro Atlanta school system dared to publicize and critique a test question; he was punished for breaching test security — never mind questioning test quality.
No wonder teachers have doubts about the validity of these tests, which have recently become high-consequence measures of their competence as teachers. No wonder some teachers treat the tests as sloppily as the tests appear to have been written.
Tests are not inherently bad. It is quite possible to write test questions and answer choices that most people would agree are fair measures of what a student has learned. It is possible to write questions that have none of the flaws mentioned above, nor any others. But it costs money. And expertise.
I did not grow up wanting to be a testing expert, but it happened. After finishing my doctorate, I got a job with a big test development company in Massachusetts that produced teacher certification tests. They hired me because I was an eclectic type who seemed to know a little about a wide range of fields and enjoyed everything from philosophy to statistics.
It was impressed upon me from day one that questions on these tests had to be absolutely flawless, so that they would stand up in a legal challenge to their validity. Teachers could be denied a livelihood on the basis of these tests, so the tests had to withstand the closest scrutiny.
Before a question (usually termed an “item”) could appear on a test, it was subjected to numerous reviews. Three or four editors would tweak and fine-tune the wording, and a copy editor would further subject the item to various checks of factual and linguistic accuracy.
Once items were deemed flawless, they were presented to experienced teachers for careful scrutiny and extensive discussion. As a result, many items were thrown out or revised. The approved new items were then slipped into official tests as unscored trial questions, to determine whether they met certain statistical criteria for validity and reliability.
Only if they met these criteria would they appear on a test to actually count toward a final score. To produce a single valid test question was an elaborate, lengthy, and very expensive process. But these tests were a recognized measure of whether or not a teacher candidate had sufficient knowledge of a given content area to be certified to teach it.
This is not the process that test questions go through before they appear on a test that has high stakes for students and now, indirectly, for teachers.
When I moved to Georgia, I began work with the Georgia Department of Education and was given responsibility for implementing the then-new high school graduation tests. The department had contracted with a test development firm to write the test questions.
When the contractor submitted tests for my approval, I was appalled. Items had all the flaws listed above. I would send items back for revision, but that rarely resulted in great improvement. It became clear to me that we were receiving first drafts, rather than carefully edited items.
My years of training for precision and clarity would not let me accept these items, so I spent truly countless hours editing them myself. This effort was not appreciated at the department. My colleagues and my director were of the opinion that items could just be thrown into a pilot test, administered to tens of thousands of hapless students across Georgia, and then checked for statistical results.
If the stats met established criteria, the item was a go. Not human readers but psychometrics had the last word. To be sure, statistical validity is a necessary criterion, but it is by no means a sufficient one.
The statistics do not tell you whether a test question is written clearly and without grammatical or usage flaws, whether it makes sense, or whether it has any educational value whatsoever. (Yes, teacher committees also look at and approve test questions, but it is a quick, pro forma process. Teachers are hurried through and not invited to engage in any discussion of the questions. Rarely is an item revised or rejected.)
Most of the state-level standardized tests given to students these days are poorly constructed. Contractors don’t want to spend the money to develop carefully constructed items; bureaucrats are intimidated by or enamored of the psychometrics and lack editorial and pedagogical sensitivity.
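To make concrete what this kind of statistical screening can and cannot see, here is a minimal sketch in Python. It assumes the screening rests on two classical item statistics: difficulty (the proportion of students who answer correctly) and discrimination (the correlation between an item and the rest of the test). The actual criteria used by the department are not stated here, so the function names, the thresholds, and the pilot data below are all hypothetical.

```python
from statistics import mean, pstdev

def item_stats(responses, item):
    """Return (difficulty, discrimination) for one item.

    responses: one row per student, each row a list of 0/1 item scores.
    Difficulty is the proportion correct; discrimination is the
    correlation between the item and the total on the remaining items.
    """
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]

    difficulty = mean(item_scores)
    sd_item = pstdev(item_scores)
    sd_rest = pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        # No variation means no correlation can be computed.
        return difficulty, 0.0

    products = [i * r for i, r in zip(item_scores, rest_scores)]
    covariance = mean(products) - difficulty * mean(rest_scores)
    return difficulty, covariance / (sd_item * sd_rest)

def passes_screen(difficulty, discrimination,
                  min_p=0.2, max_p=0.9, min_disc=0.2):
    """Hypothetical cutoffs; real programs set their own."""
    return min_p <= difficulty <= max_p and discrimination >= min_disc

# Made-up pilot results: six students, three items.
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]

if __name__ == "__main__":
    d, r = item_stats(pilot, 0)
    print(d, round(r, 3), passes_screen(d, r))
```

Nothing in these numbers reflects the wording of a question. The calculation sees only right and wrong answers, so a clumsily written item can pass as long as students who do well on the rest of the test tend to answer it correctly.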
As a result, the tests are crude measures of learning and do not invite the respect of students or teachers. No wonder they are dismissed as unfair and a waste of time. High-quality tests are possible, and sorely needed, but the higher-ups need to know and care enough to insist on them.