By Morgan Smith
Summary by PEN Weekly NewsBlast
In research that shakes the foundation of high-stakes test-based accountability, Walter Stroup of the University of Texas at Austin and two colleagues believe they've found a glitch in the DNA of the Texas Assessment of Knowledge and Skills (TAKS) that renders it "virtually useless" at measuring the effects of classroom instruction, The New York Times reports.

The flaw stems from a statistical method used to assemble the tests. Pearson, which has a five-year, $468 million contract to create the state's tests through 2015, uses "item-response theory" to devise standardized exams, as other testing companies do. Using I.R.T., developers select questions based on a model that correlates students' ability with the probability that they will answer a question correctly. The result, Mr. Stroup said, is a test better suited to ranking students than to measuring what they have learned.

That design flaw, he argued, also explains why students' scores on the previous year's TAKS were a better predictor of performance on the next year's TAKS than benchmark exams were. Benchmark exams are developed by districts; the TAKS is developed by the testing company.

Gloria Zyskowski of the Texas Education Agency said Mr. Stroup's comments reflect "fundamental misunderstandings" about test development, adding that she saw no evidence of a flaw in the test.
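For readers unfamiliar with item-response theory, the core idea the article alludes to can be sketched in a few lines. The following is an illustrative Python example of the widely used two-parameter logistic (2PL) IRT model, not code from Pearson or Mr. Stroup; the function name and parameter values are mine, chosen only to show how the model links ability to the probability of a correct answer:

```python
import math

def p_correct(theta, difficulty, discrimination=1.0):
    """Probability that a student with ability `theta` answers an item
    correctly under a two-parameter logistic (2PL) IRT model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A highly discriminating item sorts students sharply around its
# difficulty level: probabilities swing from near 0 to near 1 as
# ability crosses the item's difficulty.
for theta in (-2.0, 0.0, 2.0):
    print(round(p_correct(theta, difficulty=0.0, discrimination=2.0), 3))
# → 0.018
# → 0.5
# → 0.982
```

Selecting items that behave this way is efficient for ranking students along an ability scale, which is arguably why, as Mr. Stroup contends, such a test can separate students well while remaining insensitive to year-over-year gains from instruction.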