This technical report summarizes the results of a study in which we examined the technical adequacy of five potential measures for algebra progress monitoring. One hundred three students (14 of whom were receiving special education services) completed two forms of a Basic Skills measure, two forms of an Algebra Foundations measure, one form of a Content Analysis-Constructed Response measure, two forms of a Translations measure, and two forms of a Content Analysis-Multiple Choice measure, administered over two data collection sessions. Each probe data collection session was repeated to investigate the test-retest reliability of the measures. In addition, we gathered data on criterion variables including grades, overall grade point average, teacher ratings of student proficiency, and scores on district-administered standardized tests, as well as a measure of algebra aptitude. We examined both test-retest and alternate form reliability for both single probe scores and aggregated scores (computed by averaging two individual scores). Criterion validity was examined by computing correlations between students’ single and aggregated scores on the probes and their scores on other indicators of proficiency in algebra. We found that four of the five measures produced effective distributions of student scores, with no signs of floor or ceiling effects. On the Translations probe, students produced nearly as many incorrect responses as correct responses, suggesting a high rate of guessing on that measure. The test-retest and alternate form reliability of single probes ranged from .4 to .9, with most coefficients in the .4 to .6 range. Aggregating scores from two probes produced slight increases in reliability, with most correlations ranging from .5 to .7. For both single probes and aggregated scores, test-retest reliability coefficients exceeded those obtained for alternate form reliability.
Neither the single nor the aggregated probes consistently produced reliability coefficients above the .80 level that represents a standard benchmark. Criterion validity coefficients were also lower than those obtained in previous research (Foegen & Lind, 2004). Coefficients were generally in the low range (.2 to .4); the exception to this pattern was the Iowa Algebra Aptitude Test, which was more strongly related to the algebra progress monitoring measures (coefficients in the .3 to .5 range). The Content Analysis-Constructed Response, Algebra Foundations, and Content Analysis-Multiple Choice measures produced the strongest relations with the criterion measures, with lower relations obtained for the Basic Skills and Translations measures. Concerns were identified regarding the difficulty of scoring the Content Analysis-Constructed Response probes efficiently and accurately, which will likely limit the viability of this measure in applied settings. Issues for future research are identified.
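The reliability computations summarized above can be illustrated with a minimal sketch. It is not the study's analysis code: it assumes the coefficients are Pearson correlations (the report refers only to correlations), and the function names and score values are hypothetical, chosen only to show the mechanics of single-probe and aggregated-score comparisons.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def aggregate(form_a, form_b):
    """Aggregated score per student: the mean of two individual probe scores."""
    return [(a + b) / 2 for a, b in zip(form_a, form_b)]


# Hypothetical scores for five students (NOT data from the study).
form_a_time1 = [12, 18, 9, 22, 15]
form_b_time1 = [10, 20, 11, 19, 14]
form_a_time2 = [14, 17, 10, 24, 13]
form_b_time2 = [11, 21, 9, 20, 16]

# Alternate form reliability: two forms given in the same session.
alternate_form = pearson(form_a_time1, form_b_time1)

# Test-retest reliability: the same form given in two sessions.
test_retest = pearson(form_a_time1, form_a_time2)

# Test-retest reliability of aggregated (two-probe average) scores.
aggregated_retest = pearson(
    aggregate(form_a_time1, form_b_time1),
    aggregate(form_a_time2, form_b_time2),
)

print(f"alternate form: {alternate_form:.2f}")
print(f"test-retest: {test_retest:.2f}")
print(f"aggregated test-retest: {aggregated_retest:.2f}")
```

Under the same assumption, a criterion validity coefficient would be obtained by passing probe scores and a criterion variable (for example, teacher ratings) to the same `pearson` function.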
Foegen, Anne; Olson, Jeannette R.; and Perkmen, Serkan, "Reliability and Criterion Validity of Five Algebra Measures in Districts B and C" (2005). Project AAIMS Technical Reports. 1.