Title: Comparability of examinee proficiency scores on computer adaptive tests using real and simulated data
Name: Evans, Josiah Jeremiah (author); de la Torre, Jimmy (chair); Camilli, Gregory (internal member); Penfield, Douglas (internal member); Pashley, Peter (outside member); Rutgers University, Graduate School - New Brunswick
Subject: Computer adaptive testing
Description: In measurement research, data simulations are a commonly used analytical technique. While simulation designs have many benefits, it is unclear whether these artificially generated datasets accurately capture real examinee item response behaviors. This potential lack of comparability may have important implications for the administration of computer adaptive tests (CAT), which present proficiency-targeted items to examinees. To address this problem, this study compared results from real testing data to those from simulated data to determine the extent to which simulated data are an accurate representation of real-world testing data. Specifically, this study matched real examination data from multiple administrations of the Law School Admission Test to create a single large dataset with 534 items and 5,000 synthetic examinees. From this dataset, examinee proficiency estimates and item parameters were obtained and used to generate 100 simulated item response datasets. Both real and simulated data were used in two post-hoc testing formats: CAT and linear-format examinations. The CAT administrations used the item-level adaptive method; the linear tests were constructed by selecting items using stratified random sampling. In addition to the two data types and two test administration formats, the impact of three test lengths (25, 35, and 50 items) on proficiency estimation was examined. For linear tests, results demonstrated that replication of the original proficiency estimates from simulated data was variable, depending on test length, the items selected, and examinee proficiency levels. Randomly constructed linear tests containing items with extreme parameter values were unstable and yielded less accurate proficiency recovery. For most datasets, CAT-format tests yielded better recovery of true proficiency than their linear counterparts.
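The two procedures described above, generating item responses from estimated item parameters and administering an item-level adaptive test, can be sketched as follows. This is a minimal illustration under a 3PL IRT model with grid-based EAP scoring; the dissertation's actual model, estimation method, and selection rules may differ, and all function names here are hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

def simulate_responses(thetas, a, b, c, rng):
    """Generate a 0/1 response matrix (examinees x items) from item parameters."""
    p = p_3pl(thetas[:, None], a[None, :], b[None, :], c[None, :])
    return (rng.random(p.shape) < p).astype(int)

def cat_administer(true_theta, a, b, c, test_length, rng):
    """Item-level CAT: repeatedly select the most informative unused item at
    the current ability estimate, score it, and update theta by EAP on a grid."""
    grid = np.linspace(-4.0, 4.0, 161)
    prior = np.exp(-0.5 * grid**2)          # standard normal prior
    used, responses = [], []
    theta_hat = 0.0
    for _ in range(test_length):
        remaining = [j for j in range(len(a)) if j not in used]
        info = [item_information(theta_hat, a[j], b[j], c[j]) for j in remaining]
        j = remaining[int(np.argmax(info))]
        u = int(rng.random() < p_3pl(true_theta, a[j], b[j], c[j]))
        used.append(j)
        responses.append(u)
        post = prior.copy()                 # posterior over the theta grid
        for jj, uu in zip(used, responses):
            pj = p_3pl(grid, a[jj], b[jj], c[jj])
            post *= pj if uu else (1.0 - pj)
        theta_hat = float(np.sum(grid * post) / np.sum(post))
    return theta_hat
```

In a study of this design, `simulate_responses` would be called once per replication (100 times here) with the parameters estimated from the real data, and `cat_administer` would be run per examinee at each test length.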
Generally, the longest (50-item) CATs built from simulated data yielded the best replication of the original real-data proficiency estimates. CAT-format tests performed well with either real or simulated data, whereas linear tests showed more performance variation than their CAT counterparts. The tails of the proficiency distribution showed the greatest variation across data types and conditions. The results of this dissertation support the use of simulated data when the items used to construct the tests have non-extreme parameter values.
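The stratified random sampling used to construct the linear forms can be sketched as below; this is a minimal sketch assuming items are stratified on difficulty (b) into equal-count strata, which is an assumption for illustration rather than the dissertation's exact design.

```python
import numpy as np

def stratified_linear_form(b, test_length, n_strata, rng):
    """Select item indices for a linear form by stratifying the pool on
    difficulty (b) and randomly sampling an equal share from each stratum."""
    order = np.argsort(b)                       # items sorted by difficulty
    strata = np.array_split(order, n_strata)    # roughly equal-count strata
    per_stratum = test_length // n_strata
    chosen = []
    for stratum in strata:
        chosen.extend(rng.choice(stratum, size=per_stratum, replace=False))
    # top up randomly if test_length is not divisible by n_strata
    shortfall = test_length - len(chosen)
    if shortfall > 0:
        leftovers = np.setdiff1d(order, chosen)
        chosen.extend(rng.choice(leftovers, size=shortfall, replace=False))
    return np.sort(np.asarray(chosen))
```

Sampling within difficulty strata spreads each form across the ability range, but, as the results above note, forms that happen to draw items with extreme parameter values can still recover proficiency less accurately.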
Note: Includes bibliographical references (p. 104-108)
Note: by Josiah Jeremiah Evans
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.