Predicting differential item functioning in cross-lingual testing: the case of a high-stakes test in the Kyrgyz Republic
Cross-lingual tests are assessment instruments created in one language and adapted for use with another language group. Practitioners and researchers use cross-lingual tests for various descriptive, analytical, and selection purposes, both in comparative studies across nations and within countries marked by linguistic diversity (Hambleton, 2005). Because of cultural, contextual, psychological, and linguistic differences between diverse populations, adapting test items for use across groups is a challenging endeavor. The validity of inferences based on cross-lingual tests can only be assured if the content, meaning, and difficulty of the items are similar across the different language versions of the test (Ercikan, 2002).
This study evaluated the accuracy of bilinguals’ predictions about whether differences between Kyrgyz and Russian language versions of test items would lead to differential item functioning (DIF). The items came from a cross-lingual university scholarship test in the Kyrgyz Republic. Evaluators’ predictions were compared to a statistical test of “no difference” in response patterns by group, using the logistic regression (LR) DIF detection method (Swaminathan & Rogers, 1990). A small number of test items were estimated to have “practical statistical DIF.” There was a modest, positive correlation between evaluators’ predictions and statistical DIF levels. However, with the exception of one item type, sentence completion, evaluators were unable to consistently predict which language group was favored by the differences. Plausible explanations for this finding, as well as ways to improve the accuracy of substantive review, are offered.
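The LR DIF method referenced above (Swaminathan & Rogers, 1990) compares two nested logistic regressions of item correctness: a compact model containing only the matching variable (typically the total test score), and an augmented model that adds group membership and the score-by-group interaction, capturing uniform and non-uniform DIF respectively. A 2-df likelihood-ratio chi-square between the two models flags the item. The following is a minimal sketch of this procedure; all sample sizes, effect sizes, and variable names are illustrative assumptions, not values from the study.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (y - p)                              # score vector
        H = (X * (p * (1 - p))[:, None]).T @ X            # observed information
        beta += np.linalg.solve(H + 1e-9 * np.eye(X.shape[1]), grad)
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    return np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def lr_dif_chi2(score, group, response):
    """2-df likelihood-ratio test for uniform plus non-uniform DIF."""
    ones = np.ones_like(score)
    compact = np.column_stack([ones, score])                        # matching variable only
    augmented = np.column_stack([ones, score, group, score * group])
    return 2.0 * (fit_logistic(augmented, response) - fit_logistic(compact, response))

# --- simulated data for illustration (hypothetical, not from the scholarship test) ---
rng = np.random.default_rng(42)
n = 2000
score = rng.standard_normal(n)                 # stand-in for the matching total score
group = rng.integers(0, 2, n).astype(float)    # two language groups, coded 0/1

def simulate_item(uniform_dif):
    """One dichotomous item; uniform_dif shifts difficulty for group 1 only."""
    logit = -0.3 + 1.0 * score + uniform_dif * group
    return (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

chi2_dif = lr_dif_chi2(score, group, simulate_item(0.8))   # item with built-in DIF
chi2_null = lr_dif_chi2(score, group, simulate_item(0.0))  # DIF-free item

# the chi-square critical value for df = 2, alpha = .05, is 5.99
print(f"DIF item: {chi2_dif:.1f}, clean item: {chi2_null:.1f}")
```

In practice the chi-square statistic is usually paired with an effect-size criterion (e.g., a change in Nagelkerke R-squared) before an item is labeled as showing "practical" DIF, since trivial differences reach significance in large samples.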