Examinee Characteristics and their Impact on the Psychometric Properties of a Multiple Choice Test According to the Item Response Theory (IRT)

  • D. Almaleki, Department of Evaluation, Measurement, and Research, Umm Al-Qura University, Saudi Arabia


The aim of the current study is to improve evaluation practices in the educational process. A multiple-choice test was developed on the basis of content analysis and a test specification table, and it covered some of the vocabulary of the applied statistics course. The test in its final form consisted of 18 items that were reviewed by specialists in the field of statistics to determine their validity. The results characterize the relationship between individual responses and student ability. Most thresholds fall in the negative range of the ability scale. Item information curves show that the items provide more information about students with low or moderate ability than about students with high ability. In terms of precision, most items were better suited to lower-ability students. The test characteristic curve was plotted separately according to the characteristics of the examinees. The test appeared to provide more information about female students than about male students, and more information about students who had not studied statistics at an earlier stage than about those who had. These results indicate that tests should be reviewed periodically so that they match the nature and level of the statistics course materials, which allows a sound judgment of students' progress relative to their ability.

Keywords: item response theory, item characteristics, multiple-choice, psychometric properties





