Artificial Intelligence Meets Item Analysis (AI meets IA): A Study of Chatbot Training and Performance in detecting and correcting MCQ Flaws
Abstract
Objective: To explore the potential of AI-powered chatbots, specifically ChatGPT, in identifying and correcting flaws in multiple-choice questions (MCQs).
Methods: A three-phase interventional study was conducted from February to August 2023 at Riphah International University, Islamabad. In Phase 1, flawed MCQs were selected from the National Board of Medical Examiners (NBME) item-writing guide and submitted to ChatGPT, which identified item flaws and suggested corrections. In Phase 2, ChatGPT was trained to detect flaws in MCQs using text from the NBME item-writing guide. In Phase 3, ChatGPT was again tested on detecting flaws and correcting MCQs. Data were analyzed using SPSS version 26 and are presented as percentages; McNemar’s test with the exact conditional method was used to test for significance.
Results: ChatGPT could identify and correct flaws such as use of “None of the above,” “grammatical cues,” “absolute terms,” and “inconsistently presented numerical data.” However, it struggled with flaws related to “complicated stems,” “long or complex options,” and “vague frequency terms.” After training, ChatGPT became better at identifying and correcting flaws related to complicated stems and absolute terms. It struggled to recognize “nonparallel options,” “convergence,” and “word repetition” both before and after training. ChatGPT’s performance deteriorated during peak hours. McNemar’s test showed no statistically significant improvement in ChatGPT’s performance in detecting item flaws (p = 1.00) or in correcting them (p = 0.125).
Conclusion: AI is revolutionizing industries and improving efficiency, but limitations exist in complex conversations, analysis, accuracy, and error prevention. Ongoing research is vital to unlocking AI’s potential, especially in education.
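The exact McNemar test named in the Methods compares paired before/after outcomes using only the discordant pairs, that is, items whose result changed between the two test phases. The following is a minimal Python sketch of that calculation, not the authors' SPSS procedure. The function name `mcnemar_exact` and the counts passed to it are hypothetical and chosen for illustration only; the abstract does not report the underlying contingency tables.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact (conditional) McNemar p-value.

    b: pairs that were correct before training but incorrect after.
    c: pairs that were incorrect before training but correct after.
    Under the null hypothesis each discordant pair is equally likely
    to fall in either direction, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        # No discordant pairs: there is no evidence of any change.
        return 1.0
    k = min(b, c)
    # Probability of k or fewer pairs in the smaller direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for the two-sided p-value.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, NOT taken from the study.
print(mcnemar_exact(0, 4))  # four changes, all in one direction
print(mcnemar_exact(1, 1))  # changes balanced in both directions
```

With so few discordant pairs the exact test has little power: even four changes in a single direction give p = 0.125, so a non-significant result in a small item set does not by itself show that training had no effect.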
doi: https://doi.org/10.12669/pjms.41.3.11224
How to cite this: Sabqat M, Khan RA, Jawaid M, Sajjad M. Artificial Intelligence Meets Item Analysis (AI meets IA): A Study of Chatbot Training and Performance in detecting and correcting MCQ Flaws. Pak J Med Sci. 2025;41(3):652-656. doi: https://doi.org/10.12669/pjms.41.3.11224
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.