Your conditions: 南昌师范学院 (Nanchang Normal University)
  • Validation and Estimation of Expert-defined Q-matrix with Polytomous Attribute

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-26

    Abstract:

    Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. In traditional cognitive diagnostic tests, the entries of the Q-matrix are binary (two levels, coded 0 and 1). Polytomous attributes (multiple levels, coded 0, 1, …), particularly those defined as part of the test development process, can provide additional diagnostic information. Compared with binary attributes, polytomous attributes not only describe a student's knowledge profile but also provide more detailed information about it.

    It is well known that the Q-matrix strongly affects the accuracy of cognitive diagnostic assessment. Previous research has examined how errors in the Q-matrix affect parameter estimation and classification accuracy, and has shown that a Q-matrix obtained from expert definition or experience is easily influenced by subjective factors, which can lead to a misspecified Q-matrix. More objective methods for verifying and inferring polytomous-attribute Q-matrices are therefore urgently needed.

    The present research proposes methods for verifying and estimating an expert-defined polytomous-attribute Q-matrix based on the polytomous deterministic inputs, noisy "and" gate (p-DINA) model. We extend methods developed for binary Q-matrix verification and estimation to the polytomous-attribute Q-matrix. The two proposed methods, joint estimation and online estimation, suit different conditions. Simulation results show that the joint estimation algorithm can be applied to Q-matrix validation, which requires an initial Q-matrix defined by experts, whereas the online estimation algorithm can be used to estimate the Q-matrix entries of "new items" online, given a certain number of "based items". Under the various simulation settings, both algorithms recover the correct polytomous-attribute Q-matrix with high probability. An empirical study also indicates that the two proposed algorithms can be applied to Q-matrix validation or estimation for CDA with polytomous attributes.
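
    As a rough illustration of how a polytomous-attribute Q-matrix row enters a DINA-type model, the sketch below assumes the commonly used conjunctive rule: an examinee's ideal response is 1 only if every attribute level reaches the level the item requires. The function names, the example Q-matrix row, and the slip and guess values are hypothetical and are not taken from the paper.

```python
def ideal_response(alpha, q):
    """Return 1 if the examinee reaches every attribute level the item requires.

    alpha: examinee's attribute levels, e.g. [2, 0, 1]
    q:     the item's Q-matrix row (required levels); 0 means "not measured"
    """
    # Attributes with a required level of 0 are ignored by the item.
    return int(all(a >= level for a, level in zip(alpha, q) if level > 0))


def p_correct(alpha, q, slip, guess):
    """P(X = 1 | alpha) under an assumed DINA-type rule with polytomous attributes."""
    eta = ideal_response(alpha, q)
    # Masters answer correctly unless they slip; non-masters can only guess.
    return (1 - slip) if eta == 1 else guess


# Hypothetical item requiring level 2 on attribute 1 and level 1 on attribute 3.
q_row = [2, 0, 1]
print(p_correct([2, 0, 1], q_row, slip=0.1, guess=0.2))  # meets all requirements
print(p_correct([1, 2, 2], q_row, slip=0.1, guess=0.2))  # attribute 1 is too low
```

    Under this rule, a misspecified entry in `q_row` changes which examinees count as masters of the item, which is why errors in the Q-matrix propagate to classification accuracy.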

  • Application of Change Point Analysis to Detect Speededness Based on Response Time Data with Known/Unknown Item Parameters

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-14

    Abstract:

    In recent years, response time has received rapidly growing attention in psychometric research, likely due to the increasing availability of item-level response time data through computer-based testing and online survey data collection. Compared with conventional item response data, which are often dichotomous or polytomous, response time is continuous and can provide much more information. Aberrant response behaviors are frequently encountered during testing and can have various negative effects. Change point analysis (CPA) is a well-established statistical process control method for detecting changes in a sequence, and it has given testing professionals a new lens through which to understand test-taking behavior at both the examinee and item levels.

    In this paper, we took test speededness as an example to illustrate how the CPA method can be used to detect aberrant behavior using item response time data. Response time under speededness was simulated using the gradual-change log-normal model for response time. Two CPA-based test statistics, the Likelihood Ratio Test and Wald Test, were used to detect aberrant response behaviors. The critical values were obtained through Monte Carlo simulations and compared with the approximate critical values in a previous study. Based on the chosen critical values, we examined the performance of the likelihood ratio test and Wald test in detecting speeded responses, specifically in terms of power and empirical Type-I error.
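
    The likelihood ratio idea can be sketched as follows, assuming the standard log-normal response time model with known item parameters, in which log T_j is normal with mean beta_j - tau and standard deviation 1/alpha_j. The sketch fits one speed parameter tau for the whole test, then separate speeds before and after each candidate change point, and keeps the largest likelihood ratio. It uses an abrupt change for simplicity, not the gradual-change model used in the simulation, and the function names and data are hypothetical.

```python
import math


def tau_mle(log_t, alpha, beta):
    """Closed-form MLE of the speed parameter tau for a block of items."""
    num = sum(a * a * (b - lt) for lt, a, b in zip(log_t, alpha, beta))
    den = sum(a * a for a in alpha)
    return num / den


def loglik(log_t, alpha, beta, tau):
    """Log-likelihood of log-normal response times for a given speed tau."""
    return sum(
        math.log(a) - lt - 0.5 * math.log(2 * math.pi)
        - 0.5 * (a * (lt - b + tau)) ** 2
        for lt, a, b in zip(log_t, alpha, beta)
    )


def lrt_change_point(times, alpha, beta):
    """Return (max LR statistic, estimated change point k).

    k is the number of items answered before the speed change.
    """
    log_t = [math.log(t) for t in times]
    l0 = loglik(log_t, alpha, beta, tau_mle(log_t, alpha, beta))
    best_stat, best_k = 0.0, None
    for k in range(1, len(log_t)):
        first = (log_t[:k], alpha[:k], beta[:k])
        second = (log_t[k:], alpha[k:], beta[k:])
        l1 = (loglik(*first, tau_mle(*first))
              + loglik(*second, tau_mle(*second)))
        stat = -2.0 * (l0 - l1)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


# Hypothetical examinee who speeds up sharply on the last four items.
times = [55, 50, 60, 52, 58, 54, 10, 9, 11, 10]
alpha = [2.0] * len(times)  # assumed known time discrimination parameters
beta = [4.0] * len(times)   # assumed known time intensity parameters
stat, k = lrt_change_point(times, alpha, beta)
print(stat, k)
```

    In practice the maximum statistic would be compared with a critical value, such as one obtained by Monte Carlo simulation under the no-change model.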

    First, the critical values are almost identical for the Wald test and the likelihood ratio test. They vary substantially across nominal α levels but differ little across test lengths. Second, the simulated critical values are close to, but not the same as, the approximate critical values. This may be because the approximate critical values are suited to situations in which the change point appears in the middle of the test. Results indicate that, with the simulated critical values, the proposed method is much more powerful than conventional methods that use item response data. The power was close to 1 in most conditions, while the Type-I error rate remained well controlled. Real data analysis also demonstrates the performance of the method.

    This study applies CPA to response time data and offers a very promising approach to detecting aberrant response behavior. Through the simulation study, we demonstrated that fixed critical values can be used across different test lengths, which makes the method straightforward to apply. It also means that the simulation need not be rerun to update critical values when small changes are made to the test. CPA is very flexible: this study assumed that the log-normal model fits the response time data, but the method is not bound by that assumption.

  • A Comparative Study of Item Selection Methods in CD-CAT based on Nominal Response Model

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-12

    Abstract:

    Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines the advantages of cognitive diagnosis and CAT and can thereby improve the efficiency and accuracy of testing. CD-CAT can be divided into two types: dichotomous and polytomous. At present, most research on CD-CAT concerns dichotomous CD-CAT. However, practical tests in psychology and education contain many polytomous items, which can be further divided into nominal and ordinal polytomous items according to whether the response categories are ordered or graded. Nominal polytomous items are items whose response categories are independent, with no order or grade among them. Although researchers have developed (ordinal) polytomous CDMs and corresponding CD-CAT procedures, few CDMs and CD-CAT procedures are based on nominal responses.

    This study introduces seven item selection methods commonly used in dichotomous CD-CAT into NCD-CAT (CD-CAT based on nominal response models). The pattern match ratio (PMR) and a test efficiency index were used to compare these methods under different conditions in two simulation studies. Study 1 compared the performance of the NR_PWKL, NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, NR_MI, and NR_GDI methods under different test lengths (5, 10, 15, 20) and item pool qualities (high and low) in NCD-CAT. Results showed that (1) the PMRs of NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI were higher than or equal to that of NR_PWKL, especially in short tests; (2) as test length increased, this PMR advantage disappeared, consistent with the results of Zheng and Chang (2016); and (3) item quality had a greater impact on PMR than test length did: for instance, as item quality decreased, the PMR declined by about 30% across all conditions. Study 2 was a variable-length NCD-CAT simulation comparing each item selection method under three maximum posterior probability thresholds (0.8, 0.85, 0.9) and two item qualities (high and low). The results showed that (1) under all experimental conditions, the average test lengths of NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI were shorter than that of NR_PWKL, with a difference of more than 0.738; and (2) depending on item quality, the average test length of NR_GDI was shorter than that of NR_PWKL under high-quality conditions and longer under low-quality conditions.
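
    To make the baseline concrete, the sketch below shows one plausible form of a posterior-weighted Kullback-Leibler (PWKL) index for items with several unordered response categories: the KL divergence between the category distribution at the current attribute pattern estimate and at every other pattern, weighted by the posterior. This is the generic PWKL form applied to category probabilities and may differ in detail from the paper's NR_PWKL; the item pool, posterior, and function names are hypothetical.

```python
import math


def pwkl(item_probs, posterior, alpha_hat):
    """Posterior-weighted KL index for one candidate item.

    item_probs[c][x]: P(response category x | attribute pattern c), all > 0
    posterior[c]:     current posterior probability of attribute pattern c
    alpha_hat:        index of the current attribute pattern estimate
    """
    p_hat = item_probs[alpha_hat]
    index = 0.0
    for c, weight in enumerate(posterior):
        kl = sum(p * math.log(p / q) for p, q in zip(p_hat, item_probs[c]))
        index += weight * kl
    return index


def select_item(pool, posterior, alpha_hat, administered):
    """Pick the not-yet-administered item with the largest PWKL index."""
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=lambda j: pwkl(pool[j], posterior, alpha_hat))


# Hypothetical pool: two attribute patterns, three nominal response categories.
pool = [
    [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],  # item 0 separates the patterns well
    [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]],  # item 1 separates them poorly
]
posterior = [0.5, 0.5]
print(select_item(pool, posterior, alpha_hat=1, administered=set()))
```

    Items whose category distributions differ most between the current estimate and the competing patterns receive the largest index, which is what drives the selection.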

    In summary, this study compared the performance of seven item selection methods commonly used in dichotomous CD-CAT when applied to NCD-CAT under fixed-length and variable-length conditions. The simulation studies showed that, under most conditions, the NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI methods performed well relative to the baseline algorithm NR_PWKL. This study expands the range of item selection methods available for NCD-CAT.

  • Research on Person-fit in Cognitive Diagnostic Assessment

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology Subjects: Psychology >> Educational Psychology submitted time 2022-05-12

    Abstract:

    Cognitive Diagnostic Assessment (CDA) has been widely used in educational assessment. It can provide guidance for further study and teaching by analyzing whether the test-takers have acquired knowledge points or skills.

    In psychometrics, statistical methods for assessing the fit of an examinee's item responses to a postulated psychometric model are often called person-fit statistics. Person-fit analysis can help verify individual diagnostic results and is mainly used to distinguish abnormal examinees from normal ones. Abnormal response patterns include "sleeping" behavior, fatigue, cheating, creative responding, random guessing, and cheating with randomness, all of which can bias the estimation of an examinee's ability. Person-fit analysis can help researchers identify abnormal response patterns more accurately, so that abnormally responding examinees can be removed and the validity of the test improved. In the past, most person-fit research was carried out under the Item Response Theory (IRT) framework, and only a few papers have dealt with person fit under the CDM framework. This study attempts to fill this gap in the literature by proposing a new person-fit index (R).

    To verify the validity of the newly developed person-fit index, this study examines the Type I error rate and statistical power of the R index under different test lengths, item discrimination levels, and types of respondent misfit, and compares it with the existing indices RCI and lz. The Type I error rate was defined as the proportion of response patterns flagged as abnormal by a person-fit statistic out of 1,000 normal response patterns generated from the DINA model. The following conditions were held constant: the number of examinees was 1,000, the cognitive diagnosis model was the DINA model, the number of attributes was 6, and the Q-matrix was fixed. Finally, to illustrate the practical value of the person-fit index, the R index is applied to the empirical fraction subtraction data.
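
    For reference, the comparison index lz has a standard closed form for dichotomous items, and the Type I error rate defined above is simply the share of model-consistent patterns that get flagged. The sketch below computes both for given success probabilities; the probabilities, response patterns, and the one-sided cutoff of -1.645 are illustrative assumptions, and the R and RCI indices are not shown because the abstract does not define them.

```python
import math


def lz(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    probs[j] is the model-implied P(X_j = 1) for this examinee, e.g. 1 - slip
    or guess under DINA. Assumes 0 < p < 1 and not every p equal to 0.5.
    """
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p)
             for x, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


def flag_rate(patterns, probs, cutoff=-1.645):
    """Proportion of response patterns flagged as misfitting (lz below cutoff)."""
    flagged = sum(1 for pattern in patterns if lz(pattern, probs) < cutoff)
    return flagged / len(patterns)


# Hypothetical examinee who masters the first three items but not the last three.
probs = [0.9, 0.9, 0.9, 0.2, 0.2, 0.2]
consistent = [1, 1, 1, 0, 0, 0]  # matches the model's expectations
aberrant = [0, 0, 0, 1, 1, 1]    # the reverse of what the model expects
print(lz(consistent, probs), lz(aberrant, probs))
print(flag_rate([consistent, aberrant], probs))
```

    Applying `flag_rate` to patterns simulated from the fitted model gives an empirical Type I error rate, while applying it to simulated aberrant patterns gives empirical power.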

    The results show that the Type I error rate of the R index is reasonable and stable at 0.05. In terms of statistical power, the power of each index for the different types of abnormal examinees increases as item discrimination improves. As the number of items increases, power shows an upward trend in most cases. Across the types of abnormal examinees, the R index performs best for random guessing and cheating with randomness, whereas the lz index performs better for fatigue, sleeping, and creative responding. In the empirical data study, the detection rate of abnormal examinees is 4.29%.

    As item discrimination and the number of items increase, the power of the R index improves, and the R index is the most robust of the indices when item discrimination is low. The R index has high power for abnormal behaviors such as creative responding, random guessing, and cheating with randomness.

  • Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology submitted time 2022-04-06

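    Abstract: This paper proposes a person-fit statistic, R, for polytomously scored items, examines its performance in detecting six common aberrant response patterns (cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance), and compares it with the standardized log-likelihood statistic lzp. The results show that (1) when the coverage of aberrant responses is low and the aberrant response type is cheating or guessing, the detection rate of R is significantly higher than that of lzp; (2) as test length and the degree of examinee aberrance increase, the detection rates of both statistics rise; and (3) under some conditions, R and lzp perform similarly. An empirical data analysis further illustrates how the R statistic is used, and the results indicate that the R statistic has good application prospects.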

  • Operating Unit: National Science Library, Chinese Academy of Sciences
  • Production Maintenance: National Science Library, Chinese Academy of Sciences
  • Mail: eprint@mail.las.ac.cn
  • Address: 33 Beisihuan Xilu, Zhongguancun, Beijing, P.R. China