Search conditions: 秦春影
  • Handling Missing Data in Cognitive Diagnostic Assessment: The Random Forest Threshold Imputation Method

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: Acta Psychologica Sinica (《心理学报》)

    Abstract: As a new form of testing, cognitive diagnostic assessment has attracted wide attention from researchers in China and abroad. At the same time, missing data caused by features of the test design are a common problem in cognitive diagnostic tests. It is therefore important to develop an effective method for handling missing data in cognitive diagnostic assessment, so that the diagnostic feedback provided to students and teachers is more accurate and reliable. In recent years, machine learning has been applied to impute missing data. Among machine learning algorithms, random forest has proved to be a state-of-the-art learner: it performs classification and regression tasks effectively and efficiently, solves multi-class classification problems efficiently, and is notably robust to noise. Furthermore, random forest imputation, an algorithm for handling missing data built on random forest, makes full use of the available responses and the characteristics of participants' response patterns to impute missing data, without assuming the missingness mechanism in advance. By combining the classification and prediction strengths of random forest with the assumption-free nature of random forest imputation, we attempt to improve the existing random forest imputation algorithm so that it can be properly applied to missing data in cognitive diagnostic assessment. 
On the basis of the DINA (Deterministic Inputs, Noisy "And" Gate) model, which is widely used in cognitive diagnostic assessment, we introduce the RCI (Response Conformity Index) into missing data imputation to determine the imputation threshold, and thereby propose a new method for handling missing responses in the DINA model: the random forest threshold imputation (RFTI) approach. Two simulation studies were conducted to validate the effectiveness of RFTI, and its advantages were explored by comparing it with traditional techniques for handling missing data. First, the theoretical basis and algorithmic implementation of RFTI are described in detail. Then, two Monte Carlo simulations were used to evaluate RFTI in terms of imputation rate and imputation accuracy, as well as the accuracy of DINA model parameter estimation. The applicability of RFTI was further examined under different missingness mechanisms (MNAR, MIXED, MAR, and MCAR) and different proportions of missing values (10%, 20%, 30%, 40%, and 50%). The main results were as follows: (1) the imputation accuracy of RFTI was significantly higher than that of the random forest imputation (RFI) method, and about 10% of the data remained missing after RFTI under all conditions; (2) compared with the EM algorithm and RFI, RFTI yielded the highest attribute pattern match ratio and attribute marginal match ratio under all conditions. This advantage depended on the proportion and mechanism of missing data and was most pronounced when the missingness mechanism was MNAR or MIXED and more than 30% of responses were missing. However, the new algorithm showed no advantage in estimating DINA model parameters. Based on these results, we conclude with an overall summary, recommendations, and directions for future research.
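The threshold idea can be illustrated with a minimal Python sketch. The abstract does not spell out how the RCI determines the threshold, so here the threshold and the per-response vote shares (the hypothetical fraction of random-forest trees predicting a correct response) are simply taken as inputs. A missing response is imputed only when the winning class is backed by at least `threshold` of the votes; otherwise it is left missing. The second helper computes the imputation rate and imputation accuracy, the two criteria named in the abstract. All names are illustrative, not the authors' code.

```python
from typing import List, Optional, Sequence, Tuple


def threshold_impute(vote_shares: Sequence[float], threshold: float) -> List[Optional[int]]:
    """Impute a missing 0/1 response only when the ensemble is confident enough.

    Hypothetical input: vote_shares[i] is the fraction of trees voting 1 for
    missing cell i. How the threshold is chosen (via the RCI in the paper) is
    not modeled here; it is passed in directly.
    """
    imputed: List[Optional[int]] = []
    for share in vote_shares:
        # The winning class and the share of votes supporting it
        winner, confidence = (1, share) if share >= 0.5 else (0, 1.0 - share)
        # Below the threshold the cell stays missing (None)
        imputed.append(winner if confidence >= threshold else None)
    return imputed


def imputation_rate_and_accuracy(
    imputed: Sequence[Optional[int]], truth: Sequence[int]
) -> Tuple[float, float]:
    """Rate = share of missing cells that were imputed; accuracy = share of those that are correct."""
    done = [(v, t) for v, t in zip(imputed, truth) if v is not None]
    rate = len(done) / len(imputed) if imputed else 0.0
    accuracy = sum(v == t for v, t in done) / len(done) if done else float("nan")
    return rate, accuracy
```

With vote shares of 0.9, 0.55, 0.2, and 0.5 and a threshold of 0.7, the first cell is imputed as 1, the third as 0, and the other two stay missing. Raising the threshold trades a lower imputation rate for fewer low-confidence imputations.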

  • Exploring Change Point Analysis Based on Response Time Data for Detecting Speeded Responding: Known and Unknown Item Parameters

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: Acta Psychologica Sinica (《心理学报》)

    Abstract: In recent years, response time has received rapidly growing attention in psychometric research, likely because computer-based testing and online survey data collection have made (item-level) response time data increasingly available. Compared with conventional item response data, which are often dichotomous or polytomous, response time is continuous and can provide much more information. Aberrant response behaviors are frequently encountered during testing and can have various negative effects. Change point analysis (CPA) is a well-established statistical process control method for detecting changes in a sequence, and it has given testing professionals a new lens through which to understand test-taking behavior at both the examinee and item levels. In this paper, we take test speededness as an example to illustrate how CPA can be used to detect aberrant behavior from item response time data. Response times under speededness were simulated using the gradual-change log-normal model for response time. Two CPA-based test statistics, the likelihood ratio test and the Wald test, were used to detect aberrant response behaviors. Critical values were obtained through Monte Carlo simulations and compared with the approximate critical values from a previous study. Based on the chosen critical values, we examined the power and empirical Type I error rate of the likelihood ratio test and the Wald test in detecting speeded responses. The critical values were almost identical for the Wald and likelihood ratio tests; they varied substantially across nominal α levels but differed little across test lengths. The simulated critical values were close to, but not the same as, the approximate critical values, possibly because the approximate values are suited to situations in which the change point occurs in the middle of the test. 
Results indicate that, with these critical values, the proposed method is much more powerful than conventional methods that use item response data. Power was close to 1 in most conditions while the Type I error rate remained well controlled. A real data analysis also demonstrates the performance of the method. This study applies CPA to response time data and offers a very promising approach to detecting aberrant response behavior. The simulation study demonstrated that fixed critical values can be used across different test lengths, which makes the method straightforward to apply; it also means that the simulation need not be rerun to update critical values when small changes are made to the test. CPA is very flexible: although this study assumed that the log-normal model fits the response time data, the method is not restricted to that assumption.
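A minimal Python sketch of the likelihood ratio change point statistic, under simplifying assumptions that are ours rather than the paper's: the log-normal response time model with known item parameters (time intensity `beta`, precision `alpha`), and an abrupt (step) change in the examinee's speed instead of the gradual-change model used in the study. Under these assumptions y_j = β_j - log t_j is normal with mean τ and variance 1/α_j², so the statistic for a split after item k reduces to a difference of precision-weighted sums of squares.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_ss(y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of squares of y around its precision-weighted mean (the MLE of tau)."""
    tau_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - tau_hat) ** 2 for wi, yi in zip(w, y))


def lr_change_point(
    log_times: Sequence[float],
    alpha: Sequence[float],
    beta: Sequence[float],
    min_seg: int = 1,
) -> Tuple[float, Optional[int]]:
    """Maximum LR statistic over candidate change points and the split k attaining it.

    Assumes log t_j = beta_j - tau + e_j / alpha_j with e_j ~ N(0, 1), known item
    parameters, and a single abrupt change in tau after item k.
    """
    # y_j = beta_j - log t_j ~ N(tau, 1 / alpha_j^2)
    y = [b - lt for lt, b in zip(log_times, beta)]
    w = [a ** 2 for a in alpha]  # precision weights
    n = len(y)
    ss_null = weighted_ss(y, w)
    best_stat, best_k = -math.inf, None
    for k in range(min_seg, n - min_seg + 1):
        ss_alt = weighted_ss(y[:k], w[:k]) + weighted_ss(y[k:], w[k:])
        stat = ss_null - ss_alt  # equals 2 * (loglik_alt - loglik_null)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k
```

For eight items with `alpha` = 1 and `beta` = 0, log times `[0, 0, 0, 0, -2, -2, -2, -2]` (an examinee who speeds up after item 4) give `(8.0, 4)`. Whether 8.0 signals speededness depends on a critical value, which the study obtained by Monte Carlo simulation.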

  • Validation and Estimation of Q-matrices with Polytomous Attributes

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: Acta Psychologica Sinica (《心理学报》)

    Abstract: Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Generally, entries in the Q-matrix of traditional cognitive diagnostic tests are binary (two levels, defined as 0 and 1). Polytomous attributes (multiple levels, defined as 0, 1, …), particularly those defined as part of the test development process, can provide additional diagnostic information. Compared with binary attributes, polytomous attributes not only describe a student's knowledge profile but also provide more detailed information. The Q-matrix strongly affects the accuracy of cognitive diagnostic assessment. Previous research has examined how Q-matrix errors affect parameter estimation and classification accuracy, and has shown that Q-matrices defined by experts or based on experience are susceptible to subjective factors, which can lead to misspecification. More objective methods for verifying and inferring polytomous-attribute Q-matrices are therefore urgently needed. The present research proposes methods for verifying and estimating expert-defined polytomous-attribute Q-matrices based on the polytomous deterministic inputs, noisy “and” gate (p-DINA) model. We extend methods developed for binary Q-matrix verification and estimation to polytomous-attribute Q-matrices and propose two methods suited to different conditions: joint estimation and online estimation. Simulation results show that the joint estimation algorithm can be applied to Q-matrix validation when an initial expert-defined Q-matrix is available, and that the online estimation algorithm can be used to estimate the Q-matrix entries of “new items” online from a set of “base items”. Under the various simulation settings, both algorithms recovered the correct polytomous-attribute Q-matrix with high probability. 
An empirical study also indicates that the two proposed algorithms can be applied to Q-matrix validation or estimation in CDA with polytomous attributes.
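To make the Q-matrix estimation idea concrete, here is a minimal Python sketch of a single q-vector update for one item under the p-DINA model, assuming examinees' polytomous attribute profiles are already known. In a joint-estimation scheme those profiles would themselves be estimated, and in online estimation they would come from the base items; neither step is shown, and this brute-force search is an illustration, not the authors' algorithm. The ideal response follows the usual p-DINA rule: an examinee answers correctly (up to slip and guessing) only if their level on every attribute meets the level the item requires.

```python
import itertools
import math
from typing import Sequence, Tuple


def ideal_response(alpha: Sequence[int], q: Sequence[int]) -> int:
    """p-DINA ideal response: 1 iff the examinee meets every required attribute level."""
    return int(all(a >= r for a, r in zip(alpha, q)))


def item_loglik(responses, profiles, q, eps: float = 1e-6) -> float:
    """Profile log-likelihood of one item's 0/1 responses for a candidate q-vector.

    The success rates for eta = 1 (1 - slip) and eta = 0 (guess) are set to their
    MLEs given the known profiles, clipped away from 0 and 1 so log() is defined.
    """
    eta = [ideal_response(a, q) for a in profiles]

    def rate(group: int) -> float:
        xs = [x for x, e in zip(responses, eta) if e == group]
        return min(max(sum(xs) / len(xs), eps), 1 - eps) if xs else 0.5

    p = {1: rate(1), 0: rate(0)}
    return sum(math.log(p[e]) if x else math.log(1 - p[e]) for x, e in zip(responses, eta))


def best_q_vector(responses, profiles, max_levels) -> Tuple[int, ...]:
    """Search all nonzero q-vectors with entries 0..max_levels[k]; return the best-fitting one."""
    candidates = [
        q for q in itertools.product(*(range(m + 1) for m in max_levels)) if any(q)
    ]
    return max(candidates, key=lambda q: item_loglik(responses, profiles, q))
```

With two three-level attributes and noise-free responses generated from q = (1, 2) over all nine profiles, the search recovers (1, 2). With real data the profiles are uncertain and responses noisy, which is why the estimation algorithms iterate rather than run a single pass.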

  • Person Fit Based on a Residual Statistic in Polytomously Scored Tests

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: Acta Psychologica Sinica (《心理学报》)

    Abstract: Tests are widely used in educational measurement and psychometrics, and examinees' aberrant responses affect the estimation of their abilities. Examinees with aberrant responses should not be handled with conventional methods; the key is to screen them accurately out of the normal group. A common approach is to construct person-fit statistics that detect whether response patterns fit the examinees' estimated abilities. In this study, a residual-based person-fit statistic R was proposed that can be applied to both dichotomous and polytomous IRT models. R is constructed from a weighted residual between the observed and expected responses: the weighted residuals are accumulated into a goodness-of-fit value, which is compared with a critical value to determine whether an examinee is aberrant. Because tests with polytomous items provide more information, polytomously scored items are increasingly popular in educational measurement and psychometrics, so this article mainly considers the ability of R to detect aberrant response patterns under the graded response model. An existing polytomous person-fit statistic, lzp, was included for comparison because of its standardized form and superior power. In the first study, a simulation was conducted to generate the empirical distributions of R and lzp. Because R accumulates weighted residuals, its distribution is positively skewed, whereas lzp is negatively skewed when the test has fewer than 80 items. Since neither follows the standard normal distribution, critical values must be set according to the desired Type I error rate and then used to decide whether each respondent's response pattern fits. 
In the second study, examinees with different aberrant behaviors (cheaters, lucky guessers, random respondents, careless respondents, creative respondents, and mixed) were simulated under different test lengths, and the detection rate and area under the curve (AUC) were used to compare the effectiveness of the two person-fit statistics. The results show that R has a higher detection rate than lzp when the aberrant behavior affects only a few items or when the aberrant behavior is cheating or guessing. When the aberrant behavior covers many items, lzp performs slightly better than R. An empirical study was also conducted to illustrate the power of R. Both R and lzp have their own strengths and weaknesses, so they could be combined in future person-fit studies. Because R detects aberrance better than lzp under certain conditions, especially cheating and lucky guessing, and because cheating and guessing by low-ability examinees are relatively common among aberrant test behaviors, the R statistic merits further research and exploration in real-world applications.
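The abstract says R accumulates weighted residuals between observed and expected responses but does not give the weights. The minimal Python sketch below therefore uses inverse-variance weights (a sum of squared standardized residuals) under Samejima's graded response model as a stand-in; it is a generic residual-based fit statistic and should not be read as the authors' exact R. Item parameters and θ are assumed known.

```python
import math
from typing import List, Sequence, Tuple


def grm_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """Category probabilities under the graded response model (b must be increasing).

    P*(X >= k) = 1 / (1 + exp(-a (theta - b_k))); category probabilities are
    differences of adjacent cumulative probabilities.
    """
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def residual_fit(
    responses: Sequence[int], theta: float, items: Sequence[Tuple[float, Sequence[float]]]
) -> float:
    """Sum of squared standardized residuals (x - E[X]) / SD[X] over items.

    Stand-in weighting (inverse variance); the authors' R may weight differently.
    """
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = grm_probs(theta, a, b)
        mean = sum(k * pk for k, pk in enumerate(p))
        var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
        total += (x - mean) ** 2 / var
    return total
```

As with R, large values indicate misfit. Because such sums are positively skewed, the flagging cutoff would be taken from a simulated null distribution at the desired Type I error rate rather than from a normal table.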

  • Validation and Estimation of Expert-defined Q-matrix with Polytomous Attribute

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-26

    Abstract:

    Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Generally, entries in the Q-matrix of traditional cognitive diagnostic tests are binary (two levels, defined as 0 and 1). Polytomous attributes (multiple levels, defined as 0, 1, …), particularly those defined as part of the test development process, can provide additional diagnostic information. Compared with binary attributes, polytomous attributes not only describe a student's knowledge profile but also provide more detailed information.

    The Q-matrix strongly affects the accuracy of cognitive diagnostic assessment. Previous research has examined how Q-matrix errors affect parameter estimation and classification accuracy, and has shown that Q-matrices defined by experts or based on experience are susceptible to subjective factors, which can lead to misspecification. More objective methods for verifying and inferring polytomous-attribute Q-matrices are therefore urgently needed.

    The present research proposes methods for verifying and estimating expert-defined polytomous-attribute Q-matrices based on the polytomous deterministic inputs, noisy “and” gate (p-DINA) model. We extend methods developed for binary Q-matrix verification and estimation to polytomous-attribute Q-matrices and propose two methods suited to different conditions: joint estimation and online estimation. Simulation results show that the joint estimation algorithm can be applied to Q-matrix validation when an initial expert-defined Q-matrix is available, and that the online estimation algorithm can be used to estimate the Q-matrix entries of "new items" online from a set of "base items". Under the various simulation settings, both algorithms recovered the correct polytomous-attribute Q-matrix with high probability. An empirical study also indicates that the two proposed algorithms can be applied to Q-matrix validation or estimation in CDA with polytomous attributes.
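Joint estimation presumably alternates between estimating examinees' attribute profiles and updating the Q-matrix. As a hedged illustration of the profile step only, the Python sketch below classifies one examinee by maximum likelihood under the p-DINA model, assuming the Q-matrix and the item slip and guessing parameters are known. The exhaustive search over all profiles is for clarity, not efficiency, and the function names are illustrative.

```python
import itertools
import math
from typing import Sequence, Tuple


def eta(alpha: Sequence[int], q: Sequence[int]) -> bool:
    """p-DINA rule: True iff every attribute level meets the item's required level."""
    return all(a >= r for a, r in zip(alpha, q))


def profile_loglik(responses, alpha, Q, slip, guess) -> float:
    """Log-likelihood of one examinee's 0/1 responses given attribute profile alpha."""
    ll = 0.0
    for x, q, s, g in zip(responses, Q, slip, guess):
        p = 1.0 - s if eta(alpha, q) else g  # P(correct) under p-DINA
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def classify(responses, Q, slip, guess, max_levels) -> Tuple[int, ...]:
    """Return the attribute profile (levels 0..max_levels[k]) with the highest likelihood."""
    profiles = itertools.product(*(range(m + 1) for m in max_levels))
    return max(profiles, key=lambda a: profile_loglik(responses, a, Q, slip, guess))
```

For example, with items requiring (1, 0), (2, 0), (0, 1), (0, 2), and (1, 1), responses `[1, 1, 1, 0, 1]` are reproduced exactly only by the profile (2, 1), so with slip and guessing both at 0.1 that profile is returned.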

  • Application of Change Point Analysis to Detect Speededness Based on Response Time Data with Known/Unknown Item Parameters

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-14

    Abstract:

    In recent years, response time has received rapidly growing attention in psychometric research, likely because computer-based testing and online survey data collection have made (item-level) response time data increasingly available. Compared with conventional item response data, which are often dichotomous or polytomous, response time is continuous and can provide much more information. Aberrant response behaviors are frequently encountered during testing and can have various negative effects. Change point analysis (CPA) is a well-established statistical process control method for detecting changes in a sequence, and it has given testing professionals a new lens through which to understand test-taking behavior at both the examinee and item levels.

    In this paper, we took test speededness as an example to illustrate how the CPA method can be used to detect aberrant behavior using item response time data. Response time under speededness was simulated using the gradual-change log-normal model for response time. Two CPA-based test statistics, the Likelihood Ratio Test and Wald Test, were used to detect aberrant response behaviors. The critical values were obtained through Monte Carlo simulations and compared with the approximate critical values in a previous study. Based on the chosen critical values, we examined the performance of the likelihood ratio test and Wald test in detecting speeded responses, specifically in terms of power and empirical Type-I error.
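The Monte Carlo step can be sketched in Python as follows, assuming known log-normal item parameters and a step-change likelihood ratio statistic (a simplification of the study's gradual-change setup). Response times are simulated under the null hypothesis of no change, the maximum LR statistic is computed for each simulated examinee, and the upper quantiles serve as critical values. The number of replications and the seed are illustrative choices, and the sketch does not reproduce the study's values.

```python
import math
import random
from typing import Dict, Sequence


def max_lr_stat(y: Sequence[float], w: Sequence[float]) -> float:
    """Max over split points k of SS_null - SS_alt for a step change in the weighted mean."""
    def wss(ys, ws):
        tau = sum(a * b for a, b in zip(ws, ys)) / sum(ws)
        return sum(a * (b - tau) ** 2 for a, b in zip(ws, ys))

    ss_null = wss(y, w)
    return max(ss_null - wss(y[:k], w[:k]) - wss(y[k:], w[k:]) for k in range(1, len(y)))


def mc_critical_values(
    alpha: Sequence[float],
    levels: Sequence[float] = (0.05, 0.01),
    reps: int = 2000,
    seed: int = 1,
) -> Dict[float, float]:
    """Upper (1 - level) quantiles of the max-LR statistic simulated under H0 (no change)."""
    rng = random.Random(seed)
    w = [a ** 2 for a in alpha]
    stats = []
    for _ in range(reps):
        # Under H0, y_j = beta_j - log t_j ~ N(tau, 1 / alpha_j^2); the statistic is
        # invariant to tau, so tau = 0 is used without loss of generality.
        y = [rng.gauss(0.0, 1.0 / a) for a in alpha]
        stats.append(max_lr_stat(y, w))
    stats.sort()
    return {
        lev: stats[min(math.ceil((1 - lev) * reps) - 1, reps - 1)] for lev in levels
    }
```

An examinee would then be flagged at level 0.05 if their observed maximum LR statistic exceeds the returned value for 0.05. Because the null statistic does not depend on τ, a single simulation per test form suffices for all examinees.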

    The critical values are almost identical for the Wald and likelihood ratio tests. They vary substantially across nominal α levels but differ little across test lengths. The simulated critical values are close to, but not the same as, the approximate critical values, possibly because the approximate values are suited to situations in which the change point occurs in the middle of the test. Results indicate that, with these critical values, the proposed method is much more powerful than conventional methods that use item response data. Power was close to 1 in most conditions while the Type I error rate remained well controlled. A real data analysis also demonstrates the performance of the method.

    This study applies CPA to response time data and offers a very promising approach to detecting aberrant response behavior. The simulation study demonstrated that fixed critical values can be used across different test lengths, which makes the method straightforward to apply; it also means that the simulation need not be rerun to update critical values when small changes are made to the test. CPA is very flexible: although this study assumed that the log-normal model fits the response time data, the method is not restricted to that assumption.

  • A Comparative Study of Item Selection Methods in CD-CAT based on Nominal Response Model

    Subjects: Psychology >> Psychological Measurement submitted time 2022-05-12

    Abstract:

    Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines the advantages of cognitive diagnosis and CAT and can improve the efficiency and accuracy of cognitive diagnostic assessment. CD-CAT can be divided into two types: dichotomous and polytomous. At present, most research on CD-CAT is based on dichotomous CD-CAT. However, many practical tests in psychology and education contain polytomous items, which can be further divided into nominal and ordinal polytomous items according to whether the response categories are ordered or graded. Nominal polytomous items are items whose response categories are independent and have no order or grade among them. Although researchers have developed (ordinal) polytomous CDMs and the corresponding CD-CAT, few CDMs or CD-CAT procedures have been developed for nominal responses.

    This study introduces seven item selection methods commonly used in dichotomous CD-CAT into NCD-CAT (CD-CAT based on nominal response models) and evaluates them in terms of the pattern match ratio (PMR) and a test efficiency index under different conditions. Two simulation studies were conducted. Study 1 compared the performance of the NR_PWKL, NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, NR_MI, and NR_GDI methods under different test lengths (5, 10, 15, 20) and item pool qualities (high and low) in fixed-length NCD-CAT. Results showed that: (1) the PMRs of NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI are higher than or equal to that of NR_PWKL, especially in short tests; (2) this PMR advantage disappears as the test gets longer, consistent with the results of Zheng and Chang (2016); (3) item quality has a greater impact on PMR than test length: for instance, when item quality decreased, PMR declined by about 30% across all conditions. Study 2, an experiment on variable-length NCD-CAT, compared the performance of each item selection method under three maximum posterior probability thresholds (0.8, 0.85, 0.9) and two item qualities (high and low). The results showed that: (1) under all experimental conditions, the average test lengths of NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI are shorter than that of NR_PWKL, by more than 0.738 items; (2) the average test length of NR_GDI depends on item quality: it is shorter than that of NR_PWKL under high-quality conditions and longer under low-quality conditions.
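As a rough illustration of the baseline method, here is a minimal Python sketch of posterior-weighted Kullback-Leibler (PWKL) item selection applied to nominal responses, together with a Bayesian posterior update and the maximum-posterior stopping rule used in variable-length testing. We assume NR_PWKL works this way on nominal category probabilities; `item_probs[c][x]`, the probability that an examinee in latent class c chooses category x, is a hypothetical input that would come from the fitted nominal CDM, and all probabilities are assumed strictly positive.

```python
import math
from typing import List, Sequence, Set


def pwkl(item_probs: Sequence[Sequence[float]], posterior: Sequence[float], map_class: int) -> float:
    """Posterior-weighted KL divergence between the current estimate and every latent class."""
    ref = item_probs[map_class]
    index = 0.0
    for c, w in enumerate(posterior):
        # KL(P(. | map class) || P(. | class c)) over the nominal categories
        kl = sum(p * math.log(p / q) for p, q in zip(ref, item_probs[c]))
        index += w * kl
    return index


def select_item(pool, posterior: Sequence[float], administered: Set[int]) -> int:
    """Pick the unadministered item with the largest PWKL index."""
    map_class = max(range(len(posterior)), key=posterior.__getitem__)
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=lambda j: pwkl(pool[j], posterior, map_class))


def update_posterior(posterior: Sequence[float], item_probs, response: int) -> List[float]:
    """Bayes update of the latent-class posterior after observing a nominal response."""
    unnorm = [w * item_probs[c][response] for c, w in enumerate(posterior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def should_stop(posterior: Sequence[float], threshold: float = 0.8) -> bool:
    """Variable-length stopping rule: stop once the largest posterior reaches the threshold."""
    return max(posterior) >= threshold
```

With two classes at equal prior weight, an item whose category probabilities are the same for both classes gets a PWKL of 0, so a discriminating item is chosen instead; observing category 0 on an item with probabilities (0.7, 0.2, 0.1) versus (0.1, 0.2, 0.7) moves the posterior to about (0.875, 0.125), which meets a 0.85 stopping threshold.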

    To sum up, this study compared the performance of seven item selection methods commonly used in dichotomous CD-CAT within NCD-CAT under different conditions (fixed and variable length). The simulation studies showed that, under most conditions, the NR_PWCDI, NR_PWACDI, NR_MPWKL, NR_SHE, and NR_MI methods performed well compared with the baseline algorithm NR_PWKL. This study expands the range of item selection methods available for NCD-CAT.

  • Research on Person-fit in Cognitive Diagnostic Assessment

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology Subjects: Psychology >> Educational Psychology submitted time 2022-05-12

    Abstract:

    Cognitive Diagnostic Assessment (CDA) has been widely used in educational assessment. By analyzing whether test takers have mastered specific knowledge points or skills, it can provide guidance for further learning and teaching.

    In psychometrics, statistical methods for assessing the fit of an examinee’s item responses to a postulated psychometric model are often called person-fit statistics. Person-fit analysis can help verify individual diagnostic results and is mainly used to distinguish aberrant examinees from normal ones. Aberrant response patterns include “sleeping” behavior, fatigue, cheating, creative responding, random guessing, and cheating with randomness, all of which can bias the estimation of examinees’ abilities. Person-fit analysis can help researchers identify aberrant response patterns more accurately, so that aberrant examinees can be removed and the validity of the test improved. In the past, most person-fit research was carried out within the Item Response Theory (IRT) framework, and only a few papers have addressed person fit within the CDM framework. This study attempts to fill this gap in the literature by proposing a new person-fit index, R.

    To verify the validity of the new person-fit index, this study examines the Type I error rate and statistical power of R under different test lengths, levels of item discrimination, and types of misfitting respondents, and compares R with the existing indices RCI and lz. The Type I error rate was defined as the proportion of response patterns flagged as aberrant by a person-fit statistic out of 1,000 normal response patterns generated from the DINA model. The control variables were as follows: the number of examinees was fixed at 1,000, the DINA model was used as the cognitive diagnosis model, the number of attributes was 6, and the Q-matrix was fixed. Finally, to show the value of the index in practice, R was applied to the empirical fraction subtraction data.
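For reference, the comparison index lz under the DINA model can be sketched in Python as below. It uses the standard standardized log-likelihood formula, with each item's success probability taken from DINA (1 - slip if the examinee masters all required attributes, otherwise the guessing parameter), and assumes the attribute profile and item parameters are known. The new index R is not reproduced here because the abstract does not give its formula. Large negative lz values indicate misfit.

```python
import math
from typing import List, Sequence


def dina_probs(
    alpha: Sequence[int], Q: Sequence[Sequence[int]], slip: Sequence[float], guess: Sequence[float]
) -> List[float]:
    """P(correct) per item under DINA for a binary attribute profile alpha."""
    probs = []
    for q, s, g in zip(Q, slip, guess):
        # eta = 1 iff every attribute required by the item (q_k = 1) is mastered
        eta = all(a >= r for a, r in zip(alpha, q))
        probs.append(1.0 - s if eta else g)
    return probs


def lz(responses: Sequence[int], probs: Sequence[float]) -> float:
    """Standardized log-likelihood person-fit statistic for 0/1 responses."""
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)
```

For an examinee with profile (1, 0) on four items requiring (1, 0), (1, 0), (0, 1), (0, 1), with slip 0.1 and guessing 0.2, the expected pattern `[1, 1, 0, 0]` yields a positive lz, while the reversed pattern `[0, 0, 1, 1]` yields a strongly negative one.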

    The results show that the Type I error rate of R is reasonable and stable at about 0.05. In terms of statistical power, as item discrimination increases, the power of each index improves for every type of aberrant examinee. As the number of items increases, power generally shows an upward trend. Among the types of aberrant examinees, R performs best for random guessing and cheating with randomness, whereas lz performs better for fatigue, sleeping, and creative responding. In the empirical data study, 4.29% of examinees were detected as aberrant.

    The power of R improves as item discrimination and the number of items increase, and R is the most robust index when item discrimination is low. R has high power for aberrant behaviors such as creative responding, random guessing, and cheating with randomness.

  • Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology submitted time 2022-04-06

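    Abstract: This paper proposes a person-fit statistic, R, for polytomously scored items, examines its performance in detecting six common aberrant response patterns (cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance), and compares it with the standardized log-likelihood statistic lzp. The results show that: (1) when aberrant responses affect only a small proportion of items and the aberrance type is cheating or guessing, the detection rate of R is significantly higher than that of lzp; (2) the detection rates of both statistics increase as test length and the degree of examinee aberrance increase; (3) under some conditions, R and lzp perform similarly. An empirical data analysis further demonstrates how to use the R statistic, and the results suggest that R has good prospects for application.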
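The comparison statistic lzp can be sketched in Python as follows, assuming Samejima's graded response model with known item parameters and θ. This is the standard polytomous generalization of lz: the log-likelihood of the observed categories is centered by its expectation and scaled by its standard deviation, both computed item by item. The authors' implementation may differ in detail, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def grm_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """Category probabilities under the graded response model (b must be increasing)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def lzp(
    responses: Sequence[int], theta: float, items: Sequence[Tuple[float, Sequence[float]]]
) -> float:
    """Standardized log-likelihood person-fit statistic for polytomous responses."""
    l0 = expected = variance = 0.0
    for x, (a, b) in zip(responses, items):
        p = grm_probs(theta, a, b)
        logs = [math.log(pk) for pk in p]
        l0 += logs[x]  # log-probability of the observed category
        m = sum(pk * lk for pk, lk in zip(p, logs))
        expected += m
        # Var(log P_X) = E[(log P_X)^2] - (E[log P_X])^2 for this item
        variance += sum(pk * lk ** 2 for pk, lk in zip(p, logs)) - m ** 2
    return (l0 - expected) / math.sqrt(variance)
```

For a low-ability examinee (θ = -2) on five items with a = 1.5 and thresholds (-1, 0, 1), answering in the lowest category everywhere gives a positive lzp, whereas answering in the highest category everywhere gives a large negative value, the signature of a misfitting pattern.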
