• Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models

    Subjects: Psychology >> Statistics in Psychology submitted time 2023-10-07

    Abstract: Hierarchical data, frequently observed in psychological experiments, are usually analyzed with linear mixed-effects models (LMEMs), which can simultaneously account for multiple sources of random effects due to participants, items, and/or predictors. However, it remains unclear how to determine the sample size and number of trials for LMEMs. Historically, sample size planning was conducted purely on the basis of power analysis. Later, the influential article of Maxwell et al. (2008) made clear that sample size planning should consider statistical power and accuracy in parameter estimation (AIPE) simultaneously. In this paper, we derive a confidence interval width contour plot, together with the code to generate it, that provides power and AIPE information simultaneously. With this plot, sample size requirements for LMEMs can be determined based on both power and AIPE criteria. We also demonstrate how to run sensitivity analyses to assess the impact of the magnitude of the effect size and of the random slope variance on statistical power, AIPE and the results of sample size planning.
    There were two sets of sensitivity analyses based on different LMEMs. Sensitivity analysis I investigated how the effect size influenced power, AIPE and the sample size requirement for a within-subject experimental design, while sensitivity analysis II investigated the impact of the random slope variance on the optimal sample size based on power and AIPE analysis for the cross-level interaction effect. The results for binary and continuous between-subject variables were compared. In these sensitivity analyses, two sample-size factors were varied: the number of subjects (I = 10, 30, 50, 70, 100, 200, 400, 600, 800) and the number of trials (J = 10, 20, 30, 50, 70, 100, 150, 200, 250, 300). The additional manipulated factor was the effect size of the experimental effect (standardized coefficient of the experimental condition = 0.2, 0.5, 0.8, in sensitivity analysis I) or the magnitude of the random slope variance (0.01, 0.09 and 0.25, in sensitivity analysis II). A random slope model was used in sensitivity analysis I, while a random slope model with a level-2 independent variable was used in sensitivity analysis II. The data-generating model and the fitted model were the same. Estimation performance was evaluated in terms of convergence rate, power, AIPE for the fixed effect, AIPE for the standard error of the fixed effect, and AIPE for the random effect.
    The results are as follows. First, there were no convergence problems under any condition, except when the random slope variance was small and a maximal model was used to fit the data. Second, power increased as the sample size, number of trials or effect size increased. However, the number of trials played the key role for the power of the within-subject effect, while sample size was more important for the power of the cross-level effect. Power was larger for a continuous between-subject variable than for a binary one. Third, although the fixed effect was accurately estimated under all simulation conditions, the width of the 95% confidence interval (95% width) was extremely large under some conditions. Lastly, AIPE for the random effect increased as the sample size and/or the number of trials increased. The residual variance was estimated accurately. As the random slope variance increased, the accuracy of the random intercept variance estimates decreased, while the accuracy of the random slope variance estimates increased.
    In conclusion, if sample size planning were conducted solely on the basis of power analysis, the chosen sample size might not be large enough to obtain accurate estimates of effect sizes. Therefore, we adopt the rationale of considering statistical power and AIPE simultaneously during sample size planning. To shed light on this issue, this article provides a standard procedure, based on the confidence interval width contour plot, to recommend the sample size and number of trials when using LMEMs. This plot visualizes the combined effect of the sample size and the number of trials per participant on the 95% width, power and AIPE for random effects. Based on this tool and other empirical considerations, practitioners can make informed choices about how many participants to test, and how many trials to administer to each one.
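The contour logic can be sketched numerically. The following is a minimal illustration, not the article's code: it assumes a common two-level approximation for the sampling variance of a within-subject fixed effect, Var(b) ≈ (τ₁² + σ²/J)/I, and maps the 95% CI width and a normal-approximation power over a grid of subjects I and trials J. All parameter values are illustrative, not the article's simulation settings.

```python
import numpy as np
from scipy.stats import norm

def ci_width_and_power(I, J, beta=0.5, tau1_sq=0.09, sigma_sq=1.0, alpha=0.05):
    """Approximate 95% CI width and power for a within-subject fixed effect.

    Uses the two-level approximation Var(b) ~= (tau1_sq + sigma_sq / J) / I,
    where I = number of subjects, J = trials per subject, tau1_sq = random
    slope variance, sigma_sq = residual variance (illustrative values).
    """
    se = np.sqrt((tau1_sq + sigma_sq / J) / I)
    z = norm.ppf(1 - alpha / 2)
    width = 2 * z * se                # full width of the 95% CI
    power = norm.cdf(beta / se - z)   # normal approximation to power
    return width, power

# Grid over the two sample-size factors, as in a contour plot
for I in (30, 100, 400):
    for J in (10, 50, 200):
        w, p = ci_width_and_power(I, J)
        print(f"I={I:4d} J={J:4d}  width={w:.3f}  power={p:.3f}")
```

Note how increasing J alone cannot shrink the width below 2·1.96·√(τ₁²/I): with a nonzero random slope variance, only more subjects can push the CI width past that floor, which is exactly the kind of trade-off the contour plot makes visible.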
     

  • New Techniques for Handling Aberrant Responses in Psychological and Educational Measurement: The Mixture Model Method

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: 《心理科学进展》

    Abstract: Aberrant responses have been repeatedly reported in psychological and educational measurement. If traditional measurement models or methods (e.g., item response theory, IRT) are applied to data sets contaminated by aberrant responses, parameter estimates may be biased. Therefore, it is necessary to identify aberrant responses and to reduce their detrimental effects. In the literature, there are two traditional response time (RT)-based methods for detecting aberrant responses: the RT threshold method and the RT residual method. The focus of these methods is to find a threshold for the RT or RT residual. If an RT or RT residual falls markedly below the threshold, the response is regarded as an aberrant response with an extremely short RT (e.g., speededness, rapid guessing) and consequently provides no information about the test taker’s latent trait. Afterwards, a down-weighting strategy, which tries to limit the influence of aberrant responses on parameter estimation by reducing their weight in the sample, can be applied. The mixture model method (MMM) is a new method proposed to handle data contaminated by aberrant responses. This method applies the accommodating strategy, which extends a model to account for the contamination directly. MMM shows advantages in: (1) detecting aberrant responses and obtaining parameter estimates simultaneously, rather than in two steps (detecting and down-weighting); and (2) precisely recovering the severity of aberrant responding. There are two categories of MMM. The first assumes that the classification (i.e., whether an item is answered normally or aberrantly) can be predicted by RT, while the second is a natural extension of van der Linden’s (2007) hierarchical model, which models responses and RTs jointly.
In this method, the observed RT, as well as the correct-response probability, of each item-by-person encounter can be decomposed into the RT (or probability) caused by normal responding and that caused by aberrant responding, according to the most important difference between the two behaviors. This leads to more precisely estimated item and person parameters, as well as excellent classification of aberrant versus normal behavior. First, this article compares the basic logic of the two traditional RT-based methods and MMM. Aberrant responses are regarded as outliers in both the RT threshold method and the RT residual method; these methods therefore rely heavily on the severity of aberrance. If the data set is seriously contaminated by aberrant responses, the observed RT (or RT residual) distribution will differ from the expected distribution, which in turn leads to low power and sometimes a high false detection rate. On the other hand, MMM, which assumes that both the observed RTs and the correct-response probabilities follow a mixture distribution, treats aberrant and normal responses equally and thus depends little on the severity of aberrance. In addition, MMM can, in theory, be applied even when all respondents respond normally; in that situation, all responses are expected to be classified into one category. Second, this article summarizes the disadvantages of the three methods. MMM has three primary limitations: (1) it usually relies on strong assumptions, so it may not perform well when these assumptions are violated; (2) a low proportion of aberrant responses may lead to convergence and model identification problems; (3) it is complex and time-consuming. In all, practitioners should choose a proper method according to the characteristics of the test and the categories of aberrant responses (e.g., rapid guessing, item preknowledge, cheating).
In the end, this article suggests that future research may investigate the performance of MMM when its assumptions are violated or when the data contain more types of aberrant response patterns. Fixing item parameter estimates and proposing indices to help choose suitable methods are encouraged as ways to improve the efficiency of MMM.
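The mixture idea behind MMM can be illustrated with a toy EM fit. This is a sketch only, under strong simplifying assumptions: it reduces MMM to a two-component Gaussian mixture on log-RTs (a fast aberrant component vs. a normal component), ignoring the response model and the item structure entirely; all data are simulated.

```python
import numpy as np

def em_two_gaussians(x, iters=200):
    """Fit a two-component Gaussian mixture to log-RTs via EM: a toy version
    of the mixture logic in MMM (fast aberrant vs. normal responding)."""
    # crude initialisation from empirical quantiles (low component first)
    mu = np.array([np.quantile(x, 0.1), np.quantile(x, 0.9)])
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each observation
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of weights, means, SDs
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
    return w, mu, sd, r

rng = np.random.default_rng(1)
# 20% aberrant (fast) responses, 80% normal responses, on the log-RT scale
logrt = np.concatenate([rng.normal(-1.5, 0.3, 200), rng.normal(0.5, 0.4, 800)])
w, mu, sd, resp = em_two_gaussians(logrt)
print(w.round(2), mu.round(2))
```

Because every observation contributes to both components in proportion to its responsibility, classification and parameter estimation happen in one pass, which is the "simultaneous" advantage the abstract attributes to MMM.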

  • Feature Extraction and Capability Evaluation for Process Data in Problem-Solving Tests

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: 《心理科学进展》

    Abstract: Computer-based problem-solving tests can record, as process data, respondents’ response processes as they explore tasks and solve problems; such data are richer in information than traditional outcome data and can be used to estimate latent abilities more accurately. The analysis of process data in problem-solving tests consists of two main steps: feature extraction and process information modeling. There are two main approaches to extracting information from process data: the top-down and the bottom-up method. The top-down method refers to having experts develop rubrics to extract meaningful behavioral indicators from process data. It yields behavioral indicators that are closely related to the conceptual framework, have interpretable and clear scores, and can be analyzed directly with psychometric models, as is the case with items in traditional tests. However, such indicator construction is laborious and may miss unknown and previously unnoticed student thought processes, resulting in a loss of information. In contrast, the bottom-up method refers to data-driven approaches that extract information directly from response sequences; it can be divided into three categories according to the underlying processing idea: (1) methods that treat response sequences as character strings and construct indicators with natural language processing techniques; (2) methods that use dimensionality-reduction algorithms to construct low-dimensional numerical feature vectors of response sequences; and (3) methods that characterize response sequences as directed graphs and use network indicators to describe response features. Such methods partially address the task specificity of expert-built scoring rules, and the extracted features can be used to explore the behavioral patterns characteristic of different groups, as well as to predict respondents’ future performance.
However, such methods may also lose information, and the relationship between the obtained features and the measured psychological traits is unclear. Once behavioral indicators have been extracted from process data, probabilistic models of the relationship between the indicators and the latent abilities can be constructed to enable ability estimation. Depending on whether a model uses the sequential relationships between indicators and whether continuously interpretable ability estimates can be obtained, current modeling methods fall into three categories: traditional psychometric models and their extensions, stochastic process models, and measurement models that incorporate the idea of stochastic processes. Psychometric models focus on estimating latent abilities but are limited by their local independence assumption and cannot include sequential information between indicators in the analysis. Stochastic process models focus on modeling the response process and retain information about response paths, but they make weaker assumptions about the link between indicators and the underlying structure and cannot obtain continuous and stable ability estimates. Finally, psychometric models that incorporate the idea of stochastic processes combine the advantages of both: they take the sequence of actions as the object of analysis while having experts specify indicator coefficients or scoring methods consistent with the direction of the abilities, thus allowing continuous, interpretable ability estimates while using more complete process information. However, such modeling methods are so far mostly suitable for simple tasks with a limited action set.
There are several aspects in which research on feature extraction and capability evaluation modeling for process data could be improved: (1) improving the interpretability of analysis results; (2) incorporating more information in feature extraction; (3) enabling capability evaluation modeling in more complex problem scenarios; (4) focusing on the practicality of the methods; and (5) integrating and drawing on analytical methods from different fields.
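As an illustration of the first bottom-up category (treating response sequences as character strings), a bag-of-bigrams feature extractor might look like the following; the action names are hypothetical examples of logged events, not from any real instrument.

```python
from collections import Counter
from itertools import islice

def ngram_features(action_seq, n=2):
    """Bag-of-n-grams features for one response sequence, treating logged
    actions like tokens in a document (the 'string' view of process data)."""
    grams = zip(*(islice(action_seq, i, None) for i in range(n)))
    return Counter("->".join(g) for g in grams)

# hypothetical logged actions from one respondent's problem-solving session
seq = ["start", "click_A", "click_B", "click_A", "click_B", "submit"]
print(ngram_features(seq))
```

The resulting counts can be stacked across respondents into a feature matrix, after which standard NLP-style weighting (e.g., tf-idf) or dimensionality reduction can be applied, bridging categories (1) and (2) above.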

  • Model Construction and Sample Size Planning for Mixed-Effects Location-Scale Models

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: 《心理科学进展》

    Abstract: With the development of data-collection techniques and the increasing complexity of study designs, nested data are widespread in psychological research. Linear mixed-effects models are generally used to analyze nested data, but they rest on the unreasonable assumption that residual variances are homogeneous. Meanwhile, Mixed-Effects Location-Scale Models (MELSM) have become more and more popular, because they can handle heterogeneous residual variances and can add predictors for the two substructures (i.e., the mean structure, denoted the location model, and the variance structure, denoted the scale model) at different levels. MELSM can avoid the estimation bias caused by an inappropriate homogeneity-of-variance assumption, explore the relationships among traits, and simultaneously investigate inter- and intra-individual variability as well as their explanatory variables. This study aims at developing methods of model construction and sample size planning for MELSM, using simulation studies and empirical studies. In detail, the main contents of this project are as follows. Study 1 focuses on comparing and selecting candidate models based on Bayesian fit indices to construct MELSM, taking into consideration the estimation methods for complicated models; we propose that model selection for the location model and the scale model can be completed sequentially. Study 2 explores sample size planning for MELSM according to both power analysis (based on Monte Carlo simulation) and accuracy in parameter estimation analysis (based on the credible interval of the posterior distribution); an adequate sample size is required for both power and accuracy in parameter estimation. Study 3 extends the sample size planning method for MELSM to better frame the considerations of uncertainty.
By specifying the prior distribution of effect sizes, repeating the sampling, and selecting models with the robust Bayesian fit index suggested by Study 1, three main sources of uncertainty can be controlled: uncertainty due to the unknown population effect size, sampling variability, and model approximation. With the simulation results, we are able to provide reliable Bayesian fit indices for MELSM construction and to summarize the process of sample size planning for MELSM in both determinate and uncertain situations. Moreover, Study 4 illustrates the application of MELSM in two empirical psychological studies and verifies, in practice, the operability of the conclusions of the simulation studies. The unique contribution of this paper is to further advance the methods of model construction and sample size planning for MELSM and to provide a methodological foundation for researchers. In addition, we plan to integrate the functions above into a user-friendly R package for MELSM, providing a basis for its promotion and application and helping researchers carry out sample size planning, model construction and parameter estimation for MELSM easily, according to their specifications. If these statistical models are widely implemented, the reproducibility and replicability of psychological studies will ultimately be enhanced.
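To make the two substructures concrete, a data-generating sketch of a MELSM with person-level random intercepts in both the location and the scale submodel might look as follows. This is a minimal illustration of the model family only; the parameter values and the single level-1 predictor are assumptions for the example, not the project's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_melsm(n_person=100, n_obs=20,
                   beta=(0.0, 0.3),    # location fixed effects (intercept, slope)
                   eta=(-0.5, 0.4),    # scale fixed effects on the log residual SD
                   tau_loc=0.5, tau_scale=0.3):
    """Generate nested data from a mixed-effects location-scale model:
    the mean AND the log residual SD each get their own submodel with a
    person-level random intercept (illustrative parameter values)."""
    rows = []
    for i in range(n_person):
        u_loc = rng.normal(0, tau_loc)      # random intercept, location model
        u_scale = rng.normal(0, tau_scale)  # random intercept, scale model
        x = rng.normal(size=n_obs)          # level-1 predictor
        mean_ij = beta[0] + u_loc + beta[1] * x
        sd_ij = np.exp(eta[0] + u_scale + eta[1] * x)  # heterogeneous residual SD
        y = rng.normal(mean_ij, sd_ij)
        rows.append((i, x, y))
    return rows

data = simulate_melsm()
print(len(data), "persons simulated")
```

Wrapping this generator in a Monte Carlo loop (fit each simulated data set, record whether the effect is detected and how wide its credible interval is) is the basic shape of the power-plus-AIPE sample size planning the abstract describes.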

  • A Comparison of Standardized Residual Methods and the Mixture Hierarchical Model for Handling Non-effortful Responses

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: 《心理学报》

    Abstract: Assessment datasets contaminated by non-effortful responses may lead to serious consequences if not handled appropriately. Previous research has proposed two different strategies: down-weighting and accommodating. Down-weighting tries to limit the influence of aberrant responses on parameter estimation by reducing their weight; its extreme form is the detection and removal of irregular responses and response times (RTs). The standard residual-based methods, including the recently developed residual method using an iterative purification process, can be used to detect non-effortful responses within the down-weighting framework. In accommodating, on the other hand, one tries to extend a model to account for the contamination directly, which boils down to a mixture hierarchical model (MHM) for responses and RTs. However, to the authors’ knowledge, few studies have compared the standard residual methods and MHM under different simulation conditions, so it is unknown which method should be applied in which situation. Meanwhile, MHM makes strong assumptions about the different types of responses, and it is valuable to examine its performance when those assumptions are violated. The purpose of this study is to compare the standard residual methods and MHM under a fully crossed simulation design and to provide specific recommendations for their application.

The simulation study included two scenarios. In simulation scenario I, data were generated under the assumptions of MHM; in simulation scenario II, the assumptions of MHM concerning non-effortful responses and RTs were both violated. Simulation scenario I had three manipulated factors. (1) Non-effort prevalence (π), the proportion of individuals with non-effortful responses, with three levels: 0%, 20% and 40%. (2) Non-effort severity (π_i^non), the proportion of non-effortful responses for each non-effortful individual, with two levels: when π_i^non was low it was generated from U(0, 0.25); when high, from U(0.5, 0.75), where U denotes a uniform distribution. (3) The difference between the RTs of non-effortful and effortful responses (d_RT), with two levels, small and large: the logarithm of the RTs of non-effortful responses was generated from a normal distribution N(μ, 0.5²), with μ = −1 when d_RT was small and μ = −2 when d_RT was large. To generate the non-effortful responses we followed Wang, Xu and Shang (2018), setting the probability of a correct response g_j at 0.25 for all non-effortful responses. In simulation scenario II, only the first two factors were considered; non-effortful RTs were generated from a uniform distribution with a lower bound of exp(−5) and an upper bound at the 5th percentile of the RTs on item j with τ = 0, and the probability of a correct response for non-effortful responses depended on the ability level of each examinee. In all conditions, the sample size was fixed at I = 2,000 and the test length at J = 30, with 30 replications per condition. For effortful responses, responses and RTs were simulated from van der Linden’s (2007) hierarchical model. Item parameters were generated with a_j ~ U(1, 2.5), b_j ~ N(0, 1), α_j ~ U(1.5, 2.5) and β_j ~ U(−0.2, 0.2). For simulees, the person parameters (θ_i, τ_i) were generated from a bivariate normal distribution with mean vector μ = (0, 0)′ and covariance matrix Σ = [1, 0.25; 0.25, 0.25]. Four methods were compared under each condition: the original standard residual method (OSR), the conditional estimate standard residual method (CSR), the conditional estimate with fixed item parameters standard residual method using an iterative purifying procedure (CSRI), and MHM. These methods were implemented in R and JAGS using Bayesian MCMC sampling for parameter calibration, and were evaluated in terms of convergence rate, detection accuracy and parameter recovery.

The results are as follows. First, MHM suffered from convergence issues, especially for the latent variable indicating non-effortful responses, and these issues were more serious in simulation scenario II; in contrast, all the standard residual methods converged successfully. Second, when all responses were effortful, the false positive rate (FPR) of MHM was 0; although the standard residual methods had an FPR around 5% (the nominal level), the accuracy of parameter estimates was similar across methods. Third, when the data were contaminated by non-effortful responses, CSRI had the higher true positive rate (TPR) in almost all conditions, while MHM showed lower TPR but also a lower false discovery rate (FDR), with even lower TPR in simulation scenario II. When π_i^non was high, CSRI and MHM showed more advantages over the other methods in terms of parameter recovery; however, when π_i^non was high and d_RT was small, MHM generally had higher RMSE than CSRI. Compared with simulation scenario I, MHM performed worse in simulation scenario II. The only problem CSRI had to deal with was its overestimation of the time discrimination parameter, in all conditions except when π = 40% and d_RT was large. In a real data example, all the methods were applied to a dataset collected for program assessment and accountability purposes from undergraduates at a mid-sized southeastern university in the USA. Evidence from convergent validity showed that CSRI and MHM may detect non-effortful responses more accurately and obtain more precise parameter estimates for these data. In conclusion, CSRI generally performed better than the other methods across all conditions. It is highly recommended in practice because: (1) it showed an acceptable FPR and fairly accurate parameter estimates even when all responses were effortful; (2) it is free of strong assumptions, making it robust across various situations; (3) it showed the greatest advantages when π_i^non was high, both in detecting non-effortful responses and in improving parameter estimation. To improve the estimation of the time discrimination parameter in CSRI, robust estimation methods that down-weight flagged response patterns can be used as an alternative to directly removing non-effortful responses (i.e., the method in the current study). MHM can perform well when all its assumptions are met, π_i^non is high and d_RT is large; however, some of its parameters have difficulty converging, which will limit its application in practice.
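The standardized-residual logic can be sketched as follows. This is a simplification of the OSR-type idea under van der Linden's lognormal RT model, with the item and person parameters treated as known rather than estimated (so no conditional estimation or iterative purification): a response is flagged when its standardized log-RT residual falls below a cutoff. All simulated values are illustrative.

```python
import numpy as np

def flag_fast_responses(log_rt, beta_j, tau_i, sigma_j, z_cut=-1.96):
    """Flag non-effortful responses by standardized log-RT residuals.

    Under the lognormal RT model, E[log RT_ij] = beta_j - tau_i; here
    sigma_j plays the role of 1/alpha_j (the item's log-RT SD). Responses
    much faster than expected (large negative residuals) are flagged."""
    expected = beta_j[None, :] - tau_i[:, None]      # persons x items
    z = (log_rt - expected) / sigma_j[None, :]
    return z < z_cut                                  # True = flagged as fast

rng = np.random.default_rng(2)
I_, J_ = 500, 30
beta_j = rng.uniform(0.5, 1.5, J_)   # item time intensities (illustrative)
tau_i = rng.normal(0, 0.5, I_)       # person speed parameters
sigma_j = rng.uniform(0.4, 0.6, J_)
log_rt = rng.normal(beta_j[None, :] - tau_i[:, None], sigma_j[None, :])
# make the first 50 simulees rapid-guess on their first 15 items
log_rt[:50, :15] = rng.normal(-2.0, 0.5, (50, 15))
flags = flag_fast_responses(log_rt, beta_j, tau_i, sigma_j)
print("hit rate:", flags[:50, :15].mean(), " false rate:", flags[50:, :].mean())
```

With known parameters the cutoff behaves like a one-sided z-test, so effortful responses are flagged at roughly the nominal 2.5% rate; the CSR/CSRI refinements exist precisely because, in practice, the parameters must be estimated from the contaminated data itself.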

  • Handling Missing Data in Cognitive Diagnostic Assessment: A Random Forest Threshold Imputation Method

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: 《心理学报》

    Abstract: As a new form of test, cognitive diagnostic assessment has attracted wide attention from researchers at home and abroad. At the same time, missing data caused by characteristics of the test design is a rather common issue in cognitive diagnostic tests. It is therefore very important to develop an effective solution for dealing with missing data in cognitive diagnostic assessment, ensuring that the diagnostic feedback provided to both students and teachers is accurate and reliable. In recent years, machine learning has been applied to impute missing data. Among machine learning algorithms, the random forest has proved to be a state-of-the-art learner: it performs well on classification and regression tasks, solves multi-class classification problems efficiently, and has a distinct advantage in coping with noise. Furthermore, the random forest imputation method, an improved algorithm for dealing with missing data based on the random forest algorithm, makes full use of the available response information and the characteristics of participants’ response patterns to impute missing data, instead of assuming the missingness mechanism in advance. By combining the advantages of the random forest method in classification and prediction with the assumption-free feature of random forest imputation, we attempt to improve the existing random forest imputation algorithm so that it can properly handle missing data in cognitive diagnostic assessment.
On the basis of the DINA (Deterministic Inputs, Noisy "And" Gate) model, widely used in cognitive diagnostic assessment, we introduce the RCI (Response Conformity Index) into missing data imputation to identify the threshold of the imputation type, and hence propose a new method for handling missing responses under the DINA model: the random forest threshold imputation (RFTI) approach. Two simulation studies were conducted to validate the effectiveness of RFTI, and its advantages were explored by comparing it with traditional techniques for handling missing data. First, the theoretical basis and algorithmic implementation of RFTI are described in detail. Then, two Monte Carlo simulations were employed to validate the effectiveness of RFTI in terms of imputation rate and accuracy, as well as the accuracy of DINA model parameter estimation. Moreover, the applicability of RFTI was investigated under different missingness mechanisms (MNAR, MIXED, MAR and MCAR) and different proportions of missing values (10%, 20%, 30%, 40% and 50%). The main results indicate that: (1) the imputation accuracy of RFTI was significantly higher than that of the random forest imputation (RFI) method, and the rate of missing data left untreated by RFTI was about 10% under all conditions; (2) RFTI yielded the highest attribute-pattern match ratio and attribute marginal match ratio under all conditions, compared with the EM algorithm and RFI; this advantage depended on the proportion and mechanism of missing data and became more obvious when the missingness mechanism was MNAR or MIXED and the proportion of missing responses was more than 30%. However, the new algorithm failed to show superiority in estimating the DINA model parameters. Based on these results, we conclude the article with an overall summary and recommendations, as well as directions for future research.
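The random-forest part of the approach can be sketched as follows. This is a minimal single-pass imputation in the spirit of RFI, predicting each item's missing responses from the remaining items; the RCI-based threshold step that distinguishes RFTI (deciding which entries to impute and which to leave missing) is omitted, and the response matrix is a toy data set, not DINA-generated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# toy binary response matrix: persons x items, with 20% entries missing (MCAR)
n, m = 300, 10
ability = rng.random(n)
resp = (rng.random((n, m)) < ability[:, None]).astype(float)
mask = rng.random((n, m)) < 0.2
resp[mask] = np.nan

def rf_impute(X, n_trees=100):
    """One pass of random-forest imputation: for each item, train a forest on
    persons who answered it (using the other items, mean-filled as a crude
    starting point) and predict the missing responses. A sketch of the RF
    idea only; RFTI additionally applies an RCI-based imputation threshold."""
    filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    out = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        other = np.delete(filled, j, axis=1)           # predictors: other items
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        clf.fit(other[~miss], X[~miss, j].astype(int)) # train on observed rows
        out[miss, j] = clf.predict(other[miss])        # impute missing rows
    return out

imputed = rf_impute(resp)
print("remaining missing entries:", np.isnan(imputed).sum())
```

Iterating this pass until the imputations stabilize, and refusing to impute entries whose predicted class is too uncertain, are the kinds of refinements that move this sketch toward the thresholded RFTI procedure described above.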

  • Model Construction and Sample Size Planning for Mixed-Effects Location-Scale Models

    Subjects: Psychology >> Statistics in Psychology submitted time 2023-01-31

    Abstract: With the advancement of research depth in psychology and the development of data-collection techniques, interest in Mixed-Effects Location-Scale Models (MELSM) has increased drastically. When residual variances are heterogeneous, these models can add predictors at different levels, helping to explore the relationships among traits and simultaneously investigate inter- and intra-individual variability as well as their explanatory variables. This study includes both simulation studies and empirical studies. In detail, the main contents of this project are: 1) comparing and selecting candidate models based on Bayesian fit indices to construct MELSM; 2) planning sample size according to both power analysis and accuracy in parameter estimation analysis for MELSM; 3) extending the sample size planning method for MELSM to better frame the considerations of uncertainty; 4) developing an R package for MELSM and illustrating the application of MELSM in empirical psychological studies. Based on this study, we hope these statistical models can be widely implemented, and that the reproducibility and replicability of psychological studies will ultimately be enhanced.

  • Feature Extraction and Capability Evaluation for Process Data in Problem-Solving Tests

    Subjects: Psychology >> Psychological Measurement submitted time 2021-12-04

    Abstract: Computer-based problem-solving tests can record respondents’ response processes in real time as they explore tasks and solve problems, saving them as process data. We first introduce the analysis pipeline for process data and then describe in detail recent advances in the feature extraction methods and capability evaluation modeling commonly used for process data analysis in problem-solving tests. Future research should pay attention to improving the interpretability of analysis results, incorporating more information in feature extraction, enabling capability evaluation modeling in more complex problem scenarios, focusing on the practicality of the methods, and integrating and drawing on analytical methods from different fields.

  • A Comparison of Standardized Residual Methods and the Mixture Hierarchical Model for Handling Non-effortful Responses

    Subjects: Psychology >> Statistics in Psychology submitted time 2021-11-29

    Abstract: Assessment datasets contaminated by non-effortful responses may lead to serious consequences if not handled appropriately. Previous research has proposed two different strategies: down-weighting and accommodating. Down-weighting tries to limit the influence of aberrant responses on parameter estimation by reducing their weight. The extreme form of down-weighting is the detection and removal of irregular responses and response times (RTs). The standard residual-based methods, including the recently developed residual method using an iterative purification process, can be used to detect non-effortful responses in the framework of down-weighting. In accommodating, on the other hand, one tries to extend a model in order to account for the contaminations directly. This boils down to a mixture hierarchical model (MHM) for responses and RTs. However, to the authors’ knowledge, few studies have compared standard residual methods and MHM under different simulation conditions. It is unknown which method should be applied in different situations. Meanwhile, MHM has strong assumptions for different types of responses. It would be valuable to examine the performance of the method when the assumptions are violated. The purpose of this study is to compare standard residual methods and MHM under a fully crossed simulation design. In addition, specific recommendations for their applications are provided. The simulation study included two scenarios. In simulation scenario I, data were generated under the assumptions of MHM. In simulation scenario II, the assumptions of MHM concerning non-effortful responses and RTs were both violated. Simulation scenario I had three manipulated factors. (1) Non-effort prevalence (π), which was the proportion of individuals with non-effortful responses. It had three levels: 0%, 20% and 40%. (2) Non-effort severity (π_i^non), which was the proportion of non-effortful responses for each non-effortful individual. 
It varied between two levels, low and high: when low, π_i^non was generated from U(0, 0.25); when high, from U(0.5, 0.75), where U denotes a uniform distribution. (3) Difference between RTs of non-effortful and effortful responses (d_RT), with two levels, small and large. The logarithm of RTs of non-effortful responses was generated from a normal distribution N(μ, 0.5²), with μ = -1 when d_RT was small and μ = -2 when d_RT was large. For generating the non-effortful responses, we followed Wang, Xu, and Shang (2018), with the probability of a correct response g_j set at 0.25 for all non-effortful responses. In simulation scenario II, only the first two factors were considered. Non-effortful RTs were generated from a uniform distribution with a lower bound of exp(-5) and an upper bound equal to the 5th percentile of RT on item j with τ = 0, and the probability of a correct response for non-effortful responses depended on the ability level of each examinee. In all conditions, the sample size was fixed at I = 2,000 and the test length at J = 30, with 30 replications per condition. For effortful responses, responses and RTs were simulated from van der Linden's (2007) hierarchical model. Item parameters were generated with a_j ~ U(1, 2.5), b_j ~ N(0, 1), α_j ~ U(1.5, 2.5), β_j ~ U(-0.2, 0.2). Person parameters (θ_i, τ_i) were generated from a bivariate normal distribution with mean vector μ = (0, 0)' and covariance matrix Σ = (1, 0.25; 0.25, 0.25). Four methods were compared under each condition: the original standard residual method (OSR), the conditional-estimate standard residual method (CSR), the conditional-estimate standard residual method with fixed item parameters using an iterative purifying procedure (CSRI), and MHM. These methods were implemented in R and JAGS using Bayesian MCMC sampling for parameter calibration.
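The data-generating setup described above can be sketched in a few lines (a minimal illustration, not the authors' actual R/JAGS code; a 2PL response model for the correctness probabilities and the scenario-I condition π = 20%, low severity, small d_RT are assumed here):

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 2000, 30  # sample size and test length fixed in the simulation design

# Person parameters (theta_i, tau_i) from a bivariate normal
mean = np.zeros(2)
cov = np.array([[1.0, 0.25],
                [0.25, 0.25]])
theta, tau = rng.multivariate_normal(mean, cov, size=I).T

# Item parameters
a = rng.uniform(1.0, 2.5, J)        # discrimination, a_j ~ U(1, 2.5)
b = rng.standard_normal(J)          # difficulty, b_j ~ N(0, 1)
alpha = rng.uniform(1.5, 2.5, J)    # time discrimination, alpha_j ~ U(1.5, 2.5)
beta = rng.uniform(-0.2, 0.2, J)    # time intensity, beta_j ~ U(-0.2, 0.2)

# Effortful responses and RTs under van der Linden's (2007) hierarchical model
# (a 2PL response model is assumed for the correctness probabilities)
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = rng.binomial(1, p)                                  # scored responses
log_rt = rng.normal(beta - tau[:, None], 1.0 / alpha)   # lognormal RT model
rt = np.exp(log_rt)

# Scenario I contamination: 20% non-effortful persons, low severity
pi = 0.20
is_non_eff = rng.random(I) < pi
severity = np.where(is_non_eff, rng.uniform(0.0, 0.25, I), 0.0)  # pi_i^non ~ U(0, 0.25)
contam = rng.random((I, J)) < severity[:, None]

x[contam] = rng.binomial(1, 0.25, contam.sum())            # g_j = 0.25 rapid guessing
rt[contam] = np.exp(rng.normal(-1.0, 0.5, contam.sum()))   # d_RT small: mu = -1
```

Each replication of a condition corresponds to one such draw of `x` and `rt`, which the four detection methods then receive as input.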
Finally, these methods were evaluated in terms of convergence rate, detection accuracy, and parameter recovery. The results are as follows. First, MHM suffered from convergence issues, especially for the latent variable indicating non-effortful responses, and these issues were more serious in simulation scenario II; in contrast, all the standard residual methods converged successfully. Second, when all responses were effortful, the false positive rate (FPR) of MHM was 0; although the standard residual methods had an FPR around 5% (the nominal level), the accuracy of parameter estimates was similar across all methods. Third, when data were contaminated by non-effortful responses, CSRI had a higher true positive rate (TPR) in almost all conditions. MHM showed lower TPR but also lower false discovery rate (FDR), with even lower TPR in simulation scenario II. When π_i^non was high, CSRI and MHM showed greater advantages over the other methods in terms of parameter recovery; however, when π_i^non was high and d_RT was small, MHM generally had higher RMSE than CSRI. Compared to simulation scenario I, MHM performed worse in simulation scenario II. The only remaining issue with CSRI was its overestimation of the time-discrimination parameter in all conditions except when π = 40% and d_RT was large. In a real-data example, all the methods were applied to a dataset collected for program assessment and accountability purposes from undergraduates at a mid-sized southeastern university in the USA. Evidence of convergent validity suggested that CSRI and MHM detected non-effortful responses more accurately and obtained more precise parameter estimates for these data. In conclusion, CSRI generally performed better than the other methods across all conditions.
This method is highly recommended in practice because: (1) it showed an acceptable FPR and fairly accurate parameter estimates even when all responses were effortful; (2) it is free of strong assumptions and is therefore robust across various situations; (3) it showed the greatest advantage when π_i^non was high, both in detecting non-effortful responses and in improving parameter estimation. To improve the estimation of the time-discrimination parameter in CSRI, robust estimation methods that down-weight flagged response patterns can be used as an alternative to directly removing non-effortful responses (i.e., the approach in the current study). MHM can perform well when all its assumptions are met, π_i^non is high, and d_RT is large; however, some of its parameters converge poorly, which limits its application in practice.
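As a rough illustration of the residual-based idea behind these down-weighting methods (not the authors' OSR/CSR/CSRI implementations): the standardized log-RT residual under van der Linden's lognormal model flags responses that are faster than the model expects. The one-sided 5% cutoff below is an assumption chosen to match the nominal FPR mentioned above:

```python
import numpy as np

def rt_standard_residual(log_rt, alpha, beta, tau):
    """Standardized residual z_ij = alpha_j * (ln t_ij - (beta_j - tau_i))
    under the lognormal RT model; large negative values indicate a response
    that was much faster than expected for that person and item."""
    return alpha * (log_rt - (beta - tau[:, None]))

def flag_non_effortful(log_rt, alpha, beta, tau, z_crit=-1.645):
    """Flag responses whose residual falls below a one-sided cutoff
    (z_crit = -1.645 corresponds to a nominal 5% false positive rate)."""
    return rt_standard_residual(log_rt, alpha, beta, tau) < z_crit
```

The variants compared in the study differ mainly in how `alpha`, `beta`, and `tau` are estimated (e.g., conditionally, with fixed item parameters, or iteratively purified), not in the form of the residual itself.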

  • Disparagement humor: Could laughter dissolve hostility?

    Subjects: Psychology >> Social Psychology submitted time 2021-11-04

    Abstract: Disparagement humor refers to communication that contains denigration but elicits amusement. Relief theory, superiority theory, incongruity-resolution theory, and benign violation theory attempt to explain its psychological mechanism. Humor does not always arise from disparagement: the humorous effect is influenced by the group identity and attitude of the receiver, the psychological distance between the receiver and the target of disparagement, and the receiver's personality and cultural background. Disparagement humor can contribute to the release of prejudice and the legitimation of social dominance orientation, but has inconsistent effects on interpersonal relations. The proposed Integrative Process Model of Disparagement Humor describes the mechanisms, precursors, and consequences of disparagement humor in tandem and could serve as a scaffold for future research. Future research should also devote more attention to the negative social impacts of disparagement humor and the corresponding interventions, the potential positive effects of disparagement humor on intergroup relations and social equity, and the disparagement humor emerging from the Chinese socio-cultural background.

  • A new technique for handling aberrant responses in psychological and educational testing: the mixture model method

    Subjects: Psychology >> Psychological Measurement submitted time 2021-05-08

    Abstract: The mixture model method (MMM) is a new method proposed to handle data contaminated by aberrant responses in psychological and educational measurement. Compared to the traditional response-time threshold methods and response-time residual methods, MMM has the following advantages: (1) it detects aberrant responses and obtains parameter estimates simultaneously; (2) it precisely recovers the severity of aberrant responding. By building different item response models and response-time models for different latent groups, MMM helps to separate aberrant responses from normal responses. Future research could investigate the performance of MMM when its assumptions are violated or when data contain other types of aberrant response patterns. The computational efficiency of MMM might also be improved by fixing some of the item parameter estimates or by adopting more efficient estimation approaches.
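To illustrate the mixture idea in its simplest form (a minimal sketch, not the full MMM, which models responses and RTs jointly for each latent group): the posterior probability that a log-RT belongs to the non-effortful class of a two-component normal mixture follows directly from Bayes' rule. All parameter values in the usage below are hypothetical:

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def non_effort_posterior(log_rt, pi_non, mu_non, sigma_non, mu_eff, sigma_eff):
    """Posterior probability that each log-RT comes from the non-effortful
    component of a two-class normal mixture, computed in log space for
    numerical stability."""
    log_num = np.log(pi_non) + normal_logpdf(log_rt, mu_non, sigma_non)
    log_den = np.logaddexp(
        log_num,
        np.log1p(-pi_non) + normal_logpdf(log_rt, mu_eff, sigma_eff),
    )
    return np.exp(log_num - log_den)

# Hypothetical parameters: 20% non-effortful class centred at ln(RT) = -2
post = non_effort_posterior(np.array([-2.0, 0.5]), 0.2, -2.0, 0.5, 0.5, 0.5)
```

A very fast response (log-RT near the non-effortful mean) receives a posterior near 1, while a typical-speed response receives a posterior near 0; in the full MMM these responsibilities are estimated jointly with all model parameters via MCMC rather than computed from known values.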

  • Operating Unit: National Science Library, Chinese Academy of Sciences
  • Production Maintenance: National Science Library, Chinese Academy of Sciences
  • Mail: eprint@mail.las.ac.cn
  • Address: 33 Beisihuan Xilu, Zhongguancun, Beijing, P.R. China