  • Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models

    Subjects: Psychology >> Statistics in Psychology submitted time 2023-10-07

    Abstract: Hierarchical data, which are frequently observed in psychological experiments, are usually analyzed with linear mixed-effects models (LMEMs), because LMEMs can simultaneously account for multiple sources of random variation due to participants, items, and/or predictors. However, it remains unclear how to determine the sample size and the number of trials for LMEMs. Historically, sample size planning was based purely on power analysis. The influential article by Maxwell et al. (2008) later made clear that sample size planning should consider statistical power and accuracy in parameter estimation (AIPE) simultaneously. In this paper, we derive a confidence interval width contour plot, together with the code to generate it, that provides power and AIPE information simultaneously. With this plot, sample size requirements for LMEMs can be determined based on both power and AIPE criteria. We also demonstrate how to run sensitivity analyses to assess the impact of the magnitude of the experimental effect size and of the random slope variance on statistical power, AIPE, and the resulting sample size recommendations.
    There were two sets of sensitivity analyses based on different LMEMs. Sensitivity analysis I investigated how the experimental effect size influenced power, AIPE, and the required sample size for a within-subject design, while sensitivity analysis II investigated the impact of the random slope variance on the optimal sample size, based on power and AIPE analyses for the cross-level interaction effect. Results for binary and continuous between-subject variables were compared. In these sensitivity analyses, two sample size factors were varied: the number of subjects (I = 10, 30, 50, 70, 100, 200, 400, 600, 800) and the number of trials (J = 10, 20, 30, 50, 70, 100, 150, 200, 250, 300). The additional manipulated factor was the experimental effect size (standardized coefficient of the experimental condition = 0.2, 0.5, 0.8, in sensitivity analysis I) or the magnitude of the random slope variance (0.01, 0.09, and 0.25, in sensitivity analysis II). A random slope model was used in sensitivity analysis I, and a random slope model with a level-2 independent variable was used in sensitivity analysis II. The data-generating model and the fitted model were the same. Estimation performance was evaluated in terms of convergence rate, power, AIPE for the fixed effect, AIPE for the standard error of the fixed effect, and AIPE for the random effects.
    The results are as follows. First, there were no convergence problems under any condition, except when the random slope variance was small and a maximal model was used to fit the data. Second, power increased as the sample size, the number of trials, or the effect size increased. However, the number of trials played the key role for the power of the within-subject effect, whereas the sample size was more important for the power of the cross-level effect. Power was larger for a continuous between-subject variable than for a binary one. Third, although the fixed effect was estimated accurately under all simulation conditions, the width of its 95% confidence interval (95% width) was extremely large under some conditions. Lastly, AIPE for the random effects improved as the sample size and/or the number of trials increased. The residual variance was estimated accurately. As the random slope variance increased, the accuracy of the random intercept variance estimates decreased, while the accuracy of the random slope variance estimates increased.
    In conclusion, if sample size planning is conducted solely on the basis of power analysis, the chosen sample size may not be large enough to obtain accurate estimates of effect sizes. Therefore, the rationale of considering statistical power and AIPE jointly during sample size planning was adopted. To shed light on this issue, this article provides a standard procedure, based on a confidence interval width contour plot, for recommending the sample size and number of trials when using LMEMs. The plot visualizes the combined effect of the sample size and the number of trials per participant on the 95% width, power, and AIPE for the random effects. Based on this tool and other empirical considerations, practitioners can make informed choices about how many participants to test and how many trials to administer to each.
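    Although the authors' own code is not reproduced here, the general recipe — simulate data from a random slope LMEM over a grid of subject counts I and trial counts J, then record empirical power and mean 95% CI width for each cell so that contours can be drawn — can be sketched as follows. This is a minimal, hypothetical Python illustration using statsmodels (the paper itself works in R); the effect size, variance components, and the ±0.5 condition coding are placeholder assumptions, not values from the paper.

    ```python
    # Sketch only: Monte Carlo estimation of empirical power and mean 95% CI width
    # for the within-subject effect in a random-slope LMEM, over a grid of (I, J).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)

    def simulate_fit(I, J, beta1=0.5, sd_int=0.5, sd_slope=0.3, sd_resid=1.0):
        """Simulate y_ij = (u0_i) + (beta1 + u1_i) * x_ij + e_ij and fit a random-slope LMEM."""
        x = rng.choice([-0.5, 0.5], size=(I, J))        # within-subject condition (placeholder coding)
        u0 = rng.normal(0, sd_int, I)[:, None]          # random intercepts
        u1 = rng.normal(0, sd_slope, I)[:, None]        # random slopes
        y = u0 + (beta1 + u1) * x + rng.normal(0, sd_resid, (I, J))
        df = pd.DataFrame({"y": y.ravel(), "x": x.ravel(),
                           "subj": np.repeat(np.arange(I), J)})
        fit = smf.mixedlm("y ~ x", df, groups="subj", re_formula="~x").fit(reml=True)
        lo, hi = fit.conf_int().loc["x"]
        return hi - lo, fit.pvalues["x"] < 0.05

    def grid_cell(I, J, reps=200):
        """Return (mean 95% CI width, empirical power) for one (I, J) cell of the grid."""
        widths, significant = zip(*(simulate_fit(I, J) for _ in range(reps)))
        return float(np.mean(widths)), float(np.mean(significant))

    # Example: a small grid; each cell supplies one point for the width and power contours.
    grid = {(I, J): grid_cell(I, J, reps=50) for I in (10, 30, 50) for J in (10, 20, 30)}
    ```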

  • A New Technique for Handling Aberrant Responses in Psychological and Educational Measurement: The Mixture Model Method

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: 《心理科学进展》

    Abstract: Aberrant responses have been repeatedly reported in psychological and educational measurement. If traditional measurement models or methods (e.g., item response theory, IRT) are applied to data sets contaminated by aberrant responses, parameter estimates may be biased. It is therefore necessary to identify aberrant responses and to reduce their detrimental effects. In the literature, there are two traditional response time (RT)-based methods for detecting aberrant responses: the RT threshold method and the RT residual method. The focus of these methods is to find a threshold for the RT or the RT residual. If an RT or RT residual is markedly below the threshold, the response is regarded as an aberrant response with an extremely short RT (e.g., speededness or rapid guessing) and consequently provides no information about the test taker's latent trait. A down-weighting strategy, which limits the influence of aberrant responses on parameter estimation by reducing their weight in the sample, can then be applied. The mixture model method (MMM) is a new method proposed to handle data contaminated by aberrant responses. It applies the accommodating strategy, which extends a model so as to account for the contamination directly. MMM has two main advantages: (1) it detects aberrant responses and obtains parameter estimates simultaneously, rather than in two steps (detecting and down-weighting); (2) it precisely recovers the severity of aberrant responding. There are two categories of MMM. The first category assumes that the classification (i.e., whether an item is answered normally or aberrantly) can be predicted by RT. The second category is a natural extension of van der Linden's (2007) hierarchical model, which models responses and RTs jointly: the observed RT, as well as the correct response probability, of each item-by-person encounter is decomposed into a component generated by normal responding and a component generated by aberrant responding, according to the most important difference between the two behaviors. This approach yields more precisely estimated item and person parameters, as well as excellent classification of aberrant versus normal behavior. First, this article compares the basic logic of the two traditional RT-based methods and MMM. Both the RT threshold method and the RT residual method regard aberrant responses as outliers and therefore depend heavily on the severity of aberrance: if a data set is seriously contaminated, the observed RT (or RT residual) distribution differs from the expected distribution, which leads to low power and sometimes a high false detection rate. In contrast, MMM, which assumes that both the observed RT and the correct response probability follow mixture distributions, treats aberrant and normal responses equally and thus depends little on the severity of aberrance. In addition, MMM can in theory be applied even when all respondents respond normally; in that case, all responses are simply classified into a single category. Second, this article summarizes the disadvantages of the three methods.
    MMM has three primary limitations: (1) it usually relies on strong assumptions and may not perform well when those assumptions are violated; (2) a low proportion of aberrant responses may lead to convergence and model identification problems; (3) it is complex and time-consuming. In sum, practitioners should choose an appropriate method according to the characteristics of the test and the categories of aberrant responses (e.g., rapid guessing, item preknowledge, cheating). Finally, this article suggests that future research investigate the performance of MMM when its assumptions are violated or when the data contain additional types of aberrant response patterns. Fixing item parameter estimates and proposing indices to help choose a suitable method are encouraged as ways to improve the efficiency of MMM.
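    To make the decomposition in the second category of MMM concrete, the schematic below (our illustration, not the authors' exact specification) writes the response and log-RT of person i on item j in terms of a latent indicator Δ_ij (1 = normal, 0 = aberrant), with a 2PL model and van der Linden's lognormal RT model for normal behavior, and a constant guessing probability g_j plus a separate RT distribution for aberrant behavior:

    ```latex
    \begin{aligned}
    P(X_{ij}=1 \mid \Delta_{ij}) &= \Delta_{ij}\,
          \frac{\exp\{a_j(\theta_i-b_j)\}}{1+\exp\{a_j(\theta_i-b_j)\}}
          \;+\; (1-\Delta_{ij})\, g_j, \\
    \ln T_{ij} \mid \Delta_{ij}=1 &\sim N\!\bigl(\beta_j-\tau_i,\ \alpha_j^{-2}\bigr), \qquad
    \ln T_{ij} \mid \Delta_{ij}=0 \sim N\!\bigl(\mu_{0},\ \sigma_{0}^{2}\bigr).
    \end{aligned}
    ```

    Here θ_i and τ_i are the person's ability and speed, a_j, b_j, α_j, β_j are item parameters, and μ_0, σ_0 are hypothetical parameters of the aberrant RT component.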

  • Construction of Mixed-Effects Location-Scale Models and an Exploration of Sample Size Planning

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: 《心理科学进展》

    Abstract: With the development of data-collection techniques and the increasing complexity of study designs, nested data are widespread in psychological research. Such data are generally analyzed with linear mixed-effects models, which unfortunately rest on the often unreasonable assumption that residual variances are homogeneous. Meanwhile, mixed-effects location-scale models (MELSM) have become increasingly popular because they can handle heterogeneous residual variances and allow predictors to be added, at different levels, to two substructures: the mean structure (the location model) and the variance structure (the scale model). MELSM can avoid the estimation bias caused by an inappropriate homogeneity-of-variance assumption, explore relationships among traits, and simultaneously investigate inter- and intra-individual variability as well as their explanatory variables. This study aims to develop methods of model construction and sample size planning for MELSM, using both simulation studies and empirical studies. In detail, the main contents of the project are as follows. Study 1 focuses on comparing and selecting candidate models based on Bayesian fit indices to construct MELSM, taking into account the estimation method for such complicated models; we propose that model selection for the location model and the scale model can be completed sequentially. Study 2 explores sample size planning for MELSM according to both power analysis (based on Monte Carlo simulation) and accuracy in parameter estimation analysis (based on the credible interval of the posterior distribution); an adequate sample size must satisfy both the power and the accuracy criteria. Study 3 extends the sample size planning method for MELSM to better frame considerations of uncertainty. By specifying a prior distribution for the effect sizes, repeating the sampling, and selecting models with the robust Bayesian fit index suggested by Study 1, three main sources of uncertainty can be controlled: uncertainty due to the unknown population effect size, sampling variability, and model approximation. With the simulation results, we can provide reliable Bayesian fit indices for MELSM construction and summarize the process of sample size planning for MELSM in both determinate and uncertain situations. Moreover, Study 4 illustrates the application of MELSM in two empirical psychological studies and verifies that the conclusions of the simulation studies are workable in practice. The unique contribution of this project is to further advance the methods of model construction and sample size planning for MELSM and to provide a methodological foundation for researchers. In addition, we plan to integrate the functions above into a user-friendly R package for MELSM, providing a basis for its promotion and application and helping researchers carry out sample size planning, model construction, and parameter estimation for MELSM easily, according to their own specifications. If these statistical models are widely adopted, the reproducibility and replicability of psychological studies will ultimately be enhanced.
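    For reference, a two-level MELSM of the kind described above can be written schematically as follows (a generic formulation with hypothetical level-1 predictors x_ij, z_ij, and w_ij; not the specific candidate models compared in the studies):

    ```latex
    \begin{aligned}
    \text{Location (mean) model: } & y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
          + \mathbf{z}_{ij}^{\top}\mathbf{u}_i + e_{ij},
          \qquad e_{ij} \sim N\!\bigl(0,\ \sigma^{2}_{e,ij}\bigr), \\
    \text{Scale (variance) model: } & \ln \sigma^{2}_{e,ij}
          = \mathbf{w}_{ij}^{\top}\boldsymbol{\eta} + v_i, \\
    \text{Random effects: } & (\mathbf{u}_i, v_i) \sim N(\mathbf{0}, \boldsymbol{\Sigma}).
    \end{aligned}
    ```

    The location submodel describes between- and within-person differences in means, while the scale submodel lets the within-person residual variance depend on predictors and on a person-specific random effect v_i.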

  • A Comparison of Standardized Residual Methods and the Mixture Hierarchical Model for Handling Non-effortful Responses

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: 《心理学报》

    Abstract: Assessment datasets contaminated by non-effortful responses may lead to serious consequences if not handled appropriately. Previous research has proposed two different strategies: down-weighting and accommodating. Down-weighting tries to limit the influence of aberrant responses on parameter estimation by reducing their weight; its extreme form is the detection and removal of irregular responses and response times (RTs). The standard residual-based methods, including a recently developed residual method that uses an iterative purification process, can be used to detect non-effortful responses within the down-weighting framework. In accommodating, on the other hand, one extends a model in order to account for the contamination directly, which leads to a mixture hierarchical model (MHM) for responses and RTs. However, to the authors' knowledge, few studies have compared the standard residual methods and MHM under different simulation conditions, so it is unknown which method should be applied in which situation. Meanwhile, MHM makes strong assumptions about the different types of responses, and it is worth examining its performance when these assumptions are violated. The purpose of this study is to compare the standard residual methods and MHM under a fully crossed simulation design and to provide specific recommendations for their application. The simulation study included two scenarios. In simulation scenario I, data were generated under the assumptions of MHM. In simulation scenario II, the assumptions of MHM concerning non-effortful responses and RTs were both violated. Simulation scenario I had three manipulated factors. (1) Non-effort prevalence (π), the proportion of individuals with non-effortful responses, with three levels: 0%, 20%, and 40%. (2) Non-effort severity (π_i^non), the proportion of non-effortful responses for each non-effortful individual, with two levels: low and high. When π_i^non was low, it was generated from U(0, 0.25); when π_i^non was high, it was generated from U(0.5, 0.75), where U denotes a uniform distribution. (3) The difference between the RTs of non-effortful and effortful responses (d_RT), with two levels: small and large. The logarithms of the RTs of non-effortful responses were generated from a normal distribution N(μ, 0.5^2), where μ = -1 when d_RT was small and μ = -2 when d_RT was large. For generating the non-effortful responses, we followed Wang, Xu and Shang (2018), with the probability of a correct response g_j set at 0.25 for all non-effortful responses. In simulation scenario II, only the first two factors were considered. Non-effortful RTs were generated from a uniform distribution with a lower bound of exp(-5) and an upper bound equal to the 5th percentile of the RT on item j when τ = 0. The probability of a correct response for non-effortful responses depended on the ability level of each examinee. In all conditions, the sample size was fixed at I = 2,000 and the test length was fixed at J = 30. For each condition, 30 replications were generated.
    For effortful responses, responses and RTs were simulated from van der Linden's (2007) hierarchical model. Item parameters were generated with a_j ~ U(1, 2.5), b_j ~ N(0, 1), α_j ~ U(1.5, 2.5), and β_j ~ U(-0.2, 0.2). For simulees, the person parameters (θ_i, τ_i) were generated from a bivariate normal distribution with mean vector μ = (0, 0)' and covariance matrix Σ = [[1, 0.25], [0.25, 0.25]]. Four methods were compared under each condition: the original standard residual method (OSR), the conditional-estimate standard residual method (CSR), the conditional-estimate standard residual method with fixed item parameters and an iterative purification procedure (CSRI), and MHM. These methods were implemented in R and JAGS using Bayesian MCMC sampling for parameter calibration. Finally, the methods were evaluated in terms of convergence rate, detection accuracy, and parameter recovery. The results are as follows. First, MHM suffered from convergence issues, especially for the latent variable indicating non-effortful responses, whereas all the standard residual methods converged successfully; the convergence issues were more serious in simulation scenario II. Second, when all responses were effortful, the false positive rate (FPR) of MHM was 0; although the standard residual methods had an FPR around 5% (the nominal level), the accuracy of the parameter estimates was similar across methods. Third, when the data were contaminated by non-effortful responses, CSRI had a higher true positive rate (TPR) in almost all conditions. MHM showed a lower TPR but also a lower false discovery rate (FDR), and its TPR was even lower in simulation scenario II. When π_i^non was high, CSRI and MHM showed greater advantages over the other methods in terms of parameter recovery; however, when π_i^non was high and d_RT was small, MHM generally had a higher RMSE than CSRI. Compared with simulation scenario I, MHM performed worse in simulation scenario II. The only problem CSRI had to deal with was its overestimation of the time discrimination parameter across all conditions except when π = 40% and d_RT was large. In a real-data example, all the methods were applied to a dataset collected for program assessment and accountability purposes from undergraduates at a mid-sized southeastern university in the USA. Evidence of convergent validity showed that CSRI and MHM might detect non-effortful responses more accurately and obtain more precise parameter estimates for these data. In conclusion, CSRI generally performed better than the other methods across all conditions. It is highly recommended for use in practice because (1) it showed an acceptable FPR and fairly accurate parameter estimates even when all responses were effortful; (2) it is free of strong assumptions and therefore robust across various situations; and (3) it showed the greatest advantages, both in detecting non-effortful responses and in improving parameter estimation, when π_i^non was high.
    To improve the estimation of the time discrimination parameter in CSRI, robust estimation methods that down-weight flagged response patterns can be used as an alternative to directly removing non-effortful responses (i.e., the approach in the current study). MHM can perform well when all its assumptions are met, π_i^non is high, and d_RT is large. However, some parameters are difficult to bring to convergence under MHM, which will limit its application in practice.
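    As a concrete illustration of the down-weighting side of this comparison, the sketch below implements a simplified standardized log-RT residual check with an iterative purification step. It is only a minimal Python illustration under a lognormal RT model (ln t_ij ≈ β_j − τ_i plus item-specific noise), using crude moment estimates rather than the Bayesian MCMC calibration used in the study; the flagging cutoff of −1.96 is likewise an assumption made for the example, not the paper's setting.

    ```python
    import numpy as np

    def flag_fast_responses(log_rt, n_iter=3, z_cut=-1.96):
        """Flag unexpectedly fast responses via standardized log-RT residuals.

        log_rt : (n_persons, n_items) array of log response times.
        Returns a boolean array of the same shape (True = flagged as non-effortful).
        Uses moment-based estimates of a lognormal RT model
        (ln t_ij = beta_j - tau_i + eps_ij) and iterative purification:
        flagged cells are excluded when the model is re-estimated.
        """
        n_persons, n_items = log_rt.shape
        flags = np.zeros((n_persons, n_items), dtype=bool)
        for _ in range(n_iter):
            keep = ~flags
            # item time intensity: mean log-RT over currently unflagged responses
            beta = np.array([log_rt[keep[:, j], j].mean() for j in range(n_items)])
            # person speed: mean of (beta_j - ln t_ij) over unflagged responses
            tau = np.array([(beta[keep[i]] - log_rt[i, keep[i]]).mean()
                            for i in range(n_persons)])
            resid = log_rt - (beta[None, :] - tau[:, None])
            # item-specific residual SD from unflagged responses
            sd = np.array([resid[keep[:, j], j].std() for j in range(n_items)])
            flags = resid / sd[None, :] < z_cut   # unexpectedly fast responses
        return flags
    ```

    Given a response time matrix `rt` of shape (I, J), `flag_fast_responses(np.log(rt))` returns a matrix of flags that could then feed a removal or down-weighting step of the kind discussed above.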

  • Model Construction and Sample Size Planning for Mixed-Effects Location-Scale Models

    Subjects: Psychology >> Statistics in Psychology submitted time 2023-01-31

    Abstract: As psychological research questions deepen and data-collection techniques develop, interest in mixed-effects location-scale models (MELSM) has increased dramatically. When residual variances are heterogeneous, these models allow predictors to be added at different levels, helping to explore relationships among traits and to investigate inter- and intra-individual variability, as well as their explanatory variables, simultaneously. This study includes both simulation studies and empirical studies. In detail, the main contents of the project are: 1) comparing and selecting candidate models based on Bayesian fit indices to construct MELSM; 2) planning sample size for MELSM according to both power analysis and accuracy in parameter estimation analysis; 3) extending the sample size planning method for MELSM to better frame considerations of uncertainty; and 4) developing an R package for MELSM and illustrating its application in empirical psychological studies. Based on this work, we hope these statistical models will be widely adopted and that the reproducibility and replicability of psychological studies will ultimately be enhanced.

  • A Comparison of Standardized Residual Methods and the Mixture Hierarchical Model for Handling Non-effortful Responses

    Subjects: Psychology >> Statistics in Psychology submitted time 2021-11-29

    Abstract: Assessment datasets contaminated by non-effortful responses may lead to serious consequences if not handled appropriately. Previous research has proposed two different strategies: down-weighting and accommodating. Down-weighting tries to limit the influence of aberrant responses on parameter estimation by reducing their weight; its extreme form is the detection and removal of irregular responses and response times (RTs). The standard residual-based methods, including a recently developed residual method that uses an iterative purification process, can be used to detect non-effortful responses within the down-weighting framework. In accommodating, on the other hand, one extends a model in order to account for the contamination directly, which leads to a mixture hierarchical model (MHM) for responses and RTs. However, to the authors' knowledge, few studies have compared the standard residual methods and MHM under different simulation conditions, so it is unknown which method should be applied in which situation. Meanwhile, MHM makes strong assumptions about the different types of responses, and it is worth examining its performance when these assumptions are violated. The purpose of this study is to compare the standard residual methods and MHM under a fully crossed simulation design and to provide specific recommendations for their application. The simulation study included two scenarios. In simulation scenario I, data were generated under the assumptions of MHM. In simulation scenario II, the assumptions of MHM concerning non-effortful responses and RTs were both violated. Simulation scenario I had three manipulated factors. (1) Non-effort prevalence (π), the proportion of individuals with non-effortful responses, with three levels: 0%, 20%, and 40%. (2) Non-effort severity (π_i^non), the proportion of non-effortful responses for each non-effortful individual, with two levels: low and high. When π_i^non was low, it was generated from U(0, 0.25); when π_i^non was high, it was generated from U(0.5, 0.75), where U denotes a uniform distribution. (3) The difference between the RTs of non-effortful and effortful responses (d_RT), with two levels: small and large. The logarithms of the RTs of non-effortful responses were generated from a normal distribution N(μ, 0.5^2), where μ = -1 when d_RT was small and μ = -2 when d_RT was large. For generating the non-effortful responses, we followed Wang, Xu and Shang (2018), with the probability of a correct response g_j set at 0.25 for all non-effortful responses. In simulation scenario II, only the first two factors were considered. Non-effortful RTs were generated from a uniform distribution with a lower bound of exp(-5) and an upper bound equal to the 5th percentile of the RT on item j when τ = 0. The probability of a correct response for non-effortful responses depended on the ability level of each examinee. In all conditions, the sample size was fixed at I = 2,000 and the test length was fixed at J = 30. For each condition, 30 replications were generated. For effortful responses, responses and RTs were simulated from van der Linden's (2007) hierarchical model. Item parameters were generated with a_j ~ U(1, 2.5), b_j ~ N(0, 1), α_j ~ U(1.5, 2.5), and β_j ~ U(-0.2, 0.2).
    For simulees, the person parameters (θ_i, τ_i) were generated from a bivariate normal distribution with mean vector μ = (0, 0)' and covariance matrix Σ = [[1, 0.25], [0.25, 0.25]]. Four methods were compared under each condition: the original standard residual method (OSR), the conditional-estimate standard residual method (CSR), the conditional-estimate standard residual method with fixed item parameters and an iterative purification procedure (CSRI), and MHM. These methods were implemented in R and JAGS using Bayesian MCMC sampling for parameter calibration. Finally, the methods were evaluated in terms of convergence rate, detection accuracy, and parameter recovery. The results are as follows. First, MHM suffered from convergence issues, especially for the latent variable indicating non-effortful responses, whereas all the standard residual methods converged successfully; the convergence issues were more serious in simulation scenario II. Second, when all responses were effortful, the false positive rate (FPR) of MHM was 0; although the standard residual methods had an FPR around 5% (the nominal level), the accuracy of the parameter estimates was similar across methods. Third, when the data were contaminated by non-effortful responses, CSRI had a higher true positive rate (TPR) in almost all conditions. MHM showed a lower TPR but also a lower false discovery rate (FDR), and its TPR was even lower in simulation scenario II. When π_i^non was high, CSRI and MHM showed greater advantages over the other methods in terms of parameter recovery; however, when π_i^non was high and d_RT was small, MHM generally had a higher RMSE than CSRI. Compared with simulation scenario I, MHM performed worse in simulation scenario II. The only problem CSRI had to deal with was its overestimation of the time discrimination parameter across all conditions except when π = 40% and d_RT was large. In a real-data example, all the methods were applied to a dataset collected for program assessment and accountability purposes from undergraduates at a mid-sized southeastern university in the USA. Evidence of convergent validity showed that CSRI and MHM might detect non-effortful responses more accurately and obtain more precise parameter estimates for these data. In conclusion, CSRI generally performed better than the other methods across all conditions. It is highly recommended for use in practice because (1) it showed an acceptable FPR and fairly accurate parameter estimates even when all responses were effortful; (2) it is free of strong assumptions and therefore robust across various situations; and (3) it showed the greatest advantages, both in detecting non-effortful responses and in improving parameter estimation, when π_i^non was high. To improve the estimation of the time discrimination parameter in CSRI, robust estimation methods that down-weight flagged response patterns can be used as an alternative to directly removing non-effortful responses (i.e., the approach in the current study). MHM can perform well when all its assumptions are met, π_i^non is high, and d_RT is large. However, some parameters are difficult to bring to convergence under MHM, which will limit its application in practice.
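    For readers who want to reproduce the flavor of simulation scenario I, the following minimal Python sketch generates effortful responses and RTs from van der Linden's (2007) hierarchical model with the parameter distributions listed above, then overwrites a subset of person-by-item cells with non-effortful behavior (g_j = 0.25, short lognormal RTs). It is our own simplified illustration (2PL response model, one arbitrary design cell), not the authors' simulation code.

    ```python
    import numpy as np

    rng = np.random.default_rng(2021)
    I, J = 2000, 30                      # persons, items (as in the study)

    # Item parameters: a_j ~ U(1, 2.5), b_j ~ N(0, 1), alpha_j ~ U(1.5, 2.5), beta_j ~ U(-0.2, 0.2)
    a, b = rng.uniform(1, 2.5, J), rng.normal(0, 1, J)
    alpha, beta = rng.uniform(1.5, 2.5, J), rng.uniform(-0.2, 0.2, J)

    # Person parameters (theta_i, tau_i) ~ bivariate normal, Sigma = [[1, 0.25], [0.25, 0.25]]
    theta, tau = rng.multivariate_normal([0, 0], [[1, 0.25], [0.25, 0.25]], size=I).T

    # Effortful responses (2PL) and log-RTs (lognormal RT model: N(beta_j - tau_i, 1/alpha_j^2))
    p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
    x = rng.binomial(1, p)
    log_rt = rng.normal(beta - tau[:, None], 1 / alpha)

    # Non-effortful contamination for one design cell: prevalence pi = 20%,
    # severity pi_i^non ~ U(0.5, 0.75), g_j = 0.25, log-RT ~ N(-1, 0.5^2)
    noneff_persons = rng.random(I) < 0.20
    for i in np.flatnonzero(noneff_persons):
        severity = rng.uniform(0.5, 0.75)
        items = rng.random(J) < severity
        x[i, items] = rng.binomial(1, 0.25, items.sum())
        log_rt[i, items] = rng.normal(-1, 0.5, items.sum())
    ```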

  • A New Technique for Handling Aberrant Responses in Psychological and Educational Measurement: The Mixture Model Method

    Subjects: Psychology >> Psychological Measurement submitted time 2021-05-08

    Abstract: The mixture model method (MMM) is a new method proposed to handle data contaminated by aberrant responses in psychological and educational measurement. Compared with the traditional response time threshold methods and response time residual methods, MMM has the following advantages: (1) it detects aberrant responses and obtains parameter estimates simultaneously; (2) it precisely recovers the severity of aberrant responding. By building different item response models and response time models for different latent groups, MMM helps to separate aberrant responses from normal responses. Future research could investigate the performance of MMM when its assumptions are violated or when the data contain other types of aberrant response patterns. The computational efficiency of MMM could also be improved by fixing part of the item parameter estimates or by developing indices to help choose a suitable method.

  • Analysis of Problem-Solving Strategies in Computer-based Dynamic Assessment: The Extension and Application of a Multilevel Mixture IRT Model

    Subjects: Psychology >> Statistics in Psychology submitted time 2019-11-08

    Abstract: Problem-solving competence is defined as the capacity to engage in cognitive processing to understand and resolve problem scenarios in which a solution is not obvious. Computer-based assessments usually provide an interactive environment in which students can solve a problem by choosing among a set of available actions and taking one or more steps to complete a task. All of a student's actions are automatically recorded in system logs as coded, time-stamped strings; these strings are called process data. Process data have a multilevel structure in which actions are nested within an individual and are therefore logically interconnected. Recently, research has focused on characterizing process data and analyzing the strategies used to solve a problem. This study proposes an extended MMixIRT model that incorporates this multilevel structure into a mixture IRT model. It can identify latent groups at the process level that use different problem-solving strategies and simultaneously estimate students' abilities at the student level. The model treats the accumulated response information as the specific steps at the process level and defines a more flexible matrix that determines the weights used for ability estimation at the student level. Specifically, in the standard MMixIRT model, the student-level latent variables are generally obtained from measurements based on the process-level response variables, whereas in the extended MMixIRT model, students' final responses are used to estimate their problem-solving abilities. This research applied the model to process data recorded for one of the PISA 2012 problem-solving items (Traffic, CP007Q02). The sample comprised 3,196 students from Canada, Hong Kong-China, Shanghai-China, Singapore, and the United States; based on the log files of the process records, the final data file contained 139,990 records. It was found that (1) the model can capture the different problem-solving strategies used by students at the process level as well as provide ability estimates at the student level, and (2) the model can analyze the typical characteristics of students' problem-solving strategies across countries for targeted instructional interventions. It is concluded that the extended MMixIRT model can analyze response data at both the process and student levels. Such analyses not only play an important role in scoring but also provide valuable information to psychometricians and test developers, helping them better understand what distinguishes well-performing students from those who perform poorly, and eventually leading to better test design.
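    For orientation, a standard mixture IRT model of the kind the extended MMixIRT builds on can be written schematically as follows (our notation with a mixture 2PL, not the exact parameterization used in the paper); the extension described above lets the latent class g operate at the process level while θ is estimated at the student level:

    ```latex
    P(X_{ij}=1 \mid \theta_i, g) =
        \frac{\exp\{a_{jg}(\theta_i - b_{jg})\}}{1 + \exp\{a_{jg}(\theta_i - b_{jg})\}},
    \qquad
    P(X_{ij}=1 \mid \theta_i) = \sum_{g=1}^{G} \pi_g \, P(X_{ij}=1 \mid \theta_i, g),
    \qquad \sum_{g=1}^{G} \pi_g = 1,
    ```

    where the item parameters a_jg and b_jg are class-specific and π_g are the latent class (strategy) proportions.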
