Workplace panel survey data analysis using Bayesian cumulative probit linear mixed model
Korean J Appl Stat 2024;37(6):783-799
Published online December 31, 2024
© 2024 The Korean Statistical Society.

Minji Kwon (a), Keunbaik Lee (1, a)

(a) Department of Statistics, Sungkyunkwan University
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A2C1002752, RS-2024-00416117). This paper is based on part of Minji Kwon's thesis.
(1) Corresponding author: Department of Statistics, Sungkyunkwan University, 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul 03063, Korea. E-mail: keunbaik@skku.edu
Received March 16, 2024; Revised May 16, 2024; Accepted June 11, 2024.
Abstract
Longitudinal data are measured repeatedly over time on the same subject. The repeated outcomes are therefore correlated, and the effect of covariates on the response variable must be estimated while accounting for this correlation. In longitudinal ordinal data analysis, covariate effects are estimated using generalized linear mixed models in which the conditional cumulative probabilities are modeled through a logit or a probit link function. In this paper, we review generalized linear mixed models and marginalized models with these two link functions for longitudinal ordinal data analysis. We then use a recently proposed Bayesian cumulative probit linear mixed model to analyze Korean workplace panel survey (WPS) data, which are longitudinal ordinal data. In this model, the conditional correlation matrix of the latent variables is high-dimensional and must be positive definite; it is modeled and estimated using the hypersphere decomposition. In the WPS data, the corporate training participation rate, an ordinal variable, is taken as the response. Several models assuming different autocorrelation structures for the correlation matrix are compared, and the most suitable model is presented. Under that model, the year effects, profit-sharing scheme status, average annual training hours per person, and labor union status are found to have significant effects on the corporate training participation rate.
Keywords: covariance-correlation matrix, hypersphere decomposition, longitudinal ordinal data, Markov chain Monte Carlo, t-distribution
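The two ingredients named in the abstract, the hypersphere decomposition and the cumulative probit link, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the angle layout, and the cutpoints are hypothetical, and it assumes the standard textbook forms, namely a correlation matrix written as R = L L^T with each row of the lower-triangular L a unit vector parameterized by angles in (0, pi), and category probabilities obtained as differences of standard normal CDF values at cutpoints minus a linear predictor.

```python
import math


def hypersphere_to_correlation(angles):
    """Build a correlation matrix R = L L^T from angles in (0, pi).

    angles[i] holds the i angles for row i (row 0 has none); this
    layout is an assumption made for illustration only.
    """
    n = len(angles)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sin_prod = 1.0  # running product of sines along row i
        for j in range(i):
            L[i][j] = math.cos(angles[i][j]) * sin_prod
            sin_prod *= math.sin(angles[i][j])
        L[i][i] = sin_prod  # closes row i so that it has unit length
    # Unit-length rows give a unit diagonal; L L^T is positive definite
    # whenever every diagonal entry of L is positive.
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta, cutpoints):
    """P(Y = k) for an ordinal response under a cumulative probit link.

    eta is the linear predictor (fixed plus random effects); cutpoints
    must be increasing. Both are hypothetical inputs.
    """
    cum = [std_normal_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Example: three time points, so rows carry 0, 1, and 2 angles.
R = hypersphere_to_correlation([[], [math.pi / 3], [math.pi / 2, math.pi / 4]])
probs = category_probabilities(0.3, [-1.0, 0.0, 1.2])
```

Because the angles are unconstrained within (0, pi), any choice yields a valid correlation matrix, which is what makes this parameterization convenient for high-dimensional correlation structures.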

