A simulation study for various propensity score weighting methods in clinical problematic situations
Korean J Appl Stat 2023;36(5):381-397
Published online October 31, 2023
© 2023 The Korean Statistical Society.

Siseong Jeong (a), Eun Jeong Min (1, a, b)

(a) Department of Biomedicine & Health Sciences, The Catholic University of Korea;
(b) Department of Medical Life Sciences, College of Medicine, The Catholic University of Korea
(1) Department of Medical Life Sciences, College of Medicine, The Catholic University of Korea, Banpo-daero 222, Seocho-gu, Seoul 06591, Korea. E-mail: ej.min@catholic.ac.kr
This study was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (No. NRF-2021R1F1A1058613).
Received March 12, 2023; Revised May 4, 2023; Accepted May 8, 2023.
Abstract
Randomization is the most representative design in clinical trials and is used to estimate treatment effects accurately. In observational studies without randomization, however, a comparison between the treatment group and the control group is biased by unadjusted differences, such as differences in patient characteristics. Propensity score weighting is widely used to address this problem: it adjusts for confounders so as to minimize bias when estimating treatment effects. Inverse probability weighting, the best-known propensity score weighting method, assigns weights proportional to the inverse of the conditional probability of being assigned to a specific treatment, given the observed covariates. However, this method is often hampered by extreme propensity scores, which result in biased estimates and excessive variance. Several alternative methods, including trimmed inverse probability weighting (trimming), overlap weights, and matching weights, have been proposed to mitigate these issues. In this paper, we conduct a simulation study to compare the performance of various propensity score weighting methods in diverse problematic situations, such as limited overlap, a misspecified propensity score model, and treatment contrary to prediction. In the simulation results, overlap weights and matching weights consistently outperform inverse probability weighting and trimming in terms of bias, root mean squared error, and coverage probability.
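The four weighting schemes named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: it assumes the standard balancing-weights formulation in which each scheme is defined by a tilting function h(e) of the propensity score e, with treated units weighted by h(e)/e and controls by h(e)/(1 - e). The function names `ps_weights` and `weighted_effect` and the trimming threshold `trim=0.1` are hypothetical choices made for this example; the abstract does not state the threshold used in the paper.

```python
import numpy as np


def ps_weights(e, z, method="ipw", trim=0.1):
    """Balancing weights from propensity scores e = P(Z = 1 | X).

    e      : propensity scores, assumed to lie strictly inside (0, 1)
    z      : treatment indicator (1 = treated, 0 = control)
    method : "ipw", "trim", "overlap", or "matching"
    trim   : symmetric trimming threshold (an assumed value, not from the paper)
    """
    e = np.asarray(e, dtype=float)
    z = np.asarray(z, dtype=int)

    if method == "ipw":
        h = np.ones_like(e)                      # h(e) = 1
    elif method == "trim":
        # Keep only units with trim <= e <= 1 - trim; others get weight 0.
        h = ((e >= trim) & (e <= 1 - trim)).astype(float)
    elif method == "overlap":
        h = e * (1 - e)                          # h(e) = e(1 - e)
    elif method == "matching":
        h = np.minimum(e, 1 - e)                 # h(e) = min(e, 1 - e)
    else:
        raise ValueError(f"unknown method: {method}")

    # Treated units get h(e) / e, controls get h(e) / (1 - e).
    return np.where(z == 1, h / e, h / (1 - e))


def weighted_effect(y, z, w):
    """Difference of weighted outcome means (treated minus control)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    treated = np.asarray(z) == 1
    mean_treated = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mean_control = np.sum(w[~treated] * y[~treated]) / np.sum(w[~treated])
    return mean_treated - mean_control
```

Under this formulation a treated unit with e = 0.01 receives an inverse probability weight of 100, whereas its overlap weight is 0.99 and its matching weight is 1. The bounded weights illustrate why the alternatives are less sensitive to extreme propensity scores.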
Keywords: propensity score, inverse probability weights, simulation study, limited overlap

