빅데이터 환경에서 다중 슬롯머신 문제에 대한 톰슨 샘플링 방법
Thompson sampling for multi-armed bandits in big data environments
Korean J Appl Stat 2024;37(5):663-673
Published online October 31, 2024
© 2024 The Korean Statistical Society.

김민경a, 황범석1,a
Min Kyong Kima, Beom Seuk Hwang1,a

a중앙대학교 응용통계학과

aDepartment of Applied Statistics, Chung-Ang University
1Department of Applied Statistics, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea. E-mail: bshwang@cau.ac.kr
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2024-00452772) and by the Chung-Ang University research grant in 2022. This paper is based in part on Min Kyong Kim’s master’s thesis.
Received July 15, 2024; Revised August 4, 2024; Accepted August 8, 2024.
Abstract
The multi-armed bandits (MAB) problem arises in sequential decision-making and focuses on selecting, among several possible actions in a dynamic environment, the optimal action that maximizes reward. Thompson sampling, one of the representative algorithms for solving the MAB problem in the context of statistical learning theory, is known to remain applicable in complex settings when approximation techniques are employed. However, studies using real commercial service data are scarce. In this study, we use banner click data, one of the most common recommendation system environments, to evaluate the performance of Thompson sampling with and without various approximation techniques under several simulated conditions. The experimental results show that Thompson sampling with a Langevin Monte Carlo approximation performs comparably to standard Thompson sampling in big data environments. The significance of this study is that it empirically confirms that Thompson sampling with approximation techniques can achieve performance similar to the original model while retaining the inherent advantages of those techniques.
The multi-armed bandits (MAB) problem involves selecting actions to maximize rewards within dynamic environments. This study explores the application of Thompson sampling, a robust MAB algorithm, within the context of big data analytics and statistical learning theory. By leveraging large-scale banner click data from recommendation systems, we evaluate Thompson sampling’s performance across various simulated scenarios, employing advanced approximation techniques. Our findings demonstrate that Thompson sampling, particularly with Langevin Monte Carlo approximation, maintains robust performance and scalability in big data environments. This underscores its practical significance and adaptability, aligning with contemporary challenges in statistical learning.
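As a concrete illustration of the setting described in the abstract, the following is a minimal sketch of Beta-Bernoulli Thompson sampling on simulated banner clicks. The arm count, click-through rates, and horizon are hypothetical values chosen for illustration, not the paper's data or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 banners with hidden click-through rates (not the paper's data).
true_ctr = np.array([0.05, 0.04, 0.03, 0.06, 0.02])
n_arms = len(true_ctr)
horizon = 10_000

# Beta(1, 1) priors on each banner's click probability.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

clicks = 0
for t in range(horizon):
    # Thompson sampling: draw one value from each posterior and play the argmax.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.binomial(1, true_ctr[arm])  # simulated click / no click
    alpha[arm] += reward                     # conjugate posterior update
    beta[arm] += 1 - reward
    clicks += reward

print(f"total clicks: {clicks}, empirical CTR: {clicks / horizon:.4f}")
```

With conjugate Beta priors the posterior update is exact, so this version serves as the baseline against which approximate variants are compared.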
주요어 : 근사 기법, 다중 슬롯머신, 베이지안 최적화, 톰슨 샘플링, 통계적 학습
Keywords : approximation, Bayesian optimization, multi-armed bandits, statistical learning, Thompson sampling
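The abstract reports that Thompson sampling with a Langevin Monte Carlo approximation performs comparably to exact Thompson sampling in big data environments. The sketch below shows one generic way such an approximation can look: instead of drawing from each arm's Beta posterior in closed form, the draw is produced by a few unadjusted Langevin steps on the logit scale. This is illustrative only; the step-size rule, number of inner steps, and simulation settings are assumptions and do not reproduce the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: 5 banners with hidden click-through rates (not the paper's data).
true_ctr = np.array([0.05, 0.04, 0.03, 0.06, 0.02])
n_arms = len(true_ctr)
horizon = 10_000
n_langevin = 20            # inner Langevin steps per posterior draw (assumed)
step_scale = 0.5           # base step size (assumed); shrinks as data accumulate

succ = np.zeros(n_arms)    # observed clicks per banner
fail = np.zeros(n_arms)    # observed non-clicks per banner
x = np.zeros(n_arms)       # logit-scale chain state, warm-started across rounds

clicks = 0
for t in range(horizon):
    # Per-arm step size ~ 1 / (number of observations), keeping the unadjusted
    # Langevin discretization stable as the posterior concentrates.
    step = step_scale / (succ + fail + 2.0)
    for _ in range(n_langevin):
        theta = sigmoid(x)
        # Gradient of the log posterior of Beta(1 + succ, 1 + fail) pushed to the
        # logit scale: (succ + 1) - (succ + fail + 2) * theta.
        grad = (succ + 1.0) - (succ + fail + 2.0) * theta
        x = x + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(n_arms)

    theta = sigmoid(x)                    # approximate posterior draws
    arm = int(np.argmax(theta))           # Thompson sampling decision
    reward = rng.binomial(1, true_ctr[arm])
    succ[arm] += reward
    fail[arm] += 1 - reward
    clicks += reward

print(f"total clicks: {clicks}, empirical CTR: {clicks / horizon:.4f}")
```

Warm-starting each arm's chain across rounds keeps the per-round cost low, which is the practical appeal of gradient-based posterior approximations when conjugate updates are not available.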