Horse race rank prediction using learning-to-rank approaches
Korean J Appl Stat 2024;37(2):239-253
Published online April 30, 2024
© 2024 The Korean Statistical Society.

Junhyoung Chunga, Donguk Shina, Seyong Hwanga, Gunwoong Park1,a


aDepartment of Statistics, Seoul National University
1Department of Statistics, Seoul National University, 1 Gwanak-ro, Gwanak-Gu, Seoul 08826, Korea. E-mail: gwpark23@snu.ac.kr
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2021R1C1C1004562 and RS-2023-00218231). Additionally, this work was supported by the New Faculty Startup Fund from Seoul National University.
Received September 24, 2023; Revised November 11, 2023; Accepted November 27, 2023.
Abstract
This research applies both point-wise and pair-wise learning strategies within the learning-to-rank (LTR) framework to predict horse race rankings in Seoul. Specifically, for point-wise learning we employ a linear model and a random forest, while for pair-wise learning we use RankNet and LambdaMART (XGBoost Ranker, LightGBM Ranker, and CatBoost Ranker). Furthermore, to address data imbalance, race records are standardized by race distance during preprocessing, and to enhance predictions we integrate various datasets, including race information, jockey information, horse training records, and trainer information. Our results empirically demonstrate that pair-wise learning approaches, which can reflect the order information between items, generally outperform point-wise learning approaches. Notably, CatBoost Ranker is the top performer among the proposed models. Through Shapley value analysis, we find that the top ten important variables for CatBoost Ranker include the horse's past performance, its most recent race record, its number of starting trainings, its cumulative number of starting trainings, and its number of disease diagnoses.
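Two ideas from the abstract can be sketched in a few lines of Python; this is an illustrative sketch, not the authors' code, and the race distances and finish times are hypothetical. The first function standardizes finish times within each race distance (a z-score per distance group, as in the paper's preprocessing); the second shows the core of pair-wise learning in RankNet (Burges et al., 2005), where the probability that horse i finishes ahead of horse j is a logistic function of the score difference.

```python
import math

# Hypothetical race records: (race_distance_m, finish_time_s)
records = [
    (1200, 73.5), (1200, 74.1), (1200, 75.0),
    (1800, 112.0), (1800, 113.8), (1800, 111.2),
]

def standardize_by_distance(records):
    """Z-score finish times within each race distance so that
    records from different distances are comparable."""
    by_dist = {}
    for dist, t in records:
        by_dist.setdefault(dist, []).append(t)
    stats = {}
    for dist, times in by_dist.items():
        mean = sum(times) / len(times)
        var = sum((t - mean) ** 2 for t in times) / len(times)
        stats[dist] = (mean, math.sqrt(var))
    return [(d, (t - stats[d][0]) / stats[d][1]) for d, t in records]

def ranknet_prob(s_i, s_j):
    """RankNet models P(item i ranks above item j) as a sigmoid
    of the score difference s_i - s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

standardized = standardize_by_distance(records)
```

Pair-wise learners such as RankNet and LambdaMART minimize a loss over such pairwise probabilities within each race, which is why they can exploit the order information between horses that point-wise regression ignores.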
Keywords : horse race, lambdamart, learning-to-rank, ranknet, rank prediction