Andersen PK and Gill RD (1982). Cox’s regression model for counting processes: A large sample study, The Annals of Statistics, 10, 1100-1120.
Bregman LM (1967). The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
Colson B, Marcotte P, and Savard G (2005). Bilevel programming: A survey, 4OR, 3, 87-107.
Cox DR (1975). Partial likelihood, Biometrika, 62, 269-276.
Dempe S (2002). Foundations of Bilevel Programming. Nonconvex Optimization and Its Applications Vol. 61. Springer, Boston, MA.
Dispenzieri A, Katzmann JA, Kyle RA et al. (2012). Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population, Mayo Clinic Proceedings, 87, 517-523.
Harrell FE, Califf RM, Pryor DB, Lee KL, and Rosati RA (1982). Evaluating the yield of medical tests, Journal of the American Medical Association, 247, 2543-2546.
Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, and Kluger Y (2018). DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network, BMC Medical Research Methodology, 18, 1-12.
Kvamme H, Borgan Ø, and Scheel I (2019). Time-to-event prediction with neural networks and Cox regression, Journal of Machine Learning Research, 20, 1-30.
McMahan HB (2017). A survey of algorithms and analysis for adaptive online learning, Journal of Machine Learning Research, 18, 1-50.
Orabona F, Crammer K, and Cesa-Bianchi N (2015). A generalized online mirror descent with applications to classification and regression, Machine Learning, 99, 411-435.
Vicente LN and Calamai PH (1994). Bilevel and multilevel programming: A bibliography review, Journal of Global Optimization, 5, 291-306.