Bartik, T.J., Butler, J.S., & Liu, J.T. (1992). Maximum score estimates of the determinants of residential mobility: implications for the value of residential attachment and neighborhood amenities. Journal of Urban Economics, 32(2), 233-256.
Bertsimas, D., King, A., & Mazumder, R. (2016). Best subset selection via a modern optimization lens. The Annals of Statistics, 44(2), 813-852.
Blundell, R., Fry, V., & Walker, I. (1988). Modelling the take-up of means-tested benefits: the case of housing benefits in the United Kingdom. The Economic Journal, 98(390), 58-74.
Caudill, S.B. (2003). Predicting discrete outcomes with the maximum score estimator: The case of the NCAA men’s basketball tournament. International Journal of Forecasting, 19(2), 313-317.
Chen, L.Y., & Lee, S. (2018). Best subset binary prediction. Journal of Econometrics, 206(1), 39-56.
Chen, L.Y., & Lee, S. (2018). Exact computation of GMM estimators for instrumental variable quantile regression models. Journal of Applied Econometrics, 33(4), 553-567.
Chen, L.Y., & Lee, S. (2021). Binary classification with covariate selection through ℓ0-penalised empirical risk minimisation. The Econometrics Journal, 24(1), 103-120.
Conforti, M., Cornuéjols, G., & Zambelli, G. (2014). Integer Programming. Berlin: Springer.
Cover, T.M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, EC-14(3), 326-334.
Danilov, D., & Magnus, J.R. (2004). On the harm that ignoring pretesting can cause. Journal of Econometrics, 122(1), 27-46.
Das, S. (1991). A semiparametric structural analysis of the idling of cement kilns. Journal of Econometrics, 50(3), 235-256.
Elliott, G., & Lieli, R.P. (2013). Predicting binary outcomes. Journal of Econometrics, 174(1), 15-26.
Florios, K., & Skouras, S. (2008). Exact computation of max weighted score estimators. Journal of Econometrics, 146(1), 86-91.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Fung, G.M., & Mangasarian, O.L. (2004). A feature selection Newton method for support vector machine classification. Computational Optimization and Applications, 28(2), 185-202.
Granger, C.W.J., & Machina, M.J. (2006). Forecasting and decision theory. Handbook of Economic Forecasting, 1, 81-98.
Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint. The Annals of Statistics, 34(5), 2367-2386.
Horowitz, J.L. (1993). Semiparametric estimation of a work-trip mode choice model. Journal of Econometrics, 58(1-2), 49-70.
Kitagawa, T., & Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2), 591-616.
Kosorok, M.R. (2008). Introduction to Empirical Processes and Semiparametric Inference. New York: Springer.
Lee, S., Liao, Y., Seo, M.H., & Shin, Y. (2021). Sparse HP filter: finding kinks in the COVID-19 contact rate. Journal of Econometrics, 220(1), 158-180.
Lee, S., Liao, Y., Seo, M.H., & Shin, Y. (2021). Factor-driven two-regime regression. The Annals of Statistics, 49(3), 1656-1678.
Li, C.Z. (1996). Semiparametric estimation of the binary choice model for contingent valuation. Land Economics, 72(4), 462-473.
Lieli, R.P., & White, H. (2010). The construction of empirical credit scoring rules based on maximization principles. Journal of Econometrics, 157(1), 110-119.
Little, J.D., Murty, K.G., Sweeney, D.W., & Karel, C. (1963). An algorithm for the traveling salesman problem. Operations Research, 11(6), 972-989.
Magnus, J.R., & Durbin, J. (1999). Estimation of regression coefficients of interest when other regression coefficients are of no interest. Econometrica, 67(3), 639-643.
Manski, C.F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics, 3(3), 205-228.
Manski, C.F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. Journal of Econometrics, 27(3), 313-333.
McCullagh, P., & Nelder, J. (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC.
Nelder, J.A., & Wedderburn, R.W. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370-384.
Shin, Y., & Todorov, Z. (2021). Exact computation of maximum rank correlation estimator. The Econometrics Journal, 24(3), 589-607.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5), 1-13.
Simon, N., Friedman, J., & Hastie, T. (2013). A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv:1311.6529.
Smith, F.W. (1968). Pattern classifier design by linear programming. IEEE Transactions on Computers, C-17(4), 367-372.
Su, J.H. (2021). Model selection in utility-maximizing binary prediction. Journal of Econometrics, 223(1), 96-124.
Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. The Annals of Probability, 22(1), 28-76.
Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., & Tibshirani, R.J. (2012). Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(2), 245-266.
Vapnik, V.N., & Chervonenkis, A.Y. (1971). On uniform convergence of the frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2), 264-280.