The problem of separation in logistic regression, a solution and an application

Authors

  • Juan C. Correa M., National University of Colombia
  • Marisol Valencia C., Pontifical Bolivarian University

DOI:

https://doi.org/10.17533/udea.rfnsp.8770

Keywords:

logistic model, maximum likelihood estimation, menarche

Abstract

Logistic regression is one of the most widely used statistical techniques for explaining the probabilistic behavior of a given phenomenon. Data separation is a frequent problem in this model. When the successes are separated from the failures, the likelihood keeps increasing as the coefficients grow without bound, so no finite maximum likelihood estimates exist. Objective: to review the problem, present a solution to it, and compare that solution with others. Methodology: the logistic model was simulated, and the bias of the parameter estimates was evaluated for the proposed solution, which adds fictitious observations under a classical and a Bayesian approach, and for the Firth method. Results: the bias is lower when the pair of fictitious observations is generated with the Bayesian method. An example on the age at menarche is also presented. Discussion: an appropriate solution to the problem of separation is provided using a simulation of a simple logistic model. Conclusions: generating fictitious observations within the separation region is recommended, and the best solution method is the one based on Bayesian theory, which achieves convergence of the parameter estimates of the logistic model.
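As a hedged illustration of the separation problem described above, the sketch below uses a small hypothetical data set (`xs`, `ys` are invented values, not the menarche data). It shows that, under complete separation, the log-likelihood keeps rising along a fixed direction, so no finite maximizer exists. It then fits two finite alternatives: ordinary maximum likelihood after adding one fictitious success and one fictitious failure inside the separation gap, and a Firth-type fit that uses the modified score y − p + h(1/2 − p), where h is the leverage, with Fisher scoring. The placement of the fictitious pair (success at 3.25, failure at 3.75) is an assumption chosen for illustration and is not necessarily the authors' classical or Bayesian construction; the helper names `softplus`, `sigmoid`, `loglik`, and `fit` are hypothetical.

```python
import math

def softplus(z):
    # log(1 + exp(z)) computed without overflow
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(eta):
    # numerically stable logistic function 1 / (1 + exp(-eta))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def loglik(b0, b1, xs, ys):
    # Bernoulli log-likelihood of logit P(y = 1 | x) = b0 + b1 * x
    # log p = -softplus(-eta) and log(1 - p) = -softplus(eta)
    return sum(-softplus(-(b0 + b1 * x)) if y == 1 else -softplus(b0 + b1 * x)
               for x, y in zip(xs, ys))

def fit(xs, ys, firth=False, max_iter=200, tol=1e-10):
    # Fisher scoring for (b0, b1); with firth=True the score is replaced by
    # the Firth-type modified score sum (y - p + h * (1/2 - p)) * (1, x).
    b0 = b1 = 0.0
    for it in range(1, max_iter + 1):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        ws = [p * (1.0 - p) for p in ps]
        # Fisher information I = X'WX for design rows (1, x)
        i00 = sum(ws)
        i01 = sum(w * x for w, x in zip(ws, xs))
        i11 = sum(w * x * x for w, x in zip(ws, xs))
        det = i00 * i11 - i01 * i01
        g0 = g1 = 0.0
        for x, y, p, w in zip(xs, ys, ps, ws):
            r = y - p
            if firth:
                # leverage h = w * (1, x) I^{-1} (1, x)'
                h = w * (i11 - 2.0 * i01 * x + i00 * x * x) / det
                r += h * (0.5 - p)
            g0 += r
            g1 += r * x
        # scoring step: delta = I^{-1} g
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, it
    raise RuntimeError("no convergence")

# Hypothetical, completely separated data: every failure lies below x = 3.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

# Along (b0, b1) = (-3.5 t, t) the log-likelihood rises toward 0 as t grows,
# so the maximum is never attained at finite coefficients.
profile = [loglik(-3.5 * t, t, xs, ys) for t in (1, 2, 4, 8, 16)]

# Illustrative fix (assumed placement): a fictitious success and a fictitious
# failure inside the gap (3, 4), in crossed order so the two classes overlap.
ml_aug = fit(xs + [3.25, 3.75], ys + [1, 0])

# The Firth-type estimate is finite even on the separated data.
firth_sep = fit(xs, ys, firth=True)

print("profile:", [round(v, 4) for v in profile])
print("ML with fictitious pair:", ml_aug)
print("Firth on separated data:", firth_sep)
```

Running it should show a strictly increasing profile approaching 0, and finite estimates in both fitted cases. Because the toy data are symmetric about x = 3.5 (with the labels flipped), both fits satisfy b0 ≈ −3.5·b1. Note that placing both fictitious points at the same x would leave the data quasi-completely separated, which is why the sketch crosses them.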



Author Biographies

Juan C. Correa M., National University of Colombia

Ph.D. in Statistics, University of Kentucky. Professor, National University of Colombia, Medellín, Colombia.

Marisol Valencia C., Pontifical Bolivarian University

Master's in Statistics, National University of Colombia. Professor, Pontifical Bolivarian University, Medellín, Colombia.

Published

2012-01-24

How to Cite

1.
Correa M. JC, Valencia C. M. The problem of separation in logistic regression, a solution and an application. Rev. Fac. Nac. Salud Pública [Internet]. 2012 Jan. 24 [cited 2025 Dec. 7];29(3):281-8. Available from: https://revistas.udea.edu.co/index.php/fnsp/article/view/8770

Issue

Vol. 29 No. 3

Section

Research