Statistical significance tests: six decades of fireworks

Luis C. Silva Aycaguer

doi:10.17533/udea.rfnsp.v34n3a11


Keywords:

statistical inference, statistical significance test, confidence interval, sample size, p-values

Abstract

After several decades of criticism of inferential techniques based on statistical significance tests aimed at rejecting the so-called null hypothesis, and despite the notable consensus reached among professional statisticians, these tests remain in use both in biomedical publications, including those in public health, and in introductory statistics courses. Among the many deficiencies pointed out by the most prominent specialists, three stand out as the most obvious and the easiest to understand: the tests do not help fulfill the mission of science; the answers to the questions they address are known in advance; and the results they produce depend on an element extraneous to the reality under study, namely the sample size. The article discusses these limitations in detail, illustrates their pernicious presence in current research, and assesses the reasons why unreason persists in this matter.
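The dependence on sample size can be made concrete with a small numerical sketch. The example below is not taken from the article; it assumes a two-sample z-test with equal group sizes and a known standard deviation of 1, and the function name `two_sided_p_value` and the observed difference of 0.1 are illustrative choices. It holds the observed difference fixed and varies only the number of observations per group.

```python
import math


def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test (assumed known sd = 1, equal n)."""
    # Standard error of the difference between two means, each based on
    # n_per_group observations with standard deviation 1.
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    observed_difference = 0.1  # hypothetical value, identical in every row
    for n in (10, 100, 1000, 10000):
        p = two_sided_p_value(observed_difference, n)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"n per group = {n:>5}  p = {p:.4g}  ({verdict} at 0.05)")
```

Under these assumptions the same observed difference moves from "not significant" to "significant" purely because the sample grows, which is the third deficiency the abstract highlights.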


Author biography

Luis C. Silva Aycaguer, Escuela Nacional de Salud Pública

Doctorate in Health Sciences, Bachelor's degree in Mathematics. Escuela Nacional de Salud Pública, Havana, Cuba.
