Tests of Statistical Significance: Six Decades of Fireworks

Authors

  • Luis C. Silva Aycaguer, National School of Public Health

DOI:

https://doi.org/10.17533/udea.rfnsp.v34n3a11

Keywords:

statistical inference, statistical significance tests, confidence intervals, sample size, p-values

Abstract

After decades of criticism of inferential techniques based on statistical significance tests, which essentially seek to reject the so-called null hypothesis, and despite a remarkable consensus among professional statisticians, this approach remains prevalent both in biomedical publications (including public health journals) and in introductory statistics courses. Among the many problems identified by leading specialists, three stand out as the most obvious and the easiest to understand: these tests do not contribute to the actual enterprise of science; the answers to the questions they address are known in advance; and their results depend critically on an element that is external to the domain under study, namely sample size. This paper discusses these limitations in detail, illustrates their pernicious presence in current research, and evaluates the reasons why this senseless practice survives.
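
The dependence on sample size can be illustrated with a small numerical sketch. The example is not taken from the paper: it assumes a two-sample z-test with a known standard deviation of 1, equal group sizes, and a hypothetical observed difference in means of 0.05 that is held fixed while only the sample size changes. The function name `two_sample_z_test` and all numbers are illustrative assumptions.

```python
import math

def two_sample_z_test(diff, sd, n_per_group):
    """Return (z, two-sided p-value, 95% CI) for a difference in means.

    Assumes (hypothetically) a known common standard deviation `sd`
    and equal group sizes `n_per_group`.
    """
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    half_width = 1.959964 * se               # 95% normal critical value times SE
    return z, p, (diff - half_width, diff + half_width)

# Hypothetical observed effect, held fixed: 0.05 standard deviations.
DIFF, SD = 0.05, 1.0

results = {}
for n in (50, 500, 5_000, 50_000):
    z, p, (lo, hi) = two_sample_z_test(DIFF, SD, n)
    results[n] = p
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>6}  z={z:6.2f}  p={p:.3g}  95% CI=({lo:.3f}, {hi:.3f})  -> {verdict}")
```

Under these assumptions the estimated effect never changes, yet the verdict moves from "not significant" (p of about 0.80 with 50 per group) to "significant" (p of about 0.012 with 5,000 per group). The confidence interval, by contrast, keeps reporting the same estimate and simply narrows, conveying both the magnitude of the effect and the precision with which it is known.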

Author Biography

Luis C. Silva Aycaguer, National School of Public Health

PhD in Health Sciences, Degree in Mathematics. National School of Public Health, Havana, Cuba.

References

Gerrodette T. Inference without significance: measuring support for hypotheses rather than rejecting them. Mar Ecol 2011; 32: 404-418.

Berkson J. Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 1938; 33: 526-542.

Rozeboom WW. The fallacy of the null hypothesis significance test. Psychol Bull 1960; 56:26-47.

Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. Am Psychol 1986; 44: 1276–1284.

Chernoff H. A comment. Am Stat 1986; 40(1): 5–6.

Berger J, Sellke T. Testing a point null hypothesis: the irreconcilability of P-values and evidence. J Am Stat Assoc 1987; 82: 112.

Thompson B. In praise of brilliance: Where that praise really belongs. Am Psychol 1998; 53: 799–800.

Goodman SN. Toward evidence-based medical statistics (1): The p value fallacy. Ann Intern Med 1999; 130: 995-1004.

Nicholls N. The insignificance of significance testing. B Am Meteorol Soc 2001; 82(5): 981-986.

Armstrong JS. Statistical significance tests are unnecessary even when properly done and properly interpreted: Reply to commentaries. Int J Forecasting 2007; 23: 335–336.

Hubbard R, Lindsay RM. Why p values are not a useful measure of evidence in statistical significance testing. Theor Psychol 2008; 18: 69–88.

Nester MR. An applied statistician’s creed. Appl Stat 1996; 45: 401-410.

Rozeboom WW. Good science is abductive, not hypothetico-deductive. In: Harlow LL, Mulaik SA, Steiger JH (Eds.), What if there were no significance tests? Hillsdale, NJ: Erlbaum; 1997 (pp. 366–391).

Ioannidis JPA. Why most published research findings are false. PLoS Med 2005; 2(8): e124.

Skipper JK, Guenther AL, Nass G. The sacredness of 0.05: a note concerning the uses of statistical level of significance in social science. Am Sociol 1967; 2: 16–18.

Nelder JA. Comment. J Roy Stat Soc A Sta 1985; 148(3): 238.

Kelley J. The perils of p-values: Why tests of statistical significance impede the progress of research. In: Handbook of Evidence-Based Psychodynamic Psychotherapy; 2009. p. 367-377.

Cumming G. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge; 2012.

Matthews WJ. What might judgment and decision making research be like if we took a Bayesian approach to hypothesis testing? Judg Dec Mak 2011; 6(8): 843–856.

Kruschke JK, Liddell TM. The Bayesian new statistics: two historical trends converge. Judg Dec Mak 2014; 9(6): 523-547.

Johnson DH. The insignificance of statistical significance testing. J Wildlife Manage 1999; 63(3): 763-772.

Hauer E. The harm done by tests of significance. Accident Anal Prev 2004; 36: 495-500.

Kirk RE. The importance of effect magnitude. In: Davis SF (Ed.), Handbook of research methods in experimental psychology. Oxford, UK: Blackwell; 2003 (pp. 83–105).

Lecoutre B. Training students and researchers in Bayesian methods for experimental data analysis. J Data Sci 2006; 4: 207-232.

Berkson J. Tests of significance considered as evidence. J Am Stat Assoc 1942; 37: 325–335.

Yates F. The influence of Statistical Methods for Research Workers on the development of the science of statistics. J Am Stat Assoc 1951; 46: 19-34.

Anscombe FJ. Discussion on Dr. David’s and Dr. Johnson’s Paper. J Roy Stat Soc B Met 1956; 18: 24-27.

Savage IR. Nonparametric statistics. J Am Stat Assoc 1957; 52(279): 331–344.

Bakan D. The test of significance in psychological research. Psychol Bull 1966; 66: 423-437.

Deming WE. On probability as a basis for action. Am Stat 1975; 29(4): 146-152.

Thompson B. In praise of brilliance: where that praise really belongs. Am Psychol 1998; 53: 799–800.

Friedman M. Two lucky people: Memoirs. Chicago: University of Chicago Press; 1998.

Savage IR. Nonparametric statistics. J Am Stat Assoc 1957; 52: 332-333.

Albert J. The numbers guy. The Wall Street Journal (newspaper). New York, December 7, 2007.

Faber J, Martins L. How sample size influences research outcomes. Dental Press J Orthod 2014; 19(4): 27-29.

Bakan D. The test of significance in psychological research. Psychol Bull 1966; 66: 423-437.

Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Brit Med J 1986; 292: 746-750.

International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. Ann Intern Med 1988; 108: 258-265.

Nuzzo R. Scientific method: statistical errors. P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume. Nature 2014; 506:150-152.

Lang T, Altman D. Basic statistical reporting for articles published in clinical medical journals: the SAMPL guidelines. In: Science Editors’ Handbook. EASE; 2013.

American Psychological Association. Publication manual of the American Psychological Association. 6th ed. Washington, DC; 2010.

Trafimow D, Marks M. Editorial. Basic Appl Soc Psych 2015; 37(1): 1-2.

Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat 2016. DOI: 10.1080/00031305.2016.1154108.

Matloff N. After 150 years, the ASA says no to p-values. [Internet]. Available from: https://matloff.wordpress.com/2016/03/07/after150-years-the-asa-says-no-to-p-values/ Accessed May 16, 2016.

Silva LC. Una pincelada estadística con repercusiones extrametodológicas. Salud Colectiva 2012; 7(3): 399-400.

CAPRIE Steering Committee. A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events. Lancet 1996; 348: 1329-1339.

Smith R. The trouble with medical journals. J Roy Soc Med 2006; 99:115–119.

Flotats A. Interview with Germán Velásquez. El País (newspaper). Madrid, October 25, 2015.

Stang A, Poole C, Kuss O. The ongoing tyranny of statistical significance testing in biomedical research. Eur J Epidemiol 2010; 25:225–230.

Gigerenzer G. Mindless statistics. J Socio-Econ 2004; 33: 587–606.

Läärä E. Statistics: reasoning on uncertainty, and the insignificance of testing null. Ann Zool Fenn 2009; 46(2): 138-157.

Savitz DA, Tolo KA, Poole C. Statistical significance testing in the American Journal of Epidemiology, 1970-1990. Am J Epidemiol 1994; 139(10): 1047-1052.

Tressoldi PE, Giofré D, Sella F, Cumming G. High impact=high standards? Not necessarily so. PLoS One 2013; 8(2): e56180.

Lambdin C. Significance tests as sorcery: significance tests are not. Theor Psychol 2012; 22(1): 67–90.

Fidler F, Thomason N, Cumming G, Finch S, Leeman J. Editors can lead researchers to confidence intervals but can’t make them think. Psychol Sci 2004; 15: 119-126.

Guttman L. What is not what in statistics? Statistician 1977; 26: 81-107.

Gross JH. Testing what matters (If you must test at all): A Context-Driven Approach to Substantive and Statistical Significance. Am J Polit Sci 2015; 59(3): 775–788.

Silva LC, Suárez P, Fernández A. The null hypothesis significance test in health sciences research (1995-2006): Statistical analysis and interpretation. BMC Med Res Methodol 2010; 10: 44-53.

Fidler F, Burgman MA, Cumming G, Buttrose R, Thomason N. Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology. Conserv Biol 2006; 20(5): 1539–1544.

Sedlmeier P. Beyond the significance test ritual. J Psychol 2009; 217(1): 1-5.

Odgaard EC, Fowler RL. Statistical reporting practices can be reformed confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. J Consult Clin Psych 2010; 78: 287–297.

Published

2016-09-05

How to Cite

Silva Aycaguer LC. Tests of Statistical Significance: Six Decades of Fireworks. Rev. Fac. Nac. Salud Pública [Internet]. 2016 Sep. 5 [cited 2025 Feb. 5];34(3):372-9. Available from: https://revistas.udea.edu.co/index.php/fnsp/article/view/25859