Tests of Statistical Significance: Six Decades of Fireworks
DOI: https://doi.org/10.17533/udea.rfnsp.v34n3a11
Keywords: statistical inference, statistical significance tests, confidence intervals, sample size, p-values
Abstract
After decades of criticism of inferential techniques based on statistical significance tests, whose main use is to reject the so-called null hypothesis, and in spite of the remarkable consensus among professional statisticians, these tests remain prevalent in both biomedical publications (including public health journals) and introductory statistics courses. Among the many problems identified by the most prominent specialists, three are the most obvious and the easiest to understand: these tests do not contribute to the actual enterprise of science; the answers to the questions they address are known in advance; and their results depend critically on an element external to the domain being studied, namely sample size. This paper discusses these limitations in detail, illustrates their pernicious presence in current research, and examines why this senseless practice survives.
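The dependence on sample size mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the effect size, the standard deviation, the sample sizes, and the function name `two_sided_p_value` are all hypothetical, and a one-sample z-test with known standard deviation is assumed for simplicity. The observed effect is held fixed while only the sample size changes.

```python
import math


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a one-sample z-test (null hypothesis: mean = 0).

    effect: observed sample mean (hypothetical value)
    sd:     standard deviation, assumed known
    n:      sample size
    """
    z = effect * math.sqrt(n) / sd
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


EFFECT = 0.1  # hypothetical observed mean, one tenth of a standard deviation
SD = 1.0      # hypothetical known standard deviation

for n in (10, 100, 1000, 10000):
    p = two_sided_p_value(EFFECT, SD, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>6}  p={p:.4g}  {verdict} at the 0.05 level")
```

Under these assumptions the same observed effect moves from "not significant" to "significant" purely because n grows, which is the sense in which the verdict depends on something external to the phenomenon under study.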
License
The contents of the articles are the responsibility of the authors.
The editorial committee has editorial independence from the National School of Public Health "Héctor Abad Gómez" of the University of Antioquia.
The editorial committee is not responsible for aspects related to copying, plagiarism or fraud that may appear in the articles published in the journal.
Informed consent is required to reproduce or disclose photographs or personal data in print or digital format. Authors must therefore provide it when the manuscript is received.
Authors are responsible for obtaining the necessary permissions to reproduce any material protected by reproduction rights.
The authors retain the moral rights and assign the economic rights to the University of Antioquia, which may publish the work, distribute electronic copies, and include it in indexing services, directories, or national and international Open Access databases, under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA). This license allows others to distribute, remix, retouch, and build upon the work non-commercially, as long as they give the respective credit and license their new creations under the same conditions.
The authors will sign the declaration of transfer of economic rights to the University of Antioquia, after the acceptance of the manuscript.
The editorial committee reserves the right to reject articles whose authors do not explain satisfactorily, in the submission letter, how each author's contribution meets the criteria of authorship. All authors must meet the four ICMJE criteria of authorship: a) they made a substantial contribution to the conception or design of the article or to the acquisition, analysis, or interpretation of the data; b) they participated in the design of the research work or in the critical review of its intellectual content; c) they took part in approving the final version to be published; d) they are able to answer for all aspects of the article, so that questions about the accuracy or integrity of any part of the work are adequately investigated and resolved.