This December 13 article published by The New Yorker adds fuel to the fire for deemphasizing significance testing as the criterion for accepting purported advancements in science. It’s well worth reading for anyone with a stake in statistics, despite raking over the same coals seen in this March 27 Science News article, which I discussed in a previous blog.*
“A lot of extraordinary scientific data are nothing but noise.”
– Jonah Lehrer, author of “The Truth Wears Off: Is there something wrong with the scientific method?”
Evidently much of the bad science stems from “significance-chasers” – those who hunt for findings that clear the generally accepted 5% p-value threshold for hypothesis testing. Unfortunately, a statistically significant outcome from a badly designed experiment is of no value whatsoever.
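To see why significance-chasing is so seductive, here is a minimal sketch (my own illustration, not from the article) of what happens when you run many comparisons on pure noise: by construction there is no real effect anywhere, yet roughly 5% of the tests still slip under the 0.05 bar.

```python
# Hypothetical sketch: "significance-chasing" on pure noise.
# No real effects exist, yet ~5% of comparisons pass p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n_per_group = 200, 30

false_positives = 0
for _ in range(n_tests):
    a = rng.normal(size=n_per_group)  # group A: pure noise
    b = rng.normal(size=n_per_group)  # group B: same distribution, no true difference
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} noise-only comparisons came out 'significant'")
# Expect on the order of 10 (5%) -- each one a tempting "finding" for a significance-chaser.
```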
PS. I credit blogger William Briggs for bringing this article to my attention. His attitude comes through succinctly in this assertion: “Scientists are too damn certain of themselves.”
*Misuse of statistics calls into question the credibility of science March 28, 2010.
#1 by jem777dy on February 2, 2011 - 3:51 pm
As a chemist and statistician, I am continually appalled at how poorly designed a lot of experiments are. It does not bother me that the results of experiments change, sometimes greatly. I tried to help a few of my friends analyze data and design experiments for their psychology studies. All they wanted to do was test group a versus group b. When I suggested that they add in covariates (age, gender, race, etc.), their reply was, “That’s not part of the study. I want a simple comparison, not a PhD thesis!”
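A minimal sketch of the point the commenter is making, using made-up data (the age covariate and effect sizes are invented for illustration): the “simple comparison” is a bare two-group t-test, while adding the covariate to the model (ANCOVA-style) can give a very different answer to the same question.

```python
# Hypothetical illustration: plain group comparison vs. the same comparison
# adjusted for a covariate. Data and effect sizes are made up.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(7)
n = 60
age = rng.uniform(20, 60, size=n)
group = np.repeat(["a", "b"], n // 2)
# Response driven mostly by age, plus a small group effect:
score = 0.5 * age + (group == "b") * 2 + rng.normal(scale=5, size=n)
df = pd.DataFrame({"score": score, "group": group, "age": age})

# The "simple comparison" the friends asked for:
t, p = stats.ttest_ind(df.score[df.group == "a"], df.score[df.group == "b"])
print(f"t-test p-value, ignoring age: {p:.3f}")

# Same question with the covariate in the model:
fit = smf.ols("score ~ group + age", data=df).fit()
print(f"group effect adjusted for age: {fit.params['group[T.b]']:.2f}, "
      f"p = {fit.pvalues['group[T.b]']:.3f}")
```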
I believe that we, as people who have at least some statistical knowledge, need to help our fellow scientists and engineers create, design, and analyze better experiments. I read a lot of articles where the authors get publications based upon simple comparisons using OFAT, or misuse and abuse factorial designs, response surfaces, and mixture designs. When I see these types of articles, I will write the author(s) and sometimes the journal editors. I’ll ask the authors why they didn’t use more rigorous statistical methods or more properly analyze their data. I have written a few editors about how some authors don’t use appropriate methods to analyze the data. When they do respond, the general feeling is that the articles are publication-worthy – even if a proper analysis of the data yields a completely different outcome than what the authors published!
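For readers unfamiliar with the OFAT complaint above, here is a small illustrative sketch (invented response function, not from any of the articles mentioned): a replicated 2×2 factorial design estimates the interaction between two factors, something one-factor-at-a-time runs simply cannot see.

```python
# Hypothetical sketch: a 2x2 factorial catches an A:B interaction that
# one-factor-at-a-time (OFAT) experimentation would average away.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
levels = [-1, 1]
runs = []
for a, b in itertools.product(levels, levels):
    for _ in range(4):  # 4 replicates per design point
        # Made-up response with main effects and a strong interaction:
        y = 10 + 2 * a + 1 * b + 3 * a * b + rng.normal(scale=1.0)
        runs.append({"A": a, "B": b, "y": y})
df = pd.DataFrame(runs)

# Fit main effects plus the interaction term:
fit = smf.ols("y ~ A * B", data=df).fit()
print(fit.summary().tables[1])  # the A:B row is large and significant here
```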
In the end, I just keep thinking, GIGO.
#2 by Eric Kvaalen on February 8, 2011 - 9:14 am
A very interesting article. But I don’t think the blame can be placed on badly designed experiments. There was nothing wrong with the design in the examples given in the article. The article mentions “faulty design” as “the real problem” (at least for some cases), but it doesn’t seem to mean that DOE was not used. Rather, it means that certain things were not specified beforehand, et cetera. But even that doesn’t really explain, or fully explain, the phenomenon referred to. It doesn’t explain the results of Schooler on precognition.
If it’s true that whatever has been shown true at one time ceases to be significant later, then “this too shall pass” and everything will be all right again!