I highly recommend Scientific American’s May 25 Opinion by Jack Murtagh explaining How the Guinness Brewery Invented the Most Important Statistical Method in Science. It nicely illustrates the t test, a landmark statistical method developed by William Sealy Gosset to assess a key ingredient in Guinness beer for ideal bitterness and preservation: the soft resin content of hop flowers. Gosset calculated that a 1% difference in the amount of soft resins in the hops, the best and cheapest being purchased from Oregon,* increased their value to the brewery by almost 11%.
“Near the start of the 20th century, Guinness had been in operation for almost 150 years and towered over its competitors as the world’s largest brewery. Until then, quality control on its products had consisted of rough eyeballing and smell tests. But the demands of global expansion motivated Guinness leaders to revamp their approach to target consistency and industrial-grade rigor. The company hired a team of brainiacs and gave them latitude to pursue research questions in service of the perfect brew.”
– Jack Murtagh
Back in 2017 on National Beer Day, celebrated yearly on April 7 to commemorate the end of Prohibition in the USA, I saluted Gosset and his very useful t test for assessing the significance of one treatment versus another, that is, a simple comparative experiment.**
“They began to accumulate data and, at once, they ran into difficulties because their measurements varied. The effects they were looking for were not usually clearcut or consistent, as they had expected, and they had no way of judging whether the differences they found were effects of treatment or accident. Two difficulties were confounded: the variation was high and the observations were few.”
– Joan Fisher Box,*** “Guinness, Gosset, Fisher, and Small Samples,” Statistical Science, Vol. 2, No. 1 (Feb., 1987), pp. 45-52
To see how the t test works, check out the awesome graphical app developed by Evan Miller. Using Stat-Ease software, I cross-checked it against a case study (Example 3.3) from the second edition of Box, Hunter and Hunter’s textbook Statistics for Experimenters. It lays out a simple comparative experiment by a tomato gardener who randomly splits 11 plants for treatment with either her standard fertilizer (A) or a far more expensive one (B) that supposedly produces far better yields. Here are the yield results in pounds, which you can assess using the t test:
- A (standard): 29.9, 11.4, 25.3, 16.5, 21.1
- B (expensive): 26.6, 23.7, 28.5, 14.2, 17.9, 24.3
On average the new fertilizer increases the yield by nearly 2 pounds, but is the difference statistically significant? That would be good to know! I have the answer, but it would be no fun to tell you when it is so easy to find out for yourself.
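If you would rather check it in code than with Stat-Ease or Miller’s app, here is a minimal sketch in Python using SciPy (my substitution, not the tools named above). It runs the pooled-variance two-sample t test that the textbook applies to these data:

```python
from scipy import stats

# Tomato yields in pounds (Box, Hunter and Hunter, Example 3.3)
a = [29.9, 11.4, 25.3, 16.5, 21.1]        # standard fertilizer (A)
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]  # expensive fertilizer (B)

# Two-sided, pooled-variance (equal_var=True) two-sample t test
t_stat, p_value = stats.ttest_ind(b, a, equal_var=True)

diff = sum(b) / len(b) - sum(a) / len(a)
print(f"mean difference = {diff:.2f} lb")
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```

Run it and the p-value settles the question; compare p against your chosen significance level (typically 0.05).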
PS: Due to the large variation between plants (a greater than 6-pound standard deviation!), this tomato study is badly underpowered. If you do an experiment like this, do anything possible to get more consistent results. Then assess power for whatever difference makes changing fertilizers worthwhile. For example, let’s say that with better plant management you got the standard deviation reduced to 3 pounds, and a difference of 4 pounds is needed at a minimum to make the switch in fertilizer cost-effective. Then, running the numbers through a power calculator such as the one in Stat-Ease software, I figure you would need to test about 10 plants per group in your randomized experiment to achieve an 80% probability of detecting a difference of 4 pounds given a 3-pound standard deviation. I hope you like tomatoes!
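You can cross-check that sample size yourself without any specialized software. This sketch (again in SciPy, an assumption on my part) computes the power of a two-sided, two-sample t test from the noncentral t distribution and searches for the smallest per-group size hitting 80% power:

```python
from math import sqrt
from scipy import stats

def two_sample_power(n, delta, sigma, alpha=0.05):
    """Power of a two-sided two-sample t test with n subjects per group,
    true mean difference delta, and common standard deviation sigma."""
    df = 2 * n - 2
    ncp = delta / (sigma * sqrt(2.0 / n))          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)        # two-sided critical value
    # Probability the noncentral t statistic exceeds the critical value
    # (the lower tail at -t_crit is negligible here)
    return 1 - stats.nct.cdf(t_crit, df, ncp)

# Smallest per-group n giving at least 80% power for a 4-lb difference
# with a 3-lb standard deviation
n = 2
while two_sample_power(n, delta=4, sigma=3) < 0.80:
    n += 1
print(n, round(two_sample_power(n, delta=4, sigma=3), 3))
```

Note how sensitive the answer is to the noise: rerun it with `sigma=6` (the variation actually seen in the tomato data) and the required group size balloons, which is why taming variation comes first.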
*As reported by Eat This Podcast in their 4/10/18 post on Guinness and the value of statistics
***I was very fortunate to meet Joan Fisher Box in 2019, as related in this StatsMadeEasy blog.