Archive for category design of experiments

The secret sauce in Guinness beer?

I highly recommend Scientific American’s May 25 Opinion by Jack Murtagh explaining How the Guinness Brewery Invented the Most Important Statistical Method in Science. It nicely illustrates the t-test—a landmark statistical method developed by William Sealy Gosset to assess a key ingredient in Guinness beer for ideal bitterness and preservation—soft resin content in hop flowers. Gosset calculated that a 1% difference in the amount of soft resins in the hops, the best and cheapest being purchased from Oregon,* increased their value to the brewery by almost 11%.

“Near the start of the 20th century, Guinness had been in operation for almost 150 years and towered over its competitors as the world’s largest brewery. Until then, quality control on its products had consisted of rough eyeballing and smell tests. But the demands of global expansion motivated Guinness leaders to revamp their approach to target consistency and industrial-grade rigor. The company hired a team of brainiacs and gave them latitude to pursue research questions in service of the perfect brew.”

  – Jack Murtagh

Back in 2017 on National Beer Day, celebrated yearly on April 7 to commemorate the end of the USA’s prohibition on beer sales, I saluted Gosset and his very useful t-test of the significance of one treatment versus another, that is, a simple comparative experiment.**

“They began to accumulate data and, at once, they ran into difficulties because their measurements varied. The effects they were looking for were not usually clearcut or consistent, as they had expected, and they had no way of judging whether the differences they found were effects of treatment or accident. Two difficulties were confounded: the variation was high and the observations were few.”

– Joan Fisher Box,*** “Guinness, Gosset, Fisher, and Small Samples,” Statistical Science, Vol. 2, No. 1 (Feb., 1987), pp. 45-52

To see how the t-test works, check out this awesome graphical app developed by Evan Miller. Using Stat-Ease software, I cross-checked it against a case study (Example 3.3) from the second edition of Box, Hunter and Hunter’s textbook Statistics for Experimenters. It lays out a simple comparative experiment by a tomato gardener who randomly splits 11 plants for treatment either with her standard fertilizer (A) or a far more expensive one (B) that supposedly produces far better yields. Here are the yield results in pounds, which you can assess using the t-test:

  A (standard fertilizer): 29.9, 11.4, 25.3, 16.5, 21.1
  B (more expensive fertilizer): 26.6, 23.7, 28.5, 14.2, 17.9, 24.3

On average the new fertilizer increases the yield by nearly 2 pounds, but is the difference statistically significant? That would be good to know! I have the answer, but it would be no fun to tell you, being so easy to find out for yourself.
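If you would rather compute the answer than read it off the app, here is a minimal sketch of the classic pooled-variance two-sample t-test on the tomato yields, assuming Python with SciPy (my choice of tooling, not anything from the original post):

```python
# Two-sample t-test on the tomato yields (pounds); requires SciPy.
from scipy import stats

a = [29.9, 11.4, 25.3, 16.5, 21.1]          # fertilizer A (standard)
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]    # fertilizer B (new, pricier)

t_stat, p_value = stats.ttest_ind(b, a, equal_var=True)  # pooled-variance version
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```

Run it and you will have your answer on significance.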

PS: Due to the large variation between plants (a greater than 6-pound standard deviation!), this tomato study is badly underpowered. If you do an experiment like this, do anything possible to get more consistent results. Then assess power for whatever the difference is that makes changing fertilizers worthwhile. For example, let’s say that with better plant management you got the standard deviation reduced to 3 pounds and a difference of 4 pounds is needed at a minimum to make the switch in fertilizer cost-effective. Then, using Stat-Ease software’s power calculator, I figure you would need to test 3-dozen plants each in your randomized experiment to achieve an 80% probability of detecting a difference of 4 pounds given a 3-pound standard deviation. I hope you like tomatoes!
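For the power assessment, a generic two-sample calculation can be sketched as below, assuming Python with statsmodels (again my choice of tooling). Keep in mind that Stat-Ease computes power from the design itself, so its recommended run count may differ from this textbook-style estimate.

```python
# Rough sample-size estimate for a two-sample t-test; requires statsmodels.
from statsmodels.stats.power import TTestIndPower

signal = 4.0   # smallest worthwhile yield difference, pounds
sigma = 3.0    # hoped-for standard deviation after better plant management, pounds

n_per_group = TTestIndPower().solve_power(effect_size=signal / sigma,
                                          power=0.80, alpha=0.05)
print(f"plants needed per fertilizer: {n_per_group:.1f}")
```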

*As reported by Eat This Podcast in their 4/10/18 post on Guinness and the value of statistics

**National Beer Day–A fine time for fun facts and paying homage to a wickedly smart brewer from Guinness

***I was very fortunate to meet Joan Fisher Box in 2019, as related in this StatsMadeEasy blog post.


Mentos volcano rocks Rapid City

It was my pleasure to oversee another outstanding collection of fun experiments by the Chemical and Biological Engineering (CBE) students at South Dakota School of Mines and Technology (SDSMT) for this Spring semester’s Applied Design of Experiments for the Chemical Industry class presented by Stat-Ease. They continued the excellent tradition established by the class of 2020, which I reported on in my blog post “DOE It Yourself” hits the spot for distance-learning projects.

As promised, I am highlighting a few of the many A+ projects in StatsMadeEasy, particularly those with engaging videos. My first selection goes to Dakin Nolan, Erick Hoon and Jared Wilson for their “DOE Soda and Mentos Experiment”. They studied the “heterogenous nucleation of gases on a surface” as driven by the type of soda, its temperature and its volume versus the quantity of Mentos. See the results in the video (“the moment you’ve all been waiting for”). Do not miss the grand finale (“The Masterpiece”) that shows what happens if you mix 15 Mentos in a 2-liter bottle of hot Diet Coke.

It’s hard to say how high the cola spouted in the blowout at the end, but it must have made a big sticky mess of the surrounding area. Under similar conditions but at a more prudent maximum of 3 Mentos (the highest level actually tested in the DOE), Design-Expert predicts a peak of 310 inches—an impressive 25 feet of magma.

Further work will be needed to optimize the dosage of Mentos. Fifteen of the sugary oblate spheroids may be overkill. There’s always room for improvement, as well as more fun, making volcanoes.


Experiment reveals secret to maximizing microwave popcorn—Part one: Setup

Energized by a new tool in Design-Expert® software (DX) for modeling counts (to be discussed in Part 2—Analysis of results), I laid out a design of experiments (DOE) aimed at reducing the number of unpopped kernels (UPK) from microwaved popcorn. I figured that counting the UPKs would be a far more precise measure of popcorn loss than weighing them, as done in this prior study by me and my son Hank.

My new experiment varied the following two factors in a replicated, full, multilevel, categorical design done with my General Electric (GE) Spacemaker microwave oven:

A. Preheat with 1 cup of water for 1 minute on high, No [L1] vs Yes [L2]

B. Timing, GE default [L1] vs GE++ [L2] vs Popcorn Expert app [L3]

I tested the preheating (factor A) before and found it to be unproductive. However, after seeing it on this list of microwave ‘hacks’, I decided to try again. Perhaps my more precise measuring of UPK might show preheating to be of some help after all.

The timing alternatives (factor B) came about when I discovered the Popcorn Expert AI Cooking Assistant, which systematically applies the #1 hack—the two-second rule: When this much time passes between pops, stop.

By the way, I also tried the third hack—pouring the popcorn into a covered glass bowl, but that failed completely—causing a very alarming “SENSOR ERROR”. It turns out that the GE Spacemaker uses humidity to determine when your popcorn is done. The plastic cover prevented moisture from escaping. Oops! Next time I try this it will be with a perforated lid.

While reading the user manual for the first time since buying the Spacemaker 15 years ago (engineers rarely read instructions) and learning about the humidity angle, I also found out that pressing 9 twice after beginning the popcorn cook added 20 and then 10 more seconds (++) at the end.

The original experimental design of 12 runs (2×3, replicated) was laid out in a randomized recipe sheet by DX, all of them done using 3-ounce bags of Jolly Time Simply Popped Sea Salt microwave popcorn. Due to a few mistakes by the machine operator (me) misreading the run sheet, two extra runs got added—no harm done: more being better for statistical power.
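For anyone curious what such a layout amounts to, here is a bare-bones sketch of a replicated 2×3 general-factorial run sheet in plain Python; it only illustrates the idea, since DX handles the randomization, blocking and power evaluation for you.

```python
# Replicated 2x3 general factorial (12 runs) in random order; illustration only.
import itertools
import random

preheat = ["No", "Yes"]                              # factor A
timing = ["GE default", "GE++", "Popcorn Expert"]    # factor B

runs = list(itertools.product(preheat, timing)) * 2  # two replicates of each combination
random.shuffle(runs)                                 # randomize the run order

for i, (a, b) in enumerate(runs, start=1):
    print(f"Run {i:2d}: preheat = {a:3s}  timing = {b}")
```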

Part 2 of this two-part blog will delve into the analysis details, but it became readily apparent from a one-to-one comparison that the default popcorn setting of my GE microwave came up far short of Popcorn Expert for reducing UPK. However, the “++” adjustment closed the gap, as you will see.

To be continued…


Statisticians earn residuals by airing errors

A new book by David S. Salsburg provides a series of Cautionary Tales in Designed Experiments. Salsburg wrote the classic The Lady Tasting Tea, which I read with great delight. I passed along the titular story (quite amazing!) in a book review (article #4) for the July 2004 DOE FAQ Alert.

Salsburg’s cautionary tales offer a quick read with minimal mathematics on what can go wrong with poorly designed or badly managed experiments—mainly medical. I especially liked his story of the Lanarkshire Milk Experiment of 1930, which attempted to test whether pasteurization removed all the “good” from milk. Another funny bit from Salsburg, also related in The Lady Tasting Tea and passed along by me in my review, stems from his time doing clinical research at Pfizer when a manager complained about him making too many “errors”. He changed this statistical term to “residuals” to make everyone happy.

With all the controversy now about clinical trials of Covid-19 vaccines and the associated politics, Cautionary Tales in Designed Experiments offers a welcome look, with a light touch, at how far science has progressed over the past century in its experimental protocols.

“It is the well-designed randomized experiment that provides the final ‘proof’ of the finding. The terminology often differs from field to field. Atomic physicists look for “six sigma” deviations, structure-activity chemists look for a high percentage of variance accounted for, and medical scientists describe the “specificity” and “sensitivity” of measurements. But all of it starts with statistically based design of experiments.”

– David S. Salsburg, conclusion to Cautionary Tales in Designed Experiments


Magic of multifactor testing revealed by fun physics experiment: Part Three—the details and data

Detail on factors:

  A. Ball type (bought for $3.50 each from Five Below (www.fivebelow.com)):
    • 4 inch, 41 g, hollow, licensed (Marvel Spiderman) playball from Hedstrom (Ashland, OH)
    • 4 inch, 159 g, energy high bounce ball from PPNC (Yorba Linda, CA)
  B. Temperature (equilibrated by storing overnight or longer):
    • Freezer at about -4 F
    • Room at 72 to 76 F with differing levels of humidity
  C. Drop height (released by hand):
    • 3 feet
    • 6 feet
  D. Floor surface:
    • Oak hardwood
    • Rubber, 3/4″ thick, Anti Fatigue Comfort Floor Mat by Sky Mats (www.skymats.com)

Measurement:

Measurements were done with the Android Phyphox app “(In)Elastic”. Record T1 and H1, the time and height (calculated) of the first bounce. As a check, note H0, the estimated drop height—this is already known (specified by the low and high levels of factor C).
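For those wondering how a height can be calculated just from sound: the ball spends half of the interval between impacts rising and half falling, so (assuming standard gravity and negligible air resistance) the bounce height follows directly from the time between pops. A quick check in Python against the first row of the data below suggests this is indeed how the app back-calculates H1:

```python
# Bounce height from time between impacts, assuming g = 9.81 m/s^2 and no air drag.
def bounce_height_cm(t_seconds):
    g = 9.81
    # The ball rises for t/2 seconds, so h = (1/2) * g * (t/2)^2 = g * t^2 / 8
    return 100 * g * t_seconds**2 / 8

print(round(bounce_height_cm(0.618), 1))  # ~46.8 cm, matching the 46.85 cm in row 1 below
```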

Data:

Std  Run  A: Ball type  B: Temperature  C: Drop height (ft)  D: Floor type  Time (s)  Height (cm)
 1    16  Hollow        Room             3                   Wood           0.618      46.85
 2     6  Solid         Room             3                   Wood           0.778      74.14
 3     3  Hollow        Freezer          3                   Wood           0.510      31.91
 4    12  Solid         Freezer          3                   Wood           0.326      13.02
 5     8  Hollow        Room             6                   Wood           0.829      84.33
 6    14  Solid         Room             6                   Wood           1.119     153.54
 7     1  Hollow        Freezer          6                   Wood           0.677      56.17
 8     4  Solid         Freezer          6                   Wood           0.481      28.34
 9     5  Hollow        Room             3                   Rubber         0.598      43.92
10    10  Solid         Room             3                   Rubber         0.735      66.17
11     2  Hollow        Freezer          3                   Rubber         0.559      38.27
12     7  Solid         Freezer          3                   Rubber         0.478      28.03
13    15  Hollow        Room             6                   Rubber         0.788      76.12
14    11  Solid         Room             6                   Rubber         0.945     109.59
15     9  Hollow        Freezer          6                   Rubber         0.719      63.43
16    13  Solid         Freezer          6                   Rubber         0.693      58.96
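As a companion to the software analysis discussed in Part Two, here is a minimal sketch of fitting a two-factor-interaction model to the bounce times above, assuming Python with pandas and statsmodels (not the Design-Expert workflow). With the factors put into coded units, the dominant coefficients match the story in Part Two: drop height, temperature, and the ball-by-temperature and temperature-by-floor interactions stand out, while the ball and floor main effects are tiny.

```python
# Fit main effects plus all two-factor interactions to the 16 bounce times (standard order).
import pandas as pd
import statsmodels.formula.api as smf

rows = [  # ball, temp, height_ft, floor, time_s
    ("Hollow", "Room",    3, "Wood",   0.618), ("Solid", "Room",    3, "Wood",   0.778),
    ("Hollow", "Freezer", 3, "Wood",   0.510), ("Solid", "Freezer", 3, "Wood",   0.326),
    ("Hollow", "Room",    6, "Wood",   0.829), ("Solid", "Room",    6, "Wood",   1.119),
    ("Hollow", "Freezer", 6, "Wood",   0.677), ("Solid", "Freezer", 6, "Wood",   0.481),
    ("Hollow", "Room",    3, "Rubber", 0.598), ("Solid", "Room",    3, "Rubber", 0.735),
    ("Hollow", "Freezer", 3, "Rubber", 0.559), ("Solid", "Freezer", 3, "Rubber", 0.478),
    ("Hollow", "Room",    6, "Rubber", 0.788), ("Solid", "Room",    6, "Rubber", 0.945),
    ("Hollow", "Freezer", 6, "Rubber", 0.719), ("Solid", "Freezer", 6, "Rubber", 0.693),
]
df = pd.DataFrame(rows, columns=["ball", "temp", "height", "floor", "time"])

# Code each factor to -1/+1 (the usual DOE convention) so coefficients are comparable half-effects.
coded = pd.DataFrame({
    "A": df.ball.map({"Hollow": -1, "Solid": 1}),
    "B": df.temp.map({"Freezer": -1, "Room": 1}),
    "C": df.height.map({3: -1, 6: 1}),
    "D": df.floor.map({"Rubber": -1, "Wood": 1}),
    "time": df.time,
})

model = smf.ols("time ~ (A + B + C + D)**2", data=coded).fit()
print(model.params.round(3))   # C, B, A:B and B:D dominate; A and D main effects are tiny
```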

Observations:

  • Run 7: First drop produced a result >2 sec with a height of 494 cm. This is >16 feet! Obviously something went wrong. My guess is that the mic on my phone had trouble picking up the sound of the softer solid ball and missed a bounce or two. In any case, I redid the bounce.
    • Starting run 8, I will record Height 0 in Comments as a check against bad readings.
  • Run 8: Had to drop 3 times to get time registered due to such small, quiet and quick bounces.
    • Could have tried changing the threshold setting provided by the (In)Elastic app.
  • Run 14: Showed up as an outlier for height, so it was re-run. Results came out nearly the same: 1.123 s (vs 1.119 s) and 154.62 cm (vs 153.54 cm). After transforming by square root these results fell into line. This makes sense physically, distance being a function of time squared.

Suggestions for future:

  • Rather than dropping the balls by eye from a mark on the wall, release them from a more precise mechanism for better consistency in drop height
  • Adjust the drop height up by 3/4″ to account for the thickness of the mat
  • Drop multiple times for each run and trim off outliers before averaging (or use median result)
  • Record room temp to nearest degree


Magic of multifactor testing revealed by fun physics experiment: Part Two—the amazing results

The 2020 pandemic provided a perfect opportunity to spend time doing my favorite thing: Experimenting!

Read Part One of this three-part blog to learn what inspired me to investigate the impact of the following four factors on the bounciness of elastic spheroids:

  A. Ball type: Hollow or Solid

  B. Temperature: Room vs Freezer

  C. Drop height: 3 vs 6 feet

  D. Floor surface: Hardwood vs Rubber

Design-Expert® software (DX) provides the astonishing result: Neither the type of ball (factor A) nor the differing surfaces (factor D) produced significant main effects on first-bounce time (directly related to height per physics). I will now explain.

Let’s begin with the Pareto Chart of effects on bounce time (scaled to t-values).

First observe the main effects of A (ball type) and D (floor surface) falling far below the t-Value Limit: They are insignificant (p>>0.05). Weird!

Next, skipping by the main effect of factor B (temperature) for now (I will get back to that shortly), notice that C—the drop height—towers high above the more conservative Bonferroni Limit: The main effect of drop height is very significant. The orange shading indicates that increasing drop height creates a positive effect—it increases the bounce time. This makes perfect sense based on physics (and common knowledge).

Now look at the multi-view Model Graphs for all four main effects.

The plot at the lower left shows how the bounce time increased with height. The least-significant-difference ‘dumbbells’ at either end do not overlap. Therefore, the increase is significant (p<0.05). The slope quantifies the effect—very useful for engineering purposes.

However, as DX makes clear by its warnings, the other three main effects, A, B and D, must be approached with great caution because they interact with each other. The AB and BD interactions will tell the true story of the complex relationship of ball type (A), their temperature (B) and the floor material (D).

See in the interaction plot how the effect of ball type depends on the temperature. At room temperature (the top red line), going from the hollow to the solid ball produces a significant increase in bounce time. However, after being frozen, the balls behaved completely opposite—hollow beating solid (bottom green line). These opposing effects caused the main effect of ball type (factor A) to cancel out!

Incredibly (I’ve never seen anything like this!), the same thing happened with the floor surface: The main effect of floor type got washed out by the opposite effects caused by changing temperature from room (ambient) to that in the freezer (below 0 degrees F).

Changing one factor at a time (OFAT) in this elastic spheroid experiment leads to a complete fail. Only by going to the multifactor testing approach of statistical DOE (design of experiments) can researchers reveal breakthrough interactions. Furthermore, by varying factors in parallel, DOE reveals effects far faster than OFAT.

If you still practice old-fashioned scientific methods, give DOE a try. You will surely come out far ahead of your OFAT competitors.

P.S. Details on the elastic-spheroid experimental procedures will be laid out in Part 3 of this series.


Magic of multifactor testing revealed by fun physics experiment: Part One—the setup

The behavior of elastic spheres caught my attention due to a proposed, but not completed, experiment on ball bounciness turned in by a student from the South Dakota School of Mines and Technology.* I decided to see for myself what would happen.

To start, I went shopping for suitable elastic spheres. As pictured, I found two ball-toys with the same diameter—one of them with an eye-catching Spider-Man graphic.

My grandkids all thought that “Spidey” would bounce higher than the other ball—the one in swirly blue and yellow. Little did they know just by looking that “Swirley” was the one with superpowers, it being made from exceptionally elastic, solid synthetic rubber. Sadly, Spidey turned out to be a hollow airhead. This became immediately obvious when I dropped the two balls side by side from shoulder height. Spidey rebounded only to my knee while Swirley shot all the way back up to nearly the original drop level, which really amazed the children.

My next idea for the bouncy experiment came from Frugal Fun for Boys and Girls, a website that provides many great science projects. Their bouncy ball experiment focuses on the effect of temperature as seen here.

However, I could see one big problem straight away: How can you get an accurate measure of bounce height? That led me to an amazing cell-phone app called Phyphox (Physics Phone Experiments), which provided an ingenious way to calculate how high a ball bounces by listening to it hit the floor.** Watch this short video to see how. (If you are a physicist, stay on for how the narrator of the demo, Sebastian Staacks, worked out all his calculations for the Phyphox (In)elastic tool.)

The third factor came easy: Height of drop. To make this obvious but manageable, I chose three versus six feet.

The fourth and final factor occurred to me while washing dishes. We recently purchased a thick rubber mat for easy cleanup and comfortable standing in front of our sink. I realized that this would provide a good contrast to our hardwood floors for bounce height, the softer surface being obviously inferior.

To recap, the four factors and their levels I tested were:

A. Ball type: Hollow or Solid

B. Temperature: Room vs Freezer

C. Drop height: 3 vs 6 feet

D. Floor surface: Hardwood vs Rubber

Using Design-Expert® software (DX) I then laid out a two-level, full factorial of 16 runs in random order. To be sure of the temperature being stabilized, I did only one run per day, recording the time of the first bounce and its height (calculated by the Phyphox boffins as detailed in the videos).
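For a feel of what that layout amounts to, here is a bare-bones sketch in plain Python of all 16 combinations of the four two-level factors put into random run order; DX adds the bookkeeping (standard versus run order, power evaluation, and so on).

```python
# Two-level full factorial (2^4 = 16 runs) in randomized order; illustration only.
import itertools
import random

levels = {
    "Ball type": ["Hollow", "Solid"],
    "Temperature": ["Room", "Freezer"],
    "Drop height (ft)": [3, 6],
    "Floor surface": ["Hardwood", "Rubber"],
}

design = list(itertools.product(*levels.values()))  # all 16 factor combinations
random.shuffle(design)                              # randomize the run order

for run, combo in enumerate(design, start=1):
    print(f"Run {run:2d}:", dict(zip(levels.keys(), combo)))
```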

When I completed the experiment and analyzed the results using DX, I was astounded to see that neither the type of ball nor the differing surfaces produced significant main effects. That made no sense based on my initial side-by-side bounce demonstrations of the two balls on the floor versus the rubber mat.

Keeping in mind that my experiment provided a multifactor test of two other variables, perhaps you can guess what happened. I will give you a hint: Factors often interact to produce surprising results, such as time and temperature suddenly coming together to create a fire (or as I would say as a chemical engineer—an “exothermic reaction”).

Stay tuned for Part 2 of this blog on my elastic spheroid experiment to see how the factors interacted in delightful ways that, once laid out, make perfect sense even to non-physicists.

*For background on my class and an impressive list of home experiments, see “DOE It Yourself” hits the spot for distance-learning projects.

**I credit Rhett Allain of Wired for alerting me to Phyphox via his 8/16/18 post on Three Science Experiments You Can Do With Your Phone. From there he provides a link to a prior, more detailed, post on Modeling a Bouncing Ball.


Business community discovers that “Experimentation Works”

Last month the Wall Street Journal “Bookshelf” (3/15/20, David A. Shaywitz) featured a review of a book about The Surprising Power of Business Experiments.

“Tests at Microsoft in 2012 revealed that a tiny adjustment in the way its Bing search engine displayed ad headlines resulted in a 12% increase in revenue, translating into an extra $100 million annually for the company in the U.S. alone.”

– Stefan Thomke, author of Experimentation Works: The Surprising Power of Business Experiments.

It’s great to see attention paid to the huge advantages gained from statistically rigorous experiments. However, vastly greater returns await those willing to go beyond simple-comparative one-factor A/B testing to multifactor design of experiments. The reason is obvious: Only by testing more than one factor at a time can interactions be discovered.

A case in point is provided by an experiment I did on postcard advertisements. It produced a non-intuitive finding that, unlike marketers, our engineering clients preferred less colorful layouts. Knowing this, we succeeded in increasing our response at a far lower printing cost. See the proof in the interaction plot at the conclusion of this white paper on That Voodoo We Do – Marketers Are Embracing Statistical Design of Experiments.

Another compelling example of the value of multifactor testing is illustrated by website-conversion results* shown here—produced from a replicated, full, two-level factorial design.

The key to a more than 5-fold increase in clicks turned out to be the combination of going to a modern font (factor A) with a more compelling button label (C). A third factor (B), background being white versus blue, did not create a significant effect, which also provided valuable insights on the drivers for conversion.

Why settle for testing only one factor when, without investing much more time, if any, you can investigate many factors and, as a huge bonus, detect possible interactions?

*From Pochiraju & Seshadri, Essentials of Business Analytics, 2019, Springer, p 737.


Enlightenment by an accidental statistician under the Great Comet of 1996

A small, but select, group of people came Friday to the University of Wisconsin, Madison, for the celebration of George E. P. Box’s 100th birthday, including his second wife Joan Fisher, whose father Ronald invented modern-day design of experiments (DOE) and the whole field of industrial statistics. Box, who doubled down on Fisher by his development of response surface methods (RSM), went by the name “Pel”. This nickname stemmed from the second of his middle names “Edward Pelham” (E. P. not standing for Elvis Presley, as some who admired him thought more apropos).

In my blog post of March 30, 2013, just after his death, I relayed stories of my two memorable encounters with Box. Friday marked my first visit to UW-Madison since I last saw him in 1996 for his short course on DOE. Looking over Lake Mendota from the Memorial Union Terrace brought back memories of the incredible view during my class, when Comet Hyakutake peaked in spectacular fashion before rapidly diminishing. I rate Hyakutake on par with Hale-Bopp, which came a year later, just as I view Box and Fisher as the luminaries of DOE.

Inspired by the Centenary, I ordered a copy of Box’s autobiography—The Accidental Statistician, which he completed in the last year of his life. I look forward to reading more about this remarkable fellow.

The video presented by Box at the time of publication—March 2013—provides a sampling of the stories he told to inspire experimenters to be more observant and methodical:

  • How a monk discovered the secret to making champagne,
  • What to make of seeing bloody Mr. Jones running down the street pursued by Mrs. Jones with a hatchet (good one for this Halloween season!).
https://www.youtube.com/watch?v=svmKEhsp1Gg


Designed experiment creates egg-splosive results

Design-Expert® software version 12 (DX12) released this summer with a cool new tool to model binary responses, for example, pass-versus-fail quality testing. For what it’s worth, the methodology is called “logistic regression”, but suffice it to say that it handles results restricted to only two values, typically 0 or 1. The user designates which level counts as a success, most often “1”.
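For readers who have not bumped into logistic regression before, here is a minimal sketch of the idea, assuming Python with statsmodels and purely simulated data (nothing from the DX12 tool or the egg experiment): the model links the 0/1 outcomes to the factors through an S-shaped curve, so predicted probabilities always stay between 0 and 1.

```python
# Logistic regression on simulated pass/fail data; requires NumPy and statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60)                 # a single made-up factor, e.g. cooking time
p_true = 1 / (1 + np.exp(-(x - 5)))        # true probability of success rises with x
y = rng.binomial(1, p_true)                # observed 0/1 outcomes

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)                           # intercept and slope on the logit scale
print(fit.predict(np.array([[1.0, 7.0]])))  # predicted probability of success at x = 7
```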

During development of DX12 Stat-Ease moved to a penthouse office in a building with a cascade of balconies. So, when our programmers, led by Hank Anderson, considered how to test this feature with an experiment, they came up with the idea of trying various packaging on eggs to see if they could be dropped some distance without breaking—a project that high-school science teachers assign their students. However, we figured that our neighboring tenants down below and our new landlord might not be very happy about the mess that this would create. Therefore, Hank and his team took a less problematic tack by testing various factors for microwaving eggs to an edible stage. This experiment (or ‘eggs-periment’ if you like) also offered a way to vary the diet of the programmers beyond their staple of boiled ramen noodles—the focus of a prior DOE.* If they could achieve consistent success in cooking eggs by microwave, a combination of these with ramen might be the ideal sustenance for awesome coding of new versions of Design-Expert.

The Stat-Ease experiment began with a bang during the range-finding stage with an explosive result. You might say that the yolk was on us—bits of overcooked egg and shell dispersed throughout the chamber of the microwave. The picture below shows the messy aftermath (note the safety glasses).

After this learning experience (‘eggs-perience’?), Hank and his lab technician, Mike Brownson, settled into a safer range of factors, shown below, that kept the contents from reaching the catastrophic breaking point:

  1. Preheat—0 to 180 seconds
  2. Cooking time—120 to 420 seconds
  3. Power—60 to 100 percent
  4. Salt—0 to 2 teaspoons
  5. Egg size—Large or Jumbo

Hank and Mike, with input from Stat-Ease Consultant Martin Bezener, put together an ambitious design with 92 runs using Design-Expert’s custom design builder (i-optimal) for response surface methods. Heads-up: When responses are restricted to just two outcomes (binary), many more runs are required to provide adequate power than would be required for a continuous measure.

The investment of nearly 100 trials for the ‘eggs-periment’ paid off by producing significant results on pass/fail measures of undercooking and overcooking. For example, the 3D graph below shows the probability of eggs being undercooked as a function of time and power for the microwaving. Notice that the corner at the left is cut off: potentially catastrophic combinations of high power and long cooking were excluded via a multifactor constraint. Clever!
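The exact constraint used is not given in the post; as a purely hypothetical illustration of how a multifactor constraint works, the sketch below filters a grid of candidate settings with a made-up bound that excludes the high-power, long-cook corner.

```python
# Hypothetical multifactor constraint: drop candidate points where the combined
# time-and-power "dose" exceeds a chosen bound (the bound here is made up).
import itertools

candidates = list(itertools.product(range(120, 421, 30),   # cooking time, seconds
                                    range(60, 101, 5)))    # power, percent

feasible = [(t, p) for t, p in candidates if t / 420 + p / 100 <= 1.6]
print(f"{len(feasible)} of {len(candidates)} candidate settings kept")
```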


Based on models produced from this experiment, Design-Expert’s multiple-response optimization recommends a most desirable setup for microwaving eggs as follows: Heavily salted jumbos preheated to the maximum level and then cooked for 315 seconds at medium power.

Thanks to the research by Hank, Mike and Martin, our programming staff now is fueled not just by ramen, but also with eggs—a spectacular success for DX12’s new logistic-regression tools!

* “The Optimal Recession-Proof Recipe”, Brooks Henderson, pp 1-2, September 2012 Stat-Teaser, followed up by “Confirming the Optimal Ramen”, p3, January 2013 Stat-Teaser.
