A simple statistic reveals amazing wisdom from crowds

My good friend Rich Burnham, knowing my interest in off-beat science and stats, drew my attention to this video by YouTuber Michael Stevens (aka “Vsauce”) on an experiment that failed to confirm a phenomenon called the “wisdom of the crowds.”

Normally, as demonstrated by Sir Francis Galton in 1906 from data collected at a country fair on 787 guesses at the weight of an ox,* groups of people exhibit a high level of collective intelligence via a simple median (the “middlemost estimate”)—being off by only 9 pounds for the 1,198 pound ox. This amazes me—blowing away my mindset that the wisdom of a crowd degrades to the ‘lowest common denominator,’ that is, the people with the least knowledge.

Experts agree with Vsauce’s hypothesis that the complete failure of his crowd to correctly guess the number of jelly beans in his jar stemmed from the estimates being shared, rather than gathered with no cross-talk.

“The wisdom of crowds requires that people’s estimates be independent. Studies have found that when people can observe the estimates of others, the accuracy of the crowd typically goes down. People’s errors become correlated or dependent, and are less likely to cancel each other out. We follow our peers, to the detriment of the performance of the group.”

  – Psychology professor Tania Lombrozo, No Man Is An Island: The Wisdom Of Deliberating Crowds, posted 3/12/18 by WGCU, a National Public Radio-member station on Florida’s Gulf Coast
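
The independence requirement in the quote above can be illustrated with a small simulation. This is only a sketch on synthetic data: the noise level, the size of the shared bias, and the function name are assumptions made up for illustration. Only the crowd size (787) and the ox's weight (1,198 pounds) come from the Galton story.

```python
import random
import statistics

TRUE_WEIGHT = 1198   # pounds, the ox weight from the Galton story
CROWD_SIZE = 787     # number of guesses Galton collected


def median_error(shared_bias, noise_sd=75.0, seed=1):
    """Return |median guess - truth| for one simulated crowd.

    Each guess = truth + shared_bias + individual noise.
    shared_bias and noise_sd are made-up values for illustration.
    """
    rng = random.Random(seed)
    guesses = [
        TRUE_WEIGHT + shared_bias + rng.gauss(0, noise_sd)
        for _ in range(CROWD_SIZE)
    ]
    return abs(statistics.median(guesses) - TRUE_WEIGHT)


# Independent guessers: individual errors largely cancel in the median.
independent = median_error(shared_bias=0.0)

# Guessers who all anchor on the same posted number share an error
# that no amount of aggregation can cancel.
herded = median_error(shared_bias=150.0)

print(f"independent crowd misses by {independent:.1f} lb")
print(f"herded crowd misses by {herded:.1f} lb")
```

The shared bias is a crude stand-in for people following their peers. Real herding is messier, but the point carries over: the median can only average away errors that differ from person to person.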

I made the same mistake in a 2019 contest for my Anderson clan. While vacationing together at a lakeside resort, I gathered individuals’ estimates on the number of aluminum-can pull-tabs I’d collected for donation to the Ronald McDonald House in Minneapolis. See the picture below of my wife Karen (holding Bertie) working with our oldest grandchild Archer to do the count. I asked the participants to write down their guesses on a clipboard by the jar, which created more fun via the gaming aspects of going just above or below a competitor, but violated the statistical requirement for independence.

An interesting workaround that allows collaboration for tapping the “wisdom of the crowds” is to first break the group into a number of teams and then average out their consensus estimates. See the research, based on results from a group of 5,180 people asked to estimate the height of the Eiffel Tower or the like, in this 2018 Letter in Nature Human Behaviour on Aggregated knowledge from a small number of debates outperforms the wisdom of large crowds.

To keep things simple, the next time my bottle of pull-tabs fills up for another contest to guess the total, I will go with the simpler approach for crowd wisdom by banning cross talk and then seeing if the median estimate wins. If it doesn’t work, I will blame it on our family group being too small (though it does exceed 20—all in one cabin!).

* “Vox Populi,” Nature, 1907


An homage to an engineer extraordinaire—my dad

James J. (Jim) Anderson, my dad (seen pictured in 1990 with former President Gerald Ford), passed away peacefully on Wednesday, January 29, at the age of 95 after being hit hard by influenza A (be careful out there—it’s spiking now across the USA). Both of us being engineers, him inspired by his dad—an engineer also, and me by him, we enjoyed many great talks in his later years about technical matters along the lines of this StatsMadeEasy blog. This is my homage to Jim’s engineering accomplishments, mostly based on what I gathered from his stories (which I will greatly miss) and my memories, thus off a bit on some of the particulars, but fairly accurate, hopefully. It’s quite a story!

Jim was a stellar masters civil engineer who specialized in wastewater treatment before the realization of the damage being wreaked on our rivers by unconstrained dumping. His first job after achieving his bachelor’s degree in sanitary engineering in 1952—a newly created specialty at University of Minnesota—was a demonstration project for The American Meat Institute aimed at cleaning up waste streams from the Hormel plant in Austin, Minnesota (where I was born in 1953). He then moved us (by then my sister Nancy also) to the Twin Cities for a job at Toltz, King, Duvall, Anderson, and Associates (now known as TKDA), being assigned as City Engineer for West Saint Paul.

Jim then took an engineering job at the aptly named Pigs Eye Wastewater Treatment Plant in Saint Paul. While there he became skeptical of sketchy studies dating back to the 1930s that led to heavy use of flocculants at a substantial ongoing cost. This seemed to produce little effect, but Jim knew it would take a definitive experiment to overcome the ‘common knowledge’ of its efficacy. Someone suggested that a fellow at University of Wisconsin by the name of George Box might provide some help on the design of experiments (DOE). Sure enough, the DOE designed by Box did the trick—no more wasteful use of chemicals after that.

In 1968, Jim completed his master’s thesis, which involved regression modeling—a tool undergoing rapid development at that time. He had to defend the methodology against a professor who doubted anything produced by a computer could ever be relied upon.

About then, Jim was at the right place at the right time by submitting a proposal to the Federal Water Pollution Control Administration (created in 1965 by the Water Quality Act) for ways to deal with secondary sewage: the surge of dirty water pouring through single-pipe systems in Saint Paul and many other cities after heavy rains. To his surprise, a grant came through in an amount that seemed overwhelmingly large at the time: $1,741,000.* Even more surprisingly, his engineering manager went for it. Predictive modeling of rainfall based on regression was the key to Jim’s solution for dealing with overflow. Knowledge was power. With the surges anticipated, weirs (small mechanical barriers inside the sewage pipe) and inflatable dams trapped the water long enough for it to be released in a volume that would not overwhelm the wastewater treatment plant.

Other cities jumped on his solutions, starting with Cleveland, infamous for their 1969 Cuyahoga River fire. Jim then started a consulting firm called Watermation, which did a lot of good for major cities worldwide. The timing again was ideal due to the rapid development of computing power, such as the Digital Equipment Corporation (DEC) PDP-8 used by Jim and his team—the same machine that got Bill Gates going on Microsoft.

That’s enough to provide the gist of what Jim did that fascinated me from an early age and still amazes me as a chemical process development engineer and fan of stats and computers. Not only was he a great mind, Dad could do anything hands on: welding, wiring, soldering (a whiz at fixing TVs and radios), plumbing, woodworking, etc. He brought me and my six younger siblings up to Saint Paul YMCA’s Camp DuNord for many years. Naturally when the time came to put up outhouses, he got volunteered. ; ) One of my proudest memories was Dad winning a contest for log splitting by cutting it in 4 parts with two chops, the advantage of being an engineer (and handy with an ax).

By the way, though a bit too young to serve in World War II, Jim graduated from training in Pensacola, Florida as a Naval Aviator before being honorably discharged in early 1949 due to reductions in the armed forces at the time. His service falls within the window of WWII vets, thus I feel that all-in-all he deserves consideration as an honorable member of the Greatest Generation. I can certainly say there will never be another individual of the caliber of James Joseph Anderson.

*Page down the Selected Urban Storm Water Runoff Abstracts, Second Quarterly Issue to #048, Interim Report to the Federal Water Pollution Control Administration, a May 1969 interim report on this demonstration project.

PS: A few more of Jim’s engineering achievements that I gleaned from searching the internet (probably not comprehensive):

  • Publication of “Remote Control of Combined Sewer Overflows” (co-authored by his colleague Robert L. Callery) in the Journal of the Water Pollution Control Federation, Vol. 46, No. 11 (Nov., 1974), pp. 2555-2564 (10 pages)—see the abstract for some very impressive statistics on reduction of pollution due to work by him and Watermation.
  • Him speaking on “Present Practice and Research Needs in Wastewater Collection System Design and Operation” for a workshop in 1975 sponsored by the EPA.
  • US Patent 4,168,233 for an “Automatic activated sludge control system” (published 9/18/1979)


Distance learning vs in-person training—pros and cons

In March of 2020 when the Covid-19 pandemic came to a head with widespread quarantines, the Stat-Ease training team quickly Zoomed (pun intended) our workshops from in-person (IP) to distance learning (DL). It went amazingly well from the start.

Coincidentally, two of my grandchildren shifted from IP grade-school to DL at our home. The youngest, a kindergartner (Laine), benefited greatly by the oversight of my wife Karen—a retired preschool teacher. I helped the other (Archer), a third grader. It did not start well due to many technical difficulties and troublesome adjustments for teachers and students. We continued our DL home schooling the following school year due to the ongoing quarantine in Minnesota. By the time IP classes resumed, the DL went about as well as could be expected—the most difficult class being physical education, especially in the winter due to our home lacking a gym.

This unplanned experiment on DL across the range of child versus adult revealed a big interaction effect due to the age of the learners—IP being best for grade schoolers and DL being a very viable alternative for mature students. A few weeks ago, I got reinforcement for this observation when teaching cribbage IP as a volunteer to Laine—now in 4th grade—and two of her classmates. This would have been far harder DL.

The reason I’m bringing all this up is that my colleague Shari Kraber, who retired as our workshop manager but continues to provide training, asserts that “in-person training is not as ideal educationally and that the retention of the materials is BETTER using distance learning.”* I’m also a big fan of DL—far easier for me to teach from my home offices in my summer or my winter home (or on the road between). Google’s AI (Gemini) says that there’s no definitive answer on IP vs DL, and that the biggest factor is quality of the teaching and the materials, which makes a lot of sense to me.

Rachel Poleke, our current workshop manager, suggests that another big factor is the preference of individual students for IP versus DL. I totally agree: Ideally the delivery would be tailored to each student. This being impractical, Stat-Ease instead offers on a class-wide basis to deliver private training either way, depending on the preference of our client. For example, one year ago last September I traveled to the Netherlands to teach a DOE workshop for a client headquartered in Leiden’s Bio Science Park. That was fun and very gratifying for the great response. It’s nice to take a break from DL, benefiting from much stronger feedback from students (e.g., the ‘deer in the headlights’ look when clueless) and the ability to watch them work through case studies (our workshops are computer intensive).

Stat-Ease plans to present a rare IP public workshop—Modern DOE for Medical Devices—at our Minneapolis headquarters this year. This brings a huge advantage of DL training immediately to mind: Anyone from anywhere in the world can Zoom in, thus making it far easier for us to achieve a critical mass for class.

One thing I can say for sure: it’s great to have such a viable option for DL nowadays. When I first began working as a trainer of quality-engineering tools in the 1970s, the technology for DL existed (e.g., PLATO) but, being pre-Internet and all, it could not compete with IP.

It will be interesting to see how things settle out in coming years for IP versus DL, both for corporate training and schooling at primary and secondary levels. Hopefully, the quality of education (based on subjective measures!) will not be lost in the shuffle of convenience for scheduling and the relative costs.

*1/1/25 Stat-Ease blog Ask An Expert: Shari Kraber


Keep your black plastic spatulas for math-challenged scientists

Back in my day, when a boy misbehaved, he would be threatened with “The Paddle” being administered to his back side. I don’t advocate going back to this corporal punishment. However, with the news that scientists raising the alarm about a dangerous fire retardant in black plastic spatulas miscalculated 60 times 7,000 to a product of 42,000—off by a factor of 10,* I think these less fearsome implements could be repurposed to gently tap some math sense into their heads.
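
For anyone who wants to check the multiplication without reaching for a spatula, here is a minimal sketch. The variable names are mine, and the only inputs are the three numbers quoted above.

```python
# The two factors quoted in the post.
factor_a = 60
factor_b = 7_000

correct_product = factor_a * factor_b   # what the multiplication really gives
reported_product = 42_000               # the figure cited as the miscalculation

# How far off the reported figure is from the correct one.
error_factor = correct_product / reported_product

print(correct_product)
print(error_factor)
```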

As you can see pictured, I found one, very old, black plastic spatula (classic KitchenAid!) in our kitchen utensil drawer, which I will retire from service but retain as a Halloween decoration or donate to the cause of math ‘education.’

*That viral black plastic kitchen utensil study was overblown thanks to a simple math mistake , Mashable, Tim Marcin, December 16, 2024.


Colors to dye for

I grew up in the golden age for kids’ cereals, first with Trix from General Mills, introduced in 1954 in three colors: raspberry red, orangey orange and lemony yellow (now also wildberry blue, grapity purple and watermelon), followed in 1963 with Froot Loops from Kellogg’s, also in red, orange and yellow, Toucan Sam style (now also green, blue and purple). Back then nobody worried much about how these manufacturers colored their cereals, artificially or otherwise. However, nowadays a consensus has built up about a “rainbow of risks” caused by synthetic food dyes. Political pressure across the spectrum from Gavin Newsom to Robert F. Kennedy Jr. continues to build for banning these presumably harmful additives.

This sets the stage for some interesting history by American Heritage magazine on letting the food industry “poison” us as RFK Jr. puts it. Their Senior Editor Bruce Watson reported in the November/December issue how “many of our first food-safety laws arose after healthy young volunteers became sick when they tried commercial foods containing toxic additives.” These daredevils comprised “The Poison Squad” created in 1902 by Harvey Washington Wiley, who became known as the “Father of the Pure Food and Drugs Act” when it became law in 1906.

“NONE BUT THE BRAVE CAN EAT THE FARE.”

– Sign posted outside the Department of Agriculture building to enlist human ‘guinea pigs’

As historian Deborah Blum noted in her book The Poison Squad: One Chemist’s Single-Minded Crusade for Food Safety at the Turn of the Twentieth Century, Wiley deserves credit for “one of the most significant experiments in the 20th century.” For example, just prior to his crusading work, hundreds or perhaps thousands of children died from milk “embalmed” with formaldehyde.

Not to lessen the current concern over artificial dyes, we can be thankful for the relative safety of our food compared to the fare in the early 1900s. But I do not advocate going back to the days when potential poisons were tested on human subjects. Though I suppose there are worse things than being tasked with eating large quantities of Trix and Froot Loops, provided, of course, that the milk is not embalmed. ; )


Microwave popcorn still expanding nicely but in shrinking amounts

When I first ran a multifactor design of experiment (DOE) on microwave popcorn in 1993,* the bags contained 3.5 ounces of product. Since then, this product and many other foodstuffs suffered from shrinkflation, a way for their manufacturers to fool us into paying the same for less. For example, Pop Secret, one of the snacks tested in my 1993 DOE, now comes in 3.2-ounce bags, a shrinkage of 8.6 percent over the years. Tricky!
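
The shrinkage figure is simply the lost weight as a share of the original bag. A quick sketch of that arithmetic, using the two bag sizes quoted above (the variable names are mine):

```python
original_oz = 3.5   # bag size in the 1993 experiment
current_oz = 3.2    # bag size quoted for Pop Secret today

# Lost weight as a percentage of the original bag.
shrink_pct = (original_oz - current_oz) / original_oz * 100

print(f"{shrink_pct:.1f}% less popcorn per bag")
```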

I asked Google’s experimental Generative AI for stats on shrinkflation. GAI (my new go-to guy!) tells me that:

  • The most common products to experience shrinkflation are savory snacks, chocolate, and sweets. (Popcorn fits the bill.)
  • In the US, 71% of people have noticed shrinkflation, with 57% reporting multiple incidents in the past year. Baby boomers are more likely to notice shrinkflation than millennials and Gen Zers. (I am a baby boomer and I am well aware of this trend.)
  • Shrinkflation can be harder to notice than price increases because the price of the item stays the same, making it harder to budget. (That’s the idea!)
  • According to the US Bureau of Labor Statistics (BLS), shrinkflation has little impact on overall inflation rates. A BLS report from March said that the price of snacks inflated by 26% from January 2019 to October 2023. However, shrinkflation accounted for only 2.5 percentage points of the increase. (OK, so maybe we are making too big of a deal about this, but nobody likes to be tricked.)

The increasing cost of food products is currently creating a great deal of consternation, despite it seemingly abating. But so long as there’s plenty of delicious popcorn to share, even at a higher price for less of it, I don’t mind much.

However, when it comes to the recent trend for popcorn manufacturers selling “mini bags” with 1.5 ounces of product, I draw the line!

*Applying DOE to Microwave Popcorn


Hoping to cell-abrate meat substitutes before I die

As a consultant on statistical design and analysis of experiments, I’ve been working with many leading-edge developers of cell-based meats (and fish). I am a carnivore who loves a juicy burger, tender pulled pork, medium-rare steak or barbecued chicken. However, I’d happily switch to lab-grown protein once it passes a properly designed double-blind taste test. This will be a huge breakthrough by not killing animals and greatly reducing greenhouse gases, including “enteric fermentation” (nice way of referring to cow farts, ha ha).

Some experts do not foresee this happening in our lifetime according to this report last February by CBC. But after reading this cover story posted yesterday by Chemical & Engineering News on recent developments on lab-grown meats, I am more optimistic.

There is a fly in the food, so to speak, though: I cannot eat lab-grown meat while wintering in my Florida home—it’s been banned per this May 1 press release from Governor DeSantis. No fair!

“Today, Florida is fighting back against the global elite’s plan to force the world to eat meat grown in a petri dish or bugs to achieve their authoritarian goals.”

– Governor Ron DeSantis

By the way, I do agree with the Governor on one thing by not being a big fan of eating bugs. On the other hand, I applaud a Stat-Ease client from Bulgaria, Nasekomo (meaning “insect”), for developing a high-protein chicken feed made from soldier flies. I helped one of their researchers on her experimentation after first being assured that the EU approves the use of their product only for animals, not humans. She told me that chickens who eat the fly-based food tend to be less aggressive and healthier. Sounds good to me: Cock-a-doodle-do!


Analytics explain why the NFL stiffs running backs

My Minnesota Vikings are on a roll this year due to unexpectedly stellar play from their quarterback Sam Darnold. After being drafted very highly, Darnold turned out to be a dud. But suddenly he blossomed—no doubt helped greatly by our superstar wide receiver Justin Jefferson. This Sunday the Vikings play in London against the New York Jets and their future hall-of-fame QB Aaron Rodgers.

There’s no doubt that quarterbacks are the most important factors for success in the NFL, so it’s no surprise that there’s a positive correlation of 0.7 between annual passing yards and annual revenue according to Harvard economist Roland Fryer.* But it’s quite shocking that he finds a negative correlation of 0.01 for the value of running backs. I agree with Fryer that it’s delightful to “see analytics put to good use but sad to see football’s best position taking a back seat.”

Go Darnold, go Vikes!

P.S. As reported earlier this year by SI, The NFL Treats Elite Wide Receivers Very Differently From Top Running Backs. As a case in point, they highlight the huge contract just signed by Jefferson. “Show me the money” (the demand given by the wide receiver to his agent Jerry Maguire, played by Tom Cruise) isn’t working for running backs, though they do make a lot more money than kickers or punters as seen in this ESPN ranking of pay by position.

*Comments on “The Economics of Running Backs,” Wall Street Journal, September 4.


Australia overcomes USA for Olympian heights: Seriously?

Now that Tom Cruise swooped in on the Stade de France outside of Paris and carried off the Olympic flag to Los Angeles, the final reckoning can be made on which country ‘won’ the 2024 Summer Games. I figured that by tying for tops in gold medals and winning the most silver and bronze, the USA was the clear winner.

However, to be fair, one must take population by country into account (within reason by excluding very small countries such as Grenada, who only need to win a few medals to top the Olympic chart on a per capita basis). Earlier this year Robert Duncan and Andrew Parece proposed a population-adjusted probability-based index “U”.*

See how your country ranks by this measure in this final ranking for the Paris Olympics. Aussies rule: gold medals to all! The People’s Republic of China, who outnumber Australians by 53 to 1, fall to 89th on the list, second to last. Ouch! Kudos to France for coming in second (silver) and Great Britain third (bronze). The USA ranks fifth, not too bad.

Congratulations to all the Olympians and the organizers of this summer’s games for a very entertaining spectacle. Let’s not get bogged down by the medal counts: all who participated get full credit for their all-out efforts.

*Per equation 9 in their Journal of Sports Analytics vol. 10, no. 1, pp. 87-104, 2024, research paper on Population-adjusted national rankings in the Olympics


The secret sauce in Guinness beer?

I highly recommend Scientific American’s May 25 Opinion by Jack Murtagh explaining How the Guinness Brewery Invented the Most Important Statistical Method in Science. It nicely illustrates the t test, a landmark statistical method developed by William Sealy Gosset to assess a key ingredient in Guinness beer for ideal bitterness and preservation: soft resin content in hop flowers. Gosset calculated that a 1% difference in the amount of soft resins in the hops, the best and cheapest being purchased from Oregon,* increased their value to the brewery by almost 11%.

“Near the start of the 20th century, Guinness had been in operation for almost 150 years and towered over its competitors as the world’s largest brewery. Until then, quality control on its products had consisted of rough eyeballing and smell tests. But the demands of global expansion motivated Guinness leaders to revamp their approach to target consistency and industrial-grade rigor. The company hired a team of brainiacs and gave them latitude to pursue research questions in service of the perfect brew.”

  – Jack Murtagh

Back in 2017 on National Beer Day, celebrated yearly on April 7 to commemorate the end of USA’s prohibition of its sale, I saluted Gosset and his very useful t-test of the significance of one treatment versus another, that is, a simple comparative experiment.**

“They began to accumulate data and, at once, they ran into difficulties because their measurements varied. The effects they were looking for were not usually clearcut or consistent, as they had expected, and they had no way of judging whether the differences they found were effects of treatment or accident. Two difficulties were confounded: the variation was high and the observations were few.”

– Joan Fisher Box,*** “Guinness, Gosset, Fisher, and Small Samples,” Statistical Science, Vol. 2, No. 1 (Feb., 1987), pp. 45-52

To see how the t-test works, check out this awesome graphical app developed by Evan Miller. Using Stat-Ease software, I cross-checked it against a case study (Example 3.3) from the second edition of Box, Hunter and Hunter’s textbook Statistics for Experimenters. It lays out a simple comparative experiment by a tomato gardener who randomly splits 11 plants for treatment either with her standard fertilizer (A) or a far more expensive one (B) that supposedly produces far better yields. Here are the yield results in pounds, which you can assess using the t test:

  A. 29.9, 11.4, 25.3, 16.5, 21.1
  B. 26.6, 23.7, 28.5, 14.2, 17.9, 24.3

On average the new fertilizer increases the yield by nearly 2 pounds, but is the difference statistically significant? That would be good to know! I have the answer, but it would be no fun to tell you, being so easy to find out for yourself.
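
If you would rather work it out in code than in an app, here is a minimal sketch of a pooled two-sample t test using only the Python standard library. It rests on a few assumptions of mine: the first row of yields is the standard fertilizer (A) and the second is the new one (B), both groups share a common variance, and the function and variable names are invented for illustration. The critical value comes from a standard t table, and I left the printed result out on purpose.

```python
import math
import statistics

fertilizer_a = [29.9, 11.4, 25.3, 16.5, 21.1]        # standard fertilizer
fertilizer_b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]  # new fertilizer


def pooled_t(sample_1, sample_2):
    """Return (t statistic, degrees of freedom) for sample_2 vs sample_1,
    assuming both groups share a common variance."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.mean(sample_2) - statistics.mean(sample_1)
    return diff / std_err, df


t_stat, df = pooled_t(fertilizer_a, fertilizer_b)

# Two-sided 5% critical value of Student's t for 9 degrees of freedom,
# taken from a standard t table.
T_CRIT_DF9 = 2.262

print(f"t = {t_stat:.2f} on {df} degrees of freedom")
print("significant at 5%?", abs(t_stat) > T_CRIT_DF9)
```

Run it and compare the t statistic against the critical value to answer the question for yourself.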

PS: Due to the large variation between plants (a greater than 6-pound standard deviation!), this tomato study is badly underpowered. If you do an experiment like this, do anything possible to get more consistent results. Then assess power for whatever the difference is that makes changing fertilizers worthwhile. For example, let’s say that with better plant management you got the standard deviation reduced to 3 pounds and a difference of 4 pounds is needed at a minimum to make the switch in fertilizer cost-effective. Then, using Stat-Ease software’s power calculator, I figure you would need to test 3-dozen plants each in your randomized experiment to achieve an 80% probability of detecting a difference of 4 pounds given a 3-pound standard deviation. I hope you like tomatoes!

*As reported by Eat This Podcast in their 4/10/18 post on Guinness and the value of statistics

**National Beer Day–A fine time for fun facts and paying homage to a wickedly smart brewer from Guinness

***I was very fortunate to meet Joan Fisher Box in 2019 as related in this StatsMadeEasy blog.
