Posts Tagged statistics

Random thoughts

The latest issue of Wired magazine provides a great heads-up on random numbers by Jonathan Keats.  Scrambling the order of runs is a key to good design of experiments (DOE)—this counteracts the influence of lurking variables, such as changing ambient conditions.

Designing an experiment is like gambling with the devil: only a random strategy can defeat all his betting systems.

— R.A. Fisher

Along those lines, I watched with interest when weather forecasts put Tampa at the bulls-eye of the projected track for Hurricane Isaac.  My perverse thought was that this might be the best place to be, at least early on when the cone of uncertainty is widest.

In any case, one does best by expecting the unexpected.  That gets me back to the topic of randomization, which turns out to be surprisingly hard to do considering the natural capriciousness of weather and life in general.  When I first got going on DOE, I pulled numbered slips of paper out of my hard hat.  Then a statistician suggested I go to a phone book and cull numbers from the last 4 digits from whatever page opened up haphazardly.  Later I graduated to a table of random numbers (an oxymoron?).  Nowadays I let my DOE software lay out the run order.
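For readers without DOE software at hand, here is a minimal sketch of the hard-hat method in Python.  It is not what any particular DOE package does; the factor names and the -1/+1 coding are hypothetical, picked only for illustration.  Note also that Python's `random` module is pseudorandom, not "truly random", which is generally good enough for scrambling run order.

```python
import itertools
import random


def randomized_run_order(factors, seed=None):
    """Build every combination of factor levels, then shuffle the run order.

    factors: dict mapping a factor name to its list of levels.
    seed: optional, lets you reproduce (and document) the same order later.
    """
    rng = random.Random(seed)
    names = list(factors)
    # Full factorial: one run per combination of levels.
    runs = [
        dict(zip(names, levels))
        for levels in itertools.product(*(factors[name] for name in names))
    ]
    rng.shuffle(runs)  # the digital equivalent of slips in a hard hat
    return runs


# Hypothetical two-level factors, coded low (-1) and high (+1).
factors = {"temperature": [-1, 1], "pressure": [-1, 1], "time": [-1, 1]}
for run_number, settings in enumerate(randomized_run_order(factors, seed=2012), start=1):
    print(run_number, settings)
```

Recording the seed along with the results makes it possible to show afterwards that the order was set before the runs, not tweaked for convenience.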

Check out how Conjuring Truly Random Numbers Just Got Easier, including the background by Keats on pioneering work in this field by British (1927) and American (1947) statisticians.  Now the Australians have leap-frogged (kangarooed?) everyone, evidently, with a method that produces 5.7 billion “truly random” (how do they know?) values per second.  Rad mon!


No Comments

Statisticians no more—now “data scientists”

I spent a week earlier this month at the Joint Statistical Meetings (JSM)—an annual convocation of “data scientists”, as some of these number crunchers now deem themselves.  But most statisticians remain ‘old school’ as evidenced by this quote:

“Some time during the past couple of years, statistics became data science’s older, more boring sibling that always played by the rules.”

— Nathan Yau*

I tend to agree, being suspicious of changes in titles as a cover for shenanigans.  It seems to me that “data science” provides a smoke screen to take unwarranted leaps from shaky numbers.  As the shirt sold at JSM by the American Statistical Association (ASA) says, “friends don’t let friends extrapolate.”

*Incorrectly attributed initially (my mistake) to Carnegie Mellon statistics professor Cosma Shalizi, who was credited by Yau for speaking up on this subject.

2 Comments

Beware of obvious answers and positive results

“Most results, including those that appear in top-flight peer-reviewed journals, can’t be reproduced.”

This is a “dirty secret” revealed by the Wall Street Journal’s Gautam Naik in this December report.  It cites statistics from Bayer that nearly two-thirds of published studies could not be replicated.  Naik blames the complicated nature of experiments nowadays along with the “positive bias” of researchers driven to produce results.  Glenn Begley, vice president of research at Amgen, a biotechnology company, suggests that “academic scientists, like drug companies, should perform more experiments in a ‘blinded’ manner to reduce any bias toward positive findings.”

Meanwhile, Duncan Watts, author of Everything is Obvious: *Once You Know the Answer, says:

“When you do the experiment properly [randomized and controlled], all the numbers go down.”

He’s speaking on the bias of marketing executives toward their own sensibilities, which often do not reflect those of the population being sold to.  See what the Financial Times “undercover economist” Tim Harford says about this here.  Unfortunately, in my experience, the analysts who know better than to extrapolate from a small, non-representative sample of opinions from the ‘powers-that-be’ (often n=1, that is, the Boss) get very little support for spending money to put these assertions to the test.  Even though you know the top dogs might be barking up the wrong tree, it’s easiest just to go along with the pack and press ahead.  To do otherwise risks suffering a painful bite-back.  Yes, I am a cynic.

No Comments

Extracting Sunbeams from Cucumbers

With this intriguing title Richard Feinberg and Howard Wainer draw readers of Volume 20, Number 4 into what might have been a dry discourse: How contributors to The Journal of Computational and Graphical Statistics rely mainly on tables to display data.  Given that “Graphical” is in the title of this publication, it raises the question of whether this method of presenting statistics really works.

When working on the committee that developed the ASTM E1169-07 Standard Practice for Conducting Ruggedness Tests, I introduced the half-normal plot for selecting effects from two-level factorial experiments.  Most of the committee favored this, but one individual – a professor emeritus from a top school of statistics – resisted the introduction of this graphical tool.  He believed that only numerical methods, specifically analysis of variance (ANOVA) tables, could support objective decisions for model selection.  My comeback was to dodge the issue by simply using graphs and tables – this need not be an either/or choice.  Why not do both, or merge them by putting numbers onto graphs – the best of both worlds?

“A heavy bank of figures is grievously wearisome to the eye, and the popular mind is as incapable of drawing any useful lessons from it as of extracting sunbeams from cucumbers.”

— Economists (brothers) Farquhar and Farquhar (1891)

In their article, which can be seen here, Feinberg and Wainer take a different tack (path of least resistance?): Make tables look more like graphs.  Here are some of their suggestions for doing so:

  • Round data to 3 digits or less.
  • Line up comparable numbers by column, not row.
  • Provide summary statistics, in particular medians.
  • Don’t default to alphabetical or some other arbitrary order: Stratify by size or some other meaningful attribute.
  • Call out data that demands attention by making it bold and/or bigger and/or boxing it.
  • Insert extra space between rows or columns of data where they change greatly (gap).

Check out the remodeled table on arms transfers which makes it clear that, unlike the uptight USA, the laissez faire French will sell to anyone.  It would be hard to dig that nugget out of the original data compilation.

No Comments

Clickers allow students to vote on which answer is right for math questions

Yesterday I attended a fun webinar on Interactive Statistics Education by Dale Berger of Claremont Graduate University.  Because I was multitasking (aka “continuous partial attention”, ha ha) at work while attending this webinar, my report provides just the highlights.  However, you can figure out for yourself what they (the stats dept at Claremont) have to offer by going to this web page offering WISE (Web Interface for Statistics Education) tutorials and applets.*

After the presentation a number of educators brainstormed on interactive stats.  David Lane of Rice U (author of many stat applets) suggested the use of “interactive clickers” – see this short (< 2 min.) newscast, for example.  I wonder what happens when a majority votes for the wrong answer?  For some teachers it might be easiest just to declare the most popular response as the correct answer.  That would be consistent with the way things seem to be going in politics nowadays. ; )

*Just for fun try the Investigating the Central Limit Theorem (CLT) applet (click the link from the page referenced above or simply click here).  This would be a good applet to provide when illustrating CLT using dice (such as is done in this in-class exercise developed by two professors from De Anza College). In this case, pick the uniform Population and sample size 2.  Then Draw a Sample repeatedly, and, finally, just Draw 100 samples.  Repeat this exercise with sample size 5 a la the game of Yahtzee (a favorite in my youth). Notice how as n goes up the distribution of averages becomes more normal and narrower. That’s the power of averaging.
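If you have no applet or dice handy, the same exercise can be roughed out in a few lines of Python.  This is only a sketch of the idea, not the WISE applet's method: the function names, the seed and the 10,000 repetitions are my own arbitrary choices.  It compares the spread of the simulated averages against the textbook value, the standard deviation of one fair die (the square root of 35/12) divided by the square root of n.

```python
import math
import random
import statistics


def dice_averages(n_dice, n_samples, rng):
    """Return n_samples averages, each taken over n_dice fair six-sided dice."""
    return [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(n_samples)
    ]


def theoretical_sd(n_dice):
    """Standard deviation of the average of n_dice fair dice."""
    # One fair die has variance 35/12; averaging n dice divides that by n.
    return math.sqrt(35 / 12 / n_dice)


rng = random.Random(42)  # arbitrary fixed seed so reruns give the same numbers
for n in (2, 5):
    averages = dice_averages(n, 10_000, rng)
    print(
        f"n={n}: mean={statistics.fmean(averages):.3f}, "
        f"sd={statistics.stdev(averages):.3f}, "
        f"theory sd={theoretical_sd(n):.3f}"
    )
```

The mean should hover near 3.5 for both sample sizes while the spread shrinks as n goes from 2 to 5, which is the narrowing the applet shows.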

No Comments

Mind-reading fish know I am out to catch them

Last week I enjoyed a relaxing sojourn up in the north woods of Wisconsin.  The resort encompasses its own pristine pine-ringed lake featuring a 26-foot fishing hole.  Just before I headed off for my vacation I read this Scientific American report on The Mind-Reading Salmon: The True Meaning of Statistical Significance.  Although I think they meant to be disrespectful of p-values in this case, my feeling, based on empirical evidence from a large sample size – hundreds of unsuccessful casts of my lure around the shore and over the hole – is that some fish living in isolated areas have developed mental telepathy.  How else do they avoid being caught?

PS. Here’s a picture of me in happier days at a different lake last summer.   My brother-in-law insisted that the first one to catch a crappie would have to kiss it.  Evidently this fish thought it might be fun to try, knowing I’d then release it back into the lake.

No Comments

An Easter experiment for those who still believe a bunny bears eggs* *(Beware of the green ones!)

Today’s Saint Paul Pioneer Press “Bulletin Board” provides an idea on how to provide some added delight for any children who still believe in the Easter Bunny: Have them plant one of their jelly beans, then watch for it to grow into a lollipop.  Doesn’t that sound like a fun experiment!

By the way, be careful with the green jelly beans – they cause acne (p<0.05) according to this exhaustive statistical study of every available color.
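The catch with testing every color is simple arithmetic.  As a rough sketch, assuming the tests are independent and that no color has any real effect, the chance of at least one "significant" result among k tests at level alpha is 1 - (1 - alpha)^k.  The count of 20 colors below is my assumption for illustration, not a number taken from the study.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests comes up
    'significant' at level alpha when there is no real effect anywhere."""
    return 1 - (1 - alpha) ** n_tests


# Assumed count of jelly bean colors, for illustration only.
n_colors = 20
print(f"{chance_of_false_alarm(n_colors):.0%} chance of at least one false alarm")
```

With 20 colors that works out to roughly 64 percent, so a lone green "discovery" is closer to the expected outcome than to a surprise.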

No Comments

Armed and dangerous – switchblades and statistics

(Warning: Quirky material ahead =>)

Seeing this CBS News report about Maine legalizing switchblades for one-armed people reminded me of a riddle about limbs that’s posed by some statisticians for educational purposes.  Here it is: “The great majority of people in [fill in your country here] have more than the average number of [choose either arms or legs here].”

For an answer {UK, legs}, see this posting on averages by Kevin McConway, Professor of Applied Statistics in the Department of Mathematics and Statistics at The Open University.  I heard this riddle also from Hans Rosling in his BBC TV program on “The Joy of Statistics.”*  He spoke of his home country of Sweden, whose inhabitants on average have 1.999 legs.
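The riddle is easy to check with a toy population.  The numbers below are made up purely to reproduce an average of 1.999 (1,000 people, one of whom has one leg); they are not real census data for Sweden or anywhere else.

```python
import statistics


def share_above_average(values):
    """Return (mean, fraction of values strictly greater than the mean)."""
    average = statistics.mean(values)
    above = sum(1 for v in values if v > average)
    return average, above / len(values)


# Hypothetical population: 999 people with two legs and one person with one.
legs = [2] * 999 + [1]
average, share = share_above_average(legs)
print(f"average legs: {average}")
print(f"share with more than the average: {share:.1%}")
print(f"median legs: {statistics.median(legs)}")
```

Because the only departures from two legs are downward, the mean gets dragged just below 2 while the median stays put, so nearly everyone ends up "above average".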

I’m quitting while I’m ahead.  Oops, this makes me wonder if I have an average number of heads – a scary thought, my hunch being that I’m below average for this.  I never imagined that averages could be so creepy!

*See this StatsMadeEasy blog on Rosling

 

1 Comment

Supreme Court overturns tyranny of statistical significance

In today’s Wall Street Journal, The Numbers Guy (Carl Bialik) reports on a unanimous ruling by the Supreme Court that companies cannot hide behind statistical significance (lack thereof in this case) as an excuse for nondisclosure of adverse research.  He passes along this practical advice:

“A bigger effect produced in a study with a big margin of error is more impressive than a smaller effect that was measured more precisely.”

— Stephen Ziliak, economics professor

However, this legal analysis of the ruling cautions that statistical significance remains relevant for assessing materiality of an adverse event.

Given all this, we can be certain of only one thing – more lawsuits.

 

2 Comments

Fun graphs and charts on names: How popular is yours and where is it populated?

My latest issue of National Geographic came with this fascinating mapping of population by surname.  Seeing “Anderson” looming large over Minnesota did not surprise me, but I didn’t realize how many of us “snow birds” had permanently escaped to California.  Take a look and see if you can locate any of your long-lost wander-kin around the USA.

The Junk Charts blog, one of my favorites, gave a generally favorable review of the “Nat-Geo” name chart, but they recommended an even better one – the Baby Name Wizard, which plots the popularity of first names over the last 130 years.

I am expecting my first grandchild this summer, so there’s been lots of talk about names lately, thus this statistical chart caught my eye.  You, too, may find it interesting. I suggest you start by hovering your mouse over the widest streams (blue for boy, pink for girl) at the left (John, Mary, etc)* and then see how their popularity changes over the past 130 years.  A tip: Click the graph to see trends for any given name, or enter it directly.  Press “x” to get out of any specific name field (or type in another).  I typed in my name and saw an explosion of popularity in mid-20th century, but now it’s fading away.  The same holds true for my sister Nancy and my wife Karen – we all get tagged as baby-boomers straight away.

If you think there’s any chance of your name ranking in the top 1,000 for popularity in the USA at any time since 1880, type it in.  How do you do, _______ (<= name here)?


No Comments