Obscurity does not equal profundity


“GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.” This is how Steve Lohr of the New York Times leads off his article in today’s Sunday paper on The Age of Big Data. Certainly the abundance of data has created a big demand for people who can crunch numbers. However, I am not sure the end result will be nearly as profitable as employers may hope.

“Many bits of straw look like needles.”

– Trevor Hastie, Professor of Statistics, Stanford University, co-author of The Elements of Statistical Learning (2nd edition).

I take issue with extremely tortuous paths to complicated models based on happenstance data. This can be every bit as bad as oversimplifications such as relying on linear trend lines (see “Why you should be very leery of forecasts”). As I once heard DOE guru George Box say (in regard to overly complex Taguchi methodologies): Obscurity does not equal profundity.

For example, Lohr touts the replacement of earned run average (ERA) with the “Siera”—Skill-Interactive Earned Run Average. Get all the deadly details here from the inventors of this new pitching performance metric. In my opinion, baseball itself is already complicated enough (try explaining it to someone who only follows soccer) without going to such statistical extremes for assessing players.

The movie “Moneyball” being up for Academy Awards is stoking the fever for “big data.” I am afraid that, after all is said and done, the call may be for “money back.”

  1. #1 by Tom Pyzdek on February 16, 2012 - 6:41 pm

    More optimistically, we might hope that this creates a group of leaders who use data to help them with their decisions, rather than flying by the seat of their pants. While I agree that statistically designed experiments are the gold standard for data-driven learning, I find it hard to accept the extreme position: ignore all data that are not the result of a DOE. While I’ve seen more than a few researchers misled by happenstance data, I’ve also seen many occasions where historical data have helped add a bit of rigor to opinions and provided ideas and hypotheses that could be examined with DOE.

  2. #2 by Tom Pyzdek on February 16, 2012 - 6:44 pm

    PS: I agree with you that we baseball fans can live without Siera. As for me, I prefer the strikeout-to-walk ratio to the ERA. It has two advantages: (1) it only involves the pitcher and the batter and (2) it’s not a new statistic in baseball. I could go on, but it’s probably best for all of us if I don’t.

  3. #3 by mark on February 19, 2012 - 4:43 pm

    The strikeout-to-walk ratio is a good one for keeping it simple statistically (KISS). In this Bill James Online blog, John Dewan offers “A Couple of Alternatives to ERA,” including one with great appeal to me for KISS. It’s the Opposing OPS (On-base Plus Slugging) or “OOPS.”
