Archive for September, 2010
Yankees leverage wins by throwing money at their players
Posted by mark in sports, Uncategorized on September 26, 2010
Today’s New York Times sports section provided this intriguing graphic on “putting a price tag on winning”. Their hometown Yankees stand out as the big spenders by far. It paid off in wins over the last decade – the period studied. However, if you cover up the point depicting the Yanks, the graph makes a far less compelling case that salary buys wins – mainly due to the contrary results enjoyed by two low-payroll teams: the Minnesota Twins and the Oakland Athletics.
I found similar patterns and, more importantly, data to reproduce these, in this study of MLB Payroll Efficiency, 2006-2008 by Baseball Analyst Rich Lederer. No offense to Rich or the NY Times – it is the damn Yankees (sorry, but I am weary of them defeating the Twins every post-season) who are to blame for this flaw in drawing conclusions from this data: one point exerts undue leverage on the fit, which you can see on this diagnostic graph generated by Design-Expert® software.
However, after doing the obvious thing – yanking the Yanks from the data – the conclusion remains the same: higher payroll translates to more wins in Major League Baseball. Here are the stats with/without the Yankees:
- R-squared: 0.41/0.34
- Wins per $ million of payroll (slope of linear fit): +0.13/+0.16
In this case, the high-leverage point does not exercise its potential influence: the conclusion stays the same whether or not the point is included in the fit. If you’d like to simulate how leverage impacts fit, download this educational simulation posted by Hans Lohninger, Associate Professor of Chemometrics at Vienna University of Technology.
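For readers who would rather see the with/without comparison in code, here is a minimal sketch in plain Python. The payroll and win figures are made up for illustration (they are not Rich Lederer's data or the NY Times data), and the function name `fit_line` is my own. It fits a straight line, reports the slope and R-squared, and computes each point's leverage using the standard formula for simple linear regression, h = 1/n + (x - mean)^2 / Sxx.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared, leverages).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    # Leverage depends only on where a point sits along the x-axis.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return slope, intercept, r_squared, leverages


# HYPOTHETICAL data: payroll in $ millions and season wins.
# The last team plays the part of a big spender far out on the x-axis.
payroll = [40, 55, 60, 70, 80, 90, 100, 110, 200]
wins = [75, 88, 72, 80, 85, 78, 90, 84, 97]

slope_all, _, r2_all, lev_all = fit_line(payroll, wins)
slope_cut, _, r2_cut, _ = fit_line(payroll[:-1], wins[:-1])

# A common rule of thumb flags leverage above 2 * (number of fitted
# coefficients) / n; a straight line has two coefficients.
threshold = 2 * 2 / len(payroll)
flagged = [i for i, h in enumerate(lev_all) if h > threshold]

print(f"with big spender:    slope={slope_all:.3f}  R^2={r2_all:.2f}")
print(f"without big spender: slope={slope_cut:.3f}  R^2={r2_cut:.2f}")
print("high-leverage point indexes:", flagged)
```

High leverage only means a point could swing the fit. Whether it actually does is what the with/without comparison of slope and R-squared tells you.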
Minnesota’s ’08 Senate race dissed by British math master Charles Seife
Posted by mark in Basic stats & math, politics, Uncategorized on September 20, 2010
Sunday’s New York Times provided this review of Proofiness – The Dark Arts of Mathematical Deception – due for publication later this week. The cover, seen here on Amazon, depicts a stats wizard conjuring numbers out of thin air.
What caught my eye in the critique by Steven Strogatz, an applied mathematics professor at Cornell, was the deception caused by “disestimation” (as Proofiness author Seife terms it) of the results from Minnesota’s 2008 Senate race, which Al Franken won by a razor-thin 0.0077 percent margin (225 votes out of 1.2 million counted) over Norm Coleman. Disestimation is the act of taking a number too literally, understating or ignoring the uncertainties that surround it; in other words, giving too much weight to a measurement, relative to its inherent error.
“A nice anecdote I like to talk about is a guide at the American Museum of Natural History, who’s pointing at the Tyrannosaurus rex. Someone asks, how old is it, and he says it’s 65 million and 38 years old. Sixty-five million and 38 years old, how do you know that? The guide says, well, when I started at this museum 38 years ago, a scientist told me it was 65 million years old. Therefore, now it’s 65 million and 38. That’s an act of disestimation. The 65 million was a very rough number, and he turned it into a precise number by thinking that the 38 has relevance when in fact the error involved in measuring the dinosaur was plus or minus 100,000 years. The 38 years is nothing.”
– Charles Seife (Source: This transcript of an interview by NPR.)
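The museum guide's arithmetic can be checked in a few lines. This is only a sketch of one reasonable convention (round a value to the decimal place of its uncertainty), and the helper name `round_to_uncertainty` is my own invention, not anything from the book. It uses the figures from the anecdote: 65 million years, plus or minus 100,000, with 38 years added on.

```python
import math


def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal place of the uncertainty's leading digit."""
    exponent = math.floor(math.log10(uncertainty))
    return round(value, -exponent)


guide_says = 65_000_000 + 38   # the guide's "precise" age in years
error = 100_000                # plus-or-minus figure quoted by Seife

honest_age = round_to_uncertainty(guide_says, error)
print(f"{guide_says:,} years reported, {honest_age:,} years supported")
```

Once the number is rounded to what the measurement can support, the 38 years vanish.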
We Minnesotans would have saved a great deal of money if our election officials had simply tossed a coin to determine the outcome of the Franken-Coleman contest. Unfortunately, disestimation is embedded in our election laws, which are bound and determined to make every single vote count, even though many thousands of votes in a statewide race prove very difficult to decipher.
Stats reveal real ace of Twins pitching staff
Twins fever is running rampant now in Minnesota. The home baseball club is leading their division and riding a wave of popularity with a new stadium and a home-town hero (Joe Mauer – a fellow graduate of my high school in Saint Paul). Over dinner this week with a colleague and a master statistician in town for a visit, the talk turned to the Twins and who now should be considered their ace pitcher – Carl Pavano or Francisco Liriano. Although appreciative of Pavano’s consistently good performance over the entire year, I felt that Liriano has come on stronger in the second half of the season. Having reached a stalemate, the three of us agreed that the data might tell the story.
I found everything I needed to make my case for Liriano at ESPN’s statistics site for Major League Baseball. Here is the pitching “split” for the second half of the MLB season so far, for Liriano vs Pavano respectively:
- Win-Loss: 7-0 vs 6-4
- WHIP (walks and hits per inning pitched): 1.27 vs 1.32
- ERA (earned run average): 2.22 vs 3.41
Pavano is good, but Liriano is my pick as the current ace of the Minnesota Twins pitching staff. Why argue with words? Let the data speak.
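For anyone unsure how these two rates are computed, here is a small sketch of the standard formulas: ERA is earned runs per nine innings, and WHIP is walks plus hits divided by innings pitched. The stat line below is hypothetical, not Liriano's or Pavano's actual numbers, and the function names are my own. Note that box scores write a third of an inning as ".1", so 81.1 means 81 1/3 innings and would need converting before use here.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_pitched


# HYPOTHETICAL second-half line for an unnamed pitcher.
innings = 81.0
earned = 20
walks_allowed = 25
hits_allowed = 78

print(f"ERA:  {era(earned, innings):.2f}")
print(f"WHIP: {whip(walks_allowed, hits_allowed, innings):.2f}")
```

Lower is better on both measures.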
Harvard economist advises students of all ages to learn some statistics
Posted by mark in Basic stats & math on September 5, 2010
In this Sunday New York Times “Economic View” column, Harvard professor N. Gregory Mankiw advises that those who wish to pursue this “dismal science” take one or more courses in statistics while in college. He sees a dearth of knowledge on this subject in his first-year students.
“High school mathematics curriculums spend too much time on traditional topics like Euclidean geometry and trigonometry. For a typical person, these are useful intellectual exercises but have little applicability to daily life. Students would be better served by learning more about probability and statistics.”
— N. Gregory Mankiw
I’m with him on learning more about stats but not at the expense of less geometry and trig, which come in very handy for anyone pursuing an engineering career. Also, budding economists could benefit from a little knowledge of periodic functions such as sine waves. It seems to me that what goes around comes around.