Statisticians apply stylometry to identify authors and they invent algorithms that assess essays


My colleague Tryg, who, like me, loves wordplay, drew my attention to this podcast* that explains how “By Their Words You Shall Know Them.”  I teed it up on my smartphone and listened on my way to work yesterday, a fun way to pass my half-hour commute into Minneapolis from my home in Stillwater, Minnesota.  One thing that caught my ear was the early 1960s work by Harvard statistician Frederick Mosteller to pin down who wrote 12 of the 85 Federalist papers published under the pen name “Publius”.  He and colleague David Wallace (University of Chicago) applied Bayes’ theorem to attribute these disputed papers to James Madison rather than Alexander Hamilton.  Mosteller also led the way to today’s reliance on statistics in sports by doing the first known academic analysis of baseball in 1946, concluding that luck rules even in a seven-game World Series.  Though the Cardinals beat his hometown Red Sox, he did not concede that the better team had actually won.
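For readers curious how Bayes’ theorem can point to an author, here is a minimal Python sketch. It assumes a plain Poisson model for how often each author uses a few marker words (a simplification chosen for illustration, not a reproduction of Mosteller and Wallace’s full analysis), and every rate in the `RATES_PER_1000` table is hypothetical, made up only to show the arithmetic. The idea: start from prior odds, then multiply by the likelihood ratio for each word count, which on the log scale means adding.

```python
import math

# HYPOTHETICAL per-1,000-word usage rates for a few marker words.
# Illustrative numbers only, NOT Mosteller and Wallace's published estimates.
RATES_PER_1000 = {
    #  word:    (Hamilton, Madison)
    "upon":    (3.0, 0.2),
    "whilst":  (0.1, 0.5),
    "enough":  (0.6, 0.1),
}

def log_poisson(k: int, mean: float) -> float:
    """Log of the Poisson probability of seeing k occurrences given the mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def posterior_prob_madison(counts: dict[str, int], n_words: int,
                           prior_madison: float = 0.5) -> float:
    """Bayes' theorem on the odds scale: posterior log-odds equals prior
    log-odds plus the log-likelihood ratio of each marker word's count
    (words are treated as independent, another simplifying assumption)."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    log_odds = math.log(prior_madison / (1 - prior_madison))
    for word, (rate_h, rate_m) in RATES_PER_1000.items():
        k = counts.get(word, 0)
        mean_h = rate_h * n_words / 1000  # expected count if Hamilton wrote it
        mean_m = rate_m * n_words / 1000  # expected count if Madison wrote it
        log_odds += log_poisson(k, mean_m) - log_poisson(k, mean_h)
    return 1 / (1 + math.exp(-log_odds))  # convert log-odds back to probability

# A hypothetical 2,000-word paper: no "upon", one "whilst", no "enough".
p = posterior_prob_madison({"upon": 0, "whilst": 1, "enough": 0}, n_words=2000)
print(f"P(Madison | word counts) = {p:.3f}")
```

With these made-up rates, the absence of “upon” alone shifts the odds heavily toward Madison, which is the flavor of evidence that function words can supply even when the subject matter gives nothing away.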

This analytical dissection of written words has come to be known as “stylometry”.  As computing power increases and algorithms improve, machines are increasingly putting writing to the test.  For example, see this New York Times Digital Domain column from earlier this month that details developments in ‘essay-scoring engines’.  For now, students hold the upper hand on computer-based grading of papers: web-based essay mills can easily throw together fact-laden gibberish that fools these virtual professors.  Teachers, however, easily spot such nonsense when they skim the results; check out some goofy passages passed along by Duke University professor Dan Ariely in this editorial for the Los Angeles Times.

The advent of spell-checking and grammar inspection in word processors has been a boon for writers.  However, passing these tests does not necessarily produce clear prose.  When I started work as an engineer, the head of our process development group handed me a little booklet by Robert Gunning on “How to Take the Fog Out of Writing”.  Gunning advocated short, active sentences, not the long, passive and pedantic style I’d grown accustomed to in academia.  See how your writing scores for fog using this online tool by Simon Bond.  The quote below scored 20.86.  This paragraph came back with a fog index of 9.152 (up to this clause, to be precise!).  Gunning’s score estimates the years of formal education needed to understand text on a first reading.  Thus my writing supposedly can be understood by a 10th grader.  Draw your own conclusions on the readability of our founding fathers.

“As there is a degree of depravity in mankind which requires a certain degree of circumspection and distrust, so there are other qualities in human nature which justify a certain portion of esteem and confidence.”

– Madison, Federalist Papers #55, 346
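Gunning’s fog index boils down to simple arithmetic: 0.4 × (average words per sentence + percentage of words with three or more syllables). The Python sketch below shows that calculation under loudly stated assumptions: the syllable counter (`count_syllables`, a hypothetical helper) just counts vowel groups and drops a silent final “e”, and it ignores Gunning’s exceptions for proper nouns, compound words and suffixes such as -ed or -es. It is not necessarily how Simon Bond’s tool counts syllables.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (treating 'y' as a vowel),
    after dropping a silent final 'e'. A heuristic, not a dictionary lookup."""
    word = word.lower()
    if word.endswith("e") and not word.endswith("le") and len(word) > 2:
        word = word[:-1]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))

def fog_index(text: str) -> float:
    """Gunning fog index: 0.4 * (words per sentence + 100 * share of complex
    words), where 'complex' here means three or more estimated syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)

quote = ("As there is a degree of depravity in mankind which requires a certain "
         "degree of circumspection and distrust, so there are other qualities in "
         "human nature which justify a certain portion of esteem and confidence.")
print(round(fog_index(quote), 2))  # prints 20.86
```

On Madison’s quote this heuristic finds 35 words in a single sentence, six of them counted as complex, which works out to 0.4 × (35 + 17.14) ≈ 20.86, in line with the score above. Other passages may drift from the online tool’s numbers because syllable counting is where fog calculators differ most.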

*By Mike Vuolo, host of Lexicon Valley for the online magazine Slate

