Sunday, August 9, 2009

Data Mining Isn't a Good Bet For Stock-Market Predictions

Slicing and dicing data to predict the future can get dicey.

The Super Bowl market indicator holds that stocks will do well after a team from the old National Football League wins the Super Bowl. The Pittsburgh Steelers, an original NFL team, won this year, and the market is up as well. Unfortunately, the losing Arizona Cardinals also are an old NFL team.

The "Sell in May and go away" rule advises investors to get out of the market after April and get back in after October. With the market up 17% since April 30, that rule isn't looking so good at this point.

Meanwhile, dozens -- probably hundreds -- of Web sites hawk "proprietary trading tools" and analytical "models" based on factors with cryptic names like McMillan oscillators or floors and ceilings.

There is no end to such rules. But there isn't much sense to most of them either. An entertaining new book, "Nerds on Wall Street," by the veteran quantitative money manager David Leinweber, dissects the shoddy thinking that underlies most of these techniques.

The stock market generates such vast quantities of information that, if you plow through enough of it for long enough, you can always find some relationship that appears to generate spectacular returns -- by coincidence alone. This sham is known as "data mining."
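A minimal sketch of how this happens, using made-up numbers rather than any real market data: generate 13 years of random "stock returns," then test thousands of equally random candidate "indicators" against them and keep the best fit. The function names, the number of candidates, and the return distribution are all illustrative assumptions.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_fit(n_years=13, n_candidates=10_000, seed=0):
    """Search random 'indicators' for the one that best 'explains' random returns.

    Everything here is simulated noise (an assumption for illustration);
    no series has any real connection to any other.
    """
    rng = random.Random(seed)
    # Hypothetical annual returns: pure noise around an 8% mean.
    returns = [rng.gauss(0.08, 0.18) for _ in range(n_years)]

    r_squared = []
    for _ in range(n_candidates):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        r_squared.append(pearson_r(candidate, returns) ** 2)

    r_squared.sort()
    best = r_squared[-1]
    median = r_squared[len(r_squared) // 2]
    return best, median


if __name__ == "__main__":
    best, median = best_spurious_fit()
    print(f"best R^2 among candidates: {best:.2f}")
    print(f"median R^2 among candidates: {median:.2f}")
```

The typical candidate explains almost nothing, but the best of many thousands can look impressive purely because so many were tried. Reporting only the winner is what turns a coincidence into an apparent discovery.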

...

Mr. Leinweber got so frustrated by "irresponsible" data mining that he decided to satirize it. After casting about to find a statistic so absurd that no sensible person could possibly believe it could forecast U.S. stock prices, Mr. Leinweber settled on annual butter production in Bangladesh. Over a 13-year period, he found, this statistic "explained" 75% of the variation in the annual returns of the Standard & Poor's 500-stock index.

By tossing in U.S. cheese production and the total population of sheep in both Bangladesh and the U.S., Mr. Leinweber was able to "predict" past U.S. stock returns with 99% accuracy.
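The effect of piling on extra variables can be sketched the same way. The example below is not Mr. Leinweber's data or procedure; it is a hypothetical simulation in which both the "returns" and the "predictors" are random noise. It adds one meaningless predictor at a time to an ordinary least-squares fit (computed here by Gram-Schmidt orthogonalization, a choice made only to avoid external libraries) and records the in-sample R-squared after each addition.

```python
import random


def center(values):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_sample_r2_path(n_years=13, max_predictors=12, seed=1):
    """In-sample R^2 after adding 1, 2, ... random predictors to a regression.

    All series are simulated noise (an assumption for illustration).
    """
    rng = random.Random(seed)
    y = center([rng.gauss(0.08, 0.18) for _ in range(n_years)])
    total_ss = dot(y, y)

    residual = y[:]   # part of y not yet "explained"
    basis = []        # orthogonalized predictors added so far
    path = []
    for _ in range(max_predictors):
        x = center([rng.gauss(0.0, 1.0) for _ in range(n_years)])
        # Remove from the new predictor whatever earlier ones already cover.
        for b in basis:
            coef = dot(x, b) / dot(b, b)
            x = [xi - coef * bi for xi, bi in zip(x, b)]
        basis.append(x)
        # Project the remaining residual onto the new orthogonal direction.
        coef = dot(residual, x) / dot(x, x)
        residual = [ri - coef * xi for ri, xi in zip(residual, x)]
        path.append(1.0 - dot(residual, residual) / total_ss)
    return path


if __name__ == "__main__":
    for k, r2 in enumerate(in_sample_r2_path(), start=1):
        print(f"{k:2d} random predictors: in-sample R^2 = {r2:.3f}")
```

In-sample R-squared can never fall when a predictor is added, and with 13 observations it reaches essentially 100% once there are 12 predictors plus an intercept, whatever the predictors are. A near-perfect fit to the past is therefore no evidence of forecasting power.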

But the entire exercise, he says, is a total crock. There is no conceivable reason why U.S. stock returns would be determined by Bangladeshi livestock returns.

...