The Dangers of Data Mining

I recommend this outstanding article on data mining, which links to other relevant sources. Data mining is not a bad thing a priori. In connection with investing, however, one has to be very careful. There the term describes an attempt to find, at any cost, a time series that would explain the movements of the stock market. Such a model would then supposedly be able to predict the stock market.

Several years ago a blatant example of data mining was published. The authors showed that the annual production of butter in Bangladesh and in the USA, together with the sheep population in both countries, can explain more than 90% of the movements of the US S&P 500 index. Naturally, it is obvious to everybody that these variables cannot be responsible for the returns of this stock market index. This should be a warning to everybody. Even sophisticated models that include economic variables (GDP, unemployment, interest rates, etc.) can be the result of data mining, meaning that the relationship or correlation between them and the stock market is just the result of randomness.
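To get a feeling for how easily such a "fit" turns up, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the "index" is just a simulated random walk, the candidate series are other independent random walks, and the function names, the 20 annual observations and the 1,000 candidates are arbitrary choices, not data from the butter study. It picks the candidate with the highest R² against the fake index, even though by construction none of them has anything to do with it.

```python
import random


def random_walk(n, rng):
    """Cumulative sum of n independent Gaussian steps (a made-up 'time series')."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R^2 of a one-variable linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def best_spurious_fit(n_years=20, n_candidates=1000, seed=0):
    """Search unrelated random series for the one that 'explains' the target best.

    Returns (best R^2, median R^2) over all candidates.
    """
    rng = random.Random(seed)
    target = random_walk(n_years, rng)  # stand-in for the stock index
    scores = sorted(
        r_squared(random_walk(n_years, rng), target) for _ in range(n_candidates)
    )
    return scores[-1], scores[len(scores) // 2]


if __name__ == "__main__":
    best, median = best_spurious_fit()
    print(f"best R^2 among unrelated series: {best:.2f} (median: {median:.2f})")
```

The point of the sketch is only the search itself: the more candidates you try, the better the best one looks, and that says nothing about whether it will keep working.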

Given the vast number of time series and the computing power available today, it is hardly surprising to find a model or correlation that explains the stock market pretty well. Moreover, we have to bear in mind that with 100,000 regressions (I can imagine that the number of regressions run every day amounts to several tens of thousands) we will end up with about 5,000 false positives at the 5% significance level (95% confidence), even if none of the tested relationships is real. In other words, we have to be very careful when somebody claims to have a model that can most certainly predict the gyrations of the stock market. It may just be random. If you torture the data long enough, it will eventually confess.
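The arithmetic behind the false positives can be checked with a small simulation. This is only a sketch under my own assumptions: the pairs of series are pure independent Gaussian noise, each has 30 observations, and the function names are made up for this post. A correlation counts as "significant" when it passes the usual two-sided t-test at the 5% level, for which the critical t value with 28 degrees of freedom is about 2.048.

```python
import math
import random

N_OBS = 30           # observations per series (assumption for this sketch)
T_CRIT_DF28 = 2.048  # two-sided 5% critical value of Student's t, 28 degrees of freedom


def expected_false_positives(n_tests, alpha):
    """Expected number of 'significant' results when no real effect exists."""
    return round(n_tests * alpha)


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_false_positive_rate(n_tests=2000, seed=1):
    """Share of unrelated noise pairs whose correlation looks significant at 5%."""
    rng = random.Random(seed)
    # |r| above this threshold is equivalent to |t| > T_CRIT_DF28 for N_OBS points.
    r_crit = T_CRIT_DF28 / math.sqrt(N_OBS - 2 + T_CRIT_DF28 ** 2)
    hits = 0
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
        y = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
        if abs(pearson_r(x, y)) > r_crit:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    print("expected false positives:", expected_false_positives(100_000, 0.05))
    print("simulated false positive rate:", simulate_false_positive_rate())
```

The simulated rate should land close to 5%, which is where the figure of 5,000 out of 100,000 comes from.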

