Twitter mood predicts the stock market?

I've been following the trend toward analyzing online data for emotion and sentiment, and mining that information for useful patterns. (There is even a proposed standard for this, the W3C's Emotion Markup Language, EmotionML.) One practical area is voice analytics in the call center. Another example is what we are doing at Pega: analyzing Twitter conversations and harvesting potentially brand-damaging (or otherwise noteworthy) sentiment into a multichannel customer service hub.
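To make the idea of mining posts for sentiment a little more concrete, here is a minimal sketch of lexicon-based polarity scoring that flags possibly brand-damaging posts for follow-up. The word lists, the `polarity` and `flag_for_service` functions and the threshold are hypothetical illustrations. This is not how Pega's product or any commercial sentiment engine works; real systems are considerably more sophisticated.

```python
import re

# Toy polarity lexicons (hypothetical; real systems use much larger,
# curated resources and handle negation, sarcasm, emoji, etc.).
POSITIVE = {"love", "great", "happy", "thanks", "awesome"}
NEGATIVE = {"hate", "broken", "awful", "refund", "worst", "angry"}


def polarity(text: str) -> int:
    """Score one post as (#positive words) - (#negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def flag_for_service(posts: list[str], threshold: int = -1) -> list[str]:
    """Return the posts whose polarity is at or below the threshold."""
    return [p for p in posts if polarity(p) <= threshold]


if __name__ == "__main__":
    posts = [
        "Love the new app, great update!",
        "Worst support ever, my phone is broken and I want a refund",
        "Ordered a new charger today",
    ]
    for post in flag_for_service(posts):
        print(post)
```

Only the second post is flagged here: it contains three negative words and no positive ones, while the neutral post scores zero and stays above the threshold.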

Now companies are even promising to predict movements in the stock market based on online sentiment.

One UK hedge fund, Derwent Capital Markets, is actually making a point of using Twitter data as a guide to potential stock market swings. Check out this publicly available research paper from the academics they are working with at the University of Manchester and Indiana University: Johan Bollen, Huina Mao and Xiao-Jun Zeng. The title says it all: “Twitter mood predicts the stock market.” The authors claim that they can obtain “an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA and a reduction of the Mean Average Percentage Error by more than 6%.” In quant-speak, I understand that is pretty darn good.

They use two mood-tracking tools: OpinionFinder, which “measures positive vs. negative mood,” and the Google-Profile of Mood States (GPOMS), which “measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy).” They then test whether the resulting mood time series anticipate changes in DJIA closing values, using a Granger causality analysis and a “Self-Organizing Fuzzy Neural Network” (what a great expression; every home should have its own self-organizing fuzzy neural network!).
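For those of us who are not quants, the two headline metrics are simple to compute. Directional accuracy is the fraction of days on which the predicted up/down move matches the actual move; MAPE (more commonly spelled out as Mean Absolute Percentage Error) is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below assumes a prediction for day t is judged against the actual close on day t-1 to decide "up" or "down". The closing values are made up purely for illustration; this is not the authors' evaluation code and says nothing about their results.

```python
def directional_accuracy(actual: list[float], predicted: list[float]) -> float:
    """Fraction of days whose predicted direction matches the actual direction.

    Direction on day t is whether the close rose relative to the actual
    close on day t-1 (assumed convention for this sketch).
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        hits += actual_up == predicted_up
    return hits / (len(actual) - 1)


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical daily closes and model predictions.
    actual = [100.0, 102.0, 101.0, 103.0, 104.0]
    predicted = [100.0, 101.0, 102.5, 102.5, 105.0]
    print(f"Directional accuracy: {directional_accuracy(actual, predicted):.1%}")
    print(f"MAPE: {mape(actual, predicted):.2f}%")
```

In this toy series the model calls three of the four daily moves correctly (it predicts a rise on the day the close actually fell), which is a directional accuracy of 75%; the paper's 87.6% figure is the same kind of ratio computed over their test period.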

Figure 1 from “Twitter mood predicts the stock market” by Johan Bollen, Huina Mao and Xiao-Jun Zeng, October 2010.

I sincerely hope that the algorithms being used for this will also be able to detect patterns of faux traffic that represent clear attempts to manipulate these systems. There is an entire industry out there of companies whose sole purpose is to pump up Google rankings by splogging (spam blogging) massive numbers of bogus links back to websites. How hard could it be to systematically distort other systems that monitor this chatter for truth? I can see Vin Diesel in a sequel to “Boiler Room” where, instead of high-pressure phone calls to widows and retired investors, you have massive networks of unethical splog farms. Perish the thought.
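As a rough illustration of the kind of check I have in mind, the sketch below flags a message when essentially the same text is posted by many distinct accounts within a short time window, a crude signature of coordinated, splog-style amplification. The `normalize` and `suspicious_bursts` functions, the normalization rule and the default thresholds are all hypothetical. Real manipulation detection would need far more than this, and nothing here reflects what the hedge fund or the paper's authors actually do.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and drop URLs and punctuation so trivially varied copies match."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(re.findall(r"[a-z0-9]+", text))


def suspicious_bursts(posts, window_seconds=600, min_accounts=5):
    """Return normalized texts posted by at least `min_accounts` distinct
    accounts within some span of `window_seconds`.

    posts: iterable of (timestamp_seconds, account_id, text) tuples.
    """
    by_text = defaultdict(list)
    for ts, account, text in posts:
        by_text[normalize(text)].append((ts, account))

    flagged = set()
    for key, events in by_text.items():
        events.sort()  # order this text's events by timestamp
        start = 0
        # Slide a time window over the sorted events for this text.
        for end in range(len(events)):
            while events[end][0] - events[start][0] > window_seconds:
                start += 1
            accounts = {acct for _, acct in events[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged.add(key)
                break
    return flagged


if __name__ == "__main__":
    # Six hypothetical bot accounts push the same message within minutes.
    posts = [(i * 30, f"bot{i}", f"Buy XYZ now!! http://spam.example/{i}")
             for i in range(6)]
    posts.append((100, "alice", "Markets look calm today"))
    print(suspicious_bursts(posts))
```

The six copies differ only in their URLs, so they normalize to the same text and trip the threshold, while the single genuine post does not. A determined splog farm would of course vary its wording, which is exactly why this is only a starting point.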