Widespread Worry and the Stock Market

Saturday, March 13th, 2010

UPDATE: I have released the classifiers, R scripts and aggregate data from this paper. The archive has a README to get you started and some example Java showing how to use the classifiers. Get it here.

I have a new paper at ICWSM 2010. I’m really looking forward to all the great work in the program. The central thesis of my paper: estimating anxiety, worry and fear from blogs provides some novel information about future stock market prices.

ABSTRACT: Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
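For readers who haven't met Granger causality: series x is said to "Granger-cause" series y if past values of x improve a forecast of y beyond what y's own past already provides. Below is a minimal sketch of that test with a single lag, in plain Python. It is not the model from the paper. The function names (`ols_rss`, `granger_f`), the one-lag setup and the toy series are all assumptions for illustration, and the "anxiety" effect in the toy data is planted by construction.

```python
import random


def ols_rss(rows, targets):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y, lags=1):
    """F statistic for the hypothesis that lagged x adds nothing to a forecast of y."""
    restricted, unrestricted, targets = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - i] for i in range(1, lags + 1)]
        other = [x[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)            # y's own past only
        unrestricted.append([1.0] + own + other)  # y's past plus x's past
        targets.append(y[t])
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic toy data (NOT the paper's data): yesterday's "anxiety"
# pushes today's "returns" down, by construction.
random.seed(0)
anxiety = [random.gauss(0, 1) for _ in range(300)]
returns = [0.0]
for t in range(1, 300):
    returns.append(0.5 * returns[t - 1] - 0.8 * anxiety[t - 1] + random.gauss(0, 0.5))

f_forward = granger_f(anxiety, returns)  # does anxiety help predict returns?
f_reverse = granger_f(returns, anxiety)  # does the reverse hold?
print(f_forward, f_reverse)
```

A large F statistic in one direction and a small one in the other is the pattern a Granger-style analysis looks for. A real analysis would use more lags, control variables and a proper significance test.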

PDF: Widespread Worry and the Stock Market. Proc. ICWSM, 2010.

Who Do You Gossip About?

Monday, October 26th, 2009

I so should be doing other things. Like reviewing CHI papers. But I’m making fun little apps. I’ve always been fascinated with gossip. “Gossip” has a negative connotation, but it’s essential to social life. In a few bored hours last night, I wrote a little gossip app, and you can download it.

Before we go any further, there are two pretty tight requirements: you need to use Mail.app on a Mac and you need to use IMAP. The app is called “Bit of Gossip.” It crawls through your sent mail looking for people you mention in the message body but don’t include on the recipient list. Don’t worry, we all do it. And don’t worry, the app does everything locally; your mail never leaves your machine.

This isn’t a research project. Just a fun little hack. It’s also pretty bare bones. A little dialog just pops up as it processes, then TextEdit shows you the results. Like I said, not a research project, not a finished project. But I found it pretty fun and enlightening. Fair warning: the name extraction stage can take a while. Oh, and it also handles nicknames (e.g., Tom is short for Thomas). I couldn’t get anything good without that. The source is in there if you want it. Do with it what you will.
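The core idea (names in the body that aren't on the recipient list, with nicknames folded in) can be sketched in a few lines of Python. This is not the app's actual code, which is Perl plus the Stanford Named Entity Recognizer. Here a hand-supplied list of known first names stands in for real name extraction, and `NICKNAMES`, `canonical` and `gossiped_about` are hypothetical names made up for the example.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"tom": "thomas", "liz": "elizabeth", "bob": "robert"}


def canonical(name):
    """Lowercase a name and map a nickname to its full form."""
    name = name.lower()
    return NICKNAMES.get(name, name)


def gossiped_about(raw_message, known_people):
    """Return known names mentioned in the body but absent from To/Cc/Bcc."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    recipients = set()
    for display_name, address in getaddresses(headers):
        # Fall back to the local part of the address when there is no display name.
        label = display_name or address.split("@")[0]
        for token in re.findall(r"[A-Za-z]+", label):
            recipients.add(canonical(token))
    # Assumes a simple, non-multipart plain-text message.
    body = msg.get_payload()
    mentioned = {canonical(word) for word in re.findall(r"[A-Za-z]+", body)}
    known = {canonical(person) for person in known_people}
    return sorted((mentioned & known) - recipients)


raw = """From: me@example.com
To: Thomas Smith <tsmith@example.com>
Subject: lunch

Tom, did you hear what Liz said to Bob?
"""

print(gossiped_about(raw, ["Thomas", "Elizabeth", "Robert"]))
# prints ['elizabeth', 'robert']
```

Tom is on the recipient list (as Thomas), so he doesn't count; Liz and Bob are mentioned but not addressed, so they do.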

DOWNLOAD BIT OF GOSSIP
(drag to Applications)

Couldn’t have done it without the distinguished Stanford Named Entity Recognizer and Platypus. And of course Perl. Where would I be without you, darling?

Verb Paraphrasing Experiment

Tuesday, March 11th, 2008

addressed ~ toasted (sometimes)

I’m taking an NLP class this semester, and it has been interesting. We just completed our first problem set: find verb pairs such that you can replace one with the other in at least one sentence (without changing the meaning of the sentence too much). Example: “President Bush addressed/toasted the crowd.”

For my part, I implemented an algorithm by Glickman and Dagan that takes a probabilistic and unsupervised approach to the problem. I’m posting it here because my code will just rot on my machine unless I do something with it. The code works on the AQUAINT corpus, processed by minipar. The algorithm finds some legitimate paraphrases and also some bogus ones. The top 5 ranked verb pairs drawn from a New York Times corpus:

take approached (good)
become defined (not so good)
abandon put (bad)
planned mounted (good)
addressed toasted (good)
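To give a flavor of how an unsupervised approach can surface pairs like these, here is a heavily simplified sketch in Python. It is not Glickman and Dagan's probabilistic scoring. It just counts how often two verbs appear with the same subject and object, on the assumption that a parser has already produced (subject, verb, object) triples. The function name and the toy triples are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_paraphrases(triples):
    """Rank verb pairs by how many (subject, object) contexts they share."""
    verbs_by_context = defaultdict(set)
    for subject, verb, obj in triples:
        verbs_by_context[(subject, obj)].add(verb)

    shared = defaultdict(int)
    for verbs in verbs_by_context.values():
        # Every pair of verbs seen in the same context is a candidate.
        for pair in combinations(sorted(verbs), 2):
            shared[pair] += 1

    # Most shared contexts first; ties broken alphabetically.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


# Toy triples, invented for the example (not from AQUAINT).
triples = [
    ("bush", "addressed", "crowd"),
    ("bush", "toasted", "crowd"),
    ("mayor", "addressed", "council"),
    ("mayor", "toasted", "council"),
    ("bush", "left", "crowd"),
]

ranked = candidate_paraphrases(triples)
for (verb_a, verb_b), count in ranked:
    print(verb_a, verb_b, count)
```

Raw co-occurrence counts like this also explain the bogus pairs: “left” and “toasted” share a context here without meaning anything alike, which is the kind of noise a probabilistic score tries to push down the ranking.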