Saturday, March 13th, 2010

UPDATE: I have released the classifiers, R scripts and aggregate data from this paper. The archive has a README to get you started and some example Java showing how to use the classifiers. Get it here.

I have a new paper at ICWSM 2010. I’m really looking forward to all the great work in the program. The central thesis of my paper: estimating anxiety, worry and fear from blogs provides some novel information about future stock market prices.

ABSTRACT: Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
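For readers new to the Granger-causal idea mentioned in the abstract, here is a minimal sketch of the general technique. It is not the model from the paper. The question it asks is whether past values of one series improve a prediction of another series beyond what that series' own past already provides. Everything here is an assumption made for illustration: the function names `ols_rss` and `granger_f`, the single lag, and the synthetic `anxiety` and `returns` series, which are random numbers and not real data.

```python
import random


def ols_rss(rows, targets):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(rows, targets))


def granger_f(y, x, lags=1):
    """F statistic for 'lagged x helps predict y beyond lagged y alone'."""
    times = range(lags, len(y))
    targets = [y[t] for t in times]
    # Restricted model: intercept plus y's own lags.
    restricted = [[1.0] + [y[t - l] for l in range(1, lags + 1)] for t in times]
    # Full model: restricted model plus lags of x.
    full = [row + [x[t - l] for l in range(1, lags + 1)]
            for row, t in zip(restricted, times)]
    rss_restricted = ols_rss(restricted, targets)
    rss_full = ols_rss(full, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Synthetic illustration only: returns are built to depend on yesterday's anxiety.
rng = random.Random(0)
anxiety = [rng.gauss(0, 1) for _ in range(500)]
returns = [0.0]
for t in range(1, 500):
    returns.append(0.5 * returns[t - 1] - 0.8 * anxiety[t - 1] + rng.gauss(0, 0.5))

forward = granger_f(returns, anxiety)   # does anxiety help predict returns?
reverse = granger_f(anxiety, returns)   # does the reverse direction hold?
print(forward, reverse)
```

In this toy setup the forward statistic should come out far larger than the reverse one, because the dependence was built in by construction. A real analysis would also need significance levels, more lags and controls for market variables.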

pdf Widespread Worry and the Stock Market.
Proc. ICWSM, 2010.

Tuesday, October 27th, 2009

I’m happy to announce a new paper, a departure from my thesis work. It’s going to appear at CSCW 2010, and it looks at people who write product reviews that really look like other reviews. I call them deja reviewers. I’m also happy to report that the note got the best of CSCW award. Very cool!

ABSTRACT: People who review products on the web invest considerable time and energy in what they write. So why would someone write a review that restates earlier reviews? Our work looks to answer this question. In this paper, we present a mixed-method study of deja reviewers, latecomers who echo what other people said. We analyze nearly 100,000 Amazon.com reviews for signs of repetition and find that roughly 10–15% of reviews substantially resemble previous ones. Using these algorithmically-identified reviews as centerpieces for discussion, we interviewed reviewers to understand their motives. An overwhelming number of reviews partially explains deja reviews, but deeper factors revolving around an individual’s status in the community are also at work. The paper concludes by introducing a new idea inspired by our findings: a self-aware community that nudges members toward community-wide goals. (espresso machine courtesy of jakeliefer.)
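The abstract does not say how the repetition was detected, so the following is only a sketch of one common way to flag a review that resembles an earlier one: compare word-bigram sets with Jaccard similarity. The function names, the bigram choice and the 0.5 threshold are all assumptions made for illustration, and the example reviews are invented.

```python
def shingles(text, n=2):
    """Set of word n-grams (bigrams by default) from a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_deja_reviews(reviews, threshold=0.5):
    """Indices of reviews that resemble any EARLIER review (input is chronological)."""
    seen = []
    flagged = []
    for index, text in enumerate(reviews):
        current = shingles(text)
        if any(jaccard(current, earlier) >= threshold for earlier in seen):
            flagged.append(index)
        seen.append(current)
    return flagged


# Invented reviews, oldest first.
reviews = [
    "great espresso machine makes rich crema every morning",
    "the steam wand broke after two weeks",
    "great espresso machine makes rich crema every single morning",
]
print(flag_deja_reviews(reviews))  # [2]
```

The third review shares six of nine distinct bigrams with the first, so it crosses the assumed threshold. Pairwise comparison like this is quadratic in the number of reviews, which is fine for one product page but would need indexing at larger scale.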

pdf Understanding Deja Reviewers.
Proc. CSCW, 2010.

Wednesday, January 14th, 2009

Social media treats all users the same: trusted friend or total stranger, with little or nothing in between. In reality, relationships fall everywhere along this spectrum, a topic social science has investigated for decades under the theme of tie strength. Our work bridges this gap between theory and practice. In this paper, we present a predictive model that maps social media data to tie strength. The model builds on a dataset of over 2,000 social media ties and performs quite well, distinguishing between strong and weak ties with over 85% accuracy. We complement these quantitative findings with interviews that unpack the relationships we could not predict. The paper concludes by illustrating how modeling tie strength can improve social media design elements, including privacy controls, message routing, friend introductions and information prioritization.

We won best paper!

pdf Predicting Tie Strength With Social Media.
Proc. CHI, 2009.

Wednesday, October 1st, 2008

> 3:1 agreement

Tony, a co-author of this work, dreamt up the very clever title (see full citation at end of post). I particularly love the use of the highly academic colon. I will present it at the Social Spaces minitrack, part of the Digital Media track (all very hierarchical). Soon I will release the data, code and algorithm specifics from this paper. I included URLs in the text of the paper, so I really need to post them soon. I was very happy to see this work come together, and I very much look forward to seeing some of the other work at the minitrack. Plus, Hawaii in January (+ baby depending on how fussy she seems near ticket-buying time) will be awfully nice. I need to start shopping for parasols and shark repellent.

pdf Blogs Are Echo Chambers: Blogs Are Echo Chambers.
Proc. HICSS, 2009.

Thursday, June 19th, 2008

I recently finished a paper about blogs as echo chambers. Our project was heavily influenced by various books by Cass Sunstein, a law professor at the University of Chicago: InfoTopia, Republic.com and Why Societies Need Dissent. We were sitting around in social seminar, and Karrie said, “it seems like most blogs are just echo chambers: everyone always agrees.” I said, “let’s see if we can prove it.” We hand-coded over 1,000 blog comments and wrote a paper on the project. It’s currently in submission, so I won’t post it; I’ll just include the abstract for now.

In the last decade, blogs have exploded in number, popularity and scope. However, many commentators and researchers speculate that blogs isolate readers in echo chambers, cutting them off from dissenting opinions. Our empirical paper tests this hypothesis. Using a hand-coded sample of over 1,000 comments from 33 of the world’s top blogs, we find that agreement outnumbers disagreement in blog comments by more than 3 to 1. However, this ratio depends heavily on a blog’s genre, varying between 2 to 1 and 9 to 1. Using these hand-coded blog comments as input, we also show that natural language processing techniques can identify the linguistic markers of agreement. We conclude by applying our empirical and algorithmic findings to practical implications for blogs, and discuss the many questions raised by our work.
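The agreement-to-disagreement ratios in the abstract are simple counts over hand-coded labels. Here is a small sketch of that arithmetic, overall and per genre. The label strings, the genre names `genre_a` and `genre_b`, and the toy counts are placeholders invented for illustration, not the coding scheme or data from the paper.

```python
from collections import Counter, defaultdict


def agreement_ratio(labels):
    """Ratio of 'agree' to 'disagree' labels; other labels are ignored."""
    counts = Counter(labels)
    if counts["disagree"] == 0:
        return float("inf")
    return counts["agree"] / counts["disagree"]


def ratio_by_genre(coded_comments):
    """Agreement ratio per genre from (genre, label) pairs."""
    by_genre = defaultdict(list)
    for genre, label in coded_comments:
        by_genre[genre].append(label)
    return {genre: agreement_ratio(labels) for genre, labels in by_genre.items()}


# Toy hand-coded sample (made up): 9 vs 1 in one genre, 4 vs 2 in another.
coded = (
    [("genre_a", "agree")] * 9 + [("genre_a", "disagree")] * 1
    + [("genre_b", "agree")] * 4 + [("genre_b", "disagree")] * 2
)

overall = agreement_ratio([label for _, label in coded])
print(round(overall, 2))      # 4.33
print(ratio_by_genre(coded))  # {'genre_a': 9.0, 'genre_b': 2.0}
```

The toy numbers show how a pooled ratio can hide large differences between genres.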

Tuesday, February 26th, 2008

I am very pleased to announce that The Network in the Garden has been awarded best paper at CHI 2008! I’m very honored, especially because the work was a pretty big risk in the first place. I look forward to presenting the paper in Florence! Get me on Facebook or email and let’s meet up.

Friday, January 11th, 2008

corn and chicago
image courtesy of the Illinois state highway system

The Network in the Garden:
An Empirical Analysis of Social Media in Rural Life.
Proc. CHI, 2008.

History repeatedly demonstrates that rural communities have unique technological needs. Yet, we know little about how rural communities use modern technologies, so we lack knowledge on how to design for them. To address this gap, our empirical paper investigates behavioral differences between more than 3,000 rural and urban social media users. Using a dataset collected from a broadly popular social network site, we analyze users’ profiles, 340,000 online friendships and 200,000 interpersonal messages. Using social capital theory, we predict differences between rural and urban users and find strong evidence supporting our hypotheses. Namely, rural people articulate far fewer friends online, and those friends live much closer to home. Our results also indicate that the groups have substantially different gender distributions and use privacy features differently. We conclude by discussing design implications drawn from our findings; most importantly, designers should reconsider the binary friend-or-not model to allow for incremental trust-building.

Full paper as PDF

P.S. I am very happy to announce this paper—I’m especially proud of this work. And, yes, I reused the state highway sign from an earlier post. I love it!

Update (Apr 14): I just learned that danah boyd included this paper in her bibliography of research on social network sites. Thanks, danah!

Monday, December 3rd, 2007

codesaw screenshot
image adapted from Many Eyes

Visualization Annotation at Internet Scale, Social Data Analysis Workshop, CHI 2008.

Visualization annotation allows users to communicate within a visualization as opposed to outside it. While effective in research settings, the technique has not found a home on today’s social data analysis sites. Scaling the technique to an Internet-sized audience represents the most significant obstacle to its widespread adoption. In this paper, we discuss the problem and propose four interaction techniques to help visualization annotation scale for a Web audience. Our designs strive for clarity of the underlying visualization while providing integrated and rapid feedback about annotations.

Full paper as PDF

Monday, October 8th, 2007

codesaw screenshot

CodeSaw: A Social Visualization of Distributed Software Development, Interact 2007

We present CodeSaw, a social visualization of distributed software development. CodeSaw visualizes a distributed software community from two important and independent perspectives: code repositories and project communication. By bringing together both shared artifacts (code) and the talk surrounding those artifacts (project mail), CodeSaw reveals group dynamics that lie buried in existing technologies. This paper describes the visualization and its design process. We apply CodeSaw to a popular open source project, showing how the visualization reveals group dynamics and individual roles. The paper ends with a discussion of the results of an online field study with prominent open source developers. The field study suggests that CodeSaw positively affects communities and provides incentives to distributed developers. Furthermore, an important design lesson from the field study leads us to introduce a novel interaction technique for social visualization called spatial messaging.

Full paper as PDF