HICSS 2009: Blogs Are Echo Chambers

> 3:1 agreement

Tony, a co-author of this work, dreamt up the very clever title (see full citation at end of post). I particularly love the use of the highly academic colon. I will present it at the Social Spaces minitrack, part of the Digital Media track (all very hierarchical). Soon I will release the data, code and algorithm specifics from this paper. I included URLs in the text of the paper, so I really need to post it soon. I was very happy to see this work come together, and I very much look forward to seeing some of the other work at the minitrack. Plus, Hawaii in January (+ baby depending on how fussy she seems near ticket-buying time) will be awfully nice. I need to start shopping for parasols and shark repellent.

pdf Blogs Are Echo Chambers: Blogs Are Echo Chambers.
Proc. HICSS, 2009.

Carolyn Anne Gilbert

a lovely 6lbs 14oz

My wife and I are very blessed to (belatedly) announce the birth of our first child, Carolyn Anne Gilbert. She was born completely healthy on August 19, 2008 (one month before the CHI deadline for those of you who are counting). I am having a great time when I sleep enough. When I don’t sleep … again, we are blessed. This year’s CHI writing process was intense. Our lives are settling down now, however: Carolyn sleeps for 3 hours at a time now! (If you’re not a parent, that’s a big deal at this stage.) I will post soon about some of my new projects, including perhaps a post about the evolution of my site. I have resolved to post more incomplete work this year.

Blogs Are Echo Chambers

I recently finished a paper about blogs as echo chambers. Our project was heavily influenced by various books by Cass Sunstein, a law professor at the University of Chicago: Infotopia, Republic.com and Why Societies Need Dissent. We were sitting around in social seminar, and Karrie said, “it seems like most blogs are just echo chambers; everyone always agrees.” I said, “let’s see if we can prove it.” We hand-coded over 1,000 blog comments and wrote a paper on the project. It’s currently in submission, so I won’t post it; I’ll just include the abstract for now.

Abstract
In the last decade, blogs have exploded in number, popularity and scope. However, many commentators and researchers speculate that blogs isolate readers in echo chambers, cutting them off from dissenting opinions. Our empirical paper tests this hypothesis. Using a hand-coded sample of over 1,000 comments from 33 of the world’s top blogs, we find that agreement outnumbers disagreement in blog comments by more than 3 to 1. However, this ratio depends heavily on a blog’s genre, varying between 2 to 1 and 9 to 1. Using these hand-coded blog comments as input, we also show that natural language processing techniques can identify the linguistic markers of agreement. We conclude by applying our empirical and algorithmic findings to practical implications for blogs, and discuss the many questions raised by our work.

Citeulike, BibDesk and Pages

Every researcher has (and hopefully solves) the reference management problem, and yet it seems hard to find concrete information on how people do it. I use Apple’s Pages to write up my research. The major alternatives, Word and LaTeX, each have a crucial flaw that just drives me crazy. First, and this is a big one, Word handles images very poorly: it does not float text around them well, and it provides almost no help with alignment. Second, LaTeX has the type-and-compile routine that disrupts my concentration. LaTeX does have one thing that I love: \cite{} plus BibDesk.

While writing my latest research paper, I found a way to get the best of LaTeX, BibDesk, citeulike and Pages—and quickly. I love citeulike. The early parts of research involve a lot of page-hopping from research paper to research paper. I often have 25 tabs open in this phase. Citeulike offers a convenient bookmarklet that parses major research sites for reference info (no more hunting for the issue number). Plus, it offers the standard amount of socialness. I love it. Now I can quickly connect BibDesk to citeulike to Pages. It goes like this.

1. Download, install and open BibDesk.
2. Right click on Library and select Add External File Group.
3. Enter http://www.citeulike.org/bibtex/user/yourciteulike?key_type=4
4. Download and install (per readme) CiteInPages.
5. Drag references, one or more at a time, into Pages.
6. Choose CiteInPages alpha numbered from the BibDesk scripts menu.
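
The URL in step 3 follows a simple pattern, so it is easy to generate for any account. Here is a minimal Python sketch that assumes nothing beyond the URL shape shown above; the function name is hypothetical, and "yourciteulike" is the same placeholder username used in step 3.

```python
from urllib.parse import quote


def citeulike_bibtex_url(username: str, key_type: int = 4) -> str:
    """Build the BibTeX export URL from step 3 for a citeulike username.

    The URL pattern is copied from the step above; this helper is only
    an illustration and is not part of BibDesk or CiteInPages.
    """
    # Percent-encode the username so unusual characters stay URL-safe.
    safe_name = quote(username, safe="")
    return f"http://www.citeulike.org/bibtex/user/{safe_name}?key_type={key_type}"


# Placeholder username, as in step 3.
print(citeulike_bibtex_url("yourciteulike"))
```

Paste the printed URL into the Add External File Group dialog from step 2.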

The CiteInPages scripts are wonderful and open source. This gives me the best of LaTeX and Pages. Very nice. I hacked together a nearly-compliant ACM-style template for BibDesk. To use it, first install it in BibDesk’s application support directory, ~/Library/Application Support/BibDesk/Templates, and then edit the CiteInPages alpha numbered script so that it points to the template. Such is the price for good and free.

I’m in love.

CHI 08 Talk: Network in the Garden

zip codes samples

I recently returned from CHI in Italy. I’m happy with how the Network talk turned out, and I’m also happy with the sense of closure that came with it. I got a few requests to post the slides, so here are the slides in PDF and on slideshare.

I received some excellent questions and comments, and I enjoyed meeting a number of people after the talk. Thanks! I wonder if the video will actually be posted in the ACM digital library this year.

Now onto new work and more deadlines…

Verb Paraphrasing Experiment

addressed ~ toasted (sometimes)

I’m taking an NLP class this semester, and it has been interesting. We just completed our first problem set: find verb pairs such that you can replace one with the other in at least one sentence (without changing the meaning of the sentence too much). Example: “President Bush addressed/toasted the crowd.”

For my part, I implemented an algorithm by Glickman and Dagan that takes a probabilistic and unsupervised approach to the problem. I post this here because my code will just rot on my machine unless I do something with it. The code works on the AQUAINT corpus, processed by minipar. The algorithm finds some legitimate paraphrases and also some bogus ones. The top 5 ranked verb pairs drawn from a New York Times corpus:

take approached (good)
become defined (not so good)
abandon put (bad)
planned mounted (good)
addressed toasted (good)
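
For readers who want a feel for unsupervised paraphrase scoring, here is a toy Python sketch. It is not the Glickman and Dagan algorithm and it is not my problem-set code; it swaps in a much simpler idea (Jaccard overlap of shared subject/object contexts), and the triples are made-up stand-ins for parser output.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (verb, subject, object) triples standing in for parsed text.
TRIPLES = [
    ("addressed", "president", "crowd"),
    ("toasted", "president", "crowd"),
    ("addressed", "senator", "reporters"),
    ("toasted", "host", "guests"),
    ("abandon", "army", "plan"),
]


def context_sets(triples):
    """Map each verb to the set of (subject, object) contexts it appears in."""
    contexts = defaultdict(set)
    for verb, subj, obj in triples:
        contexts[verb].add((subj, obj))
    return contexts


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(triples):
    """Return (score, verb1, verb2) for pairs sharing a context, best first."""
    contexts = context_sets(triples)
    scored = [
        (jaccard(contexts[v], contexts[w]), v, w)
        for v, w in combinations(sorted(contexts), 2)
    ]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


for score, v, w in rank_pairs(TRIPLES):
    print(f"{v} {w} {score:.2f}")
```

On this toy input, only addressed/toasted share a context, so that is the only pair ranked. A real system needs far more data and a proper probabilistic model to separate the good pairs from the bogus ones.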

CHI 08 Best Paper Award!

I am very pleased to announce that The Network in the Garden has been awarded best paper at CHI 2008! I’m very honored, especially because the work was a pretty big risk in the first place. I look forward to presenting the paper in Florence! Get me on Facebook or email and let’s meet up.

CHI 2008: The Network in the Garden

corn and chicago
image courtesy of the Illinois state highway system

The Network in the Garden:
An Empirical Analysis of Social Media in Rural Life.
Proc. CHI, 2008.

ABSTRACT
History repeatedly demonstrates that rural communities have unique technological needs. Yet, we know little about how rural communities use modern technologies, so we lack knowledge on how to design for them. To address this gap, our empirical paper investigates behavioral differences between more than 3,000 rural and urban social media users. Using a dataset collected from a broadly popular social network site, we analyze users’ profiles, 340,000 online friendships and 200,000 interpersonal messages. Using social capital theory, we predict differences between rural and urban users and find strong evidence supporting our hypotheses. Namely, rural people articulate far fewer friends online, and those friends live much closer to home. Our results also indicate that the groups have substantially different gender distributions and use privacy features differently. We conclude by discussing design implications drawn from our findings; most importantly, designers should reconsider the binary friend-or-not model to allow for incremental trust-building.

Full paper as PDF

P.S. I am very happy to announce this paper—I’m especially proud of this work. And, yes, I reused the state highway sign from an earlier post. I love it!

Update (Apr 14): I just learned that danah boyd included this paper in her bibliography of research on social network sites. Thanks, danah!

Debate Diagrams: Primaries Visualization

debate diagrams

I built Debate Diagrams to make sense of the Democratic primary debates. In such a crowded field, the candidates need to distinguish themselves: one strategy is direct comparison. Debate Diagrams parses the transcripts of 5 officially sanctioned Democratic debates to place an arc between two candidates when one mentions another by name. The arc between two candidates grows denser each time one of them mentions the other again.
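
The core counting step can be sketched in a few lines of Python. This is an illustration only, not the Flex code behind Debate Diagrams: the "SPEAKER: text" transcript format, the candidate names and the function name are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical candidate surnames and transcript; not real debate data.
CANDIDATES = ["Adams", "Baker", "Clark"]

TRANSCRIPT = """\
ADAMS: I agree with Senator Baker on trade, but not on health care.
BAKER: Adams and Clark both voted for that bill.
CLARK: Let me respond to what Baker just said.
ADAMS: As Baker knows, Adams has always been clear on this.
"""


def mention_arcs(transcript, candidates):
    """Count how often each speaker mentions each other candidate by name.

    Returns a Counter keyed by (speaker, mentioned); the count plays the
    role of arc density in the visualization.
    """
    arcs = Counter()
    by_upper = {name.upper(): name for name in candidates}
    for line in transcript.splitlines():
        speaker_tag, sep, text = line.partition(":")
        speaker = by_upper.get(speaker_tag.strip().upper())
        if not sep or speaker is None:
            continue  # skip lines that are not a known candidate speaking
        for name in candidates:
            if name == speaker:
                continue  # a self-mention draws no arc
            hits = len(re.findall(rf"\b{re.escape(name)}\b", text))
            if hits:
                arcs[(speaker, name)] += hits
    return arcs


for (speaker, mentioned), count in sorted(mention_arcs(TRANSCRIPT, CANDIDATES).items()):
    print(f"{speaker} -> {mentioned}: {count}")
```

Matching on surnames alone is crude (it misses first names and titles), so real transcripts would need a richer alias list per candidate.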

My visualization draws substantial inspiration from Martin Wattenberg’s fantastic piece, The Shape of Song. It also follows on the heels of a similar visualization produced by the NY Times for last Sunday’s paper. It’s my first project in Flex.

Try the interactive version!

Paper: Social Data Analysis Workshop

codesaw screenshot
image adapted from Many Eyes

Visualization Annotation at Internet Scale, Social Data Analysis Workshop, CHI 2008.

Abstract
Visualization annotation allows users to communicate within a visualization as opposed to outside it. While effective in research settings, the technique has not found a home on today’s social data analysis sites. Scaling the technique to an Internet-sized audience represents the most significant obstacle to its wide-spread adoption. In this paper, we discuss the problem and propose four interaction techniques to help visualization annotation scale for a Web audience. Our designs strive for clarity of the underlying visualization while providing integrated and rapid feedback about annotations.

Full paper as PDF