CHI 08 Talk: Network in the Garden

zip codes samples

I recently returned from CHI in Italy. I’m happy with how the Network talk turned out, and I’m also happy with the sense of closure that came with it. I got a few requests to post the slides, so here are the slides in PDF and on slideshare.

I received some excellent questions and comments, and I enjoyed meeting a number of people after the talk. Thanks! I wonder if the video will actually be posted in the ACM digital library this year.

Now onto new work and more deadlines…

Verb Paraphrasing Experiment

addressed ~ toasted (sometimes)

I’m taking an NLP class this semester, and it has been interesting. We just completed our first problem set: find verb pairs such that you can replace one with the other in at least one sentence (without changing the meaning of the sentence too much). Example: “President Bush addressed/toasted the crowd.”

For my part, I implemented an algorithm by Glickman and Dagan that takes a probabilistic, unsupervised approach to the problem. I’m posting my code here because it will just rot on my machine otherwise. The code works on the AQUAINT corpus, parsed with MINIPAR. The algorithm finds some legitimate paraphrases and also some bogus ones. The top 5 ranked verb pairs, drawn from a New York Times corpus:

take approached (good)
become defined (not so good)
abandon put (bad)
planned mounted (good)
addressed toasted (good)

CHI 08 Best Paper Award!

I am very pleased to announce that The Network in the Garden has been awarded best paper at CHI 2008! I’m very honored, especially because the work was a pretty big risk in the first place. I look forward to presenting the paper in Florence! Get me on Facebook or email and let’s meet up.

CHI 2008: The Network in the Garden

corn and chicago
image courtesy of the Illinois state highway system

The Network in the Garden:
An Empirical Analysis of Social Media in Rural Life.
Proc. CHI, 2008.

ABSTRACT
History repeatedly demonstrates that rural communities have unique technological needs. Yet, we know little about how rural communities use modern technologies, so we lack knowledge on how to design for them. To address this gap, our empirical paper investigates behavioral differences between more than 3,000 rural and urban social media users. Using a dataset collected from a broadly popular social network site, we analyze users’ profiles, 340,000 online friendships and 200,000 interpersonal messages. Using social capital theory, we predict differences between rural and urban users and find strong evidence supporting our hypotheses. Namely, rural people articulate far fewer friends online, and those friends live much closer to home. Our results also indicate that the groups have substantially different gender distributions and use privacy features differently. We conclude by discussing design implications drawn from our findings; most importantly, designers should reconsider the binary friend-or-not model to allow for incremental trust-building.

Full paper as PDF

P.S. I am very happy to announce this paper—I’m especially proud of this work. And, yes, I reused the state highway sign from an earlier post. I love it!

Update (Apr 14): I just learned that danah boyd included this paper in her bibliography of research on social network sites. Thanks, danah!

Debate Diagrams: Primaries Visualization

debate diagrams

I built Debate Diagrams to make sense of the Democratic primary debates. In such a crowded field, the candidates need to distinguish themselves: one strategy is direct comparison. Debate Diagrams parses the transcripts of 5 officially sanctioned Democratic debates and places an arc between two candidates when one mentions the other by name. The arcs grow denser the more often one candidate mentions the other.
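For the curious, here is a minimal Python sketch of the mention-counting idea. It is not the Flex code behind Debate Diagrams. The transcript format (a list of speaker/text pairs), the function name count_mentions, and the candidate names are all made up for illustration.

```python
import re
from collections import Counter

def count_mentions(turns, candidates):
    """Count directed (speaker, mentioned) pairs across debate turns.

    turns: list of (speaker, text) tuples (a hypothetical transcript format).
    candidates: list of candidate last names.
    """
    arcs = Counter()
    # One whole-word pattern per candidate, so "Clark" does not match "Clarkson".
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in candidates
    }
    for speaker, text in turns:
        if speaker not in patterns:
            continue  # skip moderators and other non-candidates
        for name, pattern in patterns.items():
            if name == speaker:
                continue  # ignore self-references
            hits = len(pattern.findall(text))
            if hits:
                arcs[(speaker, name)] += hits
    return arcs

# Made-up turns with placeholder candidates, not real debate quotes.
turns = [
    ("Adams", "Senator Baker and I agree on this."),
    ("Baker", "I respect Senator Adams, but Senator Adams is wrong here."),
    ("Moderator", "Senator Clark, your response?"),
    ("Clark", "Neither Baker nor Adams has answered the question."),
]
arcs = count_mentions(turns, ["Adams", "Baker", "Clark"])
```

Each key in arcs is one directed arc, and its count could drive how dense the arc is drawn.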

My visualization draws substantial inspiration from Martin Wattenberg’s fantastic piece, The Shape of Song. It also follows on the heels of a similar visualization produced by the NY Times for last Sunday’s paper. It’s my first project in Flex.

Try the interactive version!

Paper: Social Data Analysis Workshop

codesaw screenshot
image adapted from Many Eyes

Visualization Annotation at Internet Scale, Social Data Analysis Workshop, CHI 2008.

Abstract
Visualization annotation allows users to communicate within a visualization as opposed to outside it. While effective in research settings, the technique has not found a home on today’s social data analysis sites. Scaling the technique to an Internet-sized audience represents the most significant obstacle to its wide-spread adoption. In this paper, we discuss the problem and propose four interaction techniques to help visualization annotation scale for a Web audience. Our designs strive for clarity of the underlying visualization while providing integrated and rapid feedback about annotations.

Full paper as PDF

Visualizations of Race and Money in Chicago

white and black workers
white and black workers in Chicago

I started a submission to the InfoVis 2006 contest last year, but I decided not to submit it a few weeks before the deadline. The code has hung around my Processing directory for over a year, untouched. During an after-class chat, I brought it up with Karrie and she suggested that I put it here. I decided not to submit mostly because I was a naive first-year student. (The data contains a significant amount of noise — the locations are not exact, for example — but the “triangulation” from many different individual surveys evens it out, IMO.)

The image above shows three screenshots from a visualization of white and black workers in Chicago. Chicago is a notoriously segregated city, and I was interested in whether jobs do anything to bring whites and blacks together. I extracted the data from the 1% PUMS Census dataset. The visualization makes the following mappings: worker’s home (center of circle), race (circle color), length of daily commute to work (size of circle, animates through the commute) and the aggregation of all workers in Chicago at any moment of the day. The visualization animates through time, but unfortunately I could not get Processing’s movie maker library to give me something presentable.

money in the city
the concentration of wealth in Chicago at different times of the day

The second image (above) remaps yearly salary to color (specifically, to the saturation of green). You can watch the city wake up, and you can see when the poor, the middle-class and the very wealthy stockbrokers leave for work in the morning. There is a substantial second shift around 2:00 PM as well. (I wish the Census data included “return home” time too, but alas.) It’s really quite neat in animation: I included a little feature that highlights all the people just leaving for work, then fades them to their appropriate colors shortly thereafter.
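A rough Python sketch of the salary-to-saturation mapping, for anyone who wants to try something similar. This is not the Processing code. The HSV color model, the linear scale, and the 200,000 cap (max_salary) are assumptions I picked for the example.

```python
import colorsys

def salary_to_rgb(salary, max_salary=200_000):
    """Map a yearly salary to an RGB green whose saturation grows with pay.

    max_salary is an assumed cap: salaries at or above it get full saturation.
    """
    # Clamp the normalized salary into [0, 1] so outliers do not break the scale.
    saturation = min(max(salary / max_salary, 0.0), 1.0)
    # Hue 1/3 is green in HSV; value stays at full brightness.
    r, g, b = colorsys.hsv_to_rgb(1 / 3, saturation, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

With this mapping a salary of zero comes out white, and anything at or above the cap comes out pure green.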

Grasses or Cords

bundles of cords (or grasses)

Living in the prairie, I like grasses. This weekend, I played around with code that generates bundles of grass (or cords, like CAT5 cables). A sprig of grass usually joins an existing bundle, but with some very low probability will strike out on its own. Thick clumps have a higher probability of getting new sprigs than thin clumps: a rich-get-richer scheme. Sprigs that do not attract any friends get killed off by a periodic layer of transparency. The three images above represent 3 different experiments to distribute the clumps: dense noise, sparse randomness and dense randomness. No data…just something to do besides read research papers for a while.
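Here is a small Python sketch of the rich-get-richer rule, tracking only clump sizes and no drawing. It is a reconstruction from the description above, not my original code. The 2% chance of striking out alone (p_new) and the prune_loners stand-in for the transparency layer are assumptions.

```python
import random

def grow_clumps(n_sprigs, p_new=0.02, seed=0):
    """Return a list of clump sizes after adding n_sprigs sprigs one by one.

    p_new is an assumed small probability that a sprig starts its own clump.
    """
    rng = random.Random(seed)
    clumps = [1]  # start with a single clump holding one sprig
    for _ in range(n_sprigs - 1):
        if rng.random() < p_new:
            clumps.append(1)  # strike out alone
        else:
            # Pick a clump with probability proportional to its current size,
            # so thick clumps attract more sprigs than thin ones.
            idx = rng.choices(range(len(clumps)), weights=clumps, k=1)[0]
            clumps[idx] += 1
    return clumps

def prune_loners(clumps, min_size=2):
    """Drop clumps that never attracted a second sprig (stand-in for the fade-out)."""
    return [size for size in clumps if size >= min_size]
```

Calling prune_loners(grow_clumps(500)) gives the sizes of the clumps that survive; fixing the seed makes a run repeatable.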

RedSpace, BlueSpace

corn and chicago
image courtesy of the Illinois state highway system. thanks!

I recently completed a project examining differences between rural and urban MySpace users. Currently, I have a paper in submission. Shh. This is really just a placeholder for that paper once it’s published (somewhere). I took a quantitative approach and found the following: rural users have much smaller networks much closer to home, rural users value privacy more, and women represent a much greater proportion of rural users. I will write more once the paper comes out.

Remodeling Reader

reader tag cloud interface
screenshot of remodeled list of feeds

Google Reader is one of my favorite apps. My biggest complaint, however, is the waste of space in the list of feeds on the left side of the interface. This summer I toyed around with a Greasemonkey script that transforms the interface into a tag cloud. It’s not perfect, but I use it day to day. To use: install Greasemonkey, then install the script (you only need to click the link; Greasemonkey takes care of the rest).