ICWSM in Cambridge

After six weeks of travel that took me to Rio for WWW, Hamburg for Sunbelt, Copenhagen for NetSci, Chicago for hooding, Alaska for a wedding, Bloomington for PolNet, and St. Louis for another wedding, I am grateful that I don’t have to travel for ICWSM, which will be at the Media Lab in Cambridge next week. For those of you unfamiliar with my adopted hometown, I wanted to share some of the sights, bites, and swigs you should check out while here.

Cambridge is the “city of squares,” and any intersection you’ll find is a “square” named after one person or another. The squares that bound the area you should explore while here are Kendall Square (closest to MIT), Central Square (the primary commercial and transportation hub in Cambridge, if the name didn’t already give it away), and Inman Square (the northeastern-most outpost). The Kendall Square area has changed dramatically since I graduated in 2006, so there is much to explore if you haven’t been back in a few years. Central Square has a “unique” flavor and inhabitants, but also some excellent restaurants and bars. Inman Square isn’t terribly convenient to reach via public transportation, but it has a number of great restaurants.

Sights

Definitely check out the MIT Museum as well as the Museum of Science if you have a chance — both are excellent. If you want to be a hardcore tourist, then doing a duck tour is essential as it’s conspicuous, interesting, and fun.

Cambridge also has an interesting mashup of club cultures. If you’re into indie music, you’ll want to stop by the Middle East or All Asia; if more traditional club music is your scene, Middlesex Lounge is right across the street.

If you’re around Sunday afternoon, make sure to check out SOWA in Boston — arts, food trucks, and farmers market.

Bites

For breakfast and coffee, I would recommend: Flour (halfway to Central Square on Mass. Ave.), Voltage on 3rd St. (about 5 minutes east of the Kendall/MIT stop), and Cafe Luna (also in Central Square).

For vegetarians, the notoriously granola People’s Republic of Cambridge has a number of excellent options: the Clover food truck on the street behind MIT Medical next to the Kendall T-stop, plus Veggie Galaxy (diner-style food) and Life Alive (juices & salads) in Central Square. For dinner, Helmand (Afghan) and Similans (Thai) near Lechmere, Rangzen (Tibetan) or Baraka (African) near Central, and Oleana near Inman are amazing.

For lunch options, you’ll probably end up at the food trucks next to the Kendall T-stop at least once: the lines for Momogoose (Asian) and Clover (vegetarian) reveal the preferences of the crowd, but there’s typically a Middle Eastern and a Mexican food truck as well. Firebrand Saints has a really cool video deconstruction installation that’s worth checking out in addition to the sandwiches and roast chicken. There’s also a Chipotle, an Au Bon Pain, and a Cosi in Kendall if you absolutely must have culinary homogeneity.

For upscale or foodie options, Craigie on Main, Salts, and Cuchi Cuchi (near Central Square) are going to be your best bets. Legal Sea Foods checks the upscale box, but locals generally do not go out of their way to go there for seafood, especially when there are places like Alive & Kicking Lobsters, which is out of the way but worth the trip for the freshest lobster rolls. For outstanding high and low Mexican cuisine, check out Ole and Olecito (respectively) in Inman; for Southern, check out Hungry Mother or Tupelo in Inman. There are a few Asian and Indian places around as well, but none that I would particularly recommend.

Swigs

If you’re a beer snob, there are two notable places worth checking out. Meadhall is just around the corner from the Media Lab and has dozens of beers on tap and some very solid but expensive food. If you’re looking for a more intimate gathering, some unexpected beers, and a bit more exploring, then Lord Hobo near Inman Square is for you. Bukowski’s in Inman Square also has an excellent tap lineup and lots of space for groups. More traditional pubs like Atwood’s or The Druid can also be found in Inman. If you’re a cocktail snob, you’ll want to check out Brick & Mortar, Rendezvous, or Green Street, all of which are in Central Square.

Boston Marathon bombing

The Boston Marathon bombing occurred less than a mile from where I work and only 300 feet away from my first apartment after graduating from college. Fortunately, all my kith and kin are safe and well. The proximity and severity of this event have motivated me to expand a bit upon the prior analysis I’ve done of other current events on Wikipedia to examine a new type of data: pageviews. The Wikimedia Foundation makes data available about the number of times an article was requested every hour going back to late 2007. For most purposes, these data can be aggregated at the daily level and accessed via another service made by User:Henrik. It is important to keep in mind that pageviews are requests, not necessarily unique viewers, although the two are obviously highly correlated.

A characteristic feature of the pageviews around a Wikipedia article for a current news event is a peak followed by a decay. Here is an example from May 2011 about Osama bin Laden. On May 1, there were 7,557 views on the article. After the announcement of his death late on May 1, the article was viewed more than 4.5 million times on May 2, and by May 31 the pageviews had dropped back to 23,795. (NOTE: The dates in the chart appear to be off by a day, since the announcement happened late on May 1 EDT, when it was already May 2 in UTC.) The vast majority of the 7,557 views on May 1 occurred without any knowledge of his death and are similar to the pageview activity over the entire month of April (between ~6,000 and ~11,000). The magnitude of the “burst” of pageview activity on May 2 is obviously indicative of a major event that drove many people to seek information about bin Laden in a narrow period of time.

[Screenshot: chart of daily pageviews for the Osama bin Laden article]

Similar patterns of pageview bursts are also found on articles related to bin Laden such as “Abbottabad” or “United States Naval Special Warfare Development Group,” which are clearly related to the events of that day. Other articles such as “Saudi royal family” also exhibited characteristic bursts of activity around May 2, while articles such as “Bill Clinton” had no characteristic burst. This suggests that some pages were more related to the events of that day because they received similar types of intense attention. In other words, the size of the pageview activity burst for an article on the day reflects users suddenly seeking information about a current news event.

Turning back from the Osama bin Laden example to the Boston Marathon bombings, we can ask whether the bursts of pageview activity on a set of articles reveal information about the event itself. Using the “Boston Marathon bombing” article as a seed, I extracted the 140 other articles the bombing article linked to. Of course, the text of this article is highly unstable and some of these links are likely to come and go. Nevertheless, I will use this list of 140 other articles to examine which received the largest bursts of activity. To quantify the magnitude of the pageview bursts across these articles, I simply took each article’s median number of daily pageviews over the roughly six-week period from March 1 through April 14 as a baseline. Then I took the article’s maximum number of pageviews on either April 15 or April 16 (the most recent dates available). The ratio of these two numbers (maximum during the days following the event divided by the median over the days preceding the event) gives us some idea of which articles saw the greatest increases in pageview activity.

  1. Ground stop: 329.0
  2. Boylston Street: 268.73
  3. Google Person Finder: 237.22
  4. Patriots’ Day: 201.32
  5. Copley Square: 171.53
  6. Controlled explosion: 168.25
  7. The Lenox Hotel: 116.0
  8. Pressure cooker: 83.98
  9. Massachusetts Emergency Management Agency: 83.5
  10. Boston Police Special Operations Unit: 78.43
  11. BB (ammunition): 59.41

This list excludes a number of articles like “Edward F. Davis” and “Pressure cooker bomb” that did not exist before April 15. However, the size of the bursts of pageview activity on Wikipedia articles (linked from the bombing article itself) conveys a surprising amount of information about the more salient details of the location, timing, cause, and effects of this story.
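The burst ratio described above (peak on the event days divided by the median over the preceding baseline) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `burst_ratio`, the date-keyed dictionaries, and the view counts are all made up for the example and are not the data or code behind the list. Articles with no baseline, such as ones created after the event, come back as `None` and drop out of the ranking.

```python
from statistics import median

def burst_ratio(daily_views, baseline_days, event_days):
    """Peak views on the event days divided by the median over the baseline days.

    daily_views maps a date string to that day's view count. Returns None when
    the article has no usable baseline (for example, it did not exist yet).
    """
    baseline = [daily_views[d] for d in baseline_days if d in daily_views]
    if not baseline or median(baseline) == 0:
        return None
    peak = max((daily_views.get(d, 0) for d in event_days), default=0)
    return peak / median(baseline)

# Made-up counts for two hypothetical articles, not real Wikipedia data.
baseline_days = ["2013-04-11", "2013-04-12", "2013-04-13", "2013-04-14"]
event_days = ["2013-04-15", "2013-04-16"]
views = {
    "Quiet article": {"2013-04-11": 10, "2013-04-12": 12, "2013-04-13": 8,
                      "2013-04-14": 10, "2013-04-15": 900, "2013-04-16": 2500},
    "New article": {"2013-04-15": 400, "2013-04-16": 5000},
}

ratios = {title: burst_ratio(v, baseline_days, event_days)
          for title, v in views.items()}
# Rank only the articles that have a baseline to compare against.
ranked = sorted((t for t in ratios if ratios[t] is not None),
                key=ratios.get, reverse=True)
```

Using the median rather than the mean for the baseline keeps a single unusual day in March or April from distorting the comparison.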

Each of these articles’ time series pageview data from March 1 through April 17 can be correlated with each other. For example, the correlation between pageviews for “Ball (bearing)” and “Brigham and Women’s Hospital” is 0.99, which strongly suggests that traffic to the two articles rose and fell together. Conversely, the correlation between “Ball (bearing)” and “USA Today” is only 0.13, suggesting the viewing activity for these two articles is generally unrelated. These correlations can be computed for every pair of articles to establish the relationship between their pageview activity. Thresholding these correlations at the 0.5 level, the resulting relationships can be represented as a correlation network. Here is the network below:

[Image: correlation network of the articles linked from the bombing article]

This image (click to embiggen) also tells a variety of stories despite the hairballness of the network. There are two distinct clusters of nodes. The bluish cluster corresponds to articles highly correlated with each other as they deal with topics pertaining to the bombing itself. These are the infrequently trafficked articles that all of a sudden attracted attention together because of the bombings. The greenish cluster on the lower right contains articles that are linked from the bombing article and aren’t tightly correlated with the bombing topics but are correlated with each other. These articles are more frequently trafficked and less closely related to the events themselves; they pertain to major social institutions like newspapers, government agencies, and financial markets. Their clustering together suggests that, despite being only loosely related to the bombing itself, they nevertheless remain closely related to each other over time. Thus, this network suggests at least two distinct patterns of on-going Wikipedia use: abrupt information seeking about topics that are suddenly in the news versus on-going information seeking about institutions that are regularly in the news.
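For readers who want to try the pairwise correlation and thresholding step, here is a minimal sketch using only the standard library. It assumes the Pearson correlation and keeps pairs with r of at least 0.5 (rather than thresholding on the absolute value). The post does not say which convention was used, so both are assumptions, as are the function names and the made-up series.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom

def correlation_network(series, threshold=0.5):
    """Return {(a, b): r} for every pair of articles with r >= threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges

# Made-up daily counts: A and B spike together, C is busy but steady.
series = {
    "Article A": [5, 6, 5, 400, 900],
    "Article B": [20, 22, 21, 1500, 3600],
    "Article C": [1000, 900, 1100, 950, 1050],
}
network = correlation_network(series)
```

In this toy input only the A/B pair survives the threshold, which mirrors the split between articles that burst together and articles with steady, unrelated traffic.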

As always, this is simply a first cut of the analysis and I’m working on some other analyses that look at the pageview data at an hourly level of resolution and expand the corpus of articles from simply the articles linked from the bombing article to all other English Wikipedia articles. So stay tuned for more.

Co-authorship patterns around Pope Francis

A little late in coming, but here’s a pretty picture based on a conference submission I’m preparing.

  1. Take the 607 unique editors who contributed to the article on Pope Francis after 1 Jan 2013, according to its revision history.
  2. Get all 22,225 other articles they revised since the beginning of the year.
  3. From this two-mode network, project a one-mode article-article network where one article is linked to another if they share an editor in common.
  4. Filter out all the edges where there is only a single editor in common, leaving articles that have been edited by two or more editors in common, and remove the resulting isolates.
  5. Identify the largest connected component consisting of 2,671 articles and 3,144 edges.
  6. Visualize! Nodes are sized based on degree and colored based on modularity class. Data (including GraphML files for both the complete graph and LCC, a larger PNG, and an SVG) available here.
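Steps 3 through 5 above (projection, edge filtering, largest connected component) can be sketched with plain dictionaries. This is a toy illustration, not the code used for the figure: the helper names `project_articles` and `largest_component`, the editor names, and the single-letter article names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project_articles(editor_articles, min_shared=2):
    """Count shared editors for each article pair; keep pairs with >= min_shared."""
    shared = Counter()
    for articles in editor_articles.values():
        for pair in combinations(sorted(set(articles)), 2):
            shared[pair] += 1
    return {pair: w for pair, w in shared.items() if w >= min_shared}

def largest_component(edges):
    """Return the node set of the largest connected component."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first walk from this start node
            node = stack.pop()
            for nxt in neighbors[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy two-mode data: hypothetical editors mapped to the articles they revised.
editor_articles = {
    "editor1": ["A", "B", "C"],
    "editor2": ["A", "B"],
    "editor3": ["B", "C", "D"],
    "editor4": ["E", "F"],
    "editor5": ["E", "F"],
}
edges = project_articles(editor_articles)  # weight = editors in common
lcc = largest_component(edges)
lcc_edges = {pair: w for pair, w in edges.items()
             if pair[0] in lcc and pair[1] in lcc}
```

Because only the surviving edges are passed on, articles left without any edge (like "D" here) drop out automatically, which covers the isolate removal in step 4.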

article-coauthorship-lcc_pope_20130101_4096

There’s a lot going on there and much more to see by looking around the full image, but I’ll give a few highlights.

Which articles have the strongest ties (most editors in common)? A lot of ties between Pope Francis and other papal and Catholic-related articles round out the top 10, as one would expect, but there are some interesting outliers as well: Pier Luigi Bersani and Italian general election, 2013, with 42 editors in common, actually take first, and 2013 Malmö FF season and 2012–13 Svenska Cupen come in 4th. That is to say, these seemingly random articles shared at least 2 editors with the Pope Francis article but were themselves the subject of intense co-editing.

(u'Pier Luigi Bersani', u'Italian general election, 2013', {'weight': 42}),
(u'List of popes', u'Pope Francis', {'weight': 37}),
(u'Papal conclave, 2013', u'Pope Francis', {'weight': 31}),
(u'2013 Malm\xf6 FF season', u'2012\u201313 Svenska Cupen', {'weight': 26}),
(u'Pope Benedict XVI', u'Pope Francis', {'weight': 24}),
(u'Papal conclave, 2013', u'Pope Benedict XVI', {'weight': 22}),
(u'Pope Benedict XVI', u'Resignation of Pope Benedict XVI', {'weight': 22}),
(u'Papal conclave, 2013', u'Resignation of Pope Benedict XVI', {'weight': 20}),
(u'South American dreadnought race', u'Argentine\u2013Chilean naval arms race', {'weight': 18}),
(u'Timeline of Vietnamese history', u'First Chinese domination of Vietnam', {'weight': 18})
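The tuples above have the shape (article, article, attributes), with the number of shared editors stored under 'weight'. A ranking like this can be pulled from any edge list in that form. The sketch below assumes that shape and reuses four of the weights listed above; the helper name `strongest_ties` is hypothetical.

```python
from heapq import nlargest

def strongest_ties(edges, k=10):
    """Return the k edges with the largest 'weight' attribute, heaviest first."""
    return nlargest(k, edges, key=lambda edge: edge[2]["weight"])

# A few of the edges listed above, deliberately out of order.
edges = [
    ("Pope Benedict XVI", "Pope Francis", {"weight": 24}),
    ("Pier Luigi Bersani", "Italian general election, 2013", {"weight": 42}),
    ("List of popes", "Pope Francis", {"weight": 37}),
    ("Papal conclave, 2013", "Pope Francis", {"weight": 31}),
]
top = strongest_ties(edges, k=3)
```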

Of course, a lot of co-authorship was around other Catholic topics: the papal conclave, Pope Benedict XVI and his resignation, and other cardinal electors:

catholic

There is a lot of co-authorship around other topics that were also in the news:

breaking_news

Other current-events topics that are not peripheral to these coauthorship patterns include updates to Swedish football club rosters as well as editing of articles about members of the Baathist regime in Syria. Strangely, these two disparate topics are clustered together (both by modularity and by layout), suggesting they draw from a similar community of editors.

football and syria

If you want to know more, hopefully our paper will be accepted and I can share it 🙂