Co-authorship patterns around Pope Francis

A little late in coming, but here’s a pretty picture based on a conference submission I’m preparing.

  1. Take the revision history of all 607 unique editors who contributed to the article on Pope Francis after 1 Jan 2013.
  2. Get all the other 22,225 articles they revised since the beginning of the year.
  3. From this two-mode network, project a one-mode article-article network where one article is linked to another article if they share an editor in common.
  4. Filter out all the edges where there is only a single editor in common, leaving articles that have been edited by two or more editors in common, and remove the resulting isolates.
  5. Identify the largest connected component consisting of 2,671 articles and 3,144 edges.
  6. Visualize! Nodes are sized based on degree and colored based on modularity class. Data (including GraphML files for both the complete graph and LCC, a larger PNG, and a SVG) available here.
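
The projection, filtering, and component steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code behind the figure: the function names `project_articles` and `largest_component`, the editor names, and the toy data are all hypothetical, and the real analysis may well have used a graph library instead.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_articles(editor_articles, min_shared=2):
    """Project an editor -> articles mapping onto article pairs.

    Returns {(article_a, article_b): shared_editor_count}, keeping only
    pairs that share at least `min_shared` editors.
    """
    weights = Counter()
    for articles in editor_articles.values():
        # Each pair of articles touched by one editor gains one shared editor.
        for pair in combinations(sorted(set(articles)), 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w >= min_shared}


def largest_component(edges):
    """Return the set of articles in the largest connected component."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy data: hypothetical editors, article titles borrowed from the post.
toy = {
    "editor_a": ["Pope Francis", "List of popes", "Papal conclave, 2013"],
    "editor_b": ["Pope Francis", "List of popes"],
    "editor_c": ["Pope Francis", "Papal conclave, 2013", "Pope Benedict XVI"],
    "editor_d": ["Pier Luigi Bersani", "Italian general election, 2013"],
    "editor_e": ["Pier Luigi Bersani", "Italian general election, 2013"],
}
edges = project_articles(toy)      # pairs sharing two or more editors
lcc = largest_component(edges)     # articles in the biggest cluster
```

Articles that only ever shared a single editor with anything else (here, Pope Benedict XVI) never appear in `edges`, so the isolates drop out without a separate removal step.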

article-coauthorship-lcc_pope_20130101_4096

There’s a lot going on there and much more to see by looking around the full image, but I’ll give a few highlights.

The articles with the strongest tie (most editors in common)? A lot of ties between Pope Francis and other papal and Catholic-related articles round out the top 10 as one would expect, but there are some interesting outliers as well: Pier Luigi Bersani and Italian general election, 2013 with 42 editors in common, actually takes first and 2013 Malmö FF season and 2012–13 Svenska Cupen comes in 4th. This is to say these random articles shared at least 2 editors with the Pope Francis article but were themselves the subject of intense co-editing.

(u'Pier Luigi Bersani', u'Italian general election, 2013', {'weight': 42}),
 (u'List of popes', u'Pope Francis', {'weight': 37}),
 (u'Papal conclave, 2013', u'Pope Francis', {'weight': 31}),
 (u'2013 Malm\xf6 FF season', u'2012\u201313 Svenska Cupen', {'weight': 26}),
 (u'Pope Benedict XVI', u'Pope Francis', {'weight': 24}),
 (u'Papal conclave, 2013', u'Pope Benedict XVI', {'weight': 22}),
 (u'Pope Benedict XVI', u'Resignation of Pope Benedict XVI', {'weight': 22}),
 (u'Papal conclave, 2013', u'Resignation of Pope Benedict XVI', {'weight': 20}),
 (u'South American dreadnought race', u'Argentine\u2013Chilean naval arms race', {'weight': 18}),
 (u'Timeline of Vietnamese history', u'First Chinese domination of Vietnam', {'weight': 18})

Of course, a lot of co-authorship was around other Catholic topics: the papal conclave, Pope Benedict XVI and his resignation, and other cardinal electors:

catholic

There is a lot of co-authorship around other topics that were also in the news:

breaking_news

Other current-events topics that are not peripheral to these coauthorship patterns include updates to Swedish football club rosters as well as editing of articles about members of the Baathist regime in Syria. Strangely, these two disparate topics are clustered together (both by modularity and by layout), suggesting they draw from similar communities of editors.

football and syria

If you want to know more, hopefully our paper will be accepted and I can share it 🙂

UFOs in SimCity

SimCity is one of my favorite computer game franchises from my childhood. I began playing SimCity 2000 when it required writing bootup batch scripts to allocate 4 megs of memory in MS-DOS circa 1995. The sequels SimCity 3000 and SimCity 4 were part and parcel of my gaming repertoire through much of my adolescence and inevitably shaped my current interests in computing and data analysis. As such, I eagerly await the release of the newest version, simply titled SimCity, this week so I can try to build a version of utopia then try to break it. However, this latest version of SimCity is being released in a social and political context mangled by ideological strife that threatens to appropriate my nostalgia and excitement for a computer game to abet the interests of an extremist political agenda. Case in point, SimCity has already been invoked during the 2012 Republican presidential primary campaign (H/T Grant Edgett). And for this, I will not stand.

tl;dr: Pundits won’t be able to resist using SimCity as an empirical justification for right-wing policies when one has nothing to do with the other.

SimCity was first released in 1989 by Will Wright and is based on the urban planning theories of Jay Forrester [4]. It puts players in the combined positions of mayor, urban planner, and god to manage a simulated city’s development through ordinances and fiscal policy, design expansions through zoning and transportation, and instigate natural disasters like earthquakes and UFO attacks. Wright was interviewed recently by the current lead designer of the new game and he talked about the importance of fiscal discipline, a phrase of course topical to a city simulator, but also substantially more fraught in the current political context.

What worries me is that SimCity could become a bludgeon wielded by pundits to provide “proof” of the efficacy of some policy approaches over other alternatives for negotiating the political straits we are in. I re-purpose the UFO term here to refer to “Undesired False-equivalency Op-eds” that attempt to pass off tired conventional wisdom as hip and contrarian commentary drawing on current events with geeky cachet. Inevitably, some pundit is going to seize upon SimCity as an exemplar of the “realist” solutions we should embrace in response to the manufactured “deficit crisis” or self-inflicted “sequester”. In fact, the 500ish-word op-ed written by some “Very Serious Person” like Thomas Friedman, David Brooks, Ross Douthat, David Frum, or Megan McArdle will probably start with a classic head-fake, then move through obligatory exposition, a contrarian remix, and an appeal to new ideas, before finally concluding that sociopathic policies are reasonable and centrist compromises, all the while mixing in repeated false equivalencies, concern trolling, and zombie ideas. It would probably read something like this:

With unemployment high, economic growth faltering, tax revenues plummeting, and pollution rising, political leaders are facing tough decisions about how to put the economy back on track by cutting services and raising revenue while also responding to complex challenges and investing in new initiatives. But first the hurricanes and meteor strikes have to be attended to.

This was the ripped-from-the-headlines scenario I faced as mayor of Centeropolis in the most recent release of city-building simulator SimCity. The game lets players lay out their city from scratch and manage its growth and budget by setting up zoning, erecting schools, setting tax rates and enacting ordinances, and managing traffic through road construction and mass transportation. All without the messy details of having to file cloture, attend fundraisers, win elections, worry about elections in Italy, or simultaneously appease lobbyists and people in tri-corner hats.

Amidst the failure of either side to budge or offer a compromise and in the absence of real leadership like Tip O’Neill and Ronald Reagan, SimCity’s sandbox offers a third way for us to evaluate how we could address the most serious problem facing our country today: the debt. How much in the way of services can we cut before the seams of society come loose? How much can be raised in revenue before citizens revolt? And is there a centrist and reasonable “Sim-Simpson and Bowles Commission” solution in between these extremes that Republicans and Democrats are committed to that could serve as a model going forward?

Sparing you the arcane and geeky details of how these simulations were set up or run, my research assistants recreated a region called Centeropolis consisting of several interconnected cities reflecting our modern, diversified economy in the midst of the economic turmoil that possesses our country with low growth, poor healthcare, rampant unemployment, odious inequality, overworked institutions, decaying infrastructure, and most importantly of all, high debt. Because the government’s budget is just like a household’s, we sought to solve the debt problem as quickly as possible since that would then solve all the other problems.

The results of our simulation are striking. Implementing the Democratic plan of maxing out taxes and regulations while cutting security resulted in a Detroit-like wasteland of over-entitled and under-employed moocher SimCitizens. While some of the Republicans’ positions such as dissolving the Department of Education and requiring transvaginal ultrasounds for birth control are known for being too extreme, we adopted a mix of both sides’ approaches.

Cutting ordinances regulating businesses and lowering punitive taxes while showering subsidies on well-connected industries attracted new business while cutting expensive services such as healthcare and education greatly saved on costs in Centeropolis. Although secondary problems arose like life expectancy decreasing, pollution levels increasing, and research activity departing, we were able to solve the debt problem quickly. While this is not a perfect model because the other problems were not solved, it suggests that politicians should perhaps spend more time translating this obviously common sense SimCity scenario into common sense real world legislation. Is it too much to ask that our politicians respond to our real world crises the same way any of us would respond to the same SimCity crises?

While it’s obviously a caricature, I’m so confident of the appeal that “SimCity Authoritarian Technocrat Simulator as Panacea to Political Dysfunction” will have to pundits everywhere that I’m writing the rest of this blog post simply as a pre-emptive response. In fact, I’m so confident of the appeal of this framing to these usual suspects, I’ll donate $5 to Doctors without Borders every time it appears in a major outlet. Needless to say, this framing has a number of serious fallacies and flaws which should raise the hackles of responsible policymakers, pundits, and citizens. “Why do you care about this, Brian?” Well, it’s a disservice to games, a disservice to our democratic process, conflates the simulated with the real, and is incredibly elitist.

1. Expertise fallacy, or pundits pretending to have played the game. There is no way David Brooks, Thomas Friedman, or the rest of their ilk on the Aspen-Davos circuit would ever condescend to invest the time in playing a game like SimCity when there are panels, boards, and junkets that need their well-remunerated attention. There is something fundamentally offensive about elites entering the magic circle and appropriating a sandbox game that revels in the details of encouraging experimentation and testing boundary conditions of a software simulation to instead make pronouncements about the validity of their worldview of the week. To these elites, demystification, deconstruction, and investigation of how systems operate should be their domain, not something that rubes like you should be allowed to participate in as well by playing out your own simulations. These new computational literacies also contrast with the innumeracy of our pundit class, who reject the indeterminism inherent to simulation: they are confounded by, if not outright hostile to, people like Nate Silver doing “complex” analyses such as using weighted averages of polls for forecasting with uncertainty if the results contradict their data-free pronouncements. Similarly, they are alternately hostile to or scared of games like Grand Theft Auto that permit players the agency to construct narratives beyond the control of culture-makers like directors and writers who are elites like them. Pundits thus will feign numeracy and gain the cover of objectivity and relevance by “running the numbers” for you in playing out the simulation and reporting the findings produced by the infallible, objective computer that coincidentally reflect their preferences instead of encouraging you to explore the universes of alternatives.

2. Technocratic fallacy, or pundits revealing their anti-democratic dispositions. Pundits frame political dysfunction as something that could be solved if only the decision-making authority were bestowed on a special class of benevolent and data-driven technocrat-unicorns who had no agendas of their own yet can simultaneously resist “special interests” demands for spending on frivolous things like accessible healthcare or unpolluted drinking water. They think, “Sure, democracy is nice in theory, but it’s so messy in practice! It would be so much more six-sigma efficient if we just could leverage multi-level synergies by having a single decision-making body to prioritize some goals and preferences (that I just happen to share) over the goals and preferences of others.” SimCity is a game because players do not have to deal with the messiness of preserving coalitions, attending to legislative procedure, balancing special interests, or even ensuring legality of their actions like real world leaders and planners; it’s fun to see what happens by demolishing swaths of neighborhoods or putting coal plants next to elementary schools! In other words, “China’s approach to development is really great except for those parts about human rights abuses and environmental collapse!” Although this framing is both revealing and obviously appealing to elites, we retain the processes for self-government through democratic processes, which means we can reject and change the choices made by legislators and regulators, imperfectly and haltingly, but in time. Ralph Waldo Emerson attributed this outstanding quote to Fisher Ames, an 18th-century representative from Massachusetts: “a monarchy is a merchantman, which sails well, but will sometimes strike on a rock, and go to the bottom; whilst a republic is a raft, which would never sink, but then your feet are always in water.”

3. Mapping fallacy, or believing policy choices within a game or simulation are appropriate or realistic for the real world. Just because a SimCity can be built relying entirely on mass transportation and clean energy doesn’t mean that its real world analogue would operate with anything like the same efficiency or efficacy. While simulations can be accurate, games need not be reflections of the real world and truths produced by complex machines need not be true: any simulation merely tells you what the programmers told it to tell you. Simulations can be valuable tools for promoting the understanding of complex relationships, but these relationships remain fundamentally influenced by the values and choices of the designers, not the world at large. Agent-based models like SimCity and related abductive approaches to inference absolutely have their place in the methodological toolbelt of computational social scientists and policymakers, but unless you’ve thought critically about the validity of mapping affordances of computer-mediated contexts to the offline world [1,2], leave the games to the gamers and the policy to the policymakers.

4. Abstraction fallacy, or believing that citizens are simply automatons following simple rules that result in predictable and manageable behaviors. This is a distinct type of elitism from that which I outlined under “Expertise Fallacy” because it seeks not to simplify the process of interpreting and understanding a simulation to preserve ideological dominance, but because it constructs the members of the system as abstractions in some other place lacking any autonomy and governed solely by simple rules unlike humans who are diverse, beautiful, unpredictable, and affected by policy choices. Previous versions of SimCity elided the importance of race, irrationality, or sociality when creating a simulation of the material dimensions of city life [5]: the citizens of SimCity do as their code tells them and mostly have no say in what happens to them or their city, which works out well for aspiring despots like the player. The pundits’ proffered policies under the guise of validation-via-simulation will not happen to abstract automatons; they will affect a friend’s job or a parent’s healthcare. If there is any value to be gleaned from these inevitable UFOs, it will be in the process of abstraction in the other direction and the possibilities these games offer to permit a “reconquest of a sense of place” [3]. These simulations reveal the systemic logic and large and interrelated chains of cause and effect so that familiar places are not seen in isolation but as a confluence of historical, geographic, economic, social, and political choices and relationships that we collectively have the power to influence.

It is absolutely the prerogative of every SimCity player to construct a utopian fantasy or reduce her city to rubble because it is a game, not a macro-economic simulator of policy choices facing this nation. SimCity invites players to participate in the deconstruction of the game so that its rules and mechanisms become transparent and demystified. Pundits who will write these columns extolling the virtues of SimCity for rational technocratic governance will be misrepresenting the purpose of the game, oversimplifying the social, economic, and political choices we face, and most egregiously, misusing their position which could reveal and demystify the forces arranged to undermine our institutions to extol the virtues of what the designers themselves call a “toy.” In so doing, they not only reveal their own policy preferences which inevitably serve to benefit elites like them, but they will also provide ideological cover for partisan extremists intractably opposed to reforms and policies most Americans support.

The stakes of how we respond to the hostage-taking political processes that brought us the sequester and fiscal debacles are far too high: real people will be losing their livelihoods and lives so that Congressional teahadists can play out their small government fantasies. Instead of using their privileged position to hold these zealots and their failed policies to account, these pundits will be offering the thin gruel that a simulation like SimCity offers some tincture for what ails us. Might we at least maintain the pretense of keeping our ludic simulacra separate from the very real political stakes? We retain the means for correcting our situation through democratic self-governance, not distracting ourselves with a computer game (albeit, a fun and likely excellent one). Moreover, is there no freedom more essential than getting to replay one’s nostalgia unpolluted by the charlatanism of the pundits abetting teahadists’ zealotry, brinksmanship, and austerity?

References
[1] http://www.dmitriwilliams.com/MappingFinal.pdf
[2] http://james.howison.name/pubs/HowisonEtAl-JAIS-SNA-Revision.pdf
[3] http://cyber.eserver.org/friedman/default.html
[4] http://www.deaquellamanera.org/files/Lobo_CityToy05LSE.pdf
[5] http://www.eric.ed.gov/PDFS/ED384539.pdf

Sandy Hook School massacre

If you follow me on Twitter, you’re probably already well-acquainted with my views on what should happen in the wake of the shooting spree that massacred 20 children and 6 educators at a suburban elementary school in Newtown, Connecticut. This post, however, will build on my previous analysis of the Wikipedia article about the Aurora shootings as well as my dissertation examining Wikipedia’s coverage of breaking news events to compare the evolution of the article for the Sandy Hook Elementary School shooting to other Wikipedia articles about recent mass shootings.

In particular, this post compares the behavior of editors during the first 48 hours of each article’s history. The fact that there are 43 English Wikipedia articles about shooting sprees in the United States since 2007 should lend some weight to this much ballyhooed “national conversation” we are supposedly going to have, but I choose to compare just six of these articles to the Sandy Hook shooting article based on their recency and severity as well as an international example.

Wikipedia articles certainly do not break the news of the events themselves, but the first edits to these articles happen within two to three hours of the event itself unfolding. However, once created, these articles attract many editors and changes and grow extremely rapidly.

Figure 1: Number of changes made over time.

The Virginia Tech article, far and away, attracted more revisions than the other shootings in the same span of time and ultimately enough revisions in the first 48 hours (5,025) to put it within striking distance of the top 1000 most-edited articles in all of Wikipedia’s history. Conversely, the Oak Creek and Binghamton shootings, despite having 21 fatalities between them, attracted substantially less attention from Wikipedians and the news media in general, likely because these massacres had fewer victims and the victims were predominantly non-white.

A similar pattern of VT as an exemplary case, shootings involving immigrants and minorities attracting less attention, and the other shootings having largely similar behavior is also found in the number of unique users editing an article over time:

Figure 2: Number of unique users over time.

These editors and the revisions they make cause articles to rapidly increase in size. Keep in mind, the average Wikipedia article’s length (albeit highly skewed by many short articles about things like minor towns, bands, and species) is around 3,000 bytes and articles above 50,000 bytes can raise concerns about length. Despite the constant back-and-forth of users adding and copyediting content, the Newtown article reached 50kB within 24 hours of its creation. However, in the absence of substantive information about the event, much of this early content is often related to national and international reactions and expressions of support. As more background and context come to light, this list of reactions is typically removed, which can be seen in the sudden contraction of article size in Utoya around 22 hours, and Newtown and Virginia Tech around 36 hours. As before, the articles about the shootings at Oak Creek and Binghamton are significantly shorter.

Figure 3: Article size over time.

However, not every editor does the same amount of work. The Gini coefficient captures the concentration of effort (in this case, number of revisions made) across all editors contributing to the article. A Gini coefficient of 1 indicates that all the activity is concentrated in a single editor while a coefficient of 0 indicates that every editor does exactly the same amount of work.
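
For readers who want to compute this themselves, here is a minimal sketch of one common way to calculate a Gini coefficient from per-editor revision counts. The function name `gini` and the sample counts are hypothetical, and the notebook behind Figure 4 may use a different estimator.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative revision counts."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula over the sorted values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


even = gini([5, 5, 5, 5])       # every editor makes the same number of edits
skewed = gini([0, 0, 0, 10])    # one editor makes all of the edits
```

One caveat with this form: with n editors the value tops out at (n - 1) / n rather than exactly 1, so a single dominant editor among four yields 0.75, and the ceiling only approaches 1 as the number of editors grows.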

Figure 4: Gini coefficient of editors’ revision counts over time.

Across all the articles, the edits over the first few hours are evenly distributed: editors make a single contribution and others immediately jump in to make single contributions as well. However, around hour 3 or 4, one or more dedicated editors show up and begin to take a vested interest in the article, which is manifest in the rapid centralization of the article. This centralization increases slightly over time across all articles, suggesting these dedicated editors continue to edit after other editors move on.

Another way to capture the intensity of activity on these articles is to examine the amount of time elapsed between consecutive edits. Intensely edited articles may have only seconds between successive revisions while less intensely edited articles can go minutes or hours. This data is highly noisy and bursty, so the plot below is smoothed over a rolling average of about 3 hours.
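
As a rough sketch of that calculation, the snippet below turns revision timestamps into waiting times and smooths them with a trailing mean. The timestamps are made up, the helper names are hypothetical, and the window here counts revisions rather than hours, which is an assumption that differs from the roughly 3-hour smoothing used for the plot.

```python
from datetime import datetime


def waiting_times(timestamps):
    """Seconds between consecutive revisions, given ISO 8601 UTC timestamps."""
    times = sorted(datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def rolling_mean(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


revisions = [  # hypothetical timestamps, formatted like MediaWiki revision times
    "2012-12-14T17:00:00Z",
    "2012-12-14T17:00:30Z",
    "2012-12-14T17:02:30Z",
    "2012-12-14T17:03:00Z",
]
gaps = waiting_times(revisions)           # [30.0, 120.0, 30.0]
smoothed = rolling_mean(gaps, window=2)   # [30.0, 75.0, 75.0]
```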

Figure 5: Waiting times between edits (y-axis is log-scaled).

What’s remarkable is the sustained level of intensity over a two day period of time. The Virginia Tech article was still being edited several times every minute even 36 hours after the event while other articles were seeing updates every five minutes more than a day after the event. This means that even at 3 am, all these articles are still being updated every few minutes by someone somewhere. There’s a general trend upward reflecting the initially intense activity immediately after the article is created followed by increasing time lags as the article stabilizes, but there’s also a diurnal cycle with edits slowing between 18 and 24 hours after the event, before quickening again. This slowing and quickening is seen around 20 hours as well as around 44 hours, suggesting information being released and incorporated in cycles as the investigation proceeds.

Finally, who is doing the work across all these articles? The structural patterns of users contributing to articles also reveal interesting patterns. It appears that much of the editing is done by users who have never contributed to the other articles examined here, but there are a few editors who contributed to each of these articles within 4 hours of their creation.

Figure 6: Collaboration network of articles (red) and the editors who contribute to them (grey) within the first four hours of their existence. Editors who’ve made more revisions to an article have thicker and darker lines.

Users like BabbaQ (edits to Sandy Hook), Ser Amantio di Nicolao (edits to Sandy Hook), Art LaPella (edits to Sandy Hook) were among the first responders to edit several of these articles, including Sandy Hook. However, their revisions are relatively minor copyedits and reference formatting reflecting the prolific work they do patrolling recent changes. Much of the substantive content of the article is from editors who have edited none of the other articles about shootings examined here and likely no other articles about other shootings. In all likelihood, readers of these breaking news articles are mostly consuming the work of editors who have never previously worked on this kind of event. In other words, some of the earliest and most widely read information about breaking news events is written by people with fewer journalistic qualifications than Medill freshmen.

What does the collaboration network look like after 48 hours?

Figure 7: Collaboration network after 48 hours.

3,262 unique users edited one or more of these seven articles, 222 edited two or more of these articles, 60 had edited 3 or more, and a single user WWGB had edited all seven within the first 48 hours of their creation. These editors are at the center of Figure 7 where they connect to many of the articles on the periphery. The stars surrounding each of the articles are the editors who contributed to that article and that article alone (in this corpus). WWGB is an editor who appears to specialize not only in editing articles about current events, but participating in a community of editors engaged in the newswork on Wikipedia. These editors are not the first to respond (as above), but their work involves documenting administrative pages enumerating current events and mediating discussions across disparate current events articles. The ability for these collaborations to unfold as smoothly as they do appears to rest on the ability for Wikipedia editors with newswork experience to either supplant or complement the work done by amateurs who first arrive on the page.
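
Counts like "edited two or more of these articles" fall out of a simple tally over the article-to-editors mapping. The sketch below is an assumption about how one might do it rather than the notebook's actual code; `editors_by_breadth`, the article labels, and the user names are all hypothetical.

```python
from collections import Counter


def editors_by_breadth(article_editors):
    """Map k -> number of editors who edited at least k of the articles."""
    per_editor = Counter()
    for editors in article_editors.values():
        # Use a set so repeat revisions to one article count only once.
        per_editor.update(set(editors))
    return {
        k: sum(1 for n in per_editor.values() if n >= k)
        for k in range(1, len(article_editors) + 1)
    }


toy_corpus = {  # hypothetical articles and editors
    "article_1": ["user_1", "user_2", "user_3", "user_1"],
    "article_2": ["user_1", "user_2"],
    "article_3": ["user_1"],
}
breadth = editors_by_breadth(toy_corpus)
```

In this toy corpus, three editors touched at least one article, two touched at least two, and one touched all three.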

Of course, this just scratches the surface of the types of analyses that could be done on this data. One might look at changes in the structure and pageview activity of each article’s local hyperlink neighborhood to see what related articles are attracting attention, examine the content of the article for changes in sentiment, the patterns of introducing and removing controversial content and unsubstantiated rumors, or broaden the analysis to the other shooting articles. Needless to say, one hopes the cases for future analyses become increasingly scarce.

The IPython Notebook and GEXF network files used in this analysis can be found here.

Edit: As always, Taha Yasseri is on the ball with his analysis of Wikipedia coverage of the events.

Disclosure: I edited the Sandy Hook article twice after publishing this post.