If you follow me on Twitter, you’re probably already well-acquainted with my views on what should happen in the wake of the shooting spree that massacred 20 children and 6 educators at a suburban elementary school in Newtown, Connecticut. This post, however, will build on my previous analysis of the Wikipedia article about the Aurora shootings as well as my dissertation examining Wikipedia’s coverage of breaking news events to compare the evolution of the article for the Sandy Hook Elementary School shooting to other Wikipedia articles about recent mass shootings.
In particular, this post compares the behavior of editors during the first 48 hours of each article’s history. The fact that there are 43 English Wikipedia articles about shooting sprees in the United States since 2007 should lend some weight to this much-ballyhooed “national conversation” we are supposedly going to have, but I chose to compare the Sandy Hook shooting article to just six others, selected for their recency and severity and including one international example.
- Sandy Hook Elementary School shooting (Newtown): December 14, 2012, 14:49 UTC. 28 deaths, 1+ injuries. First edit at 18:05.
- 2012 Aurora shooting (Aurora): July 20, 2012, 06:38 UTC. 12 deaths, 59 injuries. First edit at 09:06.
- Wisconsin Sikh temple shooting (Oak Creek): August 5, 2012, 15:25 UTC. 7 deaths, 4 injuries. First edit at 18:12.
- 2011 Norway attacks (Utøya): July 22, 2011, 13:25 UTC. 77 deaths, 319 injuries. First edit at 13:57.
- 2011 Tucson shooting (Tucson): January 8, 2011, 17:10 UTC. 6 deaths, 14 injuries. First edit at 19:34.
- Binghamton shootings (Binghamton): April 3, 2009, 15:30 UTC. 14 deaths, 4 injuries. First edit at 17:09.
- Virginia Tech massacre (Virginia Tech): April 16, 2007, 13:40 UTC. 33 deaths, 17 injuries. First edit at 15:16.
Wikipedia articles certainly do not break the news of the events themselves, but the first edits to these articles typically happen within two to three hours of the event itself unfolding. Once created, these articles attract many editors and changes and grow extremely rapidly.
Figure 1: Number of changes made over time.
The Virginia Tech article attracted far more revisions than the other shootings in the same span of time and ultimately enough revisions in the first 48 hours (5,025) to put it within striking distance of the top 1000 most-edited articles in all of Wikipedia’s history. Conversely, the Oak Creek and Binghamton shootings, despite having 21 fatalities between them, attracted substantially less attention from Wikipedians and the news media in general, likely because these massacres had fewer victims and the victims were predominantly non-white.
A similar pattern, with VT as an exemplary case, shootings involving immigrants and minorities attracting less attention, and the other shootings behaving largely alike, is also found in the number of unique users editing an article over time:
Figure 2: Number of unique users over time.
These editors and the revisions they make cause articles to rapidly increase in size. Keep in mind, the average Wikipedia article’s length (albeit highly skewed by many short articles about things like minor towns, bands, and species) is around 3,000 bytes and articles above 50,000 bytes can raise concerns about length. Despite the constant back-and-forth of users adding and copyediting content, the Newtown article reached 50kB within 24 hours of its creation. However, in the absence of substantive information about the event, much of this early content is often related to national and international reactions and expressions of support. As more background and context come to light, this list of reactions is typically removed, which shows up as a sudden contraction in article size: Utøya around 22 hours, and Newtown and Virginia Tech around 36 hours. As before, the articles about the shootings at Oak Creek and Binghamton are significantly shorter.
Figure 3: Article size over time.
However, not every editor does the same amount of work. The Gini coefficient captures the concentration of effort (in this case, number of revisions made) across all editors contributing to the article. A Gini coefficient of 1 indicates that all the activity is concentrated in a single editor while a coefficient of 0 indicates that every editor does exactly the same amount of work.
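As a minimal sketch of the measure described above (not necessarily the code in the notebook), the Gini coefficient can be computed from a list of the usernames attached to each revision. The function names `gini` and `editor_gini` are hypothetical, and the rank-weighted formula used here is one standard way to compute the coefficient.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of a collection of non-negative counts."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on ascending values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def editor_gini(revision_authors):
    """Gini coefficient of revisions per editor.

    revision_authors: one username per revision (hypothetical input format).
    """
    revisions_per_editor = Counter(revision_authors)
    return gini(revisions_per_editor.values())
```

With this formula a finite set of n editors tops out at (n - 1) / n rather than exactly 1, so the "single editor does everything" extreme is only approached as the number of editors grows.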
Figure 4: Gini coefficient of editors’ revision counts over time.
Across all the articles, the edits over the first few hours are evenly distributed: editors make a single contribution and others immediately jump in to make single contributions of their own. However, around hour 3 or 4, one or more dedicated editors show up and begin to take a vested interest in the article, which manifests as a rapid centralization of the article. This centralization increases slightly over time across all articles, suggesting these dedicated editors continue to edit after other editors move on.
Another way to capture the intensity of activity on these articles is to examine the amount of time elapsed between consecutive edits. Intensely edited articles may have only seconds between successive revisions while less intensely edited articles can go minutes or hours. This data is highly noisy and bursty, so the plot below is smoothed over a rolling average of about 3 hours.
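The waiting-time calculation can be sketched roughly as follows. This assumes the revision timestamps are available as `datetime` objects and that the smoothing is a trailing, time-based window of three hours; the original analysis may have smoothed differently (for example, over a fixed number of edits). The names `waiting_times` and `rolling_mean_gap` are hypothetical.

```python
from datetime import datetime, timedelta


def waiting_times(timestamps):
    """Seconds elapsed between consecutive edits."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]


def rolling_mean_gap(timestamps, window=timedelta(hours=3)):
    """Trailing mean of waiting times, as (edit time, mean gap in seconds).

    Assumption: a gap belongs to the window if the edit that ends it
    happened less than `window` before the current edit.
    """
    ordered = sorted(timestamps)
    gaps = waiting_times(ordered)
    smoothed = []
    start = 0        # index of the oldest gap still inside the window
    running = 0.0    # sum of the gaps currently inside the window
    for i, gap in enumerate(gaps):
        end_time = ordered[i + 1]  # gap i ends at edit i + 1
        running += gap
        # Drop gaps whose closing edit has fallen out of the window.
        while ordered[start + 1] <= end_time - window:
            running -= gaps[start]
            start += 1
        smoothed.append((end_time, running / (i - start + 1)))
    return smoothed
```

Plotting the second element of each pair on a log-scaled y-axis would give a curve of the same general kind as the figure below.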
Figure 5: Waiting times between edits (y-axis is log-scaled).
What’s remarkable is the sustained level of intensity over a two-day period. The Virginia Tech article was still being edited several times every minute even 36 hours after the event while other articles were seeing updates every five minutes more than a day after the event. This means that even at 3 am, all these articles are still being updated every few minutes by someone somewhere. There’s a general trend upward reflecting the initially intense activity immediately after the article is created, followed by increasing time lags as the article stabilizes, but there’s also a diurnal cycle with edits slowing between 18 and 24 hours after the event, before quickening again. This slowing and quickening is seen around 20 hours as well as around 44 hours, suggesting information is being released and incorporated in cycles as the investigation proceeds.
Finally, who is doing the work across all these articles? The structural patterns of users contributing to articles also reveal interesting patterns. It appears that much of the editing is done by users who have never contributed to the other articles examined here, but there are a few editors who contributed to each of these articles within 4 hours of their creation.
Figure 6: Collaboration network of articles (red) and the editors who contribute to them (grey) within the first four hours of their existence. Editors who’ve made more revisions to an article have thicker and darker lines.
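A network like this can be represented as a weighted edge list between editors and articles. The sketch below assumes revisions are available as `(article, editor, timestamp)` tuples and treats each article's earliest revision as its creation time; `early_edges` and the input format are hypothetical, not taken from the notebook.

```python
from collections import Counter
from datetime import datetime, timedelta


def early_edges(revisions, hours=4):
    """Count revisions per (editor, article) pair made soon after creation.

    revisions: iterable of (article, editor, timestamp) tuples.
    Returns a Counter mapping (editor, article) to a revision count, which
    can serve as the edge weight (line thickness) in the network.
    """
    revisions = list(revisions)
    # Assumption: an article's creation time is its earliest revision.
    created = {}
    for article, _, ts in revisions:
        if article not in created or ts < created[article]:
            created[article] = ts
    cutoff = timedelta(hours=hours)
    edges = Counter()
    for article, editor, ts in revisions:
        if ts - created[article] <= cutoff:
            edges[(editor, article)] += 1
    return edges
```

Passing `hours=48` instead would produce the edge list for the longer window considered later in the post.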
Users like BabbaQ (edits to Sandy Hook), Ser Amantio di Nicolao (edits to Sandy Hook), and Art LaPella (edits to Sandy Hook) were among the first responders to edit several of these articles, including Sandy Hook. However, their revisions are relatively minor copyedits and reference formatting, reflecting the prolific work they do patrolling recent changes. Much of the substantive content of the article is from editors who have edited none of the other articles about shootings examined here and likely no other articles about other shootings. In all likelihood, readers of these breaking news articles are mostly consuming the work of editors who have never previously worked on this kind of event. In other words, some of the earliest and most widely read information about breaking news events is written by people with fewer journalistic qualifications than Medill freshmen.
What does the collaboration network look like after 48 hours?
Figure 7: Collaboration network after 48 hours.
3,262 unique users edited one or more of these seven articles, 222 edited two or more of these articles, 60 edited three or more, and a single user, WWGB, edited all seven within the first 48 hours of their creation. These editors are at the center of Figure 7 where they connect to many of the articles on the periphery. The stars surrounding each of the articles are the editors who contributed to that article and that article alone (in this corpus). WWGB is an editor who appears to specialize not only in editing articles about current events, but also in participating in a community of editors engaged in newswork on Wikipedia. These editors are not the first to respond (as above), but their work involves documenting administrative pages enumerating current events and mediating discussions across disparate current events articles. The ability of these collaborations to unfold as smoothly as they do appears to rest on the ability of Wikipedia editors with newswork experience to either supplant or complement the work done by amateurs who first arrive on the page.
Of course, this just scratches the surface of the types of analyses that could be done on this data. One might look at changes in the structure and pageview activity of each article’s local hyperlink neighborhood to see what related articles are attracting attention, examine the content of the article for changes in sentiment, the patterns of introducing and removing controversial content and unsubstantiated rumors, or broaden the analysis to the other shooting articles. Needless to say, one hopes the cases for future analyses become increasingly scarce.
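Counts like "edited two or more articles" come down to tallying how many distinct articles each editor touched. A rough sketch, assuming the data is a sequence of `(editor, article)` pairs already restricted to the first 48 hours (the function names and input format are hypothetical):

```python
from collections import defaultdict


def articles_per_editor(edits):
    """Map each editor to the set of distinct articles they edited.

    edits: iterable of (editor, article) pairs, one per revision.
    """
    touched = defaultdict(set)
    for editor, article in edits:
        touched[editor].add(article)
    return touched


def editors_with_at_least(touched, k):
    """Sorted list of editors who edited at least k distinct articles."""
    return sorted(editor for editor, articles in touched.items()
                  if len(articles) >= k)
```

Calling `len(editors_with_at_least(touched, k))` for k from 1 to 7 would yield a breakdown of the same shape as the one reported above, and the editors returned for k equal to the number of articles are the ones at the center of the network.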
The IPython Notebook and GEXF network files used in this analysis can be found here.
Edit: As always, Taha Yasseri is on the ball with his analysis of Wikipedia coverage of the events.
Disclosure: I edited the Sandy Hook article twice after publishing this post.