On Monday, October 27, Andy Baio posted an analysis of 72 hours of tweets with the #Gamergate hashtag. With the very best of intentions, he also shared the underlying data, containing over 300,000 tweets saved as a CSV file. There are several technical and potential ethical problems with that, which I’ll get to later, but in a fit of “rules are for thee, not for me,” I grabbed this very valuable data while I could, knowing that it wouldn’t be up for long.
I am starting a new job in November. This is not a prank like last time. But before the grand reveal of where, first I’ll subject you to a lengthy blog post about my thoughts about the how and why. Hopefully this provides an additional perspective to the excellent posts by Lana Yarosh and Jason Yip on their experiences on the computer/information science academic job market. But those of you who know the rhythms of the academic job market are already realizing that (spoiler alert), I’m not starting a tenure-track faculty role. Instead, I’m going to spend the next few years being a data scientist. But I definitely promise not to be this guy:
I promised to do a bigger teardown of Wikipedia’s coverage of current events like Robin Williams’ death and the protests in Ferguson, Missouri this week, but I wanted to share a quick result based on some tool-development work I’m doing with the Social Media Research Foundation’s Marc Smith. We’re developing the next version of WikiImporter to allow NodeXL users to import many of the different types of networks in MediaWikis [see our paper].
On Wednesday, we scraped the 1.5-step ego network of the Robin Williams article: the articles it currently links to, plus whether or not those articles also link to each other. For example, his article links to the Wikipedia articles “Genie (Aladdin)” and “Aladdin (1992 Disney film)”, reflecting one of his most celebrated movie roles. These articles in turn link to each other because they are closely related.
However, other articles are linked from Williams’s article but do not link to each other. The article “Afghanistan” (where he performed with the USO for troops stationed there) and the article “Al Pacino” (with whom he co-starred in the 2002 movie, Insomnia) are linked from his article but these articles do not link to each other themselves: Al Pacino’s article never mentions Afghanistan and Afghanistan’s article never mentions Al Pacino. In other words, the extent to which Wikipedia articles link to each other provides a coarse measure of how closely related two topics are.
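To make the “1.5-step ego network” idea concrete, here is a minimal sketch in plain Python. It is not the WikiImporter or NodeXL code. The `ego_network_1_5` function and the toy link data are hypothetical stand-ins for scraped article links, and it assumes links are directed and that links from neighbors back to the ego are kept.

```python
def ego_network_1_5(links, ego):
    """Return (nodes, edges) for the 1.5-step ego network around `ego`.

    `links` maps an article title to the set of titles it links to.
    The network keeps the ego, the articles it links to (its alters),
    and only those links whose source and target are both in that set.
    """
    alters = set(links.get(ego, ()))
    alters.discard(ego)
    nodes = alters | {ego}
    edges = set()
    for source in nodes:
        for target in links.get(source, ()):
            # Drop links that leave the neighborhood, and self-links.
            if target in nodes and target != source:
                edges.add((source, target))
    return nodes, edges


# Hypothetical toy data, loosely modeled on the examples in the text.
toy_links = {
    "Robin Williams": {
        "Genie (Aladdin)",
        "Aladdin (1992 Disney film)",
        "Afghanistan",
        "Al Pacino",
    },
    "Genie (Aladdin)": {"Aladdin (1992 Disney film)", "Robin Williams"},
    "Aladdin (1992 Disney film)": {
        "Genie (Aladdin)",
        "Robin Williams",
        "Walt Disney Pictures",
    },
    "Afghanistan": {"Kabul"},
    "Al Pacino": {"Insomnia (2002 film)"},
}

nodes, edges = ego_network_1_5(toy_links, "Robin Williams")
```

In this toy version the two Aladdin articles end up tied to each other, while “Afghanistan” and “Al Pacino” are connected only through the ego.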
The 276 articles that compose Williams’s hyperlinked article neighborhood vary a lot in whether they link to each other. Some groups around movies and actors are densely linked, while other articles, such as those about the cities he’s lived in, are relatively isolated from the rest. These individual nodes can be partitioned into groups using a number of different bottom-up “community detection” algorithms. A group is roughly defined as having more ties inside the group than outside of the group. We can visualize the resulting graph by breaking the communities apart into sub-hairballs to reveal the extent to which these sub-communities link to each other.
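The rough definition of a group (more ties inside than outside) can be checked directly for any proposed partition. The sketch below does not implement a community detection algorithm itself. It only counts inside and outside ties for a partition that some algorithm has already produced. The `tie_counts` function, the group labels, and the toy edges are all hypothetical.

```python
from collections import defaultdict


def tie_counts(edges, membership):
    """Count ties inside each group and ties crossing each group's boundary.

    `edges` is an iterable of (source, target) pairs and `membership`
    maps each node to its group label. A crossing tie is counted once
    for each of the two groups it touches.
    """
    inside = defaultdict(int)
    outside = defaultdict(int)
    for source, target in edges:
        group_s, group_t = membership[source], membership[target]
        if group_s == group_t:
            inside[group_s] += 1
        else:
            outside[group_s] += 1
            outside[group_t] += 1
    return dict(inside), dict(outside)


# Hypothetical toy partition: three "roles" articles and two "places" articles.
toy_edges = [
    ("A", "B"), ("B", "A"), ("B", "C"), ("A", "C"),  # within "roles"
    ("D", "E"), ("E", "D"),                          # within "places"
    ("C", "D"),                                      # crosses groups
]
toy_membership = {
    "A": "roles", "B": "roles", "C": "roles",
    "D": "places", "E": "places",
}

inside, outside = tie_counts(toy_edges, toy_membership)
```

Here both toy groups satisfy the rough definition, since each has more ties inside than across its boundary.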
The communities reveal clusters of related topics about various roles, celebrity media coverage, and biographical details about places he’s lived and hobbies he enjoyed. But buried inside the primary community surrounding the “Robin Williams” article are articles like “cocaine dependence”, “depression (mood)”, and “suicide”. While these articles are linked among themselves, reflecting their similarity to each other, they are scarcely linked to any other topics in the network.
To me, this reveals something profound about the way we collectively think about celebrities and mental health. Among all 276 articles and 1,399 connections in this hyperlink network about prominent entertainers, performances in movies and television shows, and related topics, there are only 4 links to cocaine dependence, 5 links to depression, and 13 to suicide. In a very real way, our knowledge about mental health issues is nearly isolated from the entire world of celebrity surrounding Robin Williams. These problems are so peripheral, they are effectively invisible to the ways we talk about dozens of actors and their accomplishments.
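Counts like these are in-link tallies: for each article, how many links in the network point at it. A minimal sketch, assuming the network is stored as a list of directed (source, target) pairs, follows. The `in_link_counts` function and the toy edges are hypothetical. Only the figures 4, 5, 13, and 1,399 come from the text above.

```python
from collections import Counter


def in_link_counts(edges):
    """Count how many links point at each article in (source, target) pairs."""
    return Counter(target for _source, target in edges)


# Hypothetical toy edges, only to show the counting step.
toy_edges = [
    ("Robin Williams", "Suicide"),
    ("Robin Williams", "Depression (mood)"),
    ("Depression (mood)", "Suicide"),
    ("Robin Williams", "Al Pacino"),
]
counts = in_link_counts(toy_edges)

# Figures reported in the text: links to each article out of 1,399 total.
reported_in_links = {"cocaine dependence": 4, "depression": 5, "suicide": 13}
total_links = 1399
share = sum(reported_in_links.values()) / total_links
```

With the reported figures, `share` is the fraction of all 1,399 links that point at these three articles combined: 22 of them.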
In an alternative world in which mental health issues and celebrity weren’t treated as secrets to be hidden, I suspect substance abuse, depression, and other mental health issues would move in from the periphery and become more central, as these topics connect to other actors’ biographies and feature prominently in movies themselves.