Human-centered data science

Infusing social science values into data science methods

My second research focus contributes to the emerging sub-discipline of human-centered data science (HCDS). Tracing its lineage back to “human-centered computing”, HCDS analyzes the situated nature of social behavior within large and complex datasets by combining qualitative and quantitative techniques. While calls for greater methodological collaboration are not new, I find HCDS to be a productive label for research that prioritizes humane values and precedents in the design and deployment of data-intensive technologies. HCDS complements my research interests around disruptions by examining how unexpected events and policy changes provide empirical traction for complex questions around governing online platforms, translating commons traditions, and supporting the transformation of individual and organizational identities. My co-authored 2020 article in ACM Transactions on Social Computing about variation in image usage across Wikipedia’s language editions [1] illustrates how I approach HCDS through a synthesis of qualitative methods like content analysis and case studies with quantitative methods like network analysis and dimensionality reduction.

Governing online platforms

Questions about the efficacy, legitimacy, and durability of governing social systems are among the oldest in the academy. Disruptions and the social responses to them are boundary cases that test the effectiveness of online governance systems. In 2019, I won a $500,000 NSF grant with co-PI Chenhao Tan to explore the role of genealogical relationships in online communities on Reddit and Wikipedia. We recruited Dr. Estelle Smith from the GroupLens Laboratory as a post-doctoral researcher to lead mixed-methods research developing surveys and validating constructs of user migration and community identification. In the aftermath of the 2016 elections, in which disinformation and polarization played disruptive roles, my expertise researching Wikipedia led me to collaborate with Professor Casey Fiesler to analyze how Wikipedia’s peer-produced rules changed over time and could serve as a model for improving the resilience of other online social platforms [2]. Professor Fiesler and I also collaborated with Nathan Beard, a master’s student, to review dozens of social media platforms’ terms of service for how they regulate data collection [3]. I also collaborated with Professor Saiph Savage (Northeastern University) and her student Claudia Flores-Saviaga to explore how the Reddit community “r/The_Donald” mobilized and sustained participation in spite of efforts to undermine it from other communities and the platform itself [4]. As a post-doctoral researcher skeptical of arguments that increasing online disinformation could be combated with improved fact checking, my coauthors and I explored the role that social relationships play in engagement during confrontational Twitter conversations involving links to fact-checking websites [5].

Translating commons traditions

Coming out of these “post-2016” collaborations, I began to reflect on how influential “pre-2016” social computing frameworks failed to anticipate or provide traction for responding to the challenges that online communities face around moderation, polarization, harassment, amplification, and other socio-technical ills. My co-authors and I outlined these critiques and drew from the literature on governing commons to highlight the need for “constitutional-level” participatory mechanisms to improve online platform governance [6]. In a post-API age where researchers’ access to the platform data needed to characterize these socio-technical ills is being circumscribed by platforms in the name of privacy, my collaborators and I emphasized the need for “counterfactual infrastructures” like the community-maintained service used by the Reddit research community [7].

Data for labor and explanation

My human-centered data science research interests around governing online platforms in the face of disruptions are represented in the research of my Ph.D. students. Samantha Dalal is exploring themes involving algorithmic management, labor rights, and “data strikes”; her position paper about commons-based governance of valuable data assets was accepted to a CSCW 2020 workshop. My collaborators and I were awarded a $300,000 grant ($70,000 to CU) from a U.S. Air Force technology accelerator (AFWERX) in 2020 to explore the role of explainable artificial intelligence (XAI) in predictive maintenance. Our research assistants partnered with CU facilities management teams to conduct interviews informing participatory and speculative designs for improving predictive maintenance solutions with XAI.

New Work: Commons-based online social resilience

The resilience of commons-based online social platforms like Wikipedia and Wikidata to propaganda, harassment, polarization, and amplification is a remarkable—but unintended—product of a complex confluence of social and technical factors. The non-personalized content, community-led governance, and high-tempo moderation on Wikipedia and Wikidata are valuable counterfactuals to conventional wisdom about the intractability of effectively governing online social platforms. Their open, detailed, and lengthy digital archives of contributions are a rich source of cases to mine for understanding the antecedents and processes of their resilience to socio-technical ills. Because socio-technical ills are urgent issues of national security, public health, and social cohesion, I am interested in developing empirically-backed strategies for hardening online social platforms by adapting the resilient features of commons-based platforms like Wikipedia.


  1. Visual Narratives and Collective Memory across Peer-Produced Accounts of Contested Sociopolitical Events
    Porter, Emily, Krafft, P. M., and Keegan, Brian C.
    ACM Transactions on Social Computing Feb 2020
  2. The Evolution and Consequences of Peer Producing Wikipedia’s Rules
    Keegan, Brian C., and Fiesler, Casey
    Proceedings of the International AAAI Conference on Web and Social Media May 2017
  3. No Robots, Spiders, or Scrapers: Legal and Ethical Regulation of Data Collection Methods in Social Media Terms of Service
    Fiesler, Casey, Beard, Nathan, and Keegan, Brian C.
    Proceedings of the International AAAI Conference on Web and Social Media May 2020
  4. Mobilizing the Trump Train: Understanding Collective Action in a Political Trolling Community
    Flores-Saviaga, Claudia, Keegan, Brian C., and Savage, Saiph
    Proceedings of the International AAAI Conference on Web and Social Media Jun 2018
  5. Get Back! You Don’t Know Me Like That: The Social Mediation of Fact Checking Interventions in Twitter Conversations
    Hannak, Aniko, Margolin, Drew, Keegan, Brian C., and Weber, Ingmar
    Proceedings of the International AAAI Conference on Web and Social Media May 2014
  6. "This Place Does What It Was Built For": Designing Digital Institutions for Participatory Change
    Frey, Seth, Krafft, P. M., and Keegan, Brian C.
    Proceedings of the ACM on Human-Computer Interaction Nov 2019
  7. The Pushshift Reddit Dataset
    Baumgartner, Jason, Zannettou, Savvas, Keegan, Brian C., Squire, Megan, and Blackburn, Jeremy
    Proceedings of the International AAAI Conference on Web and Social Media May 2020