Public interest data science

Data analysis and storytelling for the public good.

My research builds resources for advancing data science in the public interest. I am particularly interested in how data, algorithms, and computational tools can be mobilized to support democratic institutions, accountability, and collective well-being, rather than simply reinforcing private, commercial, or technocratic priorities. I approach data not as a neutral or inevitable force, but as a product of human choices, social structures, and contested values that is always in need of critical scrutiny, storytelling, and public engagement.

Cannabis data science

Cannabis legalization is one of the most profound shifts in public opinion and policy in decades, but the industry remains legally precarious because state-level legalization initiatives since 2012 have not overturned federal drug laws. Data science is already a central component in the “digital transformation” of other industries, so the role of “new” capabilities like data science within a “new” industry like cannabis is a fascinating and important interface of social forces and technical capabilities that warrants sustained empirical attention. I coined the phrase “cannabis informatics” to refer to the emergence, mediation, and consequences of relationships among plants, people, organizations, and institutions through technologies of surveillance, analytics, and influence. I am committed to ensuring that research in this area remains grounded in public health, social equity, and transparency, using data to navigate and address the complexities of cannabis policy and practice.

As an affiliate of the Center for Research and Education Addressing Cannabis and Health (REACH), I have been a leader in the emerging field of cannabis data science. I organized a panel at CHI 2017 outlining a five-part framework of research questions and priorities for HCI researchers to explore in a post-legalization world [1]. At the annual conference of CSU Pueblo’s Institute for Cannabis Research in 2018 and 2019, I presented a framework on the “social cannabinoid system” that would become cannabis informatics and moderated a panel on the role of data analytics in the industry. Through my CU REACH affiliation, I began working with Dr. Daniela Vergara, a research scientist in the Department of Ecology and Evolutionary Biology and an expert in cannabis genetics. We secured an anonymized set of chemical profile data from a testing lab and applied a variety of imputation methods to estimate the missing values [2]. My coauthors and I also analyzed chemical data characterizing the combined terpene and cannabinoid profiles of more than 80,000 strains from across the country [3]. This work has advanced our understanding of the phytochemical diversity of commercial cannabis, the reliability of strain labeling, and the social, regulatory, and public health implications of legalization. I am specifically interested in the complex assemblages of supply chain technologies and data management practices known as “seed-to-sale” or “track-and-trace” systems (like Metrc), which regulators in states like California and Colorado implement to document the provenance of every product across the entire market. I am also working with Yasha Kahn on a project examining the accuracy of lab testing data in the cannabis industry.

Civic data storytelling

A core strand of my agenda is the practice and study of open and civic data storytelling. My work bridges data journalism (using open data, freedom of information requests, web scraping, and visualization) and computational methods like natural language processing, network analysis, and spatial analysis to make complex data more accessible, actionable, and engaging for the broader public. I am especially invested in designing data-driven storytelling practices that nurture democratic deliberation and advance progressive values like equity, transparency, and justice. I seek to ensure that civic data science works in service of the public, rather than reinforcing inequalities or institutional power.

References

  1. CHI-Nnabis: Implications of Marijuana Legalization for and from Human-Computer Interaction
    Keegan, Brian C., Cavazos-Rehg, Patricia, Nguyen, Anh Ngoc, Savage, Saiph, Kaye, Jofish, De Choudhury, Munmun, and Paul, Michael J.
    In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems 2017
  2. Modeling Cannabinoids from a Large-Scale Sample of Cannabis Sativa Chemotypes
    Vergara, Daniela, Gaudino, Reggie, Blank, Thomas, and Keegan, Brian C.
    PLOS ONE Sep 2020
  3. The Phytochemical Diversity of Commercial Cannabis in the United States
    Smith, Christiana J., Vergara, Daniela, Keegan, Brian C., and Jikomes, Nick
    PLOS ONE May 2022