Bipartite Projection Backbones =============================== This tutorial demonstrates weighted bipartite projections (simple, hyper, ProbS, YCN) and backbone scoring with SDSM/FDSM, then extracts significant edges with filtering. What is a bipartite projection backbone? ----------------------------------------- In many real-world datasets, relationships are bipartite: people attend events, authors write papers, users rate products. To study relationships among one set of nodes (for example, people), we project the bipartite graph into a unipartite graph where two people are connected if they share an event. However, this projection often produces a dense graph with many spurious connections. Backbone methods identify which co-occurrences are statistically significant. Setup ----- :: import networkx as nx import networkx_backbone as nb # Davis Southern Women graph from NetworkX social generators B = nx.davis_southern_women_graph() # Choose the women partition as the "agent" nodes women_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0] event_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 1] print(f"Bipartite graph: {B.number_of_nodes()} nodes, {B.number_of_edges()} edges") print(f"Women: {len(women_nodes)}, events: {len(event_nodes)}") Weighted projections -------------------- Following Coscia & Neffke (2017, arXiv:1906.09081), you can project the bipartite graph into weighted one-mode networks using different weighting schemes:: G_simple = nb.simple_projection(B, women_nodes) G_hyper = nb.hyper_projection(B, women_nodes) G_probs = nb.probs_projection(B, women_nodes) # symmetrized ProbS G_ycn = nb.ycn_projection(B, women_nodes) # symmetrized YCN print("Simple edges:", G_simple.number_of_edges()) print("Hyper edges:", G_hyper.number_of_edges()) print("ProbS edges:", G_probs.number_of_edges()) print("YCN edges:", G_ycn.number_of_edges()) You can also dispatch by name:: G = nb.bipartite_projection(B, women_nodes, method="probs") SDSM: Stochastic Degree Sequence Model ---------------------------------------- The SDSM (Neal, 2014) uses an analytical approximation (Poisson-binomial via normal distribution) to compute p-values for each co-occurrence. It returns the full projection with ``"sdsm_pvalue"`` on each edge. You can choose which projection weights to attach to those edges via ``projection=``:: H = nb.sdsm(B, agent_nodes=women_nodes, projection="hyper") backbone = nb.threshold_filter(H, "sdsm_pvalue", 0.05, mode="below") print(f"SDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges") # Examine p-values for u, v, data in H.edges(data=True): print(f" Edge ({u}, {v}): p-value = {data['sdsm_pvalue']:.4f}") FDSM: Fixed Degree Sequence Model ----------------------------------- The FDSM (Neal et al., 2021) uses Monte Carlo simulation to compute p-values. It preserves the exact degree sequence of the bipartite graph in each random trial. It also returns the full projection:: H = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42, projection="ycn") backbone = nb.threshold_filter(H, "fdsm_pvalue", 0.05, mode="below") print(f"FDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges") # Examine p-values for u, v, data in H.edges(data=True): print(f" Edge ({u}, {v}): p-value = {data['fdsm_pvalue']:.4f}") Additional null models ---------------------- Fixed-fill, fixed-row, and fixed-column models are also available:: bb_fill = nb.fixedfill(B, women_nodes, alpha=0.05) bb_row = nb.fixedrow(B, women_nodes, alpha=0.05) bb_col = nb.fixedcol(B, women_nodes, alpha=0.05) You can also use the high-level projection wrapper:: bb = nb.backbone_from_projection( B, women_nodes, method="fixedrow", alpha=0.05, projection="probs", ) Comparing SDSM and FDSM ------------------------- - **SDSM** is faster (analytical) and works well for large networks. It uses a stochastic degree sequence model where node degrees are treated as expectations rather than fixed values. - **FDSM** is more conservative and statistically rigorous. It preserves the exact degree sequence through simulation, but requires more computation. Increase ``trials`` for more precise p-values. :: sdsm_scores = nb.sdsm(B, agent_nodes=women_nodes) fdsm_scores = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42) sdsm_backbone = nb.threshold_filter(sdsm_scores, "sdsm_pvalue", 0.05, mode="below") fdsm_backbone = nb.threshold_filter(fdsm_scores, "fdsm_pvalue", 0.05, mode="below") print(f"SDSM edges: {sdsm_backbone.number_of_edges()}") print(f"FDSM edges: {fdsm_backbone.number_of_edges()}") Partition selection note ------------------------ ``nx.davis_southern_women_graph()`` includes a ``"bipartite"`` node attribute, which makes partition selection straightforward. In general, pass whichever partition you want to project as ``agent_nodes``. References ---------- - Coscia, M., & Neffke, F. M. (2017). *Network backboning with noisy data*. https://arxiv.org/abs/1906.09081