Bipartite Projection Backbones#

This tutorial demonstrates weighted bipartite projections (simple, hyper, ProbS, YCN) and backbone scoring with SDSM/FDSM, then extracts significant edges with filtering.

What is a bipartite projection backbone?#

In many real-world datasets, relationships are bipartite: people attend events, authors write papers, users rate products. To study relationships among one set of nodes (for example, people), we project the bipartite graph into a unipartite graph where two people are connected if they share an event. However, this projection often produces a dense graph with many spurious connections. Backbone methods identify which co-occurrences are statistically significant.

Setup#

import networkx as nx
import networkx_backbone as nb

# Davis Southern Women graph from NetworkX social generators
B = nx.davis_southern_women_graph()

# Choose the women partition as the "agent" nodes
women_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0]
event_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 1]

print(f"Bipartite graph: {B.number_of_nodes()} nodes, {B.number_of_edges()} edges")
print(f"Women: {len(women_nodes)}, events: {len(event_nodes)}")

Weighted projections#

Following Coscia & Neffke (2017, arXiv:1906.09081), you can project the bipartite graph into weighted one-mode networks using different weighting schemes:

G_simple = nb.simple_projection(B, women_nodes)
G_hyper = nb.hyper_projection(B, women_nodes)
G_probs = nb.probs_projection(B, women_nodes)  # symmetrized ProbS
G_ycn = nb.ycn_projection(B, women_nodes)      # symmetrized YCN

print("Simple edges:", G_simple.number_of_edges())
print("Hyper edges:", G_hyper.number_of_edges())
print("ProbS edges:", G_probs.number_of_edges())
print("YCN edges:", G_ycn.number_of_edges())

You can also dispatch by name:

G = nb.bipartite_projection(B, women_nodes, method="probs")

SDSM: Stochastic Degree Sequence Model#

The SDSM (Neal, 2014) uses an analytical approximation (Poisson-binomial via normal distribution) to compute p-values for each co-occurrence. It returns the full projection with "sdsm_pvalue" on each edge. You can choose which projection weights to attach to those edges via projection=:

H = nb.sdsm(B, agent_nodes=women_nodes, projection="hyper")
backbone = nb.threshold_filter(H, "sdsm_pvalue", 0.05, mode="below")
print(f"SDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")

# Examine p-values
for u, v, data in H.edges(data=True):
    print(f"  Edge ({u}, {v}): p-value = {data['sdsm_pvalue']:.4f}")

FDSM: Fixed Degree Sequence Model#

The FDSM (Neal et al., 2021) uses Monte Carlo simulation to compute p-values. It preserves the exact degree sequence of the bipartite graph in each random trial. It also returns the full projection:

H = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42, projection="ycn")
backbone = nb.threshold_filter(H, "fdsm_pvalue", 0.05, mode="below")
print(f"FDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")

# Examine p-values
for u, v, data in H.edges(data=True):
    print(f"  Edge ({u}, {v}): p-value = {data['fdsm_pvalue']:.4f}")

Additional null models#

Fixed-fill, fixed-row, and fixed-column models are also available:

bb_fill = nb.fixedfill(B, women_nodes, alpha=0.05)
bb_row = nb.fixedrow(B, women_nodes, alpha=0.05)
bb_col = nb.fixedcol(B, women_nodes, alpha=0.05)

You can also use the high-level projection wrapper:

bb = nb.backbone_from_projection(
    B,
    women_nodes,
    method="fixedrow",
    alpha=0.05,
    projection="probs",
)

Comparing SDSM and FDSM#

  • SDSM is faster (analytical) and works well for large networks. It uses a stochastic degree sequence model where node degrees are treated as expectations rather than fixed values.

  • FDSM is more conservative and statistically rigorous. It preserves the exact degree sequence through simulation, but requires more computation. Increase trials for more precise p-values.

sdsm_scores = nb.sdsm(B, agent_nodes=women_nodes)
fdsm_scores = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42)

sdsm_backbone = nb.threshold_filter(sdsm_scores, "sdsm_pvalue", 0.05, mode="below")
fdsm_backbone = nb.threshold_filter(fdsm_scores, "fdsm_pvalue", 0.05, mode="below")

print(f"SDSM edges: {sdsm_backbone.number_of_edges()}")
print(f"FDSM edges: {fdsm_backbone.number_of_edges()}")

Partition selection note#

nx.davis_southern_women_graph() includes a "bipartite" node attribute, which makes partition selection straightforward. In general, pass whichever partition you want to project as agent_nodes.

References#