Bipartite Projection Backbones#

This tutorial demonstrates weighted bipartite projections (simple, hyper, ProbS, YCN) and backbone scoring with SDSM/FDSM, then extracts significant edges with filtering.

What is a bipartite projection backbone?#

In many real-world datasets, relationships are bipartite: people attend events, authors write papers, users rate products. To study relationships among one set of nodes (for example, people), we project the bipartite graph into a unipartite graph where two people are connected if they share an event. However, this projection often produces a dense graph with many spurious connections. Backbone methods identify which co-occurrences are statistically significant.

Setup#

import networkx as nx
import networkx_backbone as nb

# Davis Southern Women graph from NetworkX social generators
B = nx.davis_southern_women_graph()

# Choose the women partition as the "agent" nodes
women_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0]
event_nodes = [n for n, d in B.nodes(data=True) if d["bipartite"] == 1]

print(f"Bipartite graph: {B.number_of_nodes()} nodes, {B.number_of_edges()} edges")
print(f"Women: {len(women_nodes)}, events: {len(event_nodes)}")

Weighted projections#

Following Coscia & Neffke (2017, arXiv:1906.09081), you can project the bipartite graph into weighted one-mode networks using different weighting schemes:

G_simple = nb.simple_projection(B, women_nodes)
G_hyper = nb.hyper_projection(B, women_nodes)
G_probs = nb.probs_projection(B, women_nodes)  # symmetrized ProbS
G_ycn = nb.ycn_projection(B, women_nodes)      # symmetrized YCN

print("Simple edges:", G_simple.number_of_edges())
print("Hyper edges:", G_hyper.number_of_edges())
print("ProbS edges:", G_probs.number_of_edges())
print("YCN edges:", G_ycn.number_of_edges())

You can also dispatch by name:

G = nb.bipartite_projection(B, women_nodes, method="probs")

SDSM: Stochastic Degree Sequence Model#

The SDSM (Neal, 2014) uses an analytical approximation (Poisson-binomial via normal distribution) to compute p-values for each co-occurrence. It returns the full projection with "sdsm_pvalue" on each edge. You can choose which projection weights to attach to those edges via projection=:

H = nb.sdsm(B, agent_nodes=women_nodes, projection="hyper")
backbone = nb.threshold_filter(H, "sdsm_pvalue", 0.05, mode="below")
print(f"SDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")

# Examine p-values
for u, v, data in H.edges(data=True):
    print(f"  Edge ({u}, {v}): p-value = {data['sdsm_pvalue']:.4f}")

FDSM: Fixed Degree Sequence Model#

The FDSM (Neal et al., 2021) uses Monte Carlo simulation to compute p-values. It preserves the exact degree sequence of the bipartite graph in each random trial. It also returns the full projection:

H = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42, projection="ycn")
backbone = nb.threshold_filter(H, "fdsm_pvalue", 0.05, mode="below")
print(f"FDSM backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")

# Examine p-values
for u, v, data in H.edges(data=True):
    print(f"  Edge ({u}, {v}): p-value = {data['fdsm_pvalue']:.4f}")

Additional null models#

Fixed-fill, fixed-row, and fixed-column models are also available:

bb_fill = nb.fixedfill(B, women_nodes, alpha=0.05)
bb_row = nb.fixedrow(B, women_nodes, alpha=0.05)
bb_col = nb.fixedcol(B, women_nodes, alpha=0.05)

You can also use the high-level projection wrapper:

bb = nb.backbone_from_projection(
    B,
    women_nodes,
    method="fixedrow",
    alpha=0.05,
    projection="probs",
)

Comparing SDSM and FDSM#

SDSM is faster (analytical) and works well for large networks. It uses a stochastic degree sequence model where node degrees are treated as expectations rather than fixed values.
FDSM is more conservative and statistically rigorous. It preserves the exact degree sequence through simulation, but requires more computation. Increase trials for more precise p-values.

sdsm_scores = nb.sdsm(B, agent_nodes=women_nodes)
fdsm_scores = nb.fdsm(B, agent_nodes=women_nodes, trials=1000, seed=42)

sdsm_backbone = nb.threshold_filter(sdsm_scores, "sdsm_pvalue", 0.05, mode="below")
fdsm_backbone = nb.threshold_filter(fdsm_scores, "fdsm_pvalue", 0.05, mode="below")

print(f"SDSM edges: {sdsm_backbone.number_of_edges()}")
print(f"FDSM edges: {fdsm_backbone.number_of_edges()}")

Partition selection note#

nx.davis_southern_women_graph() includes a "bipartite" node attribute, which makes partition selection straightforward. In general, pass whichever partition you want to project as agent_nodes.

References#

Coscia, M., & Neffke, F. M. (2017). Network backboning with noisy data. https://arxiv.org/abs/1906.09081