Statistical Backbone Extraction
This tutorial demonstrates six statistical backbone extraction methods. Each method tests whether an edge’s weight is statistically significant under a null model.
Setup
We use a weighted graph for this tutorial:
import networkx as nx
import networkx_backbone as nb
# Load the Les Misérables co-appearance network (edges carry a "weight" attribute)
G = nx.les_miserables_graph()
print(f"Original: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
Disparity filter
The disparity filter (Serrano et al., 2009) tests each edge against a null model in which a node’s total strength is split uniformly at random among its incident edges. Edges that carry an unexpectedly large share of a node’s strength receive low p-values:
H = nb.disparity_filter(G)
# Each edge now has a "disparity_pvalue" attribute
u, v = list(H.edges())[0]
print(f"Edge ({u}, {v}): p-value = {H[u][v]['disparity_pvalue']:.4f}")
# Filter at alpha = 0.05
backbone = nb.threshold_filter(H, "disparity_pvalue", 0.05)
print(f"Disparity backbone: {backbone.number_of_edges()} edges")
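For intuition, the p-value under this null model has a closed form: for an edge of weight w at a node with degree k and strength s, it is (1 - w/s)^(k-1). The following is a minimal pure-Python sketch of that formula, not the library's implementation. The helper name disparity_pvalues and the choice to keep the smaller of the two endpoint p-values are assumptions made for illustration.

```python
from collections import defaultdict


def disparity_pvalues(edges):
    """Return {(u, v): p-value} for a list of undirected (u, v, weight) edges.

    Hypothetical helper for illustration; weights must be positive.
    """
    strength = defaultdict(float)  # sum of incident edge weights per node
    degree = defaultdict(int)      # number of incident edges per node
    for u, v, w in edges:
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def endpoint_pvalue(node, w):
        k = degree[node]
        if k <= 1:
            # A node with a single edge gives no evidence either way.
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    # Assumption: an edge is as significant as its most significant endpoint.
    return {
        (u, v): min(endpoint_pvalue(u, w), endpoint_pvalue(v, w))
        for u, v, w in edges
    }


edges = [("a", "b", 10.0), ("a", "c", 1.0), ("a", "d", 1.0), ("b", "c", 1.0)]
pvalues = disparity_pvalues(edges)
for edge, p in pvalues.items():
    print(edge, round(p, 4))
```

In this toy graph the heavy edge (a, b) takes 10 of node a's 12 units of strength, so it gets by far the smallest p-value, while the light edges stay well above 0.05.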
Noise-corrected filter
The noise-corrected filter (Coscia & Neffke, 2017) uses a binomial framework to model edge weights. It produces z-scores rather than p-values – higher z-scores indicate more significant edges:
H = nb.noise_corrected_filter(G)
# Each edge now has an "nc_score" attribute (z-score)
u, v = list(H.edges())[0]
print(f"Edge ({u}, {v}): z-score = {H[u][v]['nc_score']:.4f}")
# Filter: keep edges with z-score above a threshold
backbone = nb.threshold_filter(H, "nc_score", 2.0, mode="above")
print(f"Noise-corrected backbone: {backbone.number_of_edges()} edges")
Marginal likelihood filter
The marginal likelihood filter (Dianati, 2016) considers both endpoints in a binomial null model and treats weights as integer counts:
H = nb.marginal_likelihood_filter(G)
backbone = nb.threshold_filter(H, "ml_pvalue", 0.05)
print(f"Marginal likelihood backbone: {backbone.number_of_edges()} edges")
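The binomial null model can be sketched in a few lines. Following the undirected formulation in Dianati (2016), each of the T unit weights in the graph lands on the pair (i, j) with probability s_i s_j / (2 T^2), where s_i is the strength of node i, and the p-value is the upper tail of the resulting binomial distribution. This is a pure-Python illustration under those assumptions, not the library's code; marginal_likelihood_pvalues and binomial_tail are hypothetical helper names.

```python
from math import comb


def binomial_tail(n, p, w):
    """P(X >= w) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(w, n + 1))


def marginal_likelihood_pvalues(edges):
    """Return {(u, v): p-value} for undirected (u, v, weight) edges.

    Hypothetical helper; weights must be positive integers (counts).
    """
    strength = {}
    total = 0  # T: total number of unit weights in the graph
    for u, v, w in edges:
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
        total += w

    pvalues = {}
    for u, v, w in edges:
        # Chance that one unit weight falls on this particular pair.
        p = strength[u] * strength[v] / (2 * total**2)
        pvalues[(u, v)] = binomial_tail(total, p, w)
    return pvalues


edges = [("a", "b", 10), ("a", "c", 1), ("a", "d", 1), ("b", "c", 1)]
pvalues = marginal_likelihood_pvalues(edges)
for edge, p in pvalues.items():
    print(edge, round(p, 4))
```

Because the probability depends on the strengths of both endpoints, an edge between two strong nodes needs a larger weight to be significant than the same weight between two weak nodes. Direct summation is fine for a toy graph; for large T a library survival function would be the more practical choice.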
ECM filter
The Enhanced Configuration Model (Gemmetto et al., 2017) uses a maximum-entropy null model that preserves both the degree and the strength sequence on average. It is arguably the most principled null model covered here, but also the most computationally expensive, because its parameters must be fitted numerically:
H = nb.ecm_filter(G)
backbone = nb.threshold_filter(H, "ecm_pvalue", 0.05)
print(f"ECM backbone: {backbone.number_of_edges()} edges")
LANS filter
Locally Adaptive Network Sparsification (Foti et al., 2011) uses each node’s empirical CDF of edge weights instead of a parametric distribution, so it makes no distributional assumptions:
H = nb.lans_filter(G)
backbone = nb.threshold_filter(H, "lans_pvalue", 0.05)
print(f"LANS backbone: {backbone.number_of_edges()} edges")
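The empirical-CDF idea is easy to see for a single node. The sketch below assigns each incident edge the value 1 - F(w), where F(w) is the share of that node's edges with weight at most w. It is a simplified illustration rather than the library's implementation: lans_local_pvalues is a hypothetical helper, and it ranks raw weights, which gives the same ordering as ranking each weight divided by the node's strength.

```python
def lans_local_pvalues(weights):
    """Map each incident edge weight of one node to 1 - ECDF(weight).

    Hypothetical helper: the ECDF at w is the fraction of this node's
    edges whose weight is less than or equal to w.
    """
    k = len(weights)
    return [1.0 - sum(1 for x in weights if x <= w) / k for w in weights]


# Weights of the five edges incident to one hub node.
hub_weights = [1, 2, 3, 4, 50]
pvalues = lans_local_pvalues(hub_weights)

alpha = 0.25
kept = [w for w, p in zip(hub_weights, pvalues) if p < alpha]
print(kept)
```

Only the rank of a weight among its neighbours matters here, so the edge of weight 50 would score the same if it weighed 5. A consequence of this definition is that every node's heaviest edge scores 0 and is always retained.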
Multiple linkage analysis
Multiple linkage analysis (Van Nuffel et al., 2010; Yassin et al., 2023) extracts a backbone using local linkage significance:
H = nb.multiple_linkage_analysis(G, alpha=0.05)
backbone = nb.boolean_filter(H, "mla_keep")
print(f"MLA backbone: {backbone.number_of_edges()} edges")
Comparing all statistical methods
Use compare_backbones() to summarize how much of the original graph each backbone retains:
backbones = {
    "disparity": nb.threshold_filter(
        nb.disparity_filter(G), "disparity_pvalue", 0.05
    ),
    "noise_corrected": nb.threshold_filter(
        nb.noise_corrected_filter(G), "nc_score", 2.0, mode="above"
    ),
    "marginal_likelihood": nb.threshold_filter(
        nb.marginal_likelihood_filter(G), "ml_pvalue", 0.05
    ),
    "ecm": nb.threshold_filter(
        nb.ecm_filter(G), "ecm_pvalue", 0.05
    ),
    "lans": nb.threshold_filter(
        nb.lans_filter(G), "lans_pvalue", 0.05
    ),
    "mla": nb.boolean_filter(
        nb.multiple_linkage_analysis(G, alpha=0.05), "mla_keep"
    ),
}
results = nb.compare_backbones(G, backbones)
for name, metrics in results.items():
    ef = metrics["edge_fraction"]
    nf = metrics["node_fraction"]
    print(f"{name:25s}: edges={ef:.1%}, nodes={nf:.1%}")
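To make the two reported metrics concrete, here is one plausible reading of them, written against plain edge lists. This is an assumption about their meaning, not the library's definition: edge_fraction is taken as retained edges over original edges, and node_fraction as nodes that still have at least one edge over nodes in the original edge list. The helper name coverage is hypothetical.

```python
def coverage(original_edges, backbone_edges):
    """Return edge and node retention for a backbone (hypothetical helper).

    Assumes both inputs are lists of (u, v) pairs and that every node of
    interest appears in at least one original edge.
    """

    def nodes(edges):
        return {node for edge in edges for node in edge}

    return {
        "edge_fraction": len(backbone_edges) / len(original_edges),
        "node_fraction": len(nodes(backbone_edges)) / len(nodes(original_edges)),
    }


original = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
backbone = [("a", "b"), ("b", "c")]
metrics = coverage(original, backbone)
print(f"edges={metrics['edge_fraction']:.1%}, nodes={metrics['node_fraction']:.1%}")
```

Read together, a low edge fraction with a high node fraction suggests a backbone that prunes aggressively without disconnecting many nodes.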