Statistical Backbone Extraction#

This tutorial demonstrates six statistical backbone extraction methods. Each method tests whether an edge’s weight is statistically significant under a null model.

Setup#

We use the Les Misérables character co-occurrence graph, a weighted graph that ships with NetworkX:

import networkx as nx
import networkx_backbone as nb

# Create a weighted graph
G = nx.les_miserables_graph()

print(f"Original: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

Disparity filter#

The disparity filter (Serrano et al., 2009) tests each edge against a null model in which a node’s total strength is spread uniformly at random across its edges. Edges that carry an unexpectedly large share of that strength receive low p-values:

H = nb.disparity_filter(G)

# Each edge now has a "disparity_pvalue" attribute
u, v = next(iter(H.edges()))
print(f"Edge ({u}, {v}): p-value = {H[u][v]['disparity_pvalue']:.4f}")

# Filter at alpha = 0.05
backbone = nb.threshold_filter(H, "disparity_pvalue", 0.05)
print(f"Disparity backbone: {backbone.number_of_edges()} edges")
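To make the null model concrete: the disparity p-value has a closed form. For a node of degree k and strength s, an edge of weight w gets p-value (1 − w/s)^(k−1). The toy sketch below (a hand-rolled illustration, independent of networkx_backbone) applies it to a node whose strength is dominated by one edge:

```python
def disparity_pvalue(weight, strength, degree):
    """Closed-form disparity p-value (1 - w/s)**(k - 1) from Serrano et al."""
    if degree <= 1:
        return 1.0  # a degree-1 node's single edge is never significant
    return (1 - weight / strength) ** (degree - 1)

# A node with edge weights 8, 1, 1: strength 10, degree 3.
weights = [8, 1, 1]
pvals = [disparity_pvalue(w, sum(weights), len(weights)) for w in weights]
# pvals[0] ≈ 0.04 (significant at alpha = 0.05); pvals[1] ≈ 0.81 (not)
```

At alpha = 0.05 only the dominant edge survives, which is exactly the locally adaptive behavior the filter is designed for.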

Noise-corrected filter#

The noise-corrected filter (Coscia & Neffke, 2017) models edge weights in a binomial framework. It produces z-scores rather than p-values; a higher z-score means the edge’s weight exceeds its null expectation by more standard deviations:

H = nb.noise_corrected_filter(G)

# Each edge now has an "nc_score" attribute (z-score)
u, v = next(iter(H.edges()))
print(f"Edge ({u}, {v}): z-score = {H[u][v]['nc_score']:.4f}")

# Filter: keep edges with z-score above a threshold
backbone = nb.threshold_filter(H, "nc_score", 2.0, mode="above")
print(f"Noise-corrected backbone: {backbone.number_of_edges()} edges")

Marginal likelihood filter#

The marginal likelihood filter (Dianati, 2016) considers both endpoints in a binomial null model and treats weights as integer counts:

H = nb.marginal_likelihood_filter(G)
backbone = nb.threshold_filter(H, "ml_pvalue", 0.05)
print(f"Marginal likelihood backbone: {backbone.number_of_edges()} edges")
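As a rough sketch of the idea (an illustration, not the library’s implementation): treat the graph’s total weight T as T unit edges, each landing on pair (u, v) with probability s_u·s_v / (2T²), so that the expected weight matches the configuration model. The p-value is then the binomial upper tail at the observed weight:

```python
from math import comb

def ml_pvalue(w, s_u, s_v, T):
    """P(X >= w) for X ~ Binomial(T, s_u * s_v / (2 * T**2))."""
    p = s_u * s_v / (2 * T**2)
    return sum(comb(T, k) * p**k * (1 - p) ** (T - k) for k in range(w, T + 1))

# Total weight 10, endpoint strengths 4 and 5, observed weight 3:
pval = ml_pvalue(3, 4, 5, 10)  # ≈ 0.070: not significant at alpha = 0.05
```

Because the null model only uses the two endpoint strengths and the total weight, the test is cheap to evaluate per edge even on large graphs.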

ECM filter#

The Enhanced Configuration Model (Gemmetto et al., 2017) uses a maximum-entropy null model that preserves both degree and strength sequences. This is the most principled null model but also the most computationally expensive:

H = nb.ecm_filter(G)
backbone = nb.threshold_filter(H, "ecm_pvalue", 0.05)
print(f"ECM backbone: {backbone.number_of_edges()} edges")

LANS filter#

Locally Adaptive Network Sparsification (Foti et al., 2011) computes each edge’s p-value from the empirical CDF of the weights around its endpoints, so it makes no parametric distributional assumptions:

H = nb.lans_filter(G)
backbone = nb.threshold_filter(H, "lans_pvalue", 0.05)
print(f"LANS backbone: {backbone.number_of_edges()} edges")
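The core computation is simple enough to sketch by hand (an illustration of the idea, not the library’s code): at a given node, an edge’s p-value is the upper tail of that node’s local empirical CDF, i.e. the fraction of incident weights at least as large as the edge’s own weight:

```python
def lans_pvalue(weight, incident_weights):
    """Fraction of a node's incident weights that are >= this edge's weight."""
    return sum(w >= weight for w in incident_weights) / len(incident_weights)

incident = [5, 3, 1]
heavy = lans_pvalue(5, incident)  # 1/3: only the edge itself is this heavy
light = lans_pvalue(1, incident)  # 1.0: every incident weight is at least 1
```

LANS evaluates each edge from both endpoints and retains it if it is significant for either, so the filter adapts to heterogeneous weight scales across the graph.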

Multiple linkage analysis#

Multiple linkage analysis (Van Nuffel et al., 2010; Yassin et al., 2023) extracts a backbone using local linkage significance:

H = nb.multiple_linkage_analysis(G, alpha=0.05)
backbone = nb.boolean_filter(H, "mla_keep")
print(f"MLA backbone: {backbone.number_of_edges()} edges")

Comparing all statistical methods#

Use compare_backbones() to see what fraction of the original graph each method retains:

backbones = {
    "disparity": nb.threshold_filter(
        nb.disparity_filter(G), "disparity_pvalue", 0.05
    ),
    "noise_corrected": nb.threshold_filter(
        nb.noise_corrected_filter(G), "nc_score", 2.0, mode="above"
    ),
    "marginal_likelihood": nb.threshold_filter(
        nb.marginal_likelihood_filter(G), "ml_pvalue", 0.05
    ),
    "ecm": nb.threshold_filter(
        nb.ecm_filter(G), "ecm_pvalue", 0.05
    ),
    "lans": nb.threshold_filter(
        nb.lans_filter(G), "lans_pvalue", 0.05
    ),
    "mla": nb.boolean_filter(
        nb.multiple_linkage_analysis(G, alpha=0.05), "mla_keep"
    ),
}

results = nb.compare_backbones(G, backbones)
for name, metrics in results.items():
    ef = metrics["edge_fraction"]
    nf = metrics["node_fraction"]
    print(f"{name:25s}: edges={ef:.1%}, nodes={nf:.1%}")