Statistical Backbone Extraction
===============================

This tutorial demonstrates six statistical backbone extraction methods.
Each method tests whether an edge's weight is statistically significant
under a null model.

Setup
-----

We use a weighted graph for this tutorial::

    import networkx as nx
    import networkx_backbone as nb

    # Create a weighted graph
    G = nx.les_miserables_graph()
    print(f"Original: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

Disparity filter
----------------

The disparity filter (Serrano et al., 2009) tests each edge against a null
model in which a node's total strength is distributed uniformly across its
edges. Edges with unexpectedly large weight receive low p-values::

    H = nb.disparity_filter(G)

    # Each edge now has a "disparity_pvalue" attribute
    u, v = list(H.edges())[0]
    print(f"Edge ({u}, {v}): p-value = {H[u][v]['disparity_pvalue']:.4f}")

    # Filter at alpha = 0.05
    backbone = nb.threshold_filter(H, "disparity_pvalue", 0.05)
    print(f"Disparity backbone: {backbone.number_of_edges()} edges")

Noise-corrected filter
----------------------

The noise-corrected filter (Coscia & Neffke, 2017) uses a binomial
framework to model edge weights.
It produces z-scores rather than p-values -- higher z-scores indicate more
significant edges::

    H = nb.noise_corrected_filter(G)

    # Each edge now has an "nc_score" attribute (z-score)
    u, v = list(H.edges())[0]
    print(f"Edge ({u}, {v}): z-score = {H[u][v]['nc_score']:.4f}")

    # Filter: keep edges with z-score above a threshold
    backbone = nb.threshold_filter(H, "nc_score", 2.0, mode="above")
    print(f"Noise-corrected backbone: {backbone.number_of_edges()} edges")

Marginal likelihood filter
--------------------------

The marginal likelihood filter (Dianati, 2016) considers both endpoints in
a binomial null model and treats weights as integer counts::

    H = nb.marginal_likelihood_filter(G)
    backbone = nb.threshold_filter(H, "ml_pvalue", 0.05)
    print(f"Marginal likelihood backbone: {backbone.number_of_edges()} edges")

ECM filter
----------

The Enhanced Configuration Model (Gemmetto et al., 2017) uses a
maximum-entropy null model that preserves both the degree and strength
sequences. This is the most principled null model, but also the most
computationally expensive::

    H = nb.ecm_filter(G)
    backbone = nb.threshold_filter(H, "ecm_pvalue", 0.05)
    print(f"ECM backbone: {backbone.number_of_edges()} edges")

LANS filter
-----------

Locally Adaptive Network Sparsification (Foti et al., 2011) uses
nonparametric empirical CDFs instead of parametric distributions.
This makes no distributional assumptions about the edge weights::

    H = nb.lans_filter(G)
    backbone = nb.threshold_filter(H, "lans_pvalue", 0.05)
    print(f"LANS backbone: {backbone.number_of_edges()} edges")

Multiple linkage analysis
-------------------------

Multiple linkage analysis (Van Nuffel et al., 2010; Yassin et al., 2023)
extracts a backbone using local linkage significance::

    H = nb.multiple_linkage_analysis(G, alpha=0.05)
    backbone = nb.boolean_filter(H, "mla_keep")
    print(f"MLA backbone: {backbone.number_of_edges()} edges")

Comparing all statistical methods
---------------------------------

Use :func:`~networkx_backbone.compare_backbones` to compare the results::

    backbones = {
        "disparity": nb.threshold_filter(
            nb.disparity_filter(G), "disparity_pvalue", 0.05
        ),
        "noise_corrected": nb.threshold_filter(
            nb.noise_corrected_filter(G), "nc_score", 2.0, mode="above"
        ),
        "marginal_likelihood": nb.threshold_filter(
            nb.marginal_likelihood_filter(G), "ml_pvalue", 0.05
        ),
        "ecm": nb.threshold_filter(
            nb.ecm_filter(G), "ecm_pvalue", 0.05
        ),
        "lans": nb.threshold_filter(
            nb.lans_filter(G), "lans_pvalue", 0.05
        ),
        "mla": nb.boolean_filter(
            nb.multiple_linkage_analysis(G, alpha=0.05), "mla_keep"
        ),
    }

    results = nb.compare_backbones(G, backbones)
    for name, metrics in results.items():
        ef = metrics["edge_fraction"]
        nf = metrics["node_fraction"]
        print(f"{name:25s}: edges={ef:.1%}, nodes={nf:.1%}")
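Measuring backbone overlap
--------------------------

Edge and node fractions tell you how much each method keeps, but not whether
two methods keep the *same* edges. One quick check is the Jaccard similarity
of the backbones' edge sets. The sketch below is plain Python, not part of
``networkx_backbone``, and the ``edge_jaccard`` helper and toy edge lists are
hypothetical; with real backbones you would pass ``backbone.edges()`` instead:

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity |A & B| / |A | B| of two undirected edge sets.

    Edges are 2-tuples; wrapping each in a frozenset makes
    (u, v) and (v, u) compare equal, as in an undirected graph.
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    if not a and not b:
        return 1.0  # two empty backbones agree perfectly
    return len(a & b) / len(a | b)


# Hypothetical backbones of a small toy graph:
disparity_edges = [("a", "b"), ("a", "c"), ("b", "c")]
lans_edges = [("b", "a"), ("c", "a"), ("c", "d")]

# 2 shared edges out of 4 distinct edges -> 0.5
print(edge_jaccard(disparity_edges, lans_edges))
```

A similarity near 1 means the two null models flag essentially the same
edges; a low value means the choice of method materially changes the
backbone.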
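Inside the disparity null model
-------------------------------

To make the null-model idea concrete, here is a minimal, self-contained
sketch of the disparity filter's p-value from Serrano et al. (2009):
``alpha = (1 - w/s)**(k - 1)``, where ``s`` is the node's strength and ``k``
its degree. It uses a plain adjacency dict instead of a graph object for
brevity, and the ``disparity_pvalue`` helper is illustrative, not the
library's implementation:

```python
# Toy weighted graph as an adjacency dict: node -> {neighbor: weight}.
# Node "a" concentrates most of its strength on the edge to "b".
adj = {
    "a": {"b": 10.0, "c": 1.0, "d": 1.0},
    "b": {"a": 10.0},
    "c": {"a": 1.0},
    "d": {"a": 1.0},
}


def disparity_pvalue(adj, u, v):
    """Serrano et al. (2009) p-value of edge (u, v) as seen from u."""
    k = len(adj[u])
    if k <= 1:
        return 1.0  # a degree-1 node can never reject the null model
    s = sum(adj[u].values())            # strength of u
    p = adj[u][v] / s                   # normalized edge weight
    return (1.0 - p) ** (k - 1)


for v in ("b", "c", "d"):
    print(f"(a, {v}): p = {disparity_pvalue(adj, 'a', v):.4f}")
```

The heavy edge ``(a, b)`` carries 10 of ``a``'s strength of 12, giving
``(1 - 10/12)**2 ~ 0.028``, which survives a 0.05 cutoff, while the two
light edges get p-values near 0.84 and are pruned. Note the p-value is
directional: the library evaluates each edge from both endpoints before
deciding whether to keep it.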