Proximity-Based Edge Scoring#

This tutorial demonstrates the proximity methods, which score edges based on how similar the neighborhoods of their endpoints are. High-scoring edges are structurally embedded within communities, while low-scoring edges tend to be bridges.

Setup#

import networkx as nx
import networkx_backbone as nb

G = nx.les_miserables_graph()
print(f"Original: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

Local proximity methods#

These methods use only the immediate neighborhoods of the edge endpoints.

Jaccard coefficient: Ratio of common neighbors to total neighbors:

H = nb.jaccard_backbone(G)
# Edges now have a "jaccard" attribute in [0, 1]

Dice coefficient: Similar to Jaccard but uses average degree:

H = nb.dice_backbone(G)

Cosine similarity: Geometric mean normalization:

H = nb.cosine_backbone(G)

Hub promoted / depressed indices: Normalize by the minimum or maximum degree of the endpoints:

H_hpi = nb.hub_promoted_index(G)
H_hdi = nb.hub_depressed_index(G)

Leicht-Holme-Newman index: Normalize by the product of degrees:

H = nb.lhn_local_index(G)

Preferential attachment score: Product of endpoint degrees – measures how expected an edge is under a random attachment model:

H = nb.preferential_attachment_score(G)

Adamic-Adar index: Weights common neighbors by the inverse log of their degree, giving more weight to rare shared neighbors:

H = nb.adamic_adar_index(G)

Resource allocation index: Similar to Adamic-Adar but uses inverse degree instead of inverse log-degree:

H = nb.resource_allocation_index(G)

Quasi-local methods#

These methods consider paths beyond immediate neighborhoods.

Graph distance proximity: Reciprocal of shortest-path distance:

H = nb.graph_distance_proximity(G)

Compute scores for all node pairs (not just existing edges):

H_all = nb.graph_distance_proximity(G, all_pairs=True)

Local path index: Considers paths of length 2 and 3 using adjacency matrix powers:

H = nb.local_path_index(G, epsilon=0.01)

Filtering by proximity#

Use fraction_filter() to keep the most structurally embedded edges. Since higher proximity scores indicate more embedded edges, use ascending=False:

H = nb.jaccard_backbone(G)

# Keep the top 30% most embedded edges
backbone = nb.fraction_filter(H, "jaccard", 0.3, ascending=False)
print(f"Backbone: {backbone.number_of_edges()} edges")
print(f"Edge fraction: {nb.edge_fraction(G, backbone):.1%}")

Comparing proximity methods#

methods = {
    "jaccard": ("jaccard", nb.jaccard_backbone(G)),
    "dice": ("dice", nb.dice_backbone(G)),
    "cosine": ("cosine", nb.cosine_backbone(G)),
    "adamic_adar": ("adamic_adar", nb.adamic_adar_index(G)),
    "resource_alloc": ("resource_allocation", nb.resource_allocation_index(G)),
}

for name, (attr, H) in methods.items():
    backbone = nb.fraction_filter(H, attr, 0.3, ascending=False)
    ef = nb.edge_fraction(G, backbone)
    print(f"{name:20s}: {backbone.number_of_edges()} edges ({ef:.1%})")