
⚡️ Speed up function find_node_with_highest_degree by 3,600% #31


codeflash-ai[bot] commented Jun 23, 2025

📄 3,600% (36.00x) speedup for find_node_with_highest_degree in src/dsa/nodes.py

⏱️ Runtime: 106 milliseconds → 2.86 milliseconds (best of 244 runs)

📝 Explanation and details

Let's analyze and optimize your program.

1. Bottleneck Analysis

The line profiler results show:

  • The major bottleneck (~99% of runtime) is this nested loop:

    for node in nodes:
        ...
        for src, targets in connections.items():
            if node in targets:
                degree += 1

    For every node, the inner loop walks every entry in `connections` and scans that source's target list, looking for incoming edges to `node`.

  • The `if node in targets` line alone takes about 50% of total runtime. Each check is a linear scan of one target list, so every node pays O(m) to cover all m edges, and the whole loop costs O(n·m) for n nodes.
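For reference, here is a runnable sketch of the nested-loop shape the profiler points at. The `...` in the excerpt hides part of the loop body. The out-degree term, the `-1` starting value, and the tie-breaking rule (the first node in `nodes` wins) are therefore assumptions inferred from the test comments further down, not the original source. The `_naive` suffix is a hypothetical name.

```python
from typing import Dict, Hashable, List, Optional


def find_node_with_highest_degree_naive(
    nodes: List[Hashable], connections: Dict[Hashable, List[Hashable]]
) -> Optional[Hashable]:
    """Hypothetical nested-loop version: O(n * m) for n nodes and m edges."""
    best_node = None
    best_degree = -1  # assumption: an all-zero graph still returns the first node
    for node in nodes:
        degree = len(connections.get(node, []))  # out-degree (assumed)
        # Scan every source's target list looking for edges into `node`.
        for _src, targets in connections.items():
            if node in targets:  # linear scan of one list
                degree += 1      # counts each source at most once
        if degree > best_degree:  # strict '>' keeps the first node on ties
            best_node, best_degree = node, degree
    return best_node
```

For example, `find_node_with_highest_degree_naive(['A', 'B', 'C', 'D'], {'A': ['B'], 'B': ['C', 'D'], 'C': [], 'D': ['A']})` returns `'B'`. That node has 2 outgoing edges plus 1 incoming edge.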

2. Suggestions for Optimization

Precompute Incoming Degree

Rather than asking, for every node, how many target lists in `connections` contain it, we can precompute every node's incoming-edge count in a single pass over all edges. This replaces the O(n·m) nested scan (quadratic when m grows with n) with O(n + m) work.
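Here is a minimal sketch of that single pass using `collections.Counter`. The helper name `incoming_degrees` is hypothetical. Wrapping each list in `set()` is an assumption made so that the counts match the `if node in targets` check above, which counts a source at most once even if the source lists the same target twice. Drop the `set()` if repeated edges should each count.

```python
from collections import Counter
from typing import Dict, Hashable, List


def incoming_degrees(connections: Dict[Hashable, List[Hashable]]) -> Counter:
    """Count, for every target, how many source lists mention it (one pass, O(m))."""
    incoming: Counter = Counter()
    for targets in connections.values():
        # set() mirrors `node in targets`: a source counts once per target,
        # even if its list repeats that target.
        incoming.update(set(targets))
    return incoming


incoming = incoming_degrees({'A': ['B', 'C'], 'B': ['C'], 'C': ['A']})
print({node: incoming[node] for node in "ABC"})  # {'A': 1, 'B': 1, 'C': 2}
```

A `Counter` returns 0 for missing keys, so nodes that no source points to need no special case.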

Algorithm

  1. Compute each node's outgoing degree: `len(connections.get(node, []))`.
  2. Compute the incoming degree of every node in one pass over all target lists in `connections`.
  3. For each node, add the outgoing and incoming degrees, and keep the node with the highest total.
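The following applies the three steps by hand to the four-node graph from `test_multiple_nodes_different_degrees` further down. It prints each node's degrees so that they can be checked against that test's comment. The per-source deduplication with `set()` and the tie-breaking rule are assumptions, as in the nested loop.

```python
from collections import Counter

nodes = ['A', 'B', 'C', 'D']
connections = {'A': ['B'], 'B': ['C', 'D'], 'C': [], 'D': ['A']}

# Step 2: one pass over every target list builds the in-degree table.
incoming = Counter()
for targets in connections.values():
    incoming.update(set(targets))

best_node, best_degree = None, -1
for node in nodes:
    out_deg = len(connections.get(node, []))  # Step 1: out-degree
    in_deg = incoming[node]                   # O(1) lookup; missing keys give 0
    total = out_deg + in_deg                  # Step 3: sum and compare
    print(f"{node}: out={out_deg} in={in_deg} total={total}")
    if total > best_degree:  # strict '>' keeps the first node on ties
        best_node, best_degree = node, total

print("highest:", best_node, best_degree)
# A: out=1 in=1 total=2
# B: out=2 in=1 total=3
# C: out=0 in=1 total=1
# D: out=1 in=1 total=2
# highest: B 3
```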

3. Review Installed Distributions

No external libraries are used. The code is pure Python.

4. Optimized Code

Explanation:

  • We build `incoming_degree` in a single loop over all connections; this replaces the nested O(n·m) scan.
  • Now, for each node, we only need O(1) lookups for in-degree and out-degree.
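The optimized function itself is not reproduced in this description. The sketch below is consistent with the explanation, but it is not the code from the PR branch. It assumes the same semantics as the nested loop: the degree is the out-degree plus the number of sources that list the node, the first node wins ties, and an empty `nodes` list gives `None`. The `_fast` suffix is a hypothetical name.

```python
from typing import Dict, Hashable, List, Optional


def find_node_with_highest_degree_fast(
    nodes: List[Hashable], connections: Dict[Hashable, List[Hashable]]
) -> Optional[Hashable]:
    # One pass over all edges: incoming_degree[t] = number of sources listing t.
    incoming_degree: Dict[Hashable, int] = {}
    for targets in connections.values():
        for target in set(targets):  # count each source once, like `node in targets`
            incoming_degree[target] = incoming_degree.get(target, 0) + 1

    best_node: Optional[Hashable] = None
    best_degree = -1  # an all-zero graph still returns the first node
    for node in nodes:
        # Both lookups are O(1): len() of a list and a dict .get().
        degree = len(connections.get(node, [])) + incoming_degree.get(node, 0)
        if degree > best_degree:  # strict '>' keeps the earliest node on ties
            best_node, best_degree = node, degree
    return best_node
```

For instance, `find_node_with_highest_degree_fast(['X', 'Y', 'Z'], {'X': ['Y'], 'Y': ['Z'], 'Z': ['X', 'Y']})` returns `'Y'`. Y and Z both have degree 3, and Y comes first in `nodes`.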

5. Result

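The running time drops from O(n·m) (quadratic when m grows with n) to O(n + m) for a graph with n nodes and m edges. The only extra memory is one in-degree dictionary with at most one entry per distinct target. On the large-scale regression tests below, the measured speedups range from about 2x to about 120x (e.g. 29.5 ms → 246 μs on the 1,000-node chain). On graphs with only a handful of nodes, the two versions differ by a few hundred nanoseconds at most, in either direction.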
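A back-of-the-envelope count on the 1,000-node chain used in `test_large_chain_graph` shows where the gap comes from. The counts model element comparisons and per-edge updates, not wall-clock time. The measured ratio (about 120x) is smaller than the raw operation ratio because constant factors differ between the two loops.

```python
n = 1000
nodes = [str(i) for i in range(n)]
connections = {str(i): [str(i + 1)] for i in range(n - 1)}
m = sum(len(targets) for targets in connections.values())  # number of edges

# Nested loop: each node scans every target list, i.e. at most n * m element
# comparisons (exactly that here, since every list has one element).
naive_element_checks = len(nodes) * m
# Precomputed version: touch each edge once, then one lookup per node.
fast_steps = m + len(nodes)

print(m, naive_element_checks, fast_steps)              # 999 999000 1999
print(f"ratio ~ {naive_element_checks / fast_steps:.0f}x")  # ratio ~ 500x
```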

Summary

  • Main Optimization: Precompute all incoming degrees in advance.
  • Benefit: Lower algorithmic complexity and a large speedup, especially on large graphs.
  • No dependencies: All code remains in pure Python, no additional libraries needed.

Let me know if you'd like to see additional tweaks or a variant!

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 44 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import random  # used in large scale random test

# imports
import pytest  # used for our unit tests
from src.dsa.nodes import find_node_with_highest_degree

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_single_node_no_connections():
    # Single node, no connections
    nodes = ['A']
    connections = {}
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 833ns -> 833ns (0.000% faster)

def test_two_nodes_one_connection():
    # Two nodes, one connection from A to B
    nodes = ['A', 'B']
    connections = {'A': ['B']}
    # A has 1 outgoing, B has 1 incoming, both degree 1, but A comes first in nodes
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.29μs -> 1.33μs (3.08% slower)

def test_two_nodes_bidirectional():
    # Two nodes, bidirectional connection
    nodes = ['A', 'B']
    connections = {'A': ['B'], 'B': ['A']}
    # Both have degree 2 (1 in, 1 out), A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.38μs -> 1.42μs (2.90% slower)

def test_three_nodes_varied_connections():
    # Three nodes, varied connections
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B', 'C'], 'B': ['C']}
    # A: 2 out, 0 in => 2; B: 1 out, 1 in => 2; C: 0 out, 2 in => 2; A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.71μs -> 1.62μs (5.11% faster)

def test_multiple_nodes_different_degrees():
    # Four nodes, one with the highest degree
    nodes = ['A', 'B', 'C', 'D']
    connections = {'A': ['B'], 'B': ['C', 'D'], 'C': [], 'D': ['A']}
    # Degrees: A(1+1=2), B(2+1=3), C(0+1=1), D(1+1=2)
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 2.33μs -> 1.88μs (24.4% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_empty_nodes():
    # No nodes at all
    nodes = []
    connections = {}
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 333ns -> 500ns (33.4% slower)

def test_nodes_with_no_connections():
    # Nodes present but no connections at all
    nodes = ['A', 'B', 'C']
    connections = {}
    # All degrees 0, A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.21μs -> 1.08μs (11.6% faster)

def test_node_with_self_loop():
    # Node with a self-loop
    nodes = ['A']
    connections = {'A': ['A']}
    # Self-loop counts as 1 out + 1 in = 2
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 958ns -> 1.12μs (14.8% slower)

def test_multiple_nodes_with_self_loops():
    nodes = ['A', 'B']
    connections = {'A': ['A'], 'B': ['B']}
    # Both have degree 2, A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.42μs -> 1.38μs (3.05% faster)

def test_node_with_multiple_incoming_and_outgoing():
    # Node with both multiple incoming and outgoing connections
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}
    # Degrees: A(2+1=3), B(1+1=2), C(1+2=3); A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.83μs -> 1.71μs (7.32% faster)

def test_node_not_in_connections():
    # Node in nodes list but not in connections dict
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B']}
    # A: 1 out, 0 in = 1; B: 0 out, 1 in = 1; C: 0 out, 0 in = 0; A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.54μs -> 1.42μs (8.82% faster)

def test_nodes_with_duplicate_connections():
    # Duplicate connections should be counted (if present)
    nodes = ['A', 'B']
    connections = {'A': ['B', 'B']}
    # A: 2 out, 0 in = 2; B: 0 out, 2 in = 2; A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.33μs -> 1.38μs (3.05% slower)

def test_disconnected_graph():
    # All nodes disconnected
    nodes = ['A', 'B', 'C']
    connections = {'A': [], 'B': [], 'C': []}
    # All degrees 0, A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.54μs -> 1.33μs (15.7% faster)

def test_node_with_large_incoming_only():
    # One node with many incoming edges only
    nodes = ['A', 'B', 'C']
    connections = {'B': ['A'], 'C': ['A']}
    # A: 0 out, 2 in = 2; B: 1 out, 0 in = 1; C: 1 out, 0 in = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.67μs -> 1.54μs (8.11% faster)

def test_node_with_large_outgoing_only():
    # One node with many outgoing edges only
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B', 'C']}
    # A: 2 out, 0 in = 2; B: 0 out, 1 in = 1; C: 0 out, 1 in = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.54μs -> 1.50μs (2.80% faster)

def test_tied_highest_degree_not_first():
    # Highest degree node is not the first in the list
    nodes = ['X', 'Y', 'Z']
    connections = {'X': ['Y'], 'Y': ['Z'], 'Z': ['X', 'Y']}
    # X: 1+1=2, Y:1+2=3, Z:2+1=3; Y comes before Z in nodes, so Y is returned
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.88μs -> 1.71μs (9.71% faster)

def test_connections_to_nonexistent_nodes():
    # Connections to nodes not in the nodes list should not affect the result
    nodes = ['A', 'B']
    connections = {'A': ['B', 'C'], 'B': ['D']}
    # Only consider A and B for degree calculation
    # A: 2 out (B and C), 0 in = 2; B: 1 out (D), 1 in (A->B) = 2; A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.38μs -> 1.46μs (5.76% slower)

def test_nodes_with_empty_string_names():
    # Nodes with empty string names
    nodes = ['', 'A']
    connections = {'': ['A'], 'A': ['']}
    # '': 1 out, 1 in = 2; 'A': 1 out, 1 in = 2; '' comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.38μs -> 1.38μs (0.000% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_complete_graph():
    # Complete directed graph of 100 nodes (each node connects to all others)
    n = 100
    nodes = [str(i) for i in range(n)]
    connections = {str(i): [str(j) for j in range(n) if j != i] for i in range(n)}
    # Each node has n-1 outgoing, n-1 incoming = 2*(n-1)
    # All degrees tied, first node returned
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 6.97ms -> 800μs (771% faster)

def test_large_sparse_graph():
    # Sparse graph: 1000 nodes, only first node connects to the next 10
    n = 1000
    nodes = [str(i) for i in range(n)]
    connections = {'0': [str(i) for i in range(1, 11)]}
    # '0': 10 out, 0 in = 10; nodes 1-10: 0 out, 1 in = 1; others: 0
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 249μs -> 115μs (117% faster)

def test_large_chain_graph():
    # Chain graph: 1000 nodes, each node connects to the next
    n = 1000
    nodes = [str(i) for i in range(n)]
    connections = {str(i): [str(i+1)] for i in range(n-1)}
    # First node: 1 out, 0 in = 1; last node: 0 out, 1 in = 1; others: 1 out, 1 in = 2
    # Node '1' is the first with degree 2
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 29.5ms -> 246μs (11872% faster)

def test_large_random_graph():
    # Large random graph, 500 nodes, random connections, ensure function runs and returns valid node
    n = 500
    nodes = [str(i) for i in range(n)]
    connections = {}
    random.seed(42)
    for i in range(n):
        # Each node connects to up to 5 random other nodes (no self-loop)
        targets = random.sample([str(j) for j in range(n) if j != i], k=random.randint(0, 5))
        if targets:
            connections[str(i)] = targets
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 15.6ms -> 209μs (7387% faster)

def test_large_graph_with_duplicate_edges():
    # Large graph with duplicate edges
    n = 200
    nodes = [str(i) for i in range(n)]
    connections = {}
    for i in range(n-1):
        # Each node connects twice to the next node
        connections[str(i)] = [str(i+1), str(i+1)]
    # Node '198' has 2 out (to 199), 2 in (from 197), so degree 4
    # Node '0' has 2 out, 0 in = 2; node '199' has 0 out, 2 in = 2
    # Node '1' has 2 out, 2 in = 4; so node '1' is first with degree 4
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.64ms -> 68.3μs (2307% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
from src.dsa.nodes import find_node_with_highest_degree

# unit tests

# ----------------------
# Basic Test Cases
# ----------------------

def test_single_node_no_connections():
    # Single node, no connections
    nodes = ['A']
    connections = {}
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 792ns -> 833ns (4.92% slower)

def test_two_nodes_one_connection():
    # Two nodes, one connection A->B
    nodes = ['A', 'B']
    connections = {'A': ['B']}
    # A: 1 outgoing, 0 incoming = 1
    # B: 0 outgoing, 1 incoming = 1
    # Both have same degree, but A comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.38μs -> 1.29μs (6.42% faster)

def test_three_nodes_varied_connections():
    # Three nodes, different degrees
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B', 'C'], 'B': ['C']}
    # A: 2 outgoing, 0 incoming = 2
    # B: 1 outgoing, 1 incoming (from A) = 2
    # C: 0 outgoing, 2 incoming (from A, B) = 2
    # All tie, return first: 'A'
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.75μs -> 1.62μs (7.69% faster)

def test_clear_highest_degree():
    # One node clearly has highest degree
    nodes = ['X', 'Y', 'Z']
    connections = {'X': ['Y'], 'Y': ['X', 'Z'], 'Z': []}
    # X: 1 out, 1 in (from Y) = 2
    # Y: 2 out, 1 in (from X) = 3
    # Z: 0 out, 1 in (from Y) = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.83μs -> 1.67μs (9.96% faster)

def test_disconnected_nodes():
    # All nodes disconnected
    nodes = ['A', 'B', 'C']
    connections = {}
    # All degrees 0, return first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.21μs -> 1.08μs (11.5% faster)

# ----------------------
# Edge Test Cases
# ----------------------

def test_empty_nodes_list():
    # No nodes at all
    nodes = []
    connections = {'A': ['B']}
    # Should return None
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 292ns -> 792ns (63.1% slower)

def test_connections_to_nonexistent_nodes():
    # Connections point to nodes not in list
    nodes = ['A', 'B']
    connections = {'A': ['B', 'C'], 'B': ['D']}
    # Only consider A and B for degree
    # A: 2 out (B, C), 0 in = 2
    # B: 1 out (D), 1 in (from A) = 2
    # Both tie, return 'A'
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.42μs -> 1.46μs (2.81% slower)

def test_self_loops():
    # Node with self-loop
    nodes = ['A', 'B']
    connections = {'A': ['A', 'B'], 'B': ['A']}
    # A: 2 out (A, B), 2 in (A self-loop, B) = 4
    # B: 1 out (A), 1 in (A) = 2
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.38μs -> 1.50μs (8.33% slower)

def test_multiple_nodes_same_degree():
    # Multiple nodes with same highest degree, order matters
    nodes = ['X', 'Y', 'Z']
    connections = {'X': ['Y'], 'Y': ['Z'], 'Z': ['X']}
    # All have 1 out, 1 in = 2
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.79μs -> 1.62μs (10.3% faster)

def test_node_with_only_incoming():
    # Node has only incoming, none outgoing
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B'], 'B': ['C']}
    # A: 1 out, 0 in = 1
    # B: 1 out, 1 in = 2
    # C: 0 out, 1 in = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.71μs -> 1.50μs (13.9% faster)

def test_node_with_only_outgoing():
    # Node has only outgoing, none incoming
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B', 'C']}
    # A: 2 out, 0 in = 2
    # B: 0 out, 1 in = 1
    # C: 0 out, 1 in = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.62μs -> 1.50μs (8.33% faster)

def test_nodes_with_no_connections_entry():
    # Some nodes not present in connections dict
    nodes = ['A', 'B', 'C']
    connections = {'A': ['B']}
    # B, C not keys in connections
    # A: 1 out, 0 in = 1
    # B: 0 out, 1 in = 1
    # C: 0 out, 0 in = 0
    # A and B tie, return 'A'
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.58μs -> 1.42μs (11.7% faster)

def test_duplicate_connections():
    # Duplicate connections should be counted
    nodes = ['A', 'B']
    connections = {'A': ['B', 'B']}
    # A: 2 out, 0 in = 2
    # B: 0 out, 2 in (from A, both times) = 2
    # Both tie, return 'A'
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.33μs -> 1.33μs (0.075% slower)

def test_large_number_of_self_loops():
    # Many self-loops
    nodes = ['A']
    connections = {'A': ['A']*10}
    # A: 10 out, 10 in (all self-loop) = 20
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 958ns -> 1.58μs (39.5% slower)

# ----------------------
# Large Scale Test Cases
# ----------------------

def test_large_fully_connected_graph():
    # Fully connected graph of 100 nodes
    n = 100
    nodes = [f'N{i}' for i in range(n)]
    connections = {node: [f'N{j}' for j in range(n) if f'N{j}' != node] for node in nodes}
    # Each node: (n-1) out, (n-1) in = 2*(n-1)
    # All tie, return first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 7.58ms -> 787μs (862% faster)

def test_large_sparse_graph():
    # 1000 nodes, only first connects to all others
    n = 1000
    nodes = [f'N{i}' for i in range(n)]
    connections = {'N0': [f'N{i}' for i in range(1, n)]}
    # N0: n-1 out, 0 in = n-1
    # N1..N999: 0 out, 1 in = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 5.77ms -> 201μs (2768% faster)

def test_large_chain_graph():
    # 1000 nodes in a chain
    n = 1000
    nodes = [f'N{i}' for i in range(n)]
    connections = {f'N{i}': [f'N{i+1}'] for i in range(n-1)}
    # N0: 1 out, 0 in = 1
    # N1..N{n-2}: 1 out, 1 in = 2
    # N{n-1}: 0 out, 1 in = 1
    # N1 has degree 2, but so do all in the middle; return first: N1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 28.3ms -> 237μs (11808% faster)

def test_large_graph_with_self_loops_and_duplicates():
    # 500 nodes, each with self-loop and duplicate connections
    n = 500
    nodes = [f'N{i}' for i in range(n)]
    connections = {node: [node]*2 for node in nodes}
    # Each node: 2 out (self-loops), 2 in (self-loops) = 4
    # All tie, return first
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 9.86ms -> 123μs (7904% faster)

def test_large_graph_with_missing_nodes_in_connections():
    # 100 nodes, only half present in connections
    n = 100
    nodes = [f'N{i}' for i in range(n)]
    connections = {f'N{i}': [f'N{(i+1)%n}'] for i in range(0, n, 2)}
    # Even nodes: 1 out (to the next odd node), 0 in (no edge targets an even node)
    # Odd nodes: 0 out, 1 in (from the preceding even node)
    # Every node has degree 1, so the first node, N0, is returned
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 168μs -> 20.1μs (740% faster)

# ----------------------
# Additional Edge Cases
# ----------------------

def test_connection_dict_with_extra_nodes():
    # connections dict has extra keys not in nodes
    nodes = ['A', 'B']
    connections = {'A': ['B'], 'B': ['A'], 'C': ['A', 'B']}
    # Only A and B considered
    # A: 1 out, 2 in (from B, C) = 3
    # B: 1 out, 2 in (from A, C) = 3
    # Both tie, return 'A'
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.54μs -> 1.58μs (2.59% slower)

def test_nodes_with_empty_connection_lists():
    # Nodes with empty connection lists
    nodes = ['A', 'B', 'C']
    connections = {'A': [], 'B': []}
    # All have 0 out, 0 in = 0
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.50μs -> 1.29μs (16.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.dsa.nodes import find_node_with_highest_degree

def test_find_node_with_highest_degree():
    find_node_with_highest_degree(['\x01', '\x01\x00'], {'\x00\x00': [], '\x00\x00\x00': ['\x01'], '\x00': [], '\x02': []})
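The `codeflash_output` comments above describe differential testing: the original and optimized functions must return the same node for the same input. Below is a self-contained sketch of the same idea on random graphs that include repeated targets, self-loops, and a target missing from `nodes`. Both versions are re-implemented here under assumed semantics (out-degree plus the number of sources listing the node, first node wins ties) rather than imported from `src.dsa.nodes`. The names `naive`, `fast`, and `random_graph` are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]


def naive(nodes: List[Hashable], connections: Graph) -> Optional[Hashable]:
    best, best_deg = None, -1
    for node in nodes:
        deg = len(connections.get(node, []))
        for targets in connections.values():
            if node in targets:  # each source counted at most once
                deg += 1
        if deg > best_deg:
            best, best_deg = node, deg
    return best


def fast(nodes: List[Hashable], connections: Graph) -> Optional[Hashable]:
    incoming: Dict[Hashable, int] = {}
    for targets in connections.values():
        for t in set(targets):  # dedupe per source to match `node in targets`
            incoming[t] = incoming.get(t, 0) + 1
    best, best_deg = None, -1
    for node in nodes:
        deg = len(connections.get(node, [])) + incoming.get(node, 0)
        if deg > best_deg:
            best, best_deg = node, deg
    return best


def random_graph(rng: random.Random, n: int) -> tuple:
    nodes = [f"N{i}" for i in range(n)]
    pool = nodes + ["EXTRA"]  # a target that is not in `nodes`
    connections = {
        node: rng.choices(pool, k=rng.randint(0, 4))  # may repeat targets or self-loop
        for node in nodes
        if rng.random() < 0.7  # some nodes have no entry at all
    }
    return nodes, connections


rng = random.Random(0)
for _ in range(500):
    nodes, connections = random_graph(rng, rng.randint(0, 30))
    assert naive(nodes, connections) == fast(nodes, connections), (nodes, connections)
print("500 random graphs: outputs match")
```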

To edit these changes, run `git checkout codeflash/optimize-find_node_with_highest_degree-mc8r96cx` and push.

codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Jun 23, 2025
codeflash-ai bot requested a review from KRRT7 June 23, 2025 07:08
KRRT7 closed this Jun 23, 2025
codeflash-ai bot deleted the codeflash/optimize-find_node_with_highest_degree-mc8r96cx branch June 23, 2025 23:31