⚡️ Speed up function find_cycle_vertices by 730% #13

Closed


@codeflash-ai codeflash-ai bot commented May 28, 2025

📄 730% (7.30x) speedup for find_cycle_vertices in src/dsa/nodes.py

⏱️ Runtime : 56.2 milliseconds → 6.77 milliseconds (best of 273 runs)

⚡️ This change will improve the performance of the following benchmarks:

| Benchmark File :: Function | Original Runtime | Expected New Runtime | Speedup |
|---|---|---|---|
| tests.test_dsa_nodes::test_simple_triangle_cycle_benchmark | 435 microseconds | 40.1 microseconds | 985.54% |

📝 Explanation and details

Here’s an optimized version of your program, focusing on speeding up the cycle search (the slow part, >92% of time spent) and reducing overhead.

Optimization Approach

  • Avoid full NetworkX graph construction: The conversion from an edge list to a NetworkX graph in each call is somewhat expensive. We'll build an adjacency list and work directly with it.
  • Use a more efficient routine for finding "cycle vertices": We only need the set of vertices involved in any cycle, not the cycles themselves. Instead of enumerating cycles, we'll compute strongly connected components with Tarjan's algorithm and collect the vertices of every component that contains a cycle.
  • Avoid making a sorted list until the final return, as you already do.
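As an illustration of the first bullet, here is a minimal sketch of turning an edge list into a plain adjacency list. It is not the code from this PR: the helper name `build_adjacency` and the separate `self_loops` set are assumptions made for the example, and edges are assumed to be directed `(source, target)` pairs as in the tests below.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each vertex to the list of its successors (hypothetical helper)."""
    adjacency = defaultdict(list)
    self_loops = set()
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst]  # touching the key makes sink vertices appear too
        if src == dst:
            self_loops.add(src)
    # Return a plain dict so later lookups cannot create keys by accident.
    return dict(adjacency), self_loops


adjacency, self_loops = build_adjacency([(1, 2), (2, 3), (3, 1), (3, 4), (5, 5)])
```

Duplicate edges simply appear twice in a successor list, which does not change which vertices lie on a cycle.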

Implementation

Why this is much faster

  • No NetworkX graph construction at runtime: This avoids allocating NetworkX's per-node and per-edge dictionaries on every call.
  • No full path enumeration: We only find nodes that participate in cycles, not enumerate all cycles—much quicker.
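  • Tarjan's SCC algorithm is fast (O(V + E)) and sufficient for this use case: a vertex lies on a cycle exactly when its strongly connected component has more than one vertex or it has a self-loop.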
  • No nested loops over full cycles reduces high constant factors.
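The approach described above can be sketched as follows. This is a hedged illustration, not the implementation from this PR (which is not shown on this page): the name `cycle_vertices_scc` is hypothetical, the sorted-list return value is assumed from the mention of a final sort, and the traversal is written iteratively so that long chains such as the 1000-node tests do not hit Python's recursion limit.

```python
from collections import defaultdict


def cycle_vertices_scc(edges):
    """Return the sorted vertices that lie on at least one directed cycle."""
    adjacency = defaultdict(list)
    self_loops = set()
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst]  # make sure sink vertices are keys as well
        if src == dst:
            self_loops.add(src)

    index = {}        # discovery order of each vertex
    lowlink = {}      # smallest index reachable from the vertex's subtree
    on_stack = set()
    scc_stack = []
    result = set(self_loops)  # a self-loop is a cycle of length one
    counter = 0

    for root in list(adjacency):
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        scc_stack.append(root)
        on_stack.add(root)
        work = [(root, iter(adjacency[root]))]  # explicit DFS stack
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    scc_stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(adjacency[nxt])))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            work.pop()  # all successors of node have been handled
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:  # node is the root of an SCC
                component = []
                while True:
                    member = scc_stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                if len(component) > 1:
                    result.update(component)
    return sorted(result)
```

Each vertex and edge is visited once, so the cost stays linear in the size of the graph regardless of how many distinct cycles pass through a vertex. The final `sorted` call assumes the vertices are mutually comparable, as they are in the tests below.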

This function's output will exactly match the original function for any input.
All comments not related to NetworkX have been preserved, and the function's signature and result are the same.
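One way to spot-check an equivalence claim like this is to compare against an independent, deliberately simple reference. The sketch below is an assumption-laden illustration rather than part of this PR: `cycle_vertices_bruteforce` is a hypothetical helper that marks a vertex as a cycle vertex when it can reach itself along at least one edge, and it assumes the function under test returns a sorted list.

```python
from collections import defaultdict


def cycle_vertices_bruteforce(edges):
    """Slow O(V * (V + E)) reference: a vertex is on a cycle if it reaches itself."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst]  # make sure sink vertices are keys as well

    def reaches_itself(start):
        seen = set()
        stack = list(adjacency[start])  # begin from the direct successors
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node])
        return False

    return sorted(v for v in list(adjacency) if reaches_itself(v))
```

On small inputs, the result could then be compared with `find_cycle_vertices(edges)` from `src/dsa/nodes.py`; any mismatch would point to an input where the optimized version diverges.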


If you want an even greater speedup, use numba JIT on this code for large graphs, or try running with PyPy.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 26 Passed |
| 🌀 Generated Regression Tests | 60 Passed |
| ⏪ Replay Tests | 1 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests Details
- codeflash_replay_tests_6bmby7wi/test_tests_test_dsa_nodes__replay_test_0.py
- test_dsa_nodes.py
🌀 Generated Regression Tests Details
import networkx as nx
# imports
import pytest  # used for our unit tests
from src.dsa.nodes import find_cycle_vertices

# unit tests

# ---------------------------
# 1. Basic Test Cases
# ---------------------------

def test_empty_graph():
    # No edges, so no cycles
    codeflash_output = find_cycle_vertices([])

def test_single_edge_no_cycle():
    # One edge, no cycle
    codeflash_output = find_cycle_vertices([(1, 2)])

def test_two_edges_no_cycle():
    # Two edges, no cycle (1->2->3)
    codeflash_output = find_cycle_vertices([(1, 2), (2, 3)])

def test_simple_cycle():
    # Simple 3-node cycle: 1->2->3->1
    codeflash_output = find_cycle_vertices([(1, 2), (2, 3), (3, 1)])

def test_two_disjoint_cycles():
    # Two cycles: 1->2->1 and 3->4->3
    edges = [(1, 2), (2, 1), (3, 4), (4, 3)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_and_path():
    # 1->2->3->1 (cycle), 4->5 (no cycle)
    edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    codeflash_output = find_cycle_vertices(edges)

def test_self_loop():
    # Node with a self-loop is a cycle
    codeflash_output = find_cycle_vertices([(1, 1)])

def test_multiple_self_loops():
    # Multiple nodes with self-loops
    edges = [(1, 1), (2, 2), (3, 4)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_extra_edges():
    # Cycle with extra outgoing edge
    edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_duplicate_edges():
    # Cycle with repeated edges
    edges = [(1, 2), (2, 3), (3, 1), (1, 2), (2, 3)]
    codeflash_output = find_cycle_vertices(edges)

# ---------------------------
# 2. Edge Test Cases
# ---------------------------

def test_disconnected_graph():
    # Simple chain 0->1->...->10, no cycles
    edges = []
    for i in range(10):
        edges.append((i, i+1))
    codeflash_output = find_cycle_vertices(edges)

def test_single_node_no_edges():
    # Single node, no edges
    codeflash_output = find_cycle_vertices([])

def test_single_node_self_loop():
    # Single node with self-loop
    codeflash_output = find_cycle_vertices([(0, 0)])

def test_large_cycle():
    # Large cycle: 0->1->2->...->99->0
    n = 100
    edges = [(i, (i+1)%n) for i in range(n)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_tail():
    # Cycle 1->2->3->1, tail 4->1
    edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_incoming_and_outgoing_edges():
    # Cycle 1->2->3->1, incoming 0->1, outgoing 3->4
    edges = [(1, 2), (2, 3), (3, 1), (0, 1), (3, 4)]
    codeflash_output = find_cycle_vertices(edges)

def test_overlapping_cycles():
    # Overlapping cycles: 1->2->3->1 and 2->3->4->2
    edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 2)]
    # 1,2,3,4 are all in cycles
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_branching():
    # 1->2->3->1, 2->4 (branch)
    edges = [(1, 2), (2, 3), (3, 1), (2, 4)]
    codeflash_output = find_cycle_vertices(edges)

def test_isolated_cycle_and_isolated_node():
    # 1->2->1 (cycle); an isolated node 3 cannot be expressed in an edge list, so it is omitted
    edges = [(1, 2), (2, 1)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_redundant_edges():
    # 1->2->3->1, 1->3 (redundant edge)
    edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
    codeflash_output = find_cycle_vertices(edges)

def test_multiple_cycles_with_shared_node():
    # 1->2->3->1, 3->4->5->3
    edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)]
    # 1,2,3,4,5 are all in cycles
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_non_integer_nodes():
    # Use string nodes
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_tuple_nodes():
    # Use tuple nodes
    edges = [((1,2), (2,3)), ((2,3), (3,1)), ((3,1), (1,2))]
    codeflash_output = find_cycle_vertices(edges)


def test_duplicate_edges_in_cycle():
    # 1->2->1, with duplicate edge 1->2
    edges = [(1, 2), (2, 1), (1, 2)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_negative_nodes():
    # Negative node values
    edges = [(-1, -2), (-2, -3), (-3, -1)]
    codeflash_output = find_cycle_vertices(edges)

# ---------------------------
# 3. Large Scale Test Cases
# ---------------------------

def test_large_graph_no_cycles():
    # Large chain, no cycles
    n = 1000
    edges = [(i, i+1) for i in range(n-1)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_single_cycle():
    # Large cycle of 1000 nodes
    n = 1000
    edges = [(i, (i+1)%n) for i in range(n)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_multiple_small_cycles():
    # 100 cycles of size 10, disjoint
    cycles = []
    for c in range(100):
        base = c*10
        cycles += [(base+i, base+(i+1)%10) for i in range(10)]
    all_cycle_nodes = sorted(set(i for c in range(100) for i in range(c*10, c*10+10)))
    codeflash_output = find_cycle_vertices(cycles)

def test_large_graph_mixed_cycles_and_paths():
    # 500 nodes in a cycle, 500 in a path
    n = 500
    cycle_edges = [(i, (i+1)%n) for i in range(n)]
    path_edges = [(n+i, n+i+1) for i in range(n-1)]
    edges = cycle_edges + path_edges
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_with_self_loops():
    # 500 nodes with self-loops, 500 without
    n = 500
    edges = [(i, i) for i in range(n)] + [(n+i, n+i+1) for i in range(n-1)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_overlapping_cycles():
    # Two large cycles overlapping on 100 nodes
    # Cycle1: 0-499, Cycle2: 400-899 (overlap 400-499)
    edges = []
    for i in range(0, 500):
        edges.append((i, (i+1)%500))
    for i in range(400, 900):
        edges.append((i, i+1 if i < 899 else 400))
    # All nodes from 0-499 and 400-899 are in cycles
    expected = sorted(set(range(0, 500)) | set(range(400, 900)))
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_cycle_with_branches():
    # 1 big cycle, each node has a branch to a unique new node
    n = 500
    edges = [(i, (i+1)%n) for i in range(n)]  # cycle
    edges += [(i, n+i) for i in range(n)]     # branches
    codeflash_output = find_cycle_vertices(edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import networkx as nx
# imports
import pytest  # used for our unit tests
from src.dsa.nodes import find_cycle_vertices

# unit tests

# ----------- BASIC TEST CASES -----------

def test_empty_graph():
    # No edges, no cycles
    codeflash_output = find_cycle_vertices([])

def test_single_node_no_edges():
    # Single node whose only edge is a self-loop, which counts as a cycle
    codeflash_output = find_cycle_vertices([(1, 1)])

def test_two_nodes_no_cycle():
    # Two nodes, one direction, no cycle
    codeflash_output = find_cycle_vertices([(1, 2)])

def test_two_nodes_cycle():
    # Two nodes with edges in both directions, forms a cycle
    codeflash_output = find_cycle_vertices([(1, 2), (2, 1)])

def test_three_nodes_simple_cycle():
    # 1->2->3->1 forms a cycle
    codeflash_output = find_cycle_vertices([(1, 2), (2, 3), (3, 1)])

def test_three_nodes_chain_no_cycle():
    # 1->2->3, no cycle
    codeflash_output = find_cycle_vertices([(1, 2), (2, 3)])

def test_three_nodes_with_self_loop():
    # 1->2->3, 3->3 (self-loop only at 3)
    codeflash_output = find_cycle_vertices([(1, 2), (2, 3), (3, 3)])

def test_disconnected_cycles():
    # Two separate cycles
    edges = [(1, 2), (2, 1), (3, 4), (4, 3)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_and_non_cycle_nodes():
    # 1->2->3->1 is a cycle, 4->5 is not
    edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    codeflash_output = find_cycle_vertices(edges)

# ----------- EDGE TEST CASES -----------

def test_self_loop_only():
    # Single node with self-loop
    codeflash_output = find_cycle_vertices([(42, 42)])

def test_multiple_self_loops():
    # Multiple nodes with self-loops
    edges = [(1, 1), (2, 2), (3, 3)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_self_loop():
    # 1->2->3->1 is a cycle, 2 has a self-loop
    edges = [(1, 2), (2, 3), (3, 1), (2, 2)]
    # All nodes in the 3-cycle, and 2 (already included) for self-loop
    codeflash_output = find_cycle_vertices(edges)

def test_disconnected_graph_with_one_cycle():
    # 1->2->3->1 (cycle), 4->5 (no cycle); an isolated node 6 cannot be expressed in an edge list
    edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    codeflash_output = find_cycle_vertices(edges)

def test_unidirectional_cycle_with_tail():
    # 1->2->3->1 is a cycle, 4->1 is a tail to the cycle
    edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
    codeflash_output = find_cycle_vertices(edges)

def test_bidirectional_edges_no_cycle():
    # 1->2, 2->3, 3->4, 4->5, 5->6, 6->7, 7->8, 8->9, 9->10 (no cycles)
    edges = [(i, i+1) for i in range(1, 10)]
    codeflash_output = find_cycle_vertices(edges)

def test_multiple_overlapping_cycles():
    # 1->2->3->1 and 2->4->5->2, overlapping at 2
    edges = [(1, 2), (2, 3), (3, 1), (2, 4), (4, 5), (5, 2)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_branch():
    # 1->2->3->1 is a cycle, 2->4 is a branch out
    edges = [(1, 2), (2, 3), (3, 1), (2, 4)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_isolated_node():
    # 1->2->3->1 is a cycle; an isolated node 4 cannot be expressed in an edge list
    edges = [(1, 2), (2, 3), (3, 1)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_duplicate_edges():
    # 1->2->3->1 is a cycle, with duplicate edges
    edges = [(1, 2), (2, 3), (3, 1), (1, 2), (2, 3)]
    codeflash_output = find_cycle_vertices(edges)

def test_cycle_with_non_integer_nodes():
    # Use string nodes
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
    codeflash_output = find_cycle_vertices(edges)


def test_large_sparse_no_cycle():
    # 1000 edges (1001 nodes) in a chain, no cycles
    edges = [(i, i+1) for i in range(1000)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_sparse_with_one_cycle():
    # 1000 nodes in a chain, plus a cycle at the end
    edges = [(i, i+1) for i in range(999)] + [(999, 500), (500, 999)]
    codeflash_output = find_cycle_vertices(edges)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_complete_cycle():
    # 1000 nodes in a single cycle
    N = 1000
    edges = [(i, (i+1)%N) for i in range(N)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_disconnected_cycles():
    # 5 cycles of 200 nodes each, disconnected
    N = 200
    edges = []
    for offset in range(0, 1000, N):
        edges += [(offset + i, offset + (i+1)%N) for i in range(N)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_with_some_cycles_and_branches():
    # 10 cycles of 50 nodes each, each with 50 branches out to unique nodes
    N = 50
    edges = []
    cycle_nodes = []
    for offset in range(0, 500, N):
        # Add cycle
        edges += [(offset + i, offset + (i+1)%N) for i in range(N)]
        cycle_nodes.extend(range(offset, offset+N))
        # Add branches out
        for i in range(N):
            edges.append((offset + i, 1000 + offset + i))
    # Only the cycle nodes are in cycles
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_with_self_loops_and_cycles():
    # 500 nodes, each with a self-loop, and a 500-node cycle
    N = 500
    edges = [(i, i) for i in range(N)]  # self-loops
    edges += [(i, (i+1)%N) for i in range(N)]  # cycle
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_no_cycles():
    # 1000 nodes in a chain (a DAG), so no cycles
    N = 1000
    edges = [(i, i+1) for i in range(N-1)]
    codeflash_output = find_cycle_vertices(edges)

def test_large_graph_with_multiple_small_cycles():
    # 100 cycles of 10 nodes each, disconnected
    N = 10
    edges = []
    for offset in range(0, 1000, N):
        edges += [(offset + i, offset + (i+1)%N) for i in range(N)]
    codeflash_output = find_cycle_vertices(edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-find_cycle_vertices-mb8dbjdg` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 28, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 May 28, 2025 19:58
@KRRT7 KRRT7 closed this Jun 4, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-find_cycle_vertices-mb8dbjdg branch June 4, 2025 07:33