PRAteek-singHWY commented Jan 15, 2026

🚀 Prune Gap Analysis Search to Save Time and Memory

This PR implements a tiered pruning strategy for Gap Analysis to significantly reduce execution time and memory usage during map analysis.
The change directly addresses Issue #506 and aligns with the original design discussion around stopping early when strong or medium links are found.


🧠 Problem

Gap analysis currently performs an expensive wildcard Neo4j traversal:

MATCH p = allShortestPaths((BaseStandard)-[*..20]-(CompareStandard))

This approach:

  • Traverses all relationship types
  • Generates a large number of weakly relevant paths
  • Consumes large amounts of memory
  • Can take days to complete on large datasets
  • Runs even when direct or strong links already exist

In practice, we are only interested in the strongest connections between standards.


✅ Solution: Tiered Pruning Strategy

The search now runs in up to three tiers, tried in order, and exits as soon as a tier returns at least one path.

Tier 1 – Strong Links

Executed first. If any paths are found, the search stops immediately.

Relationships included:

  • LINKED_TO
  • AUTOMATICALLY_LINKED_TO
  • SAME

These correspond to the strongest connections (penalty = 0) and include equivalence (SAME) relationships.


Tier 2 – Medium Links

Executed only if Tier 1 returns no results.

Relationships included:

  • LINKED_TO
  • AUTOMATICALLY_LINKED_TO
  • SAME
  • CONTAINS

This is the Tier 1 set plus CONTAINS, so it captures hierarchical relationships without falling back to a full wildcard traversal.
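As an illustration of how the two relationship sets above could be turned into Cypher relationship patterns, here is a minimal sketch. The relationship names come from this description; `TIER_1`, `TIER_2`, and `relationship_pattern` are hypothetical names and may not match the code in this PR.

```python
# Relationship names are taken from the tier lists in this description.
# The variable and function names are illustrative assumptions only.
TIER_1 = ["LINKED_TO", "AUTOMATICALLY_LINKED_TO", "SAME"]
TIER_2 = TIER_1 + ["CONTAINS"]


def relationship_pattern(rel_types, max_hops=20):
    """Build a variable-length Cypher relationship pattern.

    An empty list of types yields the unrestricted wildcard pattern.
    """
    types = "|".join(rel_types)
    prefix = f":{types}" if types else ""
    return f"[{prefix}*..{max_hops}]"


print(relationship_pattern(TIER_1))
print(relationship_pattern(TIER_2))
print(relationship_pattern([]))
```

Restricting the pattern to named relationship types is what keeps the traversal from expanding across every edge in the graph.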


Tier 3 – Fallback (Wildcard)

Executed only if Tier 1 and Tier 2 return no paths.

[*..20]

This preserves existing behavior as a fallback to ensure no loss of coverage.
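The three tiers and the early exit can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `tiered_search`, `build_query`, and the `run_query` callable are assumed names, and the query string is modelled on the abbreviated snippet in the Problem section rather than the real query.

```python
from typing import Callable, List, Sequence

# Relationship patterns per tier, in the order they are tried.
TIER_PATTERNS = [
    "[:LINKED_TO|AUTOMATICALLY_LINKED_TO|SAME*..20]",           # Tier 1: strong
    "[:LINKED_TO|AUTOMATICALLY_LINKED_TO|SAME|CONTAINS*..20]",  # Tier 2: medium
    "[*..20]",                                                  # Tier 3: wildcard
]


def build_query(pattern: str) -> str:
    # Hypothetical, simplified query shape; the real query has more clauses.
    return (
        "MATCH p = allShortestPaths((BaseStandard)-"
        + pattern
        + "-(CompareStandard)) RETURN p"
    )


def tiered_search(
    run_query: Callable[[str], Sequence],
    patterns: Sequence[str] = TIER_PATTERNS,
) -> List:
    """Run each tier in order and return the first non-empty result."""
    for pattern in patterns:
        paths = list(run_query(build_query(pattern)))
        if paths:
            return paths  # early exit: later, more expensive tiers are skipped
    return []
```

Passing the query runner in as a callable keeps the tier logic independent of the database driver, which also makes it straightforward to exercise with a fake in tests.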


🧪 Testing

A new unit test has been added to verify pruning behavior:

  • Confirms that Tier 3 is not executed when Tier 1 returns results
  • Uses mocking to detect which Neo4j queries are executed
  • Protects against future regressions in pruning logic

Test command:

python3 -m unittest application/tests/gap_analysis_db_test.py

All existing gap analysis tests continue to pass.
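For readers unfamiliar with the mocking approach, the idea behind the new test can be sketched as below. This is not the test added in this PR: `tiered_search` here is a small stand-in for the function under test, and the query strings are placeholders.

```python
import unittest
from unittest.mock import Mock


def tiered_search(run_query, queries):
    """Stand-in for the code under test: try each query, stop at the first hit."""
    for query in queries:
        paths = run_query(query)
        if paths:
            return paths
    return []


class TieredPruningTest(unittest.TestCase):
    # Placeholder query strings, one per tier.
    QUERIES = ["tier1-query", "tier2-query", "[*..20]"]

    def test_wildcard_skipped_when_tier1_has_paths(self):
        run_query = Mock(return_value=["path-a"])
        result = tiered_search(run_query, self.QUERIES)
        self.assertEqual(result, ["path-a"])
        # Only the Tier 1 query was executed; Tier 2 and Tier 3 never ran.
        run_query.assert_called_once_with("tier1-query")

    def test_wildcard_runs_when_earlier_tiers_are_empty(self):
        run_query = Mock(side_effect=[[], [], ["path-z"]])
        result = tiered_search(run_query, self.QUERIES)
        self.assertEqual(result, ["path-z"])
        executed = [call.args[0] for call in run_query.call_args_list]
        self.assertEqual(executed, self.QUERIES)
```

Saved as a module, a test case like this can be run with `python3 -m unittest`, the same runner used in the test command above.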


📈 Impact


🔗 Related Issue

Prune map analysis search to save time and memory
Fixes #506


📝 Notes

  • Path scoring logic is unchanged
  • Relationship semantics are preserved
  • This PR focuses strictly on backend query pruning
  • Frontend categorization changes are intentionally deferred to a follow-up PR (Stage 2)

- Introduce tiered gap analysis queries (strong → medium → wildcard)
- Stop traversal early when strong or medium paths exist
- Preserve existing scoring and semantics
- Add unit test to verify Tier-3 traversal is skipped when not needed

Fixes OWASP#506