
FEAT implement batching for memory interface#1325

Open
maifeeulasad wants to merge 8 commits into microsoft:main from maifeeulasad:fix/memory-interface-batching-scale-maifee

Conversation


@maifeeulasad maifeeulasad commented Jan 25, 2026

Description

Closes #845

When MemoryInterface query methods (get_scores, get_message_pieces, get_attack_results, get_scenario_results) receive large IN-clause parameter lists (e.g., 1000+ prompt IDs), the generated SQL exceeds SQLite's per-statement bind variable limit of 999, causing the query to fail. This PR adds transparent batching: large IN-clause lists are split into chunks, executed as separate queries, and the results are merged and deduplicated.

What changed

Core batching infrastructure (memory_interface.py):

  • _MAX_BIND_VARS class attribute on MemoryInterface (default: 500). Subclasses can override — AzureSQLMemory sets it to 2000 since Azure SQL supports 2100 params.
  • _execute_batched_query — Low-level helper that splits a single IN-clause into batches, runs separate queries, and deduplicates results by primary key. Accepts an optional batch_size override for callers that need a smaller batch.
  • _query_with_list_params — Higher-level helper used by get_message_pieces and get_attack_results. Classifies list parameters as "small" (fit in one query) or "large" (need batching), adds small params to SQL conditions, batches on the first large param, and filters remaining large params in Python. Computes an effective batch size that accounts for bind variables consumed by small params, preventing cumulative overflow.
  • _dedup_attack_entries — Static helper extracted from get_attack_results to deduplicate AttackResultEntry rows by conversation_id (legacy data integrity workaround). Applied consistently across all code paths (batched and non-batched).
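The core helper can be sketched as below. The function name mirrors _execute_batched_query, but the signature, the query callback, and the key-extraction callback are illustrative assumptions, not PyRIT's actual API:

```python
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")

# Stand-in for the _MAX_BIND_VARS class attribute (default 500 per the PR).
MAX_BIND_VARS = 500


def execute_batched_query(
    values: Sequence[str],
    run_query: Callable[[Sequence[str]], Iterable[T]],
    get_key: Callable[[T], object],
    batch_size: int = MAX_BIND_VARS,
) -> list[T]:
    """Split an IN-clause value list into batches, run each batch as a
    separate query, and merge the results, deduplicating by primary key."""
    seen: set[object] = set()
    merged: list[T] = []
    for start in range(0, len(values), batch_size):
        batch = values[start : start + batch_size]
        for row in run_query(batch):
            key = get_key(row)
            if key not in seen:  # dedupe rows returned by multiple batches
                seen.add(key)
                merged.append(row)
    return merged
```

With the default batch size of 500, a list of 1,200 IDs is executed as three separate queries (500 + 500 + 200 bind variables), each safely under SQLite's limit.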

Method changes:

  • get_scores — Batches score_ids via _execute_batched_query. Adds early return for score_ids=[] for consistency with other methods.
  • get_message_pieces — Routes list params (prompt_ids, original_values, converted_values, converted_value_sha256) through _query_with_list_params.
  • get_attack_results — Routes attack_result_ids and objective_sha256 through _query_with_list_params. Preserves dedup-by-conversation-id on all paths.
  • get_scenario_results — Batches scenario_result_ids and inner all_conversation_ids via _execute_batched_query.

azure_sql_memory.py:

  • Overrides _MAX_BIND_VARS = 2000 so Azure SQL uses larger batches (its limit is 2100).
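The override is a plain class attribute, so per-backend tuning needs no extra plumbing. A minimal sketch of the pattern (class names here are illustrative stand-ins for MemoryInterface and AzureSQLMemory):

```python
class SqlMemoryBase:
    # Conservative default: SQLite's per-statement limit is 999 bind
    # variables, so 500 leaves headroom for other conditions in the query.
    _MAX_BIND_VARS: int = 500


class AzureSqlMemorySketch(SqlMemoryBase):
    # Azure SQL allows up to 2100 parameters per statement,
    # so larger batches are safe and mean fewer round trips.
    _MAX_BIND_VARS: int = 2000
```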

Why this approach

  • Multiple separate queries instead of OR-ing IN clauses — An earlier iteration tried combining batches with OR(col.IN(batch1), col.IN(batch2), ...) in a single statement, but this still binds all parameters in one statement and doesn't solve the limit.
  • Conservative default of 500 — SQLite's actual limit is 999. Using 500 leaves headroom for bind variables from other conditions in the same query.
  • Python-side filtering for multiple large params — When multiple list params exceed the limit simultaneously, only the first is batched in SQL. The rest are filtered in Python after fetching. This avoids nested batching complexity while handling an uncommon edge case correctly.
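The classification and Python-side filtering steps can be sketched as follows; plan_list_params and filter_in_python are hypothetical names for the logic the PR describes inside _query_with_list_params, not PyRIT functions:

```python
MAX_BIND_VARS = 500  # assumed default, matching the description above


def plan_list_params(list_params: dict[str, list[str]], max_bind_vars: int = MAX_BIND_VARS):
    """Classify list params as 'small' (inlined into one query) or 'large'
    (batched in SQL / filtered in Python), and compute the batch budget
    left after the small params consume their bind variables."""
    small = {name: vals for name, vals in list_params.items() if len(vals) <= max_bind_vars}
    large = {name: vals for name, vals in list_params.items() if len(vals) > max_bind_vars}
    small_binds = sum(len(vals) for vals in small.values())
    # Small params ride along with every batch, so they shrink each batch.
    effective_batch_size = max(1, max_bind_vars - small_binds)
    return small, large, effective_batch_size


def filter_in_python(rows: list[dict], remaining_large: dict[str, list[str]]) -> list[dict]:
    """Narrow rows fetched via the first (SQL-batched) large param by the
    remaining large params, entirely in Python."""
    wanted = {name: set(vals) for name, vals in remaining_large.items()}
    return [row for row in rows if all(row[name] in vals for name, vals in wanted.items())]
```

For example, 600 prompt IDs plus a 2-value role filter yields one large param, one small param, and an effective batch size of 498.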

Tests and Documentation

New test file: tests/unit/memory/memory_interface/test_batching_scale.py (586 lines, 25 tests)

Covers:

  • get_message_pieces with many prompt_ids, original_values, sha256 hashes (single and multiple large params)
  • get_scores with many score_ids and empty list edge case
  • get_attack_results with many IDs, many objective_sha256, and empty list edge case
  • _execute_batched_query with small lists (single query), large lists (multiple queries), exact batch boundary, custom batch_size override
  • Effective batch size reduction when small + large params are combined
  • Filter preservation (role filter works correctly through batched queries)
  • Deduplication correctness
  • Query count verification via _query_entries spy

All 220 tests/unit/memory/memory_interface/ tests pass. Pre-commit hooks (ruff, mypy strict) clean.

@romanlutz romanlutz changed the title implement batching for memory interface FEAT implement batching for memory interface Jan 25, 2026
@maifeeulasad maifeeulasad marked this pull request as draft January 26, 2026 07:08
@maifeeulasad maifeeulasad marked this pull request as ready for review January 26, 2026 11:26
@romanlutz romanlutz requested a review from Copilot February 6, 2026 21:39
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to address scaling limits in the PyRIT MemoryInterface by introducing “batched” filtering logic for large IN (...) queries, along with unit tests intended to validate behavior when many IDs/values are passed.

Changes:

  • Added _SQLITE_MAX_BIND_VARS and _batched_in_condition helper in MemoryInterface and applied it to several query filters.
  • Updated multiple retrieval methods (e.g., scores, message pieces, attack results, scenario results) to use the new helper for IN filtering.
  • Added a new unit test module to exercise “batching” behavior and large-input queries against the SQLite-backed memory instance.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pyrit/memory/memory_interface.py Introduces a helper intended to avoid bind-variable limits and applies it across several query paths.
tests/unit/memory/memory_interface/test_batching_scale.py Adds tests intended to validate that large ID/value inputs don’t break memory queries.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

@maifeeulasad
Author

@romanlutz eagerly waiting for your feedback!

@romanlutz
Contributor

> @romanlutz eagerly waiting for your feedback!

I apologize for taking ages. There were A LOT of PRs to work through.

Applies PR microsoft#1325 changes onto latest main with conflict resolution:
- Add _SQLITE_MAX_BIND_VARS constant and _execute_batched_query helper
- Add batching to get_scores, get_message_pieces, get_attack_results, get_scenario_results
- Preserve main's dedup-by-conversation-id in get_attack_results (all paths)
- Preserve main's identifier_filters, attack_class, converter_classes params
- Fix test helper: use constructor role param instead of mutating _role post-creation
- Extract _dedup_attack_entries helper to avoid mypy no-redef errors

Co-authored-by: maifeeulasad <maifeeulasad@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz force-pushed the fix/memory-interface-batching-scale-maifee branch from 683acd7 to 3420468 Compare April 24, 2026 06:53
romanlutz and others added 7 commits April 23, 2026 23:54
- Move _SQLITE_MAX_BIND_VARS to class attribute _MAX_BIND_VARS on MemoryInterface
- Override _MAX_BIND_VARS to 2000 in AzureSQLMemory (Azure SQL supports 2100)
- Add early return for score_ids=[] in get_scores (consistency with other methods)
- Add tests for get_attack_results batching (many IDs, many sha256, empty list)
- Add test for get_scores empty list handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix dedup bug in _execute_batched_query: str(None) produced truthy
  'None' string, making the no-id fallback dead code. Use explicit
  None check instead.
- Simplify get_scenario_results: remove duplicated elif/else branches,
  let _execute_batched_query own threshold logic for both
  scenario_result_ids and all_conversation_ids.
- Fix Optional -> pipe syntax in _execute_batched_query signature.
- Document bind-variable headroom limitation in docstring.
- Fix conditional test assertion that could pass vacuously.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract shared list_params batching logic from get_message_pieces and
  get_attack_results into _query_with_list_params helper (~40 duplicated
  lines removed from each caller).
- Add batch_size parameter to _execute_batched_query so callers can
  reduce the batch size when other conditions consume bind variables.
- _query_with_list_params computes effective_batch_size by subtracting
  small_param bind count from _MAX_BIND_VARS, preventing cumulative
  overflow of the per-statement bind variable limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Test that batch size is reduced when small IN-clause params consume
  bind variables (effective_batch_size = _MAX_BIND_VARS - small_binds).
  Verifies the correct number of batched queries are issued.
- Test that _execute_batched_query respects the batch_size override
  parameter, using batch_size=30 to verify 4 queries for 100 items.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

BUG get_scores_by_memory_labels and a few other memory methods do not scale
