
Add deep AI search to rerank all components in selected sources#2430

Open
Mbeaulne wants to merge 1 commit into 06-18-build_broader_ai_candidate_pools_for_component_search from 06-18-improve_ai_rerank_payload_for_component_search



@Mbeaulne Mbeaulne commented Jun 18, 2026

Collaborator

Description

Adds a Deep AI search button alongside the existing AI (smart) search button. Standard AI search sends a limited, curated set of candidate components to the reranker. Deep AI search instead sends the entire searchable index, with lexical hits first so that providers which truncate long inputs still see the most likely matches early, allowing the model to rerank across every available component in the selected sources.

Key changes:

  • Introduces buildDeepAiCandidateMatches, which builds a candidate pool from all indexed components (no cap), ordered with lexical matches first and remaining components appended alphabetically.
  • Exposes canDeepRerank and deepRerank from useComponentSearchV2State and wires them into both the Dashboard and Editor search UIs.
  • Enriches RerankCandidate with richer I/O metadata (type and description, not just names) and a source field, giving the model more signal when ranking.
  • Refactors the shared startAiSearch / startRerank helpers to accept an arbitrary candidate list, removing duplication between the standard and deep paths.
  • Updates componentReferenceToCandidate to accept an optional source argument and serialize I/O types (including complex object types) via JSON.stringify.
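The deep-pool ordering described above (lexical matches first, remaining components appended alphabetically, no cap) can be sketched as follows. This is an illustration only: `IndexEntry`, `LexicalHit`, and `buildDeepPoolSketch` are hypothetical stand-ins, not the PR's actual `buildDeepAiCandidateMatches` or `LexicalMatch` types, and the lexical hits are assumed to arrive already sorted best-first.

```typescript
// Hypothetical shapes for illustration; the real index entry and LexicalMatch types may differ.
interface IndexEntry {
  digest: string;
  name: string;
}

interface LexicalHit {
  entry: IndexEntry;
  score: number;
}

// Builds an uncapped pool: lexical hits first (in the order given, assumed best-first),
// then every remaining component sorted alphabetically by name. Duplicate digests are skipped.
function buildDeepPoolSketch(index: IndexEntry[], lexicalHits: LexicalHit[]): IndexEntry[] {
  const seen = new Set<string>();
  const pool: IndexEntry[] = [];

  const add = (entry: IndexEntry): void => {
    if (seen.has(entry.digest)) return;
    seen.add(entry.digest);
    pool.push(entry);
  };

  // 1. Lexical matches keep their relevance order.
  lexicalHits.forEach((hit) => add(hit.entry));

  // 2. Everything else follows alphabetically; copy first so the caller's index is not mutated.
  [...index].sort((a, b) => a.name.localeCompare(b.name)).forEach(add);

  return pool;
}
```

Because the pool is ordered best-first, any later truncation (by a provider or by an explicit cap) drops the least likely matches first.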

Related Issue and Pull requests

Type of Change

  • Bug fix
  • New feature
  • Improvement
  • Cleanup/Refactor
  • Breaking change
  • Documentation update

Checklist

  • I have tested this does not break current pipelines / runs functionality
  • I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

  1. Open the component search panel (Dashboard or Editor).
  2. Enter a non-empty query.
  3. Confirm the Deep AI search button appears and is enabled when an AI provider is configured.
  4. Click Deep AI search and verify that results are reranked across all components in the selected sources, not just the top lexical candidates.
  5. Confirm the existing AI search (sparkles) button still behaves as before.
  6. Verify the button is disabled when the query is empty or no AI provider is configured.

Additional Comments

scoreAllCandidates is intentionally set to false for deep rerank to avoid scoring overhead across the full index on every result row. The standard AI search retains scoreAllCandidates: true so relevance percentages continue to appear on displayed results.


github-actions Bot commented Jun 18, 2026


🎩 Preview

A preview build has been created at: 06-18-improve_ai_rerank_payload_for_component_search/817441b

@Mbeaulne Mbeaulne changed the title Improve AI rerank payload for component search Add deep AI search to rerank all components in selected sources Jun 18, 2026
@Mbeaulne Mbeaulne marked this pull request as ready for review June 18, 2026 17:59
@Mbeaulne Mbeaulne requested a review from a team as a code owner June 18, 2026 17:59
```typescript
candidates,
seenDigests,
lexicalSearch(index, trimmedQuery, {
limit: index.length,
```

Collaborator Author


🤖 This is an AI-generated code review comment.

[HIGH] Deep search builds an unbounded candidate pool: lexicalSearch(index, ..., { limit: index.length }) followed by appending every remaining entry with Number.MAX_SAFE_INTEGER, then JSON.stringify-ing each candidate into a billed LLM rerank prompt. Output is bounded (scoreAllCandidates: false → ≤20) but input is not — there is no cap, truncation, or confirmation. Cap the deep pool at a high-but-finite N (and/or surface a confirmation when the pool is very large). At minimum, make Number.MAX_SAFE_INTEGER a deliberate, documented bound rather than effectively unlimited.
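A minimal sketch of the suggested "high-but-finite N plus confirmation" bound, assuming the pool is already ordered best-first. `DEEP_POOL_MAX`, `DEEP_POOL_CONFIRM_AT`, and `boundDeepPool` are hypothetical names, and the numbers are placeholders to be tuned against the provider's context window, not values taken from the PR.

```typescript
// Hypothetical limits; placeholders only, not values from the PR.
const DEEP_POOL_MAX = 500;
const DEEP_POOL_CONFIRM_AT = 200;

interface DeepPoolDecision<T> {
  candidates: T[];
  truncated: boolean; // true when the ordered pool was longer than the cap
  needsConfirmation: boolean; // true when the capped pool is still large enough to warrant asking the user
}

// Slices an ordered (best-first) pool to a finite cap, so truncation drops the least likely matches.
function boundDeepPool<T>(
  orderedPool: T[],
  max: number = DEEP_POOL_MAX,
  confirmAt: number = DEEP_POOL_CONFIRM_AT,
): DeepPoolDecision<T> {
  const candidates = orderedPool.slice(0, max);
  return {
    candidates,
    truncated: orderedPool.length > max,
    needsConfirmation: candidates.length >= confirmAt,
  };
}
```

The UI could then show a "showing the top N components" note when `truncated` is set, and gate the request behind a confirmation when `needsConfirmation` is set.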

```typescript
const candidates: LexicalMatch[] = [];
const seenDigests = new Set<string>();
const allLexicalMatches = lexicalSearch(filteredIndex, query, {
limit: filteredIndex.length,
```

Collaborator Author


🤖 This is an AI-generated code review comment.

[HIGH] Same unbounded-deep-pool concern on the Dashboard surface: lexicalSearch(filteredIndex, ..., { limit: filteredIndex.length }) then every remaining sortedIndex entry is appended, and the whole pool is JSON.stringify-ed into a billed LLM rerank with no input cap or confirmation. Bound the deep pool to a finite N or confirm before sending a very large pool; document the worst-case prompt size.
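One way to "document the worst-case prompt size" is to estimate it from the serialized payload. The sketch below assumes a candidate shape loosely modeled on the richer `RerankCandidate` (name, description, source, per-I/O name/type/description); `CandidateSketch` and `estimatePromptTokens` are hypothetical, and the 4-characters-per-token ratio is a rough rule of thumb, not any specific tokenizer.

```typescript
// Hypothetical candidate shape, loosely modeled on the richer RerankCandidate payload.
interface IoSketch {
  name: string;
  type?: string;
  description?: string;
}

interface CandidateSketch {
  name: string;
  description?: string;
  source?: string;
  inputs: IoSketch[];
  outputs: IoSketch[];
}

// Rough heuristic only: about 4 characters per token. Real tokenizers differ.
const CHARS_PER_TOKEN = 4;

// Sums the JSON-serialized length of every candidate and converts it to an approximate token count.
function estimatePromptTokens(candidates: CandidateSketch[]): number {
  let chars = 0;
  for (const candidate of candidates) {
    chars += JSON.stringify(candidate).length;
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}
```

Running this over the largest registered library in use (or multiplying a typical per-candidate estimate by the library size) gives a concrete number to compare against the provider's context limit.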

```tsx
</Button>
<Button
variant="outline"
onClick={handleDeepAiSearch}
```

Collaborator Author


🤖 This is an AI-generated code review comment.

[MEDIUM] This "Deep AI search" Button gives no busy/in-progress feedback while isReranking, unlike the sibling Sparkles button which sets a dynamic aria-label. Add e.g. aria-label={isReranking ? "Deep AI search in progress" : "Deep AI search"} (and consider a busy indicator) so the disabled-while-reranking state is announced to assistive tech.
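A minimal sketch of the suggested accessibility props, kept as a plain function so the logic is testable outside React. `deepSearchButtonProps` is a hypothetical helper; it assumes the button should be disabled both while reranking and when `canDeepRerank` is false, which matches the test instructions but is not shown in the diff excerpt.

```typescript
// Hypothetical helper; the PR's Button only shows variant and onClick in the excerpt.
interface DeepSearchButtonA11yProps {
  "aria-label": string;
  "aria-busy": boolean;
  disabled: boolean;
}

// Announces the in-progress state to assistive tech and disables the button while busy or unavailable.
function deepSearchButtonProps(isReranking: boolean, canDeepRerank: boolean): DeepSearchButtonA11yProps {
  return {
    "aria-label": isReranking ? "Deep AI search in progress" : "Deep AI search",
    "aria-busy": isReranking,
    disabled: isReranking || !canDeepRerank,
  };
}
```

In JSX this would be spread onto the existing element, e.g. `<Button variant="outline" onClick={handleDeepAiSearch} {...deepSearchButtonProps(isReranking, canDeepRerank)}>`.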

```typescript
};

const handleSmartSearch = () => {
const startAiSearch = (matches: LexicalMatch[]) => {
```

Collaborator Author


🤖 This is an AI-generated code review comment.

[LOW] startAiSearch passes no scoreAllCandidates for either smart or deep, so both default to false here, whereas the Editor uses true for smart / false for deep. Routing the new deep button through this shared helper cements a smart/deep behavior divergence between the two surfaces. Consider passing a flag (as the Editor’s startRerank does) for parity.
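A sketch of the parity fix: make the shared helper take an explicit `scoreAllCandidates` flag so every call site has to choose. `RerankRequest`, `RerankFn`, and `makeStartAiSearch` are hypothetical; the request shape `{ query, candidates, scoreAllCandidates }` is assumed from the review's description of the Editor's `startRerank`, not copied from the code.

```typescript
// Hypothetical request shape; assumes rerank accepts an optional scoreAllCandidates flag.
interface RerankRequest<C> {
  query: string;
  candidates: C[];
  scoreAllCandidates?: boolean;
}

type RerankFn<C> = (request: RerankRequest<C>) => void;

// The flag is a required option, so smart and deep call sites cannot silently fall back to the default.
function makeStartAiSearch<C>(rerank: RerankFn<C>, query: string) {
  return (candidates: C[], options: { scoreAllCandidates: boolean }): void => {
    rerank({ query: query.trim(), candidates, scoreAllCandidates: options.scoreAllCandidates });
  };
}
```

With this, the Dashboard could mirror the Editor: `startAiSearch(aiCandidateMatches, { scoreAllCandidates: true })` for smart search and `{ scoreAllCandidates: false }` for deep search.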

```typescript
const trimmedQuery = query.trim();
const lexicalMatches = buildLexicalMatches(index, trimmedQuery);
const aiCandidateMatches = buildAiCandidateMatches(index, trimmedQuery);
const deepAiCandidateMatches = buildDeepAiCandidateMatches(
```

Collaborator Author


🤖 This is an AI-generated code review comment.

[LOW] buildDeepAiCandidateMatches (full-index lexicalSearch + full [...index].sort()) runs every render though it is only needed at click time. React Compiler memoizes this, so it is not a correctness bug. Optionally, compute only .length/enabled-state during render and build the ordered pool lazily inside the deep handler.
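A sketch of the "cheap check at render, build lazily at click" split. `canDeepRerankSketch` and `makeDeepHandler` are hypothetical names; `buildPool` stands in for `buildDeepAiCandidateMatches` and `send` for the rerank call, and the enabled-state rule (provider configured, non-empty query, non-empty index) is an assumption based on the test instructions.

```typescript
interface Entry {
  digest: string;
  name: string;
}

// Render-time: only an O(1) check, no sorting or lexical search.
function canDeepRerankSketch(query: string, index: Entry[], hasProvider: boolean): boolean {
  return hasProvider && query.trim().length > 0 && index.length > 0;
}

// Click-time: the O(n log n) pool construction runs only when the handler is invoked.
function makeDeepHandler(
  getIndex: () => Entry[],
  getQuery: () => string,
  buildPool: (index: Entry[], query: string) => Entry[],
  send: (pool: Entry[]) => void,
): () => void {
  return () => {
    const query = getQuery().trim();
    if (!query) return;
    send(buildPool(getIndex(), query));
  };
}
```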


Collaborator Author

🤖 This is an AI-generated code review comment.

[MEDIUM] src/hooks/useNaturalLanguageComponentSearch.ts ~lines 38-50 — The rerank mutation passes no AbortSignal, so a now-much-heavier deep rerank cannot be cancelled if the user retypes. Pre-existing, but materially amplified by this feature. Follow-up: thread signal into the mutation so a new query aborts the in-flight deep call.

(Posted as a PR-level comment because this file has no changed lines in the diff to anchor an inline comment to.)
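A minimal "latest request wins" sketch for threading an `AbortSignal` into the rerank call. The hook's real mutation API is not shown in the review, so `RerankCall` and `createLatestOnlyRerank` are hypothetical; the assumption is that the underlying call forwards `signal` to something that honors it, such as the standard `fetch(url, { signal })` option.

```typescript
// Hypothetical call shape: the implementation is expected to pass `signal` to fetch or similar.
type RerankCall<R> = (query: string, signal: AbortSignal) => Promise<R>;

function createLatestOnlyRerank<R>(call: RerankCall<R>) {
  let controller: AbortController | null = null;
  return {
    // Starting a new rerank aborts whatever request is still in flight for the previous query.
    run(query: string): { promise: Promise<R>; signal: AbortSignal } {
      controller?.abort();
      controller = new AbortController();
      const signal = controller.signal;
      return { promise: call(query, signal), signal };
    },
    // Explicit cancellation, e.g. when the search panel unmounts.
    cancel(): void {
      controller?.abort();
      controller = null;
    },
  };
}
```

A caller awaiting `promise` should treat an abort rejection as "superseded" rather than surfacing it as an error.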

@Mbeaulne Mbeaulne force-pushed the 06-18-improve_ai_rerank_payload_for_component_search branch from 7d30372 to c443c7a Compare June 18, 2026 19:12
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from d363ca7 to 60b076d Compare June 18, 2026 19:12
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_ai_rerank_payload_for_component_search branch from c443c7a to 9fdd3d5 Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 60b076d to 455266e Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_ai_rerank_payload_for_component_search branch from 9fdd3d5 to d9e254e Compare June 18, 2026 20:49
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch 2 times, most recently from 4a246ee to 8cc6222 Compare June 18, 2026 21:02
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_ai_rerank_payload_for_component_search branch from d9e254e to 1351eea Compare June 18, 2026 21:02
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 8cc6222 to 88f3546 Compare June 18, 2026 21:16
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_ai_rerank_payload_for_component_search branch from 1351eea to 817441b Compare June 18, 2026 21:16
@camielvs

Collaborator

🤖 Code review — Add deep AI search to rerank all components in selected sources

This is the most consequential PR in the stack — it adds a "Deep AI" button that reranks the entire library, plus a richer candidate payload (per-IO name/type/description + source). The startRerank/startAiSearch refactor with the scoreAllCandidates flag is clean, the richer payload should genuinely help model judgment, and the deep-pool test (lexical hits first, then the rest) pins the ordering. But a few things need attention before this ships.

Main concerns

  • Deep search sends an unbounded candidate set to the LLM in one request. buildDeepAiCandidateMatches returns every searchable component (limit: index.length, appended with Number.MAX_SAFE_INTEGER), and the new per-IO payload (name + type + description) inflates each candidate vs the old name-only arrays. scoreAllCandidates: false bounds the output (1500 tokens / top-20) but the input is uncapped. For a few hundred components this is a large-but-survivable prompt; for a big registered library it risks blowing the model/provider context window — which surfaces as a hard error or, worse, silent truncation. The "lexical hits first so truncating providers see likely matches early" comment helps ordering but doesn't prevent the failure. Recommend a hard candidate cap (or a token-budget estimate with chunked passes), and deciding what the UX does when the library exceeds it. Right now nothing communicates a ceiling.

  • Post-deep-rerank the full reordered library is rendered with no virtualization. After a deep rerank, displayedMatches/displayedResults becomes every component reordered (Editor rerankedMatches, Dashboard mergeRerankIntoLexical over rerankBaseMatches = deep pool). ComponentSearchResults does a plain results.map(...) — no windowing, no cap — and the header prints Search Results ({results.length}). Deep search on a large library renders hundreds/thousands of DOM rows. Confirm the list is virtualized or cap the displayed set.

  • The two search surfaces have diverged into parallel implementations. Deep-pool construction is duplicated: the Editor uses buildDeepAiCandidateMatches from componentSearchV2Logic, while DashboardComponentsV2View reimplements it inline as an IIFE. This compounds existing divergence: the Dashboard's aiCandidateMatches is still the older, simpler broad || sampleEvenly form and never received the source-diversity tiering from #2429 (Add source-diverse AI rerank candidate pool), and the two surfaces use different merge helpers (rerankedMatches vs mergeRerankIntoLexical). They will drift apart in behavior and accumulate separate bugs. Strongly consider extracting the shared selection/merge logic so both surfaces stay consistent.

  • scoreAllCandidates differs between surfaces. In the Editor, smart = true (every result badged with a relevance %) and deep = false. In the Dashboard, both smart and deep go through startAiSearch, which calls rerank({ query, candidates }) with no scoreAllCandidates at all (so it defaults to false). As a result, Dashboard smart search behaves differently from Editor smart search. If that's intentional, a comment would help; if not, align them.
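The first concern suggests "a token-budget estimate with chunked passes" as an alternative to a hard cap. A minimal sketch of the chunking step, assuming an ordered candidate list and a rough 4-characters-per-token estimate (not a real tokenizer); `chunkByTokenBudget` is a hypothetical helper, and how per-chunk results would be merged is left open.

```typescript
// Splits an ordered candidate list into consecutive batches whose estimated token cost
// stays within the budget. A single candidate larger than the budget gets its own batch.
function chunkByTokenBudget<T>(
  candidates: T[],
  maxTokensPerChunk: number,
  estimateTokens: (candidate: T) => number = (c) => Math.ceil(JSON.stringify(c).length / 4),
): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let used = 0;
  for (const candidate of candidates) {
    const cost = estimateTokens(candidate);
    // Close the current batch if adding this candidate would exceed the budget.
    if (current.length > 0 && used + cost > maxTokensPerChunk) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(candidate);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

The trade-off: each pass stays within the context window, but the model never compares candidates across chunks, so a final pass over the top results from each chunk (or a simple score merge) would still be needed.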

Minor

  • source.label (user-controlled for registered libraries) now flows into the candidate payload. It's inside the <candidates> block already framed as untrusted in the system prompt, so this is consistent with the existing IO/description fields — just noting the new field is equally attacker-influenceable and relies on that same framing.
  • The deep pool sorts the whole index alphabetically ([...index].sort(...)) on every render the button is enabled; with React Compiler memoization this is probably fine, but it's O(n log n) per render for a list that only matters on click — could be lazily computed in the handler.

Solid direction and a genuinely useful feature; the input-size ceiling and the surface duplication are the two I'd want resolved before merge.


Labels

None yet

Projects

None yet


2 participants