Expand component search indexing fields by Mbeaulne · Pull Request #2423 · TangleML/tangle-ui

Mbeaulne · 2026-06-18T16:52:34Z

Description

Expands the component search index to include richer input/output details and a new metadata match field.

Previously, the io searchable field only contained input and output names. It now includes descriptions, types, and annotations for each input and output spec. A new metadata field has been added that indexes component-level metadata annotations (with a blocklist for noisy keys like python_original_code, editor state, and similar large/irrelevant blobs) as well as the source label and published_by value from the component reference.

The MatchField type and all related scoring, labeling, and UI display logic have been updated to include metadata alongside the existing fields. Annotation values longer than 500 characters are excluded from indexing to avoid polluting search with large blobs.
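The annotation filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the identifiers `ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH`, `MAX_ANNOTATION_TEXT_LENGTH`, `extractAnnotationsText`, `stringifySearchValue`, and `isNonEmptyString` are named in this conversation, but their bodies here are assumptions. The `editor.position` key and the choice to index values only (not keys) are also assumed.

```typescript
// Hypothetical sketch of the annotation filtering; the real helpers may differ.
const ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH: ReadonlySet<string> = new Set([
  "python_original_code", // named in the PR description
  "python_dependencies", // named in the review comment
  "editor.position", // assumed key name for editor state
]);

// Per-value cap stated in the PR description.
const MAX_ANNOTATION_TEXT_LENGTH = 500;

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Turn an arbitrary annotation value into text; unserializable values yield "".
function stringifySearchValue(value: unknown): string {
  if (typeof value === "string") return value;
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value);
  }
  if (value === null || value === undefined) return "";
  try {
    return JSON.stringify(value) ?? "";
  } catch {
    return "";
  }
}

// Join the searchable annotation values, skipping excluded keys and
// any value longer than the per-value cap.
function extractAnnotationsText(
  annotations: Record<string, unknown> | undefined,
): string {
  if (!annotations) return "";
  const parts: string[] = [];
  for (const [key, raw] of Object.entries(annotations)) {
    if (ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH.has(key)) continue;
    const text = stringifySearchValue(raw);
    if (!isNonEmptyString(text)) continue;
    if (text.length > MAX_ANNOTATION_TEXT_LENGTH) continue;
    parts.push(text);
  }
  return parts.join(" ");
}
```

Under these assumptions an over-long value is dropped entirely rather than truncated, which matches the description's wording that such values are "excluded from indexing".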

Related Issue and Pull requests

Type of Change

Checklist

I have tested this does not break current pipelines / runs functionality
I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

Open the component dashboard and search for a term that appears in a component's metadata annotations (e.g. a framework name like sklearn or lightgbm).
Verify the result surfaces with metadata listed as a matched field.
Search for a publisher email address and confirm the matching component appears.
Search for a term that exists only in python_original_code or other excluded annotation keys and confirm it does not return results.
Search for an input/output description or type (e.g. parquet, artifact) and confirm results appear with io as the matched field.
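
The checks above can be mirrored in a small sketch that reports which fields matched a query. Everything here is hypothetical: the `MatchField` union is reduced to three members for illustration (the real type has other fields), `IndexedComponent` and `matchedFields` are invented names, and plain case-insensitive substring matching is assumed in place of whatever tokenization and scoring the search actually uses.

```typescript
// Hypothetical subset of the real MatchField union.
type MatchField = "name" | "io" | "metadata";

const MATCH_FIELDS: readonly MatchField[] = ["name", "io", "metadata"];

// Assumed shape: one pre-built searchable string per field.
interface IndexedComponent {
  id: string;
  searchable: Record<MatchField, string>;
}

// Return every field whose searchable text contains the query
// (case-insensitive substring match, an assumption).
function matchedFields(
  component: IndexedComponent,
  query: string,
): MatchField[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return MATCH_FIELDS.filter((field) =>
    component.searchable[field].toLowerCase().includes(needle),
  );
}

// Example entry: python_original_code was never indexed, so terms that
// appear only there cannot match.
const example: IndexedComponent = {
  id: "train-model",
  searchable: {
    name: "Train model",
    io: "training_data parquet model artifact",
    metadata: "sklearn user alice@example.com",
  },
};

console.log(matchedFields(example, "sklearn"));
console.log(matchedFields(example, "parquet"));
console.log(matchedFields(example, "lightgbm"));
```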

Additional Comments

The annotation exclusion list (ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH) and the 500-character value length cap are the primary mechanisms for keeping the metadata index clean. These can be extended as new noisy annotation keys are identified.

github-actions · 2026-06-18T16:52:47Z

🎩 Preview

A preview build has been created at: 06-18-expand_component_search_indexing_fields/327681f

Mbeaulne · 2026-06-18T16:52:47Z

camielvs · 2026-06-19T22:02:12Z

🤖 Code review — Expand component search indexing fields

Reviewed as the base of the AI-search stack (#2423→#2433). Overall this is a clean, well-tested expansion: input/output descriptions + types + annotations now feed the io field, and a new metadata field indexes component-level annotations, the source label, and published_by. The excluded-key set and the 500-char per-value cap sensibly keep large blobs (python_original_code, editor positions) out. The extractAnnotationsText / stringifySearchValue helpers are tidy and the test coverage matches the new behavior well.

A few small things worth a look — none blocking:

source.label indexed into metadata makes generic tokens match broadly. Because the source label is folded into the searchable metadata text, typing user, standard, or published now matches every component from that source (at weight 1). On short queries this can inject noise. The ComponentSearchSource.id doc comment already anticipates "future filter chips / URL state" — source feels more like a filter facet than a free-text token. Worth confirming this is the intended UX.
No aggregate cap on annotation text. extractAnnotationsText caps each value at MAX_ANNOTATION_TEXT_LENGTH (500) but there's no bound on the number of annotations concatenated. A component with many sub-500-char annotations could produce a large searchable string. Bounded in practice for real components, but a total-length guard would make the index size predictable.
published_by is an email and is now searchable. It's already surfaced in the UI (ComponentHistoryTimeline, ComponentItem), so this is consistent rather than a new disclosure — just flagging that searching by author email is now possible by design.
Question: python_dependencies is in the excluded-keys set. Users sometimes search by library/dependency ("tensorflow", "lightgbm"). Implementation text (image + command) covers some of this, but was excluding dependencies a deliberate signal/size tradeoff?
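
The total-length guard suggested in the second point could look something like this. It is a sketch under stated assumptions, not code from the PR: `joinWithTotalCap` and `MAX_TOTAL_ANNOTATION_TEXT_LENGTH` (and its value of 4000) are hypothetical, and stopping at the first value that no longer fits is just one possible policy.

```typescript
// Per-value cap from the PR description.
const MAX_ANNOTATION_TEXT_LENGTH = 500;
// Hypothetical aggregate cap; the value is illustrative only.
const MAX_TOTAL_ANNOTATION_TEXT_LENGTH = 4000;

// Join values with single spaces, skipping empty or over-long values and
// stopping once the joined string would exceed totalCap.
function joinWithTotalCap(
  values: readonly string[],
  perValueCap: number = MAX_ANNOTATION_TEXT_LENGTH,
  totalCap: number = MAX_TOTAL_ANNOTATION_TEXT_LENGTH,
): string {
  const parts: string[] = [];
  let used = 0;
  for (const value of values) {
    if (value.length === 0 || value.length > perValueCap) continue;
    // Account for the separator that join(" ") adds before this value.
    const cost = value.length + (parts.length > 0 ? 1 : 0);
    if (used + cost > totalCap) break;
    parts.push(value);
    used += cost;
  }
  return parts.join(" ");
}
```

With a guard of this shape the searchable string is bounded by `totalCap` regardless of how many annotations a component carries. The trade-off is that later annotations are silently dropped once the budget is spent.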

Nice cleanup folding the duplicated name/type guards into isNonEmptyString.

Expand component search indexing fields

327681f

Mbeaulne mentioned this pull request Jun 18, 2026

Normalize component search tokens for better matching #2424

Open

8 tasks

Mbeaulne marked this pull request as ready for review June 18, 2026 16:57

Mbeaulne requested a review from a team as a code owner June 18, 2026 16:57

Mbeaulne wants to merge 1 commit into master from 06-18-expand_component_search_indexing_fields
