diff --git a/SPEC.md b/SPEC.md
index 9a03d54..cc85418 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -1504,7 +1504,7 @@ becomes available.
 ### Known Platform Limits
 
 | Platform | Truncation Limit | Source | Confidence | Notes |
-|----------|-----------------|--------|------------|-------|
+| ---------- | ----------------- | -------- | ------------ | ------- |
 | Claude Code | ~100,000 chars | [Reverse engineering](https://giuseppegurgone.com/claude-webfetch) | High | Trusted sites serving `text/markdown` under 100K chars bypass summarization model entirely. Content over this threshold goes through a summarization model that may lose information. |
 | MCP Fetch (reference server) | 5,000 chars (default) | [Official docs](https://pypi.org/project/mcp-server-fetch/) | High | Default `max_length` is 5,000 chars. Configurable up to 1,000,000. Supports chunked reading via `start_index`. |
 | Claude API (web_fetch tool) | ~20,700 chars - default, unset | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | Optional `max_content_tokens` parameter can cap content length, but no default truncation limit is documented. Distinct implementation from Claude Code client-side tool. Default truncation ~20,700 chars when unset - ended mid-word. `max_content_tokens` is approximate — setting 5,000 returned 17,186 chars. Truncation occurs mid-token. CSS stripped effectively unlike Claude Code. HTML boilerplate 81–97.5% before first heading; Markdown reduces content 77%. JS-rendered pages return static shell only. |
@@ -1512,7 +1512,7 @@ becomes available.
 | OpenAI (web search) | Unknown | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | 128K token context window for web search. `search_context_size` parameter (low/medium/high) controls context amount but no per-page truncation limit is surfaced; when the tool invokes, any truncation of retrieved source content occurs before the model generates a response and isn't observable via the APIs. Consistent latency lever in Chat Completions API track, high ~1.5–1.7× slower, inconsistent in Responses API track. Source count stable at 12 regardless of context size. Tool invocation conditional and deterministic: static facts and trivial math don't invoke the tool. Domain filtering documented but non-functional via Python SDK — allow-list worked once on `web_search_preview`, never on `web_search`; block-list never succeeded across 6 runs, 2 tool types, 2 models. `search_queries_issued` appends training-era year strings despite running in 2026. Tested on `gpt-4o` + `gpt-4o-mini-search-preview` - behavior may vary across supported models. |
 | Cursor | Method-dependent | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | High | No documented truncation limit, behavior varies between backend methods `WebFetch MCP` ~28KB, `urllib` ~72KB, other routes 240KB+; `Auto` agent routing opaque; Cursor autonomously selects fetch mechanism. On timeout, falls back to `curl` (unfiltered HTML, 16MB+ observed). Requests `text/markdown` via `Accept` header. No token limit detected (tested 6.68M tokens). Perfect reproducibility for same URL; high variance for small files across sessions. |
 | GitHub Copilot | No fixed ceiling detected | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | No documented web fetch or truncation details; tool selection is non-deterministic and not controllable by prompt. `fetch_webpage` identified through logs only; performs relevance-ranked semantic excerpts with `...` elision markers in HTML-to-Markdown transformation with chunk-based reassembly; output order doesn't always reflect page reading order. No size limit detected across 55 runs; `curl` substitution delivers full retrieval, raw bytes in server format with no transformation layer. `Auto` model routing dispatches across multiple models with no documented routing logic. Tested on `Claude Haiku 4.5`, `Claude Sonnet 4.6`, `GPT-5.3-Codex`, `GPT-5.4`, `Grok Code Fast 1`, `Raptor mini (Preview)`. |
-| Windsurf | Unknown | -- | -- | Docs state it "chunks up web pages" and "skims to the section we want." No specific limits documented. |
+| Windsurf Cascade | No fixed ceiling detected at retrieval stage, but agent-dependent write ceiling | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | High | Two-stage pipeline: `read_url_content` returns a chunk index with summaries and metadata, then requires sequential `view_content_chunk` calls. Full retrieval is agent- and document-size-dependent, and full retrieval doesn't guarantee full content delivery: agents often retrieve fully below ~14 chunks, spottily around ~35, and sample sparsely at 50+. Per-chunk truncation observed; some chunk summaries include byte-count loss notices. CSS-heavy pages and SPAs often retrieve ~20–35% of expected rendered size. `@web` syntax is redundant with a bare URL. Read-write asymmetry: agents that self-report full retrieval frequently fail to reproduce semantically meaningful content (`curl`-fetched HTML/JS shells, false completions, cross-agent file reuse). |
 
 **Thank you to contributors!**
 
@@ -1521,6 +1521,7 @@ becomes available.
 - GitHub Copilot limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
 - Google Gemini (URL context) limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
 - OpenAI (web search) limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
+- Windsurf Cascade limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
 
 ### What This Means for Threshold Selection
 
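The `start_index` chunked-reading pattern noted for the reference MCP Fetch server can be sketched as a simple client-side loop: request up to `max_length` characters at a time and advance the index until the server returns nothing. This is a minimal illustration only — a plain string stands in for the fetched page, and `fetch_chunk`/`fetch_all` are hypothetical names, not the server's actual tool API.

```python
def fetch_chunk(content: str, start_index: int = 0, max_length: int = 5000) -> str:
    """Mimic the paging behavior: return at most max_length characters
    of the document starting at start_index (5,000 is the documented
    default for the reference server)."""
    return content[start_index:start_index + max_length]


def fetch_all(content: str, max_length: int = 5000) -> str:
    """Client loop: keep advancing start_index by the length of each
    returned chunk until an empty chunk signals the end, then
    reassemble the full document."""
    chunks, start = [], 0
    while True:
        chunk = fetch_chunk(content, start_index=start, max_length=max_length)
        if not chunk:
            break
        chunks.append(chunk)
        start += len(chunk)
    return "".join(chunks)
```

With the 5,000-char default, a 12,345-char page would take three calls (5,000 + 5,000 + 2,345) to retrieve in full — which is why the default limit alone understates what the tool can actually deliver.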