Feat/web search engine #338
Thanks for tackling this — Mojeek 403s are a real problem for users behind certain networks, so a configurable backend is welcome. Mojeek staying as default means zero risk for the existing path, which is the right shape.
Before this can land, a few things to clean up:
Blockers
- `bun.lock` shouldn't be in the diff; the repo uses npm (`package-lock.json`). Looks like you ran `bun install` locally. Please `git rm bun.lock` and add it to `.gitignore` so it doesn't sneak back.
- CodeQL: 1 high + 1 medium, both on the new `searchSearxng`:
  - high (`js/polynomial-redos`, `src/tools/web.ts:89`): `replace(/\/+$/, "")` on a user-controlled endpoint. Practical risk is low (the input is short and end-anchored), but the rule fires.
  - medium (`js/file-access-to-http`, `src/tools/web.ts:95`): `fetch(url)` where `url` comes from `~/.reasonix/config.json`. The threat model isn't really an external attacker (being able to write the config already means local takeover), but invalid or non-http URLs in the config could behave unpredictably.
Both are fixed by one change: swap the manual normalization for the WHATWG `URL` parser plus a protocol allowlist:
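```typescript
// Parse the configured endpoint with the WHATWG URL parser instead of a regex,
// and reject anything that isn't http(s).
function normalizeSearxngEndpoint(raw: string): string {
  let url: URL;
  try {
    // Bare "host:port" values (e.g. "localhost:8075") get an http:// prefix.
    url = new URL(raw.includes("://") ? raw : `http://${raw}`);
  } catch {
    throw new Error(`web_search: invalid SearXNG endpoint "${raw}"`);
  }
  // Protocol allowlist: this check acts as the sanitizer CodeQL looks for.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`web_search: SearXNG endpoint must be http(s), got ${url.protocol}`);
  }
  // origin = scheme + host + port; any path or trailing slash is dropped.
  return url.origin;
}
```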
Use it at the top of `searchSearxng` and reuse `baseUrl` everywhere. The polynomial regex is gone, and the protocol check acts as a sanitizer that CodeQL recognizes, so both alerts should clear.
- Tests for `parseSearxngHtmlResults`, please. HTML scraping breaks silently when upstream tweaks its markup, so this is exactly the kind of code where tests pay for themselves. Use the existing `parseMojeekResults` tests as the template: fixture HTML in, expected `SearchResult[]` out, plus an empty-results case.
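A minimal sketch of what "reuse `baseUrl` everywhere" could look like when `searchSearxng` builds its request. It assumes the instance serves results at `/search` with the query in a `q` parameter (SearXNG's usual search endpoint); `buildSearxngSearchUrl` is a hypothetical helper, not code from this PR.

```typescript
// Hypothetical helper: builds the SearXNG search URL from an origin that has
// already been validated by normalizeSearxngEndpoint (http/https only).
function buildSearxngSearchUrl(baseUrl: string, query: string): string {
  // Resolve "/search" against the origin instead of concatenating strings.
  const url = new URL("/search", baseUrl);
  // URLSearchParams handles encoding of spaces and reserved characters.
  url.searchParams.set("q", query);
  return url.toString();
}

console.log(buildSearxngSearchUrl("http://localhost:8075", "rust async"));
// prints: http://localhost:8075/search?q=rust+async
```

Resolving `/search` against the origin means a trailing slash can never produce `//search`. Because `url.origin` drops any path, this also assumes the instance is served from the root of its host rather than from a subpath such as `/searx`.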
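The requested tests could take roughly this shape. The parser below is only a stand-in so the sketch runs on its own: it assumes SearXNG's default result markup (one `<article class="result ...">` per hit, the title link inside `<h3>`, the snippet in `<p class="content">`) and an assumed `SearchResult` shape of `{ title, url, snippet }`. Both may differ from the real code in `src/tools/web.ts`.

```typescript
// Assumed result shape; the real SearchResult type may differ.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Collapse inner markup (e.g. <span class="highlight">) and whitespace to plain text.
function stripTags(s: string): string {
  return s.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Stand-in for the PR's parseSearxngHtmlResults, assuming the markup above.
function parseSearxngHtmlResults(html: string): SearchResult[] {
  const results: SearchResult[] = [];
  for (const m of html.matchAll(/<article class="result[^"]*"[^>]*>([\s\S]*?)<\/article>/g)) {
    const body = m[1] ?? "";
    const link = /<h3>\s*<a href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/.exec(body);
    if (!link) continue; // skip blocks without a title link
    const content = /<p class="content">([\s\S]*?)<\/p>/.exec(body);
    results.push({
      title: stripTags(link[2] ?? ""),
      url: link[1] ?? "",
      snippet: content ? stripTags(content[1] ?? "") : "",
    });
  }
  return results;
}

// Fixture HTML in, expected SearchResult[] out.
const fixtureHtml = `
<div id="urls">
  <article class="result result-default category-general">
    <h3><a href="https://example.com/a" rel="noreferrer">First <span class="highlight">result</span></a></h3>
    <p class="content">Snippet for the first result.</p>
  </article>
  <article class="result result-default category-general">
    <h3><a href="https://example.com/b">Second result</a></h3>
  </article>
</div>`;

const expected: SearchResult[] = [
  { title: "First result", url: "https://example.com/a", snippet: "Snippet for the first result." },
  { title: "Second result", url: "https://example.com/b", snippet: "" },
];

function assertResults(actual: SearchResult[], want: SearchResult[], label: string): void {
  if (JSON.stringify(actual) !== JSON.stringify(want)) {
    throw new Error(`${label}: got ${JSON.stringify(actual)}`);
  }
}

assertResults(parseSearxngHtmlResults(fixtureHtml), expected, "two results");
assertResults(parseSearxngHtmlResults('<div id="urls"></div>'), [], "empty page");
```

In the real suite, the stand-in would be replaced by an import of the PR's `parseSearxngHtmlResults`, keeping both fixtures; if upstream changes its markup, the first case fails loudly instead of `web_search` quietly returning nothing.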
Cleanups
- The dynamic `await import("../config.js")` inside `web_search`'s `fn` (around `web.ts:410`) is unnecessary. The file already has the static import at the top, and `readConfig()` re-reads from disk on every call, so the static getters give you the same "picks up changes immediately" behavior with no overhead. Drop the dynamic import.
- `/web-search-engine` is verbose for a slash command. The alias system landed in 0.30.0 yesterday; I'd suggest renaming it to `/search-engine` and declaring a short alias on the spec, something like `aliases: ["wse"]` or `aliases: ["engine"]`. See `src/cli/ui/slash/commands.ts` for examples (`/exit` has `aliases: ["quit", "q"]`).
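For illustration, alias resolution along these lines would let `/engine` reach the renamed command. This is a sketch under assumptions: `SlashCommandSpec`, `buildCommandIndex`, and `resolveSlashCommand` are hypothetical names, and the real spec type in `src/cli/ui/slash/commands.ts` may look different.

```typescript
// Hypothetical spec shape; the real one in src/cli/ui/slash/commands.ts may differ.
interface SlashCommandSpec {
  name: string;
  aliases?: string[];
  description: string;
}

const specs: SlashCommandSpec[] = [
  { name: "exit", aliases: ["quit", "q"], description: "Exit the session" },
  { name: "search-engine", aliases: ["engine"], description: "Switch the web_search backend" },
];

// Register every name and alias once; a clash is reported immediately.
function buildCommandIndex(list: SlashCommandSpec[]): Map<string, SlashCommandSpec> {
  const index = new Map<string, SlashCommandSpec>();
  for (const spec of list) {
    for (const key of [spec.name, ...(spec.aliases ?? [])]) {
      if (index.has(key)) {
        throw new Error(`slash command clash: /${key} is already registered`);
      }
      index.set(key, spec);
    }
  }
  return index;
}

// Resolve input such as "/engine searxng" to its spec; undefined if unknown.
function resolveSlashCommand(
  input: string,
  index: Map<string, SlashCommandSpec>,
): SlashCommandSpec | undefined {
  const head = input.trim().split(/\s+/)[0] ?? "";
  return head.startsWith("/") ? index.get(head.slice(1)) : undefined;
}

const index = buildCommandIndex(specs);
console.log(resolveSlashCommand("/engine searxng", index)?.name); // prints: search-engine
```

Indexing names and aliases in a single map also means an alias that collides with another command's name is caught when the commands are registered, not at lookup time.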
Optional
- The README pitch "93+ engines including Google, Bing, DuckDuckGo, Brave" advertises SearXNG's upstream coverage rather than what we provide. Consider rewording to something like "SearXNG (self-hosted; aggregates whatever upstream engines your instance is configured for)" — more honest about who's doing the work.
Happy to merge once the CodeQL alerts, bun.lock, and tests are addressed. Thanks again for the contribution.
…command
- Add `webSearchEngine`/`webSearchEndpoint` to `ReasonixConfig` (persists to `~/.reasonix/config.json`)
- Refactor `webSearch()` to dispatch between `searchMojeek` (default) and `searchSearxng`
- `searchSearxng` uses the HTML format (the JSON API is often blocked by the SearXNG limiter)
- Auto-normalize the endpoint protocol (`localhost:8075` → `http://localhost:8075`)
- SearXNG unreachable → the model tells the user to install SearXNG
- Add `/web-search-engine` slash command to switch engines at runtime
- Tool re-reads config on each call, so changes take effect immediately
- Add Web search section to README with engine switching and SearXNG setup
- Update `ARCHITECTURE.md` `web.ts` entry to note multi-engine support
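The dispatch described in the commit message might be sketched as follows. `selectSearchBackend` and the `Backends` shape are hypothetical, the backends are injected so the sketch runs without network access, and the error wording is illustrative rather than the PR's actual message.

```typescript
type WebSearchEngine = "mojeek" | "searxng";

// Assumed result shape; the real SearchResult in src/tools/web.ts may differ.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

type SearchFn = (query: string) => Promise<SearchResult[]>;

// Hypothetical signatures standing in for searchMojeek / searchSearxng.
interface Backends {
  mojeek: SearchFn;
  searxng: (query: string, endpoint: string) => Promise<SearchResult[]>;
}

// Choose a backend per call. Mojeek needs no settings; SearXNG needs an endpoint.
function selectSearchBackend(
  engine: WebSearchEngine,
  endpoint: string | undefined,
  backends: Backends,
): { engine: WebSearchEngine; run: SearchFn } {
  if (engine === "mojeek") {
    // Default path: hand back the Mojeek backend untouched.
    return { engine, run: backends.mojeek };
  }
  if (!endpoint) {
    throw new Error("web_search: engine is searxng but webSearchEndpoint is not set");
  }
  const searxEndpoint: string = endpoint;
  return {
    engine,
    run: async (query) => {
      try {
        return await backends.searxng(query, searxEndpoint);
      } catch (err) {
        // Re-throw with a hint the model can pass on to the user.
        const reason = err instanceof Error ? err.message : String(err);
        throw new Error(
          `web_search: SearXNG at ${searxEndpoint} is unreachable (${reason}); install or start SearXNG, or switch back to mojeek`,
        );
      }
    },
  };
}
```

Passing `backends.mojeek` through without any wrapping keeps the default path unchanged, in the spirit of the review's point that keeping Mojeek as the default means zero risk for existing users.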
582056f to adaf412
…t, /search-engine alias, tests, docs
adaf412 to b78a586
Thanks for your review. I've updated the branch; please take another look when you have a chance.
esengine left a comment
Every blocker addressed cleanly — the normalizeSearxngEndpoint shape, the parseSearxngHtmlResults tests, the dynamic-import removal, the slash rename. CodeQL is green. Merging now. Thanks for sticking with the review and for a well-scoped first contribution.
Summary
Adds SearXNG as an alternative `web_search` backend, switchable via the `/web-search-engine` slash command or `~/.reasonix/config.json`. Mojeek remains the default.
Background
web_search was hardcoded to scrape Mojeek's HTML. Mojeek works for most setups, but some environments get persistent 403s — including mine.
SearXNG is a self-hosted metasearch engine. Running it locally works for my use case. It doesn't magically avoid blocks (SearXNG can also hit 403s from upstream engines), but having a configurable backend means users can pick what works in their environment. Mojeek stays the default.
Changes by layer
Config (src/config.ts): Two new fields on ReasonixConfig (webSearchEngine and webSearchEndpoint) plus getters.
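A sketch of the two config fields with defensive getters, assuming `~/.reasonix/config.json` stores them as plain top-level keys. The real getters presumably call `readConfig()` themselves (the review notes it re-reads from disk on every call); here the parsed config is passed in so the sketch needs no file I/O. The fallback and trimming rules are assumptions, not documented behavior of this PR.

```typescript
type WebSearchEngine = "mojeek" | "searxng";

// Only the fields this PR adds; the real ReasonixConfig has many more.
interface ReasonixConfig {
  webSearchEngine?: WebSearchEngine;
  webSearchEndpoint?: string;
}

// Hand-edited JSON can hold anything, so treat unknown values as the default.
function getWebSearchEngine(cfg: ReasonixConfig): WebSearchEngine {
  return cfg.webSearchEngine === "searxng" ? "searxng" : "mojeek";
}

// Treat a missing or blank endpoint as "not configured".
function getWebSearchEndpoint(cfg: ReasonixConfig): string | undefined {
  const raw = cfg.webSearchEndpoint?.trim();
  return raw ? raw : undefined;
}

// Example config.json contents for a local SearXNG instance.
const cfg = JSON.parse(
  '{ "webSearchEngine": "searxng", "webSearchEndpoint": "localhost:8075" }',
) as ReasonixConfig;

console.log(getWebSearchEngine(cfg), getWebSearchEndpoint(cfg));
// prints: searxng localhost:8075
```

The raw endpoint would then go through `normalizeSearxngEndpoint`, which adds the missing `http://` prefix and rejects non-http(s) schemes before any `fetch`.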
Search dispatch (src/tools/web.ts):
Slash command (src/cli/ui/slash/handlers/web-search-engine.ts — new):
Wiring:
Docs:
Not changed