Feat/web search engine #338

Merged
esengine merged 3 commits into esengine:main from dacec354:feat/web-search-engine
May 7, 2026

Contributor

@dacec354 dacec354 commented May 6, 2026

Summary

Adds SearXNG as an alternative web_search backend, switchable via /web-search-engine slash command or ~/.reasonix/config.json. Mojeek remains the default.

Background

web_search was hardcoded to scrape Mojeek's HTML. Mojeek works for most setups, but some environments get persistent 403s — including mine.

SearXNG is a self-hosted metasearch engine. Running it locally works for my use case. It doesn't magically avoid blocks (SearXNG can also hit 403s from upstream engines), but having a configurable backend means users can pick what works in their environment. Mojeek stays the default.

Changes by layer

Config (src/config.ts): Two new fields on ReasonixConfig (webSearchEngine and webSearchEndpoint) plus getters.
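
As a rough illustration of the config layer, the sketch below shows one way the two fields and their getters could look. Only the field names webSearchEngine and webSearchEndpoint come from the PR description; the types, the getter names, and the example values are assumptions.

```typescript
// Hypothetical shape: only the two field names are taken from the PR description.
type WebSearchEngine = "mojeek" | "searxng";

interface ReasonixConfig {
  webSearchEngine?: WebSearchEngine;
  webSearchEndpoint?: string;
}

// Assumed getter name; the PR only says "plus getters".
function getWebSearchEngine(cfg: ReasonixConfig): WebSearchEngine {
  // Mojeek stays the default when the field is absent or unrecognized.
  return cfg.webSearchEngine === "searxng" ? "searxng" : "mojeek";
}

// Assumed getter name; returns undefined for a missing or blank endpoint.
function getWebSearchEndpoint(cfg: ReasonixConfig): string | undefined {
  const raw = (cfg.webSearchEndpoint ?? "").trim();
  return raw.length > 0 ? raw : undefined;
}

// Illustrative contents of ~/.reasonix/config.json.
const exampleConfig: ReasonixConfig = JSON.parse(
  '{"webSearchEngine":"searxng","webSearchEndpoint":"localhost:8080"}',
);

console.log(getWebSearchEngine(exampleConfig), getWebSearchEndpoint(exampleConfig));
console.log(getWebSearchEngine({}));
```

Falling back to "mojeek" for anything other than "searxng" keeps an old or hand-edited config file on the existing code path.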

Search dispatch (src/tools/web.ts):

  • Original webSearch body moved to searchMojeek().
  • New searchSearxng() fetches SearXNG HTML and parses with existing node-html-parser.
  • webSearch() dispatches by opts.engine.
  • Protocol auto-normalization (localhost:8080 → http://...).
  • Unreachable SearXNG → clear install message. Dynamic import re-reads config each call so /web-search-engine takes effect immediately.
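
The dispatch described in the list above could be sketched roughly as follows. The function names webSearch, searchMojeek, and searchSearxng come from the PR; the option and result shapes, the pickBackend helper, and the stub bodies are assumptions made so the example runs without network access.

```typescript
// Sketch only: backend bodies are stubs; the real ones fetch and parse HTML.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

interface WebSearchOptions {
  engine?: "mojeek" | "searxng";
  endpoint?: string;
}

type Backend = (query: string, opts: WebSearchOptions) => Promise<SearchResult[]>;

const searchMojeek: Backend = async (query) => [
  { title: `mojeek: ${query}`, url: "https://example.com/mojeek", snippet: "" },
];

const searchSearxng: Backend = async (query, opts) => {
  if (!opts.endpoint) {
    throw new Error("web_search: no SearXNG endpoint configured");
  }
  return [{ title: `searxng: ${query}`, url: `${opts.endpoint}/search`, snippet: "" }];
};

// Hypothetical helper: choose the backend; anything but "searxng" means Mojeek.
function pickBackend(engine: WebSearchOptions["engine"]): Backend {
  return engine === "searxng" ? searchSearxng : searchMojeek;
}

async function webSearch(
  query: string,
  opts: WebSearchOptions = {},
): Promise<SearchResult[]> {
  return pickBackend(opts.engine)(query, opts);
}

webSearch("reasonix", { engine: "searxng", endpoint: "http://localhost:8080" }).then(
  (results) => console.log(results),
);
```

Keeping the selection in one small function means a third backend would only touch pickBackend, not the callers of webSearch.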

Slash command (src/cli/ui/slash/handlers/web-search-engine.ts — new):

  • /web-search-engine shows current engine.
  • Subcommands to switch to mojeek or searxng with optional URL. Persists to ~/.reasonix/config.json.
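
A minimal sketch of how such a handler might interpret its arguments is given below. The handler name, its string return value, the usage message, and the in-memory settings object (standing in for ~/.reasonix/config.json) are all assumptions; the PR's real handler lives in src/cli/ui/slash/handlers/web-search-engine.ts and may differ.

```typescript
type Engine = "mojeek" | "searxng";

interface EngineSettings {
  webSearchEngine: Engine;
  webSearchEndpoint?: string;
}

// In-memory stand-in for ~/.reasonix/config.json so the sketch runs anywhere.
let settings: EngineSettings = { webSearchEngine: "mojeek" };

// Hypothetical handler: args are the words typed after the command name.
function handleWebSearchEngine(args: string[]): string {
  const [sub, url] = args;
  if (sub === undefined) {
    // No subcommand: report the current engine (and endpoint for SearXNG).
    const endpoint =
      settings.webSearchEngine === "searxng" && settings.webSearchEndpoint
        ? ` (${settings.webSearchEndpoint})`
        : "";
    return `web search engine: ${settings.webSearchEngine}${endpoint}`;
  }
  if (sub === "mojeek") {
    settings = { ...settings, webSearchEngine: "mojeek" };
    return "web search engine set to mojeek";
  }
  if (sub === "searxng") {
    // The URL is optional when an endpoint was stored earlier.
    const endpoint = url ?? settings.webSearchEndpoint;
    if (!endpoint) {
      return "usage: /web-search-engine searxng <url>";
    }
    settings = { webSearchEngine: "searxng", webSearchEndpoint: endpoint };
    return `web search engine set to searxng (${endpoint})`;
  }
  return `unknown engine "${sub}" (expected mojeek or searxng)`;
}

console.log(handleWebSearchEngine([]));
console.log(handleWebSearchEngine(["searxng", "localhost:8080"]));
console.log(handleWebSearchEngine([]));
```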

Wiring:

  • chat.tsx passes config to registerWebTools().
  • dispatch.ts and commands.ts register the handler.

Docs:

  • README has a new Web search section.
  • ARCHITECTURE.md updated.

Not changed

  • web_fetch untouched.
  • Default engine still Mojeek.
  • No new npm dependencies.
  • No MCP, no API keys.

Comment thread src/tools/web.ts Fixed
Comment thread src/tools/web.ts Fixed
Owner

@esengine esengine left a comment

Thanks for tackling this — Mojeek 403s are a real problem for users behind certain networks, so a configurable backend is welcome. Mojeek staying as default means zero risk for the existing path, which is the right shape.

Before this can land, a few things to clean up:

Blockers

  • bun.lock shouldn't be in the diff — repo uses npm (package-lock.json). Looks like you ran bun install locally. Please git rm bun.lock and add it to .gitignore so it doesn't sneak back.

  • CodeQL: 1 high + 1 medium, both on the new searchSearxng:

    • high (js/polynomial-redos, src/tools/web.ts:89) — replace(/\/+$/, "") on a user-controlled endpoint. Practical risk is low (input is short and end-anchored), but the rule fires.
    • medium (js/file-access-to-http, src/tools/web.ts:95) — fetch(url) where url comes from ~/.reasonix/config.json. The threat model isn't really an external attacker (writing the config = local takeover already), but invalid / non-http URLs in the config could behave unpredictably.

    Both fix with one change — swap the manual normalization for the WHATWG URL parser + a protocol allowlist:

    function normalizeSearxngEndpoint(raw: string): string {
      let url: URL;
      try {
        url = new URL(raw.includes("://") ? raw : `http://${raw}`);
      } catch {
        throw new Error(`web_search: invalid SearXNG endpoint "${raw}"`);
      }
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        throw new Error(`web_search: SearXNG endpoint must be http(s), got ${url.protocol}`);
      }
      return url.origin;
    }

    Use it at the top of searchSearxng and reuse baseUrl everywhere. The polynomial regex is gone, and the protocol check acts as a sanitizer that CodeQL recognizes — both alerts should clear.

  • Tests for parseSearxngHtmlResults please. HTML scraping breaks silently when upstream tweaks markup, so this is exactly the shape that pays back tests. Use the existing parseMojeekResults tests as the template — fixture HTML in, expected SearchResult[] out, plus an empty-results case.
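
The "fixture HTML in, expected SearchResult[] out" shape requested in the last blocker might look like the sketch below. The parser here is a regex-based stand-in written only so the example is self-contained; the PR's parseSearxngHtmlResults uses node-html-parser, and the fixture markup is a simplified guess at SearXNG's result HTML, not a captured page.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Stand-in parser (assumption): NOT the PR's implementation.
function parseResultsStandIn(html: string): SearchResult[] {
  const results: SearchResult[] = [];
  const articleRe = /<article class="result[^"]*">([\s\S]*?)<\/article>/g;
  let match: RegExpExecArray | null;
  while ((match = articleRe.exec(html)) !== null) {
    const body = match[1] ?? "";
    const link = /<h3><a href="([^"]+)">([^<]*)<\/a><\/h3>/.exec(body);
    if (!link) continue; // skip articles without a title link
    const snippet = /<p class="content">([^<]*)<\/p>/.exec(body);
    results.push({
      title: (link[2] ?? "").trim(),
      url: link[1] ?? "",
      snippet: snippet ? (snippet[1] ?? "").trim() : "",
    });
  }
  return results;
}

// Simplified, hypothetical fixture: one full result, one without a snippet.
const fixture = `
<div id="urls">
  <article class="result result-default">
    <h3><a href="https://example.com/a">First result</a></h3>
    <p class="content">Snippet for the first result.</p>
  </article>
  <article class="result result-default">
    <h3><a href="https://example.com/b">Second result</a></h3>
  </article>
</div>`;

const expected: SearchResult[] = [
  { title: "First result", url: "https://example.com/a", snippet: "Snippet for the first result." },
  { title: "Second result", url: "https://example.com/b", snippet: "" },
];

const parsed = parseResultsStandIn(fixture);
console.log(JSON.stringify(parsed) === JSON.stringify(expected) ? "fixture ok" : "fixture mismatch");

// Empty-results case: a page with no result articles yields an empty array.
console.log(parseResultsStandIn('<div id="urls"></div>').length === 0 ? "empty ok" : "empty mismatch");
```

In the real test file the stand-in would be replaced by the exported parser, with the fixture ideally saved from an actual SearXNG response so markup drift shows up as a failing test.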

Cleanups

  • The dynamic await import("../config.js") inside web_search's fn (around web.ts:410) is unnecessary — the file already has the static import at the top, and readConfig() re-reads from disk on every call, so the static getters give you the same "picks up changes immediately" behavior with no overhead. Drop the dynamic import.

  • /web-search-engine is verbose for a slash. The alias system landed in 0.30.0 yesterday — suggest renaming to /search-engine and declaring a short alias on the spec, something like aliases: ["wse"] or aliases: ["engine"]. See src/cli/ui/slash/commands.ts for examples (/exit has aliases: ["quit", "q"]).
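
To make the alias suggestion concrete, here is a small sketch of alias resolution. The /exit aliases and the proposed "wse" alias are quoted from the review; the SlashCommandSpec shape, the summary strings, and the resolveCommand helper are assumptions and may not match src/cli/ui/slash/commands.ts.

```typescript
// Hypothetical spec shape; the real one in commands.ts may carry more fields.
interface SlashCommandSpec {
  name: string;
  aliases?: string[];
  summary: string;
}

const commands: SlashCommandSpec[] = [
  { name: "exit", aliases: ["quit", "q"], summary: "Leave the session" },
  { name: "search-engine", aliases: ["wse"], summary: "Show or switch the web_search backend" },
];

// Resolve the first word of the input against command names and aliases.
function resolveCommand(input: string): SlashCommandSpec | undefined {
  const word = input.trim().replace(/^\//, "").split(/\s+/)[0] ?? "";
  for (const spec of commands) {
    if (spec.name === word) return spec;
    if ((spec.aliases ?? []).indexOf(word) !== -1) return spec;
  }
  return undefined;
}

console.log(resolveCommand("/wse searxng localhost:8080")?.name);
console.log(resolveCommand("/q")?.name);
console.log(resolveCommand("/nope"));
```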

Optional

  • The README pitch "93+ engines including Google, Bing, DuckDuckGo, Brave" advertises SearXNG's upstream coverage rather than what we provide. Consider rewording to something like "SearXNG (self-hosted; aggregates whatever upstream engines your instance is configured for)" — more honest about who's doing the work.

Happy to merge once the CodeQL alerts, bun.lock, and tests are addressed. Thanks again for the contribution.
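
For reference, the normalizeSearxngEndpoint helper proposed in the review above can be exercised against a few inputs as in this sketch. The function body is copied from the review; the sample endpoints (including searx.example.org) are made up for illustration.

```typescript
// Copied from the review comment above.
function normalizeSearxngEndpoint(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw.includes("://") ? raw : `http://${raw}`);
  } catch {
    throw new Error(`web_search: invalid SearXNG endpoint "${raw}"`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`web_search: SearXNG endpoint must be http(s), got ${url.protocol}`);
  }
  return url.origin;
}

// Illustrative inputs; none of these are taken from the PR.
const samples = [
  "localhost:8080", // -> http://localhost:8080
  "http://localhost:8080/", // -> http://localhost:8080
  "https://searx.example.org/searxng/", // -> https://searx.example.org
  "ftp://example.com", // rejected: protocol is ftp:
  "", // rejected: not a valid URL
];

for (const sample of samples) {
  try {
    console.log(JSON.stringify(sample), "->", normalizeSearxngEndpoint(sample));
  } catch (err) {
    console.log(JSON.stringify(sample), "->", (err as Error).message);
  }
}
```

Note that url.origin keeps only scheme, host, and port, so an instance served under a sub-path (the third sample) would lose that path; whether that matters depends on how the instance is deployed.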

dacec354 added 2 commits May 7, 2026 16:07
…command

- Add webSearchEngine/webSearchEndpoint to ReasonixConfig (persists to ~/.reasonix/config.json)
- Refactor webSearch() to dispatch between searchMojeek (default) and searchSearxng
- searchSearxng uses HTML format (JSON API often blocked by SearXNG limiter)
- Auto-normalizes endpoint protocol (localhost:8075 → http://localhost:8075)
- SearXNG unreachable → model informs user to install SearXNG
- Add /web-search-engine slash command to switch engines at runtime
- Tool re-reads config on each call so changes take effect immediately
- Add Web search section to README with engine switching and SearXNG setup
- Update ARCHITECTURE.md web.ts entry to note multi-engine support
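
The "re-reads config on each call" behaviour noted in the commit message above does not need a dynamic import; a getter that reads fresh state on every call is enough. The sketch below illustrates that idea with an in-memory string standing in for the config file. The names fakeDisk, currentEngine, and webSearchTool are hypothetical; only readConfig and the webSearchEngine field appear in the PR discussion.

```typescript
interface ReasonixConfig {
  webSearchEngine?: "mojeek" | "searxng";
}

// Stand-in for ~/.reasonix/config.json; the real readConfig() reads from disk.
let fakeDisk = '{"webSearchEngine":"mojeek"}';

function readConfig(): ReasonixConfig {
  return JSON.parse(fakeDisk) as ReasonixConfig;
}

// The getter does not cache: every call goes back to the "disk".
function currentEngine(): "mojeek" | "searxng" {
  return readConfig().webSearchEngine ?? "mojeek";
}

// Hypothetical tool body: it asks for the engine at call time, not at registration.
function webSearchTool(query: string): string {
  return `[${currentEngine()}] ${query}`;
}

const before = webSearchTool("reasonix");
fakeDisk = '{"webSearchEngine":"searxng"}'; // what the slash command would persist
const after = webSearchTool("reasonix");

console.log(before); // [mojeek] reasonix
console.log(after); // [searxng] reasonix
```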
@dacec354 dacec354 force-pushed the feat/web-search-engine branch 2 times, most recently from 582056f to adaf412 May 7, 2026 08:33
@dacec354 dacec354 force-pushed the feat/web-search-engine branch from adaf412 to b78a586 May 7, 2026 08:33
Contributor Author

dacec354 commented May 7, 2026

Thanks for your review. I have updated the branch; please take another look when you have time.

Owner

@esengine esengine left a comment

Every blocker addressed cleanly — the normalizeSearxngEndpoint shape, the parseSearxngHtmlResults tests, the dynamic-import removal, the slash rename. CodeQL is green. Merging now. Thanks for sticking with the review and for a well-scoped first contribution.

@esengine esengine merged commit 12998d2 into esengine:main May 7, 2026
3 checks passed
@dacec354 dacec354 deleted the feat/web-search-engine branch May 7, 2026 09:14

3 participants