Support locally hosted website crawler/scraper during search. #10474
lukolszewski started this conversation in Feature Requests & Suggestions
Hi! While a simple web retrieval mechanism is already supported via MCP, it would be nice if users also had the option of using a locally running scraper during search, instead of a paid service like Firecrawl or Serper.
I've implemented such a feature locally using the /md endpoint of Crawl4AI and was going to send a PR, but I saw the contribution guidelines about creating a discussion first. So here I'm creating a discussion about it.
The intention is for the functionality to be exactly the same for the user as serper/firecrawl; the only differences are setting the scraper type to crawl4ai, setting the URL, and optionally a key (Crawl4AI can be used without auth locally).
A potential enhancement could be to use the /llm endpoint of Crawl4AI which can receive a query that gets processed by an LLM before returning.
The feature requires modification of both agents and LibreChat. I've implemented it in my fork here: https://github.com/lukolszewski/LibreChat/tree/feature/add-crawl4ai-local-scraping and here: https://github.com/lukolszewski/agents/tree/feature/add-crawl4ai-local-scraping
Suggestions on how to test:
Here is how I suggest one could test it. The feature works, but the specific steps will depend on your environment. Please note you have to build agents locally; I pack it into a tarball which you then copy into LibreChat (this assumes both agents and LibreChat sit in sibling subfolders).
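The packing step might look something like this; the script names are assumptions based on a standard npm setup, not taken from the fork:

```shell
# Run from the folder containing both checkouts (sibling layout assumed).
cd agents
npm install            # install dependencies
npm run build          # compile the package (assumes a standard "build" script)
npm pack               # emits librechat-agents-<version>.tgz in the current dir
cp librechat-agents-*.tgz ../LibreChat/
```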
Then you update the reference in package.json in LibreChat
Change: "@librechat/agents": "^3.0.5"
To: "@librechat/agents": "file:librechat-agents-3.0.13.tgz"
and install from the tarball.
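With package.json pointing at the file, the install is just a regular npm install from the LibreChat folder (a sketch, assuming the tarball was copied there as above):

```shell
cd LibreChat
# package.json now references "file:librechat-agents-3.0.13.tgz"
npm install
```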
Then you build normally. (If you're rebuilding you may need to do more: since the version number stays the same and only the file is replaced, cached integrity hashes may need to be cleared.)
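One blunt way to force npm to pick up a replaced tarball under the same version (my own workaround, not something from the fork) is to drop the installed copy and the cache before reinstalling:

```shell
cd LibreChat
rm -rf node_modules/@librechat/agents   # remove the previously installed copy
npm cache clean --force                 # drop cached tarball + integrity hashes
npm install                             # reinstall from the file: reference
```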
To configure it, add the relevant settings to your .env file (assuming Crawl4AI runs on the same Docker host, with host networking).
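For example (the variable names below are my guess at what the fork uses; check the branch for the exact keys):

```
# hypothetical keys; Crawl4AI's default Docker port is 11235
CRAWL4AI_API_URL=http://localhost:11235
# optional; a local instance can run without auth
CRAWL4AI_API_KEY=
```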
You also add a section to librechat.yaml; in my setup SearXNG provides the search results and Crawl4AI handles the scraping.
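A sketch of that section (the key names follow LibreChat's existing webSearch config; the crawl4ai value is what the branch adds, so treat the exact spelling as an assumption):

```yaml
webSearch:
  searchProvider: searxng           # search via a local SearXNG instance
  searxngInstanceUrl: "${SEARXNG_INSTANCE_URL}"
  scraperType: crawl4ai             # the new option alongside firecrawl/serper
```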
Then use LibreChat normally
This assumes your Crawl4AI instance is running locally.
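If you don't have one running yet, the upstream Docker image is the quickest way to get an instance (image name and port taken from the Crawl4AI docs; verify against their README):

```shell
# Crawl4AI's official image listens on 11235 by default
docker run -d -p 11235:11235 --shm-size=1g unclecode/crawl4ai:latest
```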