
laetitia-wilhelm (Contributor):

Websearch on RAG output

Input:

  • The output file produced by the RAG model

Output (see the sketch below):

  • Original query
  • Summary of the RAG output
  • Brief answer to the query
  • Detailed answer
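
For illustration only, the saved output record might look roughly like the following sketch; the key names are assumptions, not the module's actual schema.

# Illustrative shape of the websearch output; all key names are assumed,
# not mmore's actual schema.
result = {
    "query": "original user query",
    "rag_summary": "summary of the RAG output",
    "brief_answer": "short answer to the query",
    "detailed_answer": "longer answer combining the RAG output with web findings",
}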

How to Run

python -m mmore websearch --config-file examples/websearch/config.yaml

Key Parameters

  • n_loops: Number of search loops
  • max_searches: Maximum number of sources retrieved per web search

Pipeline Overview

  1. Load Input Data

    • Extract the original query and the initial answer generated by the RAG model.
  2. Generate Initial Summary

    • Summarize the RAG answer with respect to the original query.
  3. Iterative Search and Analysis (Repeated n_loops times; see the sketch after this list)

    • Generate Search Query:
      Formulate a refined search query by combining the original query, current knowledge, and previous findings (if any).
    • Perform Web Search:
      Retrieve relevant web results using DuckDuckGo.
    • Analyze Search Results:
      Use the large language model (LLM) to integrate the new web information with existing knowledge, updating the summary accordingly.
    • Update the current knowledge and previous analysis for the next iteration.
  4. Save Final Results

    • Store the final combined summary derived from both web search and RAG output.
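
The loop in step 3 can be pictured as the minimal sketch below. It is written under assumptions: the function and parameter names are illustrative, and the query-generation, web-search, and analysis steps are passed in as callables rather than naming mmore's actual helpers.

from typing import Callable, List, Optional

def iterative_websearch(
    query: str,
    rag_summary: str,
    generate_query: Callable[[str, str, Optional[str]], str],
    web_search: Callable[[str, int], List[str]],
    analyze: Callable[[str, str, List[str]], str],
    n_loops: int = 2,
    max_searches: int = 10,
) -> str:
    """Refine a RAG summary with n_loops rounds of web search and LLM analysis."""
    knowledge = rag_summary
    previous: Optional[str] = None
    for _ in range(n_loops):
        # Step 3a: formulate a refined search query from the original query,
        # the current knowledge, and the previous round's findings (if any).
        search_query = generate_query(query, knowledge, previous)
        # Step 3b: retrieve up to max_searches relevant web results
        # (DuckDuckGo in this PR).
        results = web_search(search_query, max_searches)
        # Step 3c: let the LLM integrate the new web information and
        # update the summary.
        knowledge = analyze(query, knowledge, results)
        previous = knowledge
    # Step 4: the caller stores this final combined summary.
    return knowledge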

@fabnemEPFL (Collaborator) left a comment:


Great work so far, sounds promising. Some changes are needed.
I have to go, so there will be a follow-up review later today.

@fabnemEPFL (Collaborator) left a comment:


Additional comments

@fabnemEPFL (Collaborator) left a comment:


I will soon make the changes related to the few additional comments I added.

n_loops: int = 2
max_searches: int = 10
llm_config: Dict[str, Any] = field(
    default_factory=lambda: {"llm_name": "gpt-4", "max_new_tokens": 1200}
)
A collaborator commented on this snippet:


Make it a field of type LLMConfig
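
A sketch of what that change might look like; the class name WebsearchConfig, the import path, and the LLMConfig constructor arguments are assumptions, not mmore's actual definitions.

from dataclasses import dataclass, field

from mmore.rag.llm import LLMConfig  # assumed import path for mmore's LLMConfig

@dataclass
class WebsearchConfig:  # hypothetical name for the config class under review
    n_loops: int = 2
    max_searches: int = 10
    # Typed LLMConfig field instead of a raw dict, per the review comment.
    # The constructor arguments mirror the original dict and are assumed to exist.
    llm_config: LLMConfig = field(
        default_factory=lambda: LLMConfig(llm_name="gpt-4", max_new_tokens=1200)
    )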

@fabnemEPFL merged commit 9d0af5c into swiss-ai:master on Sep 4, 2025.
4 checks passed