🔥 AI-powered deep research tool that breaks down complex queries, validates answers, and provides cited comprehensive results using Firecrawl and LangGraph

mendableai/firesearch

Firesearch - AI-Powered Deep Research Tool

Comprehensive web research powered by Firecrawl and LangGraph

Technologies

  • Firecrawl: Multi-source web content extraction
  • OpenAI GPT-4o: Search planning and follow-up generation
  • Next.js 15: Modern React framework with App Router

Deploy with Vercel

Setup

Required API Keys

| Service | Purpose | Get Key |
|---|---|---|
| Firecrawl | Web scraping and content extraction | firecrawl.dev/app/api-keys |
| OpenAI | Search planning and summarization | platform.openai.com/api-keys |

Quick Start

  1. Clone this repository
  2. Create a .env.local file with your API keys:
    FIRECRAWL_API_KEY=your_firecrawl_key
    OPENAI_API_KEY=your_openai_key
    
  3. Install dependencies: npm install or yarn install
  4. Run the development server: npm run dev or yarn dev
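A missing or blank key in .env.local is an easy mistake to make at step 2. The following is a minimal sketch of a startup check, not code from the Firesearch repository; the helper name `missingKeys` is hypothetical, and only the two variable names come from the steps above.

```typescript
// Hypothetical helper (not part of Firesearch): report which required
// API keys are absent or blank in a given environment object.
const REQUIRED_KEYS = ["FIRECRAWL_API_KEY", "OPENAI_API_KEY"] as const;

function missingKeys(env: Record<string, string | undefined>): string[] {
  // A key counts as missing when it is undefined or only whitespace.
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}
```

In a Node.js or Next.js server context this could be called as `missingKeys(process.env)` before the first API request, failing early with a readable message if the returned list is not empty.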

How It Works

Architecture Overview

flowchart TB
    Query["'Compare Samsung Galaxy S25<br/>and iPhone 16'"]:::query
    
    Query --> Break
    
    Break["🔍 Break into Sub-Questions"]:::primary
    
    subgraph SubQ["🌐 Search Queries"]
        S1["iPhone 16 Pro specs features"]:::search
        S2["Samsung Galaxy S25 Ultra specs"]:::search
        S3["iPhone 16 vs Galaxy S25 comparison"]:::search
    end
    
    Break --> SubQ
    
    subgraph FC["🔥 Firecrawl API Calls"]
        FC1["Firecrawl /search API<br/>Query 1"]:::firecrawl
        FC2["Firecrawl /search API<br/>Query 2"]:::firecrawl
        FC3["Firecrawl /search API<br/>Query 3"]:::firecrawl
    end
    
    S1 --> FC1
    S2 --> FC2
    S3 --> FC3
    
    subgraph Sources["📄 Sources Found"]
        R1["Apple.com ✓<br/>The Verge ✓<br/>CNET ✓"]:::source
        R2["GSMArena ✓<br/>TechRadar ✓<br/>Samsung.com ✓"]:::source
        R3["AndroidAuth ✓<br/>TomsGuide ✓"]:::source
    end
    
    FC1 --> R1
    FC2 --> R2
    FC3 --> R3
    
    subgraph Valid["✅ Answer Validation"]
        V1["iPhone 16 specs ✓ (0.95)"]:::good
        V2["S25 specs ✓ (0.9)"]:::good
        V3["S25 price ❌ (0.3)"]:::bad
    end
    
    Sources --> Valid
    
    Valid --> Retry
    
    Retry{"Need info:<br/>S25 pricing?"}:::check
    
    subgraph Strat["🧠 Alternative Strategy"]
        Original["Original: 'Galaxy S25 price'<br/>❌ No specific pricing found"]:::bad
        NewTerms["Try: 'Galaxy S25 MSRP cost'<br/>'Samsung S25 pricing leak'<br/>'S25 vs S24 price comparison'"]:::strategy
    end
    
    Retry -->|Yes| Strat
    
    subgraph Retry2["🔄 Retry Searches"]
        Alt1["Galaxy S25 MSRP retail"]:::search
        Alt2["Samsung S25 pricing leak"]:::search
        Alt3["S25 vs S24 price comparison"]:::search
    end
    
    Strat --> Retry2
    
    subgraph FC2G["🔥 Retry API Calls"]
        FC4["Firecrawl /search API<br/>Alt Query 1"]:::firecrawl
        FC5["Firecrawl /search API<br/>Alt Query 2"]:::firecrawl
        FC6["Firecrawl /search API<br/>Alt Query 3"]:::firecrawl
    end
    
    Alt1 --> FC4
    Alt2 --> FC5
    Alt3 --> FC6
    
    Results2["SamMobile ✓ ($899 leak)<br/>9to5Google ✓ ($100 more)<br/>PhoneArena ✓ ($899)"]:::source
    
    FC4 --> Results2
    FC5 --> Results2
    FC6 --> Results2
    
    Final["All answers found ✓<br/>S25 price: $899"]:::good
    
    Results2 --> Final
    
    Synthesis["LLM synthesizes response"]:::synthesis
    
    Final --> Synthesis
    
    FollowUp["Generate follow-up questions"]:::primary
    
    Synthesis --> FollowUp
    
    Citations["List citations [1-10]"]:::primary
    
    FollowUp --> Citations
    
    Answer["Complete response delivered"]:::answer
    
    Citations --> Answer
    
    %% No path - skip retry and go straight to synthesis
    Retry -->|No| Synthesis
    
    classDef query fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
    classDef subq fill:#ffd4b3,stroke:#ff6b1a,stroke-width:1px,color:#333
    classDef search fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff
    classDef source fill:#3a4a5c,stroke:#2c3a47,stroke-width:2px,color:#fff
    classDef check fill:#ffeb3b,stroke:#fbc02d,stroke-width:2px,color:#333
    classDef good fill:#4caf50,stroke:#388e3c,stroke-width:2px,color:#fff
    classDef bad fill:#f44336,stroke:#d32f2f,stroke-width:2px,color:#fff
    classDef strategy fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
    classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
    classDef answer fill:#3a4a5c,stroke:#2c3a47,stroke-width:3px,color:#fff
    classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff
    classDef label fill:none,stroke:none,color:#666,font-weight:bold

Process Flow

  1. Break Down - Complex queries split into focused sub-questions
  2. Search - Multiple searches via Firecrawl API for comprehensive coverage
  3. Extract - Markdown content extracted from web sources
  4. Validate - Check if sources actually answer the questions (0.7+ confidence)
  5. Retry - Alternative search terms for unanswered questions (max 2 attempts)
  6. Synthesize - GPT-4o combines findings into cited answer
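The validate and retry steps above come down to one decision after each round of searching. The sketch below illustrates that decision only and is not the project's LangGraph implementation. The `Validation` and `NextStep` types and both function names are hypothetical. It also assumes that "max 2 attempts" counts search rounds in total, including the first one.

```typescript
// Illustrative only: type and function names are hypothetical.
interface Validation {
  question: string;
  confidence: number; // 0-1 score for "the sources answer this question"
}

const MIN_ANSWER_CONFIDENCE = 0.7; // threshold from step 4
const MAX_SEARCH_ATTEMPTS = 2;     // limit from step 5 (assumed to count total rounds)

type NextStep =
  | { kind: "retry"; questions: string[] }
  | { kind: "synthesize" };

// Questions whose best confidence is below the threshold still need an answer.
function unanswered(validations: Validation[]): string[] {
  return validations
    .filter((v) => v.confidence < MIN_ANSWER_CONFIDENCE)
    .map((v) => v.question);
}

// attemptsSoFar is the number of search rounds already completed.
function nextStep(validations: Validation[], attemptsSoFar: number): NextStep {
  const open = unanswered(validations);
  if (open.length > 0 && attemptsSoFar < MAX_SEARCH_ATTEMPTS) {
    return { kind: "retry", questions: open };
  }
  return { kind: "synthesize" };
}
```

With the scores from the diagram (iPhone 16 specs 0.95, S25 specs 0.9, S25 price 0.3) and one completed round, `nextStep` asks for a retry on "S25 price" only. After a second round it proceeds to synthesis whatever the scores are.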

Key Features

  • Smart Search - Breaks complex queries into multiple focused searches
  • Answer Validation - Verifies sources contain actual answers (0.7+ confidence)
  • Auto-Retry - Alternative search terms for unanswered questions
  • Real-time Progress - Live updates as searches complete
  • Full Citations - Every fact linked to its source
  • Context Memory - Follow-up questions maintain conversation context

Configuration

Customize search behavior by modifying lib/config.ts:

export const SEARCH_CONFIG = {
  // Search Settings
  MAX_SEARCH_QUERIES: 12,        // Maximum number of search queries to generate
  MAX_SOURCES_PER_SEARCH: 4,     // Maximum sources to return per search query
  MAX_SOURCES_TO_SCRAPE: 3,      // Maximum sources to scrape for additional content
  
  // Content Processing
  MIN_CONTENT_LENGTH: 100,       // Minimum content length to consider valid
  SUMMARY_CHAR_LIMIT: 100,       // Character limit for source summaries
  
  // Retry Logic
  MAX_RETRIES: 2,                // Maximum retry attempts for failed operations
  MAX_SEARCH_ATTEMPTS: 2,        // Maximum attempts to find answers via search
  MIN_ANSWER_CONFIDENCE: 0.7,    // Minimum confidence (0-1) that a question was answered
  
  // Timeouts
  SCRAPE_TIMEOUT: 15000,         // Timeout for scraping operations (ms)
} as const;
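To make the content-processing settings concrete, here is a minimal sketch of how two of them could be applied to scraped sources. It is an illustration under stated assumptions, not the repository's code: the `Source` type and both helper functions are hypothetical, and the plain truncation shown here may differ from how Firesearch actually produces its summaries.

```typescript
// Hypothetical illustration of MIN_CONTENT_LENGTH and SUMMARY_CHAR_LIMIT.
interface Source {
  url: string;
  content: string; // markdown extracted from the page
}

const CONTENT_CONFIG = {
  MIN_CONTENT_LENGTH: 100, // values copied from SEARCH_CONFIG above
  SUMMARY_CHAR_LIMIT: 100,
} as const;

// Drop sources whose extracted content is too short to be useful.
function usableSources(sources: Source[]): Source[] {
  return sources.filter(
    (s) => s.content.trim().length >= CONTENT_CONFIG.MIN_CONTENT_LENGTH
  );
}

// Cut a summary to at most SUMMARY_CHAR_LIMIT characters, marking the cut.
function truncateSummary(text: string): string {
  const limit = CONTENT_CONFIG.SUMMARY_CHAR_LIMIT;
  if (text.length <= limit) return text;
  return text.slice(0, limit - 1) + "…";
}
```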

Firecrawl API Integration

Firesearch uses Firecrawl's /search endpoint:

/search - Web Search with Content

  • Purpose: Finds relevant URLs AND extracts markdown content in one call
  • Usage: Each decomposed query is sent to find 6-8 relevant sources with content
  • Response: Returns URLs with titles, snippets, AND full markdown content
  • Key Feature: The scrapeOptions parameter enables content extraction during search
  • Example:
    POST /search
    {
      "query": "iPhone 16 specs pricing",
      "limit": 8,
      "scrapeOptions": {
        "formats": ["markdown"]
      }
    }
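The request above can be assembled in TypeScript roughly as follows. This is a sketch, not the project's client code: the function name `buildSearchRequest` is hypothetical, and the base URL `https://api.firecrawl.dev/v1/search` and Bearer-token header are assumptions that should be checked against the current Firecrawl documentation. Only the body fields come from the example above.

```typescript
// Sketch only: builds the request without sending it.
interface SearchRequestBody {
  query: string;
  limit: number;
  scrapeOptions: { formats: string[] };
}

function buildSearchRequest(query: string, apiKey: string, limit = 8) {
  const body: SearchRequestBody = {
    query,
    limit,
    // Asking for markdown makes /search return page content with each result.
    scrapeOptions: { formats: ["markdown"] },
  };
  return {
    url: "https://api.firecrawl.dev/v1/search", // assumed endpoint URL
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed to `fetch(request.url, request.init)`. Keeping construction separate from sending makes the payload easy to inspect or unit-test without network access.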
    

Search Strategies

When initial results are insufficient, the system automatically tries:

  • Broaden Keywords: Removes specific terms for wider results
  • Narrow Focus: Adds specific terms to target missing aspects
  • Synonyms: Uses alternative terms and phrases
  • Rephrase: Completely reformulates the query
  • Decompose: Breaks complex queries into sub-questions
  • Academic: Adds scholarly terms for research-oriented results
  • Practical: Focuses on tutorials and how-to guides
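The README does not say how each strategy is implemented. The sketch below uses plain string rewriting for four of them, purely to show the effect each one has on a query. The `Strategy` type, the function `rewriteQuery`, and the appended keywords are all hypothetical.

```typescript
// Hypothetical illustration: simple string rewrites for four strategies.
type Strategy = "broaden" | "narrow" | "academic" | "practical";

function rewriteQuery(query: string, strategy: Strategy, missingAspect = ""): string {
  const words = query.trim().split(/\s+/);
  switch (strategy) {
    case "broaden":
      // Drop the last (usually most specific) term, keeping at least one word.
      return words.length > 1 ? words.slice(0, -1).join(" ") : words.join(" ");
    case "narrow":
      // Append the aspect that the earlier results failed to cover.
      return [...words, missingAspect].filter((w) => w !== "").join(" ");
    case "academic":
      return `${words.join(" ")} research study`;
    case "practical":
      return `${words.join(" ")} tutorial how-to`;
  }
}
```

For example, `rewriteQuery("Galaxy S25 price", "narrow", "MSRP")` yields `"Galaxy S25 price MSRP"`, while the `"broaden"` strategy on the same query yields `"Galaxy S25"`.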

Example Queries

  • "Who are the founders of Firecrawl?"
  • "When did NVIDIA release the RTX 4080 Super?"
  • "Compare the latest iPhone, Samsung Galaxy, and Google Pixel flagship features"

License

MIT License
