Project Clandestine: AI Information Verification System

🎯 Project Overview

Mission: Building a comprehensive AI information verification system to combat the growing issue of AI-generated misinformation polluting the web and creating feedback loops of false information.

The Problem

AI is slowly poisoning its own well. More than 1,200 AI-generated content sites now feed a dangerous feedback loop:

  1. AI/LLM gives wrong info →
  2. Someone publishes that info on reputed sites →
  3. Content gets views and engagement →
  4. Next AI models see high-engagement content as "must be true" →
  5. Misinformation becomes a trusted source → cycle repeats

Our Solution

A verification system that doesn't just fetch information but actively verifies and proves it through trusted sources and cross-referencing.

๐Ÿ—๏ธ System Architecture

Core Flow

User prompt → Router (classifier) → Crawler → Knowledge Base → LLM (summarizer)
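The flow above can be sketched as a simple function pipeline. Every component name here is a hypothetical stand-in, passed as a callable so each stage can be swapped for a real service later:

```python
def run_pipeline(prompt, router, crawler, knowledge_base, summarizer):
    """Minimal sketch of the core flow; all stages are injected callables."""
    category = router(prompt)              # classify the query
    documents = crawler(prompt, category)  # fetch candidate sources
    facts = knowledge_base(documents)      # filter/store verified facts
    return summarizer(prompt, facts)       # produce the verified answer
```

In practice each callable would wrap a model or service, but the pipeline shape stays the same.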

Key Components

1. Parameter Selection System

  • Define what we consider "correct" (research papers, authoritative sources)
  • Domain-specific parameter selection
  • Dynamic trust scoring for sources

2. Verification Pipeline

  • Option A: Use Firecrawler API → scrape research papers → LLM parsing
  • Option B: Custom search → LLM-generated search queries → web scraping → relevance filtering
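As a rough illustration of the relevance-filtering step in Option B, plain token overlap can stand in for an LLM relevance judgment. The function names and threshold below are assumptions for the sketch, not part of the codebase:

```python
def relevance_score(query: str, document: str) -> float:
    """Jaccard overlap between query and document token sets (0-1).
    A stand-in for an LLM or embedding-based relevance judgment."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)

def filter_relevant(query, documents, threshold=0.1):
    """Keep only documents whose overlap with the query clears the threshold."""
    return [doc for doc in documents if relevance_score(query, doc) >= threshold]
```

A real implementation would swap `relevance_score` for a semantic similarity model; the filtering loop around it stays the same.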

3. Trust Scoring Algorithm

Dynamic Trust Score (0-1 scale)

| Factor | Description | Scoring Rule |
| --- | --- | --- |
| Domain Reputation | Authority level | 0.9 for .edu/.gov; 0.8 for .org |
| Source Verification | Cross-citations | +0.05 per citation |
| Fact-Check Record | Misinformation history | -0.2 for false reports |
| Recency | Data freshness | -0.1 for >2 years old |
| Consistency | Cross-source matching | +0.2 for high similarity |

Final Score Calculation:

FinalScore = 0.4D + 0.3C + 0.2R + 0.1F

Where: D=Domain trust, C=Cross-source consistency, R=Recency, F=Fact-check verification

Trust Levels:

  • ≥ 0.75: "Trusted"
  • 0.5-0.75: "Partially Trusted"
  • < 0.5: "Untrusted"
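The weighted formula and trust bands above translate directly into code. This is a minimal sketch, assuming each factor has already been normalized to the 0-1 scale:

```python
def final_score(domain: float, consistency: float,
                recency: float, factcheck: float) -> float:
    """FinalScore = 0.4*D + 0.3*C + 0.2*R + 0.1*F, all factors in [0, 1]."""
    return 0.4 * domain + 0.3 * consistency + 0.2 * recency + 0.1 * factcheck

def trust_level(score: float) -> str:
    """Map a final score to the trust bands defined above."""
    if score >= 0.75:
        return "Trusted"
    if score >= 0.5:
        return "Partially Trusted"
    return "Untrusted"
```

For example, a .edu source (D = 0.9) with strong cross-source agreement lands well above the 0.75 "Trusted" cutoff.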

4. Cross-Verification System

  • Search 3-5 trusted sources for same claim
  • Compare entities, dates, numbers
  • Use NLP techniques (BERT embeddings, RoBERTa-large-mnli)
  • Textual entailment analysis
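A toy version of the cross-check can be built from the standard library: numeric facts must match exactly, and surface similarity must clear a loose threshold. Here `difflib` merely stands in for the BERT/RoBERTa models, which would replace it in the real pipeline:

```python
import re
from difflib import SequenceMatcher

def numeric_facts(text):
    """Extract dates and numbers for exact cross-source comparison."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def cross_check(claim, snippets):
    """Fraction of snippets that agree with the claim (0-1).
    Agreement = claim's numbers all appear AND text is loosely similar."""
    if not snippets:
        return 0.0
    agree = 0
    for snippet in snippets:
        numbers_match = numeric_facts(claim) <= numeric_facts(snippet)
        similar = SequenceMatcher(None, claim.lower(),
                                  snippet.lower()).ratio() > 0.3
        if numbers_match and similar:
            agree += 1
    return agree / len(snippets)
```

Swapping `SequenceMatcher` for embedding cosine similarity or an entailment model changes only the `similar` line.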

📊 Example Verification Flow

User Query: "NASA confirms alien life on Mars."

  1. Router: Classifies as "Science News"
  2. Crawler: Searches NASA, Reuters, BBC Science
  3. Knowledge Base: No official NASA report found
  4. Cross-check: Reuters/BBC report "No official confirmation"
  5. Fact-check API: Snopes lists as false
  6. Result: Trust score = 0.18 → FAKE
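The six steps above can be traced with stubbed lookups. The source results and factor values below are hard-coded assumptions for illustration only; real crawler and fact-check calls would replace them:

```python
def verify_claim(claim):
    # 1-2. Router + crawler stubs: trusted outlets return no confirmation.
    confirmations = {"nasa.gov": False, "reuters.com": False, "bbc.com": False}
    # 3-4. Knowledge base + cross-check: consistency = fraction confirming.
    consistency = sum(confirmations.values()) / len(confirmations)  # 0.0
    # 5. Fact-check stub: Snopes lists the claim as false.
    factcheck = 0.0
    # Assumed values for the originating site (illustrative only).
    domain, recency = 0.2, 0.5
    # 6. Final weighted score and verdict.
    score = 0.4 * domain + 0.3 * consistency + 0.2 * recency + 0.1 * factcheck
    return round(score, 2), ("FAKE" if score < 0.5 else "NEEDS REVIEW")
```

With these assumed inputs the claim scores 0.18, falling in the "Untrusted" band.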

🎯 Precision vs Recall Strategy

High Precision (Minimize False Positives)

  • When false alarms cause serious harm
  • Examples: Reputation damage, legal action, wrongful content takedown

High Recall (Minimize False Negatives)

  • When missing real issues is dangerous
  • Examples: Health misinformation, safety-critical information
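The trade-off comes down to where the decision threshold sits: raising it flags less content (higher precision), lowering it catches more misinformation (higher recall). A toy illustration with assumed scores and labels:

```python
def precision_recall(scores, labels, threshold):
    """Flag items with score >= threshold, then compare against the true
    labels (True = actually misinformation)."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))        # correctly flagged
    fp = sum(p and not l for p, l in zip(preds, labels))    # false alarms
    fn = sum(not p and l for p, l in zip(preds, labels))    # missed issues
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With scores [0.9, 0.6, 0.4, 0.2] and labels [True, False, True, False], a strict threshold of 0.8 gives precision 1.0 but recall 0.5, while a loose 0.3 gives recall 1.0 at precision 2/3.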

๐ŸŒ Blue Tick User System

  • Verified trustworthy users across the internet
  • Training data based on their contributions
  • Continuous knowledge base refinement

🔗 Supporting Evidence

Academic Research on AI Content Pollution

Open Letters from Academia

Related Video

👥 Team & Responsibilities

Current Team

  • Udit: Research Lead - Core problem analysis and solution design
  • Divya: Research - Trust scoring and verification algorithms
  • Rishik: Research - Cross-verification and NLP implementation

๐Ÿ“ Repository Structure

/
├── README.md                    # This file
├── docs/                        # Documentation
│   ├── api-reference.md        # API documentation
│   ├── architecture.md         # System architecture details
│   ├── contributing.md         # Contribution guidelines
│   └── use-cases.md            # Detailed use cases
├── research/                    # Research materials
│   ├── papers/                 # Academic papers and references
│   ├── experiments/            # Research experiments
│   └── findings/               # Research findings and notes
├── src/                         # Source code
│   ├── crawler/                # Web crawling modules
│   ├── classifier/             # Content classification
│   ├── verifier/               # Verification engine
│   └── api/                    # API endpoints
├── daily-progress/             # Daily progress tracking
│   ├── templates/              # Log templates
│   └── logs/                   # Individual daily logs
└── tests/                      # Test suites
    ├── unit/                   # Unit tests
    └── integration/            # Integration tests

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • Docker (optional)

Installation

git clone https://github.com/uditjainstjis/Project-Clandestine.git
cd Project-Clandestine
pip install -r requirements.txt

Quick Start

# Run the verification system
python src/main.py --query "your query here"

# Start development server
npm run dev

📈 Progress Tracking

Phase 1: Research & Foundation (Current)

  • Problem identification and analysis
  • Core architecture design
  • Trust scoring algorithm design
  • Initial prototype development

Phase 2: Development

  • Core verification engine
  • API development
  • Web interface
  • Testing suite

Phase 3: Enhancement

  • Advanced NLP integration
  • Real-time verification
  • Browser extension
  • Mobile app

๐Ÿค Contributing

See CONTRIBUTING.md for detailed contribution guidelines.

Daily Progress Logging

All team members should log daily progress in daily-progress/logs/YYYY-MM-DD-[name].md

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

🔮 Future Vision

Building the next generation of information verification - a system that doesn't just compete with Perplexity but sets the standard for verified, trustworthy AI-powered information retrieval.


Note: This is an active research project. The system is currently in the research and development phase. Contributions and feedback are welcome!
