SEMalytics/ai-source-hygiene


AI Source Hygiene

Your AI research tools are pulling from AI-generated garbage. Here's how to fix it.

The Problem

AI assistants with web search are surfacing AI-generated "encyclopedias" that have documented manipulation, conspiracy promotion, and ideological bias. These sources poison your research without you knowing it. The owner of the content often also controls the AI surfacing it—a closed loop of self-citation.

Quick Fix (30 seconds)

Claude

Paste into Settings → User Preferences:

Never cite these sources (documented reliability issues):
infowars.com, grokipedia.com, vdare.com, thegatewaypundit.com,
revolver.news, infogalactic.com, conservapedia.com, rt.com,
sputniknews.com, oann.com, zerohedge.com, theepochtimes.com,
breitbart.com, telesurtv.net, presstv.ir, thegrayzone.com,
mintpressnews.com, globalresearch.ca, newsmax.com, dailymail.co.uk,
dailycaller.com, dailywire.com, theblaze.com, occupydemocrats.com,
palmerreport.com, bipartisanreport.com, dailykos.com

Note when excluding sources. Prefer Wikipedia, academic sources, primary sources.

ChatGPT

Paste into Custom Instructions → "How would you like ChatGPT to respond?":

Never cite these sources: infowars.com, grokipedia.com, vdare.com,
thegatewaypundit.com, revolver.news, infogalactic.com, conservapedia.com,
rt.com, sputniknews.com, oann.com, zerohedge.com, theepochtimes.com,
breitbart.com, telesurtv.net, presstv.ir, thegrayzone.com,
mintpressnews.com, globalresearch.ca, newsmax.com, dailymail.co.uk,
dailycaller.com, dailywire.com, theblaze.com, occupydemocrats.com,
palmerreport.com, bipartisanreport.com, dailykos.com

Skip in search results. Note when excluding unreliable sources.

Perplexity

Prefix your queries with:

Exclude: infowars.com, grokipedia.com, vdare.com, thegatewaypundit.com, rt.com, sputniknews.com, oann.com, breitbart.com, newsmax.com, dailymail.co.uk (documented reliability issues).

[Your question here]
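If you send queries from a script, the prefix above can be assembled in code instead of pasted by hand. The following is a minimal sketch; `EXCLUDED_DOMAINS` and `build_query` are illustrative names, not part of this repository or of any Perplexity API. It only builds the text you would submit.

```python
# Short exclusion list, copied from the Perplexity prefix above.
EXCLUDED_DOMAINS = [
    "infowars.com", "grokipedia.com", "vdare.com", "thegatewaypundit.com",
    "rt.com", "sputniknews.com", "oann.com", "breitbart.com",
    "newsmax.com", "dailymail.co.uk",
]


def build_query(question: str, domains: list[str] = EXCLUDED_DOMAINS) -> str:
    """Prepend the exclusion line to a question, separated by a blank line.

    Hypothetical helper: it formats text only and does not call any service.
    """
    prefix = f"Exclude: {', '.join(domains)} (documented reliability issues)."
    return f"{prefix}\n\n{question.strip()}"


if __name__ == "__main__":
    print(build_query("What caused the 2008 financial crisis?"))
```

Keeping the domains in one list means the prefix stays in sync if you later trim or extend it.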

That's it. For complete configuration, see platform-specific guides.

What We Block (and Why)

27 sources across 3 tiers, based on legal judgments, platform bans, FARA registrations, and Wikipedia deprecation. Sources are blocked regardless of political leaning; the same criteria apply to all.

| Tier | Sources | Criteria |
| --- | --- | --- |
| 1: Demonstrably Harmful | InfoWars, VDare, Gateway Pundit, Revolver News, Grokipedia, Infogalactic, Conservapedia | $1.5B+ legal judgments, hate group designations, platform bans |
| 2: Propaganda/State Media | RT, Sputnik, TeleSUR, Press TV, OAN, Zero Hedge, Epoch Times, Breitbart, The Grayzone, MintPress News, Global Research | FARA registrations, EU/US sanctions, state ownership, NATO/State Dept flagged |
| 3: Highly Partisan | Newsmax, Daily Mail, Daily Caller, Daily Wire, The Blaze, Occupy Democrats, Palmer Report, Bipartisan Report, Daily Kos | Wikipedia deprecated, defamation settlements, low credibility ratings |

Full blocklist with evidence →

Machine-readable version: blocklist/sources.yaml
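If you post-process search results yourself, a domain check against the blocklist is straightforward. The sketch below is one way to do it, under stated assumptions: the schema of `blocklist/sources.yaml` is not shown here, so the domains are hard-coded as a small sample, and `BLOCKED_DOMAINS` and `is_blocked` are hypothetical names. It matches the listed domain and any of its subdomains.

```python
from urllib.parse import urlsplit

# Sample subset of the blocklist, hard-coded for illustration.
# In practice you would load the full list from blocklist/sources.yaml.
BLOCKED_DOMAINS = {"infowars.com", "grokipedia.com", "rt.com", "dailymail.co.uk"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains.

    URLs without a scheme (e.g. "rt.com/news") have no parseable host
    and are treated as not blocked.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    # Exact match, or a suffix match on a dot boundary so that
    # "smart.com" does not match "rt.com".
    return any(host == d or host.endswith("." + d) for d in blocked)


if __name__ == "__main__":
    results = [
        "https://www.rt.com/news/example",
        "https://en.wikipedia.org/wiki/Example",
    ]
    print([u for u in results if not is_blocked(u)])
```

Matching on a dot boundary rather than a plain substring avoids false positives on unrelated domains that happen to end with the same letters.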

Platform Guides

| Platform | Instructions |
| --- | --- |
| Claude | platforms/claude.md |
| ChatGPT | platforms/chatgpt.md |
| Perplexity | platforms/perplexity.md |
| Microsoft Copilot | platforms/copilot.md |
| Google Gemini | platforms/gemini.md |
| Grok | platforms/grok.md ⚠️ see conflict of interest note |
| Any AI | platforms/generic.md |

Contributing

Found a bad source? Have a fix for another platform?

See CONTRIBUTING.md for standards.

FAQ

Isn't this censorship?

No. It's source verification—standard practice in journalism, academia, and professional research. You're free to use whatever sources you want. This helps you avoid sources with documented reliability issues.

Why not block all AI-generated content?

Because not all AI content is problematic. The issue is AI content without editorial oversight, with documented bias, or with conflicts of interest (like the owner controlling both content creation AND the AI that surfaces it).

This seems political.

Source quality isn't political. A source that promotes conspiracy theories is unreliable regardless of which conspiracy theories. A source controlled by a single individual with editorial intervention is risky regardless of their politics. We apply the same standards to everyone—see our blocking criteria.

Why these specific sources?

These were the first documented cases of AI-generated or ideologically manipulated encyclopedias being surfaced by AI research tools. The blocklist grows based on evidence, so report any source that meets our criteria.

License

CC0 / Public Domain

Copy it, adapt it, share it. Information hygiene over attribution.

About

Block AI-generated garbage sources from your research. Copy-paste fixes for Claude, ChatGPT, Perplexity. Community-maintained blocklist.
