🔎 Research Agent v6.0

A lightweight, intelligent research assistant that fetches, ranks, and explains recent AI papers from arXiv.

It runs fully in Streamlit and acts as a "Universal Brain" for research: you can plug in OpenAI, Google Gemini, or Groq to handle the reasoning, or run it entirely offline in Free Local mode, which needs no API key and uses heuristics instead of an LLM.

It produces ranked tables, citation impact scores, and "Plain English" summaries for the top papers.

🌐 Try the Live App: https://research-aiagent.streamlit.app/

⚾ The "Moneyball" Ranking Engine

We have completely overhauled how papers are ranked. We stopped asking LLMs to guess "Will this paper be famous?" (which turned out to be unreliable) and built a Moneyball Algorithm.

Hard Data (84% Weight): The agent now queries Semantic Scholar to analyze the "Author Velocity" (recent citation momentum) of the paper's authors.
Soft Data (16% Weight): The LLM analyzes the abstract for "Market Fit" (Is this a trending topic?) and "Novelty."

The Result: A 6x increase in precision@10, meaning the papers at the top of your list are statistically much more likely to be influential.
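
To make the 84/16 split concrete, here is a minimal sketch that reads it as a plain linear blend. This is an assumption: the README does not say how the two signals are combined, and the scoring code in app.py may differ. The names `moneyball_score` and `rank_papers`, the dict keys, and the requirement that both inputs are normalized to [0, 1] are all hypothetical.

```python
HARD_WEIGHT = 0.84  # share given to author citation data, per the README
SOFT_WEIGHT = 0.16  # share given to the LLM's abstract analysis


def moneyball_score(author_velocity: float, llm_score: float) -> float:
    """Blend two scores, each assumed to be normalized to [0, 1]."""
    for name, value in (("author_velocity", author_velocity), ("llm_score", llm_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return HARD_WEIGHT * author_velocity + SOFT_WEIGHT * llm_score


def rank_papers(papers: list[dict]) -> list[dict]:
    """Return papers sorted by blended score, highest first."""
    return sorted(
        papers,
        key=lambda p: moneyball_score(p["author_velocity"], p["llm_score"]),
        reverse=True,
    )
```

With these weights, a paper with strong author momentum and a weak abstract score still outranks one with a perfect abstract score and little author momentum, which is the intended effect of leaning on hard data.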

Note on New Papers

Papers less than 5 days old often lack citation data in Semantic Scholar. These are marked as "Too new for impact score" and ranked purely by their relevance to your query.
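
The fallback described above could look roughly like the sketch below. Only the 5-day threshold and the label text come from this README. The function names and the `published`, `relevance`, and `impact` keys are hypothetical, and the real app may handle the two score scales differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_AGE = timedelta(days=5)  # threshold taken from the note above
TOO_NEW_LABEL = "Too new for impact score"


def is_too_new(published: datetime, now: Optional[datetime] = None) -> bool:
    """True when the paper is younger than MIN_AGE."""
    now = now or datetime.now(timezone.utc)
    return now - published < MIN_AGE


def ranking_entry(paper: dict, now: Optional[datetime] = None) -> dict:
    """Pick the score used for ranking plus the label shown to the user.

    `paper` is a hypothetical dict with `published`, `relevance`, `impact`.
    """
    if is_too_new(paper["published"], now):
        # No reliable citation data yet: fall back to query relevance.
        return {"score": paper["relevance"], "impact_label": TOO_NEW_LABEL}
    return {"score": paper["impact"], "impact_label": f"{paper['impact']:.2f}"}
```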

⚡ Groq Integration (Llama 3.3)

You can now use Groq as your intelligence provider. This allows you to run high-performance open-source models (like Llama 3.3 70B) with blazing fast inference speeds and free API access.

Note: In Groq mode, the agent uses local embeddings for search (all-MiniLM-L6-v2) and sends only the filtered candidates to Groq for analysis.
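
The "filter locally, then send only the candidates" step can be illustrated with a small cosine-similarity ranking. This is a sketch, not the code in app.py: it uses toy 2-dimensional vectors in place of real all-MiniLM-L6-v2 embeddings, and the names `cosine` and `top_k_candidates` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(query_vec: list[float], paper_vecs: list[list[float]], k: int) -> list[int]:
    """Return indices of the k papers most similar to the query, best first."""
    order = sorted(
        range(len(paper_vecs)),
        key=lambda i: cosine(query_vec, paper_vecs[i]),
        reverse=True,
    )
    return order[:k]


# Toy example: only the papers at these indices would be forwarded to the LLM.
candidates = top_k_candidates([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)
```

Keeping the embedding step local means the remote model only sees the short list, which limits both latency and API usage.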

🎯 Features

Semantic Search: Finds papers conceptually related to your query, not just keyword matches.
Plain English Summaries: Translates academic jargon into clear bullet points.
"Moneyball" Impact Scores: Predicts 1-year citation impact using real author data.
Multi-Provider Support: OpenAI, Gemini, Groq, or Local (No API Key).
Export to ZIP: Download all data (JSONs, Markdown report) for offline use.
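
As an illustration of the ZIP export, the sketch below bundles a JSON file and a Markdown report into an in-memory archive using only the standard library. The function name `build_export_zip` and the file names inside the archive are assumptions, not the app's actual layout.

```python
import io
import json
import zipfile


def build_export_zip(papers: list[dict], report_md: str) -> bytes:
    """Pack paper data and a Markdown report into a ZIP and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("papers.json", json.dumps(papers, indent=2))
        archive.writestr("report.md", report_md)
    return buffer.getvalue()
```

In a Streamlit app, bytes like these can be passed as the `data` argument of `st.download_button`, so nothing has to be written to disk on the server.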

🗂️ Categories Supported

We capture all major Computer Science subcategories. You can select specific ones to narrow your search:

Artificial Intelligence (cs.AI)
Machine Learning (cs.LG)
Human-Computer Interaction (cs.HC)
Computation and Language (cs.CL)
Computer Vision (cs.CV)
Robotics (cs.RO)
Information Retrieval (cs.IR)
Neural and Evolutionary Computing (cs.NE)
Software Engineering (cs.SE)
Cryptography and Security (cs.CR)
Data Structures and Algorithms (cs.DS)
Databases (cs.DB)
Social and Information Networks (cs.SI)
Multimedia (cs.MM)
Information Theory (cs.IT)
Performance (cs.PF)
Multiagent Systems (cs.MA)
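
For readers curious how a category selection can translate into an arXiv request, here is a hedged sketch that builds a query URL for the public arXiv API. The `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters are part of that API. The helper name `build_arxiv_url` is hypothetical, and it is an assumption that the app queries arXiv this way.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(categories: list[str], max_results: int = 50) -> str:
    """Build an arXiv API URL returning the newest papers in any of the categories."""
    if not categories:
        raise ValueError("Select at least one category")
    # Category filters are joined with OR, e.g. "cat:cs.AI OR cat:cs.LG".
    search_query = " OR ".join(f"cat:{code}" for code in categories)
    params = {
        "search_query": search_query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url(["cs.AI", "cs.LG"], max_results=10)
```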

📦 Installation (Local)

1. Clone the repository:

git clone https://github.com/nurtekinsavasai/arxiv-ai-agent-v2.git
cd arxiv-ai-agent-v2

2. Create a virtual environment:

python3 -m venv .venv
source .venv/bin/activate
# Windows: .venv\Scripts\activate

3. Install dependencies:

pip install -r requirements.txt
# If groq was not installed from requirements.txt, install it manually
pip install groq

4. Run the app:

streamlit run app.py

Your browser will open automatically at http://localhost:8501.

🛠️ For Developers

app.py: The entire application logic resides here. It is a single-file Streamlit app designed for portability.
Modifying Prompt Logic: Look for the Moneyball scoring functions and classify_papers_with_llm in app.py to change how the AI judges papers.
Adding Providers: The code uses an LLMConfig dataclass. To add a new provider (e.g., Anthropic), add the client initialization in call_llm and the provider-specific embedding logic in select_embedding_candidates.
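
The general shape of a config-driven provider dispatch is sketched below. This is not the code in app.py: the fields on `LLMConfig`, the registry, and the `echo` provider are all hypothetical and exist only to show where a new provider would plug in. The real `call_llm` may use a different structure, such as an if/elif chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LLMConfig:
    """Hypothetical fields; check app.py for the real definition."""
    provider: str
    model: str
    api_key: Optional[str] = None


ProviderFn = Callable[[LLMConfig, str], str]
_PROVIDERS: Dict[str, ProviderFn] = {}


def register_provider(name: str) -> Callable[[ProviderFn], ProviderFn]:
    """Decorator that maps a provider name to its handler function."""
    def decorator(fn: ProviderFn) -> ProviderFn:
        _PROVIDERS[name] = fn
        return fn
    return decorator


@register_provider("echo")
def _echo(config: LLMConfig, prompt: str) -> str:
    # Stand-in for a real client call; a new provider would build its client here.
    return f"[{config.model}] {prompt}"


def call_llm(config: LLMConfig, prompt: str) -> str:
    """Route the prompt to the handler registered for config.provider."""
    try:
        handler = _PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {config.provider!r}") from None
    return handler(config, prompt)
```

Under this layout, adding a provider means writing one handler function and registering it under a new name, without touching the callers of `call_llm`.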

📊 Comparison Table

Option	Requirements	Best For
OpenAI	API Key	Highest quality summaries and narrative analysis. (GPT-5.2, etc.)
Gemini	API Key (Google AI Studio)	Speed and large context windows. (Gemini 3 Pro, etc.)
Groq	API Key (Free)	Blazing fast open-source models (Llama 3.3).
Free Local	None	Offline usage, zero cost. Uses heuristics instead of LLM analysis.
