Monorepo for PDF ingestion, bibliography extraction, metadata enrichment, and download queueing.
- `frontend/`: Svelte + Vite app
- `backend/`: Node.js API and orchestration routes
- `backend/scripts/daemon/worker.py`: queue consumer daemon
- `dl_lit_project/`: canonical Python pipeline package (`dl_lit`)
- `dl_lit/`: legacy scripts (not the canonical runtime)
The app is DB-first and queue-first.
- Backend writes jobs to `pipeline_jobs` in `dl_lit_project/data/literature.db`.
- `rag_feeder_worker` polls `pipeline_jobs` and executes jobs.
- Worker writes completion/failure payloads back to `pipeline_jobs.result_json`.
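The enqueue/poll handoff can be sketched with plain `sqlite3`. The `pipeline_jobs` columns used here (`job_type`, `status`, `result_json`) and the status values are assumptions for illustration, not the actual schema:

```python
import json
import sqlite3

def enqueue(conn, job_type):
    """Backend side: append a job row for the worker to pick up."""
    conn.execute(
        "INSERT INTO pipeline_jobs (job_type, status) VALUES (?, 'queued')",
        (job_type,),
    )
    conn.commit()

def poll_once(conn):
    """Worker side: claim the oldest queued job, run it, record the result."""
    row = conn.execute(
        "SELECT id, job_type FROM pipeline_jobs "
        "WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do this tick
    job_id, job_type = row
    conn.execute(
        "UPDATE pipeline_jobs SET status = 'in_progress' WHERE id = ?",
        (job_id,),
    )
    # ... real job execution would happen here ...
    result = {"ok": True, "job_type": job_type}
    conn.execute(
        "UPDATE pipeline_jobs SET status = 'done', result_json = ? WHERE id = ?",
        (json.dumps(result), job_id),
    )
    conn.commit()
    return job_id
```

Because both sides talk only to the SQLite table, the backend and worker need no direct connection to each other.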
Supported daemon job types in current code:
- `enrich`
- `download`
- `pipeline_tick` (mark -> enrich -> download)
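A minimal dispatcher for those three types could look like the sketch below. The step callables are hypothetical stand-ins for the real logic; the only fact taken from this README is that `pipeline_tick` chains mark -> enrich -> download:

```python
def run_job(job_type, steps):
    """Dispatch one daemon job.

    `steps` maps step names ('mark', 'enrich', 'download') to callables --
    placeholders here for the real pipeline functions.
    """
    if job_type == "enrich":
        return [steps["enrich"]()]
    if job_type == "download":
        return [steps["download"]()]
    if job_type == "pipeline_tick":
        # pipeline_tick runs the full cycle in order
        return [steps[name]() for name in ("mark", "enrich", "download")]
    raise ValueError(f"unsupported job type: {job_type}")
```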
Primary runtime tables:
- `works`: canonical work records, including metadata/download status and file info
- `corpus_works`: corpus membership join table
Status lives directly on works:
- `metadata_status`: `pending | in_progress | matched | failed`
- `download_status`: `not_requested | queued | in_progress | downloaded | failed`
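Because status lives directly on `works`, a progress overview needs no join. A sketch (column names from the list above; the rest of the `works` schema is assumed):

```python
import sqlite3

def status_counts(conn):
    """Count works per (metadata_status, download_status) pair."""
    return conn.execute(
        "SELECT metadata_status, download_status, COUNT(*) "
        "FROM works GROUP BY metadata_status, download_status"
    ).fetchall()
```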
- `rag_feeder_frontend` on `http://localhost:5175`
- `rag_feeder_backend` on `http://localhost:4000`
- `rag_feeder_worker` (no HTTP port)
- Set `.env` values (at least `GOOGLE_API_KEY`; `OPENALEX_API_KEY` is optional).
- Start the stack: `docker compose up -d`
- Open `http://localhost:5175`
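A minimal `.env` for the quick start might look like this; the values are placeholders, and only `GOOGLE_API_KEY` is required:

```shell
# Required
GOOGLE_API_KEY=your-google-api-key

# Optional
OPENALEX_API_KEY=your-openalex-api-key
```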
The production site does not hot reload. Rebuild the frontend when you want changes on the live stack.
Do not patch tracked compose files on the server. Keep the live host and Caddy labels in an untracked `docker-compose.override.yml` instead so a `git pull` cannot wipe them.
- Add the live host to `.env`:
  - `RAG_FEEDER_PUBLIC_HOST=corpus4uol.university-of-labour.de`
  - `RAG_FEEDER_PROXY_NETWORK=reverse_proxy`
- Create a local override once: `cp docker-compose.override.example.yml docker-compose.override.yml`
- Deploy normally after pulls: `docker compose up -d`
`docker compose` loads `docker-compose.override.yml` automatically, so the frontend keeps its Caddy labels and host overrides without requiring a special command.
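The exact shape of the override file is local and untracked, but it could look roughly like this. The label keys follow the caddy-docker-proxy convention and are an assumption; match them to whatever `docker-compose.override.example.yml` actually contains:

```yaml
# docker-compose.override.yml (untracked; survives git pull)
services:
  rag_feeder_frontend:
    labels:
      # caddy-docker-proxy style labels -- adjust to your proxy setup
      caddy: ${RAG_FEEDER_PUBLIC_HOST}
      caddy.reverse_proxy: "{{upstreams 80}}"
    networks:
      - reverse_proxy

networks:
  reverse_proxy:
    external: true
```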
- SQLite DB: `dl_lit_project/data/literature.db`
- Uploaded PDFs inside the container: `/usr/src/app/uploads`
- Upload volume: `rag_feeder_uploads`
- Logs volume: `rag_feeder_logs`
- Pipeline log file: `/usr/src/app/logs/backend-pipeline.log`
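For ad-hoc inspection of the DB on a running stack, it is safer to open the SQLite file read-only so a stray query can never write under the worker. A small sketch using the path above as the default:

```python
import sqlite3

def open_literature_db(path="dl_lit_project/data/literature.db"):
    """Open the pipeline DB read-only via SQLite's URI syntax."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row  # access columns by name
    return conn
```

Any attempt to write through this connection raises `sqlite3.OperationalError` instead of touching the live data.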
- `/api/ingest/process-marked`, `/api/downloads/worker/start`, and `/api/downloads/worker/run-once` queue real jobs immediately.
- `/api/pipeline/worker/start` and `/api/pipeline/worker/pause` currently only update in-memory API state; continuous interval scheduling is still transitional in the current implementation.
- Backend details:
backend/README.md - Frontend details:
frontend/README.md - Python pipeline details:
dl_lit_project/README.md