A GenAI chatbot designed to support UC San Diego librarians by asynchronously answering student questions through an agentic two-path workflow: RAG retrieval for knowledge-based queries and API integration for catalog searches.
Architecture: Next.js web application with intelligent query routing
Two Processing Paths:
- RAG Pipeline - LibAnswers FAQ retrieval via Qdrant vector DB + Cohere reranking
- API Integration - Query reformulation + Ex Libris Primo VE catalog search + reasoning
Core Technologies: OpenAI (routing & generation), Qdrant (vector DB), Cohere (reranking)
- Query Router - Routes to RAG or API path
- RAG Pipeline - Semantic search + reranking for FAQs
- API Module - Reformulates queries for Primo VE
- Reasoning Model - Synthesizes catalog results
- Response Generator - Assembles final student-facing answers
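As a rough sketch, the routing decision at the front of this pipeline might look like the following. The function name and keyword heuristic are illustrative stand-ins; the actual router uses an OpenAI call to classify the query:

```typescript
// Minimal sketch of the two-path router. routeQuery and the keyword
// heuristic are illustrative, not the project's actual LLM-based routing.
type Path = "rag" | "api";

// Classify a student question as an FAQ-style query (RAG path)
// or a catalog-search query (Primo VE API path).
function routeQuery(question: string): Path {
  // Heuristic stand-in for the LLM routing call: catalog-intent
  // keywords send the query to the Primo VE path.
  const catalogTerms = /\b(book|ebook|journal|dvd|borrow|checkout|copy)\b/i;
  return catalogTerms.test(question) ? "api" : "rag";
}

console.log(routeQuery("How do I borrow a book on reserve?")); // → "api"
console.log(routeQuery("What are the library's printing policies?")); // → "rag"
```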
- Library policies & procedures
- Computing resources how-to guides
- Billing information
- Building details
- Local and consortia library resources
FAQ Data Preprocessing:
- Consolidate `question` + `details` fields
- Strip HTML from `answer`
- Remove unnecessary fields (`owner`, `short_answer`)
- Preserve: questions, answers, topics, keywords, links
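The cleanup pass described above could be sketched as follows. The `RawFAQ`/`CleanFAQ` shapes and the `stripHtml` helper are illustrative assumptions, not the project's actual code:

```typescript
// Sketch of the FAQ cleanup: consolidate question + details, strip HTML
// from the answer, drop owner/short_answer, keep topics/keywords.
interface RawFAQ {
  faqid: number;
  question: string;
  details: string;
  answer: string;
  owner?: string;        // dropped during cleanup
  short_answer?: string; // dropped during cleanup
  topics: string[];
  keywords: string[];
}

interface CleanFAQ {
  faqid: number;
  question: string; // question + details consolidated
  answer: string;   // HTML stripped
  topics: string[];
  keywords: string[];
}

// Naive tag stripper; a real pipeline would use a proper HTML parser.
const stripHtml = (html: string): string =>
  html.replace(/<[^>]*>/g, "").trim();

function cleanFaq(raw: RawFAQ): CleanFAQ {
  return {
    faqid: raw.faqid,
    question: [raw.question, raw.details].filter(Boolean).join(" "),
    answer: stripHtml(raw.answer),
    topics: raw.topics,
    keywords: raw.keywords,
  };
}
```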
Primo VE Query Handling:
- Reformulate natural language → structured discovery query
- Restructure API response (title, author, description, availability)
- Pass clean data to reasoning model
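The restructuring step above might look like this sketch. The raw field paths (`pnx.display`, `delivery`) are simplified assumptions about the Primo VE response shape, not verified against the API:

```typescript
// Sketch of flattening a raw Primo VE document into the compact shape
// passed to the reasoning model. Field paths are assumptions.
interface ProcessedPrimoDoc {
  title: string;
  type: string;
  publisher?: string;
  subjects: string[];
  permalink: string;
  recordId: string;
  hasFullText: boolean;
}

function processDoc(raw: any): ProcessedPrimoDoc {
  const display = raw?.pnx?.display ?? {};
  return {
    title: display.title?.[0] ?? "Untitled",
    type: display.type?.[0] ?? "unknown",
    publisher: display.publisher?.[0],
    subjects: display.subject ?? [],
    permalink: raw?.delivery?.almaOpenurl ?? "",
    recordId: raw?.pnx?.control?.recordid?.[0] ?? "",
    hasFullText: (raw?.delivery?.availability ?? []).includes("fulltext"),
  };
}
```

Keeping only these fields trims token usage before the reasoning model sees the results.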
Current Approach: No chunking (FAQ entries sufficiently concise)
Fallback Plan: Semantic chunking for entries >700 words (pending validation)
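The fallback trigger above could be as simple as a word-count check; `needsChunking` is an illustrative name, and the semantic chunking itself (still pending validation) is not shown:

```typescript
// Entries over 700 words would be split before embedding, per the
// fallback plan; shorter FAQ entries are embedded whole.
const CHUNK_THRESHOLD_WORDS = 700;

function wordCount(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

function needsChunking(faqAnswer: string): boolean {
  return wordCount(faqAnswer) > CHUNK_THRESHOLD_WORDS;
}

console.log(needsChunking("short answer")); // → false
```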
```typescript
interface FAQ {
  faqid: number;
  group_id: number;
  question: string;
  details: string;
  answer: string;
  topics: string[];   // assumed string arrays; see LibAnswers API docs
  keywords: string[];
  url: {
    public: string;
    admin: string;
  };
  totalhits: number;
  created: string;
  updated: string;
  votes: {
    yes: number;
    no: number;
  };
  links: string[];
  files: string[];
}

interface PrimoVEResponse {
  resDetails: {
    totalResultsLocal: number;
    totalResultsPC: number;
    total: number;
    first: number;
    last: number;
  };
  docInfo: unknown[]; // raw Primo VE records, later restructured into ProcessedPrimoDoc
}

interface ProcessedPrimoDoc {
  title: string;
  type: string;
  publisher?: string;
  subjects: string[];
  contents?: string;
  permalink: string;
  recordId: string;
  hasFullText: boolean;
}
```

Goal: Reduce librarian workload while delivering accurate, context-aware responses to diverse student inquiries.
Before getting started, you'll need to set up the following services:
- OpenAI API Key (https://platform.openai.com/api-keys)
  - You'll need at least $5 in credits on your OpenAI account
  - Used for embeddings, chat completions, routing logic, and reasoning
- Qdrant API Key (https://cloud.qdrant.io/)
  - Free tier available
  - Used for vector database storage and semantic search
- Cohere API Key (https://cohere.com/)
  - Free tier available
  - Used for reranking retrieved FAQ results
- Ex Libris Primo VE API Key (https://developers.exlibrisgroup.com/)
  - Institutional access required
  - Used for catalog search and resource discovery
- Springshare LibAnswers API Key (https://springshare.com/libanswers/)
  - Institutional access required
  - Used for retrieving FAQ data from LibAnswers
- Helicone API Key (https://www.helicone.ai/)
  - Free tier available
  - Used for LLM observability and monitoring
Create a `.env` file in the root directory with these keys:

```env
OPENAI_API_KEY=your_openai_key_here
QDRANT_API_KEY=your_qdrant_key_here
QDRANT_URL=your_qdrant_cluster_url
COHERE_API_KEY=your_cohere_key_here
PRIMO_VE_API_KEY=your_primo_api_key_here
LIBANSWERS_API_KEY=your_libanswers_key_here
HELICONE_API_KEY=your_helicone_key_here
# Optional: fine-tuned model ID
OPENAI_FINETUNED_MODEL=your_finetuned_model_id
```
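A fail-fast startup check for the keys above might look like this sketch (the `validateEnv` helper is illustrative; in a Next.js app, `process.env` is populated from `.env` automatically):

```typescript
// Verify the required environment variables from the .env template are
// present and non-empty before the app starts serving requests.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "QDRANT_API_KEY",
  "QDRANT_URL",
  "COHERE_API_KEY",
  "PRIMO_VE_API_KEY",
  "LIBANSWERS_API_KEY",
  "HELICONE_API_KEY",
] as const;

// Returns the names of any required keys that are missing or empty.
function validateEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((k) => !env[k]?.trim());
}

// Example with a partial config (in a real app, pass process.env and
// throw if anything is missing).
const missing = validateEnv({ OPENAI_API_KEY: "sk-placeholder" });
console.log(missing.length); // → 6
```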