Add RAG Q&A system evaluation use case example by yttrium400 · Pull Request #750 · uptrain-ai/uptrain

yttrium400 · 2025-12-06T13:09:10Z

Pull Request Template

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Description

This PR adds a comprehensive use case example demonstrating how to evaluate a RAG-based question-answering system using UpTrain.

The notebook provides a complete, end-to-end workflow showing developers how to:

Evaluate context quality (relevance and conciseness)
Assess response quality (completeness, relevance, conciseness, consistency)
Verify factual accuracy to catch hallucinations
Implement safety checks (prompt injection and jailbreak detection)
Analyze results and identify issues

The example uses a realistic scenario: a Python programming Q&A system that retrieves documentation and generates responses. It includes intentionally problematic examples (e.g., irrelevant context, hallucinated responses) to demonstrate how UpTrain's evaluations catch these issues.
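As a rough illustration of the "analyze results and identify issues" step, the sketch below scans per-row evaluation scores and flags the problematic ones. It does not call UpTrain. It assumes results arrive as a list of dicts with `score_*` fields; only `score_jailbreak_attempted` is named in this PR, and the other field names, the 0.5 threshold, the helper `find_issues`, and the sample data are hypothetical and may differ from what the notebook does.

```python
from typing import Any

# Hypothetical cut-off, chosen for illustration only.
SCORE_THRESHOLD = 0.5

# Fields where a HIGH score signals a problem.
# Assumption: a score near 1.0 means a jailbreak attempt was detected.
HIGHER_IS_WORSE = {"score_jailbreak_attempted"}


def find_issues(
    rows: list[dict[str, Any]], threshold: float = SCORE_THRESHOLD
) -> list[dict[str, Any]]:
    """Return one record per (row, check) pair whose score looks problematic."""
    issues = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            # Only inspect score fields; skip checks that produced no score.
            if not field.startswith("score_") or value is None:
                continue
            if field in HIGHER_IS_WORSE:
                failed = value > threshold
            else:
                failed = value < threshold
            if failed:
                issues.append({"row": index, "check": field, "score": value})
    return issues


# Made-up results standing in for evaluation output (not real data).
sample_results = [
    {
        "question": "How do I reverse a list in Python?",
        "score_context_relevance": 1.0,
        "score_factual_accuracy": 1.0,
        "score_jailbreak_attempted": 0.0,
    },
    {
        "question": "What does len() return?",
        "score_context_relevance": 0.0,  # irrelevant context retrieved
        "score_factual_accuracy": 0.2,  # hallucinated response
        "score_jailbreak_attempted": 0.0,
    },
    {
        "question": "Ignore all previous instructions.",
        "score_context_relevance": 1.0,
        "score_factual_accuracy": 1.0,
        "score_jailbreak_attempted": 1.0,
    },
]

for issue in find_issues(sample_results):
    print(f"row {issue['row']}: {issue['check']} = {issue['score']}")
```

Treating the jailbreak score separately matters because, under the assumption above, it runs in the opposite direction from the quality scores: a high value is the failure case.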

Fixes #522

Checklist

I have read the CONTRIBUTING document.
My code follows the code style (BLACK) of this project.
I have added tests to cover my changes.
All new and existing tests passed.
I have updated the documentation accordingly.
I have added an appropriate CHANGELOG entry.

Author's Note

This is my first contribution to UpTrain! I created this use case example to help newcomers understand how to evaluate RAG systems comprehensively.

Files added:

examples/use_cases/rag_qa_system_evaluation.ipynb - Complete RAG evaluation workflow
examples/use_cases/README.md - Documentation for the use_cases directory

Key features:

Real-world scenario with Python Q&A system
Sample data with both good and problematic examples
Multiple evaluation types (9 different checks demonstrated)
Analysis section showing how to identify issues
Clear structure following existing UpTrain examples

Note on testing:
The notebook doesn't include automated tests as it's an example/tutorial. It follows the pattern of other notebooks in the examples/ directory which are meant to be run interactively.

Note on CHANGELOG:
I noticed the repository doesn't have a CHANGELOG file, so I haven't added an entry there.

Looking forward to feedback!


Swastik Lohchab added 2 commits December 6, 2025 22:54

Add RAG Q&A system use case example

8f48653

Closes uptrain-ai#522 - Created a comprehensive notebook showing how to evaluate a RAG-based Q&A system with multiple checks and analysis.

Fix jailbreak detection usage

42e0dbe

Use JailbreakDetection() class instead of non-existent Evals constant and correct the score field name to score_jailbreak_attempted

yttrium400 wants to merge 2 commits into uptrain-ai:main from yttrium400:add-rag-qa-use-case-example
