Add RAG Q&A system evaluation use case example #750

Open
yttrium400 wants to merge 2 commits into uptrain-ai:main from yttrium400:add-rag-qa-use-case-example
@yttrium400

Pull Request Template

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Description

This PR adds a comprehensive use case example demonstrating how to evaluate a RAG-based question-answering system using UpTrain.

The notebook provides a complete, end-to-end workflow showing developers how to:

  • Evaluate context quality (relevance and conciseness)
  • Assess response quality (completeness, relevance, conciseness, consistency)
  • Verify factual accuracy to catch hallucinations
  • Implement safety checks (prompt injection and jailbreak detection)
  • Analyze results and identify issues
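The final step, analyzing results and identifying issues, could look roughly like the sketch below. It assumes each evaluated row comes back as a dict with `score_<check>` fields in the range 0 to 1. The names `score_context_relevance` and `score_factual_accuracy` and the 0.5 threshold are assumptions made for illustration; `score_jailbreak_attempted` is the field named in this PR's second commit. The `flag_issues` helper is hypothetical and not part of UpTrain, and the sample scores are invented.

```python
from typing import Any

# Assumed field names; adjust to whatever the evaluation actually returns.
QUALITY_FIELDS = ("score_context_relevance", "score_factual_accuracy")
# Assumption: for safety checks a HIGH score signals a problem.
SAFETY_FIELDS = ("score_jailbreak_attempted",)


def flag_issues(rows: list[dict[str, Any]], threshold: float = 0.5) -> list[dict[str, Any]]:
    """Return one entry per row that has at least one failing score."""
    flagged = []
    for index, row in enumerate(rows):
        # Quality scores fail when they fall below the threshold.
        failed = [f for f in QUALITY_FIELDS if row.get(f) is not None and row[f] < threshold]
        # Safety scores fail when they rise above the threshold.
        failed += [f for f in SAFETY_FIELDS if row.get(f) is not None and row[f] > threshold]
        if failed:
            flagged.append({"index": index, "question": row.get("question"), "failed": failed})
    return flagged


# Hypothetical results, made up for illustration only.
sample_results = [
    {"question": "How do I reverse a list?", "score_context_relevance": 1.0,
     "score_factual_accuracy": 1.0, "score_jailbreak_attempted": 0.0},
    {"question": "What is a decorator?", "score_context_relevance": 0.0,
     "score_factual_accuracy": 0.5, "score_jailbreak_attempted": 0.0},
]

issues = flag_issues(sample_results)
for issue in issues:
    print(issue["index"], issue["failed"])
```

Treating quality and safety scores separately keeps the direction of each comparison explicit, since a low relevance score and a high jailbreak score both indicate a problem.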

The example uses a realistic scenario: a Python programming Q&A system that retrieves documentation and generates responses. It includes intentionally problematic examples (e.g., irrelevant context, hallucinated responses) to demonstrate how UpTrain's evaluations catch these issues.
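As a rough illustration of such a dataset, the sketch below pairs one well-formed row with one whose context is deliberately unrelated to the question. The `question`, `context`, and `response` keys are an assumption about the notebook's data layout, and the rows are invented here rather than copied from the notebook. The `missing_keys` helper is hypothetical and only guards against malformed rows before evaluation.

```python
REQUIRED_KEYS = ("question", "context", "response")

# Invented rows for illustration; not taken from the notebook.
data = [
    {
        # Well-formed example: the context answers the question.
        "question": "How do I append an item to a list in Python?",
        "context": "list.append(x) adds an item to the end of the list.",
        "response": "Call my_list.append(item) to add it to the end.",
    },
    {
        # Intentionally problematic: the context is unrelated to the question.
        "question": "How do I open a file for reading?",
        "context": "list.sort() sorts the items of the list in place.",
        "response": "Use open(path, 'r'), ideally inside a with block.",
    },
]


def missing_keys(row: dict) -> list:
    """Return the required keys that are absent or empty in a row."""
    return [key for key in REQUIRED_KEYS if not row.get(key)]


# Collect the indices of rows that cannot be evaluated as-is.
malformed = [i for i, row in enumerate(data) if missing_keys(row)]
print(malformed)
```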

Fixes #522

Checklist

  • I have read the CONTRIBUTING document.
  • My code follows the code style (BLACK) of this project.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have updated the documentation accordingly.
  • I have added an appropriate CHANGELOG entry.

Author's Note

This is my first contribution to UpTrain! I created this use case example to help newcomers understand how to evaluate RAG systems comprehensively.

Files added:

  • examples/use_cases/rag_qa_system_evaluation.ipynb - Complete RAG evaluation workflow
  • examples/use_cases/README.md - Documentation for the use_cases directory

Key features:

  • Real-world scenario with Python Q&A system
  • Sample data with both good and problematic examples
  • Multiple evaluation types (9 different checks demonstrated)
  • Analysis section showing how to identify issues
  • Clear structure following existing UpTrain examples

Note on testing:
The notebook doesn't include automated tests as it's an example/tutorial. It follows the pattern of other notebooks in the examples/ directory which are meant to be run interactively.

Note on CHANGELOG:
I noticed the repository doesn't have a CHANGELOG file, so I haven't added an entry there.

Looking forward to feedback!

Swastik Lohchab added 2 commits December 6, 2025 22:54

  • Closes uptrain-ai#522 - Created a comprehensive notebook showing how to evaluate a RAG-based Q&A system with multiple checks and analysis.
  • Use JailbreakDetection() class instead of non-existent Evals constant and correct the score field name to score_jailbreak_attempted
Development

Successfully merging this pull request may close these issues.

Create a Use Case Example