An AI-powered tutor for Python YouTube videos — ask, watch, learn, and code.
An interactive chatbot that answers natural-language questions by searching the transcripts of YouTube videos, powered by LangChain, OpenAI, and ChromaDB. It specialises in YouTube videos for learning Python programming.
- 🔗 Paste a YouTube video URL about Python coding and embed it directly in the app
- 🧠 Ask questions about the content using natural language
- 📖 Vector search over transcript chunks with timestamp + chapter metadata
- 📺 Plays the part of the video that answers the question
- 📟 Additional explanations and a coding challenge
- 🤖 Powered by Llama3 8B + LangChain RetrievalQA
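The timestamp metadata is what lets an answer point back into the video. The following is a minimal, standard-library sketch of that idea, not the app's actual code: `Chunk`, `chunk_segments`, and `timestamped_url` are hypothetical names, and the real pipeline may split transcripts differently.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: float  # seconds into the video where the chunk begins
    chapter: str  # chapter title attached as metadata (hypothetical field)


def chunk_segments(segments, chapter="", max_chars=200):
    """Group (start_seconds, text) caption segments into chunks of roughly max_chars."""
    chunks, buf, buf_start = [], [], None
    for start, text in segments:
        if buf_start is None:
            buf_start = start  # remember where this chunk starts
        buf.append(text)
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append(Chunk(" ".join(buf), buf_start, chapter))
            buf, buf_start = [], None
    if buf:  # flush the trailing partial chunk
        chunks.append(Chunk(" ".join(buf), buf_start, chapter))
    return chunks


def timestamped_url(video_id, start_seconds):
    """Build a watch URL that starts playback at the chunk's timestamp."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


segments = [
    (0.0, "Welcome to the course."),
    (4.2, "A list is a mutable sequence."),
    (9.8, "Use append to add items."),
]
chunks = chunk_segments(segments, chapter="Lists", max_chars=40)
```

Each chunk keeps the start time of its first caption, so whichever chunk is retrieved for a question can be turned into a playback position with `timestamped_url(video_id, chunk.start)`.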
📊 View Project Presentation Slides
| Component | Description |
|---|---|
| Streamlit | Frontend UI for chat, video player, and code interaction |
| LangChain | Retrieval-Augmented Generation (RAG) orchestration |
| LangSmith | Tracing and debugging of LLM chains and prompts |
| ChatGroq (LLaMA3-8B) | LLM used for answering questions and generating challenges |
| OpenAIEmbeddings | Converts transcript chunks into vector representations |
| ChromaDB | Local vector database for storing per-video embeddings |
| pytubefix | Downloads captions and extracts video metadata |
| GPT-4 (optional) | Evaluates the quality of LLaMA3 responses post-hoc |
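To illustrate how the embedding and vector-store components fit together, here is a toy retrieval sketch using only the standard library. It is an assumption-laden stand-in, not the project's code: word counts replace `OpenAIEmbeddings`, a plain list replaces ChromaDB, and `embed`, `cosine`, and `retrieve` are hypothetical names. In the real app the retrieved chunk would then be passed to the LLM as context.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, store, k=1):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return ranked[:k]


# In-memory list standing in for the per-video vector store.
store = [
    {"text": "A for loop iterates over a sequence.", "start": 30},
    {"text": "A dictionary maps keys to values.", "start": 95},
]
for item in store:
    item["vector"] = embed(item["text"])

best = retrieve("How does a dictionary store keys?", store)[0]
print(best["text"], best["start"])
```

A learned embedding model matches on meaning rather than shared words, but the ranking step follows the same pattern: embed the question, score it against stored chunk vectors, and keep the top results along with their timestamp metadata.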
- Clone the repo

      git clone https://github.com/KJanzon/youtube-qa-chatbot.git
      cd youtube-qa-chatbot

- Set up virtual environment

      python -m venv venv
      source venv/bin/activate  # or .\venv\Scripts\activate on Windows

- Install dependencies

      pip install -r requirements.txt

- Add your API keys. Create a `.env` file with:

      OPENAI_API_KEY=your_openai_key_here
      LANGCHAIN_API_KEY=your_langchain_key_here
      HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here
      GROQ_API_KEY=your_groq_key_here

- Run the app

      streamlit run interfaces/streamlit_chat.py
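If the app fails at startup, a missing key is a likely cause. The sketch below is a hypothetical helper (not part of the repo) that reports which of the variables named in the `.env` step are unset. It only inspects a mapping such as `os.environ`; it does not read the `.env` file itself, and how the app loads that file is not shown here.

```python
import os

# Key names taken from the .env step in the setup instructions.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY",
    "HUGGINGFACEHUB_API_TOKEN",
    "GROQ_API_KEY",
)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "GROQ_API_KEY": ""}
print(missing_keys(example_env))
```

Calling `missing_keys()` with no arguments checks the current process environment instead of the example mapping.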
├── app/ # Video processing + transcript embedding
├── data/ # Downloaded caption files (.srt)
├── interfaces/ # Streamlit front-end
├── utils/ # Helpers (e.g., clean_srt, time utils, chapter ranker)
├── vectorstore/ # ChromaDB persistent store (per-video)
├── .env # API key config (excluded from Git)
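As an illustration of the kind of helper that could live in `utils/`, the following sketch parses `.srt` caption text into timed cues. It is an assumption about what such a helper might do: `to_seconds` and `parse_srt` are hypothetical names, the repo's own `clean_srt` may behave differently, and the parser assumes three-digit milliseconds.

```python
import re

TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    h, m, s, ms = (int(g) for g in TIME.match(stamp).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Yield (start_seconds, end_seconds, text) for each caption cue."""
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; the cue index line before it is optional.
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = (part.strip() for part in line.split("-->"))
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                yield to_seconds(start), to_seconds(end), text
                break


sample = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:01:02,250 --> 00:01:05,000
Today we cover
list comprehensions.
"""
cues = list(parse_srt(sample))
```

Multi-line captions are joined into a single string, which keeps each cue's text contiguous before it is chunked and embedded.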
- Multi-video querying (cross-video RAG)
- Reference the official Python tutorial to ensure correct answers (https://docs.python.org/3/tutorial/index.html)