This project aims to generate structured PDF reports from podcast interviews, highlighting key takeaways, quotes, and insights. The goal is to create shareable and accessible summaries for a broader audience.
- Summarization using LLMs
- Search & Retrieval
- PDF Report Generation
- Web UI (Streamlit) for user interaction
- Clone the repository:
git clone https://github.com/DataTalksClub/podcast-summary-generation.git
cd podcast-summary-generation
- Install dependencies:
pip install -r requirements.txt
Before running the application, ensure you have your OPEN_API_KEY and GROK_API_KEY configured.
You can do this using either of the following methods:
Create a .env file in the root of your project with the following content:
OPEN_API_KEY=sample-value-here
GROK_API_KEY=sample-value-here
Create a file named secrets.toml inside the .streamlit folder with the following content:
OPEN_API_KEY = "sample-value-here"
GROK_API_KEY = "sample-value-here"
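As a rough illustration of how this configuration could be consumed, the sketch below reads the two keys from the process environment and falls back to a .env file. The helper names `parse_env_file` and `load_api_keys` are hypothetical and are not taken from this repository; the project itself may rely on a library such as python-dotenv or on Streamlit's `st.secrets` instead.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("OPEN_API_KEY", "GROK_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=value lines, tolerating spaces around '=' and quotes."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_api_keys(env_path=".env", environ=None):
    """Return the required keys, preferring environment variables over the file."""
    environ = os.environ if environ is None else environ
    from_file = parse_env_file(env_path)
    keys = {name: environ.get(name) or from_file.get(name) for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        # Fail early with a clear message (an assumption about desired behavior).
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return keys
```

Calling `load_api_keys()` from the project root would return a dictionary with both keys, or raise a `RuntimeError` naming whichever key is missing.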
To run the application, do the following:
# Start the backend services (if needed)
docker-compose up -d
# Generate a podcast summary using the OpenAI API key.
python main_openai.py --input episode.md --output episode_summary_openai.md
# Run the Streamlit application
python run_streamlit_app.py
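
The command-line shape of the summary step can be sketched with `argparse`. This only mirrors the `--input` and `--output` flags shown in the command above; the function names and the placeholder summarizer are hypothetical and do not reflect the actual contents of `main_openai.py`.

```python
import argparse
from pathlib import Path


def build_parser():
    """Define the two flags used in the example command."""
    parser = argparse.ArgumentParser(description="Summarize a podcast transcript.")
    parser.add_argument("--input", required=True, help="Path to the transcript file")
    parser.add_argument("--output", required=True, help="Path for the summary file")
    return parser


def summarize(text):
    """Placeholder: a real implementation would call an LLM here.

    This stand-in simply keeps the first three non-empty lines.
    """
    non_empty = [line for line in text.splitlines() if line.strip()]
    return "\n".join(non_empty[:3])


def main(argv=None):
    """Read the transcript, summarize it, and write the result."""
    args = build_parser().parse_args(argv)
    transcript = Path(args.input).read_text(encoding="utf-8")
    Path(args.output).write_text(summarize(transcript), encoding="utf-8")
```

For example, `main(["--input", "episode.md", "--output", "episode_summary_openai.md"])` would read the transcript and write the placeholder summary to the output path.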
- LLM Processing → Summarization, Extracting Key Insights
- Storage & Retrieval → Search Engine (Elasticsearch/In-memory DB)
- PDF Generation → Formatted Report
- Web UI → User Interaction & Downloads
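
To make the stages above more concrete, here is a minimal sketch of how they might fit together, using only the Python standard library. Everything in it is an assumption for illustration: `InMemoryIndex` stands in for the in-memory search option, `extract_insights` is a placeholder for the LLM stage, and `format_report` produces plain text rather than a real PDF. None of these names come from the repository.

```python
import re
from collections import defaultdict


class InMemoryIndex:
    """Tiny keyword index standing in for the search engine stage."""

    def __init__(self):
        self._docs = {}
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Store a document and index each lowercase word it contains."""
        self._docs[doc_id] = text
        for token in re.findall(r"\w+", text.lower()):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query word."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(matches)


def extract_insights(transcript):
    """Placeholder for the LLM stage: keeps non-empty lines as 'insights'."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]


def format_report(title, insights):
    """Placeholder for the PDF stage: renders a plain-text report."""
    lines = [title, "=" * len(title)]
    lines += [f"- {item}" for item in insights]
    return "\n".join(lines)
```

In this sketch, a summary would be added to the index with `add`, looked up with `search`, and passed through `format_report` before being offered for download in the web UI.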
- Open an issue before working on any feature.
- Use feature branches for development.
- Submit PRs for review; each needs at least 2 approvals before merging.
🚀 Let's build something great together! 🎙️📄