Built in 8 hours at a hackathon, StoryScape transforms any PDF book into a living adventure with AI-generated visuals, adaptive soundscapes, and consistent characters.
Instead of reading static text, you can walk alongside characters, see their world unfold, and hear the emotions of every scene.
- PDF Input → Upload any book in PDF format.
- Metadata Extraction (Gemini) → Detects characters, sections, and key story events.
- Visual Consistency (Nano Banana) → Generates a base image for each character once, then reuses it for consistency across all visuals.
- Adaptive Soundscapes (ElevenLabs) → Short audio tracks matched to the mood of each scene.
- Fast and Simple → Single Python script, no complex infra, minimal API calls.
- Lightweight Storage → JSON + local images/audio files.
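
The "generate once, reuse everywhere" idea behind the visual consistency feature can be sketched as a small cache keyed by character name. This is a minimal illustration, not the code in script.py: `get_base_portrait`, the `generate` callable, and the file naming are all hypothetical, and the stand-in generator below returns placeholder bytes instead of calling an image model.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def get_base_portrait(
    name: str,
    cache: Dict[str, Path],
    generate: Callable[[str], bytes],
    out_dir: Path,
) -> Path:
    """Return the portrait path for `name`, generating it only on first use."""
    if name in cache:
        return cache[name]  # reuse the existing reference image
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: lower-case name with underscores.
    path = out_dir / f"{name.lower().replace(' ', '_')}.png"
    path.write_bytes(generate(name))
    cache[name] = path
    return path


# Stand-in for the image model call (assumption: real code would hit an API).
calls = []


def fake_generate(name: str) -> bytes:
    calls.append(name)
    return b"fake-image-bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache: Dict[str, Path] = {}
    first = get_base_portrait("Alice", cache, fake_generate, Path(tmp))
    second = get_base_portrait("Alice", cache, fake_generate, Path(tmp))

# Both lookups return the same path, and the generator ran only once.
print(first == second, len(calls))
```

Keeping the cache as a plain dict fits the JSON-based storage described above, since the name-to-path mapping could be written out alongside the character metadata.
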
- google-genai → Gemini for text analysis and placement detection
- Nano Banana (Gemini 2.5 Flash Image) → Character portraits and consistent visuals
- ElevenLabs → Dynamic audio/music generation
- PyPDF2 → PDF parsing
- requests → API calls
- python-dotenv → API key management
StoryScape/
┣ script.py            # Main system script (all logic in one file)
┣ outputs/             # Generated images and audio
┃ ┣ base_characters/   # Reference portraits
┃ ┣ visuals/           # Scene visuals
┃ ┗ music/             # Scene-specific audio
┣ characters.json      # Extracted characters + references
┣ sections.json        # Extracted sections + summaries
┗ requirements.txt     # Dependencies
- Extract PDF text → Page-wise text stored in memory
- Metadata Extraction (Gemini) → Characters and sections in JSON
- Base Image Generation (Nano Banana) → Consistent character portraits
- Placement Detection (Gemini) → Where to add visuals and music
- Scene Visuals (Nano Banana) → Images using base portraits
- Music Generation (ElevenLabs) → Scene-specific audio tracks
- Compile Outputs → Per page: text, visuals, and music in structured JSON
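
The final compile step can be pictured as merging page text with the detected placements into one JSON-serializable list. The sketch below is an assumption about the shape of that output, not the actual schema of compiled.json: the function name `compile_pages`, the keys `page`, `kind`, and `file`, and the example file paths are all hypothetical.

```python
import json
from typing import Dict, List


def compile_pages(pages: List[str], placements: List[Dict]) -> List[Dict]:
    """Attach visual and music files to the page they were placed on."""
    compiled = [
        {"page": number, "text": text, "visuals": [], "music": []}
        for number, text in enumerate(pages, start=1)
    ]
    for placement in placements:
        index = placement["page"] - 1  # pages are 1-based in this sketch
        kind = placement["kind"]
        # Skip placements that point at a missing page or an unknown kind.
        if 0 <= index < len(compiled) and kind in ("visuals", "music"):
            compiled[index][kind].append(placement["file"])
    return compiled


# Hypothetical inputs standing in for the earlier pipeline stages.
pages = ["Once upon a time...", "The forest grew dark."]
placements = [
    {"page": 1, "kind": "visuals", "file": "outputs/visuals/page1.png"},
    {"page": 2, "kind": "music", "file": "outputs/music/page2.mp3"},
]

compiled = compile_pages(pages, placements)
compiled_json = json.dumps(compiled, indent=2)
print(compiled_json)
```

Storing file paths rather than binary data keeps the JSON small and lets a viewer load images and audio lazily from outputs/.
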
- Clone the repo and install dependencies: pip install -r requirements.txt
- Add your API keys to .env:
- GEMINI_API_KEY=your_key_here
- NANO_BANANA_API_KEY=your_key_here
- ELEVENLABS_API_KEY=your_key_here
- Run the script with your PDF: python script.py --pdf your_book.pdf
- Check outputs/ for the generated images and audio, and for the final compiled.json.
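
The setup steps above imply two early checks: parsing the `--pdf` argument and confirming that the three API keys are present. A minimal sketch of both follows, assuming nothing about how script.py actually does it; `build_parser`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and the environment is passed in as a plain mapping so the example runs without a real .env file.

```python
import argparse
from typing import List, Mapping

# Key names taken from the .env example above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "NANO_BANANA_API_KEY", "ELEVENLABS_API_KEY")


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with a single required --pdf argument."""
    parser = argparse.ArgumentParser(description="Turn a PDF book into a StoryScape.")
    parser.add_argument("--pdf", required=True, help="Path to the input PDF")
    return parser


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example run with explicit arguments and a partial environment.
args = build_parser().parse_args(["--pdf", "your_book.pdf"])
missing = missing_keys({"GEMINI_API_KEY": "abc"})
print(args.pdf, missing)
```

In a real run, calling `load_dotenv()` from python-dotenv first and then passing `os.environ` to `missing_keys` would let the script fail early with a clear message instead of partway through generation.
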
- google-genai==1.33.0
- PyPDF2==3.0.1
- requests==2.32.5
- python-dotenv==1.1.1
Hackathon Philosophy
- All logic in one file (script.py)
- Minimal steps, end-to-end in a single run
- API calls only where needed
- Focused on being demo-ready in under 48 hours