AI-powered children's book and movie production pipeline.
FableFlow turns a one-page input spec (characters, theme, learning objectives) into a complete book and animated movie. Every artifact is generated by real models — there are no placeholders.
```
input.json
│
▼ Phase 1: Book
draft → edit → proof
│
▼ chapters → illustration plan → illustration rendering (FLUX / SDXL)
│
▼ reflection (per chapter) → experiment page → biography (if `dedicated_to`)
│
▼ book_content.json → PDF + EPUB
│
▼ Phase 2: Movie
scene extraction → per-scene production
(Kokoro TTS → image → HunyuanVideo I2V → MusicGen → subtitles → composite)
│
▼ ffmpeg concat → final MP4
```
For a single input file, FableFlow writes to output/<input_name>/:
| File / dir | Source |
|---|---|
| `draft_story.txt`, `edited_story.txt`, `final_story.txt` | three story-development agents (LLM) |
| `book_content.json` | structured book (chapters, illustrations, reflection questions, experiment, biography) |
| `illustrations/chapter_N_ill_M.png` | rendered with FLUX-dev (or your configured image model) |
| `<title>.pdf`, `<title>.epub` | publication-ready book in both formats |
| `scene_manifest.json` | per-scene narration + timing |
| `scenes/<scene>_narration.m4a`, `<scene>_image.png`, `<scene>_video.mp4`, `<scene>_music.mp3`, `<scene>.srt`, `<scene>_final.mp4` | per-scene multimedia |
| `<title>.mp4` | final movie (ffmpeg-concatenated scenes) |
| `movie_manifest.json` | chapter markers + timecodes for the final movie |
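As a rough illustration of how chapter markers and timecodes like those in `movie_manifest.json` could be derived, the sketch below accumulates per-scene durations into chapter start times. The scene keys (`chapter`, `duration`) and both function names are assumptions made for this example, not FableFlow's actual manifest schema.

```python
from typing import Dict, List


def format_timecode(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chapter_markers(scenes: List[Dict]) -> List[Dict]:
    """Return one marker per chapter, placed at the start of its first scene."""
    markers = []
    elapsed = 0.0
    seen = set()
    for scene in scenes:
        chapter = scene["chapter"]
        if chapter not in seen:
            seen.add(chapter)
            markers.append({"chapter": chapter, "start": format_timecode(elapsed)})
        elapsed += scene["duration"]  # scenes play back to back after concat
    return markers


# Hypothetical scene list: two scenes in chapter 1, one in chapter 2.
scenes = [
    {"chapter": 1, "duration": 12.5},
    {"chapter": 1, "duration": 8.0},
    {"chapter": 2, "duration": 65.25},
]
markers = chapter_markers(scenes)
print(markers)
```

Because the final movie is a straight concatenation of scene files, a chapter's start time is simply the sum of the durations of every scene before it.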
Every chapter ends with 3 reflection questions ("Think About It").
The back matter has a "Try This at Home" experiment and, if dedicated_to is set, a "Meet ..." one-page biography.
The story agents enforce three chapter-level requirements, and the drafter uses chapter-by-chapter generation (one LLM call per chapter) when a chapter_outline is in the input spec. This is the right architecture for content-rich science books: a single mega-prompt asking for 8 deep chapters at once usually terminates after chapter 1; bounded per-chapter calls reliably produce all of them.
- Each chapter is 700–1100 words (3–4 illustrated picture-book pages): substantial enough for depth, achievable in one focused LLM call.
- STEM / educational content is explained by an adult character, not the child. The draft prompt looks at your input spec, picks the adult characters (supporting/protagonist roles aged ≥18; see `_adult_authority_names`), names them in each per-chapter prompt, and asks the LLM to route explanations through them: a concrete analogy, then the mechanism in 2–3 connected sentences, then a tie-back to what the child just observed. The child asks; the adult teaches. Hallucinated science is explicitly disallowed in the prompt ("if you are not certain a fact is accurate, do not include it").
- Each chapter ends with one thematic poem. Forms allowed: rhyming quatrain, haiku, free verse, or couplet; the drafter picks what fits the moment. The chapter structurer extracts poems into `chapter.poem`; the publishers render them centered + italic between `✦ ✦ ✦` sparkle markers, between prose and reflection questions.
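The adult-selection rule in the second requirement can be sketched as a small filter over the input spec's `characters` list. This is an illustrative approximation, not the code behind `_adult_authority_names`: the function name `adult_authority_names`, the sample characters other than Cassie, and the decision to skip characters with no `age` are all assumptions.

```python
from typing import Dict, List

ADULT_AGE = 18
AUTHORITY_ROLES = {"protagonist", "supporting"}


def adult_authority_names(characters: List[Dict]) -> List[str]:
    """Names of characters who may deliver STEM explanations."""
    names = []
    for character in characters:
        age = character.get("age")  # age is optional in the input spec
        if age is None or age < ADULT_AGE:
            continue  # assumption: an unknown age is not treated as adult
        if character.get("role") in AUTHORITY_ROLES:
            names.append(character["name"])
    return names


characters = [
    {"name": "Cassie", "age": 6, "role": "protagonist"},
    {"name": "Dr. Patel", "age": 45, "role": "supporting"},  # hypothetical
    {"name": "Museum Guard", "age": 50, "role": "minor"},    # hypothetical
    {"name": "Uncle Sam", "role": "supporting"},             # hypothetical, no age
]
print(adult_authority_names(characters))
```

The resulting names would then be interpolated into each per-chapter prompt so the model knows who is allowed to explain.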
Each per-chapter call also receives:
- A rolling synopsis of earlier chapters (so chapter 5 knows what happened in 1–4).
- One rotated recurring motif to weave through this chapter (so each motif lands in multiple chapters).
- A small vocabulary slice — the first N chapters introduce one new word each.
- The featured-moment block (e.g. a verbatim recorded message) if the moment lives in this chapter.
- The corresponding chunk of the user's `draft_story` (if provided), located by `CHAPTER N` markers.
After all chapters are drafted, the editor agent does a single cross-cutting polish pass for motif coverage, transitions, and any chapter still missing a poem.
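The per-chapter inputs listed above can be pictured as one small context record per chapter. The sketch below is a minimal approximation under stated assumptions: the function names, the dictionary keys, and the use of outline entries as stand-ins for real chapter summaries are all hypothetical, and the exact `CHAPTER N` marker format is guessed from the description.

```python
import re
from typing import Dict, List, Optional


def split_draft_by_chapter(draft_story: str) -> Dict[int, str]:
    """Map chapter number -> text found under its 'CHAPTER N' marker line."""
    parts = re.split(r"^CHAPTER\s+(\d+)\b[^\n]*\n", draft_story, flags=re.MULTILINE)
    # With one capture group, re.split yields [preamble, num, text, num, text, ...]
    return {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def build_chapter_contexts(
    outline: List[str],
    motifs: List[str],
    vocabulary: List[str],
    draft_story: Optional[str] = None,
) -> List[Dict]:
    """Assemble the extra context passed to each per-chapter call."""
    draft_chunks = split_draft_by_chapter(draft_story) if draft_story else {}
    contexts = []
    synopsis: List[str] = []
    for index, summary in enumerate(outline):
        number = index + 1
        contexts.append({
            "chapter": number,
            "synopsis_so_far": " ".join(synopsis),                 # chapters 1..N-1
            "motif": motifs[index % len(motifs)] if motifs else None,  # rotation
            "new_word": vocabulary[index] if index < len(vocabulary) else None,
            "draft_chunk": draft_chunks.get(number),
        })
        # A real pipeline would summarise the generated chapter here;
        # the outline entry stands in for that summary.
        synopsis.append(f"Ch {number}: {summary}")
    return contexts


contexts = build_chapter_contexts(
    outline=["Cassie arrives", "The black hole exhibit", "A message from Stephen"],
    motifs=["starlight", "questions"],
    vocabulary=["gravity", "orbit"],
    draft_story="CHAPTER 1\nCassie walks in.\nCHAPTER 3\nThe recording plays.\n",
)
print(contexts[2])
```

Rotating with `index % len(motifs)` is one simple way to make each motif land in several chapters; with two motifs and three chapters, the first motif returns in chapter 3.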
Character consistency (the same character looking the same across every illustration in a book) is the hardest part of multi-image generation. FableFlow handles it with three layers:
- Strict identity anchors in prompts. Every illustration prompt opens with a verbatim identity block per scene character (heritage → skin tone → hair → eyes → distinctive features → clothing). The string is identical across calls so the model latches onto the same identity. See `agents/_image_prompts.py`.
- Canonical reference images. At the start of Phase 1, `CharacterReferenceAgent` renders one portrait + one full-body reference per major character. These land in `output/<name>/character_refs/` and form a visual canon you can inspect.
- Visual conditioning on the reference. Every chapter illustration and scene image is generated with the canonical portrait fed in as image conditioning. The mechanism depends on the base model:
  - FLUX 2 (default): native `image=` parameter on `Flux2Pipeline`. No extra weights to download.
  - FLUX 1 (legacy): IP-Adapter via `XLabs-AI/flux-ip-adapter-v2` (note the "v2" is the IP-Adapter version, not the FLUX version; there is no FLUX 2 IP-Adapter because FLUX 2 doesn't need one).
  - SDXL: IP-Adapter via `h94/IP-Adapter` (`ip-adapter-plus_sdxl_vit-h`).
  - SD3: text-only fallback (no stable IP-Adapter yet).
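To make the first layer concrete, here is a minimal sketch of a deterministic identity block built from a character's appearance fields in the stated order. It is not the code in `agents/_image_prompts.py`; the function names and the exact string layout are assumptions, while the field names follow the input JSON (`appearance.heritage`, `skin_tone`, `hair`, `eyes`, `distinctive_features`, `typical_clothing`).

```python
from typing import Dict, List


def identity_block(character: Dict) -> str:
    """Build a deterministic identity string for one character."""
    appearance = character.get("appearance", {})
    features = ", ".join(appearance.get("distinctive_features", []))
    # Fixed order: heritage -> skin tone -> hair -> eyes -> features -> clothing
    fields = [
        ("heritage", appearance.get("heritage")),
        ("skin tone", appearance.get("skin_tone")),
        ("hair", appearance.get("hair")),
        ("eyes", appearance.get("eyes")),
        ("distinctive features", features or None),
        ("clothing", character.get("typical_clothing")),
    ]
    details = "; ".join(f"{label}: {value}" for label, value in fields if value)
    return f"{character['name']} ({details})"


def illustration_prompt(scene_description: str, characters: List[Dict]) -> str:
    """Open the prompt with one identity block per character, then the scene."""
    anchors = " ".join(identity_block(c) + "." for c in characters)
    return f"{anchors} {scene_description}"


cassie = {
    "name": "Cassie",
    "appearance": {
        "heritage": "Indian Australian",
        "skin_tone": "warm honey-brown",
        "hair": "shoulder-length wavy black",
        "eyes": "bright curious brown",
        "distinctive_features": ["wide eyes", "infectious smile"],
    },
    "typical_clothing": "colorful sundresses or overalls",
}
print(illustration_prompt("Cassie looks up at a model of a black hole.", [cassie]))
```

The point of the design is that the block depends only on the input spec, never on the scene, so two prompts for the same character always share a byte-identical prefix.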
Model choices (in config/default.yaml):
| Modality | Default | Alternatives |
|---|---|---|
| Image | `black-forest-labs/FLUX.2-dev` (~24GB, native image conditioning) | `FLUX.2-klein-9B` (~18GB), `FLUX.2-klein-4B` (~10GB), `FLUX.1-dev` (legacy, IP-Adapter), `stable-diffusion-3.5-large`, `stable-diffusion-xl-base-1.0` |
| Video | `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` (~25GB) | `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (smaller), `hunyuanvideo-community/HunyuanVideo-I2V` |
| Music | `facebook/musicgen-small` | none |
| TTS | `kokoro` (`af_heart` voice) | none |
Switching models is a one-line edit in the config — no code changes needed. EnhancedImageModel and EnhancedVideoModel dispatch on the model name and pick the right diffusers pipeline.
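Name-based dispatch of this kind can be sketched as an ordered substring lookup. The example below is only an illustration of the idea, not the logic inside `EnhancedImageModel`: the `ImageBackend` type, the rule table, and `select_image_backend` are hypothetical names, and the conditioning labels simply restate the mechanisms described for each model family.

```python
from typing import NamedTuple


class ImageBackend(NamedTuple):
    family: str
    conditioning: str  # how the character reference image is fed in


# Checked in order; each needle is matched against the lowercased model name.
_RULES = [
    ("flux.2", ImageBackend("flux2", "native image input")),
    ("flux.1", ImageBackend("flux1", "ip-adapter")),
    ("stable-diffusion-xl", ImageBackend("sdxl", "ip-adapter")),
    ("stable-diffusion-3", ImageBackend("sd3", "text-only")),
]


def select_image_backend(model_name: str) -> ImageBackend:
    """Pick a backend family from the configured model name."""
    lowered = model_name.lower()
    for needle, backend in _RULES:
        if needle in lowered:
            return backend
    raise ValueError(f"Unsupported image model: {model_name}")


print(select_image_backend("black-forest-labs/FLUX.2-dev"))
print(select_image_backend("stable-diffusion-xl-base-1.0"))
```

Failing loudly on an unknown name, rather than falling back to a default pipeline, keeps a typo in the config from silently producing images with the wrong model.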
FableFlow loads the LLM (via vLLM), an image model (FLUX 2 by default), TTS, MusicGen, and a video model. The LLM typically lives on a separate machine via the OpenAI-compatible endpoint; the rest run locally. The local models are sized for a single 80 GB GPU as long as you use CPU offload — which the loader does automatically.
Two things to know:
- Don't disable the CPU offload. FLUX 2-dev's transformer + text encoder don't fit on an 80 GB card if loaded fully. The image and video model loaders use diffusers' `enable_model_cpu_offload` and do not call `pipeline.to("cuda")` alongside it. (Doing both is a common mistake that loads everything onto GPU first and then OOMs; see `_place_on_device` in `models/image.py`.)
- If you still hit OOM, lower the VRAM footprint via `model.image_generation.cpu_offload`:
| Strategy | `cpu_offload` | VRAM (FLUX.2-dev) | Speed |
|---|---|---|---|
| Model offload (default) | `"model"` | ~30 GB peak | normal |
| Sequential offload | `"sequential"` | ~8 GB peak | ~3× slower |
| Full GPU (no offload) | `"none"` | ~60+ GB | fastest |
Or switch to a smaller variant, all set via `model.image_generation.model` in `config/default.yaml`:

| Model | VRAM (with `"model"` offload) | Notes |
|---|---|---|
| `FLUX.2-dev` | ~30 GB | default, full quality |
| `FLUX.2-klein-9B` | ~22 GB | smaller FLUX 2 |
| `FLUX.2-klein-9b-fp8` | ~10 GB | FP8 quantized; best size/quality ratio for shared GPUs |
| `FLUX.2-klein-9b-nvfp4` | ~6 GB | NVFP4 quantized; smallest FLUX 2 |
| `FLUX.1-dev` | ~12 GB | legacy, uses XLabs IP-Adapter for character refs |
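The three `cpu_offload` strategies described earlier in this section map naturally onto three mutually exclusive calls on a diffusers pipeline. The sketch below shows that "exactly one placement call" rule; it is not the project's `_place_on_device`, and `place_on_device` and `RecordingPipeline` are hypothetical names. The stand-in pipeline only records calls so the example runs without a GPU or diffusers installed.

```python
def place_on_device(pipeline, strategy: str = "model") -> None:
    """Apply exactly one placement strategy to a diffusers-style pipeline."""
    if strategy == "model":
        # Moves whole sub-models to the GPU only while they are in use.
        pipeline.enable_model_cpu_offload()
    elif strategy == "sequential":
        # Offloads at a finer granularity: lowest VRAM, slowest.
        pipeline.enable_sequential_cpu_offload()
    elif strategy == "none":
        # Everything resident on the GPU; never combined with offload.
        pipeline.to("cuda")
    else:
        raise ValueError(f"Unknown cpu_offload strategy: {strategy!r}")


class RecordingPipeline:
    """Stand-in that records which placement calls were made."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("enable_model_cpu_offload")

    def enable_sequential_cpu_offload(self):
        self.calls.append("enable_sequential_cpu_offload")

    def to(self, device):
        self.calls.append(f"to({device})")


pipe = RecordingPipeline()
place_on_device(pipe, "model")
print(pipe.calls)
```

Routing every strategy through a single `if`/`elif` chain makes it structurally impossible to call `.to("cuda")` and an offload method on the same pipeline.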
If vLLM runs on the same machine as fable-flow and would otherwise share a GPU with it, give each process its own GPU via `CUDA_VISIBLE_DEVICES`:

```bash
CUDA_VISIBLE_DEVICES=0 make vllm-serve VLLM_MODEL=...   # vLLM on GPU 0
CUDA_VISIBLE_DEVICES=1 make run INPUT=...               # fable-flow on GPU 1
```

| Required | |
|---|---|
| Python | 3.13 (managed by uv automatically) |
| uv | 0.9+; install: `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| ffmpeg | needed for scene composition + final concat |
| NVIDIA GPU | recommended for image, music, video, and serving the LLM. A single A100 80GB runs everything. |
| NVIDIA driver | R525.60.13+ (CUDA 12.6 compat). For driver R555+ you can bump to vllm 0.20+; see Upgrading vllm below. |

```bash
sudo apt install ffmpeg     # if not already present
nvidia-smi                  # confirm driver + GPU
```

```bash
git clone <repo> fable-flow && cd fable-flow
make install                # = uv sync: creates .venv with Python 3.13, installs from uv.lock
```

That single command downloads Python 3.13 (if missing), creates `.venv`, and installs all pinned dependencies including torch 2.10.0+cu126, vllm 0.19.1, diffusers, transformers, kokoro, reportlab, etc.
FableFlow's LLM client speaks the OpenAI wire protocol, so it works with any OpenAI-compatible server. Point it at one with three env vars.
For example, with `google/gemma-4-31B-it` served at `http://0.0.0.0:8000/v1`:

```bash
cp .env.example .env
```

`.env`:

```bash
export MODEL_SERVER_URL="http://0.0.0.0:8000/v1"
export MODEL_API_KEY="dev-api-key"              # vLLM accepts anything if --api-key not set
export DEFAULT_MODEL="google/gemma-4-31B-it"
```

Source it before running:

```bash
source .env
```

`config/default.yaml` controls model defaults, generation parameters, and styling. Only override what you need:
```yaml
model:
  temperature: 0.8              # 0.5 for proof agent, 0.4 for illustration plan (already overridden in code)
  max_tokens: 64000             # cap per LLM call
  continuation:
    enabled: true               # chain calls to cover long stories
    max_continuations: 5
  image_generation:
    model: "black-forest-labs/FLUX.1-dev"   # or "FLUX.1-schnell" for speed
  video_generation:
    model: "hunyuanvideo-community/HunyuanVideo-I2V"
    num_frames: 129
    num_inference_steps: 50
style:
  illustration_style: "digital watercolor and ink, soft lighting, warm colors"
  music_style: "gentle children's storybook music, soft electronic and orchestral blend"
book:
  author: "FableFlow AI"
  publisher: "FableFlow Publishing"
  publisher_location: "Sydney, Australia"
```

The input is a single JSON file matching `FableFlowInput`. Required: `project`, `characters` (≥1 with `role=protagonist`), `story_seed`. Optional: `production_config`, `dedicated_to`.
The full schema is validated at load time (Pydantic will tell you exactly what's wrong); an annotated example is reproduced at the end of this document.
Two ready-to-run examples live in `examples/`:

- `cassie_beach_adventure_input.json`: short adventure dedicated to Rachel Carson
- `cassie_stephen_hawking_input.json`: longer biography-style book
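To give a feel for what load-time validation checks, the required-field rules stated above can be approximated in a few lines of plain Python. This is a standard-library stand-in, not the project's Pydantic `FableFlowInput` model: `validate_input` and its error messages are hypothetical, and the 3–12 `target_age` range is taken from the annotated schema example.

```python
from typing import Dict, List

REQUIRED_KEYS = ("project", "characters", "story_seed")


def validate_input(spec: Dict) -> List[str]:
    """Return a list of human-readable problems (empty means valid)."""
    errors = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in spec]

    characters = spec.get("characters", [])
    if not any(c.get("role") == "protagonist" for c in characters):
        errors.append("characters: need at least one with role=protagonist")

    target_age = spec.get("project", {}).get("target_age")
    if target_age is not None and not 3 <= target_age <= 12:
        errors.append("project.target_age: must be between 3 and 12")
    return errors


valid_spec = {
    "project": {"title": "Cassie Meets Stephen Hawking", "target_age": 7},
    "characters": [{"name": "Cassie", "role": "protagonist"}],
    "story_seed": {"theme": "learning about Stephen Hawking and the universe"},
}
invalid_spec = {
    "project": {"title": "Untitled", "target_age": 15},
    "characters": [{"name": "Bob", "role": "minor"}],
}
print(validate_input(valid_spec))
print(validate_input(invalid_spec))
```

Collecting every problem before reporting, instead of stopping at the first, mirrors the "tell you exactly what's wrong" behaviour that makes schema validation useful before any model call is made.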
Validate one before generating (cheap, no model calls):
```bash
make validate INPUT=examples/cassie_stephen_hawking_input.json
```

In one terminal:

```bash
make vllm-serve VLLM_MODEL=google/gemma-4-31B-it VLLM_PORT=8000
```

Wait until you see `INFO ... Started server process` and `Uvicorn running on http://0.0.0.0:8000`. Sanity check:

```bash
curl http://0.0.0.0:8000/v1/models
```

In another terminal:

```bash
source .env
make run INPUT=examples/cassie_stephen_hawking_input.json
```

`make run` is shorthand for `fable-flow generate <input>`. Override the output dir or model on the fly:

```bash
make run INPUT=my_book.json OUTPUT=output/my_book MODEL=google/gemma-4-31B-it
```

Skip phases when iterating:

```bash
fable-flow generate examples/cassie_beach_adventure_input.json --book-only
fable-flow generate examples/cassie_beach_adventure_input.json --skip-book-publish
```

Resume after a crash (reuses every artifact already on disk and only produces what's missing):

```bash
fable-flow generate examples/cassie_beach_adventure_input.json --resume
```

Resume granularity is fine-grained: per story stage (draft/edit/proof), per illustration file, per scene sub-step (narration → image → video → music → subtitles → composite), per PDF/EPUB, per final MP4. The scene manifest is persisted after every completed sub-step, so a mid-scene crash continues exactly where it stopped on the next `--resume` run.
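The persist-after-every-sub-step behaviour can be illustrated with a small checkpointing loop. The sketch below is an assumption-laden model of the idea, not FableFlow's coordinator: `produce_scene`, the manifest layout (scene id mapped to a list of finished steps), and the no-op producers are all hypothetical, and it tracks completion only in the manifest rather than also checking artifacts on disk.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

SUB_STEPS = ["narration", "image", "video", "music", "subtitles", "composite"]


def produce_scene(
    scene_id: str,
    manifest_path: Path,
    producers: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Run the missing sub-steps for one scene; return the steps that ran."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    done = manifest.setdefault(scene_id, [])
    ran = []
    for step in SUB_STEPS:
        if step in done:
            continue  # already recorded as complete: reuse the artifact
        producers[step](scene_id)
        done.append(step)
        # Persist immediately so a crash in the next step loses nothing.
        manifest_path.write_text(json.dumps(manifest, indent=2))
        ran.append(step)
    return ran


def crash(scene_id: str) -> None:
    raise RuntimeError("simulated crash during video generation")


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "scene_manifest.json"
    producers = {step: (lambda scene_id: None) for step in SUB_STEPS}

    try:  # first run dies at the video step
        produce_scene("ch1_s01", manifest_path, {**producers, "video": crash})
    except RuntimeError:
        pass

    # second run plays the role of --resume
    resumed_steps = produce_scene("ch1_s01", manifest_path, producers)

print(resumed_steps)
```

Writing the manifest inside the loop, not after it, is what turns a mid-scene crash into a cheap restart: narration and image are skipped on the second run because they were recorded before the failure.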
Final summary prints all generated paths.
After make run INPUT=examples/cassie_stephen_hawking_input.json:
```
output/cassie_stephen_hawking_input/
├── draft_story.txt
├── edited_story.txt
├── final_story.txt
├── book_content.json
├── illustrations/
│   ├── chapter_1_ill_1.png
│   ├── chapter_2_ill_1.png
│   └── ...
├── cassie_meets_stephen_hawking.pdf
├── cassie_meets_stephen_hawking.epub
├── scene_manifest.json
├── scenes/
│   ├── ch1_s01_narration.m4a
│   ├── ch1_s01_image.png
│   ├── ch1_s01_video.mp4
│   ├── ch1_s01_music.mp3
│   ├── ch1_s01.srt
│   ├── ch1_s01_final.mp4
│   └── ...
├── concat_list.txt
├── movie_manifest.json
└── cassie_meets_stephen_hawking.mp4
```
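The `concat_list.txt` in the tree above is the kind of file ffmpeg's concat demuxer reads. As a hedged sketch of that final step, the code below writes such a list and builds (without running) a matching command. Whether FableFlow uses stream copy or re-encodes is not stated here, so `-c copy` is an assumption, as are the function names.

```python
import tempfile
from pathlib import Path
from typing import List


def write_concat_list(scene_files: List[Path], list_path: Path) -> None:
    """Write the input list expected by ffmpeg's concat demuxer."""
    lines = [f"file '{path.as_posix()}'" for path in scene_files]
    list_path.write_text("\n".join(lines) + "\n")


def concat_command(list_path: Path, output: Path) -> List[str]:
    """Build (but do not run) an ffmpeg concat command."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # read a list file of inputs
        "-i", str(list_path),
        "-c", "copy",                  # assumption: scenes share codecs, so no re-encode
        str(output),
    ]


scene_files = [Path("scenes/ch1_s01_final.mp4"), Path("scenes/ch1_s02_final.mp4")]
with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "concat_list.txt"
    write_concat_list(scene_files, list_path)
    concat_text = list_path.read_text()

command = concat_command(Path("concat_list.txt"), Path("movie.mp4"))
print(concat_text)
print(" ".join(command))
```

The command list could be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.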
| Target | What it does |
|---|---|
| `make install` / `make sync` | sync `.venv` from `uv.lock` |
| `make lock` | resolve `uv.lock` |
| `make lock-upgrade` | resolve `uv.lock` with `--upgrade` |
| `make validate INPUT=...` | schema-check an input JSON, no model calls |
| `make run INPUT=... [OUTPUT=...] [MODEL=...]` | run the full pipeline |
| `make vllm-serve VLLM_MODEL=... [VLLM_PORT=8000]` | start a local vLLM OpenAI server |
| `make test` | pytest, no coverage |
| `make cov` | pytest with coverage report |
| `make fmt` | ruff format + autofix |
| `make lint` | ruff format-check + check |
| `make check` | lint + mypy |
| `make clean` | remove caches, build artifacts |
| `make help` | show all targets |
libcudart.so.13: cannot open shared object file — your driver doesn't support CUDA 13. Either upgrade the driver to R580+ and bump vllm>=0.22 in pyproject.toml (also re-pin the torch index from cu126 to cu130), or stay on the cu126 stack which is the default here.
vLLM out of memory — pass --gpu-memory-utilization 0.85 or --max-model-len 8192 to vllm serve. Update the Makefile target if you want these by default.
Phase 2 hits GPU memory — SceneProductionCoordinator lazy-releases the image model before HunyuanVideo loads, but they still both need to fit alongside the cached LLM. If you're sharing the GPU with vLLM, run vLLM on a separate device (CUDA_VISIBLE_DEVICES=0 vllm serve ...) and the pipeline on another (CUDA_VISIBLE_DEVICES=1 fable-flow generate ...).
LLM JSON parsing fails — every agent uses a multi-strategy parser (agents/_json_parse.py) that handles plain JSON, fenced blocks, and prose-embedded JSON. If a specific model frequently produces broken output, lower temperature for that agent or switch models.
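A multi-strategy parser of the kind described here can be sketched with the standard library alone. The example below is not the contents of `agents/_json_parse.py`; `parse_llm_json` is a hypothetical name, and the order of strategies (whole reply, fenced block, first embedded value) is an assumption based on the three cases listed.

```python
import json
import re
from typing import Any

_TICKS = "`" * 3  # built indirectly so this example can sit inside a fence
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def parse_llm_json(text: str) -> Any:
    """Try plain JSON, then a fenced block, then the first embedded JSON value."""
    # Strategy 1: the whole reply is JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Strategy 2: JSON inside a Markdown code fence.
    match = _FENCE.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass

    # Strategy 3: scan for the first position where an object or array decodes.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON found in model output")


fenced_reply = "Here you go:\n" + _TICKS + 'json\n{"chapters": 3}\n' + _TICKS
prose_reply = 'Sure! The plan is {"chapters": 3} as requested.'
print(parse_llm_json('{"chapters": 3}'))
print(parse_llm_json(fenced_reply))
print(parse_llm_json(prose_reply))
```

`JSONDecoder.raw_decode` is what makes the third strategy work: it parses one value starting at a given index and ignores trailing prose, which `json.loads` would reject.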
Audio duration mismatch / pydub warnings on Python 3.13 — audioop was removed from stdlib in 3.13; audioop-lts is pulled in automatically as a conditional dependency.
Annotated input example (full schema):

```json
{
  "project": {
    "title": "Cassie Meets Stephen Hawking",
    "series": "Curious Cassie",              // optional
    "volume": 2,                             // optional, 1+
    "target_age": 7,                         // 3–12
    "genre": "educational biography"
  },
  "characters": [
    {
      "name": "Cassie",
      "age": 6,                              // optional, 1–100
      "role": "protagonist",                 // protagonist | supporting | antagonist | minor
      "personality": "curious, adventurous, kind",
      "appearance": {
        "heritage": "Indian Australian",
        "skin_tone": "warm honey-brown",
        "hair": "shoulder-length wavy black",
        "eyes": "bright curious brown",
        "distinctive_features": ["wide eyes", "infectious smile"]
      },
      "typical_clothing": "colorful sundresses or overalls",
      "relationship": null                   // optional
    }
    // ... at least one character with role=protagonist
  ],
  "story_seed": {
    "theme": "learning about Stephen Hawking and the universe",
    "setting": "Sydney Science Museum",
    "learning_objectives": [                 // optional but recommended
      "Stephen Hawking studied black holes and the universe",
      "He had ALS but never gave up on his dreams",
      "Science is about asking questions"
    ],
    "draft_story": null                      // optional: a seed paragraph to expand
  },
  "production_config": {                     // optional: sensible defaults apply
    "book": {
      "format": ["pdf", "epub"],             // either or both
      "page_count_target": 24,               // 8–100
      "illustration_style": "digital watercolor blend"
    },
    "movie": {
      "enabled": true,
      "include_narration": true,
      "include_subtitles": true,
      "music_style": "gentle orchestral"
    }
  },
  "dedicated_to": {                          // optional: adds the biography back-matter page
    "name": "Stephen Hawking",
    "field": "theoretical physics and cosmology",
    "notable_for": "A Brief History of Time, black hole thermodynamics"
  }
}
```