Skip to content

suneeta-mall/fable-flow

Repository files navigation

FableFlow

FableFlow Logo

AI-powered children's book and movie production pipeline.

FableFlow turns a one-page input spec (characters, theme, learning objectives) into a complete book and animated movie. Every artifact is generated by real models — there are no placeholders.

input.json
   │
   ▼  Phase 1 — Book
draft  →  edit  →  proof
   │
   ▼  chapters  →  illustration plan  →  illustration rendering (FLUX / SDXL)
   │
   ▼  reflection (per chapter)  →  experiment page  →  biography (if `dedicated_to`)
   │
   ▼  book_content.json  →  PDF  +  EPUB
   │
   ▼  Phase 2 — Movie
scene extraction  →  per-scene production
                       (Kokoro TTS → image → HunyuanVideo I2V → MusicGen → subtitles → composite)
   │
   ▼  ffmpeg concat  →  final MP4

What you get

For a single input file, FableFlow writes to output/<input_name>/:

File / dir Source
draft_story.txt, edited_story.txt, final_story.txt three story-development agents (LLM)
book_content.json structured book (chapters, illustrations, reflection questions, experiment, biography)
illustrations/chapter_N_ill_M.png rendered with FLUX-dev (or your configured image model)
<title>.pdf, <title>.epub publication-ready book in both formats
scene_manifest.json per-scene narration + timing
scenes/<scene>_narration.m4a, <scene>_image.png, <scene>_video.mp4, <scene>_music.mp3, <scene>.srt, <scene>_final.mp4 per-scene multimedia
<title>.mp4 final movie (ffmpeg-concatenated scenes)
movie_manifest.json chapter markers + timecodes for the final movie

Every chapter ends with 3 reflection questions ("Think About It"). The back matter has a "Try This at Home" experiment and, if dedicated_to is set, a "Meet ..." one-page biography.


Chapter writing quality

The story agents enforce three chapter-level requirements, and the drafter uses chapter-by-chapter generation (one LLM call per chapter) when a chapter_outline is in the input spec. This is the right architecture for content-rich science books: a single mega-prompt asking for 8 deep chapters at once usually terminates after chapter 1; bounded per-chapter calls reliably produce all of them.

  1. Each chapter is 700–1100 words (3–4 illustrated picture-book pages) — substantial enough for depth, achievable in one focused LLM call.

  2. STEM / educational content is explained by an adult character, not the child. The draft prompt looks at your input spec, picks the adult characters (supporting/protagonist roles aged ≥18 — _adult_authority_names), names them in each per-chapter prompt, and asks the LLM to route explanations through them: a concrete analogy, then the mechanism in 2–3 connected sentences, then a tie-back to what the child just observed. The child asks; the adult teaches. Hallucinated science is explicitly disallowed in the prompt ("if you are not certain a fact is accurate, do not include it").

  3. Each chapter ends with one thematic poem. Forms allowed: rhyming quatrain, haiku, free verse, or couplet — the drafter picks what fits the moment. The chapter structurer extracts poems into chapter.poem; the publishers render them centered + italic between ✦ ✦ ✦ sparkle markers, between prose and reflection questions.

Each per-chapter call also receives:

  • A rolling synopsis of earlier chapters (so chapter 5 knows what happened in 1–4).
  • One rotated recurring motif to weave through this chapter (so each motif lands in multiple chapters).
  • A small vocabulary slice — the first N chapters introduce one new word each.
  • The featured-moment block (e.g. a verbatim recorded message) if the moment lives in this chapter.
  • The corresponding chunk of the user's draft_story (if provided), located by CHAPTER N markers.

After all chapters are drafted, the editor agent does a single cross-cutting polish pass for motif coverage, transitions, and any chapter still missing a poem.


Image & video quality

Character consistency (the same character looking the same across every illustration in a book) is the hardest part of multi-image generation. FableFlow handles it with three layers:

  1. Strict identity anchors in prompts. Every illustration prompt opens with a verbatim identity block per scene character (heritage → skin tone → hair → eyes → distinctive features → clothing). The string is identical across calls so the model latches onto the same identity. See agents/_image_prompts.py.

  2. Canonical reference images. At the start of Phase 1, CharacterReferenceAgent renders one portrait + one full-body reference per major character. These land in output/<name>/character_refs/ and form a visual canon you can inspect.

  3. Visual conditioning on the reference. Every chapter illustration and scene image is generated with the canonical portrait fed in as image conditioning. The mechanism depends on the base model:

    • FLUX 2 (default): native image= parameter on Flux2Pipeline. No extra weights to download.
    • FLUX 1 (legacy): IP-Adapter via XLabs-AI/flux-ip-adapter-v2 (note the "v2" is the IP-Adapter version, not the FLUX version — there is no FLUX 2 IP-Adapter because FLUX 2 doesn't need one).
    • SDXL: IP-Adapter via h94/IP-Adapter (ip-adapter-plus_sdxl_vit-h).
    • SD3: text-only fallback (no stable IP-Adapter yet).

Model choices (in config/default.yaml):

Modality Default Alternatives
Image black-forest-labs/FLUX.2-dev (~24GB, native image conditioning) FLUX.2-klein-9B (~18GB), FLUX.2-klein-4B (~10GB), FLUX.1-dev (legacy, IP-Adapter), stable-diffusion-3.5-large, stable-diffusion-xl-base-1.0
Video Wan-AI/Wan2.1-I2V-14B-720P-Diffusers (~25GB) Wan-AI/Wan2.1-I2V-14B-480P-Diffusers (smaller), hunyuanvideo-community/HunyuanVideo-I2V
Music facebook/musicgen-small
TTS kokoro (af_heart voice)

Switching models is a one-line edit in the config — no code changes needed. EnhancedImageModel and EnhancedVideoModel dispatch on the model name and pick the right diffusers pipeline.


Fitting on hardware

FableFlow loads the LLM (via vLLM), an image model (FLUX 2 by default), TTS, MusicGen, and a video model. The LLM typically lives on a separate machine via the OpenAI-compatible endpoint; the rest run locally. The local models are sized for a single 80 GB GPU as long as you use CPU offload — which the loader does automatically.

Two things to know:

  1. Don't disable the CPU offload. FLUX 2-dev's transformer + text encoder don't fit on an 80 GB card if loaded fully. The image and video model loaders use diffusers' enable_model_cpu_offload and do not call pipeline.to("cuda") alongside it. (Doing both is a common mistake that loads everything onto GPU first and then OOMs — see _place_on_device in models/image.py.)

  2. If you still hit OOM, lower the VRAM footprint via model.image_generation.cpu_offload:

Strategy cpu_offload VRAM (FLUX.2-dev) Speed
Model offload (default) "model" ~30 GB peak normal
Sequential offload "sequential" ~8 GB peak ~3× slower
Full GPU (no offload) "none" ~60+ GB fastest

Or switch to a smaller variant — all set via model.image_generation.model in config/default.yaml:

Model VRAM (with "model" offload) Notes
FLUX.2-dev ~30 GB default, full quality
FLUX.2-klein-9B ~22 GB smaller FLUX 2
FLUX.2-klein-9b-fp8 ~10 GB FP8 quantized; best size/quality ratio for shared GPUs
FLUX.2-klein-9b-nvfp4 ~6 GB NVFP4 quantized; smallest FLUX 2
FLUX.1-dev ~12 GB legacy, uses XLabs IP-Adapter for character refs

If vLLM runs on the same machine (same GPU) as fable-flow, split GPUs via CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=0 make vllm-serve VLLM_MODEL=...    # vLLM on GPU 0
CUDA_VISIBLE_DEVICES=1 make run INPUT=...                 # fable-flow on GPU 1

Prerequisites

Required
Python 3.13 (managed by uv automatically)
uv 0.9+ — install: curl -LsSf https://astral.sh/uv/install.sh | sh
ffmpeg needed for scene composition + final concat
NVIDIA GPU recommended for image, music, video, and serving the LLM. A single A100 80GB runs everything.
NVIDIA driver R525.60.13+ (CUDA 12.6 compat). For driver R555+ you can bump to vllm 0.20+ — see Upgrading vllm below.
sudo apt install ffmpeg          # if not already present
nvidia-smi                       # confirm driver + GPU

Install

git clone <repo> fable-flow && cd fable-flow
make install      # = uv sync — creates .venv with Python 3.13, installs from uv.lock

That single command downloads Python 3.13 (if missing), creates .venv, and installs all pinned dependencies including torch 2.10.0+cu126, vllm 0.19.1, diffusers, transformers, kokoro, reportlab, etc.


Configure

FableFlow's LLM client speaks the OpenAI wire protocol, so it works with any OpenAI-compatible server. Point it at one with three env vars.

For your setup — google/gemma-4-31B-it on http://0.0.0.0:8000/v1:

cp .env.example .env

.env:

export MODEL_SERVER_URL="http://0.0.0.0:8000/v1"
export MODEL_API_KEY="dev-api-key"      # vLLM accepts anything if --api-key not set
export DEFAULT_MODEL="google/gemma-4-31B-it"

Source it before running:

source .env

Tuning (optional)

config/default.yaml controls model defaults, generation parameters, and styling. Only override what you need:

model:
  temperature: 0.8           # 0.5 for proof agent, 0.4 for illustration plan (already overridden in code)
  max_tokens: 64000          # cap per LLM call
  continuation:
    enabled: true            # chain calls to cover long stories
    max_continuations: 5
  image_generation:
    model: "black-forest-labs/FLUX.1-dev"    # or "FLUX.1-schnell" for speed
  video_generation:
    model: "hunyuanvideo-community/HunyuanVideo-I2V"
    num_frames: 129
    num_inference_steps: 50

style:
  illustration_style: "digital watercolor and ink, soft lighting, warm colors"
  music_style: "gentle children's storybook music, soft electronic and orchestral blend"

book:
  author: "FableFlow AI"
  publisher: "FableFlow Publishing"
  publisher_location: "Sydney, Australia"

Create the input data

The input is a single JSON file matching FableFlowInput. Required: project, characters (≥1 with role=protagonist), story_seed. Optional: production_config, dedicated_to.

Full schema (validated at load time — Pydantic will tell you exactly what's wrong):

{
  "project": {
    "title": "Cassie Meets Stephen Hawking",
    "series": "Curious Cassie",       // optional
    "volume": 2,                      // optional, 1+
    "target_age": 7,                  // 3–12
    "genre": "educational biography"
  },

  "characters": [
    {
      "name": "Cassie",
      "age": 6,                       // optional, 1–100
      "role": "protagonist",          // protagonist | supporting | antagonist | minor
      "personality": "curious, adventurous, kind",
      "appearance": {
        "heritage": "Indian Australian",
        "skin_tone": "warm honey-brown",
        "hair": "shoulder-length wavy black",
        "eyes": "bright curious brown",
        "distinctive_features": ["wide eyes", "infectious smile"]
      },
      "typical_clothing": "colorful sundresses or overalls",
      "relationship": null            // optional
    }
    // ... at least one character with role=protagonist
  ],

  "story_seed": {
    "theme": "learning about Stephen Hawking and the universe",
    "setting": "Sydney Science Museum",
    "learning_objectives": [          // optional but recommended
      "Stephen Hawking studied black holes and the universe",
      "He had ALS but never gave up on his dreams",
      "Science is about asking questions"
    ],
    "draft_story": null               // optional: a seed paragraph to expand
  },

  "production_config": {              // optional — sensible defaults apply
    "book": {
      "format": ["pdf", "epub"],      // either or both
      "page_count_target": 24,        // 8–100
      "illustration_style": "digital watercolor blend"
    },
    "movie": {
      "enabled": true,
      "include_narration": true,
      "include_subtitles": true,
      "music_style": "gentle orchestral"
    }
  },

  "dedicated_to": {                   // optional — adds the biography back-matter page
    "name": "Stephen Hawking",
    "field": "theoretical physics and cosmology",
    "notable_for": "A Brief History of Time, black hole thermodynamics"
  }
}

Two ready-to-run examples live in examples/:

  • cassie_beach_adventure_input.json — short adventure dedicated to Rachel Carson
  • cassie_stephen_hawking_input.json — longer biography-style book

Validate one before generating (cheap, no model calls):

make validate INPUT=examples/cassie_stephen_hawking_input.json

Run end-to-end

1. Serve the LLM

In one terminal:

make vllm-serve VLLM_MODEL=google/gemma-4-31B-it VLLM_PORT=8000

Wait until you see INFO ... Started server process and Uvicorn running on http://0.0.0.0:8000. Sanity check:

curl http://0.0.0.0:8000/v1/models

2. Run the pipeline

In another terminal:

source .env
make run INPUT=examples/cassie_stephen_hawking_input.json

make run is shorthand for fable-flow generate <input>. Override the output dir or model on the fly:

make run INPUT=my_book.json OUTPUT=output/my_book MODEL=google/gemma-4-31B-it

Skip phases when iterating:

fable-flow generate examples/cassie_beach_adventure_input.json --book-only
fable-flow generate examples/cassie_beach_adventure_input.json --skip-book-publish

Resume after a crash — reuses every artifact already on disk and only produces what's missing:

fable-flow generate examples/cassie_beach_adventure_input.json --resume

Resume granularity is fine-grained: per story stage (draft/edit/proof), per illustration file, per scene sub-step (narration → image → video → music → subtitles → composite), per PDF/EPUB, per final MP4. The scene manifest is persisted after every completed sub-step, so a mid-scene crash continues exactly where it stopped on the next --resume run.

Final summary prints all generated paths.


Output layout

After make run INPUT=examples/cassie_stephen_hawking_input.json:

output/cassie_stephen_hawking_input/
├── draft_story.txt
├── edited_story.txt
├── final_story.txt
├── book_content.json
├── illustrations/
│   ├── chapter_1_ill_1.png
│   ├── chapter_2_ill_1.png
│   └── ...
├── cassie_meets_stephen_hawking.pdf
├── cassie_meets_stephen_hawking.epub
├── scene_manifest.json
├── scenes/
│   ├── ch1_s01_narration.m4a
│   ├── ch1_s01_image.png
│   ├── ch1_s01_video.mp4
│   ├── ch1_s01_music.mp3
│   ├── ch1_s01.srt
│   ├── ch1_s01_final.mp4
│   └── ...
├── concat_list.txt
├── movie_manifest.json
└── cassie_meets_stephen_hawking.mp4

Make targets

Target What it does
make install / make sync sync .venv from uv.lock
make lock resolve uv.lock
make lock-upgrade resolve uv.lock with --upgrade
make validate INPUT=... schema-check an input JSON, no model calls
make run INPUT=... [OUTPUT=...] [MODEL=...] run the full pipeline
make vllm-serve VLLM_MODEL=... [VLLM_PORT=8000] start a local vLLM OpenAI server
make test pytest, no coverage
make cov pytest with coverage report
make fmt ruff format + autofix
make lint ruff format-check + check
make check lint + mypy
make clean remove caches, build artifacts
make help show all targets

Troubleshooting

libcudart.so.13: cannot open shared object file — your driver doesn't support CUDA 13. Either upgrade the driver to R580+ and bump vllm>=0.22 in pyproject.toml (also re-pin the torch index from cu126 to cu130), or stay on the cu126 stack which is the default here.

vLLM out of memory — pass --gpu-memory-utilization 0.85 or --max-model-len 8192 to vllm serve. Update the Makefile target if you want these by default.

Phase 2 hits GPU memorySceneProductionCoordinator lazy-releases the image model before HunyuanVideo loads, but they still both need to fit alongside the cached LLM. If you're sharing the GPU with vLLM, run vLLM on a separate device (CUDA_VISIBLE_DEVICES=0 vllm serve ...) and the pipeline on another (CUDA_VISIBLE_DEVICES=1 fable-flow generate ...).

LLM JSON parsing fails — every agent uses a multi-strategy parser (agents/_json_parse.py) that handles plain JSON, fenced blocks, and prose-embedded JSON. If a specific model frequently produces broken output, lower temperature for that agent or switch models.

Audio duration mismatch / pydub warnings on Python 3.13audioop was removed from stdlib in 3.13; audioop-lts is pulled in automatically as a conditional dependency.


License

Elastic-2.0

About

Where Stories Come to Life with AI Magic

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors