Video Quality Issues: If your generated videos appear scrambled or distorted, this typically means you're not using the optimal video dimensions that the selected model was trained on. Each AI model has specific resolution requirements for best results. Check the model documentation for recommended dimensions and adjust your video settings accordingly.
Contributors Welcome! 🚀 This project is open to contributions from the community. If you're interested in helping improve this pipeline, adding new models, or fixing bugs, please feel free to submit pull requests or open issues.
New Project Announcement: I've started working on a completely separate video generation project. If you're interested in learning more or collaborating, feel free to reach out to me on LinkedIn!
An extensible, modular pipeline for generating short-form videos using a variety of AI models. This tool provides a powerful Streamlit-based web interface to define a video topic, select different AI models for each generation step (language, speech, image, video), and orchestrate the entire content creation process from script to final rendered video.
- End-to-End Video Generation: Go from a single topic idea to a fully edited video with narration, background visuals, and text overlays in one integrated workflow.
- Fully Modular Architecture: Easily add, remove, or swap different AI models for each part of the pipeline. The system is designed for extension.
- Dynamic Model Discovery: The application automatically discovers any new model modules you add, making them immediately available for selection in the UI.
- Dual Generation Workflows:
- Image-to-Video (High Quality): Generates a keyframe image first, then animates it. Offers higher visual quality and control.
- Text-to-Video (Fast): Generates video clips directly from text prompts for a faster, more streamlined process.
- Character Consistency: Utilizes IP-Adapters in supported models (like Juggernaut-XL) to maintain the appearance of a specific character or subject across different scenes.
- Interactive Project Dashboard: Once a project is created, you have full control. Edit scripts, regenerate audio, modify visual prompts, and see the progress of every task in real-time.
- Stateful Project Management: Stop and resume your work at any time. The entire project state is saved, allowing you to load existing projects, make changes, and continue where you left off.
- Multi-Language Voice Generation: Generate narration in over 15 languages (including English, Spanish, French, German, Japanese, Hindi, and more) using advanced TTS models.
- Voice Cloning: Provide a short `.wav` file of a reference voice to clone it for the video's narration, powered by Coqui XTTS (see the standalone sketch below this list).
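To give a feel for what the voice-cloning step does, here is a minimal standalone sketch that calls the Coqui TTS library directly. It is independent of this project's TTS modules, and the reference/output file paths are placeholders:

```python
# Standalone illustration of XTTS voice cloning with the coqui-tts package;
# the pipeline's own TTS modules wrap this kind of call behind the BaseTTS interface.
from TTS.api import TTS

# Load the multilingual XTTS v2 model (weights are downloaded on first use).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice from a short reference .wav and synthesize narration in English.
tts.tts_to_file(
    text="Welcome to today's video.",
    speaker_wav="reference_voice.wav",  # placeholder: your reference clip
    language="en",
    file_path="narration.wav",          # placeholder: output location
)
```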
- Text-to-Music (TTM) Modules
  - Background music generation for videos
  - Pure music production capabilities
  - Integration with the existing video pipeline
- Additional Model Support
  - FramePack and other advanced video generation models
  - Enhanced model compatibility and optimization
  - LoRA (Low-Rank Adaptation) support for fine-tuning models
  - Custom LoRA training and management interface
  - ControlNet integration for pose, depth, and style control
  - Advanced ControlNet features (canny, segmentation, etc.)
- Character Consistency Features
  - LoRA-based character consistency across scenes
  - Character style preservation and transfer
  - Multi-character management system
  - Character pose and expression control
- Advanced Editing Features
  - Multilayer timeline-style editor
  - Professional-grade video editing capabilities
  - Enhanced control over transitions and effects
- UI/UX Improvements
  - Migration to a FastAPI backend
  - Modern frontend with React/Vue
  - Enhanced user experience and performance
- Production Infrastructure
  - Distributed model serving system
  - Load balancing across multiple GPUs/servers
  - Model caching and optimization
  - User quota and resource management
  - Queue management for multiple users
  - Real-time progress tracking and status updates
  - Automatic failover and recovery
  - Resource usage analytics and monitoring
The pipeline follows a state-driven, sequential process. The `ProjectManager` tracks the status of every task in a `project.json` file. The `TaskExecutor` then reads this state and executes the next pending task using the specific modules you selected for the project.
```mermaid
graph TD
    A[Start: Create New Project in UI] --> B{Select Models & Workflow};
    B --> C[Provide Topic & Settings];
    C --> D[Project Initialized - project.json];
    D --> E[Task: Generate Script LLM];
    E --> F[Task: Generate Audio TTS];
    F --> G[Task: Create Scene Shots - LLM];

    subgraph "For Each Scene Shot"
        direction LR
        G --> H{I2V or T2V Flow?};
        H -- I2V --> I[Task: Gen Image T2I];
        I --> J[Task: Gen Video I2V];
        H -- T2V --> K[Task: Gen Video T2V];
    end

    J --> L[All Shots Done?];
    K --> L;
    L -- Yes --> M[Task: Assemble Scene Videos];
    M --> N[All Scenes Done?];
    N -- Yes --> O[Task: Assemble Final Reel];
    O --> P[✅ Final Video Complete];
```
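To make the state-driven design concrete, a trimmed-down `project.json` might look roughly like the sketch below. The exact keys and status values are an illustrative assumption, not the file's actual schema:

```json
{
  "topic": "The secret life of octopuses",
  "workflow": "image_to_video",
  "models": {
    "llm": "llm_zephyr",
    "tts": "tts_xtts",
    "t2i": "t2i_juggernaut_xl",
    "i2v": "i2v_ltx"
  },
  "tasks": {
    "generate_script": "done",
    "generate_audio": "done",
    "scene_1_shot_1_image": "pending",
    "scene_1_shot_1_video": "pending",
    "assemble_final_reel": "pending"
  }
}
```

On each run, the executor scans this state for the first task that is not yet done and dispatches it to the module you selected for that stage.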
This project uses uv for fast package management.
1. Prerequisites
- Python 3.10 or newer.
- `git` for cloning the repository.
- For GPU acceleration (highly recommended): an NVIDIA GPU with CUDA drivers installed.
- FFmpeg: Required by `moviepy` for video processing. Ensure it's installed and accessible in your system's PATH.
  - Ubuntu: `sudo apt update && sudo apt install ffmpeg`
  - macOS (with Homebrew): `brew install ffmpeg`
  - Windows: Download from the official site and add the `bin` folder to your PATH.
2. Clone the Repository
```bash
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
```

3. Set up a Virtual Environment and Install Dependencies

First, install `uv`:

```bash
pip install uv
```

Next, create a virtual environment and install all required packages using `uv`. This single command installs all dependencies, including PyTorch for your specific CUDA version (or the CPU build if CUDA is not available).

```bash
# Create a virtual environment
uv venv

# Activate the environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install all packages using the provided command
uv pip install torch torchvision torchaudio coqui-tts transformers streamlit sentencepiece moviepy psutil gputil ftfy "huggingface-hub[cli]" hf-transfer accelerate bitsandbytes pydantic --no-build-package llvmlite
```

Note: The `--no-build-package llvmlite` flag is included to prevent `uv` from trying to build the `llvmlite` package from source, which can fail without the proper LLVM toolchain. This forces it to use a pre-compiled wheel.
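As an optional sanity check (not part of the project itself), you can confirm that PyTorch was installed with GPU support before launching the app:

```python
# Optional: verify that PyTorch sees your GPU after installation.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```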
With your virtual environment activated, launch the Streamlit app:
```bash
streamlit run app.py
```

Your web browser should automatically open to the application's UI.
- Create a New Project: On the main page, fill out the "Create New Project" form.
- Generation Flow: Choose between "Image to Video" (high quality) or "Text to Video" (fast).
- Model Selection: Select your desired AI models from the dropdowns for each stage.
- Topic: Enter the subject of your video.
- Settings: Configure the video format, length, and number of scenes.
- Characters (Optional): If you select a model flow that supports character consistency, you can upload reference images for your subjects.
- Processing Dashboard: After creating the project, you'll be taken to the dashboard.
- Automatic Mode: Toggle "Automatic Mode" and click "Start" to have the pipeline run through all the steps automatically.
- Manual Control: With automatic mode off, you can manually trigger each step (e.g., "Gen Audio", "Gen Image"). This is perfect for fine-tuning.
- Edit Everything: Click into any text box to edit the script narration or visual prompts, then regenerate that specific part.
- Final Assembly: Once all scenes and clips are generated, a button will appear to assemble the final video. Click it to view the finished product, complete with subtitles and synchronized audio.
The pipeline is designed for easy extension. To add a new AI model, you simply need to create a Python class that inherits from one of the abstract base classes in base_modules.py and implements its required methods.
The Core Contract: base_modules.py
This file defines the interface for every module type:
- `BaseLLM`: For language models.
- `BaseTTS`: For text-to-speech models.
- `BaseT2I`: For text-to-image models.
- `BaseI2V`: For image-to-video models.
- `BaseT2V`: For text-to-video models.
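For orientation, a contract like `BaseI2V` typically pairs an abstract interface with a Pydantic config class. The sketch below is illustrative only; the method names are taken from the walkthrough that follows, and the exact contents of `base_modules.py` may differ:

```python
# Illustrative sketch only — the real definitions live in base_modules.py.
from abc import ABC, abstractmethod
from typing import Optional

from pydantic import BaseModel


class BaseModuleConfig(BaseModel):
    """Assumed shape of the shared Pydantic base for per-module settings."""
    model_id: str = ""


class ModuleCapabilities(BaseModel):
    """Assumed shape of the spec sheet the UI reads to build dropdowns."""
    title: str
    vram_gb_min: float = 0.0
    ram_gb_min: float = 0.0
    supports_ip_adapter: bool = False
    max_subjects: int = 0


class BaseI2V(ABC):
    # Subclasses point this at their own config class.
    Config = BaseModuleConfig

    def __init__(self, config: Optional[BaseModuleConfig] = None):
        self.config = config or self.Config()
        self.pipe = None  # lazily loaded model pipeline

    @classmethod
    @abstractmethod
    def get_capabilities(cls) -> ModuleCapabilities:
        """Describe what this module can do, for the UI."""

    @abstractmethod
    def generate_video_from_image(self, image_path: str, output_video_path: str, *args, **kwargs) -> str:
        """Generate a clip from a keyframe image and return the output path."""

    @abstractmethod
    def clear_vram(self) -> None:
        """Release the model from GPU memory."""
```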
Let's create a new hypothetical Image-to-Video module called "MotionWeaver".
1. Create the File
Create a new file in the appropriate directory: i2v_modules/i2v_motion_weaver.py.
2. Define the Config and Module Classes
In your new file, set up the basic structure.
```python
# In i2v_modules/i2v_motion_weaver.py
import torch
from typing import Dict, Any, List, Optional, Union

# Import from the project's own files
from base_modules import BaseI2V, BaseModuleConfig, ModuleCapabilities
from config_manager import DEVICE, clear_vram_globally, ContentConfig

# Step 2a: Define a Pydantic config for your model's parameters.
class MotionWeaverI2VConfig(BaseModuleConfig):
    model_id: str = "some-repo/motion-weaver-pro"
    num_inference_steps: int = 20
    motion_strength: float = 0.9

# Step 2b: Create the main class inheriting from the correct base class.
class MotionWeaverI2V(BaseI2V):
    # Link your config class
    Config = MotionWeaverI2VConfig

    # Implement all required abstract methods...
```

3. Implement `get_capabilities()`
This is the most important method for UI integration. It tells the application what your model can do, and this information is used to populate dropdowns and enable/disable features.
```python
# Inside the MotionWeaverI2V class
@classmethod
def get_capabilities(cls) -> ModuleCapabilities:
    """Returns the spec sheet for this module."""
    return ModuleCapabilities(
        # This title appears in the UI dropdown. Be descriptive!
        title="MotionWeaver Pro (Smooth & Cinematic)",
        vram_gb_min=10.0,
        ram_gb_min=16.0,
        supports_ip_adapter=False,  # This model doesn't support it
        max_subjects=0,
    )
```

4. Implement Core Functionality (`generate_video_from_image`)
This is where you call your model's code. The method signature is strictly defined by BaseI2V.
```python
# Inside the MotionWeaverI2V class
def generate_video_from_image(self, image_path: str, output_video_path: str, target_duration: float, content_config: ContentConfig, visual_prompt: str, motion_prompt: Optional[str], ip_adapter_image: Optional[Union[str, List[str]]] = None) -> str:
    # 1. Load the model (if not already loaded)
    self._load_pipeline()

    # 2. Prepare inputs (e.g., load image, calculate frames)
    from diffusers.utils import load_image, export_to_video
    input_image = load_image(image_path)
    num_frames = int(target_duration * content_config.fps)

    # 3. Call the pipeline
    video_frames = self.pipe(
        image=input_image,
        prompt=visual_prompt,  # Use the prompts provided by the controller
        motion_prompt=motion_prompt,
        num_frames=num_frames,
        motion_strength=self.config.motion_strength
    ).frames

    # 4. Save the output and return the path
    export_to_video(video_frames, output_video_path, fps=content_config.fps)
    print(f"MotionWeaver video saved to {output_video_path}")
    return output_video_path
```

5. Implement VRAM Management and Other Helpers
To manage memory, the pipeline loads and unloads models as needed.
```python
# Inside the MotionWeaverI2V class
def _load_pipeline(self):
    """Loads the model into memory. Should be idempotent."""
    if self.pipe is None:
        from some_library import MotionWeaverPipeline  # Local import
        print(f"Loading MotionWeaver pipeline: {self.config.model_id}...")
        self.pipe = MotionWeaverPipeline.from_pretrained(
            self.config.model_id, torch_dtype=torch.float16
        ).to(DEVICE)

def clear_vram(self):
    """Releases the model from VRAM."""
    print("Clearing MotionWeaver VRAM...")
    if self.pipe is not None:
        clear_vram_globally(self.pipe)  # Use the global helper
        self.pipe = None

def get_model_capabilities(self) -> Dict[str, Any]:
    """Return technical details about the model."""
    return {
        "resolutions": {"Portrait": (512, 768), "Landscape": (768, 512)},
        "max_shot_duration": 4.0  # Max video length it can generate at once
    }
```

6. Register the Module
Finally, open the `__init__.py` file in the same directory (`i2v_modules/__init__.py`) and add an import for your new class. This makes it discoverable.
```python
# In i2v_modules/__init__.py
from .i2v_ltx import LtxI2V
from .i2v_svd import SvdI2V
from .i2v_slideshow import SlideshowI2V
from .i2v_motion_weaver import MotionWeaverI2V  # <-- Add this line
```

That's it! The next time you run `streamlit run app.py`, "MotionWeaver Pro (Smooth & Cinematic)" will appear as an option in the Image-to-Video Model dropdown.
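For context on why a single import line is enough: discovery usually amounts to importing each module package and then walking the base class's subclasses. The helper below is only a guess at how `module_discovery.py` might work; the function name `discover_i2v_modules` is hypothetical:

```python
# Hypothetical sketch — the actual logic lives in module_discovery.py.
import i2v_modules  # importing the package runs i2v_modules/__init__.py, which imports every module

from base_modules import BaseI2V


def discover_i2v_modules() -> dict:
    """Map UI dropdown titles to module classes by walking BaseI2V subclasses."""
    return {
        cls.get_capabilities().title: cls
        for cls in BaseI2V.__subclasses__()
    }


# The Image-to-Video dropdown could then be populated from discover_i2v_modules().keys().
```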
```
.
├── app.py                 # Main Streamlit web application
├── base_modules.py        # Abstract base classes for all modules (The Contract)
├── config_manager.py      # Pydantic configs and global settings
├── module_discovery.py    # Service to automatically find and load modules
├── project_manager.py     # Handles loading, saving, and managing project state
├── task_executor.py       # Orchestrates the execution of generation tasks
├── ui_task_executor.py    # Bridges the UI with the task executor
├── utils.py               # Shared utility functions
├── video_assembly.py      # Functions for combining clips into the final video
├── llm_modules/           # Language model modules
│   ├── __init__.py
│   └── llm_zephyr.py
├── tts_modules/           # Text-to-Speech modules
├── t2i_modules/           # Text-to-Image modules
├── i2v_modules/           # Image-to-Video modules
└── t2v_modules/           # Text-to-Video modules
```
This project is licensed under the MIT License - see the LICENSE file for details.
While this project itself is MIT-licensed, the AI models used within this pipeline (including but not limited to language models, text-to-speech models, image generation models, and video generation models) are subject to their own respective licenses. Users of this project are responsible for:
- Reviewing and complying with the license terms of each model they choose to use
- Ensuring they have the necessary rights and permissions to use these models
- Understanding that different models may have different usage restrictions, commercial terms, and attribution requirements
The MIT license of this project does not override or modify the license terms of any third-party models. Users must independently verify and comply with all applicable model licenses before use.