
docs: add cache-dit to community works #534

Open
DefTruth wants to merge 2 commits into Wan-Video:main from xlite-dev:main

Conversation

@DefTruth

  • [CacheDiT] offers full cache acceleration support for Wan2.1 with DBCache, TaylorSeer, and Cache CFG. Visit their example for more details.

📚 Core Features of CacheDiT

  • 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiT-based pipelines, covering 30+ model series and nearly 100 pipelines, such as FLUX.1, Qwen-Image, Qwen-Image-Lightning, Wan 2.1/2.2, HunyuanImage-2.1, HunyuanVideo, HiDream, AuraFlow, CogView3Plus, CogView4, CogVideoX, LTXVideo, ConsisID, SkyReelsV2, VisualCloze, PixArt, Chroma, Mochi, SD 3.5, DiT-XL, etc.
  • 🎉Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal (see the sketch after this list).
  • 🎉Easy New Model Integration: Features like Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, Hybrid Forward Pattern, and Patch Functor make it highly functional and flexible. For example, we achieved 🎉 Day 1 support for HunyuanImage-2.1 with a 1.7x speedup and no precision loss, even before it was available in the Diffusers library.
  • 🎉State-of-the-Art Performance: Compared with algorithms including Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer, and FoCa, cache-dit achieved SOTA performance with a 7.4x↑🎉 speedup on ClipScore!
  • 🎉Support for 4/8-Step Distilled Models: Surprisingly, cache-dit's DBCache works for extremely few-step distilled models, something many other methods fail to do.
  • 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, model CPU offload, sequential CPU offload, group offloading, quantization (torchao, 🔥nunchaku), etc.
  • 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes (e.g., DBCache or DBPrune + TaylorSeerCalibrator). DBCache or DBPrune acts as the indicator that decides when to cache, while the calibrator decides how to cache. More mainstream cache acceleration algorithms (e.g., FoCa) will be supported in the future, along with additional benchmarks; stay tuned for updates!
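
Below is a minimal usage sketch of the one-line API named above, assuming a Diffusers Wan 2.1 text-to-video pipeline. The checkpoint id, dtype, and prompt are illustrative; cache_dit.enable_cache(...) is the only cache-dit call shown, and it is applied with its defaults.

import torch
from diffusers import WanPipeline
import cache_dit

# Load a Wan 2.1 pipeline from Diffusers (illustrative checkpoint id).
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# One line to enable cache acceleration (default settings assumed).
cache_dit.enable_cache(pipe)

# Afterwards, use the pipeline exactly as normal.
video = pipe(prompt="A cat walks on the grass").frames[0]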

BlockAdapter for Wan

https://github.com/vipshop/cache-dit/blob/25892385e2d01ee9968dfb12d3d34af3fefea553/src/cache_dit/cache_factory/block_adapters/__init__.py#L77

# Snippet from src/cache_dit/cache_factory/block_adapters/__init__.py
# (permalink above); BlockAdapter, BlockAdapterRegistry, and
# ForwardPattern are defined/imported in that module.
@BlockAdapterRegistry.register("Wan")
def wan_adapter(pipe, **kwargs) -> BlockAdapter:
    from diffusers import (
        WanTransformer3DModel,
        WanVACETransformer3DModel,
    )

    # Both the Wan and Wan-VACE transformers expose the same block layout.
    assert isinstance(
        pipe.transformer,
        (WanTransformer3DModel, WanVACETransformer3DModel),
    )
    if getattr(pipe, "transformer_2", None):
        assert isinstance(
            pipe.transformer_2,
            (WanTransformer3DModel, WanVACETransformer3DModel),
        )
        # Wan 2.2 MoE: the pipeline carries two transformers, so both
        # are registered for caching with the same forward pattern.
        return BlockAdapter(
            pipe=pipe,
            transformer=[
                pipe.transformer,
                pipe.transformer_2,
            ],
            blocks=[
                pipe.transformer.blocks,
                pipe.transformer_2.blocks,
            ],
            forward_pattern=[
                ForwardPattern.Pattern_2,
                ForwardPattern.Pattern_2,
            ],
            # CFG runs as a separate forward pass and is cached
            # separately (Cache CFG).
            has_separate_cfg=True,
            **kwargs,
        )
    else:
        # Wan 2.1: a single transformer.
        return BlockAdapter(
            pipe=pipe,
            transformer=pipe.transformer,
            blocks=pipe.transformer.blocks,
            forward_pattern=ForwardPattern.Pattern_2,
            has_separate_cfg=True,
            **kwargs,
        )
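
To illustrate the "Easy New Model Integration" point above, here is a hedged sketch of registering an adapter for a new single-transformer DiT pipeline, patterned on the Wan adapter. "MyDiT", my_dit_adapter, and the pipeline attributes are hypothetical; only the names already shown above (BlockAdapterRegistry, BlockAdapter, ForwardPattern) are reused.

# Hypothetical registration, mirroring wan_adapter above; "MyDiT"
# and the attribute names are invented for illustration.
@BlockAdapterRegistry.register("MyDiT")
def my_dit_adapter(pipe, **kwargs) -> BlockAdapter:
    return BlockAdapter(
        pipe=pipe,
        transformer=pipe.transformer,
        blocks=pipe.transformer.blocks,  # the stack of DiT blocks to cache
        forward_pattern=ForwardPattern.Pattern_2,  # must match the blocks' forward signature
        **kwargs,
    )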

