Dev feature caching #68

Linboyan-trc · 2025-06-15T05:18:35Z

I would like a dev_feature_caching branch to be created upstream, and this PR merged into that branch.

gemini-code-assist

Summary of Changes

Hello @Linboyan-trc, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive framework for implementing and utilizing different feature caching strategies within the transformer inference process for both Hunyuan and Wan models. The goal is to potentially improve inference speed by selectively skipping full computations and using cached approximations based on various algorithms (Tea, Taylor, Ada, Custom).
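
As a rough illustration of the idea summarized above (skip the full computation on some steps and reuse a cached approximation), here is a minimal sketch. It is not the code from this PR: the class name `ResidualCache`, the `use_cache` flag, and the use of plain Python lists instead of tensors are all assumptions made to keep the example self-contained.

```python
from typing import Callable, List, Optional


class ResidualCache:
    """Hypothetical helper: reuse the last computed residual on skipped steps."""

    def __init__(self, block_fn: Callable[[List[float]], List[float]]):
        self.block_fn = block_fn  # stand-in for the expensive transformer blocks
        self.residual: Optional[List[float]] = None
        self.full_calls = 0  # counts how often the expensive path actually ran

    def infer(self, x: List[float], use_cache: bool) -> List[float]:
        if use_cache and self.residual is not None:
            # Cheap path: add the cached residual instead of running the blocks.
            return [xi + ri for xi, ri in zip(x, self.residual)]
        # Expensive path: run the blocks and remember (output - input).
        out = self.block_fn(x)
        self.full_calls += 1
        self.residual = [oi - xi for oi, xi in zip(out, x)]
        return out
```

The first call always takes the expensive path because nothing has been cached yet. How often later steps may take the cheap path is exactly what the different strategies (Tea, Taylor, Ada, Custom) decide in their own ways.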

Highlights

Feature Caching Framework: Introduced a base class (BaseTransformerInfer) and refactored existing transformer inferrers (HunyuanTransformerInfer, WanTransformerInfer) into base classes (BaseHunyuanTransformerInfer, BaseWanTransformerInfer) and specific implementations to support various feature caching strategies.
New Caching Strategies: Added three new feature caching implementations: Taylor (using Taylor series approximation on intermediate block outputs), Ada (employing an adaptive skipping mechanism based on residual changes), and Custom (a different residual caching approach).
Scheduler Integration: Updated base scheduler (BaseScheduler) to manage caching state (caching_records, step_index) and added caching-specific scheduler classes (HunyuanSchedulerTeaCaching, HunyuanSchedulerTaylorCaching, HunyuanSchedulerAdaCaching, HunyuanSchedulerCustomCaching, WanSchedulerTeaCaching, WanSchedulerTaylorCaching, WanSchedulerAdaCaching, WanSchedulerCustomCaching) to coordinate with the new transformer inferrers.
Configuration and Scripts: Added new configuration files (configs/caching/*.json) and inference scripts (scripts/run_*.sh) to enable and demonstrate the new Ada and Custom caching strategies, and updated existing ones for Taylor and Tea caching.
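
To make the Taylor strategy in the highlights more concrete, the sketch below extrapolates a cached feature to a skipped step with a first-order Taylor expansion, estimating the derivative by a finite difference between the two most recent full computations. The class `TaylorFeatureCache` and its methods are hypothetical, the expansion order is an assumption, and the PR's own implementation may differ in both.

```python
from typing import List, Optional


class TaylorFeatureCache:
    """Hypothetical first-order Taylor cache over denoising steps."""

    def __init__(self) -> None:
        self.last_step: Optional[int] = None
        self.value: Optional[List[float]] = None       # zeroth-order term
        self.derivative: Optional[List[float]] = None  # finite-difference slope

    def update(self, step: int, feature: List[float]) -> None:
        """Record a fully computed feature and refresh the slope estimate."""
        if self.value is not None and step != self.last_step:
            dt = step - self.last_step
            self.derivative = [(f - v) / dt for f, v in zip(feature, self.value)]
        self.value = list(feature)
        self.last_step = step

    def predict(self, step: int) -> List[float]:
        """Approximate the feature at `step` as value + slope * (step - last_step)."""
        if self.value is None:
            raise RuntimeError("no full computation has been cached yet")
        if self.derivative is None:
            # Only one sample so far: fall back to plain reuse.
            return list(self.value)
        dt = step - self.last_step
        return [v + d * dt for v, d in zip(self.value, self.derivative)]
```

With full computations at steps 0 and 2, a skipped step 3 is predicted by continuing the line through those two samples. Higher-order terms would need more stored samples, which costs memory.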

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	`@gemini-code-assist`	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly refactors the feature caching system by introducing base classes and specialized implementations for different caching strategies (Tea, Taylor, Ada, Custom) for both Hunyuan and Wan models. The changes improve modularity, consistency, and extensibility. Configuration files and run scripts have been updated accordingly. Key areas of improvement include standardized naming, clearer logic for caching decisions, and consistent resource management. Some minor points, such as a persistent typo and a commented-out import, have been noted.

gemini-code-assist · 2025-06-15T05:21:33Z

lightx2v/models/networks/hunyuan/infer/feature_caching/transformer_infer.py

+    def __init__(self, config):
+        super().__init__(config)
+        # 1. fixed args
+        self.decisive_double_block_id = 10


Consider adding a comment to explain the origin of decisive_double_block_id = 10.

gemini-code-assist · 2025-06-15T05:21:33Z

lightx2v/models/networks/hunyuan/infer/feature_caching/transformer_infer.py

+        self.moreg_steps = [int(0.1 * config.infer_steps), int(0.9 * config.infer_steps)]
+        self.moreg_hyp = [0.385, 8, 1, 2]
+        self.mograd_mul = 10
+        self.spatial_dim = 3072


Consider adding a comment explaining the source or significance of spatial_dim = 3072.

gemini-code-assist · 2025-06-15T05:21:33Z

lightx2v/models/networks/hunyuan/infer/transformer_infer.py

+        super().__init__(config)
+
+    @torch.compile(disable=not CHECK_ENABLE_GRAPH_MODE())
+    def infer(self, weights, img, txt, vec, cu_seqlens_qkv, max_seqlen_qkv, freqs_cis, token_replace_vec=None, frist_frame_token_num=None):


The parameter name frist_frame_token_num appears to have a typo and should likely be first_frame_token_num.
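
If the rename is adopted, callers that still pass the old keyword would break. One possible way to soften that, sketched here under assumptions, is to accept the misspelled keyword for a transition period and warn. The function `infer_compat` and its reduced signature are hypothetical stand-ins, not the real `infer` method.

```python
import warnings


def infer_compat(img, first_frame_token_num=None, **kwargs):
    """Hypothetical stand-in showing a backward-compatible keyword rename."""
    if "frist_frame_token_num" in kwargs:
        # Old misspelled keyword: still honored, but flagged as deprecated.
        warnings.warn(
            "frist_frame_token_num is deprecated; use first_frame_token_num",
            DeprecationWarning,
            stacklevel=2,
        )
        legacy = kwargs.pop("frist_frame_token_num")
        if first_frame_token_num is None:
            first_frame_token_num = legacy
    if kwargs:
        # Anything else is a genuine mistake, so fail loudly.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return img, first_frame_token_num
```

For an internal method with few call sites, renaming every caller in one commit is simpler and may be preferable to carrying the alias.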

gemini-code-assist · 2025-06-15T05:21:33Z

lightx2v/models/networks/wan/causvid_model.py

@@ -13,7 +13,8 @@
 from lightx2v.models.networks.wan.infer.causvid.transformer_infer import (
    WanTransformerInferCausVid,
 )
-from lightx2v.models.networks.wan.infer.feature_caching.transformer_infer import WanTransformerInferTeaCaching
+
+# from lightx2v.models.networks.wan.infer.feature_caching.transformer_infer import WanTransformerInferTeaCaching


If TeaCaching is intended to be supported for WanCausVidModel, the import for WanTransformerInferTeaCaching should be reinstated and the necessary integration completed.
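
If TeaCaching is instead meant to be unsupported for this model, a commented-out import leaves that intent implicit. A small sketch of one alternative follows: an explicit lookup that fails with a clear message. The mapping, the helper `select_transformer_infer_class`, the option strings, and the empty stand-in class are all assumptions for illustration and do not reflect the repository's actual code.

```python
class WanTransformerInferCausVid:
    """Empty stand-in for the real inferrer class, used only in this sketch."""


# Hypothetical mapping from a caching option name to the inferrer class.
SUPPORTED_INFER_CLASSES = {
    "NoCaching": WanTransformerInferCausVid,
}


def select_transformer_infer_class(feature_caching: str):
    """Return the inferrer class for an option, or fail with a clear message."""
    try:
        return SUPPORTED_INFER_CLASSES[feature_caching]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_INFER_CLASSES))
        raise NotImplementedError(
            f"feature caching '{feature_caching}' is not supported here; "
            f"supported options: {supported}"
        ) from None
```

Re-enabling a strategy later then means adding one entry to the mapping, and an unsupported configuration fails at setup time rather than silently running without caching.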

Linboyan-trc added 30 commits April 30, 2025 19:32

function hunyuan_t2v_tea, hunyuan_t2v_taylorseer, modify the fresh_threshold of taylorseer

e0b4cdc

hunyuan i2v,t2v + tea,tay; wan i2v,t2v + tea function, add log files

7e2c87e

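Removed redundant attributes from the TeaCace Scheduler

9fb4249

Merge branch 'main' into 02_feature_caching; fixed the tea and taylor features, which now run correctly

3654dfc

Removed redundant directories

69e1377

Fixed bugs in the TeaCaching part; t2v and i2v feature caching now both run end to end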

7ecb387

Merge branch 'main' into 02_feature_caching

8308361

Merge remote-tracking branch 'upstream/main' into 02_feature_caching

99283c7

modify transformers

845e627

hunyuan transformer can run, but the result is not correct

d47a6a9

fix bug: infer double_block_2 should return img and txt

b5933f9

fix bug

e8bbb5b

hunyuan can run transformer, transformerTea

b78363b

hunyuan transformer, tea, taylor can run

3722103

finish HunyuanAdaCache

49548e1

add ada.sh and ada.json

339ce14

hunyuan ada can run

7b0d294

modify function distribution, and finish custom

4a067a9

add cpuoffload, rewriting offload for transformer_infer

5b1bc9b

add scripts and configs for hunyuan i2v caching

eafea2b

finish Wan2.1 TransformerInfer

e3fa388

Wan2.1 Transformer can run

f334970

finish Wan2.1 Tea

9bb6811

finish Wan2.1 Tea and import

a09985b

Wan2.1 Tea function

759626e

finish Wan2.1 Taylor

bf23919

refactor directory structure

7328c14

add decision list for WanScheduler

fc25210

finish modify Wan2.1 Tea

9b666dc

modify Wan2.1 Taylor

83f6222

Linboyan-trc added 28 commits June 11, 2025 15:53

bantch 1: after refactor, hunyuan t2v * 5 function

00c0385

uniform scripts, configs, save_results, in configs for hunyuan i2v * 5

25be044

remove redundant files

48234b8

remove redundant files

4fad181

hunyuan t2v * 5, i2v * 5 can function

1bf2f1a

test Wan2.1 t2v * 3

b5680c1

test Wan2.1 t2v * 3

36242bf

Wan2.1 t2v * 3 can function

ce8d9b2

finish Wan2.1 Ada

190136b

finish Wan2.1 Ada

87df0d1

finish Wan2.1 Ada

5f58bbe

finish Wan2.1 Custom

47a64b5

finish Wan2.1 Custom

3aa2592

finish Wan2.1 Custom

517ccac

test: Wan2.1 i2v * 5

692c452

test Wan2.1 it2 * 5

6756047

Wan2.1 i2v * 5 function, but taylor cant function because the memory limitation

67ac7e8

finish clear

1f90e2d

modify annotations

af8007b

lint for AdaCaching

aec5007

lint for AdaCaching

5d39fb5

for pylint

4fda2b0

for pylint

dfb6ecf

finish Hunyuan transformer offload * 1

854c598

finish Wan2.1 transformer offload * 1

cfa2d1b

prepare scripts for test

22890ea

fix bug

17bb458

for merge

e2e8339

gemini-code-assist bot reviewed Jun 15, 2025
