Hi, thank you for your contribution to the community. I have learned a lot from your wonderful work!
After reading this line in `LongLive/wan/modules/causal_model.py` (line 229 at commit a4a0c87):

```python
is_recompute = current_end <= kv_cache["global_end_index"].item() and current_start > 0
```
I would like to ask some questions about the KV cache update in `causal_model.py`. It looks a little similar to Self-Forcing, but the design is actually quite different. In particular, what is the variable `is_recompute` designed for?
I know `torch.utils.checkpoint.checkpoint` is required for training, but directly applying it seems to cause a tensor mismatch between the original forward pass and the activation recomputation pass. For example, when I tried to update the KV cache inside the self-attention module without using `cache_update_info`, I got errors like:

```
torch.utils.checkpoint.CheckpointError: torch.utils.checkpoint: Recomputed values for the following tensors have different metadata than during the forward pass.
```
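For context on what I think is going wrong, here is a minimal torch-free sketch of the failure mode (all names here are hypothetical, not from your code): checkpointing discards activations during the forward pass and re-runs the function during backward, so if the function also mutates a shared KV cache in place, the re-run sees a longer cache and produces tensors with different shapes.

```python
# Conceptual sketch (no torch): gradient checkpointing re-runs the
# forward function during backward. If that function mutates a shared
# KV cache, the re-run sees different state and produces outputs of a
# different shape -- analogous to torch's CheckpointError above.

def attend(query, kv_cache):
    """Toy attention: appends the query to the cache, then returns one
    score per cached position. Mutates kv_cache in place (the bug)."""
    kv_cache.append(query)                # side effect: cache grows
    return [query * k for k in kv_cache]  # length == current cache size

def checkpointed_backward(fn, query, kv_cache):
    """Mimic activation recomputation: run fn once 'for real', then
    re-run it for backward and compare output metadata (here, length)."""
    forward_out = fn(query, kv_cache)
    recomputed = fn(query, kv_cache)      # cache grew again -> mismatch
    if len(recomputed) != len(forward_out):
        raise RuntimeError(
            f"Recomputed values have different metadata: "
            f"len {len(recomputed)} vs {len(forward_out)}")
    return recomputed

cache = []
try:
    checkpointed_backward(attend, 2.0, cache)
except RuntimeError as err:
    print("mismatch reproduced:", err)
```

If I understand correctly, this is why the update has to be deferred (e.g. via something like `cache_update_info`) instead of being applied inside the checkpointed forward, so that both the forward pass and the recomputation see an identical cache.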
Could you please briefly explain how your design solves (1) the KV cache overwriting conflict across timesteps, and (2) the potential tensor shape mismatch during gradient-checkpoint activation recomputation? Your answer would mean a lot to me!
Thank you very much for your kind reply!