@stomfaig
Contributor

This PR implements a more direct way for Graph to reference nodes of different types (e.g. Input, Param, etc.).
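
To illustrate the idea of "direct references by node type", here is a minimal sketch. It assumes a Graph that keeps its ordered node list alongside a per-type index, so that lookups such as "all inputs" or "all params" do not have to rescan every node. The class names (`Node`, `InputNode`, `ParamNode`, `OpNode`), the `body` attribute and the `nodes_of` helper are hypothetical stand-ins, not the frontend's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical node classes standing in for the frontend's Input/Param/op types.
@dataclass
class Node:
    name: str


@dataclass
class InputNode(Node):
    pass


@dataclass
class ParamNode(Node):
    pass


@dataclass
class OpNode(Node):
    pass


class Graph:
    """Hypothetical graph: ordered node list plus a per-type index."""

    def __init__(self):
        self.body = []                      # all nodes, in insertion order
        self._by_type = defaultdict(list)   # exact node type -> nodes of that type

    def add_node(self, node):
        self.body.append(node)
        # Indexed by exact type; a subclass is not listed under its parent type.
        self._by_type[type(node)].append(node)

    def nodes_of(self, node_type):
        # Return a copy so callers cannot mutate the index by accident.
        return list(self._by_type[node_type])

    @property
    def inputs(self):
        return self.nodes_of(InputNode)

    @property
    def params(self):
        return self.nodes_of(ParamNode)


g = Graph()
g.add_node(InputNode("x"))
g.add_node(ParamNode("w"))
g.add_node(OpNode("matmul"))
print([n.name for n in g.inputs])  # ['x']
```

Keeping the index next to the ordered list means the graph's topological order is untouched, while type-based queries become a dictionary lookup.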

Some notes about the current version of the draft.

  1. GraphImporter still receives only the inputs and params as TensorMeta (see the ambiguity described below). I've left it this way because we do not expect this part of GraphImporter to change again, so there is no reason to add further complexity; the notation will, however, be updated to reflect this.

  2. One point of friction is that throughout the frontend there seem to be several notions of tensor_meta: the TensorMeta type, a dict[str, Any], and a dict[str, list[Any]]. For now I have left some convenience code in place for the sake of this draft, but let's discuss what the general direction should be.
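
As a rough illustration of the convenience layer mentioned in point 2, the sketch below folds the three encodings into a list of TensorMeta objects. It assumes that the dict[str, Any] form describes a single tensor as `{"shape": [...], "dtype": ...}` and that the dict[str, list[Any]] form holds parallel per-output lists, e.g. `{"shape": [[...], [...]], "dtype": [..., ...]}`. The minimal `TensorMeta` dataclass and the `normalize_tensor_meta` function are hypothetical; the real TensorMeta may carry more fields.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class TensorMeta:
    # Hypothetical minimal stand-in for the frontend's TensorMeta.
    shape: tuple
    dtype: Any


def _is_shape_list(value):
    """True if value looks like a list of shapes rather than a single shape."""
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(isinstance(s, (list, tuple)) for s in value)
    )


def normalize_tensor_meta(meta: Any) -> List[TensorMeta]:
    """Convert any of the assumed tensor_meta encodings into a list of TensorMeta."""
    if isinstance(meta, TensorMeta):
        return [meta]
    if isinstance(meta, (list, tuple)) and all(isinstance(m, TensorMeta) for m in meta):
        return list(meta)
    if isinstance(meta, dict):
        shape, dtype = meta["shape"], meta["dtype"]
        if _is_shape_list(shape):
            # Assumed dict[str, list[Any]] form: parallel per-output lists.
            if not isinstance(dtype, (list, tuple)) or len(dtype) != len(shape):
                raise ValueError("shape and dtype lists must have the same length")
            return [TensorMeta(tuple(s), d) for s, d in zip(shape, dtype)]
        # Assumed dict[str, Any] form: a single tensor.
        return [TensorMeta(tuple(shape), dtype)]
    # Anything else (e.g. a bare int) is rejected loudly instead of failing later.
    raise TypeError(f"unsupported tensor_meta of type {type(meta).__name__}")


print(normalize_tensor_meta({"shape": [2, 3], "dtype": "f32"}))
# [TensorMeta(shape=(2, 3), dtype='f32')]
```

Note that a scalar shape `[]` is always read as a single tensor here. This kind of ambiguity is one argument for converging on a single TensorMeta representation rather than keeping the dict forms around.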

Closes #639

@stomfaig
Contributor Author

cc: @zhanghb97 @R-Tars

@stomfaig stomfaig changed the title from "feat: draft of revised node reference system" to "[Frontend] Refactor how frontend handles node datatypes" on Dec 13, 2025
@stomfaig stomfaig marked this pull request as draft December 13, 2025 16:53
@R-Tars
Collaborator

R-Tars commented Dec 18, 2025

Thanks for pointing this out — I agree it is an important issue.

For now, I don’t think we should rush to unify tensor_meta, as doing so would likely require changes across a large amount of existing operator and frontend code. Given the scope of this PR, deferring that work seems reasonable.

In the longer term, I do think converging on TensorMeta as the unified representation would be preferable, but this can be revisited once the new reference system stabilizes.

@stomfaig stomfaig marked this pull request as ready for review December 18, 2025 23:21
@R-Tars
Collaborator

R-Tars commented Dec 23, 2025

I ran into a runtime issue when testing this PR locally with Torch 2.8. The build fails while generating the DeepSeek-R1 example, and the traceback points to eliminate_weight_transpose.py in eliminate_transpose, with the following error:

AttributeError: 'int' object has no attribute 'shape'

In this case, tensor_meta seems to be resolved as an int instead of the expected TensorMeta (or at least an object with a shape attribute), so this part likely needs to be fixed or unified in this PR.
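
Accessing `.shape` on an int raises exactly the `AttributeError` above, so one way to localize the problem is to check the tensor_meta invariant before the transpose-elimination pass runs and report which node is affected. The sketch below assumes the (node name, tensor_meta) pairs can be extracted from the graph somehow; `find_malformed_meta` is a hypothetical helper, not part of the frontend.

```python
from types import SimpleNamespace


def find_malformed_meta(named_metas):
    """Return (name, type_name) for each entry whose tensor_meta has no .shape.

    named_metas is an iterable of (node_name, tensor_meta) pairs; how those
    pairs are pulled out of the real Graph is left open here. Dict-style
    tensor_meta entries are flagged too, since a dict has no .shape attribute.
    """
    bad = []
    for name, meta in named_metas:
        if not hasattr(meta, "shape"):
            bad.append((name, type(meta).__name__))
    return bad


metas = [
    ("weight", SimpleNamespace(shape=(4, 4), dtype="f32")),
    ("transpose_0", 1),  # an int slipped in where a TensorMeta was expected
]
print(find_malformed_meta(metas))  # [('transpose_0', 'int')]
```

Run as a frontend test over an imported model, a check like this would name the offending node instead of surfacing as an AttributeError deep inside a pass.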

@stomfaig
Contributor Author

Thanks for pointing that out. I have been struggling to verify whether the change breaks anything.

I'll try to find a fix and possibly add some frontend tests.

@R-Tars
Collaborator

R-Tars commented Dec 23, 2025

Thanks for looking into this. Just to add some context: a recent commit has caused most models to fail at runtime, so the frontend is currently in a rather unstable state. We are currently trying to address this issue. As a result, even if this specific issue is fixed, models may still not run correctly at the moment.

@R-Tars
Collaborator

R-Tars commented Jan 7, 2026

It seems that this PR has not been updated for some time. I pulled the latest changes today and tested them locally against the current main branch. At the moment, the DeepSeek-R1 example still cannot run successfully, so the runtime issue does not appear to be fully resolved yet.

For this PR to be ready for merging, I think a minimum requirement is that the DeepSeek-R1 example can run correctly from start to finish. Currently, this is still not the case.

If you encounter any difficulties fixing the issue, please feel free to let me know — I am happy to help with debugging.

Development

Successfully merging this pull request may close these issues.

Refactor how frontend handles node datatypes