Problem / Motivation
tpl commit currently creates an object with kind=template, but the committed template appears to have a ready replica only on the origin node where the source sandbox was running.
This differs from templates created through create-from-image, which are distributed to healthy target nodes and can then be scheduled cluster-wide.
In a multi-node cluster, this makes the scheduling semantics of tpl commit templates unclear:
- a template created by tpl commit can only be used to create new sandboxes on the origin node
- other healthy nodes do not appear to receive the committed rootfs data
- the final object is still kind=template, so it looks like an ordinary reusable template, but does not behave like templates created through create-from-image
I would like to confirm whether this single-origin behavior is intentional, or whether templates created by tpl commit are expected to be distributed like other templates.
Current Behavior
In a multi-node cluster:
- A sandbox is running on one node.
- tpl commit is used to commit that sandbox into a template.
- The resulting object is created as kind=template.
- The committed rootfs remains available only on the origin node.
- New sandboxes based on that template cannot be scheduled normally across other healthy nodes unless the rootfs is replicated manually outside CubeSandbox.
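The observed behavior can be illustrated with a minimal sketch. It assumes that a sandbox can only be placed on a healthy node that holds a ready replica of the template, which is how the behavior looks from the outside and is not confirmed against the scheduler code. The function eligibleNodes and the node names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// eligibleNodes returns the healthy nodes that hold a ready replica of a
// template, sorted by name. Both inputs are hypothetical simplifications
// of whatever the real scheduler tracks.
func eligibleNodes(healthy []string, readyReplicas map[string]bool) []string {
	var out []string
	for _, n := range healthy {
		if readyReplicas[n] {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	healthy := []string{"node-a", "node-b", "node-c"}

	// Template from tpl commit: ready replica only on the origin node.
	committed := map[string]bool{"node-a": true}
	// Template from create-from-image: distributed to every healthy node.
	fromImage := map[string]bool{"node-a": true, "node-b": true, "node-c": true}

	fmt.Println("tpl commit:", eligibleNodes(healthy, committed))
	fmt.Println("create-from-image:", eligibleNodes(healthy, fromImage))
}
```

Under this assumption the committed template is schedulable on node-a only, while the create-from-image template is schedulable on all three nodes.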
Expected Behavior / Question
Should a template created by tpl commit behave like an ordinary distributed template?
In other words, after the sandbox is committed on the origin node, should the committed rootfs be distributed to healthy target nodes using the normal template replica/status lifecycle?
Proposed Solution
My current implementation direction is:
- Keep tpl commit producing kind=template.
- After runTemplateCommitJob commits the source sandbox on the origin node, export the committed rootfs as a standard ext4 rootfs artifact.
- Register that ext4 file as a RootfsArtifact.
- Reuse the existing distributeRootfsArtifact flow used by create-from-image.
- Reuse the normal template replica/status lifecycle so the committed template is distributed to healthy target nodes instead of staying single-node.
This would keep the committed object on the ordinary template path:
- normal template status and replica reporting
- normal delete and reconcile behavior
- failed-node retries through the existing redo flow
The change I am considering is intentionally scoped to tpl commit only. It does not try to change snapshot semantics or redesign create-from-image build placement in the same PR.
Alternatives Considered
- Keep tpl commit as a single-node template.
  This preserves the current behavior, but it makes kind=template semantics inconsistent with the rest of the template system and makes committed templates much less useful in multi-node clusters.
- Treat tpl commit output more like a snapshot than a template.
  This would align better with single-node locality, but it would be a larger semantic change because the current API already creates kind=template. It would likely need broader lifecycle and UX discussion.
- Add a new direct node-to-node or origin-node download path.
  This seems less attractive than reusing RootfsArtifact, because CubeSandbox already has a standard artifact distribution flow with status tracking and retry behavior.
Additional Context
Relevant code paths:
- CubeMaster/pkg/templatecenter/template_commit.go
- CubeMaster/pkg/templatecenter/template_image.go
Maintainer Feedback Requested
Should tpl commit templates be distributed to healthy target nodes like create-from-image templates, or is the current single-origin behavior intentional?
If distributed behavior is preferred, I plan to continue with the artifact-based approach above and keep the change scoped to tpl commit template distribution only.