Conversation

@avnyu
Contributor

@avnyu avnyu commented Nov 12, 2025

What does this PR try to resolve?

My attempt to continue #10279.

This:

  • Moves the code for creating the git db into GitSource::fetch_db
  • Creates a GitSource for each submodule
  • Replaces the fetch inside update_submodule with GitSource::fetch_db and db.copy_to
  • Removes the recursive update_submodules calls, because db.copy_to is already recursive.
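
The steps above can be sketched roughly as follows. This is a simplified, std-only model and not Cargo's actual code: `DbCache`, `GitDb`, `Submodule`, and the `network_fetches` counter are hypothetical names, and only `fetch_db`, `copy_to`, and `update_submodules` echo names used in this PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a cached bare repository (a "git db").
#[derive(Default)]
struct GitDb {
    /// Revisions already present in the local database.
    revs: HashSet<String>,
}

/// Hypothetical cache of databases keyed by remote URL.
#[derive(Default)]
struct DbCache {
    dbs: HashMap<String, GitDb>,
    /// Counts simulated network fetches so the caching effect is observable.
    network_fetches: usize,
}

impl DbCache {
    /// Ensures `rev` of `url` is in the cache, "fetching" only on a miss.
    fn fetch_db(&mut self, url: &str, rev: &str) -> &GitDb {
        let db = self.dbs.entry(url.to_string()).or_default();
        if !db.revs.contains(rev) {
            // A real implementation would contact the remote here.
            self.network_fetches += 1;
            db.revs.insert(rev.to_string());
        }
        db
    }
}

/// Hypothetical description of a submodule and its nested submodules.
struct Submodule {
    url: String,
    rev: String,
    children: Vec<Submodule>,
}

/// Models `db.copy_to`: writes the checkout, then handles nested submodules itself.
fn copy_to(cache: &mut DbCache, sub: &Submodule, checkout: &mut Vec<String>) {
    checkout.push(format!("{}@{}", sub.url, sub.rev));
    update_submodules(cache, &sub.children, checkout);
}

/// Models `update_submodules`: no explicit recursion, because `copy_to` recurses.
fn update_submodules(cache: &mut DbCache, subs: &[Submodule], checkout: &mut Vec<String>) {
    for sub in subs {
        cache.fetch_db(&sub.url, &sub.rev);
        copy_to(cache, sub, checkout);
    }
}
```

In this model, running `update_submodules` a second time against the same `DbCache` recreates the checkout without incrementing `network_fetches`, which is the behavior the PR is after.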

Fixes #7987.

How to test and review this PR?

I tested using the original pull method:

~/.cargo/target/debug/cargo update -p boring --precise 46787b7b6909cadf81cf3a8cd9dc351c9efdfdbd
~/.cargo/target/debug/cargo update -p boring --precise c037a438f8d7b91533524570237afcfeffffe496

and confirmed that the time to do the second update is negligible.
I also tested that it can fetch submodules offline using the downloaded git db:

git clone https://github.com/pop-os/cosmic-files && cd cosmic-files
~/.cargo/target/debug/cargo fetch --locked --target=$(rustc --print host-tuple)
rm -rf ~/.cargo/git/checkout
~/.cargo/target/debug/cargo fetch --locked --target=$(rustc --print host-tuple) --offline

@rustbot rustbot added A-git Area: anything dealing with git S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2025
@rustbot
Collaborator

rustbot commented Nov 12, 2025

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@epage
Contributor

epage commented Nov 12, 2025

It would help reviewers if you break this down into atomic commits. For example, by making a commit for extracting update_db, it becomes clear whether it was purely an extraction or there are also other changes.

See https://doc.crates.io/contrib/process/working-on-cargo.html#submitting-a-pull-request

@avnyu avnyu force-pushed the cached-submodules branch from 70350b1 to ffd78e6 Compare November 12, 2025 17:52
@avnyu
Contributor Author

avnyu commented Nov 12, 2025

Done, the "extract code into update_db" part is a single commit now. I think the remaining part needs to be in the same commit.

Comment on lines 472 to 475
.with_context(|| {
let name = child.name().unwrap_or("");
format!("failed to fetch submodule `{name}` from {child_remote_url}",)
})?;
Contributor

@epage epage Nov 12, 2025

Seems like we should also have this on update_db and copy_to.

Contributor Author

I tried, and the test suite failed. Should I adjust the test suite further? I am trying to keep it unchanged as much as possible.

Contributor

Depends on the failure.

Contributor Author

How about putting it only after db.copy_to? I tested, and it passes the test suite without further modification. The original fetch was replaced by three steps (creating the GitSource, update_db, and db.copy_to), so putting the error context after db.copy_to seems sufficient?

Contributor Author

aeda351
Done, and adjusted the test suite.

Member

See an alternative in #16246 (comment)
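
To illustrate where the error context discussed in this thread ends up, here is a std-only sketch. The snippet under review uses `.with_context(...)` directly on a result; `ContextError`, the free function `with_context`, and this simplified `update_submodule` signature are hypothetical stand-ins, and the "Caused by" layout is only an approximation of how a chained error might be rendered.

```rust
use std::fmt;

/// Hypothetical error type pairing a high-level context with the underlying cause.
#[derive(Debug)]
struct ContextError {
    context: String,
    cause: String,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}\n\nCaused by:\n  {}", self.context, self.cause)
    }
}

/// Attaches lazily built context to a failed result; `Ok` values pass through untouched.
fn with_context<T>(
    result: Result<T, String>,
    context: impl FnOnce() -> String,
) -> Result<T, ContextError> {
    result.map_err(|cause| ContextError {
        context: context(),
        cause,
    })
}

/// Models the final step of updating one submodule: the result of the
/// copy step is wrapped with the submodule name and remote URL.
fn update_submodule(
    name: &str,
    url: &str,
    copy_result: Result<(), String>,
) -> Result<(), ContextError> {
    with_context(copy_result, || {
        format!("failed to fetch submodule `{name}` from {url}")
    })
}
```

Because the context closure only runs on failure, attaching it after the last step adds no cost on the success path.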

SourceId::from_url(&format!("git+{child_remote_url}#{head}"))?,
gctx,
RemoteKind::GitDependency,
)
Contributor

What impact does this have on our progress bars?

Contributor Author

Mostly the same? The old fetch had a single progress bar, which now becomes a progress bar for update_db followed by a brief progress bar for db.copy_to. This progress bar behavior should be the same as when updating a git source. Most of the time I don't even notice the brief one.

Contributor Author

ea4eb86 should fix the case where the submodule database already exists but "updating submodule" is still printed.

Member

@weihanglo weihanglo left a comment

Thanks for the contribution!

We should have tests for this new caching behavior, especially for nested submodules. See https://doc.crates.io/contrib/process/working-on-cargo.html#making-a-change for how, and also mind the atomic commit pattern (first add a test capturing the existing behavior, then a fix commit showing the behavior change through test diffs).

&child_remote_url,
&reference,
let mut source = GitSource::new(
SourceId::from_url(&format!("git+{child_remote_url}#{head}"))?,
Member

Instead of constructing the URL manually, we have SourceId::for_git for this purpose.

Contributor Author

a474768. I noticed that the GitSource created by for_git always has precise=None, which results in locked_rev being Deferred, so fetch_db always tries to fetch when online. Is it appropriate to use with_git_precise for this?

Member

Nice catch! Yeah, it would be good if we could add a comment explaining why we need with_git_precise there.

Contributor Author

02b8bf9 Done.
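
The point about `precise` in this thread can be modeled with a small std-only sketch. It is an illustration of the reasoning, not Cargo's implementation: the `Revision` enum and the `resolve` and `needs_fetch` functions are hypothetical names that only mirror the "Deferred" versus locked distinction described above.

```rust
use std::collections::HashSet;

/// Hypothetical mirror of the two revision states discussed in the thread.
#[derive(Debug, PartialEq)]
enum Revision {
    /// No precise revision is known yet; it must be resolved against the remote.
    Deferred,
    /// A precise revision is known up front.
    Locked(String),
}

/// Without a precise revision the source starts out deferred.
fn resolve(precise: Option<&str>) -> Revision {
    match precise {
        Some(rev) => Revision::Locked(rev.to_string()),
        None => Revision::Deferred,
    }
}

/// Decides whether a network fetch is attempted in this simplified model.
fn needs_fetch(rev: &Revision, cached: &HashSet<String>, online: bool) -> bool {
    match rev {
        // A locked revision that is already in the db needs no network access.
        Revision::Locked(oid) => !cached.contains(oid),
        // A deferred revision is always re-resolved when online.
        Revision::Deferred => online,
    }
}
```

With a submodule's `head` passed as the precise revision, the model takes the `Locked` branch and skips the fetch whenever the cached db already contains that commit.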

let name = child.name().unwrap_or("");
format!("failed to fetch submodule `{name}` from {child_remote_url}",)
})?;
guard.mark_ok()?;
Member

The checkout is marked ready, and then we recurse into submodules. I feel like that means the checkout is not ready yet.

Could this situation happen? The user cancels the operation before it recurses into nested submodules. They later depend on this submodule directly, and Cargo assumes it is fresh, so it never recurses into nested submodules.

Contributor Author

I overlooked that when inlining db.copy_to, my bad. But now the code is almost the same as db.copy_to, except that db.copy_to has an additional check for freshness. Should I just use db.copy_to instead to reduce code duplication?

Member

Yeah, if that works (it does look like it will).

Contributor Author

Do I need to push this myself? #16246 (review) If you can do it, please do. Thanks.

Member

I am offering to reorder commits, not to write a fix. I would appreciate it if you could help write the fix :)

Contributor Author

02b8bf9 Done.
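
The ordering concern raised in this thread (marking the checkout ready before nested submodules are populated) can be shown with a tiny std-only model. `Checkout`, `populate`, and the `interrupted` flag are hypothetical; they simulate a user cancelling midway and do not correspond to Cargo's real types.

```rust
/// Hypothetical model of a checkout directory and its "ready" marker.
#[derive(Debug, Default)]
struct Checkout {
    ready: bool,
    nested_submodules_done: bool,
}

impl Checkout {
    /// A checkout is treated as fresh only if the ready marker is set.
    fn is_fresh(&self) -> bool {
        self.ready
    }
}

/// Populates the checkout, setting the ready marker only as the very last step.
/// `interrupted` simulates the user cancelling before nested submodules are done.
fn populate(checkout: &mut Checkout, interrupted: bool) -> Result<(), String> {
    if checkout.is_fresh() {
        // Nothing to do: a previous run completed every step.
        return Ok(());
    }
    checkout.ready = false;
    // Top-level checkout work would happen here.
    if interrupted {
        return Err("interrupted before nested submodules".to_string());
    }
    // Nested submodules are handled before the marker is written.
    checkout.nested_submodules_done = true;
    checkout.ready = true;
    Ok(())
}
```

Because the marker is written last, an interrupted run leaves `is_fresh()` returning false, so a later run redoes the work instead of silently skipping the nested submodules.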

@weihanglo
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 9, 2025
@rustbot
Collaborator

rustbot commented Dec 9, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@avnyu avnyu force-pushed the cached-submodules branch from 8705c44 to a474768 Compare December 10, 2025 03:54
@rustbot
Collaborator

rustbot commented Dec 10, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@avnyu
Contributor Author

avnyu commented Dec 10, 2025

Sorry for the late response; I force-pushed the commits. About the new tests: if I am not mistaken, they should only cover the git db cache, not the correctness of git repo checkouts, since that should already be covered? Currently I can't find an API for manipulating the cached git db, or an example of testing the git db cache.

Member

@weihanglo weihanglo left a comment

Overall looks good. Let me know if you plan to reorganize the commits, or I can do that for you. Thanks!

@avnyu
Contributor Author

avnyu commented Dec 13, 2025

Overall looks good. Let me know if you plan to reorganize the commits, or I can do that for you. Thanks!

If you can do it, please do. Thanks!

@avnyu avnyu force-pushed the cached-submodules branch from a474768 to fcb41ba Compare December 14, 2025 14:15
Member

@weihanglo weihanglo left a comment

@avnyu I rearranged the commits so that the fix commit reflects the behavior change. See 4f02e47.

Thanks for working with us!

@weihanglo weihanglo enabled auto-merge December 14, 2025 15:39
@weihanglo weihanglo added this pull request to the merge queue Dec 14, 2025
Merged via the queue into rust-lang:master with commit 0101bde Dec 14, 2025
28 checks passed
Labels

A-git Area: anything dealing with git S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.

Development

Successfully merging this pull request may close these issues.

git submodules are not cached

4 participants