Conversation

@timwu20 timwu20 commented Dec 6, 2025

Description

Introduces Speculative Availability chunk requests within the Availability Distribution subsystem. On every ActiveLeavesUpdate, the Availability Distribution subsystem now calls Prospective Parachains to gather backable candidates, so that it can start fetching their erasure-coded chunks from the backing group before the candidates are actually backed on chain. The feature is currently enabled by running the node with the --speculative-availability CLI flag.
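
To make the flow concrete, here is a minimal, self-contained Rust sketch of that sequence (active-leaves update -> ask Prospective Parachains for backable candidates -> spawn speculative chunk fetches). Every type, function name, and signature in it is an illustrative placeholder, not the actual polkadot-sdk subsystem API.

```rust
// A minimal sketch of the speculative fetch flow, with stand-in types only.

#[derive(Clone, Debug)]
struct ActiveLeaf {
    hash: [u8; 32],
}

#[derive(Clone, Debug)]
struct BackableCandidate {
    candidate_hash: [u8; 32],
    backing_group: u32,
}

#[derive(Debug)]
struct FetchTask {
    candidate_hash: [u8; 32],
    backing_group: u32,
    // "scheduled" = fetch started speculatively, before the candidate is backed on chain.
    origin: &'static str,
}

/// Stand-in for the request_backable_candidates helper; in the real subsystem
/// this is a query to Prospective Parachains, not a local function.
fn query_backable_candidates(_leaf: &ActiveLeaf) -> Vec<BackableCandidate> {
    vec![BackableCandidate { candidate_hash: [1; 32], backing_group: 3 }]
}

/// On every active-leaves update, start chunk fetches for backable candidates
/// before they are backed on chain, but only when the feature is enabled.
fn on_active_leaves_update(leaf: &ActiveLeaf, speculative_availability: bool) -> Vec<FetchTask> {
    if !speculative_availability {
        return Vec::new();
    }
    query_backable_candidates(leaf)
        .into_iter()
        .map(|c| FetchTask {
            candidate_hash: c.candidate_hash,
            backing_group: c.backing_group,
            origin: "scheduled",
        })
        .collect()
}

fn main() {
    // Pretend the node was started with --speculative-availability.
    let leaf = ActiveLeaf { hash: [0; 32] };
    println!("leaf {:02x?}", &leaf.hash[..4]);
    for task in on_active_leaves_update(&leaf, true) {
        println!("speculative fetch: {:?}", task);
    }
}
```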

Review Notes

Note: This PR is a reimplementation of PR#9444 with fewer changes to the handling of fetch tasks, a smaller overall diff, and an added CLI flag to enable the feature.

  • Moves request_backable_candidates out of the Provisioner subsystem into the subsystem-util crate so it can also be used by the Availability Distribution subsystem.
  • Introduces CoreInfo and CoreInfoOrigin private types in the availability-distribution::requester module. Fetch tasks are now created from CoreInfo instances instead of directly from available cores. CoreInfo instances constructed from the backable candidates returned by Prospective Parachains carry the CoreInfoOrigin::Scheduled origin, while CoreInfo instances created from candidates that have already been backed on chain carry the CoreInfoOrigin::Occupied origin.
  • Modifies the Availability Store subsystem to accept a new AvailabilityStoreMessage::NoteBackableCandidates message type.
    • The Availability Store subsystem handles this message by writing the metadata needed to pass validation when the actual chunk for the associated candidate hash is later accepted. During development it was discovered that chunks from scheduled/early fetch requests were not being persisted because no metadata had previously been stored for the candidate (a toy sketch of this behavior follows the list).
  • Includes a Zombienet test that uses two parachains supporting elastic scaling. Assertions are made on the polkadot_parachain_fetched_chunks_total[origin="scheduled"] metric.
  • --speculative-availability CLI flag support.
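
As a toy illustration of the CoreInfoOrigin split and the availability-store metadata point above, here is a self-contained sketch. Only the behavior described in these notes is taken from the PR; every type, field, and method name is a placeholder rather than the real implementation.

```rust
// Toy model of the described behavior; not the real availability-store code.
use std::collections::HashMap;

type CandidateHash = [u8; 32];

/// Where a CoreInfo (and therefore a fetch task) came from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CoreInfoOrigin {
    /// Backable candidate reported by Prospective Parachains; the fetch starts
    /// before the candidate is backed on chain.
    Scheduled,
    /// Candidate that already occupies a core on chain.
    Occupied,
}

/// Toy stand-in for the availability store.
#[derive(Default)]
struct ToyAvailabilityStore {
    /// Candidates we hold metadata for; chunks for unknown candidates are dropped.
    meta: HashMap<CandidateHash, ()>,
    chunks: HashMap<CandidateHash, Vec<Vec<u8>>>,
}

impl ToyAvailabilityStore {
    /// Analogue of handling NoteBackableCandidates: record metadata up front so
    /// that chunks arriving from speculative fetches pass validation.
    fn note_backable_candidate(&mut self, candidate: CandidateHash) {
        self.meta.entry(candidate).or_insert(());
    }

    /// Analogue of storing a fetched chunk: without previously noted metadata
    /// the chunk is rejected, which is the failure observed during development.
    fn store_chunk(&mut self, candidate: CandidateHash, chunk: Vec<u8>) -> bool {
        if !self.meta.contains_key(&candidate) {
            return false; // no metadata yet -> chunk not persisted
        }
        self.chunks.entry(candidate).or_default().push(chunk);
        true
    }
}

fn main() {
    let candidate: CandidateHash = [7; 32];
    let mut store = ToyAvailabilityStore::default();

    // A "scheduled" fetch arriving before metadata is noted: the chunk is dropped.
    assert!(!store.store_chunk(candidate, vec![0xAA]));

    // After the requester notes the backable candidate, chunks persist.
    store.note_backable_candidate(candidate);
    assert!(store.store_chunk(candidate, vec![0xAA]));

    println!("origins: {:?} / {:?}", CoreInfoOrigin::Scheduled, CoreInfoOrigin::Occupied);
}
```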

TODO

  • Benchmark with speculative availability enabled.

Checklist

  • My PR includes a detailed description as outlined in the "Description" and its two subsections above.
  • My PR follows the labeling requirements of this project (at minimum one label for T required)
    • External contributors: Use /cmd label <label-name> to add labels
    • Maintainers can also add labels manually
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

@timwu20 timwu20 marked this pull request as draft December 6, 2025 05:20

cla-bot-2021 bot commented Dec 6, 2025

User @timwu20, please sign the CLA here.

timwu20 commented Dec 6, 2025

/cmd prdoc

github-actions bot commented Dec 6, 2025

Command "prdoc" has failed ❌! See logs here

timwu20 commented Dec 6, 2025

/cmd label T0-node T8-polkadot

github-actions bot commented Dec 6, 2025

Command "label T0-node T8-polkadot" has failed ❌! See logs here

@timwu20 timwu20 marked this pull request as ready for review December 7, 2025 20:32

@haikoschol haikoschol left a comment

lgtm, just some comment/logging nits

@eskimor eskimor left a comment

Quick first pass, will have a closer look tomorrow.

@eskimor eskimor left a comment

There is one edge case as described by Axay on his PR: If the core does not immediately get occupied on the next leaf, we would drop the task and refetch the same thing if the core gets occupied later.

Just highlighting mostly, maybe worth documenting that this is a known limitation. A fix is not terribly hard, but likely still overkill. (Fix being to look into the ancestry, just as we do with occupied cores - the tricky part is just to keep it cheap, as fetching all the data from other subsystems is relatively heavy ... but yeah, not worth it. Let's just add a comment describing this limitation.)


if let Some(session_info) = session_info {
    let num_validators =
        session_info.validator_groups.iter().fold(0usize, |mut acc, group| {

Member

Session info also contains the full list of validators, no reason to accumulate group counts.

Author

I just lifted this from earlier in the file here. Is there another way to get the number of validators from the localized polkadot_availability_distribution::requester::session_cache::SessionInfo?

Author

Added a num_validators field to the localized SessionInfo in fb004b2.
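
For context on this thread, a small self-contained illustration of the two ways to obtain the validator count discussed here: folding over validator_groups (the pattern in the excerpt above) versus carrying a precomputed num_validators. The SessionInfo below is a placeholder, not the localized session_cache::SessionInfo.

```rust
// Placeholder SessionInfo, not the localized session_cache::SessionInfo.

type ValidatorIndex = u32;

struct SessionInfo {
    validator_groups: Vec<Vec<ValidatorIndex>>,
    /// Precomputed total so callers don't have to re-accumulate group sizes,
    /// mirroring the num_validators field added in fb004b2.
    num_validators: usize,
}

impl SessionInfo {
    fn new(validator_groups: Vec<Vec<ValidatorIndex>>) -> Self {
        // The previous pattern: fold over the groups and sum their lengths.
        let num_validators =
            validator_groups.iter().fold(0usize, |acc, group| acc + group.len());
        Self { validator_groups, num_validators }
    }
}

fn main() {
    let info = SessionInfo::new(vec![vec![0, 1, 2], vec![3, 4]]);
    assert_eq!(info.num_validators, 5);
    assert_eq!(info.validator_groups.len(), 2);
    println!("num_validators = {}", info.num_validators);
}
```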

@github-actions

Review required! Latest push from author must always be reviewed

@timwu20 timwu20 requested a review from eskimor December 12, 2025 16:58

timwu20 commented Dec 19, 2025

There is one edge case as described by Axay on his PR: If the core does not immediately get occupied on the next leaf, we would drop the task and refetch the same thing if the core gets occupied later.

Just highlighting mostly, maybe worth documenting that this is a known limitation. A fix is not terribly hard, but likely still overkill. (Fix being to look into the ancestry, just as we do with occupied cores - the tricky part is just to keep it cheap, as fetching all the data from other subsystems is relatively heavy ... but yeah, not worth it. Let's just add a comment describing this limitation.)

Added a note about this limitation in fb004b2.
