Skip to content

Conversation

dapplion
Copy link
Collaborator

Issue Addressed

@jimmygchen reported an issue:

consistency with serving column by root requests - observed behaviour: when we send a single root, the node returns, but when I send roots for the entire epoch, it misses some slots

Looking at the code, the only suspicious line is this one:

.filter(|c| available_columns.contains(c))

where we ignore requested data columns if the CGC has changed recently; however, I'm not sure if validator custody is in effect in your tests?

I thought it would be very helpful for the node to self-diagnose and check if there are discrepancies between what the peer requests from us, what we are supposed to have, and what we actually have.

Proposed Changes

Self-diagnose why we don't return the requested columns in a data_columns_by_root request. The outcomes are:

  • Sent: block known, block has data, we custody the requested index, and the column is found in the store
  • requested_unknown_block_root: requested a block root not found in our caches or the store
  • requested_block_without_data: requested a known block, but it has 0 blobs
  • requested_missing_columns: requested known block with data that we custod,y but columns not in store missing_columns
  • requested_pending_block: requested known block, columns not found in store, but block is not imported yet
  • requested_outside_da_check: requested columns not found in store, and their epoch is outside the PeerDAS da_window
  • requested_we_dont_custody: requested column index we don't custody at the block's epoch (requested_indices_we_dont_custody tracks which specific indices)

{
// If the block is imported (check done **before** attempting to
// read columns from the store) the columns should be there.
requested_missing_columns += 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to record the root and the column index for this case for debugging

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a lot of debugging noise in production code, is it not sufficient to just log the columns we're expected but unable to serve?

  • In the case of requested_unknown_block_root, we might be able to serve from the da checker, so I don't think it's right to skip it. We should serve them as long as the column is gossip verified.
  • chain.get_blinded_block is not cached, so we might read the database for up to 128 times per request - this may not be huge, but we could just prioritise serving from da checker without this extra cost.

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a lot of debugging noise in production code, is it not sufficient to just log the columns we're expected but unable to serve?

I agree with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants