Enrich Forest Archive with receipts/events/tipset mapping #6802

@LesnyRumcajs

Summary

There's an ask from the ecosystem to enrich the Forest Archive with additional information:

  • events,
  • message receipts,
  • epoch to tipset mapping.

Based on recent snapshots, the disk space overhead would be minimal.

Details

Events and message receipts

# calibnet
❯ forest-tool archive info lite_2000_3580931.forest.car.zst
CAR format:       ForestCARv1.zst
Snapshot version: 1
Network:          calibnet
Epoch:            3580931
State-roots:      2000
Messages sets:    2000
Receipts:         49662
Receipts size:    3 MiB
Events:           70489
Events size:      19.6 MiB
Head Tipset:      bafy2bzaceb3mmhd7nzfr6iybwtx7yskpca4khenx6w4swd6aii5er7jxj2ml2
                  bafy2bzacecbc4kdarzogsylixrzxq5hoo7j73b4n25nivbdmshlq5zvq55dlo
                  bafy2bzaceacjm5kyd6ye43ssvg3wprgxnbjvowwlagwzt3dcl6zu6qcqpkr4y
Index size:       356.8 MiB

# mainnet
❯ forest-tool archive info lite_2000_5881637.forest.car.zst
CAR format:       ForestCARv1.zst
Snapshot version: 1
Network:          mainnet
Epoch:            5881637
State-roots:      2000
Messages sets:    2000
Receipts:         32145
Receipts size:    1.4 MiB
Events:           78323
Events size:      14 MiB
Head Tipset:      bafy2bzaceczzxkwgk57humvkuvmrdcaofjmyqq73rdpuox4oo5qzliabgbhd2
                  bafy2bzacecdkeuaizhozax2w2xzw36oaaoldcuglwoyxyrzrvssgd2roqli7e
                  bafy2bzacea4lzsq6bw7nndrjveemixa7j5qnrkhfr6tqva4ky3tvhobgt57g6
                  bafy2bzacebkgc26qitf7rhjn5a2coega7aegshtnc6i5konfglg6dfolvf6k4
Index size:       1.91 GiB

This translates to roughly 1.4 MiB for receipts and 14 MiB for events (uncompressed) per 2000 epochs on mainnet.

Warning

Forest won't be able to backfill receipts and events for the entire archive, only for post-FVM epochs. This shouldn't be a big deal, but shout if I'm wrong. The epoch-to-tipset mapping does not have this limitation as it only requires chain data, not state recomputation.

Backfilling receipts and events only for post-FVM epochs (from Skyr @ 1960320, so ~4M epochs) would total ~3 GiB for receipts and ~30 GiB for events (assuming roughly the same load).
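The extrapolation behind these totals can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the per-epoch load seen in the 2000-epoch mainnet sample holds for every post-FVM epoch, and it uses the head epoch of the mainnet snapshot above as the end of the range. The constant and function names are made up for illustration.

```rust
/// Head epoch of the mainnet snapshot quoted above.
const HEAD_EPOCH: u64 = 5_881_637;
/// Skyr upgrade epoch, the first post-FVM epoch per the text.
const SKYR_EPOCH: u64 = 1_960_320;
/// Number of epochs covered by the sampled lite snapshot.
const SAMPLE_EPOCHS: f64 = 2000.0;
const MIB_PER_GIB: f64 = 1024.0;

/// Scale a size measured over the sample window (in MiB) to `epochs` epochs
/// and return the result in GiB. Assumes constant load per epoch, which is
/// a simplification.
fn extrapolate_gib(sample_mib: f64, epochs: u64) -> f64 {
    sample_mib * (epochs as f64 / SAMPLE_EPOCHS) / MIB_PER_GIB
}

fn main() {
    let epochs = HEAD_EPOCH - SKYR_EPOCH;
    let receipts = extrapolate_gib(1.4, epochs);
    let events = extrapolate_gib(14.0, epochs);
    println!("post-FVM epochs: {epochs}");
    println!("receipts: {receipts:.1} GiB, events: {events:.1} GiB");
}
```

With these inputs the result is about 2.7 GiB for receipts and 26.8 GiB for events, in line with the rounded ~3 GiB and ~30 GiB figures.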

Epoch to tipset key mapping

Estimated size: 180–224 MiB uncompressed on mainnet (assuming 4 blocks/tipset over 5.9M epochs).

Based on experiments in #6827, the full mapping would take about 1 GiB compressed, considerably more than the estimate above. An alternative is a skip list: with a skip length of 10, we'd end up at ~158 MiB on mainnet.

Additional considerations:

  • this mapping must be created strictly for finalized tipsets
  • this could replace the existing checkpoints mechanism in Forest (which is effectively a poor man's version of this mapping)

This would allow for faster epoch-to-tipset lookups.
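One way to read the skip-list idea is a sparse index that stores a tipset key only every N epochs and resolves the epochs in between by walking parent links. The sketch below illustrates that reading on a toy chain. It is not the design from #6827: `SparseIndex`, `Tipset`, `build_index`, `demo_chain`, and the string tipset keys are all hypothetical stand-ins, and finality is ignored.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in for a tipset key; real keys are lists of block CIDs.
type TipsetKey = String;

/// Minimal stand-in for a tipset: its epoch and its parent's key.
struct Tipset {
    epoch: u64,
    parents: Option<TipsetKey>,
}

/// Hypothetical sparse index: one entry per `stride` epochs.
struct SparseIndex {
    entries: BTreeMap<u64, TipsetKey>,
}

impl SparseIndex {
    /// Resolve `epoch` by starting from the nearest indexed tipset at or
    /// above it and walking parent links back. Returns the tipset at
    /// `epoch`, or the closest earlier one if `epoch` is a null round.
    fn lookup(&self, chain: &HashMap<TipsetKey, Tipset>, epoch: u64) -> Option<TipsetKey> {
        let (_, start) = self.entries.range(epoch..).next()?;
        let mut key = start.clone();
        loop {
            let ts = chain.get(&key)?;
            if ts.epoch <= epoch {
                return Some(key);
            }
            key = ts.parents.clone()?;
        }
    }
}

/// Index every tipset whose epoch is a multiple of `stride`.
fn build_index(chain: &HashMap<TipsetKey, Tipset>, stride: u64) -> SparseIndex {
    let entries = chain
        .iter()
        .filter(|(_, ts)| ts.epoch % stride == 0)
        .map(|(key, ts)| (ts.epoch, key.clone()))
        .collect();
    SparseIndex { entries }
}

/// Toy chain covering epochs 0..=30 with a null round at epoch 17.
fn demo_chain() -> HashMap<TipsetKey, Tipset> {
    let mut chain = HashMap::new();
    let mut parent: Option<TipsetKey> = None;
    for epoch in (0..=30u64).filter(|e| *e != 17) {
        let key = format!("ts{epoch}");
        chain.insert(key.clone(), Tipset { epoch, parents: parent.take() });
        parent = Some(key);
    }
    chain
}

fn main() {
    let chain = demo_chain();
    let index = build_index(&chain, 10);
    println!("{:?}", index.lookup(&chain, 13));
    println!("{:?}", index.lookup(&chain, 17));
}
```

The trade-off is a smaller index in exchange for up to `stride - 1` parent hops per lookup.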

Approach

Backfilling

Theoretically, we could backfill this data in existing diff snapshots but this would be cumbersome. A better approach is to create additional diff CARs that only contain the extra data (as suggested by ribasushi), rather than replacing the ~20 TB archive. This could be facilitated with a subcommand that would wrap this logic.

It probably doesn't make sense to add the epoch-to-tipset mapping for archival snapshots; perhaps we could add it to the newly generated lite snapshots.

Latest snapshots

Receipts and events could be added to the latest snapshots without any format changes. The disk space overhead will be minimal (<20 MiB for a mainnet snapshot). Initial snapshots might not have all events, but eventually (after ~1 day) they should all be there. Lotus import with receipts and events has been confirmed to work.

The epoch-to-tipset-key mapping would live under a new field in the snapshot format v2 header (see https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0108.md#v2-specification). This change is backwards compatible.

/// Defined in <https://github.com/filecoin-project/FIPs/blob/98e33b9fa306959aa0131519eb4cc155522b2081/FRCs/frc-0108.md#snapshotmetadata>
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, derive_more::Constructor)]
#[serde(rename_all = "PascalCase")]
pub struct FilecoinSnapshotMetadata {
    /// Snapshot version
    pub version: FilecoinSnapshotVersion,
    /// Chain head tipset key
    pub head_tipset_key: NonEmpty<Cid>,
    /// F3 snapshot `CID`
    pub f3_data: Option<Cid>,
}

Additional Links & Resources

Discussion in Slack: https://filecoinproject.slack.com/archives/C027XAH72TD/p1773397075551599
