Skip to content

SDSTOR-22421: shard superblk backward compat for log-only shard refactor#438

Open
Hooper9973 wants to merge 3 commits into
eBay:dev/refactor_create_seal_shardfrom
Hooper9973:compatibiltiy_adaption
Open

SDSTOR-22421: shard superblk backward compat for log-only shard refactor#438
Hooper9973 wants to merge 3 commits into
eBay:dev/refactor_create_seal_shardfrom
Hooper9973:compatibiltiy_adaption

Conversation

@Hooper9973

Copy link
Copy Markdown
Contributor
  • Migrate v2 shard superblk (sb_version=0x02) to v3 on recovery; unify decode logic into decode_shard_sb() shared by raft-log path and on_shard_meta_blk_found

  • Decode old shard raft-log messages (v2 header_extn) on new code path; extend ResyncShardMetaData schema with sealed_lsn at end for old-buffer compat

  • Override get_blk_alloc_hint for CREATE/SEAL_SHARD to return committed-blk hint, skipping data-block allocation for log-only shard messages

@codecov-commenter

codecov-commenter commented Jun 23, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 56.52174% with 30 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (dev/refactor_create_seal_shard@0644c35). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/lib/homestore_backend/hs_shard_manager.cpp 67.30% 15 Missing and 2 partials ⚠️
...ib/homestore_backend/replication_state_machine.cpp 0.00% 7 Missing ⚠️
src/lib/homestore_backend/hs_http_manager.cpp 0.00% 6 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@                        Coverage Diff                        @@
##             dev/refactor_create_seal_shard     #438   +/-   ##
=================================================================
  Coverage                                  ?   52.64%           
=================================================================
  Files                                     ?       36           
  Lines                                     ?     5345           
  Branches                                  ?      672           
=================================================================
  Hits                                      ?     2814           
  Misses                                    ?     2243           
  Partials                                  ?      288           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Hooper9973 Hooper9973 force-pushed the compatibiltiy_adaption branch from a4ebeef to 5122fce Compare June 24, 2026 10:51
- Migrate v2 shard superblk (sb_version=0x02) to v3 on recovery; unify decode
logic into decode_shard_sb() shared by raft-log path and on_shard_meta_blk_found

- Decode old shard raft-log messages (v2 header_extn) on new code path; extend
ResyncShardMetaData schema with sealed_lsn at end for old-buffer compat

- Override get_blk_alloc_hint for CREATE/SEAL_SHARD to return committed-blk hint,
skipping data-block allocation for log-only shard messages
@Hooper9973 Hooper9973 force-pushed the compatibiltiy_adaption branch from 5122fce to 91fbdd8 Compare June 24, 2026 10:52

@xiaoxichen xiaoxichen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm in general

// Handles all supported on-disk versions. Returns nullopt for unknown versions.
static std::optional< std::tuple< ShardInfo, homestore::chunk_num_t, homestore::chunk_num_t > >
decode_shard_sb(const HSHomeObject::shard_info_superblk* sb) {
if (sb->sb_version >= HSHomeObject::shard_info_superblk::shard_sb_version) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the situation of sb->sb_version > HSHomeObject::shard_info_superblk::shard_sb_version ? older version read data written by newer version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sb_version > shard_sb_version happens when older SM receive new SM's msg (rolling upgrade). With >=, old code silently parses unknown future struct layout — dangerous if new version changed field offsets or types. Returning nullopt here causes RELEASE_ASSERT at callers, which is more safe than slient unexpected error.
Changed to ==.

State state;
uint64_t lsn; // created_lsn
uint64_t sealed_lsn{INT64_MAX}; // sealed_lsn
uint64_t lsn; // created_lsn

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename it as create_lsn?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Old format: shard message carried shard header/footer data blocks. Return committed_blk_id
// so HomeStore skips block allocation and data write. Shard on_commit will ignore pbas.
homestore::blk_alloc_hints hints;
hints.committed_blk_id = homestore::MultiBlkId{homestore::BlkId{0, 1, 0xFFFF}};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this as a global constant (like tombstone_pba) so other pieces of code is easier to check/assert based on this value.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to tombstone_pba directly.
Although chunk_id seems hardly to be the 0xFFFF, but tombstone_pba is more safe.
And when rollback mesg, homestore will free the blk, and tombstone_pba's blk_count is 0 which is more safe.

Comment thread src/lib/homestore_backend/tests/hs_shard_tests.cpp
Comment thread src/lib/homestore_backend/replication_state_machine.cpp
void HSHomeObject::on_shard_meta_blk_found(homestore::meta_blk* mblk, sisl::byte_view buf) {
// First peek at the version
auto* header = reinterpret_cast< const DataHeader* >(buf.bytes());
RELEASE_ASSERT(header->version == DataHeader::data_header_version, "Unknown shard superblock DataHeader version {}",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use valid() here, or add a clear marker so this if can be removed when we upgrade the data version later? This assert will likely fail during future data version updates, and the original context may be lost by then.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

Comment on lines +358 to +361
// Simulate the actual bug: first v2 shard has incorrect type field set to BLOB_INFO
if (shard_id == shard_ids[0]) {
old_sb->type = HSHomeObject::DataHeader::data_type_t::BLOB_INFO; // Bug scenario
LOGINFO("Created v1 shard {} with BLOB_INFO type (bug scenario)", shard_id);
LOGINFO("Created v2 shard {} with BLOB_INFO type (bug scenario)", shard_id);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is already fixed, let's remove this part.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@JacksonYao287 JacksonYao287 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants