SDSTOR-22421: shard superblk backward compat for log-only shard refactor#438
SDSTOR-22421: shard superblk backward compat for log-only shard refactor#438Hooper9973 wants to merge 3 commits into
Conversation
746acd4 to
a4ebeef
Compare
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev/refactor_create_seal_shard #438 +/- ##
=================================================================
Coverage ? 52.64%
=================================================================
Files ? 36
Lines ? 5345
Branches ? 672
=================================================================
Hits ? 2814
Misses ? 2243
Partials ? 288 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
a4ebeef to
5122fce
Compare
- Migrate v2 shard superblk (sb_version=0x02) to v3 on recovery; unify decode logic into decode_shard_sb() shared by raft-log path and on_shard_meta_blk_found - Decode old shard raft-log messages (v2 header_extn) on new code path; extend ResyncShardMetaData schema with sealed_lsn at end for old-buffer compat - Override get_blk_alloc_hint for CREATE/SEAL_SHARD to return committed-blk hint, skipping data-block allocation for log-only shard messages
5122fce to
91fbdd8
Compare
| // Handles all supported on-disk versions. Returns nullopt for unknown versions. | ||
| static std::optional< std::tuple< ShardInfo, homestore::chunk_num_t, homestore::chunk_num_t > > | ||
| decode_shard_sb(const HSHomeObject::shard_info_superblk* sb) { | ||
| if (sb->sb_version >= HSHomeObject::shard_info_superblk::shard_sb_version) { |
There was a problem hiding this comment.
what would be the situation of sb->sb_version > HSHomeObject::shard_info_superblk::shard_sb_version ? older version read data written by newer version?
There was a problem hiding this comment.
sb_version > shard_sb_version happens when older SM receive new SM's msg (rolling upgrade). With >=, old code silently parses unknown future struct layout — dangerous if new version changed field offsets or types. Returning nullopt here causes RELEASE_ASSERT at callers, which is more safe than slient unexpected error.
Changed to ==.
| State state; | ||
| uint64_t lsn; // created_lsn | ||
| uint64_t sealed_lsn{INT64_MAX}; // sealed_lsn | ||
| uint64_t lsn; // created_lsn |
There was a problem hiding this comment.
maybe rename it as create_lsn?
| // Old format: shard message carried shard header/footer data blocks. Return committed_blk_id | ||
| // so HomeStore skips block allocation and data write. Shard on_commit will ignore pbas. | ||
| homestore::blk_alloc_hints hints; | ||
| hints.committed_blk_id = homestore::MultiBlkId{homestore::BlkId{0, 1, 0xFFFF}}; |
There was a problem hiding this comment.
make this as a global constant (like tombstone_pba) so other pieces of code is easier to check/assert based on this value.
There was a problem hiding this comment.
changed it to tombstone_pba directly.
Although chunk_id seems hardly to be the 0xFFFF, but tombstone_pba is more safe.
And when rollback mesg, homestore will free the blk, and tombstone_pba's blk_count is 0 which is more safe.
| void HSHomeObject::on_shard_meta_blk_found(homestore::meta_blk* mblk, sisl::byte_view buf) { | ||
| // First peek at the version | ||
| auto* header = reinterpret_cast< const DataHeader* >(buf.bytes()); | ||
| RELEASE_ASSERT(header->version == DataHeader::data_header_version, "Unknown shard superblock DataHeader version {}", |
There was a problem hiding this comment.
Could you use valid() here, or add a clear marker so this if can be removed when we upgrade the data version later? This assert will likely fail during future data version updates, and the original context may be lost by then.
| // Simulate the actual bug: first v2 shard has incorrect type field set to BLOB_INFO | ||
| if (shard_id == shard_ids[0]) { | ||
| old_sb->type = HSHomeObject::DataHeader::data_type_t::BLOB_INFO; // Bug scenario | ||
| LOGINFO("Created v1 shard {} with BLOB_INFO type (bug scenario)", shard_id); | ||
| LOGINFO("Created v2 shard {} with BLOB_INFO type (bug scenario)", shard_id); |
There was a problem hiding this comment.
The bug is already fixed, let's remove this part.
Migrate v2 shard superblk (sb_version=0x02) to v3 on recovery; unify decode logic into decode_shard_sb() shared by raft-log path and on_shard_meta_blk_found
Decode old shard raft-log messages (v2 header_extn) on new code path; extend ResyncShardMetaData schema with sealed_lsn at end for old-buffer compat
Override get_blk_alloc_hint for CREATE/SEAL_SHARD to return committed-blk hint, skipping data-block allocation for log-only shard messages