feat(wal): add segmented write-ahead log #6

Open
windsornguyen wants to merge 1 commit into main from feat/wal

@windsornguyen

Owner

Pull Request

Linear Issue

N/A

Summary

What: Adds cloud9-wal, a small segmented write-ahead log crate with typed opaque records, explicit sync, checksummed headers/payloads, rotation, and recovery by truncating only incomplete tails.

Why: Raft needs a durable storage primitive that is simpler than a database and stricter than ad hoc file writes. The WAL keeps byte durability separate from Raft, SQL, and KV semantics.
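To make the "checksummed headers/payloads" and "truncating only incomplete tails" behaviour concrete, here is a minimal sketch of one way such framing and recovery could work. The frame layout, the FNV-1a checksum, and the names `encode`, `scan`, and `Scan` are assumptions for illustration; none of them are taken from the cloud9-wal source.

```rust
/// Assumed frame layout (illustrative only, not the crate's format):
/// [kind: u8][len: u32 LE][header_sum: u32 LE][payload][payload_sum: u32 LE]
const HEADER_LEN: usize = 9;

/// FNV-1a, standing in for whatever checksum the crate actually uses.
fn checksum(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Frame an opaque payload with a checksummed header and a payload checksum.
pub fn encode(kind: u8, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u32::MAX as usize, "payload too large for this sketch");
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len() + 4);
    out.push(kind);
    out.extend_from_slice(&len.to_le_bytes());
    let header_sum = checksum(&out[..5]);
    out.extend_from_slice(&header_sum.to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out
}

#[derive(Debug, PartialEq, Eq)]
pub enum Scan {
    /// Bytes up to `valid_len` are complete records; anything after is an
    /// incomplete tail that recovery may truncate.
    Clean { records: usize, valid_len: usize },
    /// A fully present header or record failed its checksum at `offset`;
    /// this is reported as corruption rather than truncated away.
    Corrupt { offset: usize },
}

/// Walk a segment image and classify it as clean (possibly with a torn tail)
/// or corrupt.
pub fn scan(segment: &[u8]) -> Scan {
    let mut offset = 0;
    let mut records = 0;
    loop {
        let rest = &segment[offset..];
        if rest.len() < HEADER_LEN {
            return Scan::Clean { records, valid_len: offset };
        }
        if checksum(&rest[..5]) != read_u32(&rest[5..9]) {
            return Scan::Corrupt { offset };
        }
        let len = read_u32(&rest[1..5]) as usize;
        let total = HEADER_LEN + len + 4;
        if rest.len() < total {
            return Scan::Clean { records, valid_len: offset };
        }
        let payload = &rest[HEADER_LEN..HEADER_LEN + len];
        if checksum(payload) != read_u32(&rest[HEADER_LEN + len..total]) {
            return Scan::Corrupt { offset };
        }
        offset += total;
        records += 1;
    }
}
```

In this sketch a record that is cut short is treated as a torn tail and dropped, while a record that is fully present but fails a checksum is surfaced as corruption. That is one reading of "truncating only incomplete tails"; the actual crate may draw the line differently.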

Lines added: +727

Test Plan

  • cargo test -p cloud9-wal -- --nocapture
  • cargo clippy -p cloud9-wal --all-targets -- -D warnings
  • cargo test --workspace -- --nocapture
  • cargo clippy --workspace --all-targets -- -D warnings

Repro / Showcase

N/A. Tests cover append/reopen, segment rotation, incomplete-tail truncation, checksum corruption, missing segments, reserved kinds, and oversized records.

Tests Added

  • Unit tests
  • Integration tests
  • E2E tests

Documentation

N/A

Notes for Reviewers

The crate intentionally owns only byte durability. Higher layers should encode Raft hard state, log entries, truncations, and snapshots as caller-owned record payloads.
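As a rough illustration of what "caller-owned record payloads" could look like from the Raft side, the sketch below encodes a hard-state record into bytes that the WAL would store without interpreting. The `RaftKind` values, the `HardState` fields, and the 25-byte layout are all hypothetical; the PR does not specify them.

```rust
/// Hypothetical record kinds owned by the Raft layer, not by the WAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RaftKind {
    HardState = 1,
    Entry = 2,
    Truncate = 3,
    Snapshot = 4,
}

/// Hypothetical hard state; the field choice is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HardState {
    pub term: u64,
    pub voted_for: Option<u64>,
    pub commit: u64,
}

impl HardState {
    /// Fixed 25-byte layout: term (8) | vote flag (1) | vote id (8) | commit (8).
    pub fn to_payload(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(25);
        out.extend_from_slice(&self.term.to_le_bytes());
        match self.voted_for {
            Some(id) => {
                out.push(1);
                out.extend_from_slice(&id.to_le_bytes());
            }
            None => {
                out.push(0);
                out.extend_from_slice(&0u64.to_le_bytes());
            }
        }
        out.extend_from_slice(&self.commit.to_le_bytes());
        out
    }

    /// Decode a payload produced by `to_payload`; returns `None` on any
    /// length or flag mismatch instead of panicking.
    pub fn from_payload(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 25 {
            return None;
        }
        let u64_at = |i: usize| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&bytes[i..i + 8]);
            u64::from_le_bytes(buf)
        };
        let voted_for = match bytes[8] {
            0 => None,
            1 => Some(u64_at(9)),
            _ => return None,
        };
        Some(Self { term: u64_at(0), voted_for, commit: u64_at(17) })
    }
}
```

The caller would hand `RaftKind::HardState as u8` and `state.to_payload()` to the WAL's append call, keeping the WAL itself unaware of Raft semantics.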

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf17cfbdbd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cloud9-wal/src/wal.rs
Comment on lines +99 to +100
self.active_id = self.active_id.checked_add(1).ok_or(WalError::SegmentIdExhausted)?;
self.active = segment::open_segment(&self.dir, self.active_id)?;

P1: Delay active_id mutation until new segment opens

If segment::open_segment fails during rotation (e.g., ENOSPC or permission error), active_id has already been incremented, so the Wal instance is left internally inconsistent: it still holds the old file handle but now reports a different segment id for future errors/LSNs and can skip ids on retry. This turns a transient I/O failure into persistent state corruption for the in-memory writer.
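One possible shape for the fix is to compute the next id into a local, open the new segment, and only then commit both fields together. The sketch below uses a stand-in `Wal` struct, a `RotateError` type, and a made-up segment file naming scheme; these are assumptions for illustration, not the PR's actual definitions.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for the PR's `Wal`; only the fields relevant to rotation.
#[allow(dead_code)] // `active` is held but not read in this sketch
pub struct Wal {
    dir: PathBuf,
    active_id: u64,
    active: File,
    active_len: u64,
}

/// Stand-in for the relevant `WalError` variants.
#[derive(Debug)]
pub enum RotateError {
    SegmentIdExhausted,
    Io(io::Error),
}

/// Hypothetical naming scheme: zero-padded id with a `.wal` suffix.
fn segment_path(dir: &Path, id: u64) -> PathBuf {
    dir.join(format!("{id:020}.wal"))
}

fn open_segment(dir: &Path, id: u64) -> io::Result<File> {
    OpenOptions::new().create(true).append(true).open(segment_path(dir, id))
}

impl Wal {
    pub fn open(dir: &Path, id: u64) -> io::Result<Self> {
        let active = open_segment(dir, id)?;
        let active_len = active.metadata()?.len();
        Ok(Self { dir: dir.to_path_buf(), active_id: id, active, active_len })
    }

    /// Rotate to the next segment without touching `self` until every
    /// fallible step has succeeded.
    pub fn rotate(&mut self) -> Result<(), RotateError> {
        let next_id = self
            .active_id
            .checked_add(1)
            .ok_or(RotateError::SegmentIdExhausted)?;
        let next = open_segment(&self.dir, next_id).map_err(RotateError::Io)?;
        let next_len = next.metadata().map_err(RotateError::Io)?.len();

        // Commit point: nothing below can fail, so id, handle, and length
        // always change together.
        self.active = next;
        self.active_id = next_id;
        self.active_len = next_len;
        Ok(())
    }

    pub fn active_id(&self) -> u64 {
        self.active_id
    }

    pub fn active_len(&self) -> u64 {
        self.active_len
    }
}
```

With this ordering, a failed `open_segment` leaves `active_id` and the file handle pointing at the same old segment, so a retry attempts the same `next_id` rather than skipping one.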

Useful? React with 👍 / 👎.

Comment thread cloud9-wal/src/wal.rs
Comment on lines +56 to +59
self.active.write_all(&encoded).map_err(|source| {
WalError::io(segment::segment_path(&self.dir, self.active_id), source)
})?;
self.active_len = checked_add_len(self.active_len, encoded.len())?;

P1: Handle partial writes before returning append errors

write_all may return an error after writing part of the buffer, but active_len is only advanced on success. In disk-full/intermittent I/O scenarios, a failed append can still leave a partial record on disk while in-memory offset stays stale, so later successful appends get incorrect LSNs and recovery truncates at the earlier incomplete tail, potentially discarding records written after the failure.
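A hedged sketch of one way to address this: on a failed `write_all`, cut the segment back to the last known-good length, and if that rollback also fails, poison the writer so no later append can land after the partial record. The `Segment` trait, `Appender` type, and poisoning policy are assumptions introduced here so the failure path can be exercised; they are not part of the PR.

```rust
use std::fs::File;
use std::io::{self, Write};

/// Hypothetical abstraction over the active segment file.
pub trait Segment: Write {
    /// Shrink the segment back to `len` bytes.
    fn truncate_to(&mut self, len: u64) -> io::Result<()>;
}

impl Segment for File {
    fn truncate_to(&mut self, len: u64) -> io::Result<()> {
        // Assumes the file is opened in append mode with write access, so
        // later writes land at the new end of file.
        self.set_len(len)
    }
}

pub struct Appender<S: Segment> {
    active: S,
    active_len: u64,
    poisoned: bool,
}

impl<S: Segment> Appender<S> {
    pub fn new(active: S, active_len: u64) -> Self {
        Self { active, active_len, poisoned: false }
    }

    /// Append one already-encoded record and return its starting offset.
    pub fn append(&mut self, encoded: &[u8]) -> io::Result<u64> {
        if self.poisoned {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "writer poisoned by an earlier failed append",
            ));
        }
        let new_len = self
            .active_len
            .checked_add(encoded.len() as u64)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "segment length overflow"))?;

        if let Err(write_err) = self.active.write_all(encoded) {
            // An unknown number of bytes may have reached the file. Roll back
            // to the last good length; if that fails too, refuse all further
            // appends so nothing is written after the partial record.
            if self.active.truncate_to(self.active_len).is_err() {
                self.poisoned = true;
            }
            return Err(write_err);
        }

        let offset = self.active_len;
        self.active_len = new_len;
        Ok(offset)
    }

    pub fn active_len(&self) -> u64 {
        self.active_len
    }

    pub fn segment(&self) -> &S {
        &self.active
    }

    pub fn segment_mut(&mut self) -> &mut S {
        &mut self.active
    }
}
```

Either outcome keeps the in-memory offset consistent with what recovery would see: after a successful rollback the file ends at `active_len`, and after a failed rollback no further records are written behind the torn one.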

Useful? React with 👍 / 👎.
