Skip to content

feat(jepsen): add Cloud9 KV correctness harness#5

Open
windsornguyen wants to merge 2 commits into
mainfrom
feat/jepsen-helpers
Open

feat(jepsen): add Cloud9 KV correctness harness#5
windsornguyen wants to merge 2 commits into
mainfrom
feat/jepsen-helpers

Conversation

@windsornguyen

Copy link
Copy Markdown
Owner

Pull Request

Linear Issue

N/A

Summary

What: Adds a typed Cloud9 KV protocol surface, a runnable node/server path, and a minimal Jepsen harness that can drive the database through the public API.

Why: Jepsen needs to test the system through the same boundary an external client uses. This gives Cloud9 a real correctness harness for replicated KV behavior instead of testing only internal Raft state transitions.

Lines added: +2332

Test Plan

  • cargo test --workspace -- --nocapture
  • cargo clippy --workspace --all-targets -- -D warnings

Repro / Showcase

N/A. This PR wires the harness and node surface; full Jepsen execution is the next validation step.

Tests Added

  • Unit tests
  • Integration-style harness
  • E2E tests

Documentation

  • Internal:
    • jepsen/README.md documents the local Jepsen workflow.

Notes for Reviewers

The important review boundary is the public KV API and whether the Jepsen harness drives Cloud9 through the right production-shaped path. The Clojure side is intentionally minimal so failures are easy to interpret.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69bb083992

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cloud9-node/src/lib.rs
config: NodeConfig,
node: Mutex<RaftNode>,
state: Arc<RwLock<KvState>>,
waiters: Mutex<HashMap<LogIndex, oneshot::Sender<Result<KvApplyResult, ConnectError>>>>,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Key proposal waiters by term and clear stale waiters

waiters is keyed only by LogIndex, but Raft can replace an uncommitted entry at a given index after a leader change. In that case, a later committed entry at the same index can complete the wrong client's waiter, and proposals that are discarded without a replacement can block forever on receiver.await. This breaks request/result correlation during failover and can return incorrect mutation outcomes to clients.

Useful? React with 👍 / 👎.

Comment thread cloud9/src/main.rs
Comment on lines +190 to +193
_ => Err(miette::miette!(
"peer address `{host}:{port}` resolved ambiguously to {} addresses",
addrs.len()
)),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle dual-stack DNS peer resolution without hard failure

The config loader rejects any peer hostname that resolves to more than one socket address. Common names like localhost (or many DNS hosts in dual-stack environments) often resolve to both IPv4 and IPv6, so c9 start/check-config fails even though either address would be usable. This makes valid cluster configs unusable in typical environments.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant