feat(jepsen): add Cloud9 KV correctness harness #5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69bb083992
```rust
config: NodeConfig,
node: Mutex<RaftNode>,
state: Arc<RwLock<KvState>>,
waiters: Mutex<HashMap<LogIndex, oneshot::Sender<Result<KvApplyResult, ConnectError>>>>,
```
Key proposal waiters by term and clear stale waiters
`waiters` is keyed only by `LogIndex`, but Raft can replace an uncommitted entry at a given index after a leader change. When that happens, a later entry committed at the same index can complete the wrong client's waiter, and proposals that are discarded without a replacement can block forever on `receiver.await`. This breaks request/result correlation during failover and can return incorrect mutation outcomes to clients.
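A minimal sketch of the suggested fix, under stated assumptions: each waiter remembers the term it was proposed in, a commit completes the waiter only if the committed entry's `(term, index)` matches, and stepping down fails every pending waiter with an indeterminate result instead of leaving it blocked. To stay self-contained it uses `std::sync::mpsc` in place of the PR's `tokio::sync::oneshot`, and `Waiters`, `ProposalOutcome`, `register`, `on_commit`, and `on_step_down` are hypothetical names, not the PR's API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type Term = u64;
type LogIndex = u64;

/// Hypothetical outcome delivered to a waiting client.
#[derive(Debug, PartialEq)]
enum ProposalOutcome {
    /// The client's own entry was committed and applied.
    Applied(String),
    /// A different entry (different term) committed at this index,
    /// so the client's proposal can never commit.
    Superseded,
    /// Leadership was lost before commit; the entry may or may not commit later.
    Unknown,
}

/// Waiters keyed by index, but remembering the proposing term so a commit is
/// matched against the full (term, index) identity of the entry.
#[derive(Default)]
struct Waiters {
    pending: HashMap<LogIndex, (Term, Sender<ProposalOutcome>)>,
}

impl Waiters {
    fn register(&mut self, term: Term, index: LogIndex) -> Receiver<ProposalOutcome> {
        let (tx, rx) = channel();
        // Defensive: a second proposal at the same index implies the earlier
        // entry was truncated, so its waiter is told it was superseded.
        if let Some((_, old_tx)) = self.pending.insert(index, (term, tx)) {
            let _ = old_tx.send(ProposalOutcome::Superseded);
        }
        rx
    }

    fn on_commit(&mut self, term: Term, index: LogIndex, result: String) {
        if let Some((proposed_term, tx)) = self.pending.remove(&index) {
            let outcome = if proposed_term == term {
                ProposalOutcome::Applied(result)
            } else {
                ProposalOutcome::Superseded
            };
            // Ignore send errors: the client may have given up and dropped rx.
            let _ = tx.send(outcome);
        }
    }

    fn on_step_down(&mut self) {
        // Drain every pending waiter so no client blocks forever.
        for (_, (_, tx)) in std::mem::take(&mut self.pending) {
            let _ = tx.send(ProposalOutcome::Unknown);
        }
    }
}
```

The `Unknown` case matters for the Jepsen harness: a proposal whose fate is undetermined should be reported as indeterminate (Jepsen's `:info`), not as success or failure.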
```rust
_ => Err(miette::miette!(
    "peer address `{host}:{port}` resolved ambiguously to {} addresses",
    addrs.len()
)),
```
Handle dual-stack DNS peer resolution without hard failure
The config loader rejects any peer hostname that resolves to more than one socket address. Common names such as `localhost` (and many DNS hosts in dual-stack environments) resolve to both an IPv4 and an IPv6 address, so `c9 start` and `c9 check-config` fail even though either address would be usable. This makes valid cluster configs unusable in typical environments.
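One possible fix, sketched under assumptions: instead of erroring when resolution returns more than one address, resolve with `std::net::ToSocketAddrs` and pick a deterministic address, preferring IPv4 and falling back to the first resolved address. `select_peer_addr` and `resolve_peer` are hypothetical names, and errors are plain `String`s rather than the `miette` reports the PR uses.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical helper: choose one address from a resolved set instead of
/// rejecting multi-address results. Prefers IPv4 so every node picks the
/// same family regardless of local resolver ordering.
fn select_peer_addr(addrs: &[SocketAddr]) -> Option<SocketAddr> {
    addrs
        .iter()
        .find(|a| a.is_ipv4())
        .or_else(|| addrs.first())
        .copied()
}

/// Resolve `host:port` and fail only when resolution errors or returns nothing.
fn resolve_peer(host: &str, port: u16) -> Result<SocketAddr, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("failed to resolve peer `{host}:{port}`: {e}"))?
        .collect();
    select_peer_addr(&addrs)
        .ok_or_else(|| format!("peer address `{host}:{port}` resolved to no addresses"))
}
```

Preferring one family keeps the choice stable across nodes. An alternative is to keep the full list and try each address when dialing the peer, which also tolerates a host where one family is unreachable.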
Pull Request
Linear Issue
N/A
Summary
What: Adds a typed Cloud9 KV protocol surface, a runnable node/server path, and a minimal Jepsen harness that can drive the database through the public API.
Why: Jepsen needs to test the system through the same boundary an external client uses. This gives Cloud9 a real correctness harness for replicated KV behavior instead of testing only internal Raft state transitions.
Lines added: +2332
Test Plan
```bash
cargo test --workspace -- --nocapture
cargo clippy --workspace --all-targets -- -D warnings
```
Repro / Showcase
N/A. This PR wires the harness and node surface; full Jepsen execution is the next validation step.
Tests Added
Documentation
`jepsen/README.md` documents the local Jepsen workflow.
Notes for Reviewers
The important review boundary is the public KV API and whether the Jepsen harness drives Cloud9 through the right production-shaped path. The Clojure side is intentionally minimal so failures are easy to interpret.