Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01f841f0f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if cfg.listen.http.is_some() { | ||
| cfg.local_address = format!("http://127.0.0.1:{port}/"); | ||
| } else { |
There was a problem hiding this comment.
Preserve listener path when overriding local_address with --port
When --port is used, this code rewrites local_address to http://127.0.0.1:{port}/ (or HTTPS) and drops any existing ingest path from the config. If a node is configured with a non-root listener path (for example /cmr), peers will be told to send to /, which causes 404 delivery failures and broken routing for that run mode.
Useful? React with 👍 / 👎.
| } else { | ||
| format!("/{}", dashboard_cfg.path) | ||
| }; | ||
| if path == base || path.starts_with(&format!("{base}/")) { |
There was a problem hiding this comment.
Normalize dashboard base path before route prefix matching
Route dispatch uses raw dashboard.path for prefix matching, so values like "/" or "/_cmr/" produce checks against "//" or "/_cmr//". In those cases, subroutes such as /api/... never enter the dashboard handler and return 404 even though the dashboard is enabled, leaving only the exact base path reachable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This pull request adds a "Basic Web UI" to enable non-programmers to interact with the CMR peer system through a web dashboard. The changes introduce significant new functionality while also making breaking API changes to the compression oracle interface and routing policy defaults.
Changes:
- Adds an embedded web dashboard with REST API endpoints for peer management, message composition, policy configuration, and real-time event streaming
- Implements SMTP inbound transport listener with MIME email parsing
- Introduces normalized distance-based routing thresholds alongside raw distance thresholds
- Removes deprecated
ncd_symmethods from the compression oracle interface - Adds configuration hot-reload and runtime management APIs
Reviewed changes
Copilot reviewed 21 out of 23 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/cmr-peer/src/dashboard.rs | New embedded web dashboard with API endpoints, SSE streaming, and peer connectivity probing |
| crates/cmr-peer/src/app.rs | SMTP listener implementation, MIME email parsing, dashboard event recording, config management APIs |
| crates/cmr-peer/src/main.rs | CLI port overrides, automatic config generation, dashboard URL hints on startup |
| crates/cmr-peer/src/transport.rs | Handshake message size limits parameterization |
| crates/cmr-peer/src/config.rs | Dashboard, SMTP listener, and policy tuning configuration structures |
| crates/cmr-peer/src/lib.rs | Export dashboard module |
| crates/cmr-core/src/router.rs | Normalized distance thresholding, routing diagnostics, cache statistics APIs, removal of ncd_sym methods |
| crates/cmr-core/src/policy.rs | Normalized distance threshold field, updated default max_match_distance values |
| crates/cmr-core/src/lib.rs | Export new public types for dashboard APIs |
| crates/cmr-core/src/compressor_ipc.rs | Remove NcdSym request/response variants |
| crates/cmr-compressor/src/main.rs | Remove ncd_sym handler, add ZPAQ method configuration |
| crates/cmr-peer/cmr-peer.example.toml | Document SMTP listener and dashboard configuration |
| cmr-peer.toml* | Runtime artifacts that should not be committed |
| Test files | Updated to match new compression oracle interface and routing behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Optional static pairwise keys: | ||
| # [[static_keys]] | ||
| # peer = "http://peer-a.example/cmr" | ||
| # hex_key = "666f6f626172313233" |
There was a problem hiding this comment.
The example configuration file now shows the static_keys section commented out at the end. However, the actual structure definition in the code expects an array. The old format showed an example array entry, which was more helpful. Consider adding back a complete commented example showing the array syntax with one or two example entries.
| # Optional static pairwise keys: | |
| # [[static_keys]] | |
| # peer = "http://peer-a.example/cmr" | |
| # hex_key = "666f6f626172313233" | |
| # Optional static pairwise keys (array of tables). | |
| # Uncomment and duplicate [[static_keys]] blocks as needed. | |
| # [[static_keys]] | |
| # peer = "http://peer-a.example/cmr" | |
| # hex_key = "666f6f626172313233" | |
| # | |
| # [[static_keys]] | |
| # peer = "http://peer-b.example/cmr" | |
| # hex_key = "abcdef0123456789" |
| async fn read_smtp_data<R>( | ||
| reader: &mut BufReader<R>, | ||
| max_message_bytes: usize, | ||
| ) -> Result<Vec<u8>, AppError> | ||
| where | ||
| R: tokio::io::AsyncRead + Unpin, | ||
| { | ||
| let mut data = Vec::new(); | ||
| let mut line = String::new(); | ||
| loop { | ||
| line.clear(); | ||
| let read = reader.read_line(&mut line).await.map_err(AppError::Io)?; | ||
| if read == 0 { | ||
| return Err(AppError::Runtime( | ||
| "smtp DATA terminated unexpectedly".to_owned(), | ||
| )); | ||
| } | ||
| if line == ".\r\n" || line == ".\n" || line == "." { | ||
| break; | ||
| } | ||
| let mut bytes = line.as_bytes().to_vec(); | ||
| if bytes.starts_with(b"..") { | ||
| bytes.remove(0); | ||
| } | ||
| data.extend_from_slice(&bytes); | ||
| if data.len() > max_message_bytes { | ||
| return Err(AppError::Runtime(format!( | ||
| "smtp DATA exceeds max_message_bytes ({max_message_bytes})" | ||
| ))); | ||
| } | ||
| } | ||
| Ok(data) |
There was a problem hiding this comment.
The SMTP implementation doesn't set any timeout on read operations, which could allow a malicious client to hold connections open indefinitely by sending data very slowly. Consider adding a timeout using tokio::time::timeout to prevent resource exhaustion attacks.
| local_address = "http://127.0.0.1:4001/" | ||
| security_level = "strict" | ||
| static_keys = [] | ||
| prefer_http_handshake = false | ||
|
|
||
| [listen.http] | ||
| bind = "0.0.0.0:4001" | ||
| path = "/" | ||
|
|
||
| [listen.udp] | ||
| bind = "0.0.0.0:5001" | ||
| service = "cmr" | ||
|
|
||
| [compressor] | ||
| command = "cmr-compressor" | ||
| args = [] | ||
| max_frame_bytes = 8388608 | ||
|
|
||
| [ssh] | ||
| binary = "ssh" | ||
| default_remote_command = "cmr-peer receive-stdin --config /etc/cmr-peer.toml --transport ssh" | ||
|
|
||
| [policy_tuning] | ||
|
|
||
| [dashboard] | ||
| enabled = true | ||
| path = "/_cmr" |
There was a problem hiding this comment.
This appears to be a runtime configuration file that should not be committed to the repository. Configuration files with local development settings should be in .gitignore to prevent conflicts between different developers' environments.
| if query.get("token").is_some_and(|value| value == expected) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Allowing authentication tokens in query parameters (line 1128) can expose credentials in server logs, browser history, and referrer headers. This is a security risk. Consider removing query parameter authentication and only supporting the Authorization header for bearer tokens.
| if query.get("token").is_some_and(|value| value == expected) { | |
| return true; | |
| } |
| enabled = true | ||
| path = "/_cmr" | ||
| # Optional bearer token required for all dashboard/API requests. | ||
| # auth_token = "change-me" |
There was a problem hiding this comment.
The example config enables the web dashboard ([dashboard] with enabled = true and path /_cmr) while leaving auth_token unset and commented out. In combination with the current implementation (where DashboardConfig.auth_token defaults to None and is_authorized grants access when the token is missing), this makes a freshly initialized peer expose powerful dashboard/API operations over HTTP on 0.0.0.0 with no authentication, so any reachable client can read status, change config, or send messages. Consider shipping the template with the dashboard disabled by default or with a required auth token (and clearly documenting it), so that operators do not end up with an exposed, unauthenticated admin UI out of the box.
| enabled = true | |
| path = "/_cmr" | |
| # Optional bearer token required for all dashboard/API requests. | |
| # auth_token = "change-me" | |
| enabled = false | |
| path = "/_cmr" | |
| # Bearer token required for all dashboard/API requests. Change this before enabling. | |
| auth_token = "change-me" |
| [dashboard] | ||
| enabled = true | ||
| path = "/_cmr" |
There was a problem hiding this comment.
This default cmr-peer.toml enables the web dashboard ([dashboard] with enabled = true and path /_cmr) without configuring any auth_token. Given that the dashboard implementation treats a missing token as "no auth required" and the HTTP listener here binds to 0.0.0.0, running with this config will expose the full dashboard/API surface (status, runtime controls, config edits, message sending, etc.) to any client on the network with no authentication. It would be safer to ship this config with the dashboard disabled or requiring an auth token, and to clearly instruct operators how to enable it securely.
| [dashboard] | |
| enabled = true | |
| path = "/_cmr" | |
| [dashboard] | |
| # Enable or disable the web dashboard. | |
| enabled = true | |
| # URL path for the dashboard. | |
| path = "/_cmr" | |
| # Authentication token required to access the dashboard/API. | |
| # CHANGE THIS to a strong, secret value before exposing the service. | |
| auth_token = "CHANGE_ME" |
…s for real world networking.
….1 Conformance issue
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 28 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| fn default_dashboard_enabled() -> bool { | ||
| true |
There was a problem hiding this comment.
The dashboard (including config/policy mutation endpoints) is enabled by default (default_dashboard_enabled() -> true) and auth_token defaults to None, which makes the admin surface unauthenticated unless the operator explicitly configures it. This is risky when the HTTP listener binds to 0.0.0.0 (as in the example configs). Consider defaulting enabled to false, auto-generating/requiring an auth token when enabled, or only enabling by default when binding to loopback.
| true | |
| false |
| IpAddr::V4(ip) => format!("{ip}:{port}"), | ||
| IpAddr::V6(ip) => format!("[{ip}]:{port}"), | ||
| }, | ||
| Err(_) => format!("0.0.0.0:{port}"), |
There was a problem hiding this comment.
rewrite_bind_port falls back to 0.0.0.0:{port} on parse failure. This means a config like bind = "localhost:8080" (not parseable as SocketAddr) combined with --port will unexpectedly widen exposure from loopback-only to all interfaces. Consider preserving the original host when possible (e.g., splitting host:port, handling hostnames like localhost, or returning an error instead of defaulting to 0.0.0.0).
| Err(_) => format!("0.0.0.0:{port}"), | |
| Err(_) => { | |
| // If `bind` is not a valid `SocketAddr`, try to preserve the original host. | |
| // This handles cases like `localhost:8080` where we want to keep `localhost` | |
| // and only override the port, instead of widening exposure to 0.0.0.0. | |
| if let Some((host, old_port)) = bind.rsplit_once(':') { | |
| // Only treat this as host:port if the suffix is a valid u16 port. | |
| if !host.is_empty() && old_port.parse::<u16>().is_ok() { | |
| return format!("{host}:{port}"); | |
| } | |
| } | |
| // If there was no valid port suffix but we still have a non-empty host, | |
| // assume it's a host without a port and append the overridden port. | |
| if !bind.is_empty() { | |
| return format!("{bind}:{port}"); | |
| } | |
| // As a last resort, fall back to listening on all interfaces. | |
| format!("0.0.0.0:{port}") | |
| } |
| fn is_local_peer_alias(&self, peer: &str) -> bool { | ||
| let local = self.local_address.trim_end_matches('/'); | ||
| let candidate = peer.trim_end_matches('/'); | ||
| candidate == local || candidate.starts_with(&format!("{local}/")) |
There was a problem hiding this comment.
is_local_peer_alias treats any peer address that starts with "{local}/" as a local alias. This can incorrectly classify distinct peers as “local” (e.g. local=http://host, peer=http://host/peer-a/), causing them to be filtered out of peer_count() and peer_snapshots(). If the intent is only to tolerate trailing-slash differences, consider restricting this to exact match after trim_end_matches('/') (similar to same_address_alias in protocol.rs).
| candidate == local || candidate.starts_with(&format!("{local}/")) | |
| candidate == local |
| let candidate = parent.join(command); | ||
| candidate.is_file().then_some(candidate) |
There was a problem hiding this comment.
sibling_worker_path only checks for a file literally named command next to the current executable. On Windows, the worker is typically cmr-compressor.exe, so the fallback won’t find it when command is cmr-compressor. Consider adding platform-aware resolution (e.g., appending .exe on Windows or using std::env::consts::EXE_EXTENSION), or using current_exe’s directory with a more robust lookup strategy.
| let candidate = parent.join(command); | |
| candidate.is_file().then_some(candidate) | |
| // Build a list of candidate filenames to check next to the current executable. | |
| // Always try the raw command, and on platforms with an executable extension | |
| // (e.g., `.exe` on Windows), also try `command` plus that extension if it | |
| // does not already end with it. | |
| let mut candidates = Vec::new(); | |
| candidates.push(command.to_owned()); | |
| let exe_ext = std::env::consts::EXE_EXTENSION; | |
| if !exe_ext.is_empty() { | |
| // Do a case-insensitive check to see if the command already ends with | |
| // the executable extension, to avoid duplicating it. | |
| let cmd_lower = command.to_ascii_lowercase(); | |
| let ext_with_dot = format!(".{}", exe_ext); | |
| let ext_lower = ext_with_dot.to_ascii_lowercase(); | |
| if !cmd_lower.ends_with(&ext_lower) { | |
| candidates.push(format!("{command}{}", ext_with_dot)); | |
| } | |
| } | |
| for name in candidates { | |
| let candidate = parent.join(&name); | |
| if candidate.is_file() { | |
| return Some(candidate); | |
| } | |
| } | |
| None |
| if dashboard_cfg.enabled { | ||
| let base = if dashboard_cfg.path.starts_with('/') { | ||
| dashboard_cfg.path.clone() | ||
| } else { | ||
| format!("/{}", dashboard_cfg.path) | ||
| }; | ||
| if path == base || path.starts_with(&format!("{base}/")) { | ||
| return dashboard::handle_dashboard_request(req, state, dashboard_cfg).await; |
There was a problem hiding this comment.
dashboard_cfg.path isn’t normalized the same way as dashboard::normalize_base_path: if the config path includes a trailing slash (e.g. /_cmr/), base becomes /_cmr/ and the path == base || path.starts_with("{base}/") checks won’t match /_cmr or /_cmr/api/..., effectively disabling the dashboard routes. Consider trimming trailing slashes (and ensuring a leading slash) before matching, or reusing the same normalization helper used in dashboard.rs.
| async fn parse_json_body<T: DeserializeOwned>( | ||
| req: Request<Incoming>, | ||
| ) -> Result<T, Response<PeerBody>> { | ||
| let body = req.into_body().collect().await.map_err(|err| { | ||
| response_api_error( | ||
| StatusCode::BAD_REQUEST, | ||
| "invalid_body", | ||
| "failed to read request body", | ||
| Some(serde_json::json!({"io_error": err.to_string()})), | ||
| ) | ||
| })?; | ||
| serde_json::from_slice::<T>(&body.to_bytes()).map_err(|err| { |
There was a problem hiding this comment.
parse_json_body reads and buffers the entire request body with no size limit. If the dashboard is reachable over the network, this can be used for memory exhaustion (especially on endpoints like /api/config/apply). Consider enforcing a max JSON body size (via Content-Length checks and/or a capped collection/streaming decode) and returning 413 Payload Too Large when exceeded.
| query: &HashMap<String, String>, | ||
| ) -> bool { | ||
| let Some(expected) = cfg.auth_token.as_deref() else { | ||
| return true; | ||
| }; | ||
| if query.get("token").is_some_and(|value| value == expected) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Allowing the auth token via the ?token= query parameter makes it easy to leak the token via logs, browser history, referer headers, and upstream proxies. If this is meant only as a convenience, consider removing query-token auth (or gating it behind an explicit config flag) and requiring the Authorization: Bearer ... header instead.
| query: &HashMap<String, String>, | |
| ) -> bool { | |
| let Some(expected) = cfg.auth_token.as_deref() else { | |
| return true; | |
| }; | |
| if query.get("token").is_some_and(|value| value == expected) { | |
| return true; | |
| } | |
| _query: &HashMap<String, String>, | |
| ) -> bool { | |
| let Some(expected) = cfg.auth_token.as_deref() else { | |
| return true; | |
| }; |
| let Ok(text) = value.to_str() else { | ||
| return false; | ||
| }; | ||
| text == format!("Bearer {expected}") |
There was a problem hiding this comment.
Authorization parsing is overly strict: it requires an exact, case-sensitive match to "Bearer {token}" and allocates a new string for the comparison. HTTP auth schemes are case-insensitive, and many clients send bearer/extra whitespace. Consider parsing the header more defensively (trim, split on whitespace, compare scheme case-insensitively) and compare the token without allocating.
| text == format!("Bearer {expected}") | |
| let text = text.trim(); | |
| let mut parts = text.split_whitespace(); | |
| let Some(scheme) = parts.next() else { | |
| return false; | |
| }; | |
| let Some(token) = parts.next() else { | |
| return false; | |
| }; | |
| // Reject headers with unexpected extra components (e.g., "Bearer token extra"). | |
| if parts.next().is_some() { | |
| return false; | |
| } | |
| scheme.eq_ignore_ascii_case("Bearer") && token == expected |
…ment by default; Use HTTP Basic Authentication, but only accept it over HTTPS to mitigate Credential Exposure
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 28 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| spam: SpamPolicy { | ||
| min_intrinsic_dependence: 0.02, | ||
| max_match_distance: 0.72, | ||
| max_match_distance: 500.0, |
There was a problem hiding this comment.
The max_match_distance threshold has been changed from 0.72 to 500.0, which is a dramatic increase (nearly 700x). This fundamentally changes the behavior of the routing algorithm by making it much more permissive. The comment on line 38 indicates this is now a "raw Section 3.2 match distance" rather than a normalized distance, which explains the scale change. However, this is a breaking change that significantly alters routing behavior. Consider whether existing deployments using the "strict" security level would expect this more permissive matching, and whether this should be documented in a migration guide or changelog.
| max_match_distance: 500.0, | |
| max_match_distance: 0.72, |
| fn router_match_threshold_is_normalized_and_size_aware() { | ||
| let mut policy = permissive_policy(); | ||
| policy.spam.max_match_distance = 0.0; | ||
| policy.trust.max_outbound_inbound_ratio = 10.0; | ||
| let mut router = Router::new("http://local".to_owned(), policy, StubOracle::ok(0.9, 10.0)); | ||
|
|
||
| let seed = message_with_sender("http://sink", b"topic alpha", None, "2029/12/31 23:59:59"); | ||
| assert!( | ||
| router | ||
| .process_incoming(&seed, TransportKind::Http, ts("2030/01/01 00:00:10")) | ||
| .accepted | ||
| ); | ||
|
|
||
| let incoming = message_with_sender("http://origin", b"topic beta", None, "2029/12/31 23:59:58"); | ||
| let out = router.process_incoming(&incoming, TransportKind::Http, ts("2030/01/01 00:00:10")); | ||
| assert!(out.accepted); | ||
| assert_eq!(out.matched_count, 0); | ||
| assert!(out.forwards.is_empty()); | ||
| } |
There was a problem hiding this comment.
The test name router_match_threshold_is_normalized_and_size_aware is misleading. The test actually verifies that the raw distance threshold gates matching (max_match_distance=0.0 with oracle distance=10.0 results in no matches), which demonstrates the threshold is NOT normalized. Consider renaming this test to something like router_match_threshold_rejects_raw_distance_above_limit to accurately reflect what it tests.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback -- fixing this:
The test name router_match_threshold_is_normalized_and_size_aware is misleading. The test actually verifies that the raw distance threshold gates matching (max_match_distance=0.0 with oracle distance=10.0 results in no matches), which demonstrates the threshold is NOT normalized. Consider renaming this test to something like router_match_threshold_rejects_raw_distance_above_limit to accurately reflect what it tests.
(so just renaming it)
|
@turtle261 I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you. |
* Rename test to router_match_threshold_rejects_raw_distance_above_limit
Implement a Web UI, allowing direct, easy use.
Allow a human to be a client without needing programming skills.