chore: add mandatory hottier for pstats dataset #1414
base: main
Walkthrough
Adds lazy initialization of a hot tier for the dataset stats (pstats) stream before regular hot-tier syncing. A private helper checks storage for the stream and creates a default-sized hot tier if one is missing. Errors during this pre-step are logged at trace level and do not halt the subsequent per-stream sync.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant H as HotTierManager
participant S as Storage/PARSEABLE
participant HT as HotTier Store
rect rgba(230,240,255,0.5)
note over H: Pre-sync step (new)
H->>H: create_pstats_hot_tier()
H->>S: check_or_load_stream(DATASET_STATS_STREAM_NAME)
alt Stream exists
H->>HT: get_hot_tier(DATASET_STATS_STREAM_NAME)
alt Hot tier missing
H->>HT: put_hot_tier(DATASET_STATS_STREAM_NAME, default StreamHotTier)
note right of HT: version=CURRENT_HOT_TIER_VERSION<br/>size=MIN_STREAM_HOT_TIER_SIZE_BYTES
else Hot tier present
note right of HT: No-op
end
else Stream absent
note over H,S: No-op
end
opt Error
H->>H: trace! error and continue
end
end
rect rgba(235,255,235,0.5)
note over H: Existing per-stream sync continues
H->>H: sync per stream (unchanged flow)
end
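The pre-sync step in the diagram can be modeled in memory to show its two key properties: repeated calls are no-ops once the hot tier exists, and a pre-step error is only logged before the per-stream sync runs. This is a simplified sketch, not the PR's code. `Manager`, `ensure_pstats_hot_tier`, `sync_hot_tiers`, and `fail_pre_step` are hypothetical names. The constant values are placeholders, and the real helper is async and talks to storage.

```rust
use std::collections::{HashMap, HashSet};

// Placeholder values for illustration only; the real constants are defined in Parseable.
const DATASET_STATS_STREAM_NAME: &str = "pstats";
const MIN_STREAM_HOT_TIER_SIZE_BYTES: u64 = 1024;

/// Simplified stand-in for StreamHotTier (version and oldest_date_time_entry omitted).
#[derive(Debug, Clone, PartialEq)]
struct StreamHotTier {
    size: u64,
    used_size: u64,
    available_size: u64,
}

/// Hypothetical manager: `storage_streams` plays the role of object storage,
/// `hot_tiers` plays the role of the hot-tier metadata store.
struct Manager {
    storage_streams: HashSet<String>,
    hot_tiers: HashMap<String, StreamHotTier>,
    fail_pre_step: bool, // lets a caller simulate a pre-step failure
}

impl Manager {
    /// Returns Ok(true) if a default hot tier was created, Ok(false) if nothing was done.
    fn ensure_pstats_hot_tier(&mut self) -> Result<bool, String> {
        if self.fail_pre_step {
            return Err("simulated pre-step failure".to_string());
        }
        // Hot tier already present: no-op, so repeated calls are idempotent.
        if self.hot_tiers.contains_key(DATASET_STATS_STREAM_NAME) {
            return Ok(false);
        }
        // Stream absent from storage: no-op.
        if !self.storage_streams.contains(DATASET_STATS_STREAM_NAME) {
            return Ok(false);
        }
        self.hot_tiers.insert(
            DATASET_STATS_STREAM_NAME.to_string(),
            StreamHotTier {
                size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
                used_size: 0,
                available_size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
            },
        );
        Ok(true)
    }

    /// Runs the pre-step, logs any error, then "syncs" every stream that has a hot tier.
    fn sync_hot_tiers(&mut self) -> Vec<String> {
        if let Err(e) = self.ensure_pstats_hot_tier() {
            // The error is reported but does not stop the per-stream sync below.
            eprintln!("skipping dataset-stats hot tier pre-step: {e}");
        }
        let mut synced: Vec<String> = self.hot_tiers.keys().cloned().collect();
        synced.sort();
        synced
    }
}
```

Checking for an existing hot tier before touching storage mirrors the order in the diagram. This avoids a storage round trip on every sync cycle after the first successful creation.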
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/hottier.rs (2)
Lines 255-259: Surface pre-step failures at a higher log level (debug/warn) for operational visibility.
Swallowing errors with trace! can hide misconfigurations (e.g., permissions on the hot-tier directory). Consider logging at debug! level or higher and including structured fields.
Apply this localized change:
- if let Err(e) = self.create_pstats_hot_tier().await {
-     tracing::trace!("Skipping pstats hot tier creation because of error: {e}");
- }
+ if let Err(e) = self.create_pstats_hot_tier().await {
+     tracing::debug!(error = %e, "Skipping dataset-stats hot tier creation pre-step");
+ }
Optionally, emit an info! when the hot tier is created (see the suggestion below in create_pstats_hot_tier).
Lines 716-739: Make the helper more explicit and slightly more robust; standardize naming.
The logic is sound and idempotent. Two small improvements:
- Naming: “pstats” vs “dataset stats” is inconsistent. Prefer a clear name like ensure_dataset_stats_hot_tier for discoverability.
- Robustness: ensure the per-stream directory exists before put_hot_tier to avoid relying on LocalFileSystem::put creating parents. Also, log on successful creation to aid ops.
Apply the following focused adjustments:
- /// Creates hot tier for pstats internal stream if the stream exists in storage
- async fn create_pstats_hot_tier(&self) -> Result<(), HotTierError> {
+ /// Ensures a hot tier exists for the dataset-stats stream if the stream exists in storage.
+ async fn create_pstats_hot_tier(&self) -> Result<(), HotTierError> {
      // Check if pstats hot tier already exists
      if !self.check_stream_hot_tier_exists(DATASET_STATS_STREAM_NAME) {
          // Check if pstats stream exists in storage by attempting to load it
          if PARSEABLE
              .check_or_load_stream(DATASET_STATS_STREAM_NAME)
              .await
          {
+             // Ensure the directory exists for the metadata file
+             let dir = self.hot_tier_path.join(DATASET_STATS_STREAM_NAME);
+             if !dir.exists() {
+                 tokio::fs::create_dir_all(&dir).await?;
+             }
              let mut stream_hot_tier = StreamHotTier {
                  version: Some(CURRENT_HOT_TIER_VERSION.to_string()),
                  size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
                  used_size: 0,
                  available_size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
                  oldest_date_time_entry: None,
              };
              self.put_hot_tier(DATASET_STATS_STREAM_NAME, &mut stream_hot_tier)
                  .await?;
+             tracing::info!(
+                 stream = DATASET_STATS_STREAM_NAME,
+                 size_bytes = MIN_STREAM_HOT_TIER_SIZE_BYTES,
+                 "Created dataset-stats hot tier metadata"
+             );
          }
      }
      Ok(())
  }
Optional follow-up: factor this into a generic ensure_hot_tier_for_stream(stream_name, size_bytes) and reuse it in put_internal_stream_hot_tier to de-duplicate logic.
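The optional generic helper could look roughly like the following synchronous, std-only sketch. It is written against a plain directory tree rather than Parseable's actual hot-tier storage. The metadata file name `hottier.json` and the hand-written JSON body are assumptions for illustration. The real code is async (tokio::fs), serializes StreamHotTier including its version field, and may use a different file layout.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical file name for per-stream metadata; Parseable's real name may differ.
const HOT_TIER_META_FILE: &str = "hottier.json";

/// Hypothetical generic helper: returns Ok(true) if metadata was created,
/// Ok(false) if it already existed.
fn ensure_hot_tier_for_stream(
    hot_tier_root: &Path,
    stream_name: &str,
    size_bytes: u64,
) -> io::Result<bool> {
    let dir = hot_tier_root.join(stream_name);
    let meta = dir.join(HOT_TIER_META_FILE);
    // Existing metadata is left untouched, so repeated calls are idempotent.
    if meta.exists() {
        return Ok(false);
    }
    // create_dir_all succeeds when the directory already exists,
    // so a separate exists() check beforehand is unnecessary.
    fs::create_dir_all(&dir)?;
    // Hand-written JSON keeps the sketch dependency-free; real code would serialize a struct.
    let body = format!(
        "{{\"size\":{size_bytes},\"used_size\":0,\"available_size\":{size_bytes},\"oldest_date_time_entry\":null}}"
    );
    fs::write(&meta, body)?;
    Ok(true)
}
```

Because create_dir_all (std or tokio) already tolerates an existing directory, the `if !dir.exists()` guard in the suggested diff could likely be dropped. That would also avoid a blocking filesystem check inside async code.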
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/hottier.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (2)
src/hottier.rs (2)
Line 30: Import of DATASET_STATS_STREAM_NAME looks right.
Pulling the dataset-stats stream name from storage::field_stats is appropriate for colocating ownership with storage-layer concerns. No issues spotted.
Lines 716-739: No issues found with check_or_load_stream behavior.
The check_or_load_stream(&self, stream_name: &str) -> bool helper:
- Returns true if the stream is already in PARSEABLE.streams (in-memory), or
- In Mode::Query or Mode::Prism, attempts to load it from storage via create_stream_and_schema_from_storage (which calls streams.get_or_create) and returns true on success.
The create_stream_and_schema_from_storage implementation:
- Verifies existence via storage.list_streams(),
- Inserts the stream into self.streams using get_or_create before returning Ok(true).
Finally, PARSEABLE.streams.list() simply collects the in-memory keys (contains ⇒ list includes the name). Thus a true result from check_or_load_stream guarantees the stream appears in PARSEABLE.streams.list() for downstream hot-tier synchronization.
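The reasoning above can be checked against a small synchronous model. A true result from check_or_load_stream implies the name is in the in-memory registry, and that registry is exactly what list() returns. This is a simplified sketch, not Parseable's code. The `Mode` enum is a reduced stand-in (other variants omitted), `Parseable` is a hypothetical struct using `&mut self` instead of the real async, interior-mutability design, and returning Ok(false) for a stream missing from storage is an assumption.

```rust
use std::collections::HashSet;

/// Reduced stand-in for Parseable's Mode (other variants omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Ingest,
    Query,
    Prism,
}

/// Hypothetical model of PARSEABLE: `streams` is the in-memory registry,
/// `storage_streams` is what storage.list_streams() would return.
struct Parseable {
    mode: Mode,
    streams: HashSet<String>,
    storage_streams: HashSet<String>,
}

impl Parseable {
    fn create_stream_and_schema_from_storage(&mut self, name: &str) -> Result<bool, String> {
        // Verify the stream exists in storage before registering it (assumed Ok(false) otherwise).
        if !self.storage_streams.contains(name) {
            return Ok(false);
        }
        // Stands in for streams.get_or_create: inserting an existing name is a no-op.
        self.streams.insert(name.to_string());
        Ok(true)
    }

    fn check_or_load_stream(&mut self, name: &str) -> bool {
        // Already registered in memory.
        if self.streams.contains(name) {
            return true;
        }
        // Only Query and Prism modes fall back to loading from storage.
        matches!(self.mode, Mode::Query | Mode::Prism)
            && self.create_stream_and_schema_from_storage(name).unwrap_or(false)
    }

    /// Mirrors PARSEABLE.streams.list(): just the in-memory keys.
    fn list(&self) -> Vec<String> {
        self.streams.iter().cloned().collect()
    }
}
```

In this model, the only path to `true` either finds the name in `streams` or inserts it there. That is the invariant the review relies on for downstream hot-tier synchronization.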
Summary by CodeRabbit
New Features
Bug Fixes