Skip to content

[data_storage] remove cluster_name from VineyardStorageSpec and fix metrics tags#211

Open
lucky-zzz wants to merge 2 commits into
mainfrom
zsp_tmp
Open

[data_storage] remove cluster_name from VineyardStorageSpec and fix metrics tags#211
lucky-zzz wants to merge 2 commits into
mainfrom
zsp_tmp

Conversation

@lucky-zzz

Copy link
Copy Markdown
Collaborator

No description provided.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6964b8c3e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +106 to +108
int64 heartbeat_timeout_ms = 1;
int64 cleanup_grace_ms = 2;
int64 liveness_check_interval_ms = 3;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reserve old Vineyard field numbers

When an older AdminService client sends VineyardStorageSpec during a rolling upgrade, field #2 used to be heartbeat_timeout_ms and #3 used to be cleanup_grace_ms; this change reuses those tags for different settings, so the new server decodes the old heartbeat timeout as cleanup grace and the old cleanup grace as liveness interval while silently defaulting heartbeat. Keep the previous tag numbers for the remaining fields and reserve the removed cluster_name tag instead of renumbering.

Useful? React with 👍 / 👎.

…nstanceGroup.vineyard_storage_name instead of hardcoded v6d_ + instance_id

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c832eb158c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1430 to +1431
if (name.empty()) {
return nullptr;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve Vineyard lookup for un-migrated groups

When the manager is upgraded with registry data written by the old version, InstanceGroup has no vineyard_storage_name because the new field is read with an empty default; this branch then makes every existing v6d ReportEvent fail with INSTANCE_NOT_EXIST instead of finding the already-registered v6d_<instance_id> backend. Please keep a fallback/migration for the old storage name when the new group field is absent, otherwise existing Vineyard deployments stop accepting heartbeats/block events until every group is manually patched.

Useful? React with 👍 / 👎.

Comment on lines +112 to +113
info.instance_id = instance_id;
info.metrics_tags = {{"instance_id", instance_id}, {"host", host_ip_port}};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Track shared Vineyard nodes per instance

With vineyard_storage_name, several instances in the same group can share this single VineyardBackend; if the same v6d host reports for a second instance, this existing-node path overwrites the stored instance_id (and the host generation remains global). Later liveness or HOST_DOWN cleanup invokes CleanupHostLocations only for the last reporting instance, and cleanup for earlier instances can be skipped as stale, leaving dead Vineyard locations visible across instance-isolated metadata. Track nodes by (instance_id, host) or keep all instance IDs associated with a host.

Useful? React with 👍 / 👎.

instance_group_info.set_user_data(proto_instance_group->user_data());
instance_group_info.set_version(proto_instance_group->version());
instance_group_info.set_extra_info(proto_instance_group->extra_info());
instance_group_info.set_vineyard_storage_name(proto_instance_group->vineyard_storage_name());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve Vineyard name when update omits it

For UpdateInstanceGroup, any client or script that omits the new vineyard_storage_name field is parsed here as an empty string, and the registry update replaces the whole group with that value. In rolling upgrades or with older REST/gRPC clients, a routine group update after configuring Vineyard will silently clear the backend association, causing subsequent v6d ReportEvent lookup to fail until the field is set again.

Useful? React with 👍 / 👎.

if (registry_manager_) {
vineyard_storage_name = registry_manager_->GetVineyardStorageName(instance_id);
}
return [this, instance_id, vineyard_storage_name](const CacheLocation &loc) -> bool {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve Vineyard storage at check time

If vineyard_storage_name is changed for a group while instances are registered, the existing MetaSearcher keeps using the value captured here at registration time; UpdateInstanceGroup only updates the registry and does not recreate meta searchers. After switching a group to a new Vineyard backend, queries can still validate Vineyard locations against the old backend and filter out otherwise live locations until the instance/meta searcher is recreated.

Useful? React with 👍 / 👎.

@qoderai qoderai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental review: the existing bot review already covers the main risks well (proto field renumbering, empty vineyard_storage_name breaking LookupVineyardBackend, shared-backend instance_id overwrite, UpdateInstanceGroup clearing the field, and stale capture in the check functor). One additional observation: the new if (storage_unique_name.empty()) return true; in GetCheckLocDataExistFunc is a fail-open behavior change — the old code always called Exist() with a non-empty name, whereas now an unconfigured group silently accepts vineyard locations as valid without any storage check. Consider returning false or logging a warning there.


🤖 Generated by Qoder

Comment on lines +2189 to 2191
if (storage_unique_name.empty()) {
return true;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new if (storage_unique_name.empty()) return true; guard makes the location-existence check fail-open when vineyard_storage_name is empty. Previously, VineyardStorageNameFromInstance(instance_id) always produced a non-empty "v6d_" + instance_id, so Exist() was always invoked and would return false for a missing backend — effectively fail-closed. Now, if the group hasn't been configured with vineyard_storage_name (e.g., during a rolling upgrade or after an UpdateInstanceGroup that omitted the field), vineyard cache locations are reported as valid without any storage check. This could lead the system to select stale or non-existent vineyard locations for reads, deferring the failure to a later stage with a less obvious error. Consider returning false here so unverifiable locations are filtered out, or at minimum logging a warning so the silent accept is observable.


🤖 Generated by Qoder

@github-actions github-actions Bot added the ai reviewed AI has reviewed this PR label Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai reviewed AI has reviewed this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant