
feat(pb): actor v1 to v2 migration #4548

Merged
NathanFlurry merged 1 commit into main from 04-02-feat_pb_actor_v1_to_v2_migration
Apr 5, 2026

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Apr 3, 2026

🚅 Deployed to the rivet-pr-4548 environment in rivet-frontend

| Service | Status | Updated (UTC) |
|---|---|---|
| kitchen-sink | ❌ Build Failed | Apr 5, 2026 at 11:37 am |
| frontend-inspector | 🕒 Building | Apr 5, 2026 at 11:37 am |
| frontend-cloud | ❌ Build Failed | Apr 3, 2026 at 3:42 am |
| website | ❌ Build Failed | Apr 3, 2026 at 3:42 am |
| mcp-hub | ✅ Success | Apr 3, 2026 at 3:42 am |
| ladle | ❌ Build Failed | Apr 3, 2026 at 3:41 am |

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4548 April 3, 2026 03:41 Destroyed
@claude

claude bot commented Apr 3, 2026

PR 4548 Review: Actor v1 to v2 Migration

This PR implements the infrastructure for migrating actors from the v1 workflow to the v2 workflow, adds a Reallocating state to the actor2 state machine, fixes several envoy client bugs, and updates API endpoints to use /actors as the canonical path.


Bugs and Correctness Issues

1. Duplicated word in config description (config/pegboard.rs, config-schema.json)

Both the Rust doc comment and the JSON schema contain "before before": How long to wait after starting to attempt to reallocate before before setting actor to sleep. Remove the duplicated word.


2. Missing actor_id tag on v2 dispatch (actor/mod.rs)

The early-exit from_v1 dispatch at line ~235 now correctly adds .tag("actor_id", input.actor_id). Worth confirming whether this tag was previously missing, since its absence would have silently prevented other systems from locating the migrated v2 workflow by actor_id.


3. MigratedToV2 message sent before v2 workflow is ready (actor/mod.rs)

The sequence dispatches the v2 workflow then immediately sends MigratedToV2. The guard (pegboard_gateway.rs) hands off to handle_actor_v2 immediately on receiving MigratedToV2. Since dispatch() only enqueues the v2 workflow without waiting for it to start, there is a window where incoming requests get handed to v2 handling before the v2 workflow has started. This may be fine if handle_actor_v2 properly waits on its Ready subscription. Please confirm the v2 handler tolerates a Ready signal that arrives after the handler starts.
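To make the question concrete, here is a minimal sketch of a handler that tolerates a late Ready signal: it subscribes first and then blocks on that subscription with a timeout, so it does not matter whether the workflow starts before or after the handler is entered. The `Ready` type, the channel-based subscription, and this `handle_actor_v2` signature are all hypothetical stand-ins, not the PR's actual code.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical signal emitted once the v2 workflow has started.
#[derive(Debug)]
struct Ready;

/// Hypothetical stand-in for handle_actor_v2. It may be entered before the
/// v2 workflow is running, so it waits on its Ready subscription (bounded by
/// a timeout) instead of assuming Ready has already been observed.
fn handle_actor_v2(ready_sub: &Receiver<Ready>, timeout: Duration) -> Result<&'static str, String> {
    match ready_sub.recv_timeout(timeout) {
        Ok(Ready) => Ok("forwarded request to v2 actor"),
        Err(RecvTimeoutError::Timeout) => Err("v2 workflow not ready in time".to_string()),
        Err(RecvTimeoutError::Disconnected) => Err("ready subscription closed".to_string()),
    }
}

fn main() {
    let (ready_tx, ready_sub) = mpsc::channel();

    // Simulate dispatch(): the workflow only starts (and emits Ready)
    // some time after the handler has already begun waiting.
    let workflow = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        ready_tx.send(Ready).unwrap();
    });

    let result = handle_actor_v2(&ready_sub, Duration::from_secs(2));
    workflow.join().unwrap();
    println!("{:?}", result);
}
```

The timeout is a design choice of this sketch: without it, a v2 workflow that never starts would leave the handler blocked indefinitely.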


4. Reallocating threshold check may never fire (actor2/mod.rs)

The check compares state.retry_backoff_state.last_retry_ts > *since_ts + ctx.config().pegboard().actor_retry_duration_threshold(). If last_retry_ts is initialized to 0 (the epoch), this condition stays false until the first retry timestamp is recorded, so the actor cannot transition from Reallocating back to Sleeping before then. Please verify that last_retry_ts is reliably updated on each retry attempt and that its initial value is intentional.
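A minimal sketch of the comparison as described above shows why the initial value matters. The struct and function are simplified, hypothetical mirrors of the fields named in the review, and the timestamps and threshold are arbitrary example values in milliseconds.

```rust
/// Hypothetical, simplified mirror of the retry state discussed above.
struct RetryBackoffState {
    /// Milliseconds since the Unix epoch; 0 means "no retry recorded yet".
    last_retry_ts: i64,
}

/// The comparison as described in the review: the threshold counts as
/// exceeded only once the most recent retry is later than since_ts + threshold.
fn threshold_exceeded(state: &RetryBackoffState, since_ts: i64, threshold_ms: i64) -> bool {
    state.last_retry_ts > since_ts + threshold_ms
}

fn main() {
    let since_ts = 1_775_000_000_000; // arbitrary example timestamp (ms)
    let threshold_ms = 30_000; // arbitrary example threshold (ms)

    // With the initial value of 0, the check cannot fire...
    let fresh = RetryBackoffState { last_retry_ts: 0 };
    println!("{}", threshold_exceeded(&fresh, since_ts, threshold_ms)); // false

    // ...until a retry timestamp past the threshold has been recorded.
    let retried = RetryBackoffState { last_retry_ts: since_ts + threshold_ms + 1 };
    println!("{}", threshold_exceeded(&retried, since_ts, threshold_ms)); // true
}
```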


5. Unused subscription not dropped early in delete.rs

Both destroy_sub (v1) and destroy_sub2 (v2) are created up front. Once the code determines that a v2 destroy was sent, destroy_sub (v1) is never explicitly dropped before awaiting destroy_sub2. Although it is dropped when the function returns, dropping it explicitly after choosing the v2 path would free the NATS subscription sooner, consistent with the pattern used in the gateway.
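The suggested change amounts to calling `drop` on the unused subscription as soon as the path is known. The sketch below assumes, as the review implies, that the real subscription type releases its NATS subscription when dropped; `Subscription`, its open-counter, and `delete_actor` are hypothetical and only count how many subscriptions are live while the remaining one is awaited.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a NATS-backed subscription; dropping it is
/// what releases the underlying subscription.
struct Subscription {
    name: &'static str,
    open: Rc<Cell<usize>>,
}

impl Subscription {
    fn new(name: &'static str, open: &Rc<Cell<usize>>) -> Self {
        open.set(open.get() + 1);
        Subscription { name, open: Rc::clone(open) }
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        self.open.set(self.open.get() - 1);
        println!("released {}", self.name);
    }
}

/// Both subscriptions are created up front; the unused one is dropped as soon
/// as the path is known. Returns how many are open while awaiting the other.
fn delete_actor(sent_v2_destroy: bool, open: &Rc<Cell<usize>>) -> usize {
    let destroy_sub = Subscription::new("destroy_sub (v1)", open);
    let destroy_sub2 = Subscription::new("destroy_sub2 (v2)", open);

    if sent_v2_destroy {
        // Release the v1 subscription now instead of at function return.
        drop(destroy_sub);
        // ... await destroy_sub2 here ...
        open.get()
    } else {
        drop(destroy_sub2);
        // ... await destroy_sub here ...
        open.get()
    }
}

fn main() {
    let open = Rc::new(Cell::new(0));
    let while_awaiting = delete_actor(true, &open);
    println!("open while awaiting v2: {}", while_awaiting);
    println!("open after return: {}", open.get());
}
```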


Minor Issues

6. Spawned work in handleConnClose may swallow errors (envoy/index.ts)

handleConnClose is now synchronous (removing await at the call site is correct), but it internally calls spawn for the lost-connection timeout. Confirm spawn logs or handles rejections rather than silently dropping them.
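The envoy client is TypeScript, but the concern is language-agnostic: background work must route its failure somewhere observable. Below is a Rust analogue, where `spawn_logged` is a hypothetical helper (not the client's `spawn`) that records any error from the spawned work in a log instead of discarding it. In the TypeScript client the equivalent would be attaching a rejection handler to the promise passed to `spawn`.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: run fallible work in the background, but record any
/// error in a shared log instead of silently dropping it.
fn spawn_logged<F>(label: &'static str, log: Arc<Mutex<Vec<String>>>, work: F) -> JoinHandle<()>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    thread::spawn(move || {
        if let Err(err) = work() {
            log.lock().unwrap().push(format!("{label} failed: {err}"));
        }
    })
}

fn main() {
    let log = Arc::new(Mutex::new(Vec::new()));

    // Simulated lost-connection timeout task that fails.
    let handle = spawn_logged("lost-connection timeout", Arc::clone(&log), || {
        Err("envoy already shut down".to_string())
    });
    handle.join().unwrap();

    println!("{:?}", log.lock().unwrap());
}
```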


7. handleShutdown reentrancy guard should be commented (envoy/index.ts)

The shuttingDown flag correctly prevents duplicate shutdowns. A short comment explaining why duplicates can occur (e.g. conn-close racing with shutdown) would prevent the guard from looking like dead code on future reads.
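A sketch of the guard with the kind of comment the review asks for. It is written in Rust with an atomic swap rather than the client's TypeScript boolean, and `Envoy` and `handle_shutdown` here are hypothetical stand-ins; the point is that only the first caller performs the shutdown and the comment explains why a second call can happen.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical client carrying a guard similar in spirit to `shuttingDown`.
struct Envoy {
    shutting_down: AtomicBool,
}

impl Envoy {
    fn new() -> Self {
        Envoy { shutting_down: AtomicBool::new(false) }
    }

    /// Returns true only for the call that actually performs the shutdown.
    fn handle_shutdown(&self) -> bool {
        // Guard against duplicate shutdowns: a connection-close handler can
        // race with an explicit shutdown request, so this may be entered twice.
        if self.shutting_down.swap(true, Ordering::SeqCst) {
            return false;
        }
        // ... tear down connections, flush pending state, etc. ...
        true
    }
}

fn main() {
    let envoy = Envoy::new();
    println!("{}", envoy.handle_shutdown()); // true
    println!("{}", envoy.handle_shutdown()); // false
}
```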


8. Test crash policy change (scripts/tests/utils.ts)

Changing the default test crash policy from "destroy" to "sleep" changes what the e2e tests cover. This seems intentional for v2 reallocation testing, but make sure the destroy crash policy path still has coverage.


Positive Changes

  • The new Reallocating { since_ts } state is a clear improvement over Sleeping { attempting_reallocation: bool } -- an explicit state with a timestamp is much more debuggable.
  • Replacing util::timestamp::now() with ctx.activity(GetTsInput {}).await? throughout the v2 workflow is correct for workflow determinism and replay safety.
  • The runner_pool_metadata_poller.rs fix (keys::subspace() to namespace::keys::subspace()) looks like an important correctness bug fix.
  • The errored flag in connection.ts prevents a websocket error from being silently overwritten with a generic closed message.
  • Subscribing to MigratedToV2 and both v1/v2 event subscriptions before dispatching correctly avoids the TOCTOU race where a signal fires before the subscriber is registered.
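The subscribe-before-dispatch ordering in the last point can be illustrated with a toy signal bus that, like a plain pub/sub subject, delivers a message only to subscribers registered at publish time. `Bus` and `dispatch` are hypothetical and stand in for the PR's real subscription and workflow-dispatch machinery.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical in-memory signal bus: a published message reaches only the
/// subscribers registered at publish time (no replay), which is what makes
/// subscribing after dispatch racy.
#[derive(Default)]
struct Bus {
    subscribers: Vec<Sender<&'static str>>,
}

impl Bus {
    fn subscribe(&mut self) -> Receiver<&'static str> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    fn publish(&mut self, msg: &'static str) {
        // Deliver to current subscribers, pruning any whose receiver is gone.
        self.subscribers.retain(|tx| tx.send(msg).is_ok());
    }
}

/// Stand-in for dispatching a workflow that signals immediately.
fn dispatch(bus: &mut Bus) {
    bus.publish("MigratedToV2");
}

fn main() {
    // Racy order: dispatch first, subscribe afterwards; the signal is missed.
    let mut bus = Bus::default();
    dispatch(&mut bus);
    let late = bus.subscribe();
    println!("late subscriber: {:?}", late.try_recv().ok());

    // Safe order: subscribe first, then dispatch.
    let mut bus = Bus::default();
    let early = bus.subscribe();
    dispatch(&mut bus);
    println!("early subscriber: {:?}", early.try_recv().ok());
}
```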

Member

NathanFlurry commented Apr 5, 2026

Merge activity

  • Apr 5, 11:11 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 5, 11:37 AM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 5, 11:37 AM UTC: @NathanFlurry merged this pull request with Graphite.

@NathanFlurry NathanFlurry changed the base branch from 04-01-feat_envoy-client_fully_flesh_out_tunnel_impl to graphite-base/4548 April 5, 2026 11:34
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4548 to main April 5, 2026 11:35
@NathanFlurry NathanFlurry force-pushed the 04-02-feat_pb_actor_v1_to_v2_migration branch from b3c76ca to cf87a32 April 5, 2026 11:36
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4548 April 5, 2026 11:36 Destroyed
@NathanFlurry NathanFlurry merged commit d61b0fe into main Apr 5, 2026
10 of 19 checks passed
@NathanFlurry NathanFlurry deleted the 04-02-feat_pb_actor_v1_to_v2_migration branch April 5, 2026 11:37

Labels

None yet

Projects

None yet


3 participants