Add RetryPolicy CRUD API and armadactl support #4805

Open
dejanzele wants to merge 1 commit into armadaproject:master from dejanzele:retry-policy-crud

@dejanzele
Member

What type of PR is this?

Feature (retry policy PR 2 of 4)

What this PR does / why we need it

Adds RetryPolicy as a first-class API resource with full CRUD operations, following the same patterns as Queue. Operators can create retry policies and assign them to queues by name.

  • Adds RetryPolicy, RetryRule, RetryExitCodeMatcher proto messages and RetryAction enum to submit.proto
  • Adds RetryPolicyService gRPC service with Create/Update/Delete/Get/List RPCs
  • Adds REST gateway bindings for all operations (/v1/retry-policy/...)
  • Adds retry_policy field (string, references policy by name) to the Queue proto message
  • Adds internal/server/retrypolicy/ - repository (PostgreSQL) and gRPC handler, mirroring internal/server/queue/
  • Adds pkg/client/retrypolicy/ - client library for all CRUD operations
  • Adds cmd/armadactl/cmd/retrypolicy.go - CLI commands for managing retry policies
  • Adds --retry-policy flag to queue create/update commands
  • Adds RetryPolicy resource kind for file-based creation (armadactl create -f policy.yaml)

Usage:

armadactl create retry-policy -f policy.yaml
armadactl get retry-policy ml-training
armadactl get retry-policies
armadactl delete retry-policy ml-training
armadactl create queue ml-queue --retry-policy ml-training
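
The file-based path (`armadactl create -f policy.yaml`) depends on mapping the resource kind in the file to the right create call. A minimal Go sketch of that dispatch follows. All identifiers here (`ResourceKind`, `KindQueue`, `KindRetryPolicy`, `ParseResourceKind`, `createFromFile`) are hypothetical stand-ins for illustration and are not copied from `pkg/client/resource.go`; the exact-match comparison is also an assumption.

```go
package main

import "fmt"

// ResourceKind identifies the type of resource described by a file.
// Hypothetical stand-in type; not the real definition from the PR.
type ResourceKind string

const (
	KindQueue       ResourceKind = "Queue"
	KindRetryPolicy ResourceKind = "RetryPolicy"
)

// ParseResourceKind maps the kind string read from a resource file to a
// known kind, and rejects anything else.
func ParseResourceKind(s string) (ResourceKind, error) {
	switch ResourceKind(s) {
	case KindQueue:
		return KindQueue, nil
	case KindRetryPolicy:
		return KindRetryPolicy, nil
	default:
		return "", fmt.Errorf("unsupported resource kind %q", s)
	}
}

// createFromFile selects the create action for a kind. The returned string
// stands in for the real client call, which this sketch does not model.
func createFromFile(kind string) (string, error) {
	k, err := ParseResourceKind(kind)
	if err != nil {
		return "", err
	}
	switch k {
	case KindQueue:
		return "create queue", nil
	case KindRetryPolicy:
		return "create retry policy", nil
	}
	return "", fmt.Errorf("no create handler for kind %q", k)
}

func main() {
	for _, kind := range []string{"Queue", "RetryPolicy", "Job"} {
		action, err := createFromFile(kind)
		fmt.Printf("kind=%s action=%q err=%v\n", kind, action, err)
	}
}
```

Adding a new kind then means one new constant and one new case in each switch, which is the shape the description above implies for RetryPolicy alongside Queue.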

Which issue(s) this PR fixes

Part of #4683 (Retry Policy)

Special notes for your reviewer

  • This is retry policy PR 2 of 4. The sequence is: Engine + config (PR 1, independent) and CRUD + armadactl (this PR) -> Scheduler wiring -> Backoff + pod naming
  • No dependency on error categorization PRs - this is pure CRUD infrastructure
  • Independent of the engine PR (PR 1) - these can be reviewed in parallel
  • Follows the queue CRUD pattern exactly: proto -> server handler -> PostgreSQL repository -> client library -> armadactl
  • Repository uses INSERT with unique constraint check (not read-then-write) for concurrent-safe creates
  • Update uses single UPDATE with RowsAffected check instead of read-then-upsert
  • The retry_policy table (name text PK, definition bytea) stores serialized proto, same as the queue table
  • Create and update are file-based because rules are too complex to express as CLI flags
  • No scheduling behavior change in this PR

@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR adds a complete RetryPolicy CRUD API — proto messages, gRPC service, PostgreSQL repository, REST gateway bindings, client library, and armadactl commands — following the established Queue pattern throughout. All prior review concerns have been resolved: the 032_create_retry_policy.sql migration is present, permissions are distinct for create/update/delete, idempotent delete and unauthenticated reads are documented, and ExitCodeOperator is properly an enum.

Confidence Score: 5/5

This PR is safe to merge; all previously flagged issues have been addressed and the implementation follows established patterns.

All prior P0/P1 concerns (missing migration, wrong gRPC codes, permission asymmetry, free-form operator string, missing name validation) are fully resolved. The only remaining finding is a P2 cosmetic issue with null output for an empty policy list.

No files require special attention; internal/armadactl/retrypolicy.go has a minor P2 cosmetic issue with null output for an empty list.

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/server/retrypolicy/service.go | New gRPC service handler; auth, validation, and error mapping are correct and consistent with queue service |
| internal/server/retrypolicy/repository.go | PostgreSQL CRUD repository; concurrent-safe create via ON CONFLICT DO NOTHING, intentionally idempotent delete is documented |
| internal/server/retrypolicy/service_test.go | Comprehensive unit tests covering all code paths: auth failures, empty name, not-found, already-exists, and success cases |
| internal/lookout/schema/migrations/032_create_retry_policy.sql | Correctly numbered migration (following 031) creating retry_policy table matching queue table pattern |
| pkg/api/submit.proto | Adds RetryPolicy messages, RetryPolicyService, and REST gateway bindings; ExitCodeOperator is properly modelled as an enum |
| internal/armadactl/retrypolicy.go | Clean CLI integration; GetAllRetryPolicies outputs 'null' for empty result instead of a user-friendly empty message |
| cmd/armadactl/cmd/retrypolicy.go | CLI command definitions follow established patterns; file-flag required for create/update, positional arg for get/delete |
| internal/server/submit/submit.go | Adds retryPolicyService field and five delegation methods, mirroring the existing queue service delegation pattern |
| internal/server/server.go | Correct wiring: PostgresRetryPolicyRepository → retrypolicy.Server → both Submit and RetryPolicyService gRPC registrations |
| internal/server/permissions/permissions.go | Three distinct permission constants added: CreateRetryPolicy, UpdateRetryPolicy, DeleteRetryPolicy |
| pkg/client/retrypolicy/create.go | Standard client library create implementation using common connection/context patterns |
| pkg/client/retrypolicy/get.go | Standard client library get/list implementation; both single and bulk fetch covered |
| pkg/client/retrypolicy/update.go | Standard client library update implementation |
| pkg/client/retrypolicy/delete.go | Standard client library delete implementation |
| pkg/client/resource.go | RetryPolicy added to NewResourceKind switch correctly alongside Queue |
| internal/armadactl/queue.go | CreateResource extended with ResourceKindRetryPolicy case, following the existing Queue pattern |

Sequence Diagram

sequenceDiagram
    participant CLI as armadactl
    participant Client as pkg/client/retrypolicy
    participant Submit as Submit gRPC Server
    participant Service as retrypolicy.Server
    participant Repo as PostgresRetryPolicyRepository
    participant DB as PostgreSQL

    CLI->>Client: Create/Update/Delete/Get(policy)
    Client->>Submit: gRPC or REST HTTP
    Submit->>Service: delegate (e.g. CreateRetryPolicy)
    Service->>Service: AuthorizeAction(ctx, permission)
    Service->>Service: validate req.Name != ""
    Service->>Repo: CreateRetryPolicy(ctx, policy)
    Repo->>Repo: proto.Marshal(policy)
    Repo->>DB: INSERT INTO retry_policy ON CONFLICT DO NOTHING
    DB-->>Repo: RowsAffected
    Repo-->>Service: nil or ErrAlreadyExists
    Service-->>Submit: types.Empty{} or gRPC status error
    Submit-->>Client: response
    Client-->>CLI: result
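
The create path in the diagram (authorize, validate the name, call the repository, map the result to a status) can be sketched in Go as below. To stay runnable with only the standard library, gRPC status codes are replaced by a plain `Code` string type; that type, the `server` and `memRepo` structs, and the principal-based permission map are hypothetical stand-ins, not the PR's `retrypolicy.Server`.

```go
package main

import (
	"errors"
	"fmt"
)

// Code stands in for a gRPC status code in this stdlib-only sketch.
type Code string

const (
	OK               Code = "OK"
	PermissionDenied Code = "PermissionDenied"
	InvalidArgument  Code = "InvalidArgument"
	AlreadyExists    Code = "AlreadyExists"
	Internal         Code = "Internal"
)

var ErrAlreadyExists = errors.New("retry policy already exists")

// repo is the storage dependency of the handler.
type repo interface {
	Create(name string) error
}

// memRepo is an in-memory repository used only for the demo.
type memRepo struct{ names map[string]bool }

func (r *memRepo) Create(name string) error {
	if r.names[name] {
		return ErrAlreadyExists
	}
	r.names[name] = true
	return nil
}

// server holds the repository and a hypothetical permission table
// (principal name -> may create policies).
type server struct {
	repo      repo
	canCreate map[string]bool
}

// CreateRetryPolicy follows the order in the diagram: authorization first,
// then request validation, then the repository call and error mapping.
func (s *server) CreateRetryPolicy(principal, name string) Code {
	if !s.canCreate[principal] {
		return PermissionDenied
	}
	if name == "" {
		return InvalidArgument
	}
	err := s.repo.Create(name)
	switch {
	case err == nil:
		return OK
	case errors.Is(err, ErrAlreadyExists):
		return AlreadyExists
	default:
		return Internal
	}
}

func main() {
	s := &server{
		repo:      &memRepo{names: map[string]bool{}},
		canCreate: map[string]bool{"admin": true},
	}
	fmt.Println(s.CreateRetryPolicy("guest", "ml-training"))
	fmt.Println(s.CreateRetryPolicy("admin", ""))
	fmt.Println(s.CreateRetryPolicy("admin", "ml-training"))
	fmt.Println(s.CreateRetryPolicy("admin", "ml-training"))
}
```

Checking authorization before validation means an unauthorized caller learns nothing about whether a request was well-formed or whether a policy name is already taken.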

Reviews (10): Last reviewed commit: "Add RetryPolicy CRUD API and armadactl s..."

@dejanzele dejanzele force-pushed the retry-policy-crud branch 2 times, most recently from 1d4f603 to 45023c8 Compare March 30, 2026 14:06
@dejanzele
Member Author

@greptileai

@dejanzele dejanzele force-pushed the retry-policy-crud branch 3 times, most recently from a797d8b to 63f1c89 Compare March 30, 2026 16:15
@dejanzele
Member Author

@greptileai

@dejanzele dejanzele force-pushed the retry-policy-crud branch 2 times, most recently from 513e46f to 07bffb3 Compare March 31, 2026 19:43
Introduce RetryPolicy as a first-class API resource with full CRUD
operations. This is pure infrastructure with no scheduling behavior
changes.

Proto: Add RetryPolicy, RetryRule, RetryExitCodeMatcher messages and
RetryPolicyService gRPC service with REST gateway bindings on the
Submit service. Add retry_policy field to Queue message.

Server: Add retrypolicy package with PostgresRetryPolicyRepository
(stores serialized proto in retry_policy table) and Server handler
with authorization checks. Wire into server startup and register
the gRPC service. Add CreateRetryPolicy/DeleteRetryPolicy permissions.

Client: Add pkg/client/retrypolicy with Create/Update/Delete/Get/GetAll
functions matching the queue client pattern.

CLI: Add armadactl commands for create/update/delete/get retry-policy
and get retry-policies, all using file-based input for create/update.
Add --retry-policy flag to queue create and update commands.
Add RetryPolicy as a valid ResourceKind for file-based creation.

Signed-off-by: Dejan Zele Pejchev <pejchev@gmail.com>
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele dejanzele force-pushed the retry-policy-crud branch from 07bffb3 to aed9ba2 Compare April 7, 2026 13:18