admin: Support creating incidents and adding serials #8740

Open
beautifulentropy wants to merge 1 commit into main from admin-tool-sa-incidents

@beautifulentropy (Member) commented Apr 29, 2026

Add a StorageAuthorityAdmin gRPC service and four admin subcommands that move incident management into the admin tool. Register the service only for the admin client and write through a new incidents_sa_admin MySQL user; the existing incidents_sa user stays SELECT-only.

Incidents can impact tens to hundreds of millions of certificates, and operators may load serials from multiple overlapping time spans during the affected window. admin load-incident-serials reads serials from a file, fans them out across N workers (default 10), and streams them to the SA in batches of 10,000. The server re-batches at the same threshold and inserts with INSERT IGNORE, so overlapping bulk loads are idempotent and duplicate serials within a batch are skipped rather than rejected.

The remaining subcommands create, list, and update rows in the incidents table:

  • admin create-incident adds a row and creates the per-incident serials table.
  • admin list-incidents prints each incident's name, enabled state, renewBy, and URL.
  • admin update-incident changes any subset of url, renewBy, and enabled on an incident by name.

Fixes #6943

@beautifulentropy beautifulentropy force-pushed the admin-tool-sa-incidents branch 7 times, most recently from da7c02c to 5392654 Compare April 29, 2026 20:59
@beautifulentropy beautifulentropy force-pushed the admin-tool-sa-incidents branch from 5392654 to 24c16e5 Compare April 29, 2026 21:01
@beautifulentropy beautifulentropy marked this pull request as ready for review April 29, 2026 21:10
@beautifulentropy beautifulentropy requested a review from a team as a code owner April 29, 2026 21:10
@github-actions (Contributor)

@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@aarongable (Contributor) left a comment

I haven't done a detailed review of admin/incident_test.go or saa_test.go, but I've done a pass on everything else.

Comment thread cmd/admin/dryrun.go
var _ saAdminClient = (*dryRunSAAdmin)(nil)

func (d dryRunSAAdmin) CreateIncident(_ context.Context, req *sapb.CreateIncidentRequest, _ ...grpc.CallOption) (*sapb.Incident, error) {
d.log.Infof("dry-run: Create incident %q (url=%q, renewBy=%s)", req.SerialTable, req.Url, req.RenewBy.AsTime())

Just a heads up that the blog PR totally overhauls how the dry-run clients above log, so keep an eye on that depending on whether this lands before or after the blog PR.

Comment thread sa/proto/sa.proto
// incidents row identified by serialTable. An empty url, nil renewBy, and
// nil enabled all mean "leave that field alone"; at least one of them must
// be set.
message UpdateIncidentRequest {

If you wanted to be really fancy, you could use a FieldMask to control which fields get updated, rather than relying on zero-values.

But as cool as FieldMasks are, we don't use them anywhere else, and this probably isn't the time to start. If we want to use them, we should make a concerted effort to use them everywhere. Anyway, just sharing for fun.
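For readers following the zero-value semantics under discussion, here is a minimal sketch of how "unset means leave alone" might translate into an UPDATE statement. It is an illustration only: the updateIncident struct and buildUpdate function are hypothetical, and the column names (url, renewBy, enabled, serialTable) are assumed from the field names in this PR rather than taken from its code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// updateIncident is a hypothetical stand-in for UpdateIncidentRequest.
type updateIncident struct {
	SerialTable string
	URL         string     // "" means leave unchanged
	RenewBy     *time.Time // nil means leave unchanged
	Enabled     *bool      // nil means leave unchanged
}

// buildUpdate returns a parameterized UPDATE that touches only the fields
// that are set, and an error if no field is set.
func buildUpdate(req updateIncident) (string, []any, error) {
	if req.SerialTable == "" {
		return "", nil, errors.New("serialTable is required")
	}
	var sets []string
	var args []any
	if req.URL != "" {
		sets = append(sets, "url = ?")
		args = append(args, req.URL)
	}
	if req.RenewBy != nil {
		sets = append(sets, "renewBy = ?")
		args = append(args, *req.RenewBy)
	}
	if req.Enabled != nil {
		sets = append(sets, "enabled = ?")
		args = append(args, *req.Enabled)
	}
	if len(sets) == 0 {
		return "", nil, errors.New("at least one of url, renewBy, or enabled must be set")
	}
	args = append(args, req.SerialTable)
	query := "UPDATE incidents SET " + strings.Join(sets, ", ") + " WHERE serialTable = ?"
	return query, args, nil
}

func main() {
	enabled := false
	q, args, err := buildUpdate(updateIncident{SerialTable: "incident_foo", Enabled: &enabled})
	fmt.Println(q)
	fmt.Println(args, err)
}
```

One consequence of this design is that a request cannot clear url back to the empty string, since the empty string is indistinguishable from "not set"; that is the gap a FieldMask would close.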

Comment thread cmd/admin/incident.go
return errors.New("-parallelism must be > 0")
}

file, err := os.Open(s.serialsFile)

Idea: if s.serialsFile is -, then open stdin. Alternatively, add documentation noting that a user could pass /dev/stdin as the file to read from if they want to stream serials on stdin.

Comment thread sa/proto/sa.proto
// StorageAuthorityAdmin exposes those SA methods exclusive to the admin tool.
service StorageAuthorityAdmin {
rpc CreateIncident(CreateIncidentRequest) returns (Incident) {}
rpc UpdateIncident(UpdateIncidentRequest) returns (google.protobuf.Empty) {}

Should this not return Incident, reflecting the new state of the world, just like Create?

Comment thread sa/saa.go

var incidentTable string
var buf []string


Might be worth adding an explicit check that the table exists, so we can provide a better error message if it doesn't? Totally optional, feel free to ignore.

@aarongable aarongable requested review from a team and jsha and removed request for a team April 30, 2026 00:01

Development

Successfully merging this pull request may close these issues.

admin: Add commands to create and enable incident tables

2 participants