Commit b729eeb

Create regional-go-reconciler and a reconciler dashboard. (#1089)
This creates a new streamlined mechanism for creating reconcilers.

The `reconciler` dashboard makes it easy to create a single dashboard for the workqueue pieces as well as the reconciler service itself. It also supports legacy services (e.g. syncer) that were stood up with separate workqueue/syncer modules, but may want to take advantage of this.

Based on: #1088

---------

Signed-off-by: Matt Moore <[email protected]>
Co-authored-by: octo-sts[bot] <157150467+octo-sts[bot]@users.noreply.github.com>
1 parent fb9f858 commit b729eeb

8 files changed, +834 -0 lines changed

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Reconciler Dashboard Module

This module creates a comprehensive dashboard for monitoring a reconciler that combines a workqueue with a reconciler service. It displays metrics from both the workqueue infrastructure (receiver and dispatcher) and the reconciler service itself.

## Usage

```hcl
module "my-reconciler-dashboard" {
  source = "chainguard-dev/terraform-infra-common//modules/dashboard/reconciler"

  project_id = var.project_id
  name       = "my-reconciler"

  # Optional: Override service names if different from defaults
  # service_name   = "custom-reconciler-name" # defaults to ${name}-rec
  # workqueue_name = "custom-workqueue-name"  # defaults to ${name}-wq

  # Workqueue configuration
  max_retry       = 100
  concurrent_work = 20
  scope           = "global"

  # Optional sections
  sections = {
    github = false
  }

  notification_channels = [var.notification_channel]
}
```
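
The commented-out overrides in the example above are mainly useful for legacy services (e.g. a syncer) that were stood up with separate workqueue and service modules and so do not follow the `-rec` / `-wq` naming. A minimal sketch, assuming a legacy service named `my-syncer` whose workqueue was created as `my-syncer-workqueue` (both names are hypothetical):

```hcl
module "my-syncer-dashboard" {
  source = "chainguard-dev/terraform-infra-common//modules/dashboard/reconciler"

  project_id = var.project_id
  name       = "my-syncer"

  # Hypothetical legacy names: replace these with the names your existing
  # service and workqueue were actually created with.
  service_name   = "my-syncer"
  workqueue_name = "my-syncer-workqueue"
}
```

Leaving either override unset (or set to `""`) falls back to the `${name}-rec` and `${name}-wq` defaults.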

## Features

The dashboard includes:

### Workqueue Metrics
- **Workqueue State**: Work in progress, queued, added, deduplication rates, completion attempts
- **Processing Metrics**: Process latency, wait latency, time to completion
- **Dead Letter Queue**: Failed tasks monitoring

### Reconciler Service Metrics
- **Error Reporting**: Error tracking and reporting for the reconciler (collapsed by default)
- **Service Logs**: Reconciler service logs
- **gRPC Metrics**: RPC rates, latencies, error rates
- **GitHub API Metrics**: API usage and rate limiting (optional)
- **Resources**: CPU, memory, and other resource utilization
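
The GitHub API section is the only one gated behind the `sections` input, and it is off by default. A minimal sketch of turning it on, reusing the placeholder names from the Usage example:

```hcl
module "my-reconciler-dashboard" {
  source = "chainguard-dev/terraform-infra-common//modules/dashboard/reconciler"

  project_id = var.project_id
  name       = "my-reconciler"

  # Adds the "GitHub API" section, which the module lays out just before
  # the "Reconciler Resources" section.
  sections = {
    github = true
  }
}
```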

## Variables

| Name | Description | Default |
|------|-------------|---------|
| `project_id` | The GCP project ID | Required |
| `name` | Base name for the reconciler | Required |
| `service_name` | Reconciler service name | `${name}-rec` |
| `workqueue_name` | Workqueue name | `${name}-wq` |
| `max_retry` | Maximum retry attempts for tasks | `100` |
| `concurrent_work` | Concurrent work items | `20` |
| `scope` | Workqueue scope (regional/global) | `global` |
| `sections` | Optional dashboard sections | See variables.tf |
| `notification_channels` | Alert notification channels | `[]` |

## Outputs

| Name | Description |
|------|-------------|
| `json` | The dashboard JSON configuration |
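
If the rendered dashboard needs to be inspected or passed along, the `json` output can be re-exported from the calling configuration. A minimal sketch, assuming the module instance is named `my-reconciler-dashboard` as in the Usage example (the output name `reconciler_dashboard_json` is hypothetical):

```hcl
# Re-export the module's dashboard JSON from the root module so it can be
# read with `terraform output` or consumed by another configuration.
output "reconciler_dashboard_json" {
  description = "Dashboard JSON produced by the reconciler dashboard module"
  value       = module.my-reconciler-dashboard.json
}
```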

## Integration with regional-go-reconciler

This dashboard module is designed to be used alongside the `regional-go-reconciler` module. Give both modules the same base `name`:

```hcl
module "my-reconciler" {
  source = "chainguard-dev/terraform-infra-common//modules/regional-go-reconciler"
  # ... configuration ...
}

module "my-reconciler-dashboard" {
  source = "chainguard-dev/terraform-infra-common//modules/dashboard/reconciler"

  project_id      = var.project_id
  name            = "my-reconciler" # Same base name as the reconciler
  max_retry       = module.my-reconciler.max-retry
  concurrent_work = module.my-reconciler.concurrent-work
  scope           = "global"
}
```

<!-- BEGIN_TF_DOCS -->
## Requirements

No requirements.

## Providers

No providers.

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_alerts"></a> [alerts](#module\_alerts) | ../sections/alerts | n/a |
| <a name="module_dashboard"></a> [dashboard](#module\_dashboard) | ../ | n/a |
| <a name="module_errgrp"></a> [errgrp](#module\_errgrp) | ../sections/errgrp | n/a |
| <a name="module_github"></a> [github](#module\_github) | ../sections/github | n/a |
| <a name="module_grpc"></a> [grpc](#module\_grpc) | ../sections/grpc | n/a |
| <a name="module_layout"></a> [layout](#module\_layout) | ../sections/layout | n/a |
| <a name="module_reconciler-logs"></a> [reconciler-logs](#module\_reconciler-logs) | ../sections/logs | n/a |
| <a name="module_resources"></a> [resources](#module\_resources) | ../sections/resources | n/a |
| <a name="module_width"></a> [width](#module\_width) | ../sections/width | n/a |
| <a name="module_workqueue-state"></a> [workqueue-state](#module\_workqueue-state) | ../sections/workqueue | n/a |

## Resources

No resources.

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alerts"></a> [alerts](#input\_alerts) | Map of alert names to alert configurations | <pre>map(object({<br/> displayName = string<br/> documentation = string<br/> userLabels = map(string)<br/> project = string<br/> notificationChannel = string<br/> }))</pre> | `{}` | no |
| <a name="input_concurrent_work"></a> [concurrent\_work](#input\_concurrent\_work) | The amount of concurrent work the workqueue dispatches | `number` | `20` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | Additional labels to add to the dashboard | `map(string)` | `{}` | no |
| <a name="input_max_retry"></a> [max\_retry](#input\_max\_retry) | The maximum number of retry attempts for workqueue tasks | `number` | `100` | no |
| <a name="input_name"></a> [name](#input\_name) | The name of the reconciler (base name without suffixes) | `string` | n/a | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channels for alerts | `list(string)` | `[]` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The GCP project ID | `string` | n/a | yes |
| <a name="input_scope"></a> [scope](#input\_scope) | The scope of the workqueue (regional or global) | `string` | `"global"` | no |
| <a name="input_sections"></a> [sections](#input\_sections) | Configure visibility of optional dashboard sections | <pre>object({<br/> github = optional(bool, false)<br/> })</pre> | `{}` | no |
| <a name="input_service_name"></a> [service\_name](#input\_service\_name) | The name of the reconciler service (defaults to name-rec) | `string` | `""` | no |
| <a name="input_workqueue_name"></a> [workqueue\_name](#input\_workqueue\_name) | The name of the workqueue (defaults to name-wq) | `string` | `""` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_json"></a> [json](#output\_json) | n/a |
<!-- END_TF_DOCS -->
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
/*
Copyright 2025 Chainguard, Inc.
SPDX-License-Identifier: Apache-2.0
*/

locals {
  service_name   = var.service_name != "" ? var.service_name : "${var.name}-rec"
  workqueue_name = var.workqueue_name != "" ? var.workqueue_name : "${var.name}-wq"
}

// Workqueue metrics section
module "workqueue-state" {
  source = "../sections/workqueue"

  title           = "Workqueue State"
  service_name    = local.workqueue_name
  max_retry       = var.max_retry
  concurrent_work = var.concurrent_work
  scope           = var.scope
  filter          = []
  collapsed       = false
}

// Reconciler service sections
module "errgrp" {
  source       = "../sections/errgrp"
  title        = "Reconciler Error Reporting"
  project_id   = var.project_id
  service_name = local.service_name
  collapsed    = true
}

module "reconciler-logs" {
  source        = "../sections/logs"
  title         = "Reconciler Logs"
  filter        = ["resource.labels.service_name=\"${local.service_name}\""]
  cloudrun_type = "service"
}

module "grpc" {
  source       = "../sections/grpc"
  title        = "GRPC"
  filter       = []
  service_name = local.service_name
}

module "github" {
  source = "../sections/github"
  title  = "GitHub API"
  filter = []
}

module "resources" {
  source                = "../sections/resources"
  title                 = "Reconciler Resources"
  filter                = []
  cloudrun_name         = local.service_name
  cloudrun_type         = "service"
  notification_channels = var.notification_channels
}

module "alerts" {
  for_each = var.alerts

  source = "../sections/alerts"
  alert  = each.value
  title  = "Alert: ${each.key}"
}

module "width" { source = "../sections/width" }

module "layout" {
  source = "../sections/layout"
  sections = concat(
    [for x in keys(var.alerts) : module.alerts[x].section],
    [
      module.workqueue-state.section,
      module.errgrp.section,
      module.reconciler-logs.section,
      module.grpc.section,
    ],
    var.sections.github ? [module.github.section] : [],
    [module.resources.section],
  )
}

module "dashboard" {
  source = "../"

  object = {
    displayName = "Reconciler: ${var.name}"
    labels = merge({
      "service" : ""
      "reconciler" : ""
    }, var.labels)
    dashboardFilters = [
      {
        # for GCP Cloud Run built-in metrics
        filterType  = "RESOURCE_LABEL"
        stringValue = local.service_name
        labelKey    = "service_name"
      },
      {
        # for Prometheus user added metrics
        filterType  = "METRIC_LABEL"
        stringValue = local.service_name
        labelKey    = "service_name"
      },
    ]

    // https://cloud.google.com/monitoring/api/ref_v3/rest/v1/projects.dashboards#mosaiclayout
    mosaicLayout = {
      columns = module.width.size
      tiles   = module.layout.tiles,
    }
  }
}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
/*
Copyright 2025 Chainguard, Inc.
SPDX-License-Identifier: Apache-2.0
*/

output "json" {
  value = module.dashboard.json
}
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
/*
Copyright 2025 Chainguard, Inc.
SPDX-License-Identifier: Apache-2.0
*/

variable "project_id" {
  description = "The GCP project ID"
  type        = string
}

variable "name" {
  description = "The name of the reconciler (base name without suffixes)"
  type        = string
}

variable "service_name" {
  description = "The name of the reconciler service (defaults to name-rec)"
  type        = string
  default     = ""
}

variable "workqueue_name" {
  description = "The name of the workqueue (defaults to name-wq)"
  type        = string
  default     = ""
}

// Workqueue configuration
variable "max_retry" {
  description = "The maximum number of retry attempts for workqueue tasks"
  type        = number
  default     = 100
}

variable "concurrent_work" {
  description = "The amount of concurrent work the workqueue dispatches"
  type        = number
  default     = 20
}

variable "scope" {
  description = "The scope of the workqueue (regional or global)"
  type        = string
  default     = "global"
}

// Section visibility
variable "sections" {
  description = "Configure visibility of optional dashboard sections"
  type = object({
    github = optional(bool, false)
  })
  default = {}
}

// Alert configuration
variable "alerts" {
  description = "Map of alert names to alert configurations"
  type = map(object({
    displayName         = string
    documentation       = string
    userLabels          = map(string)
    project             = string
    notificationChannel = string
  }))
  default = {}
}

variable "notification_channels" {
  description = "List of notification channels for alerts"
  type        = list(string)
  default     = []
}

variable "labels" {
  description = "Additional labels to add to the dashboard"
  type        = map(string)
  default     = {}
}
