|
1 | | -# Multi-Project Controller Architecture |
| 1 | +# Multi-Project Controller |
2 | 2 |
|
3 | | -## Overview |
4 | | - |
5 | | -The multi-project controller enables Kubernetes ingress-gce to manage Network Endpoint Groups (NEGs) across multiple Google Cloud Platform (GCP) projects. This allows for multi-tenant scenarios where different namespaces or services can be associated with different GCP projects through ProviderConfig resources. |
| 3 | +The multi-project controller enables ingress-gce to manage GCP resources across multiple GCP projects through ProviderConfig custom resources (CRs). |
6 | 4 |
|
7 | 5 | ## Architecture |
8 | 6 |
|
9 | | -### Core Components |
10 | | - |
11 | 7 | ``` |
12 | 8 | ┌─────────────────────────────────────────────────────────────┐ |
13 | | -│ Main Process │ |
| 9 | +│ Main Controller │ |
14 | 10 | │ │ |
15 | 11 | │ ┌────────────────────────────────────────────────────────┐ │ |
16 | | -│ │ start.Start() │ │ |
17 | | -│ │ - Creates base SharedIndexInformers │ │ |
18 | | -│ │ - Starts informers with globalStopCh │ │ |
19 | | -│ │ - Creates ProviderConfigController │ │ |
20 | | -│ └────────────────────┬───────────────────────────────────┘ │ |
21 | | -│ │ │ |
22 | | -│ ┌────────────────────▼───────────────────────────────────┐ │ |
23 | | -│ │ ProviderConfigController │ │ |
24 | | -│ │ - Watches ProviderConfig resources │ │ |
25 | | -│ │ - Manages lifecycle of per-PC controllers │ │ |
26 | | -│ └────────────────────┬───────────────────────────────────┘ │ |
27 | | -│ │ │ |
28 | | -│ ┌────────────────────▼───────────────────────────────────┐ │ |
29 | | -│ │ ProviderConfigControllersManager │ │ |
30 | | -│ │ - Starts/stops NEG controllers per ProviderConfig │ │ |
31 | | -│ │ - Manages controller lifecycle │ │ |
32 | | -│ └──────────┬─────────────────────┬───────────────────────┘ │ |
33 | | -│ │ │ │ |
34 | | -│ ┌────────▼──────────┐ ┌───────▼──────────┐ │ |
35 | | -│ │ NEG Controller #1 │ │ NEG Controller #2 │ ... │ |
36 | | -│ │ (ProviderConfig A) │ │ (ProviderConfig B) │ │ |
37 | | -│ └────────────────────┘ └──────────────────┘ │ |
| 12 | +│ │ Shared Kubernetes Informers │ │ |
| 13 | +│ │ (Services, Ingresses, EndpointSlices) │ │ |
| 14 | +│ └─────────────────────┬──────────────────────────────────┘ │ |
| 15 | +│ │ │ |
| 16 | +│ ┌─────────────────────▼──────────────────────────────────┐ │ |
| 17 | +│ │ ProviderConfig Controller │ │ |
| 18 | +│ │ Watches ProviderConfig resources │ │ |
| 19 | +│ │ Manages per-project controllers │ │ |
| 20 | +│ └─────────────────────┬──────────────────────────────────┘ │ |
| 21 | +│ │ │ |
| 22 | +│ ┌────────────────┼────────────────┐ │ |
| 23 | +│ │ │ │ │ |
| 24 | +│ ┌────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ │ |
| 25 | +│ │Project A │ │ Project B │ │ Project C │ ... │ |
| 26 | +│ │Controller│ │Controller │ │Controller │ │ |
| 27 | +│ └──────────┘ └───────────┘ └───────────┘ │ |
38 | 28 | └─────────────────────────────────────────────────────────────┘ |
39 | 29 | ``` |
40 | 30 |
|
41 | | -### Key Design Principles |
42 | | - |
43 | | -1. **Shared Informers**: Base informers are created once and shared across all ProviderConfig controllers |
44 | | -2. **Filtered Views**: Each NEG controller gets a filtered view of resources based on ProviderConfig |
45 | | -3. **Lifecycle Management**: Controllers can be started/stopped independently as ProviderConfigs are added/removed |
46 | | -4. **Channel Management**: Proper channel lifecycle ensures clean shutdown and resource cleanup |
47 | | - |
48 | | -## Component Details |
49 | | - |
50 | | -### start/start.go |
51 | | -Main entry point that: |
52 | | -- Creates base SharedIndexInformers via InformerSet (no factories) |
53 | | -- Starts all informers with the global stop channel |
54 | | -- Creates the ProviderConfigController |
55 | | -- Manages leader election (when enabled) |
56 | | - |
57 | | -### controller/controller.go |
58 | | -ProviderConfigController that: |
59 | | -- Watches ProviderConfig resources |
60 | | -- Enqueues changes for processing |
61 | | -- Delegates to ProviderConfigControllersManager |
62 | | - |
63 | | -### manager/manager.go |
64 | | -ProviderConfigControllersManager that: |
65 | | -- Maintains a map of active controllers per ProviderConfig |
66 | | -- Starts NEG controllers when ProviderConfigs are added |
67 | | -- Stops NEG controllers when ProviderConfigs are deleted |
68 | | -- Manages finalizers for cleanup |
69 | | - |
70 | | -### neg/neg.go |
71 | | -NEG controller factory that: |
72 | | -- Wraps base SharedIndexInformers with provider-config filters via ProviderConfigFilteredInformer |
73 | | -- Sets up the NEG controller with proper GCE client |
74 | | -- Manages channel lifecycle (globalStopCh vs providerConfigStopCh) |
75 | | - |
76 | | -### filteredinformer/ |
77 | | -Filtered informer implementation that: |
78 | | -- Wraps base SharedIndexInformers |
79 | | -- Filters resources based on ProviderConfig labels |
80 | | -- Provides filtered cache/store views |
81 | | - |
82 | | -## Channel Lifecycle |
83 | | - |
84 | | -The implementation uses three types of channels: |
85 | | - |
86 | | -1. **globalStopCh**: Process-wide shutdown signal |
87 | | - - Closes on leader election loss or process termination |
88 | | - - Used by base informers and shared resources |
89 | | - |
90 | | -2. **providerConfigStopCh**: Per-ProviderConfig shutdown signal |
91 | | - - Closed when a ProviderConfig is deleted |
92 | | - - Used to stop PC-specific controllers |
93 | | - |
94 | | -3. **joinedStopCh**: Combined shutdown signal |
95 | | - - Closes when either globalStopCh OR providerConfigStopCh closes |
96 | | - - Used by PC-specific resources that should stop in either case |
| 31 | +## Key Concepts |
97 | 32 |
|
98 | | -## Resource Filtering |
| 33 | +- **ProviderConfig**: CR defining a GCP project configuration |
| 34 | +- **Resource Filtering**: Resources are associated via labels; each controller sees only its labeled resources |
| 35 | +- **Shared Informers**: Base informers are created once and shared; controllers get filtered views |
| 36 | +- **Dynamic Lifecycle**: Controllers start/stop with ProviderConfig create/delete |
99 | 37 |
|
100 | | -Resources are associated with ProviderConfigs through labels: |
101 | | -- Services, Ingresses, etc. have a label indicating their ProviderConfig |
102 | | -- The filtered informer only passes through resources matching the PC name |
103 | | -- This ensures each controller only sees and manages its own resources |
| 38 | +## Usage |
104 | 39 |
|
105 | | -## Informer Lifecycle |
| 40 | +### Create ProviderConfig |
106 | 41 |
|
107 | | -### Creation |
108 | | -1. Base informers are created via `InformerSet` using `NewXInformer()` functions |
109 | | -2. Base informers are started by `InformerSet.Start` with `globalStopCh` |
110 | | -3. Filtered wrappers are created per ProviderConfig using `ProviderConfigFilteredInformer` |
111 | | - |
112 | | -### Synchronization |
113 | | -- `InformerSet.Start` waits for base informer caches to sync |
114 | | -- Filtered informers rely on the synced base caches |
115 | | -- Controllers use `CombinedHasSynced()` from filtered informers before processing |
116 | | - |
117 | | -### Shutdown |
118 | | -- Base informers stop when globalStopCh closes |
119 | | -- Filtered informers are just wrappers (no separate shutdown) |
120 | | -- Controllers stop when their providerConfigStopCh closes |
121 | | - |
122 | | -## Configuration |
123 | | - |
124 | | -Key configuration flags: |
125 | | -- `--provider-config-name-label-key`: Label key for PC association (default: cloud.gke.io/provider-config-name) |
126 | | -- `--multi-project-owner-label-key`: Label key for PC owner |
127 | | -- `--resync-period`: Informer resync period |
128 | | - |
129 | | -## Testing |
130 | | - |
131 | | -### Unit Tests |
132 | | -- Controller logic testing |
133 | | -- Filter functionality testing |
134 | | -- Channel lifecycle testing |
135 | | - |
136 | | -### Integration Tests |
137 | | -- Multi-ProviderConfig scenarios |
138 | | -- Controller start/stop sequencing |
139 | | -- Resource cleanup verification |
140 | | - |
141 | | -### Key Test Scenarios |
142 | | -1. Single ProviderConfig with services |
143 | | -2. Multiple ProviderConfigs |
144 | | -3. ProviderConfig deletion and cleanup |
145 | | -4. Shared informer survival across PC changes |
146 | | - |
147 | | -## Common Operations |
148 | | - |
149 | | -### Adding a ProviderConfig |
150 | | -1. Create ProviderConfig resource |
151 | | -2. Controller detects addition |
152 | | -3. Manager starts NEG controller |
153 | | -4. NEG controller creates filtered informers |
154 | | -5. NEGs are created in target GCP project |
155 | | - |
156 | | -### Removing a ProviderConfig |
| 42 | +```yaml |
| 43 | +apiVersion: networking.gke.io/v1 |
| 44 | +kind: ProviderConfig |
| 45 | +metadata: |
| 46 | +  name: team-a-project |
| 47 | +spec: |
| 48 | +  projectID: team-a-gcp-project |
| 49 | +  network: team-a-network |
| 50 | +``` |
157 | 51 |
|
158 | | -The deletion process follows a specific sequence to ensure proper cleanup: |
| 52 | +### Associate Resources |
| 53 | +
|
| 54 | +```yaml |
| 55 | +apiVersion: v1 |
| 56 | +kind: Service |
| 57 | +metadata: |
| 58 | +  name: my-service |
| 59 | +  labels: |
| 60 | +    ${PROVIDER_CONFIG_LABEL_KEY}: team-a-project |
| 61 | +spec: |
| 62 | +  # Service spec... |
| 63 | +``` |
159 | 64 |
|
160 | | -1. **External automation initiates deletion**: |
161 | | - - Server-side automation triggers the deletion process |
162 | | - - All namespaces belonging to the ProviderConfig are deleted first |
| 65 | +## Operations |
163 | 66 |
|
164 | | -2. **Namespace cleanup**: |
165 | | - - Kubernetes deletes all resources within the namespaces |
166 | | - - Services are deleted, triggering NEG cleanup |
167 | | - - NEG controller removes NEGs from GCP as services are deleted |
| 67 | +### Adding a Project |
| 68 | +1. Create a ProviderConfig |
| 69 | +2. Label Services/Ingresses with the ProviderConfig name |
| 70 | +3. NEGs are created in the target project |
168 | 71 |
|
169 | | -3. **Wait for namespace deletion**: |
170 | | - - External automation waits for all namespaces to be fully deleted |
171 | | - - This ensures all NEGs and other resources are cleaned up |
| 72 | +### Removing a Project |
| 73 | +1. Remove or relabel the Services that use the ProviderConfig |
| 74 | +2. Wait for NEG cleanup to finish |
| 75 | +3. Delete the ProviderConfig |
172 | 76 |
|
173 | | -4. **ProviderConfig deletion**: |
174 | | - - Only after namespaces are gone, ProviderConfig is deleted |
175 | | - - Controller stops the NEG controller for this ProviderConfig |
176 | | - - Finalizer is removed from ProviderConfig |
177 | | - - ProviderConfig resource is removed from Kubernetes |
| 77 | +## Guarantees |
178 | 78 |
|
179 | | -**Important**: NEGs are not automatically deleted when a ProviderConfig is removed. They are cleaned up as part of the namespace/service deletion process that happens before ProviderConfig deletion. |
| 79 | +- Controllers only manage explicitly labeled resources |
| 80 | +- One controller per ProviderConfig |
| 81 | +- Base infrastructure survives individual controller failures |
| 82 | +- Deleting a ProviderConfig does not affect other projects |
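
The guarantees listed in the new README (label-scoped resources, one controller per ProviderConfig, deletion isolated to one project) can be illustrated with a minimal Go sketch. This is not the ingress-gce implementation: `labelKey`, `matches`, and the `manager` type are hypothetical names chosen for illustration, and the real controller resolves the label key from configuration rather than a constant.

```go
package main

import (
	"fmt"
	"sync"
)

// labelKey is a hypothetical placeholder for the configured label key.
const labelKey = "example.com/provider-config-name"

// matches reports whether an object's labels tie it to the named ProviderConfig.
// Unlabeled objects never match, so a controller only sees explicitly labeled resources.
func matches(labels map[string]string, pcName string) bool {
	v, ok := labels[labelKey]
	return ok && v == pcName
}

// manager tracks one stop channel per running per-project controller.
type manager struct {
	mu      sync.Mutex
	running map[string]chan struct{}
}

func newManager() *manager {
	return &manager{running: make(map[string]chan struct{})}
}

// start launches a controller for pcName unless one is already running.
// It returns true only when a new controller was started.
func (m *manager) start(pcName string, run func(stop <-chan struct{})) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.running[pcName]; ok {
		return false
	}
	stop := make(chan struct{})
	m.running[pcName] = stop
	go run(stop)
	return true
}

// stop closes the stop channel for pcName only; other controllers keep running.
// It returns true when a controller was running for pcName.
func (m *manager) stop(pcName string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	stop, ok := m.running[pcName]
	if !ok {
		return false
	}
	close(stop)
	delete(m.running, pcName)
	return true
}

// isRunning reports whether a controller is registered for pcName.
func (m *manager) isRunning(pcName string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.running[pcName]
	return ok
}

func main() {
	m := newManager()
	idle := func(stop <-chan struct{}) { <-stop } // stand-in for a controller loop

	m.start("team-a-project", idle)
	m.start("team-b-project", idle)
	m.stop("team-a-project")

	fmt.Println("team-a running:", m.isRunning("team-a-project"))
	fmt.Println("team-b running:", m.isRunning("team-b-project"))

	svcLabels := map[string]string{labelKey: "team-b-project"}
	fmt.Println("service visible to team-b:", matches(svcLabels, "team-b-project"))
}
```

Keying the map by ProviderConfig name is what makes a second `start` call a no-op, and giving each entry its own stop channel is what keeps a deletion from touching the other projects.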