
Commit 19e14e2

jzhoucliqr and claude committed
Add LXD control plane provisioning support
This commit introduces comprehensive LXD control plane provisioning functionality, including:

- VMHost integration for LXD container lifecycle management
- Static IP assignment capabilities for LXD instances
- Enhanced machine controller with LXD-specific provisioning logic
- Comprehensive test suite including integration and unit tests
- Design specifications and requirements documentation
- Error handling and service layer improvements

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 769064c commit 19e14e2


46 files changed: +10983 −28 lines changed

.kiro/specs/lxd-controlplane-provisioning/design.md

Lines changed: 514 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 95 additions & 0 deletions (requirements document, shown below)
@@ -0,0 +1,95 @@
# Requirements Document

## Introduction

This feature enhances cluster-api-provider-maas v4.6.0-spectro to support dynamic creation of LXD virtual machines on available LXD hosts, specifically for control plane nodes. The primary goal is to improve resource efficiency by allowing smaller-sized control plane instances on large bare metal machines, enabling multiple LXD VMs per host while maintaining high availability through resource pool management and availability zone distribution. This enhancement integrates with the existing Spectro-specific features, including custom endpoints, preferred subnet support, and enhanced namespace management.

## Requirements
### Requirement 1

**User Story:** As a cluster administrator, I want to optionally provision control plane nodes as LXD VMs instead of bare metal machines, so that I can improve resource utilization on large bare metal hosts.

#### Acceptance Criteria

1. WHEN creating a control plane node THEN the system SHALL provide an option to use LXD VM provisioning via a `provisioningMode` field
2. WHEN `provisioningMode` is set to "lxd" THEN the system SHALL create a virtual machine on an available LXD host
3. WHEN `provisioningMode` is set to "bare-metal" or omitted THEN the system SHALL continue to use bare metal provisioning as before
4. WHEN provisioning LXD VMs THEN the system SHALL allow configuration of VM resource specifications (CPU, memory, disk) through existing `minCPU` and `minMemory` fields
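To make criteria 1–3 concrete, the following is a minimal sketch of how the `provisioningMode` field might surface on `MaasMachineSpec`. The constant names, kubebuilder markers, and field shapes are illustrative assumptions, not the API committed here.

```go
// api/v1beta1/maasmachine_types.go (sketch) -- illustrative only; exact names
// and markers are assumptions.
package v1beta1

// ProvisioningMode selects how a control plane machine is provisioned.
// +kubebuilder:validation:Enum=bare-metal;lxd
type ProvisioningMode string

const (
    // ProvisioningModeBareMetal preserves the existing behaviour and is the default.
    ProvisioningModeBareMetal ProvisioningMode = "bare-metal"
    // ProvisioningModeLXD provisions the machine as a VM on an LXD host.
    ProvisioningModeLXD ProvisioningMode = "lxd"
)

// MaasMachineSpec (excerpt) -- only the fields relevant to Requirement 1.
type MaasMachineSpec struct {
    // ProvisioningMode is "bare-metal" (default) or "lxd".
    // +optional
    ProvisioningMode ProvisioningMode `json:"provisioningMode,omitempty"`

    // MinCPU and MinMemory are the existing sizing fields, reused for LXD VMs.
    // +optional
    MinCPU *int `json:"minCPU,omitempty"`
    // +optional
    MinMemory *int `json:"minMemory,omitempty"`
}
```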
### Requirement 2

**User Story:** As a cluster administrator, I want to specify a resource pool for LXD hosts, so that I can dedicate specific hosts for control plane VM provisioning and maintain resource isolation.

#### Acceptance Criteria

1. WHEN configuring LXD VM provisioning THEN the system SHALL use the existing `resourcePool` field to target LXD-capable hosts
2. WHEN a resource pool is specified THEN the system SHALL only provision LXD VMs on hosts within that pool that support LXD
3. WHEN no resource pool is specified THEN the system SHALL use any available LXD-capable host
4. IF the specified resource pool contains no available LXD hosts THEN the system SHALL return an error with clear messaging
### Requirement 3

**User Story:** As a cluster administrator, I want LXD VMs to be distributed across multiple availability zones when the resource pool spans multiple AZs, so that I can maintain high availability for my control plane.

#### Acceptance Criteria

1. WHEN the target resource pool spans multiple availability zones THEN the system SHALL distribute LXD VMs across different AZs using existing `failureDomain` support
2. WHEN provisioning multiple control plane VMs THEN the system SHALL attempt to place them in different availability zones
3. WHEN an availability zone becomes unavailable THEN the remaining VMs in other AZs SHALL continue to function
4. IF insufficient availability zones are available for the requested number of control plane nodes THEN the system SHALL distribute VMs as evenly as possible across available AZs
### Requirement 4

**User Story:** As a cluster administrator, I want the system to automatically select appropriate LXD hosts based on available resources, so that VM provisioning succeeds without manual host selection.

#### Acceptance Criteria

1. WHEN provisioning an LXD VM THEN the system SHALL check available resources (CPU, memory, disk) on candidate LXD hosts
2. WHEN multiple suitable hosts are available THEN the system SHALL select the most appropriate host based on resource availability
3. WHEN no suitable host has sufficient resources THEN the system SHALL return an error indicating resource constraints
4. WHEN a selected LXD host becomes unavailable during provisioning THEN the system SHALL retry with another suitable host
### Requirement 5

**User Story:** As a developer integrating with cluster-api-provider-maas, I want the LXD VM provisioning to be transparent to existing cluster-api workflows, so that existing automation and tooling continue to work.

#### Acceptance Criteria

1. WHEN using LXD VM provisioning THEN the cluster-api Machine resource interface SHALL remain unchanged
2. WHEN an LXD VM is provisioned THEN it SHALL appear as a normal Machine resource in the cluster
3. WHEN querying Machine status THEN LXD VMs SHALL report status information consistent with bare metal machines
4. WHEN deleting a Machine backed by an LXD VM THEN the system SHALL properly clean up both the VM and any associated resources
### Requirement 6

**User Story:** As a cluster administrator using Spectro-specific features, I want LXD VM provisioning to work seamlessly with custom endpoints and preferred subnets, so that my existing configurations remain functional.

#### Acceptance Criteria

1. WHEN using custom endpoint annotations with LXD VMs THEN the system SHALL skip DNS reconciliation as with bare metal machines
2. WHEN preferred subnet configuration is specified THEN LXD VMs SHALL select IP addresses from preferred subnets for DNS attachment
3. WHEN namespace-scoped controllers are used THEN LXD VM provisioning SHALL respect namespace boundaries
4. WHEN using custom ports for API server endpoints THEN LXD VMs SHALL properly register with the specified port
### Requirement 7

**User Story:** As a cluster administrator, I want proper error handling and logging for LXD VM operations, so that I can troubleshoot issues and monitor the provisioning process.

#### Acceptance Criteria

1. WHEN LXD VM provisioning fails THEN the system SHALL log detailed error information including host selection and resource constraints
2. WHEN LXD host communication fails THEN the system SHALL retry with exponential backoff and log retry attempts
3. WHEN LXD VM creation succeeds THEN the system SHALL log VM details including host location and resource allocation
4. WHEN LXD VM deletion occurs THEN the system SHALL log cleanup operations and verify successful resource deallocation
### Requirement 8

**User Story:** As a system integrator, I want LXD VM provisioning to integrate with existing MAAS client patterns and error handling, so that the implementation follows established conventions.

#### Acceptance Criteria

1. WHEN implementing LXD VM support THEN the system SHALL use the existing `github.com/spectrocloud/maas-client-go` library patterns
2. WHEN LXD operations fail THEN the system SHALL use existing error handling and condition reporting mechanisms
3. WHEN LXD VMs are provisioned THEN the system SHALL follow existing reconciliation patterns with appropriate requeue intervals
4. WHEN generating provider IDs for LXD VMs THEN the system SHALL maintain the existing format while distinguishing VM resources
Lines changed: 167 additions & 0 deletions (implementation plan, shown below)
@@ -0,0 +1,167 @@
# Implementation Plan

- [ ] 1. Extend MaasMachine API types with LXD configuration
  - Add ProvisioningMode enum and LXDConfig struct to MaasMachineSpec in api/v1beta1/maasmachine_types.go
  - Add LXD-specific status fields to MaasMachineStatus including LXDHost, LXDProject, and VMResourceUsage
  - Update kubebuilder validation tags and documentation for new fields
  - Generate updated CRD manifests using make manifests
  - _Requirements: 1.1, 1.4_
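A possible shape for the `LXDConfig` and status additions named in task 1 is sketched below; the field sets and JSON tags are assumptions, and `ProvisioningMode` is the enum sketched under Requirement 1.

```go
// api/v1beta1/maasmachine_types.go (sketch, continued) -- field sets and
// JSON tags are assumptions, not the committed API.
package v1beta1

// LXDConfig holds optional LXD-specific settings used when
// spec.provisioningMode is "lxd". It would hang off MaasMachineSpec as an
// optional lxdConfig field.
type LXDConfig struct {
    // Project is the LXD project the VM is composed into.
    // +optional
    Project string `json:"project,omitempty"`
    // Profile is the LXD profile applied to the VM.
    // +optional
    Profile string `json:"profile,omitempty"`
    // DiskSizeGB is the requested root disk size for the VM.
    // +optional
    DiskSizeGB int `json:"diskSizeGB,omitempty"`
}

// VMResourceUsage records the resources actually allocated to a composed VM.
type VMResourceUsage struct {
    Cores    int   `json:"cores,omitempty"`
    MemoryMB int64 `json:"memoryMB,omitempty"`
    DiskGB   int64 `json:"diskGB,omitempty"`
}

// MaasMachineStatus (excerpt) -- only the LXD-related additions.
type MaasMachineStatus struct {
    // LXDHost is the MAAS VM host the machine was composed on.
    // +optional
    LXDHost string `json:"lxdHost,omitempty"`
    // LXDProject is the LXD project the VM belongs to.
    // +optional
    LXDProject string `json:"lxdProject,omitempty"`
    // VMResourceUsage reports the CPU, memory, and disk allocated to the VM.
    // +optional
    VMResourceUsage *VMResourceUsage `json:"vmResourceUsage,omitempty"`
}
```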
- [ ] 2. Create LXD data models and interfaces
- [ ] 2.1 Define LXD provisioning service interface
  - Create pkg/maas/lxd/interfaces.go with Service interface definitions
  - Define VMSpec, LXDVMResult, LXDHost, and ResourceInfo structs
  - Add LXDCapabilities and LXDError type definitions following existing patterns
  - _Requirements: 1.1, 4.1_
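One way `pkg/maas/lxd/interfaces.go` could lay out the types and `Service` interface named in task 2.1; every signature here is an assumption rather than the committed code.

```go
// pkg/maas/lxd/interfaces.go (sketch) -- a possible shape for the service
// contract consumed by the machine service; all signatures are assumptions.
package lxd

import "context"

// VMSpec describes the VM to compose on an LXD host.
type VMSpec struct {
    Cores        int
    MemoryMB     int64
    DiskGB       int64
    ResourcePool string
    Zone         string
    UserData     []byte // cloud-init user data passed at deploy time
}

// ResourceInfo captures the free capacity reported for a candidate host.
type ResourceInfo struct {
    AvailableCores    int
    AvailableMemoryMB int64
    AvailableDiskGB   int64
}

// LXDCapabilities describes what a VM host supports.
type LXDCapabilities struct {
    SupportsProjects bool
    StoragePools     []string
}

// LXDHost is a MAAS VM host capable of composing LXD VMs.
type LXDHost struct {
    ID           string
    Name         string
    Zone         string
    Pool         string
    Capabilities LXDCapabilities
    Resources    ResourceInfo
}

// LXDVMResult is returned once a VM has been composed and deployed.
type LXDVMResult struct {
    SystemID string
    Host     LXDHost
    IPs      []string
}

// Service is the LXD provisioning contract consumed by the machine service.
type Service interface {
    GetAvailableLXDHosts(ctx context.Context, resourcePool string) ([]LXDHost, error)
    SelectOptimalHost(hosts []LXDHost, spec VMSpec) (*LXDHost, error)
    ComposeVM(ctx context.Context, host LXDHost, spec VMSpec) (*LXDVMResult, error)
    DeployVM(ctx context.Context, systemID string, userData []byte) error
    GetVM(ctx context.Context, systemID string) (*LXDVMResult, error)
    DeleteVM(ctx context.Context, systemID string) error
}
```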
- [ ] 2.2 Implement LXD error handling types
  - Create pkg/maas/lxd/errors.go with LXDError struct and LXDErrorType enum
  - Implement error wrapping methods following existing pkg/errors patterns
  - Add error formatting and logging helpers consistent with existing error handling
  - Write unit tests for error type definitions and formatting
  - _Requirements: 7.1, 7.2, 8.2_
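A sketch of the `LXDError`/`LXDErrorType` pair from task 2.2. The concrete error categories and the `Retryable` helper are assumptions, chosen to show how a caller could decide between failing fast and requeueing.

```go
// pkg/maas/lxd/errors.go (sketch) -- error categories are assumptions.
package lxd

import "fmt"

// LXDErrorType classifies failures so the controller can decide whether to retry.
type LXDErrorType string

const (
    ErrorTypeNoCapableHost   LXDErrorType = "NoCapableHost"         // no LXD host in the pool
    ErrorTypeInsufficientRes LXDErrorType = "InsufficientResources" // hosts exist but lack capacity
    ErrorTypeHostUnavailable LXDErrorType = "HostUnavailable"       // transient, retry another host
    ErrorTypeAPIFailure      LXDErrorType = "APIFailure"            // MAAS API error, retry with backoff
)

// LXDError wraps an underlying error with a type and the host involved.
type LXDError struct {
    Type LXDErrorType
    Host string
    Err  error
}

func (e *LXDError) Error() string {
    return fmt.Sprintf("lxd %s (host %q): %v", e.Type, e.Host, e.Err)
}

// Unwrap lets callers use errors.Is / errors.As on the wrapped error.
func (e *LXDError) Unwrap() error { return e.Err }

// Retryable reports whether the reconciler should requeue and try again.
func (e *LXDError) Retryable() bool {
    return e.Type == ErrorTypeHostUnavailable || e.Type == ErrorTypeAPIFailure
}
```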
- [ ] 3. Implement core LXD provisioning service
- [ ] 3.1 Create LXD host discovery and selection logic
  - Implement GetAvailableLXDHosts method in pkg/maas/lxd/service.go using existing MAAS client patterns
  - Add resource pool filtering for LXD-capable hosts following existing resourcePool logic
  - Implement SelectOptimalHost algorithm based on available CPU, memory, and disk resources
  - Add unit tests for host selection with various resource constraint scenarios
  - _Requirements: 2.1, 2.2, 4.1, 4.2_
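The host selection in task 3.1 might look like the sketch below, using the `LXDHost` and `VMSpec` types from the task 2.1 sketch: filter candidates by pool and free capacity, then pick the host with the most free memory. It is written as a free function for clarity; in the service it would be the `SelectOptimalHost` method, and the tie-breaking policy is an assumption.

```go
// Host selection sketch for task 3.1 -- the "most free memory wins" policy
// is an assumption, not the committed algorithm.
package lxd

import "fmt"

// SelectOptimalHost returns the candidate with enough free resources for the
// VM, preferring the host with the largest amount of free memory.
func SelectOptimalHost(hosts []LXDHost, spec VMSpec) (*LXDHost, error) {
    var best *LXDHost
    for i := range hosts {
        h := &hosts[i]
        if spec.ResourcePool != "" && h.Pool != spec.ResourcePool {
            continue // requirement 2.2: only hosts inside the requested pool
        }
        r := h.Resources
        if r.AvailableCores < spec.Cores ||
            r.AvailableMemoryMB < spec.MemoryMB ||
            r.AvailableDiskGB < spec.DiskGB {
            continue // requirement 4.1: skip hosts without enough capacity
        }
        if best == nil || r.AvailableMemoryMB > best.Resources.AvailableMemoryMB {
            best = h
        }
    }
    if best == nil {
        return nil, fmt.Errorf("no LXD host in pool %q has %d cores / %d MB / %d GB free",
            spec.ResourcePool, spec.Cores, spec.MemoryMB, spec.DiskGB)
    }
    return best, nil
}
```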
- [ ] 3.2 Implement availability zone distribution logic
  - Create DistributeAcrossAZs method using existing failureDomain support patterns
  - Add algorithms to evenly distribute VMs across available AZs from resource pools
  - Handle edge cases when insufficient AZs are available for requested VM count
  - Write unit tests for AZ distribution with different host and zone configurations
  - _Requirements: 3.1, 3.2, 3.3, 3.4_
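A sketch of the even-spread behaviour in task 3.2 (and Requirement 3.4): round-robin the requested replicas over the distinct zones found in the candidate host list, so zone counts differ by at most one. It reuses the `LXDHost` type from the task 2.1 sketch.

```go
// AZ distribution sketch for task 3.2 -- round-robin placement is an
// assumption about how "as evenly as possible" is implemented.
package lxd

import "sort"

// DistributeAcrossAZs returns, for each of count VMs, the zone it should be
// placed in, cycling through the distinct zones of the candidate hosts.
func DistributeAcrossAZs(hosts []LXDHost, count int) []string {
    seen := map[string]bool{}
    var zones []string
    for _, h := range hosts {
        if !seen[h.Zone] {
            seen[h.Zone] = true
            zones = append(zones, h.Zone)
        }
    }
    sort.Strings(zones) // deterministic ordering for tests

    placements := make([]string, 0, count)
    if len(zones) == 0 {
        return placements
    }
    for i := 0; i < count; i++ {
        placements = append(placements, zones[i%len(zones)]) // requirement 3.4: at most one extra VM per zone
    }
    return placements
}
```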
- [ ] 3.3 Implement VM lifecycle management methods
  - Create ComposeVM method for LXD VM creation via MAAS VMHosts API following existing allocation patterns
  - Implement DeployVM method with cloud-init user data support using existing deployment workflow
  - Add GetVM method for status checking following existing machine status patterns
  - Implement DeleteVM method with proper cleanup validation using existing release patterns
  - Write unit tests for each lifecycle operation with mock MAAS client
  - _Requirements: 1.1, 1.2, 4.3, 5.3_
- [ ] 4. Enhance machine service with LXD support
- [ ] 4.1 Extend existing machine service interface
  - Modify DeployMachine method in pkg/maas/machine/machine.go to handle provisioning mode decisions
  - Add deployLXDVM private method following existing deployBareMetal patterns
  - Create buildVMSpec method to convert MaasMachineSpec to LXD VMSpec
  - Update existing error handling to support both bare metal and LXD error types
  - _Requirements: 1.1, 1.3, 5.1_
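The provisioning-mode branch in task 4.1 could be as simple as the sketch below. The `machineService` type and the `deployLXDVM`/`deployBareMetal` helpers are stand-ins following the task description; their signatures and the stand-in spec struct are assumptions.

```go
// Provisioning-mode dispatch sketch for task 4.1 -- all types here are
// local stand-ins for the existing machine service.
package machine

import (
    "context"
    "fmt"
)

// machineSpec is a stand-in for the relevant MaasMachineSpec fields.
type machineSpec struct {
    ProvisioningMode string
    MinCPU           int
    MinMemoryMB      int64
    ResourcePool     string
    FailureDomain    string
}

type deployResult struct {
    SystemID string
    IsVM     bool
}

type machineService struct {
    deployBareMetal func(ctx context.Context, spec machineSpec) (*deployResult, error)
    deployLXDVM     func(ctx context.Context, spec machineSpec) (*deployResult, error)
}

// DeployMachine dispatches on provisioningMode: "lxd" composes a VM on an LXD
// host, anything else (including empty) keeps the existing bare metal path.
func (s *machineService) DeployMachine(ctx context.Context, spec machineSpec) (*deployResult, error) {
    switch spec.ProvisioningMode {
    case "lxd":
        return s.deployLXDVM(ctx, spec)
    case "", "bare-metal":
        return s.deployBareMetal(ctx, spec)
    default:
        return nil, fmt.Errorf("unsupported provisioningMode %q", spec.ProvisioningMode)
    }
}
```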
- [ ] 4.2 Implement LXD VM provisioning workflow
  - Integrate LXD provisioning service with existing machine service patterns
  - Add resource validation for LXD VMs using existing minCPU and minMemory fields
  - Implement fallback to existing bare metal logic when LXD provisioning is not specified
  - Write integration tests for mixed bare metal and LXD provisioning scenarios
  - _Requirements: 1.1, 1.2, 4.4, 5.2_
- [ ] 5. Update MaasMachine controller for LXD integration
- [ ] 5.1 Modify machine reconciliation logic
  - Update reconcileNormal method in controllers/maasmachine_controller.go to detect LXD provisioning mode
  - Add LXD-specific condition reporting using existing condition framework patterns
  - Implement status updates for LXD host information and VM resource usage
  - Ensure DNS attachment logic works with LXD VMs using existing reconcileDNSAttachment patterns
  - _Requirements: 5.1, 5.2, 5.3, 6.2_
- [ ] 5.2 Implement LXD-specific error handling and recovery
  - Add retry logic with exponential backoff for LXD host failures using existing requeue patterns
  - Implement host selection retry when initial host becomes unavailable
  - Add comprehensive logging for LXD operations following existing logging patterns
  - Create error recovery workflows using existing condition and status update mechanisms
  - _Requirements: 4.4, 7.1, 7.2, 7.3_
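A minimal sketch of the backoff policy in task 5.2: double the requeue delay per consecutive LXD failure up to a cap. The base delay, the cap, and how the failure count is tracked (annotation, condition, or in-memory) are assumptions.

```go
// Backoff sketch for task 5.2 -- durations are illustrative assumptions.
package controllers

import "time"

const (
    lxdRetryBase = 15 * time.Second
    lxdRetryMax  = 5 * time.Minute
)

// lxdRequeueAfter returns the delay before the next reconcile attempt after
// `failures` consecutive LXD host/API failures.
func lxdRequeueAfter(failures int) time.Duration {
    d := lxdRetryBase
    for i := 0; i < failures && d < lxdRetryMax; i++ {
        d *= 2 // exponential backoff
    }
    if d > lxdRetryMax {
        d = lxdRetryMax
    }
    return d
}
```

The reconciler would return this duration through its normal requeue mechanism (e.g. `ctrl.Result{RequeueAfter: ...}`) and reset the counter once a compose or deploy attempt succeeds.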
- [ ] 5.3 Update machine deletion and cleanup logic
  - Extend reconcileDelete method to handle LXD VM cleanup using existing release patterns
  - Add verification of VM resource deallocation during deletion process
  - Implement timeout handling for LXD VM deletion operations with existing requeue logic
  - Write unit tests for cleanup scenarios and error cases
  - _Requirements: 5.4, 7.4_
- [ ] 6. Add MAAS client extensions for LXD operations
- [ ] 6.1 Research and implement MAAS VMHosts API integration
  - Investigate MAAS client library support for VMHosts operations in pkg/maas/scope/client.go
  - Add VM host discovery methods using existing client patterns and error handling
  - Implement VM composition and deployment API calls following existing machine allocation patterns
  - Create VM deletion and status checking API integrations with existing cleanup patterns
  - _Requirements: 4.1, 4.2, 8.1_
- [ ] 6.2 Implement LXD-specific MAAS API error handling
  - Add error parsing for MAAS LXD API responses using existing error handling patterns
  - Implement retry mechanisms for transient LXD API failures with existing requeue logic
  - Add timeout handling for long-running LXD operations following existing timeout patterns
  - Write unit tests for various MAAS API error scenarios with mock client
  - _Requirements: 7.1, 7.2, 8.2_
- [ ] 7. Create comprehensive test suite for LXD functionality
- [ ] 7.1 Write unit tests for LXD provisioning service
  - Test host selection algorithms with different resource availability scenarios
  - Test AZ distribution logic with various host configurations and zone mappings
  - Test VM lifecycle operations with mock MAAS client following existing test patterns
  - Test error handling for all defined LXD error types and recovery scenarios
  - _Requirements: 1.1, 2.1, 3.1, 4.1_
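For task 7.1, a table-driven test over the `SelectOptimalHost` sketch from task 3.1 might look like this; the cases and expectations are hypothetical.

```go
// Table-driven test sketch for task 7.1, exercising the SelectOptimalHost
// sketch from task 3.1.
package lxd

import "testing"

func TestSelectOptimalHost(t *testing.T) {
    spec := VMSpec{Cores: 2, MemoryMB: 4096, DiskGB: 40, ResourcePool: "cp-pool"}
    cases := []struct {
        name     string
        hosts    []LXDHost
        wantHost string
        wantErr  bool
    }{
        {
            name:    "no hosts in pool",
            hosts:   []LXDHost{{Name: "h1", Pool: "other"}},
            wantErr: true,
        },
        {
            name: "picks host with most free memory",
            hosts: []LXDHost{
                {Name: "h1", Pool: "cp-pool", Resources: ResourceInfo{AvailableCores: 8, AvailableMemoryMB: 8192, AvailableDiskGB: 100}},
                {Name: "h2", Pool: "cp-pool", Resources: ResourceInfo{AvailableCores: 8, AvailableMemoryMB: 16384, AvailableDiskGB: 100}},
            },
            wantHost: "h2",
        },
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            got, err := SelectOptimalHost(tc.hosts, spec)
            if tc.wantErr != (err != nil) {
                t.Fatalf("unexpected error state: %v", err)
            }
            if err == nil && got.Name != tc.wantHost {
                t.Fatalf("got host %q, want %q", got.Name, tc.wantHost)
            }
        })
    }
}
```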
- [ ] 7.2 Create integration tests for machine service LXD support
  - Test end-to-end LXD VM provisioning workflow with real MAAS client interactions
  - Test mixed bare metal and LXD deployments in same-cluster scenarios
  - Test resource pool filtering and selection with existing resource pool configurations
  - Test controller reconciliation with LXD backend following existing controller test patterns
  - _Requirements: 1.3, 2.1, 4.1, 5.1_
- [ ] 7.3 Implement controller tests for LXD scenarios
  - Test MaasMachine reconciliation with LXD provisioning mode using existing controller test framework
  - Test error recovery and retry mechanisms with existing error injection patterns
  - Test status updates and condition reporting following existing condition test patterns
  - Test cleanup operations for failed LXD VMs using existing cleanup test scenarios
  - _Requirements: 5.1, 5.2, 5.3, 7.1_
- [ ] 8. Add validation and defaulting for LXD configuration
- [ ] 8.1 Implement webhook validation for LXD configuration
  - Add validation logic in api/v1beta1/maasmachine_webhook.go following existing validation patterns
  - Validate LXD configuration fields when ProvisioningMode is "lxd" using existing validation framework
  - Add resource constraint validation for LXD VMs consistent with existing minCPU/minMemory validation
  - Ensure backward compatibility with existing bare metal configurations
  - _Requirements: 1.1, 1.4_
- [ ] 8.2 Add defaulting logic for LXD fields
  - Implement sensible defaults for LXD profile and project settings in webhook defaulting
  - Add default storage size configuration based on typical control plane requirements
  - Set ProvisioningMode default to "bare-metal" to maintain backward compatibility
  - Write unit tests for validation and defaulting scenarios using existing webhook test patterns
  - _Requirements: 1.1, 1.4_
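Tasks 8.1 and 8.2 could share plain helpers that the existing webhook `Default`/`ValidateCreate` methods call into, along the lines of the sketch below. It assumes the `ProvisioningMode` and `LXDConfig` types sketched earlier; the concrete default values and minimums are assumptions.

```go
// Webhook defaulting/validation helpers (sketch) for tasks 8.1 and 8.2 --
// default values and minimums are illustrative assumptions.
package v1beta1

import "fmt"

// defaultLXD applies the defaults from task 8.2: bare-metal when no mode is
// set, and a baseline LXD project/disk size when LXD is selected without config.
func defaultLXD(mode ProvisioningMode, cfg *LXDConfig) (ProvisioningMode, *LXDConfig) {
    if mode == "" {
        mode = ProvisioningModeBareMetal // preserve existing behaviour by default
    }
    if mode == ProvisioningModeLXD && cfg == nil {
        cfg = &LXDConfig{Project: "default", DiskSizeGB: 60} // assumed defaults
    }
    return mode, cfg
}

// validateLXD enforces the rules from task 8.1: LXD-only settings are rejected
// for bare metal machines, and LXD control plane VMs must request a sane size.
func validateLXD(mode ProvisioningMode, cfg *LXDConfig, minCPU *int) error {
    switch mode {
    case ProvisioningModeBareMetal, "":
        if cfg != nil {
            return fmt.Errorf("lxdConfig is only valid when provisioningMode is %q", ProvisioningModeLXD)
        }
    case ProvisioningModeLXD:
        if minCPU != nil && *minCPU < 2 {
            return fmt.Errorf("LXD control plane VMs need at least 2 CPUs, got %d", *minCPU)
        }
    default:
        return fmt.Errorf("unsupported provisioningMode %q", mode)
    }
    return nil
}
```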
- [ ] 9. Update provider ID generation and machine scope for LXD VMs
- [ ] 9.1 Extend provider ID generation
  - Update GetProviderID method in pkg/maas/scope/machine.go to distinguish LXD VMs from bare metal
  - Modify provider ID format to include "maas-lxd" scheme for LXD VMs while maintaining existing format for bare metal
  - Ensure provider ID uniqueness across bare metal and LXD resources within failure domains
  - Add IsLXDProvisioning helper method to MachineScope for provisioning mode detection
  - _Requirements: 5.3, 8.4_
- [ ] 9.2 Update existing provider ID parsing logic
  - Review and update any provider ID parsing logic to handle both "maas://" and "maas-lxd://" schemes
  - Ensure SetNodeProviderID method works correctly with LXD VM provider IDs
  - Write tests for provider ID generation and parsing with both provisioning modes
  - _Requirements: 5.3_
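Tasks 9.1 and 9.2 together amount to a second provider ID scheme plus tolerant parsing. In the sketch below, bare metal machines keep the existing `maas://` form and LXD VMs get `maas-lxd://`; the exact path layout after the scheme is an assumption.

```go
// Provider ID sketch for tasks 9.1/9.2 -- the path layout after the scheme
// is an assumption about the existing format.
package scope

import (
    "fmt"
    "strings"
)

const (
    providerIDPrefix    = "maas://"
    lxdProviderIDPrefix = "maas-lxd://"
)

// BuildProviderID assembles the provider ID for a machine in a failure domain,
// choosing the scheme by provisioning mode.
func BuildProviderID(lxd bool, failureDomain, systemID string) string {
    prefix := providerIDPrefix
    if lxd {
        prefix = lxdProviderIDPrefix
    }
    return fmt.Sprintf("%s%s/%s", prefix, failureDomain, systemID)
}

// ParseProviderID extracts the MAAS system ID and reports whether the ID
// refers to an LXD VM; both schemes are accepted.
func ParseProviderID(providerID string) (systemID string, isLXD bool, err error) {
    var rest string
    switch {
    case strings.HasPrefix(providerID, lxdProviderIDPrefix):
        rest, isLXD = strings.TrimPrefix(providerID, lxdProviderIDPrefix), true
    case strings.HasPrefix(providerID, providerIDPrefix):
        rest = strings.TrimPrefix(providerID, providerIDPrefix)
    default:
        return "", false, fmt.Errorf("unrecognised provider ID %q", providerID)
    }
    parts := strings.Split(rest, "/")
    return parts[len(parts)-1], isLXD, nil
}
```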
- [ ] 10. Integrate with existing Spectro features for LXD VMs
- [ ] 10.1 Verify custom endpoint compatibility
  - Test that LXD VMs respect existing custom endpoint annotations using IsCustomEndpoint logic
  - Ensure DNS reconciliation is properly skipped for LXD VMs when custom endpoints are configured
  - Verify custom port configuration works with LXD VMs in existing APIServerPort logic
  - _Requirements: 6.1, 6.4_
- [ ] 10.2 Test preferred subnet integration
  - Verify LXD VM IP addresses work with existing preferred subnet ConfigMap logic in GetPreferredSubnets
  - Test that getExternalMachineIP function works correctly with LXD VM addresses
  - Ensure DNS attachment uses preferred subnets for LXD VMs following existing patterns
  - _Requirements: 6.2_
- [ ] 10.3 Validate namespace scoping compatibility
  - Test that LXD VM provisioning respects existing namespace boundaries in controllers
  - Verify namespace-scoped controllers work correctly with LXD provisioning
  - Ensure resource isolation is maintained with existing namespace scoping logic
  - _Requirements: 6.3_
