# ADR-012: MongoDB Migration from Bitnami

## Context

The NVSentinel platform relies on MongoDB as its primary data store for persisting health events. Previously, this was deployed via a heavily customized Bitnami Helm chart, vendored as a subchart within our `mongodb-store` component.

While functional, this approach presented significant operational challenges:
- **Upstream Image Deprecations:** As announced in the [official Bitnami containers issue #83267](https://github.com/bitnami/containers/issues/83267), effective August 28th, 2025, Bitnami moved most versioned container images to a read-only `docker.io/bitnamilegacy` registry. Images there receive no further updates or security patches. Our deployment depended on these now-archived images, which materially increased long-term maintenance and security risk. See also: [Appsmith's guidance on Bitnami image deprecation](https://docs.appsmith.com/getting-started/setup/instance-management/bitnami-image-deprecation).
- **Manual Lifecycle Management:** The Bitnami chart directly created a Kubernetes `StatefulSet`. All "Day 2" operations (database version upgrades, scaling, member recovery, and configuration changes) were manual and brittle, and they required direct `kubectl` intervention.
- **Lack of Integrated Features:** Critical production features such as automated backups, point-in-time recovery (PITR), and advanced monitoring were not part of the chart. Implementing them would have required building and maintaining separate, complex solutions.
- **Complex Templating:** To meet our security requirements (TLS/mTLS with `cert-manager`, X.509 authentication), we had to write complex Go template logic within our `mongodb-store` chart. This included loops to generate per-replica certificates, which were fragile and hard to maintain.

The goal was to modernize our database layer by adopting a Kubernetes Operator, which automates lifecycle management and provides a declarative API for managing the database.

## Decision: Adopt the Percona Operator for MongoDB

We decided to migrate from the Bitnami Helm chart to the **Percona Operator for MongoDB**. We made this decision after evaluating the open-source MongoDB deployment options available for Kubernetes. These included the official MongoDB Community Operator and a self-maintained StatefulSet-based chart. Among fully open-source solutions, we concluded that Percona offered the best (and effectively the only viable) combination of enterprise-grade features, operational flexibility, and compatibility with our existing infrastructure requirements.

### Options Considered

#### 1. Official MongoDB Community Operator

- **Pros:** It is the official operator from MongoDB.
- **Cons:**
  - **Replica Set Only (no documented sharding):** The Community Operator's primary CRD, `MongoDBCommunity`, targets replica set deployments. Sharded-cluster management is documented only for MongoDB's Enterprise/Atlas tooling, not for the Community Operator (see the Helm chart and operator repository overviews in the references below).
  - **No Integrated Backup/PITR:** The Community Operator provides no built-in backup controller or PITR. MongoDB's integrated backup automation is tied to Ops Manager/Cloud Manager (Enterprise/Atlas). There is no community controller comparable to Percona's PBM integration.
  - **More Basic CRD Surface:** The Community CRD does not document first-class support for pod `sidecars`. It also has no direct `cert-manager` Issuer binding (no `tls.issuerConf` equivalent). TLS is typically provided through Secrets that you create and manage yourself. This is workable, but less integrated than Percona's model for our use case.
  - **Operational Gaps:** Rolling upgrades, observability, and user management are possible. However, they require more manual integration and/or external systems than Percona's operator does.

References:
- MongoDB Helm charts (Community Operator): https://github.com/mongodb/helm-charts/tree/main/charts/community-operator
- MongoDB Kubernetes Operator (Community) repository: https://github.com/mongodb/mongodb-kubernetes-operator
- MongoDB Ops Manager (backup is an Enterprise feature): https://www.mongodb.com/docs/ops-manager/current/backup/
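
For contrast, the Community Operator's documented TLS setup lives under `spec.security.tls` in the `MongoDBCommunity` resource. It points at a Secret and a CA ConfigMap that you provision yourself. The excerpt below is illustrative only. The resource name, MongoDB version, and Secret/ConfigMap names are placeholders, and required fields such as users and authentication modes are omitted.

```yaml
# Illustrative MongoDBCommunity excerpt (not something we deploy).
# All names and the version are placeholders; users and authentication
# settings are omitted.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb
spec:
  type: ReplicaSet
  members: 3
  version: "6.0.5"
  security:
    tls:
      enabled: true
      # Secret with tls.crt/tls.key that you provision yourself
      # (for example, with a cert-manager Certificate)
      certificateKeySecretRef:
        name: example-mongodb-server-cert
      # ConfigMap holding the CA certificate under the key "ca.crt"
      caConfigMapRef:
        name: example-mongodb-ca
```

Nothing in this excerpt references an Issuer. Whatever creates the Secret sits outside the operator, which is the integration gap described above.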

#### 2. Percona Operator for MongoDB

- **Pros:**
  - **Full Topology Support:** Provides native, declarative support for both replica sets and sharded clusters. This gives us a growth path if a single replica set is ever no longer enough.
  - **Integrated, Open-Source "Day 2" Features:** Ships built-in, declarative APIs for **Percona Backup for MongoDB (PBM)**, which provides automated backups with **Point-in-Time Recovery (PITR)**. PITR works by continuously archiving the oplog to remote storage. This allows restoring to any specific timestamp covered by the archived oplog.
  - **Excellent Flexibility:** The custom resource API is highly configurable. It has first-class support for `sidecars`, which we use to run `mongodb_exporter` alongside each MongoDB pod for Prometheus metrics collection. It also has an explicit `tls.issuerConf` block for `cert-manager` integration.
  - **Open-Source Operator & Tooling:** The operator and its backup tooling (PBM) are licensed under Apache 2.0. The database server (Percona Server for MongoDB) is licensed under SSPL. See Licensing and Open Source Model for details.
- **Cons:**
  - Requires another controller (the operator) running in the cluster. This is an acceptable trade-off for the automation benefits gained.

References:
- Percona Operator docs (features, sharding, automation): https://docs.percona.com/percona-operator-for-mongodb/
- Backups & PITR with PBM (Operator docs): https://docs.percona.com/percona-operator-for-mongodb/backups.html
- PITR configuration (oplog archival): https://docs.percona.com/percona-operator-for-mongodb/backups.html#store-operations-logs-for-point-in-time-recovery
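
As a sketch of how the PBM integration is expressed declaratively, the `backup` section below follows the shape of the psmdb-db Helm chart's values. Verify the field names against the chart version actually deployed. The storage name, bucket, region, Secret name, schedule, and retention count are all hypothetical.

```yaml
# Hypothetical psmdb-db values excerpt: scheduled PBM backups plus PITR.
backup:
  enabled: true
  storages:
    s3-backups:                      # hypothetical storage name
      type: s3
      s3:
        bucket: example-mongodb-backups
        region: us-east-1
        # Secret holding AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
        credentialsSecret: example-backup-s3-credentials
  pitr:
    enabled: true                    # continuously archive the oplog
    oplogSpanMin: 10                 # minutes of oplog per uploaded chunk
  tasks:
    - name: daily-backup
      enabled: true
      schedule: "0 2 * * *"          # cron: every day at 02:00
      keep: 7                        # retain the 7 most recent backups
      storageName: s3-backups
```

PBM can restore to a point in time only after a completed full backup exists. The scheduled task and the `pitr` flag therefore work together: the task provides the base snapshot, and the archived oplog chunks cover the time after it.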

#### 3. Maintain Our Own Custom Helm Chart

- **Pros:** Maximum control over every configuration detail.
- **Cons:**
  - **Highest Operational Burden:** We would have to manually implement and maintain:
    - Our own Helm chart templates for `StatefulSet`, `Service`, `PersistentVolumeClaim`, and networking
    - Scripted logic for all "Day 2" operations: version upgrades (requiring manual rolling restart strategies), horizontal scaling (adding/removing replica set members), configuration changes, and pod recovery
    - TLS certificate lifecycle management (generation, rotation, distribution to each pod)
    - MongoDB-specific operational knowledge embedded in scripts rather than leveraged from a battle-tested operator
  - **Container Image Management:** We would need to either:
    - Build and maintain our own MongoDB container images with security patches and updates, including a CI/CD pipeline for image builds, vulnerability scanning, and registry hosting
    - Or depend on upstream images (MongoDB Community, Percona Server for MongoDB, or similar) without operator-level lifecycle automation, requiring manual intervention for breaking changes or version migrations
  - **Ongoing Maintenance Costs:** Every MongoDB version upgrade, security patch, or operational pattern change would require custom chart updates, testing, and rollout procedures. This essentially means re-implementing operator functionality piecemeal, without the benefit of community testing, documentation, or support that comes with established operators like Percona's.

### Licensing and Open Source Model

A key factor in this decision was Percona's commitment to open source, which differs significantly from MongoDB's own licensing strategy.

- **Percona Operator & Tools (Apache 2.0):** The `percona-server-mongodb-operator` itself is licensed under the permissive Apache 2.0 license, as are key ecosystem tools such as `percona-backup-mongodb`. This provides maximum flexibility and avoids vendor lock-in for the management layer. References:
  - Operator license: https://github.com/percona/percona-server-mongodb-operator/blob/main/LICENSE
  - PBM license: https://github.com/percona/percona-backup-mongodb/blob/main/LICENSE
- **Percona Server for MongoDB (SSPL):** The underlying database server, Percona Server for MongoDB, is distributed under the **Server Side Public License (SSPL)**. Percona describes it as "source-available". Reference:
  - PSMDB license: https://github.com/percona/percona-server-mongodb/blob/v8.0/LICENSE-Community.txt
- **MongoDB's Licensing:** MongoDB's Community Operator is licensed under Apache 2.0, but the database it deploys is SSPL. More advanced operator features (e.g., integrated backup orchestration) live in the Enterprise/Ops Manager/Atlas ecosystem.

Percona's model offers a "best of both worlds" scenario. The management layer is permissively licensed and open source, and it provides enterprise-grade features (such as backups) at no cost. The database core remains SSPL-licensed, just as with MongoDB Community. This avoids the licensing complexity and costs associated with MongoDB's enterprise offerings.

References:
- Operator docs (features, configuration, TLS, backups, sidecars): https://docs.percona.com/percona-operator-for-mongodb/index.html
- Operator release notes (active maintenance cadence): https://docs.percona.com/percona-operator-for-mongodb/RN/index.html

## Architecture & Implementation

We implemented the migration by turning our `mongodb-store` chart into a dual-backend system. It can deploy either the old Bitnami chart or the new Percona stack, selected by boolean flags in the chart values. This de-risked the migration because either backend could be deployed from the same chart during the transition.

### 1. Conditional Dependencies

We modified `mongodb-store/Chart.yaml` to conditionally include either the Bitnami chart or the two Percona charts (`psmdb-operator` and `psmdb-db`):

```yaml
# In distros/kubernetes/nvsentinel/charts/mongodb-store/Chart.yaml
# (excerpt: version and repository fields are not shown)
dependencies:
  - name: mongodb # Old Bitnami chart
    condition: mongodb-store.useBitnami
  - name: psmdb-operator # Percona Operator (installs the CRDs and the controller)
    condition: mongodb-store.usePerconaOperator
  - name: psmdb-db # Percona database cluster (renders a PerconaServerMongoDB custom resource)
    condition: mongodb-store.usePerconaOperator
```
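
Helm evaluates each dependency's `condition` as a path into the top-level parent chart's values. The backend choice therefore lives in the umbrella chart's `values.yaml`. Below is a sketch of that toggle. It assumes the parent is the NVSentinel umbrella chart and that the keys sit exactly at the paths named in `Chart.yaml`.

```yaml
# Hypothetical excerpt of the parent (NVSentinel umbrella) chart's values.yaml.
# The dependency conditions in mongodb-store/Chart.yaml resolve to these keys.
mongodb-store:
  useBitnami: false          # exclude the legacy Bitnami subchart
  usePerconaOperator: true   # include psmdb-operator and psmdb-db
```

Helm leaves a dependency enabled when its condition path is absent. Setting both flags explicitly avoids accidentally pulling in both backends.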

### 2. Declarative Configuration via Custom Resource

Instead of directly templating a `StatefulSet`, we now create a high-level `PerconaServerMongoDB` resource. This resource captures our intent, and the operator handles the low-level implementation.

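
Below is a minimal sketch of such a resource. It uses field names from the Percona Operator's documented `PerconaServerMongoDB` spec, and its values match what this ADR describes: a 3-member replica set with TLS required. In our setup the psmdb-db chart renders this resource from values, so the snippet shows the rendered shape rather than our chart input. The resource name, image tag, and storage size are placeholders.

```yaml
# Hypothetical minimal PerconaServerMongoDB resource; name, image tag and
# storage size are placeholders.
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: example-mongodb
spec:
  image: percona/percona-server-mongodb:8.0.12-4
  tls:
    mode: requireTLS          # the operator provisions server certificates
  sharding:
    enabled: false            # plain replica set, no mongos/config servers
  replsets:
    - name: rs0
      size: 3                 # three-member replica set
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 10Gi
```

From this single object the operator derives the `StatefulSet`, Services, and other objects our chart previously templated by hand.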
**Key Integrations from `values.yaml`:**

- **`cert-manager` Integration:** We continue to use `cert-manager` to manage TLS certificates. Client certificates are generated by cert-manager `Certificate` resources that reference our `mongodb-psmdb-issuer`. cert-manager stores each issued certificate in a Kubernetes Secret. The Percona Operator itself is configured to require TLS:
  ```yaml
  tls:
    mode: requireTLS
  ```

- **First-Class Sidecar Support for Metrics:** Our `mongodb_exporter` for Prometheus is integrated through the operator's native `sidecars` API:
  ```yaml
  replsets:
    rs0:
      sidecars:
        - name: mongodb-exporter
          image: percona/mongodb_exporter:0.40.0
          args:
            - --discovering-mode
            - --compatible-mode
            - --collect-all
            - --web.listen-address=:9216
            - --mongodb.direct-connect
          ports:
            - name: metrics
              containerPort: 9216
  ```

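
This ADR does not describe how Prometheus discovers the exporter sidecars. One possible approach assumes a Prometheus server using Kubernetes pod service discovery; a Prometheus Operator `PodMonitor` would be the equivalent. The job keeps only targets whose container and port names match the sidecar definition above. The job name is hypothetical.

```yaml
# Hypothetical Prometheus scrape job for the mongodb-exporter sidecars.
scrape_configs:
  - job_name: mongodb-exporter
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Source label values are joined with ";" before the regex is matched
      - source_labels:
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        regex: mongodb-exporter;metrics
        action: keep
```

Pod-role discovery creates one target per declared container port. The relabel rule narrows those down to the exporter's `:9216` endpoint. Prometheus scrapes `/metrics` by default, which is where `mongodb_exporter` serves its metrics.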
### 3. Shift in TLS Management

**Bitnami Approach (Previous):**
- Used cert-manager to generate **per-replica server certificates** via a Go template loop (e.g., `mongo-server-cert-0`, `mongo-server-cert-1`, `mongo-server-cert-2`)
- Also used cert-manager to generate **client certificates** for application connectivity
- Both server and client certificates referenced our custom `mongo-ca-issuer`

**Percona Approach (Current):**
- The Percona Operator **auto-generates server certificates** internally when `tls.mode: requireTLS` is set, eliminating the need for per-replica certificate templates
- We continue using cert-manager to generate **client certificates** (`mongo-app-client-cert`, `mongo-dgxcops-client-cert`) that reference our custom `mongodb-psmdb-issuer`
- This hybrid approach simplifies our Helm templates by removing the complex per-replica server certificate loop while maintaining cert-manager integration for client authentication

**Simplification Achieved:**
Removing the per-replica server certificate generation (the `{{- range $i := until $replicaCount }}` loop in `certmanager.yaml`) significantly reduced template complexity. The operator now handles server certificate provisioning and rotation automatically, while we retain full control over client certificate issuance through our existing cert-manager infrastructure.

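
Below is a sketch of one such client certificate as a cert-manager `Certificate`. Only `mongo-app-client-cert` and `mongodb-psmdb-issuer` come from this ADR. The Secret name, subject fields, lifetimes, and Issuer kind (namespaced `Issuer` vs `ClusterIssuer`) are assumptions.

```yaml
# Hypothetical cert-manager Certificate for an application client cert.
# Subject, lifetimes, and issuer kind are assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mongo-app-client-cert
spec:
  # cert-manager stores the issued key pair (tls.key, tls.crt) in this Secret
  secretName: mongo-app-client-cert
  commonName: example-app
  subject:
    organizations:
      - example-org
    organizationalUnits:
      - example-clients
  duration: 2160h      # 90 days
  renewBefore: 360h    # renew 15 days before expiry
  usages:
    - digital signature
    - key encipherment
    - client auth
  issuerRef:
    name: mongodb-psmdb-issuer
    kind: Issuer       # assumption: could also be a ClusterIssuer
```

MongoDB's X.509 authentication uses the certificate's full subject DN as the user name in the `$external` database. The subject fields must therefore match the user created for the application.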
### 4. Architectural Shift Summary

| Aspect | Old (Bitnami) | New (Percona Operator) |
| :--- | :--- | :--- |
| **Control Model** | **Imperative:** The chart directly defines a `StatefulSet`. | **Declarative:** Define a `PerconaServerMongoDB` resource; the operator builds the `StatefulSet`. |
| **Lifecycle** | **Manual:** Upgrades, scaling, and recovery are manual `kubectl` tasks. | **Automated:** The operator handles rolling upgrades, scaling, and pod self-healing. |
| **Backups** | **None:** Required a separate, custom-built solution. | **Integrated:** Declarative, scheduled backups and PITR via the `backup` block in the custom resource. |
| **Integration** | **Brittle:** Required complex template logic in our chart to inject features. | **Flexible:** Native support for `sidecars` through a purpose-built custom resource API. The operator auto-generates server TLS certificates; client certificates continue to use our existing cert-manager infrastructure. |

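
To illustrate the Lifecycle row: upgrades and scaling become edits to the `PerconaServerMongoDB` spec rather than `kubectl` procedures. Below is a hypothetical psmdb-db values override. The tag and size are placeholders, and the `image` and `replsets` keys should be confirmed against the deployed chart version.

```yaml
# Hypothetical psmdb-db values override; apply either change on its own.
image:
  repository: percona/percona-server-mongodb
  tag: 8.0.12-4      # placeholder: bump to roll out a new server version
replsets:
  rs0:
    size: 5          # placeholder: grow from 3 to 5 members
```

An odd number of voting members avoids tied elections, hence 5 rather than 4. According to the Percona Operator documentation, its `SmartUpdate` strategy rolls out image changes to secondaries first and updates the primary last, after stepping it down.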
## Consequences

### Positive Outcomes

1. **Reduced Operational Overhead:**
   - **Automated Lifecycle Management:** The operator handles rolling upgrades, scaling (both horizontal and vertical), and pod self-healing without manual `kubectl` intervention.
   - **Simplified Helm Templates:** Removed 50+ lines of complex Go template logic for per-replica certificate generation (the `{{- range $i := until $replicaCount }}` loop in `certmanager.yaml`).
   - **Declarative Configuration:** Replaced imperative `StatefulSet` definitions with declarative `PerconaServerMongoDB` custom resources, making intent clearer and reducing configuration drift.

2. **Enhanced Production Readiness:**
   - **Integrated Backup Solution:** A clear path to enabling automated backups and Point-in-Time Recovery (PITR) through Percona Backup for MongoDB (PBM), eliminating the need to build a custom backup solution.
   - **Native Metrics Export:** `mongodb_exporter` runs as a sidecar on each MongoDB pod. It provides per-replica metrics through a clean custom resource API rather than requiring a separate Deployment.

3. **Future-Proof Architecture:**
   - **Sharding Support:** Although we currently run a 3-node replica set, the operator natively supports sharding if we need to scale beyond a single replica set's capacity.
   - **Active Maintenance:** The Percona Operator receives regular updates (latest at the time of writing: 1.21.1, released October 2025). It supports MongoDB 8.0, keeping us compatible with current MongoDB versions.
   - **Escape from Deprecated Images:** We are no longer dependent on `docker.io/bitnamilegacy` images, which moved to a read-only registry in August 2025.

### Trade-offs and Considerations

1. **MongoDB-Specific Operator Lock-in:**
   - **Consideration:** We now depend on Percona's operator for database management. Switching to a different operator in the future would require another migration.
   - **Mitigation:** The Percona Operator is open source (Apache 2.0), actively maintained, and has a strong community.

## References

- Percona Operator for MongoDB documentation (features, configuration, TLS, backups, sidecars): https://docs.percona.com/percona-operator-for-mongodb/index.html
- Percona Operator for MongoDB release notes (active maintenance cadence): https://docs.percona.com/percona-operator-for-mongodb/RN/index.html
- Percona Server for MongoDB 8.0 release notes (source-available build, compatibility): https://docs.percona.com/percona-server-for-mongodb/8.0/release_notes/8.0.12-4.html
- Percona Operator license (Apache 2.0): https://github.com/percona/percona-server-mongodb-operator/blob/main/LICENSE
- Percona Backup for MongoDB (PBM) license (Apache 2.0): https://github.com/percona/percona-backup-mongodb/blob/main/LICENSE
- Percona Server for MongoDB license (SSPL): https://github.com/percona/percona-server-mongodb/blob/v8.0/LICENSE-Community.txt
