Async Backup support #134
Conversation
…eed_v1_seaweed_with_backup.yaml
…et references to seaweed types and update deepcopy functions
Summary of Changes
Hello @MagicFun1241, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the SeaweedFS Operator by introducing a dedicated Filer Backup component. This new feature allows users to configure and manage asynchronous backups of their filer data to a variety of destinations, including local storage, other filers, and popular cloud storage services. The implementation includes robust credential management via Kubernetes secrets and integrates seamlessly with the existing operator framework.
Highlights
- New Filer Backup Component: Introduced a new `FilerBackup` component within the `Seaweed` custom resource, enabling asynchronous backup capabilities for SeaweedFS filer data.
- Flexible Backup Sink Configurations: Added comprehensive support for various backup destinations, including local file systems, filer-to-filer replication, and major cloud storage providers like AWS S3, Google Cloud Storage, Azure Blob Storage, and Backblaze B2.
- Secure Credential Management: Implemented a mechanism to securely fetch cloud storage credentials from Kubernetes secrets, allowing for flexible key mapping to adapt to different secret structures.
- Kubernetes Operator Integration: Extended the operator's reconciliation logic, RBAC roles, and Helm chart to properly manage the lifecycle of the new Filer Backup component, including its ConfigMap and StatefulSet.
- Dependency and Toolchain Updates: Upgraded the Go toolchain to version 1.24 and updated numerous Go module dependencies, including the core `seaweedfs` library and various cloud SDKs, ensuring compatibility and leveraging newer features.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces support for asynchronous filer backups to various storage sinks. The changes include new API types for backup configuration, RBAC updates for accessing secrets and PVCs, and the controller logic to manage backup StatefulSets and ConfigMaps.
My review focuses on improving robustness, maintainability, and configuration correctness.
```go
func buildStubProbeHandler() corev1.ProbeHandler {
	return corev1.ProbeHandler{
		Exec: &corev1.ExecAction{
			Command: []string{
				"/bin/sh",
				"-c",
				"echo OK",
			},
		},
	}
}
```
The `buildStubProbeHandler` function creates readiness and liveness probes that use a stub command (`echo OK`). This means the pod will always be considered healthy even if the `weed filer.backup` process has crashed or is not functioning correctly.
Please consider implementing a more meaningful health check.
As far as I can tell, `filer.backup` doesn't provide anything to health-check against. It doesn't listen on an HTTP/gRPC/metrics port.
…e claims and secrets in role configuration
… Google Cloud Storage, and refactor existing sink configuration logic
/gemini review
Code Review
This pull request introduces asynchronous backup capabilities for the SeaweedFS filer. The changes include new API types for configuring backups to various sinks, RBAC updates, and controller logic. The review focuses on improving security by addressing credential storage, robustness by enhancing secret handling, and correctness by fixing configuration issues and improving health probes.
```go
// AWS access key ID (if empty, loads from shared credentials file)
AWSAccessKeyID string `json:"awsAccessKeyID,omitempty"`

// AWS secret access key (if empty, loads from shared credentials file)
AWSSecretAccessKey string `json:"awsSecretAccessKey,omitempty"`
```
Storing credentials like `AWSAccessKeyID` and `AWSSecretAccessKey` directly in the CRD poses a security risk, as they are stored in etcd in plain text. It's safer to use `awsCredentialsSecretRef` to fetch them from a Kubernetes Secret. Consider deprecating these fields and adding a warning in the field documentation.
Suggested change:

```diff
-AWSAccessKeyID string `json:"awsAccessKeyID,omitempty"`
-// AWS secret access key (if empty, loads from shared credentials file)
-AWSSecretAccessKey string `json:"awsSecretAccessKey,omitempty"`
+// AWS access key ID (DEPRECATED: Use awsCredentialsSecretRef instead for better security)
+AWSAccessKeyID string `json:"awsAccessKeyID,omitempty"`
+// AWS secret access key (DEPRECATED: Use awsCredentialsSecretRef instead for better security)
+AWSSecretAccessKey string `json:"awsSecretAccessKey,omitempty"`
```
```go
ReadinessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 10,
	TimeoutSeconds:      3,
	PeriodSeconds:       15,
	SuccessThreshold:    1,
	FailureThreshold:    100,
},
LivenessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 20,
	TimeoutSeconds:      3,
	PeriodSeconds:       30,
	SuccessThreshold:    1,
	FailureThreshold:    6,
},
}}
```
… for Azure, Backblaze, GCS, and S3 sinks
/gemini review
Code Review
This pull request introduces the ability to deploy `weed` in "filer.backup" mode, adding support for asynchronous backups to various cloud storage providers. The API changes are well-structured.
```go
ReadinessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 10,
	TimeoutSeconds:      3,
	PeriodSeconds:       15,
	SuccessThreshold:    1,
	FailureThreshold:    100,
},
LivenessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 20,
	TimeoutSeconds:      3,
	PeriodSeconds:       30,
	SuccessThreshold:    1,
	FailureThreshold:    6,
},
```
The readiness and liveness probes use a stub command (`echo OK`), confirming only the pod's shell responsiveness, not the health of the `weed filer.backup` process. A long-running backup job could be stuck or failing without detection.
Investigate a more robust health check mechanism, such as an HTTP endpoint or a command that checks the process's status.
…GCS, and S3 credential extraction
/gemini review
Code Review
This pull request introduces asynchronous backup support for SeaweedFS filers. I've provided feedback focusing on improving security by removing plaintext credentials from the CRD and tightening RBAC permissions. I also suggest enhancing the robustness of the backup pods with more meaningful health probes and fixing a bug in the credential extraction logic. Overall, this is a great feature, and with these changes, it will be even more secure and reliable.
```go
// AWS access key ID (if empty, loads from shared credentials file)
AWSAccessKeyID string `json:"awsAccessKeyID,omitempty"`

// AWS secret access key (if empty, loads from shared credentials file)
AWSSecretAccessKey string `json:"awsSecretAccessKey,omitempty"`
```
Storing credentials like `AWSAccessKeyID` and `AWSSecretAccessKey` directly in the CRD spec poses a security risk, as they are stored in plaintext in etcd. It's recommended to only support sourcing these from Kubernetes Secrets. This concern applies to other sink configurations that handle sensitive credentials, such as `AzureSinkConfig` and `BackblazeSinkConfig`. Consider removing these fields and relying solely on the `...CredentialsSecretRef` fields.
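A minimal sketch of the secret-ref-only shape this suggests; the struct and field names here are illustrative, modeled on the `awsCredentialsSecretRef` field discussed in this review, not the PR's actual types:

```go
package v1

import corev1 "k8s.io/api/core/v1"

// S3SinkConfig (illustrative only): carries no inline credentials and
// sources them exclusively from a referenced Kubernetes Secret.
type S3SinkConfig struct {
	Bucket string `json:"bucket"`
	Region string `json:"region,omitempty"`
	// Secret expected to contain awsAccessKeyID and awsSecretAccessKey keys.
	AWSCredentialsSecretRef *corev1.LocalObjectReference `json:"awsCredentialsSecretRef,omitempty"`
}
```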
```go
if masterKey, exists := secret[mapping.B2MasterApplicationKey]; exists {
	b2MasterApplicationKey = masterKey
}
```
There are a couple of issues here with fetching the Backblaze master application key:

- Bug: If `mapping.B2MasterApplicationKey` is empty, the code will try to look up an empty string key in the secret data (`secret[""]`), which is likely not the intended behavior. A default key name should be used, similar to how `b2AccountIDKey` is handled.
- Inconsistent logging: There is no log message if a custom-mapped `B2MasterApplicationKey` is not found in the secret, which is inconsistent with the handling of `b2AccountID` and other credential extraction logic in this PR.
```go
b2MasterApplicationKeyName := mapping.B2MasterApplicationKey
if b2MasterApplicationKeyName == "" {
	b2MasterApplicationKeyName = "b2MasterApplicationKey"
}
if masterKey, exists := secret[b2MasterApplicationKeyName]; exists {
	b2MasterApplicationKey = masterKey
} else if mapping.B2MasterApplicationKey != "" {
	log.Info("Secret key not found in secret", "secret", backblazeConfig.BackblazeCredentialsSecretRef.Name, "mapping", b2MasterApplicationKeyName)
}
```

```go
func buildStubProbeHandler() corev1.ProbeHandler {
	return corev1.ProbeHandler{
		Exec: &corev1.ExecAction{
			Command: []string{
				"/bin/sh",
				"-c",
				"echo OK",
			},
		},
	}
}
```
The readiness and liveness probes are using a stub `echo OK` command. This is not a meaningful health check: it only verifies that a shell is available in the container, not that the `filer.backup` process is running and healthy. This could lead to a situation where the backup process has crashed or is stuck, but the pod is still considered healthy, causing silent backup failures.
Please implement a more robust health check. If `weed filer.backup` does not provide a health check endpoint, consider alternatives like (see the sketch after this list):

- Checking if the `weed` process is running (`pgrep weed`).
- Having the process touch a file periodically and checking the file's modification time.
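A minimal sketch of the `pgrep` option, assuming `pgrep` is available in the container image (an assumption; the stock SeaweedFS image may not ship it):

```go
// Sketch: probe handler that fails when no weed process is running.
// Assumes pgrep exists in the image; this is not part of the PR.
func buildProcessProbeHandler() corev1.ProbeHandler {
	return corev1.ProbeHandler{
		Exec: &corev1.ExecAction{
			Command: []string{
				"/bin/sh",
				"-c",
				"pgrep -x weed > /dev/null",
			},
		},
	}
}
```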
```go
// +kubebuilder:rbac:groups=extensions,resources=ingresses,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=networking.k8s.io,resources=ingresses,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;
```
The ClusterRole is granted `list` permission on `secrets` cluster-wide. This is a security risk, as it violates the principle of least privilege. The controller logic in `getSecret` only ever fetches a specific secret by name and namespace using `r.Get()`, so `get` permission is sufficient. The `list` permission does not appear to be used and should be removed to reduce the operator's potential attack surface.
Suggested change:

```diff
-// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;
+// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;
```
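For reference, a sketch of the narrow fetch pattern described above; the `getSecret` name and the `r.Get()` call come from the review, while the body here is an assumption rather than the PR's code:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// getSecret fetches a single Secret by namespace/name via r.Get, which only
// requires the "get" RBAC verb. Sketch only; the PR's implementation may differ.
func (r *SeaweedReconciler) getSecret(ctx context.Context, namespace, name string) (map[string]string, error) {
	secret := &corev1.Secret{}
	if err := r.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, secret); err != nil {
		return nil, err
	}
	data := make(map[string]string, len(secret.Data))
	for k, v := range secret.Data {
		data[k] = string(v)
	}
	return data, nil
}
```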
```go
} else {
	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountNameKey)
}
```
The logging for a missing secret key is inconsistent with other sinks like S3. Here, it logs an info message even if no custom mapping was provided and the default key is not found. Other sinks only log if a custom mapping was provided but the key was not found in the secret. For consistency, this should be changed.
Suggested change:

```diff
-} else {
-	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountNameKey)
-}
+} else if mapping.AccountName != "" {
+	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountNameKey)
+}
```
```go
} else {
	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountKeyKey)
}
```
Similar to the `AccountName` field, the logging for a missing `AccountKey` is inconsistent with other sinks. It should only log if a custom mapping was provided but the key was not found.
Suggested change:

```diff
-} else {
-	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountKeyKey)
-}
+} else if mapping.AccountKey != "" {
+	log.Info("Secret key not found in secret", "secret", azureConfig.AzureCredentialsSecretRef.Name, "mapping", accountKeyKey)
+}
```
```go
TimeoutSeconds:   3,
PeriodSeconds:    15,
SuccessThreshold: 1,
FailureThreshold: 100,
```
The `FailureThreshold` for the readiness probe is set to 100. With a `PeriodSeconds` of 15, this means the pod can fail readiness checks for up to 25 minutes (100 × 15s) before being marked Unready. This seems excessively high and could mask startup problems for a long time. A lower value, e.g. 20, would be more appropriate to detect issues faster.
Suggested change:

```diff
-FailureThreshold: 100,
+FailureThreshold: 20,
```
/gemini review
Code Review
This pull request introduces asynchronous backup support. The changes include adding access to `secrets` and `persistentvolumeclaims` resources, supporting credentials from Secrets, and enabling sink options for various providers. The code is well-structured, but there are some areas for improvement, including health probes, context propagation, and dependency management.
```go
ReadinessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 10,
	TimeoutSeconds:      3,
	PeriodSeconds:       15,
	SuccessThreshold:    1,
	FailureThreshold:    20,
},
LivenessProbe: &corev1.Probe{
	ProbeHandler:        buildStubProbeHandler(),
	InitialDelaySeconds: 20,
	TimeoutSeconds:      3,
	PeriodSeconds:       30,
	SuccessThreshold:    1,
	FailureThreshold:    6,
},
```
The readiness and liveness probes for the filer-backup container are using a stub `echo OK` command. This is a critical correctness issue, as it means the pod will always be reported as healthy to Kubernetes, even if the `filer.backup` process has crashed, hung, or is failing to perform backups. This can lead to a non-functional backup system that appears to be working correctly.
A proper health check must be implemented. Please investigate whether the `weed filer.backup` command provides a health check endpoint or mechanism that can be used. If not, a more robust check should be devised, for example (see the sketch after this list):

- Checking for the existence and recent modification of a status file.
- Executing a command that verifies the `weed` process is running and responsive.
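A hedged sketch of the status-file variant; the heartbeat path and freshness window are assumptions, and the backup process would need to be wrapped or extended to touch the file:

```go
// Sketch: liveness handler that fails if /data/backup.heartbeat has not been
// modified in the last 5 minutes. The path and the 5-minute window are
// assumptions; the backup process itself must keep the file fresh.
func buildHeartbeatProbeHandler() corev1.ProbeHandler {
	return corev1.ProbeHandler{
		Exec: &corev1.ExecAction{
			Command: []string{
				"/bin/sh",
				"-c",
				`[ -n "$(find /data/backup.heartbeat -mmin -5 2>/dev/null)" ]`,
			},
		},
	}
}
```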
config/rbac/role.yaml
```yaml
- apiGroups:
    - ""
  resources:
    - persistentvolumeclaims
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
```
These permissions grant access to create, delete, get, list, patch, update, and watch persistentvolumeclaims. While necessary for managing the backup process, ensure that these permissions are scoped appropriately to prevent unintended modifications to other PVCs in the cluster. Consider using more restrictive resource names or labels if possible.
go.mod
```
github.com/spf13/pflag v1.0.5 // indirect
github.com/spf13/viper v1.13.0 // indirect
github.com/subosito/gotenv v1.4.1 // indirect
github.com/seaweedfs/seaweedfs v0.0.0-20250625060439-95261a712ee2
```
…rt resolution in controllers
…ons in seaweed_webhook.go, and refactor admin service and stateful set management in controller_admin.go
… set for improved gRPC support
…address configuration for service monitoring
…installation and usage instructions for SeaweedFS Operator
…les, enhancing accessibility and clarity
…ks, improving consistency and readability
…ols version to v0.18.0, and enhance CRD definitions with bucketclaims support and RBAC permissions
…g for better debugging
…to streamline cross-compilation process
…guration in Seaweed controller
…configurations, update Dockerfile and go.mod for improved build process
…reateS3BucketWithObjectLock function
…ithQuota to utilize CreateS3BucketWithObjectLock for improved bucket creation logic
…creation and reconciliation logic
…tClaim controller
…stom config is provided
…c filer client configuration
…to improve logging granularity
…for better readability and maintainability
Thanks for the big effort. Is it done and ready for review?
…figuration in deployment and service templates for seaweedfs-operator
…ted webhook templates for seaweedfs-operator
This PR adds the ability to deploy `weed` in "filer.backup" mode.
What changed?
- Added access to `secrets` and `persistentvolumeclaims` resources in manager-role
- The `sink` option supports all providers and fields that were in the scaffold

Example:
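A hypothetical sketch of such a spec; field names are inferred from the API types discussed in this review (`FilerBackup`, S3 sink, `awsCredentialsSecretRef`) and may differ from the sample manifest added in this PR:

```yaml
# Hypothetical sketch: field names inferred from the review discussion,
# not copied from the PR's sample manifest.
apiVersion: seaweed.seaweedfs.com/v1
kind: Seaweed
metadata:
  name: seaweed-sample
spec:
  filerBackup:
    replicas: 1
    sink:
      s3:
        bucket: my-backup-bucket
        region: us-east-1
        awsCredentialsSecretRef:
          name: aws-backup-credentials
```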
Tests
Tested `sink.local` backup in a local cluster, and tested secret fetching for the ConfigMap.