feat(storage): add configuration for external object storage provider #1146
Conversation
I started down this path and it seems like it's going to be pretty messy to do, unless accompanied by some serious refactoring. But from what I see, we still request a TLS certificate for the report generation sidecar(s), even when there are 0 reports replicas configured (which is the default). @ebaron do you think this cleanup would be worth doing? |
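For what it's worth, a minimal sketch of the kind of guard that cleanup might add, using simplified stand-in types rather than the operator's actual API:

```go
package main

import "fmt"

// Simplified stand-ins for the relevant CR fields (illustrative only).
type ReportOptions struct {
	Replicas int32
}

type CryostatSpec struct {
	ReportOptions *ReportOptions
}

// reportsCertNeeded returns true only when at least one report generator
// replica is configured, so the TLS certificate request could be skipped in
// the default zero-replica case.
func reportsCertNeeded(spec *CryostatSpec) bool {
	return spec.ReportOptions != nil && spec.ReportOptions.Replicas > 0
}

func main() {
	fmt.Println(reportsCertNeeded(&CryostatSpec{}))                                           // false: default, 0 replicas
	fmt.Println(reportsCertNeeded(&CryostatSpec{ReportOptions: &ReportOptions{Replicas: 1}})) // true
}
```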
|
/build_test |
|
|
|
Now that #1051 is merged, see the comments on cryostatio/cryostat-helm#209 (comment). Egress policies should probably not be enabled if the user will also be using external object storage. This needs to be addressed. |
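A sketch of the shape this check could take, again with simplified stand-in types rather than the operator's exact CRD fields: an external provider is treated as configured whenever the CR carries object storage provider options.

```go
package main

import "fmt"

// Simplified stand-ins for the relevant CR fields (illustrative only).
type StorageProvider struct {
	URL string
}

type ObjectStorageOptions struct {
	Provider *StorageProvider
}

type CryostatSpec struct {
	ObjectStorageOptions *ObjectStorageOptions
}

// usesExternalStorage gates creation of the managed storage egress policy.
func usesExternalStorage(spec *CryostatSpec) bool {
	return spec.ObjectStorageOptions != nil && spec.ObjectStorageOptions.Provider != nil
}

func main() {
	spec := &CryostatSpec{
		ObjectStorageOptions: &ObjectStorageOptions{
			Provider: &StorageProvider{URL: "https://s3.us-east-005.backblazeb2.com"},
		},
	}
	if usesExternalStorage(spec) {
		fmt.Println("skip storage egress NetworkPolicy; traffic targets the external provider")
	} else {
		fmt.Println("create the managed cryostat-storage egress NetworkPolicy")
	}
}
```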
Sorry I missed this. I don't think it's a big deal to have the certs remain if not needed. This can always be an RFE |
Josh-Matsuoka
left a comment
Works for me, deploying to a local CRC cluster and pointing at my own Backblaze buckets.
(force-pushed from ecda9fc to 864c6ea)
|
@andrewazores looks like you might have lost my suggestions when rebasing |
|
Hmm... I'm having new issues with this PR now when it's also used together with reports sidecars: the external S3 (Backblaze B2) service is rejecting the presigned requests with 403. It seems to work fine when it's just Cryostat alone talking to the external S3. This might be something I just missed in previous development/testing; maybe I never tried this combination of configurations. |
|
I get the same kind of response when I try to configure a local smoketest setup similarly (reports sidecar + external S3); this is from the reports container. Presigned URLs are obviously quite restricted and locked down to prevent any kind of forgery or other unauthorized access, so it's easy to misconfigure something and invalidate them. I wonder if this normally works with our cryostat-storage container because SeaweedFS might be more lenient about some of the restrictions than the commercial B2 provider is, and there might be something we've just been doing wrong silently all this time. |
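For context on what a presigned request actually carries, here is a minimal sketch of generating one with the AWS SDK for Go v2 (Cryostat itself uses the Java SDK; the bucket and key names are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Load credentials/region from the environment, as the SDK normally does.
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	// Mirror Cryostat's path-style client setting.
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.UsePathStyle = true
	})
	presigner := s3.NewPresignClient(client)

	// Presign a GET for one object.
	req, err := presigner.PresignGetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String("archivedrecordings"),
		Key:    aws.String("my-jvm-id/recording.jfr"),
	}, s3.WithPresignExpires(5*time.Minute))
	if err != nil {
		log.Fatal(err)
	}
	// The URL embeds the bucket, key, expiry, and an HMAC signature; altering
	// any of them (e.g. losing the bucket) invalidates the request with a 403.
	fmt.Println(req.URL)
}
```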
|
@ebaron I'm going to drop this back down to a draft for the time being. I would really like to get this into 4.1, though, so I think these are the options ahead:
Option 1 should be the long-term goal, but since I don't know specifically what's going wrong, I think we need to consider whether we can do option 2 or 3 for 4.1 and work on getting this fully implemented for a future release. |
|
See #1159 - we'll run into the same problems with presigned recording transfers there, too, since jfr-datasource and cryostat-reports operate in a virtually identical manner. |
There's a long-term option 4 which might be reasonable too, actually: add the S3 SDK client to cryostat-reports and jfr-datasource, so that they can talk directly to the storage themselves instead of just receiving presigned URLs from Cryostat. This does mean they also need to be aware of how Cryostat structures the files within buckets, etc., so it's a little more coupled than the current design, but it won't have any of the integrity difficulties we're dealing with for presigned access. I think some S3 implementations also just do not support presigned URLs (generally small testing implementations, not commercial ones), so this would allow the same remote storage design to work more generally with any S3 provider. |
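A rough sketch of what option 4 looks like from the sidecar's side, using the AWS SDK for Go v2 purely for illustration (the actual sidecars are Java, and the bucket/key values are assumptions based on the conventions discussed here):

```go
package main

import (
	"context"
	"io"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// The sidecar holds its own S3 credentials (via the environment here)
	// and fetches the object directly, rather than dereferencing a presigned
	// URL handed to it by Cryostat.
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// The sidecar must know Cryostat's bucket name and key layout up front,
	// which is the extra coupling noted above.
	out, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String("archivedrecordings"),
		Key:    aws.String("my-jvm-id/recording.jfr"),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()
	if _, err := io.Copy(os.Stdout, out.Body); err != nil {
		log.Fatal(err)
	}
}
```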
Maybe we could add some abstraction for this to Cryostat Core? Reports already uses it, so we'd just have to add it to jfr-datasource, which already depends on JMC core, so not a big difference. |
|
-core doesn't really deal with archived recordings at all at this point, let alone specifics of S3, so we would need to do some significant refactoring to move that concept over for everything else to consume it that way. But it would be an option. Unless we just extract utility functions for determining "file keys" ($jvmId/$filename) and such, with no actual S3-specific utilities, so that at least all three components can use shared function definitions for generating and understanding these keys. That helps somewhat, but it still misses the complexity of configuring S3 bucket names, S3 credentials, ensuring buckets exist, etc. |
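A hypothetical shape for those extracted utilities, using the "$jvmId/$filename" convention named above (the function names are invented for illustration, not an existing -core API):

```go
package main

import (
	"fmt"
	"strings"
)

// objectKey builds the shared "$jvmId/$filename" key for a stored object.
func objectKey(jvmID, filename string) string {
	return jvmID + "/" + filename
}

// parseObjectKey recovers the JVM ID and filename from a stored object key.
func parseObjectKey(key string) (jvmID, filename string, ok bool) {
	parts := strings.SplitN(key, "/", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	key := objectKey("abc123", "recording.jfr")
	fmt.Println(key) // abc123/recording.jfr
	id, name, _ := parseObjectKey(key)
	fmt.Println(id, name)
}
```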
|
See linked PRs on other components above ^ I think I figured out what was going wrong with the HTTP 403s on presigned URLs: for some reason, even though Cryostat's S3 SDK client is configured to use path-style access, the presigned URLs it generated were using the more recent virtual-host/subdomain access style. The reports and jfr-datasource implementations of receiving presigned URLs assumed that the URI's "base" (authority) would be static, but when using S3 virtual-host access each storage bucket has its own distinct subdomain. So the presigned URL transformation that appended the path and query onto the base produced an invalid URL, missing any information about which bucket the requested object should be located in.

I originally implemented this URI base configuration on the reports and jfr-datasource containers as a (weak) security measure, to ensure that they could not be told to download resources from URLs on unexpected origins. It's more complex to deal with this when using S3 virtual hosts, but that is the higher-performance option and may be the only supported option for users on commercial storage providers, so we need to support it for this feature to make sense. Since both reports and jfr-datasource are now placed behind Ingress NetworkPolicies by default, and since jfr-datasource is also hidden within the Cryostat Pod and behind an auth proxy, the restriction on the presigned URL base does not really add much security but does break compatibility.

So, in the PRs linked above, I have simply removed that restriction: the reports and jfr-datasource containers now retrieve the asset from the presigned URL verbatim, and Cryostat sends that verbatim URL as generated by its S3 SDK client. In local Podman smoketesting against my Backblaze B2 storage account this now seems to work just fine, with both reports and jfr-datasource successfully able to pull presigned files. I will need to do some more work on -helm and -operator to clean up some configurations and to ensure that path-style access is not enforced when external storage is configured, but once I do that I think this PR will be ready to go. |
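To make the failure mode concrete, here is a simplified reconstruction (not the actual reports/jfr-datasource code) of how re-attaching only the path and query to a statically configured base loses the bucket under virtual-host addressing:

```go
package main

import (
	"fmt"
	"net/url"
)

// rebase keeps only the path and query of the presigned URL and re-attaches
// them to a static base, as the old transformation did. Parse errors are
// elided for brevity.
func rebase(presigned, base string) string {
	u, _ := url.Parse(presigned)
	b, _ := url.Parse(base)
	b.Path = u.Path
	b.RawQuery = u.RawQuery
	return b.String()
}

func main() {
	base := "https://s3.us-east-005.backblazeb2.com"

	// Path-style: the bucket lives in the path, so rebasing preserves it.
	pathStyle := "https://s3.us-east-005.backblazeb2.com/archivedrecordings/jvm/rec.jfr?X-Amz-Signature=abc"
	fmt.Println(rebase(pathStyle, base))
	// https://s3.us-east-005.backblazeb2.com/archivedrecordings/jvm/rec.jfr?X-Amz-Signature=abc

	// Virtual-host style: the bucket lives in the hostname, so rebasing
	// silently drops it and the provider rejects the request with a 403.
	virtualHost := "https://archivedrecordings.s3.us-east-005.backblazeb2.com/jvm/rec.jfr?X-Amz-Signature=abc"
	fmt.Println(rebase(virtualHost, base))
	// https://s3.us-east-005.backblazeb2.com/jvm/rec.jfr?X-Amz-Signature=abc  (bucket lost!)
}
```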
…s are expected to work anonymously
Co-authored-by: Elliott Baron <[email protected]> Signed-off-by: Andrew Azores <[email protected]>
…t - don't apply it (rely on system truststores) if we're using external S3
(force-pushed from 49c7b9c to f314472)
|
The equivalent -helm PR is working fine with my B2 account and presigned recording transfers to both -reports and jfr-datasource. This PR is having some issues with jfr-datasource, but is working fine with B2 and presigned -reports transfers. Digging into why. |
…g a managed cryostat-storage - don't apply it when using external S3
|
Fixed, it was just a silly mistake with the TLS setup when using an external S3 service. I forgot to remove the configuration that overrides the datasource's truststore with one containing only the generated storage cert (which would be applied to the Cryostat instance's managed cryostat-storage container, if any). Removing the override so that the datasource just uses its default system truststore works, so it's able to trust B2's cert. I used a Cryostat CR like this to test:

apiVersion: operator.cryostat.io/v1beta2
kind: Cryostat
metadata:
name: cryostat-sample
namespace: cryostat
spec:
enableCertManager: true
networkPolicies: {}
objectStorageOptions:
provider:
metadataMode: bucket
region: us-east-1
url: https://s3.us-east-005.backblazeb2.com
usePathStyleAccess: false
secretName: s3cred
storageBucketNameOptions:
archivedRecordings: archivedrecordings
archivedReports: archivedreports
eventTemplates: eventtemplates
heapDumps: heapdumps
jmcAgentProbeTemplates: probes
metadata: cryostatmeta
threadDumps: threaddumps
reportOptions:
replicas: 1
resources: {}
targetNamespaces:
- cryostat
- apps1

With an |
ebaron
left a comment
I'm glad the fix for the pre-signed URLs wasn't too bad. Just a couple comments below.
internal/controllers/common/resource_definitions/resource_definitions.go
|
Just noticed that there's a missing cleanup step: if you start with a CR without external storage and then edit the same CR to use external storage, the cryostat-storage Deployment gets left running, unused, along with its Service, PVC, etc. |
|
Fixed in the latest commit. A nice side effect is that because the PVC gets left behind, a CR can be edited to switch back and forth between cryostat-storage and an external storage instance, and the data available in e.g. All Targets Archives is retained and can be retrieved seamlessly. |
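A hedged sketch of that cleanup's shape with controller-runtime; the resource names and structure are illustrative, not the operator's actual implementation:

```go
package cleanup

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupManagedStorage deletes the now-unused managed storage Deployment and
// Service when the CR switches to external storage, but deliberately leaves
// the PVC in place so switching back retains archived data.
func cleanupManagedStorage(ctx context.Context, c client.Client, name, ns string) error {
	dep := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: name + "-storage", Namespace: ns}}
	if err := c.Delete(ctx, dep); err != nil && !errors.IsNotFound(err) {
		return err
	}
	svc := &corev1.Service{ObjectMeta: metav1.ObjectMeta{Name: name + "-storage", Namespace: ns}}
	if err := c.Delete(ctx, svc); err != nil && !errors.IsNotFound(err) {
		return err
	}
	// Intentionally no PVC deletion here: leaving it behind preserves the
	// archived data across switches between managed and external storage.
	return nil
}
```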
|
/build_test |
|
|
Welcome to Cryostat! 👋
Before contributing, make sure you have:
- based your changes on the main branch
- prefixed your commit message with one of [chore, ci, docs, feat, fix, test]
- signed your commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Fixes: #959
Fixes #1150
See cryostatio/cryostat-helm#247
Description of the change:
See cryostatio/cryostat-helm#247 - port of that work over to the Operator.
Needs tests and probably a fair amount of cleanup and edge case handling, but this is working at a basic level now. For example, we don't need to request TLS certs for provisioned storage when we are configuring an external storage connection.
Motivation for the change:
See #959 and cryostatio/cryostat-helm#246
How to manually test:
- make cert_manager
- create a cryostat.yaml CR (such as the example above)
- oc apply -f cryostat.yaml

@cryostatio/reviewers I can share the credentials to my test Backblaze B2 account for testing this: oc apply -f s3cred.yaml.