@andrewazores
Member

@andrewazores andrewazores commented Aug 21, 2025

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Fixes: #959
Fixes #1150
See cryostatio/cryostat-helm#247

Description of the change:

See cryostatio/cryostat-helm#247 - port of that work over to the Operator.

Needs tests and probably a fair amount of cleanup and edge case handling, but this is working at a basic level now. For example, we don't need to request TLS certs for provisioned storage when we are configuring an external storage connection.

Motivation for the change:

See #959 and cryostatio/cryostat-helm#246

How to manually test:

  1. make cert_manager
  2. Check out and build the PR, push the Operator image to a registry (e.g. quay.io), and deploy to a cluster

cryostat.yaml

apiVersion: operator.cryostat.io/v1beta2
kind: Cryostat
metadata:
  name: cryostat-sample
  namespace: cryostat
spec:
  enableCertManager: true
  objectStorageOptions:
    provider:
      region: us-east-1
      url: https://s3.us-east-005.backblazeb2.com # or sub out the appropriate URL for your own S3-compatible object storage provider
      usePathStyleAccess: false
    secretName: s3cred

oc apply -f cryostat.yaml

@cryostatio/reviewers I can share the credentials to my test Backblaze B2 account for testing this: oc apply -f s3cred.yaml.

@mergify mergify bot added the safe-to-test label Aug 21, 2025
@andrewazores andrewazores added feat New feature or request needs-documentation labels Aug 21, 2025
@andrewazores
Member Author

> For example, we don't need to request TLS certs for provisioned storage when we are configuring an external storage connection.

I started down this path and it seems like it's going to be pretty messy to do, unless accompanied by some serious refactoring. But from what I see, we still request a TLS certificate for the report generation sidecar(s), even when there are 0 reports replicas configured (which is the default). @ebaron do you think this cleanup would be worth doing?

@andrewazores
Member Author

/build_test

@github-actions

/build_test completed successfully ✅.
View Actions Run.

@andrewazores
Member Author

andrewazores commented Aug 28, 2025

Now that #1051 is merged, see the comments on cryostatio/cryostat-helm#209 (comment). Egress policies should probably not be enabled if the user will also be using external object storage. This needs to be tested and/or documented.

@andrewazores andrewazores marked this pull request as ready for review September 15, 2025 20:08
@ebaron
Member

ebaron commented Sep 16, 2025

> For example, we don't need to request TLS certs for provisioned storage when we are configuring an external storage connection.
>
> I started down this path and it seems like it's going to be pretty messy to do, unless accompanied by some serious refactoring. But from what I see, we still request a TLS certificate for the report generation sidecar(s), even when there are 0 reports replicas configured (which is the default). @ebaron do you think this cleanup would be worth doing?

Sorry I missed this. I don't think it's a big deal to have the certs remain if not needed. This can always be an RFE

@andrewazores andrewazores requested a review from a team September 16, 2025 18:45
Josh-Matsuoka
Josh-Matsuoka previously approved these changes Sep 24, 2025
Contributor

@Josh-Matsuoka Josh-Matsuoka left a comment

Works for me when deploying to a local CRC cluster and pointing it at my own Backblaze buckets.

@ebaron
Member

ebaron commented Sep 24, 2025

@andrewazores looks like you might have lost my suggestions when rebasing

@andrewazores
Member Author

Hmm... having new issues with this PR now when also used together with reports sidecars - the external S3 (Backblaze B2) service is rejecting the presigned requests with 403. Seems to work fine when it's just Cryostat alone talking to the external S3. This might be something that I just missed in previous development/testing, maybe I never tried this combination of configurations.

@andrewazores
Member Author

I get the same kind of response when I try to configure a local smoketest setup similarly (report sidecar + external S3):

2025-09-24 19:38:51,238 ERROR [io.cry.rep.ReportResource] (executor-thread-1) java.io.IOException: Server returned HTTP response code: 403 for URL: https://s3.us-east-005.backblazeb2.com/NBQrozhmX9XPYm9bXlTJ1LdB9v6FE-dQS_NlGVSnSnc=/-deployments-quarkus-run-jar_startup_20250924T193847Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250924T193850Z&X-Amz-SignedHeaders=host&X-Amz-Credential=005c367ad28c3fa0000000002/20250924/us-east-1/s3/aws4_request&X-Amz-Expires=60&X-Amz-Signature=f14cc002cf5523d94bfe9de682bcdf378ba16d5ede5e0b127d8ebdc9bdf631d1

(this is from the reports container)

Presigned URLs are obviously quite restricted and locked down to prevent any kind of forgery or other unauthorized accesses, so it's easy to misconfigure something and invalidate things. I wonder if this just normally works with our cryostat-storage container because SeaweedFS might be more lenient on some of the restrictions than the B2 commercial provider is, and there might be something that we've just been doing wrong silently all this time.
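
As a rough illustration of how tightly scoped these URLs are, the query string of a SigV4 presigned URL like the one logged above can be picked apart in a few lines of Python. The URL below is a shortened, hypothetical stand-in (host, object key, credential and signature are placeholders), and `describe_presigned` is an illustrative helper, not part of any Cryostat component.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical presigned URL with the same query parameters as the logged one;
# the host, object key, credential and signature are placeholders.
EXAMPLE_URL = (
    "https://s3.example.com/some-bucket/recording.jfr"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20250924T193850Z"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Credential=EXAMPLEKEYID/20250924/us-east-1/s3/aws4_request"
    "&X-Amz-Expires=60"
    "&X-Amz-Signature=0000"
)


def describe_presigned(url: str) -> dict:
    """Extract the fields that constrain where and when a presigned URL is valid."""
    parts = urlsplit(url)
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    # The credential scope is <key id>/<date>/<region>/<service>/aws4_request.
    _key_id, date, region, service, _terminator = query["X-Amz-Credential"].split("/")
    return {
        "host": parts.netloc,
        "path": parts.path,
        "signed_headers": query["X-Amz-SignedHeaders"].split(";"),
        "expires_seconds": int(query["X-Amz-Expires"]),
        "scope": (date, region, service),
    }


info = describe_presigned(EXAMPLE_URL)
print(info)
```

With `X-Amz-SignedHeaders=host` the signature covers the Host header as well as the request path, and `X-Amz-Expires=60` leaves only a one-minute window, so any rewrite of the host or path after signing should be expected to end in a rejection.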

@andrewazores
Member Author

@ebaron I'm going to drop this back down to being a draft for the meantime. I would really like to get this in to 4.1 though, so I think these are the options ahead:

  1. Just figure out what the cause of the 403 error is and fix it. This might be difficult, or may not even be possible - it's hard to tell what the actual cause of the rejection is.
  2. Enforce that a CR cannot have both report replicas and external object storage configured, or at least document this limitation. This is not ideal since these are both performance or robustness enhancing configuration options for larger Cryostat installations, and it would be great to combine them.
  3. Disable presigned recording transfers when external object storage is configured. This is easy to do, but this means that Cryostat has to go back to acting as a pipe to stream data between (now remote!) storage and the reports sidecars, so there's additional I/O + network traffic for no good reason and a bit of additional latency.

1 should be the long-term goal but since I don't know specifically what's going wrong, I think we need to consider whether we can do 2 or 3 for 4.1 and work on getting this fully implemented for a future release.
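
To make options 2 and 3 a bit more concrete, here is a minimal Python sketch of the two checks. It only models the decision logic over a CR `spec` given as a plain dict; the field names are taken from the sample CRs in this thread, while `check_cr`, `presigned_transfers_enabled` and the "a provider URL means external storage" rule are assumptions for illustration and not how the Operator (which is written in Go) implements anything.

```python
def uses_external_storage(spec: dict) -> bool:
    """Assumption: a non-empty provider URL means external object storage is configured."""
    provider = spec.get("objectStorageOptions", {}).get("provider", {})
    return bool(provider.get("url"))


def check_cr(spec: dict) -> list:
    """Option 2: report a problem when report replicas and external storage are combined."""
    replicas = spec.get("reportOptions", {}).get("replicas", 0)
    problems = []
    if uses_external_storage(spec) and replicas > 0:
        problems.append("report replicas cannot be combined with external object storage")
    return problems


def presigned_transfers_enabled(spec: dict) -> bool:
    """Option 3: only hand out presigned URLs when storage is managed in-cluster."""
    return not uses_external_storage(spec)


example_spec = {
    "objectStorageOptions": {"provider": {"url": "https://s3.example.com"}},
    "reportOptions": {"replicas": 1},
}
print(check_cr(example_spec))
print(presigned_transfers_enabled(example_spec))
```

Option 2 rejects (or at least flags) the combination up front, whereas option 3 accepts it and silently trades the presigned transfer for Cryostat piping the data itself.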

@andrewazores andrewazores marked this pull request as draft September 24, 2025 19:57
@andrewazores
Member Author

See #1159 - we'll run into the same problems with presigned recording transfers there, too, since jfr-datasource and cryostat-reports are operating in virtually an identical manner.

@andrewazores
Member Author

andrewazores commented Sep 24, 2025

> @ebaron I'm going to drop this back down to being a draft for the meantime. I would really like to get this in to 4.1 though, so I think these are the options ahead:
>
>   1. Just figure out what the cause of the 403 error is and fix it. This might be difficult, or may not even be possible - it's hard to tell what the actual cause of the rejection is.
>   2. Enforce that a CR cannot have both report replicas and external object storage configured, or at least document this limitation. This is not ideal since these are both performance or robustness enhancing configuration options for larger Cryostat installations, and it would be great to combine them.
>   3. Disable presigned recording transfers when external object storage is configured. This is easy to do, but this means that Cryostat has to go back to acting as a pipe to stream data between (now remote!) storage and the reports sidecars, so there's additional I/O + network traffic for no good reason and a bit of additional latency.
>
> 1 should be the long-term goal but since I don't know specifically what's going wrong, I think we need to consider whether we can do 2 or 3 for 4.1 and work on getting this fully implemented for a future release.

There's a long term option 4 which might be reasonable too, actually: add the S3 SDK client to cryostat-reports and jfr-datasource, so that they can talk directly to the storage themselves instead of just receiving presigned URLs from Cryostat. This does mean they also need to be aware of how Cryostat structures the files within buckets etc. so it's a little more coupled than the current design, but it won't have any of the integrity difficulties we're dealing with for presigned accesses. I think some S3 implementations also just do not support presigned URLs (generally not commercial ones, just small testing implementations) so this would allow the same remote storage design to work more generally with any S3 provider.

@ebaron
Member

ebaron commented Sep 25, 2025

> There's a long term option 4 which might be reasonable too, actually: add the S3 SDK client to cryostat-reports and jfr-datasource, so that they can talk directly to the storage themselves instead of just receiving presigned URLs from Cryostat. This does mean they also need to be aware of how Cryostat structures the files within buckets etc. so it's a little more coupled than the current design, but it won't have any of the integrity difficulties we're dealing with for presigned accesses. I think some S3 implementations also just do not support presigned URLs (generally not commercial ones, just small testing implementations) so this would allow the same remote storage design to work more generally with any S3 provider.

Maybe we could add some abstraction for this to Cryostat Core? Reports already uses it, so we'd just have to add it to jfr-datasource, which already depends on JMC core, so not a big difference.

@andrewazores
Member Author

andrewazores commented Sep 25, 2025

-core doesn't really deal with archived recordings at all at this point let alone specifics of S3, so we would need to do some significant refactoring to move that concept over for everything else to consume it that way. But it would be an option.

Unless we just extract utility functions for determining "file keys" ($jvmId/$filename) and such, and no actual S3-specific utilities, so that at least all three components can use shared function definitions for generating and understanding these keys. That helps somewhat, but it still misses the complexity about configuring S3 bucket names, S3 credentials, ensuring buckets exist, etc.
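
A shared key utility of that kind could be very small. The following Python sketch only encodes the `$jvmId/$filename` layout mentioned above; the names `FileKey`, `make_key` and `parse_key` are hypothetical, and the rule that neither part may contain a `/` is an assumption made here to keep the key unambiguous.

```python
from typing import NamedTuple


class FileKey(NamedTuple):
    jvm_id: str
    filename: str


def make_key(jvm_id: str, filename: str) -> str:
    """Build the object key for an archived file as "<jvmId>/<filename>"."""
    if not jvm_id or not filename:
        raise ValueError("jvm_id and filename must both be non-empty")
    if "/" in jvm_id or "/" in filename:
        # Assumption: a slash inside either part would make the key ambiguous.
        raise ValueError("jvm_id and filename must not contain '/'")
    return f"{jvm_id}/{filename}"


def parse_key(key: str) -> FileKey:
    """Split an object key back into its jvmId and filename parts."""
    jvm_id, separator, filename = key.partition("/")
    if not separator or not jvm_id or not filename:
        raise ValueError(f"not a <jvmId>/<filename> key: {key!r}")
    return FileKey(jvm_id, filename)


key = make_key("abc123", "startup.jfr")
print(key, parse_key(key))
```

As noted, this covers only the naming convention; bucket names, credentials and bucket creation would still have to be configured separately in each component.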

@andrewazores
Member Author

See linked PRs on other components above ^

I think I figured out what was going wrong with the HTTP 403s on presigned URLs - for some reason, even though Cryostat's S3 SDK client is configured to use path-style accesses, the presigned URLs it generated were using the more recent virtual host/subdomain access style. The reports and jfr-datasource implementation of receiving presigned URLs assumed that the URI's "base" (authority) would be static, but when using S3 virtual host access each separate storage bucket has its own distinct subdomain name. So, the presigned URL transformation that was applied to append the path and query onto the base resulted in an invalid URL which was missing any information about which bucket the requested object should be located in.
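
The failure mode described above can be reproduced with plain URL manipulation. This Python sketch is not the reports or jfr-datasource code; `rebase` is a hypothetical stand-in for "append the presigned path and query onto a fixed base", and the hosts, bucket and query string are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder values; the real endpoint, bucket and signature differ.
STATIC_BASE = "https://s3.example.com"
PATH_STYLE = "https://s3.example.com/archivedrecordings/jvm123/rec.jfr?X-Amz-Signature=0000"
VHOST_STYLE = "https://archivedrecordings.s3.example.com/jvm123/rec.jfr?X-Amz-Signature=0000"


def rebase(presigned: str, base: str) -> str:
    """Keep only the path and query of the presigned URL and append them to a fixed base."""
    source = urlsplit(presigned)
    target = urlsplit(base)
    return urlunsplit((target.scheme, target.netloc, source.path, source.query, ""))


# Path-style: the bucket is part of the path, so it survives the rewrite.
print(rebase(PATH_STYLE, STATIC_BASE))
# Virtual-host style: the bucket lives in the subdomain, so the rewrite drops it.
print(rebase(VHOST_STYLE, STATIC_BASE))
```

The second result has the same shape as the failing URL in the reports log earlier in this thread: the path starts directly with the jvmId and no bucket appears anywhere in the URL.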

I originally implemented this URI base configuration on the reports and jfr-datasource containers as a (weak) security measure, to ensure that they could not be told to download resources from URLs on unexpected origins. It's more complex to deal with this when using S3 virtual hosts, but that is the higher performance option and may be the only supported option for users using commercial storage providers, so we need to support it for this feature to make sense. Since both reports and jfr-datasource are now placed behind Ingress NetworkPolicies by default, and since jfr-datasource is also hidden within the Cryostat Pod and behind an auth proxy, the restriction on presigned URL base does not really add much security but does break compatibility. So, in the PRs linked above, I have removed that restriction: Cryostat sends the presigned URL exactly as generated by its S3 SDK client, and reports and jfr-datasource retrieve the asset from that URL verbatim.

In local Podman smoketesting against my Backblaze B2 storage account this now seems to work just fine, with both reports and jfr-datasource successfully able to pull presigned files out. I will need to do some more work on -helm and -operator to clean up some configurations and to ensure that path-style access is not enforced when external storage is configured, but once I do that then I think this PR will be ready to go.

@andrewazores
Member Author

The equivalent -helm PR is working fine with my B2 account and presigned recordings transfers to both -reports and jfr-datasource. This PR is having some issues with jfr-datasource, but is working fine with B2 and presigned -reports transfers. Digging into why.

…g a managed cryostat-storage - don't apply it when using external S3
@andrewazores
Member Author

Fixed, it was just a silly mistake with TLS setup when using an external S3 service. I forgot to remove the configuration that overrides the datasource's truststore with one just containing the generated storage cert (which would be applied to the Cryostat instance's managed cryostat-storage container, if any). Removing the override so that the datasource just uses its default system truststore works, so that it's able to trust B2's cert.
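
The same trust decision can be shown by analogy in Python (the datasource itself is a Java application, so this is not its actual configuration). `build_tls_context` is a hypothetical helper: given a CA file it trusts only that CA, which mirrors the override that is appropriate for a managed cryostat-storage, and given nothing it falls back to the system default CAs, which is what an external provider such as B2 needs.

```python
import ssl
from typing import Optional


def build_tls_context(storage_ca_file: Optional[str]) -> ssl.SSLContext:
    """Pick the trust anchors for connections to object storage."""
    if storage_ca_file is not None:
        # Managed storage: trust only the generated storage CA.
        # When cafile is given, the system default CAs are not loaded.
        return ssl.create_default_context(cafile=storage_ca_file)
    # External storage: no override, so the system default CAs are used.
    return ssl.create_default_context()


context = build_tls_context(None)
print(context.verify_mode, context.check_hostname)
```

Applying the single-CA override while talking to an external service leaves the client with no way to validate the provider's publicly issued certificate, which matches the symptom described above.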

I used a Cryostat CR like this to test:

apiVersion: operator.cryostat.io/v1beta2
kind: Cryostat
metadata:
  name: cryostat-sample
  namespace: cryostat
spec:
  enableCertManager: true
  networkPolicies: {}
  objectStorageOptions:
    provider:
      metadataMode: bucket
      region: us-east-1
      url: https://s3.us-east-005.backblazeb2.com
      usePathStyleAccess: false
    secretName: s3cred
    storageBucketNameOptions:
      archivedRecordings: archivedrecordings
      archivedReports: archivedreports
      eventTemplates: eventtemplates
      heapDumps: heapdumps
      jmcAgentProbeTemplates: probes
      metadata: cryostatmeta
      threadDumps: threaddumps
  reportOptions:
    replicas: 1
    resources: {}
  targetNamespaces:
  - cryostat
  - apps1

With an s3cred Secret containing my B2 account credentials.

@andrewazores andrewazores marked this pull request as ready for review October 2, 2025 20:37
Member

@ebaron ebaron left a comment

I'm glad the fix for the pre-signed URLs wasn't too bad. Just a couple comments below.

@andrewazores
Member Author

Just noticed that there's a missing cleanup step if you go from a CR without external storage, then edit the same CR to use external storage. The cryostat-storage Deployment gets left running, unused, along with its Service, PVC, etc.

@andrewazores
Member Author

Fixed in the latest commit. A nice side-effect is that because the PVC gets left behind, a CR can be edited to switch back and forth between cryostat-storage and an external storage instance, and the data available in e.g. All Targets Archives is retained and can be retrieved seamlessly.
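
A minimal Python sketch of that cleanup rule, for illustration only: the Operator is written in Go and its reconciler is not shown in this thread, so `storage_resources_to_delete` is a hypothetical name and the exact set of deleted kinds is an assumption. The only behaviour taken from the description above is that the PVC is retained.

```python
# Kinds the managed cryostat-storage is assumed to own (illustrative, not exhaustive).
MANAGED_STORAGE_KINDS = ("Deployment", "Service", "PersistentVolumeClaim")
# Kept on purpose so data survives switching back to managed storage.
RETAINED_KINDS = {"PersistentVolumeClaim"}


def storage_resources_to_delete(spec: dict) -> list:
    """Return the managed-storage kinds to remove once external storage is configured."""
    provider = spec.get("objectStorageOptions", {}).get("provider", {})
    if not provider.get("url"):
        # Managed storage is still in use, so nothing is cleaned up.
        return []
    return [kind for kind in MANAGED_STORAGE_KINDS if kind not in RETAINED_KINDS]


external_spec = {"objectStorageOptions": {"provider": {"url": "https://s3.example.com"}}}
print(storage_resources_to_delete(external_spec))
print(storage_resources_to_delete({}))
```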

@andrewazores
Member Author

/build_test

@github-actions

github-actions bot commented Oct 8, 2025

/build_test completed successfully ✅.
View Actions Run.

@andrewazores andrewazores merged commit 549b0f7 into cryostatio:main Oct 8, 2025
7 checks passed
@andrewazores andrewazores deleted the external-s3 branch October 8, 2025 20:43