
@davelopez
Contributor

@davelopez davelopez commented Sep 19, 2025

Currently, we can only fetch data from public URLs without any authentication or custom headers.

This PR introduces support for HTTP headers in URL fetch requests for landing requests. Headers are controlled through pattern-based configuration, and sensitive headers are automatically encrypted using Galaxy's vault system before being stored in the database.

🚀 Features

1. Pattern-Based URL Header Configuration

  • URL Matching: Headers controlled per URL pattern using glob syntax (*, ?, **)
  • Multiple Patterns: One URL can match multiple patterns; the union of all their allowed headers is permitted
  • Configuration File: config/url_headers_conf.yml
  • Fail-Fast Security: Headers rejected if used without proper configuration

2. Automatic Sensitive Header Encryption

  • Per-Pattern Configuration: Each pattern explicitly declares which headers are sensitive
  • Vault Integration: Sensitive headers encrypted using Galaxy's vault system
  • Transparent Operation: Non-sensitive headers remain in plain text for performance
  • Secure-by-Default: A header is sensitive if ANY matching pattern marks it as sensitive

3. Secure Storage Architecture

  • Database Protection: Sensitive header values are never stored in plain text
  • Vault References: Encrypted headers replaced with vault placeholders (e.g., __VAULT_HEADER_AUTHORIZATION__)
  • Automatic Decryption: Headers are automatically decrypted when landing requests are retrieved
  • Key Management: Hierarchical vault keys: headers/{landing_uuid}/{header_name}
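The placeholder and key formats above can be derived from the header name alone. A minimal sketch, assuming every non-alphanumeric character becomes an underscore, the vault key uses the lower-cased name, and the placeholder uses the upper-cased name (this matches the `X-API-Key` examples under "Under the Hood"). `vault_key_for` and `placeholder_for` are hypothetical helpers, and the `landing_request/` prefix is copied from those same examples rather than taken from Galaxy's code:

```python
import re

# Assumption: prefix copied from the example vault keys in this PR description.
VAULT_PREFIX = "landing_request"


def _normalize(header_name: str) -> str:
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", header_name)


def vault_key_for(landing_uuid: str, header_name: str) -> str:
    """Hypothetical: hierarchical vault key for one header of one landing request."""
    return f"{VAULT_PREFIX}/headers/{landing_uuid}/{_normalize(header_name).lower()}"


def placeholder_for(header_name: str) -> str:
    """Hypothetical: value stored in the database in place of the secret."""
    return f"__VAULT_HEADER_{_normalize(header_name).upper()}__"


print(vault_key_for("1234", "X-API-Key"))  # landing_request/headers/1234/x_api_key
print(placeholder_for("X-API-Key"))  # __VAULT_HEADER_X_API_KEY__
```

Because the mapping is deterministic, decryption only needs the landing UUID and the header name to locate the secret again.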

🔧 Configuration

Example Configuration File

patterns:
    - url_pattern: "https://github.com/**"
      headers:
          - name: "Authorization"
            sensitive: true
          - name: "Accept"
            sensitive: false
    - url_pattern: "https://api.example.com/v1/**"
      headers:
          - name: "X-API-Key"
            sensitive: true
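The YAML file can be loaded with PyYAML's `yaml.safe_load`, which yields plain dicts and lists. A minimal sketch of turning that result into typed rules, working on the already-parsed dictionary so it needs only the standard library. `HeaderRule`, `UrlPatternRule`, and `parse_url_headers_config` are hypothetical names, not the module added in this PR, and treating an omitted `sensitive` flag as sensitive is an assumption:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class HeaderRule:
    name: str
    sensitive: bool


@dataclass
class UrlPatternRule:
    url_pattern: str
    headers: List[HeaderRule] = field(default_factory=list)


def parse_url_headers_config(data: Optional[Dict[str, Any]]) -> Optional[List[UrlPatternRule]]:
    """Convert the parsed YAML document into typed rules.

    Returns None when there is no configuration, which callers treat as
    "no headers allowed".
    """
    if not data:
        return None
    rules: List[UrlPatternRule] = []
    for entry in data.get("patterns", []):
        headers = [
            # Assumption: an omitted "sensitive" flag is treated as sensitive.
            HeaderRule(name=h["name"], sensitive=bool(h.get("sensitive", True)))
            for h in entry.get("headers", [])
        ]
        rules.append(UrlPatternRule(url_pattern=entry["url_pattern"], headers=headers))
    return rules


# The example configuration above, as yaml.safe_load would return it.
example = {
    "patterns": [
        {
            "url_pattern": "https://github.com/**",
            "headers": [
                {"name": "Authorization", "sensitive": True},
                {"name": "Accept", "sensitive": False},
            ],
        },
        {
            "url_pattern": "https://api.example.com/v1/**",
            "headers": [{"name": "X-API-Key", "sensitive": True}],
        },
    ]
}
rules = parse_url_headers_config(example)
```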

How Pattern Matching Works

  • All-Matches Logic: If a URL matches multiple patterns, the union of all allowed headers is permitted
  • Glob Syntax: Standard glob patterns (* = any chars, ? = single char, ** = recursive)
  • Order Independent: Pattern order doesn't matter - all matches contribute to allowed headers
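The matching and consolidation rules above can be sketched in a few lines. This is a minimal illustration, not the PR's implementation: it assumes `*` and `?` stay within one path segment while `**` crosses `/` (the description does not say how `*` treats `/`), and it compares header names case-insensitively. `glob_to_regex`, `allowed_headers_for`, and the second rule in the usage example are hypothetical:

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule shape: (url_pattern, {header_name: sensitive}).
Rule = Tuple[str, Dict[str, bool]]


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a URL glob into an anchored regular expression.

    Assumption: '*' and '?' do not cross '/', while '**' matches anything.
    """
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def allowed_headers_for(url: str, rules: List[Rule]) -> Dict[str, bool]:
    """Consolidate every matching rule into {lower-cased header name: sensitive}.

    Allowed headers are the union over all matches (order does not matter),
    and a header is sensitive if ANY matching rule marks it sensitive.
    """
    allowed: Dict[str, bool] = {}
    for pattern, headers in rules:
        if glob_to_regex(pattern).match(url):
            for name, sensitive in headers.items():
                key = name.lower()  # HTTP header names are case-insensitive
                allowed[key] = allowed.get(key, False) or sensitive
    return allowed


rules: List[Rule] = [
    ("https://github.com/**", {"Authorization": True, "Accept": False}),
    # Contrived second rule that marks Accept as sensitive for release downloads.
    ("https://github.com/*/*/releases/**", {"Accept": True}),
]
url = "https://github.com/org/repo/releases/download/v1/data.tsv"
print(allowed_headers_for(url, rules))  # {'authorization': True, 'accept': True}
```

Because both rules match that URL, `Accept` ends up sensitive even though the first rule alone would leave it in plain text.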

🔧 How It Works

API Usage Examples

Creating a Data Landing Request with Headers

POST /api/data_landings
Content-Type: application/json

{
  "request_state": {
    "targets": [{
      "destination": {"type": "hdas"},
      "items": [{
        "src": "url",
        "url": "https://api.example.com/data.json",
        "ext": "json",
        "headers": {
          "Authorization": "Bearer secret-token-123",
          "X-API-Key": "api-key-456",
          "User-Agent": "Galaxy",
          "Content-Type": "application/json"
        }
      }]
    }]
  },
  "public": true
}
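Headers can appear at different depths of `request_state`: under `targets[].items[]` in a data landing and directly under a named input in a workflow landing. The commit history mentions a utility that recursively looks for headers in nested structures; a minimal sketch of such a walk, assuming an item is any dict with a `url` key and a dict under `headers` (`iter_header_items` is a hypothetical name):

```python
from typing import Any, Dict, Iterator


def iter_header_items(state: Any) -> Iterator[Dict[str, Any]]:
    """Yield every URL item in a nested request state that carries headers.

    Assumption: an item is any dict with a "url" key and a dict under "headers".
    """
    if isinstance(state, dict):
        if "url" in state and isinstance(state.get("headers"), dict):
            yield state
        for value in state.values():
            yield from iter_header_items(value)
    elif isinstance(state, list):
        for value in state:
            yield from iter_header_items(value)


data_state = {
    "targets": [{
        "destination": {"type": "hdas"},
        "items": [{
            "src": "url",
            "url": "https://api.example.com/data.json",
            "ext": "json",
            "headers": {"X-API-Key": "api-key-456"},
        }],
    }]
}
workflow_state = {
    "input_dataset": {
        "src": "url",
        "url": "https://secure-data.example.com/dataset.csv",
        "ext": "csv",
        "headers": {"Authorization": "Bearer workflow-token-789"},
    }
}
for state in (data_state, workflow_state):
    for item in iter_header_items(state):
        print(item["url"], sorted(item["headers"]))
```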

Creating a Workflow Landing Request with Headers

POST /api/workflow_landings
Content-Type: application/json

{
  "workflow_id": "workflow_123",
  "workflow_target_type": "stored_workflow",
  "request_state": {
    "input_dataset": {
      "src": "url",
      "url": "https://secure-data.example.com/dataset.csv",
      "ext": "csv",
      "headers": {
        "Authorization": "Bearer workflow-token-789",
        "X-Custom-Header": "custom-value"
      }
    }
  },
  "public": true
}
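A client can send either payload using only the standard library. A minimal sketch that builds, but does not send, the request; the server URL `https://galaxy.example.org` and the helper `build_landing_request` are hypothetical, and any authentication Galaxy may require for these endpoints is left out:

```python
import json
import urllib.request
from typing import Any, Dict


def build_landing_request(galaxy_url: str, endpoint: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare, but do not send, a POST to /api/<endpoint>.

    endpoint is "data_landings" or "workflow_landings".
    """
    return urllib.request.Request(
        url=f"{galaxy_url.rstrip('/')}/api/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "workflow_id": "workflow_123",
    "workflow_target_type": "stored_workflow",
    "request_state": {
        "input_dataset": {
            "src": "url",
            "url": "https://secure-data.example.com/dataset.csv",
            "ext": "csv",
            "headers": {"Authorization": "Bearer workflow-token-789"},
        }
    },
    "public": True,
}
req = build_landing_request("https://galaxy.example.org/", "workflow_landings", payload)
print(req.get_method(), req.full_url)  # POST https://galaxy.example.org/api/workflow_landings
# urllib.request.urlopen(req) would actually send it.
```

Note the two kinds of headers: `Content-Type` belongs to the API call itself, while the `headers` inside `request_state` are what Galaxy would later attach when it fetches the dataset URL.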

Under the Hood: Encryption Process

  1. Configuration Check: System validates headers against configured URL patterns
  2. Pattern Matching: All matching patterns identified for the URL
  3. Header Validation: Only headers allowed by at least one matching pattern are accepted
  4. Sensitivity Detection: A header is sensitive if ANY matching pattern marks it as sensitive
  5. Vault Storage: Sensitive headers are encrypted and stored in the vault under keys such as:
    landing_request/headers/{landing_uuid}/authorization
    landing_request/headers/{landing_uuid}/x_api_key
  6. Reference Replacement: Sensitive values replaced with vault references:
    {
        "headers": {
            "Authorization": "__VAULT_HEADER_AUTHORIZATION__",
            "X-API-Key": "__VAULT_HEADER_X_API_KEY__"
        }
    }
  7. Transparent Decryption: When a landing request is retrieved, vault references are automatically replaced with actual values
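Steps 5 to 7 amount to a round trip between the database copy and the vault. A minimal sketch, assuming a dict-backed `InMemoryVault` in place of Galaxy's real vault (its `write_secret`/`read_secret` methods are chosen for illustration) and the key and placeholder formats shown above; `encrypt_headers` and `decrypt_headers` are hypothetical names:

```python
import re
from typing import Dict, Set


class InMemoryVault:
    """Illustrative stand-in for Galaxy's vault. Values are NOT encrypted here."""

    def __init__(self) -> None:
        self.secrets: Dict[str, str] = {}

    def write_secret(self, key: str, value: str) -> None:
        self.secrets[key] = value

    def read_secret(self, key: str) -> str:
        return self.secrets[key]


def _norm(name: str) -> str:
    return re.sub(r"[^A-Za-z0-9]", "_", name)


def _key(landing_uuid: str, name: str) -> str:
    return f"landing_request/headers/{landing_uuid}/{_norm(name).lower()}"


def _placeholder(name: str) -> str:
    return f"__VAULT_HEADER_{_norm(name).upper()}__"


def encrypt_headers(headers: Dict[str, str], sensitive: Set[str], landing_uuid: str, vault: InMemoryVault) -> Dict[str, str]:
    """Move sensitive values (lower-cased names in `sensitive`) into the vault; keep the rest."""
    stored: Dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() in sensitive:
            vault.write_secret(_key(landing_uuid, name), value)
            stored[name] = _placeholder(name)
        else:
            stored[name] = value
    return stored


def decrypt_headers(stored: Dict[str, str], landing_uuid: str, vault: InMemoryVault) -> Dict[str, str]:
    """Swap placeholders back for the real values when the landing request is read."""
    return {
        name: vault.read_secret(_key(landing_uuid, name)) if value == _placeholder(name) else value
        for name, value in stored.items()
    }


vault = InMemoryVault()
headers = {"Authorization": "Bearer secret-token-123", "Accept": "application/json"}
in_db = encrypt_headers(headers, {"authorization"}, "uuid-1", vault)
print(in_db)  # {'Authorization': '__VAULT_HEADER_AUTHORIZATION__', 'Accept': 'application/json'}
```

Comparing each value against the exact placeholder expected for that header name, rather than any string starting with `__VAULT_HEADER_`, keeps an unrelated value from being mistaken for a vault reference.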

🔒 Security Features

Pattern-Based Access Control

  • Explicit Allowlist: Only headers explicitly configured for matching URL patterns are allowed
  • Fail-Fast: Requests with unauthorized headers are rejected immediately
  • Secure-by-Default: If any matching pattern marks a header as sensitive, it's treated as sensitive
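The allowlist check rejects the whole request on the first problem instead of silently dropping headers. A minimal sketch, assuming the consolidated `{lower-cased name: sensitive}` mapping for the target URL has already been computed; `validate_headers` and `HeaderNotAllowedError` are hypothetical names:

```python
from typing import Dict, Mapping


class HeaderNotAllowedError(ValueError):
    """Raised when a request uses a header that no matching URL pattern allows."""


def validate_headers(headers: Mapping[str, str], allowed: Dict[str, bool]) -> Dict[str, bool]:
    """Fail fast on any header outside the allowlist.

    `allowed` maps lower-cased header names to their consolidated sensitivity
    for the target URL; an empty dict (no matching pattern, or no
    configuration at all) means no headers are allowed.
    Returns {header_name: is_sensitive} for the accepted headers.
    """
    rejected = sorted(name for name in headers if name.lower() not in allowed)
    if rejected:
        raise HeaderNotAllowedError("Headers not allowed for this URL: " + ", ".join(rejected))
    return {name: allowed[name.lower()] for name in headers}


print(validate_headers({"X-API-Key": "api-key-456"}, {"x-api-key": True}))  # {'X-API-Key': True}
try:
    validate_headers({"X-API-Key": "api-key-456", "User-Agent": "Galaxy"}, {"x-api-key": True})
except HeaderNotAllowedError as err:
    print(err)  # Headers not allowed for this URL: User-Agent
```

With the example configuration, the data landing URL `https://api.example.com/data.json` matches neither pattern (the API rule is scoped to `/v1/**`), so the allowed mapping would be empty and all four of its headers would be rejected.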

Vault Configuration Required

This feature requires a configured Galaxy vault. See the vault documentation for setup instructions.

Fallback Behavior

  • No Configuration: Missing config file returns null configuration (no headers allowed)
  • No Vault: Feature requires vault - fails fast if vault not configured and sensitive headers are used
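The vault rule reduces to a guard that runs before anything is persisted. A minimal sketch, assuming `sensitivity` is the `{header name: is_sensitive}` result of validating the request; `require_vault_if_needed`, `Vault`, and `VaultNotConfiguredError` are hypothetical names:

```python
from typing import Dict, Optional, Protocol


class Vault(Protocol):
    """Minimal interface assumed for this sketch."""

    def write_secret(self, key: str, value: str) -> None: ...


class VaultNotConfiguredError(RuntimeError):
    """Raised when sensitive headers would have to be stored without a vault."""


def require_vault_if_needed(sensitivity: Dict[str, bool], vault: Optional[Vault]) -> None:
    """Fail fast if any accepted header is sensitive and no vault is available.

    Requests with only non-sensitive headers (or none) pass without a vault.
    """
    sensitive = sorted(name for name, is_sensitive in sensitivity.items() if is_sensitive)
    if sensitive and vault is None:
        raise VaultNotConfiguredError(
            "A configured vault is required to store sensitive headers: " + ", ".join(sensitive)
        )


require_vault_if_needed({"Accept": False}, None)  # passes: nothing needs encrypting
```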

✅ Testing

  • Included unit and integration tests covering:
    • Pattern matching logic
    • Header validation and sensitivity detection
    • Vault encryption/decryption process
    • API request handling with headers

🎯 Use Cases

This enhancement enables several important use cases:

  1. Private API Access: Fetch data from APIs requiring authentication tokens
  2. Rate Limiting: Include API keys for higher rate limits
  3. Custom Protocols: Support for proprietary authentication schemes
  4. Workflow Integration: Secure data fetching within workflow executions

How to test the changes?

  • I've included appropriate automated tests.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

    )
except Exception:
    log.warning("Failed to encrypt headers in landing request state", exc_info=True)
    pass  # Continue without encryption if vault fails
Member


I would rather this fail outright than risk storing things that should be encrypted in an unencrypted fashion - especially given the rest of the app will assume the encryption has already occurred. Does this make testing harder or something?
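The concern is that swallowing the exception lets plain-text secrets reach the database while the rest of the app assumes they were encrypted. A minimal fail-fast variant of the excerpt, assuming the encryption step is passed in as a callable; `encrypt_state_or_fail` and `HeaderEncryptionError` are hypothetical names, not the code that was eventually pushed:

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger(__name__)


class HeaderEncryptionError(RuntimeError):
    """Raised instead of storing a landing request whose headers could not be encrypted."""


def encrypt_state_or_fail(
    state: Dict[str, Any], encrypt: Callable[[Dict[str, Any]], Dict[str, Any]]
) -> Dict[str, Any]:
    """Run the encryption step and propagate any failure.

    Unlike the excerpt above, there is no fallback that would persist
    sensitive values in plain text.
    """
    try:
        return encrypt(state)
    except Exception as exc:
        log.error("Failed to encrypt headers in landing request state", exc_info=True)
        raise HeaderEncryptionError("Refusing to store landing request with unencrypted headers") from exc
```

Chaining with `from exc` keeps the original vault error attached, so the failure stays diagnosable while nothing unencrypted is written.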

Contributor Author


The new version should cover this. Thank you!!

@jmchilton
Member

This approach has made the admin configuration trivial and deployment much easier as a result. The existing upload process as is is already... sort of exploitable... I mean we don't do a great job at rate limiting Galaxy (maybe this has improved?) and we let most URIs be accessed for users on behalf of the Galaxy server - it is scary from a security perspective. Allowing users to set arbitrary headers including (especially?) user-agent makes it even a richer target for hacking I would suspect. If we shipped an allow-list of headers and URI patterns that allow that header and whether the header should be secured then I would be much more comfortable from a security perspective. It would be much harder to configure then but we would be sure exactly what the exploit surface is.

Additionally, I trust the list of headers is relatively complete and well thought through but again I would be more comfortable if we had an explicit allow list because again we would understand the exploit surface exactly.

I'm not a -1 on any of this though - I'm just expressing my concerns and telling you how I would have had it work - which people may think would be too much config. Still though one can imagine blends of the approaches - maybe it is off by default but there is a configuration that allows any requests like this or restricted set of requests and admins can decide on their level of comfort.

Even if people believe I'm being too cautious or appropriately cautious but the admin/deployment burden of addressing it would be too steep - I would still strongly encourage we don't allow the user agent to be overridden - if an API wants Galaxy to access it they shouldn't require a non-Galaxy user agent.

What we pick to allow through makes me anxious - but after that - the actual implementation of allowing those headers and securing them seems really well thought through. It seems to fit with our existing APIs beautifully - that part of the implementation seems perfect to me.

@davelopez
Contributor Author

Thank you @jmchilton! As always, great constructive feedback! I will add some configuration to make this functionality more explicitly controlled 👍

@davelopez davelopez marked this pull request as draft September 22, 2025 07:55
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch 2 times, most recently from c7f7816 to 174dc62 October 15, 2025 08:22
@davelopez davelopez marked this pull request as ready for review October 15, 2025 18:21
@davelopez
Contributor Author

I've added the config option to specify URL patterns and sets of allowed headers for each pattern. This should be more explicit while maintaining flexibility for admins to allow general safe headers. I've also updated the PR description with the updates.
Thanks again for the feedback! Let me know if there is something else worth improving 🙏

@davelopez davelopez marked this pull request as draft October 24, 2025 14:22
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch 2 times, most recently from c0c11c7 to d7a5df4 October 28, 2025 12:54
@davelopez davelopez marked this pull request as ready for review October 28, 2025 13:06
@davelopez davelopez modified the milestones: 25.1, 26.0 Oct 29, 2025
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from d7a5df4 to 00cf447 October 30, 2025 15:01
Introduces the ability to specify optional HTTP headers for URL-based data fetching. These headers are passed to the fetch logic to enhance flexibility in handling authenticated or customized requests.

Introduces functions to identify, encrypt, and decrypt sensitive HTTP headers securely using Galaxy's Vault system.

Refactors header encryption and decryption logic to remove tool-specific dependencies, enabling support for workflow landings.

Introduces a new integration test to verify the encryption of sensitive headers in workflow landing requests. Ensures that headers containing authorization tokens and API keys are securely encrypted using Galaxy's vault system and not stored in plain text in the database.

Refactors helper methods to support both tool and workflow landing request models.

Introduces warning logs to capture encryption and decryption failures in the landing request state, providing better visibility into issues with header processing. This helps in diagnosing and addressing potential problems during runtime without halting execution.

Ensures that issues with the vault or encryption process are surfaced immediately.

Introduces a utility function to recursively check for sensitive headers in nested data structures, enhancing the ability to identify headers requiring encryption.

Includes unit tests covering various cases such as nested headers, non-sensitive headers, and edge cases to ensure robustness.

Introduces a new module to configure and manage allowed HTTP request headers for external URL fetches.

Ensures that when multiple URL patterns match a given URL, header permissions (allowance and sensitivity) are correctly consolidated.

Introduces a new sample configuration to define an allow-list for HTTP headers in external URL fetch requests. This mechanism allows administrators to specify which headers are permitted for different URL patterns, improving security and control over fetch requests.

The configuration also supports marking headers as sensitive, prompting encryption of their values. The sample provides illustrative examples for common services like GitHub, AWS S3, and generic cloud storage.

Adds common authentication-related headers (Authorization, X-Auth-Token, X-API-Key) to the default sensitive list for HTTPS URLs in the sample configuration. This provides a more secure default example for users, preventing accidental exposure of sensitive credentials.

Includes a new comment advising users to only employ the minimum necessary configuration for their specific needs, reinforcing security best practices.
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from 00cf447 to c970f53 November 10, 2025 16:37