@nvictus nvictus commented Aug 31, 2025

This PR migrates from custom HTTP fetching to obspec/obstore for standardized access to HTTP(S), S3, GCS, and Azure storage. It also implements a custom obspec store for FTP and a CachedStore wrapper that manages metadata caching and the two-tier caching of blocks.
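
To illustrate the block caching idea, here is a minimal sketch, not the code in this PR. It shows a single in-memory LRU tier only; the second tier is omitted. The names `LRUCache.get`/`put`, `block_key`, `read_range`, and the `fetch(path, start, length)` callable are assumptions made for the example. The cache key includes the base URL (and therefore the scheme), so identical paths on different backends cannot collide.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache keyed by string."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def block_key(base_url, path, block_index):
    """Hypothetical key format: base URL (with scheme), path, block index."""
    return f"{base_url}|{path}|{block_index}"


def read_range(fetch, cache, base_url, path, start, length, block_size):
    """Serve a byte range from block-aligned, cached reads.

    `fetch(path, start, length)` is an assumed callable returning bytes.
    """
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    out = bytearray()
    for index in range(first, last + 1):
        key = block_key(base_url, path, index)
        block = cache.get(key)
        if block is None:
            block = fetch(path, index * block_size, block_size)
            cache.put(key, block)
        out += block
    offset = start - first * block_size
    return bytes(out[offset:offset + length])
```

Repeated reads of overlapping ranges then hit the cache instead of the backend, at the cost of fetching whole blocks on a miss.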

There is no longer a separate mount point for each URL scheme. Instead, the URL scheme is given explicitly in the path and the request is dispatched to the appropriate storage client, e.g.:

simple-httpfs -f --log log.txt /tmp/cloud &

cat /tmp/cloud/https://example.com/README.txt...
cat /tmp/cloud/s3://example/README.txt...
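
For readers wondering how a path under the mount point can be turned back into a URL, here is a rough sketch under stated assumptions. `split_mounted_path` is a hypothetical helper, not the `path_to_url` function in this PR, and the sentinel is passed in as a parameter because it is configurable. The pattern accepts one or more slashes after the scheme, since the path handed to the filesystem may have repeated slashes collapsed.

```python
import re

# scheme, a colon, one or more slashes, then the rest of the URL
_SCHEME_RE = re.compile(r"^/?(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):/+(?P<rest>.+)$")


def split_mounted_path(path, sentinel):
    """Return (scheme, url) for a path relative to the mount point.

    Returns None when the path does not end with the sentinel or does
    not look like a URL, so the caller can treat it as a plain directory.
    """
    if not sentinel or not path.endswith(sentinel):
        return None
    match = _SCHEME_RE.match(path[: -len(sentinel)])
    if match is None:
        return None
    scheme = match.group("scheme").lower()
    return scheme, f"{scheme}://{match.group('rest')}"
```

The returned scheme can then be used as a lookup key to pick the storage client, which is what removes the need for one mount point per scheme.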

The storage clients and their request behavior can be configured through the store_configs, credential_providers, client_options, and retry_config options. These options are not yet exposed via the CLI, but the clients can also be configured with environment variables (see docs). Retries are enabled by default. When no storage config is provided for an object storage backend, we set skip_signature to True so that public buckets work without credentials.
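
The fallback for public buckets can be sketched roughly as follows. This is an illustration, not the code in this PR: `resolve_store_config`, the `OBJECT_STORE_SCHEMES` set, and the assumption that `store_configs` is a mapping keyed by scheme are all made up for the example. Only the `skip_signature` key comes from the description above.

```python
# Assumption: which schemes count as "object storage backends".
OBJECT_STORE_SCHEMES = frozenset({"s3", "gs", "az"})


def resolve_store_config(scheme, store_configs=None):
    """Pick the config for a scheme, defaulting to unsigned requests.

    A user-supplied config is returned as a copy and never modified.
    Without one, object storage schemes get skip_signature=True so that
    public buckets can be read without credentials.
    """
    user_config = (store_configs or {}).get(scheme)
    if user_config is not None:
        return dict(user_config)
    if scheme in OBJECT_STORE_SCHEMES:
        return {"skip_signature": True}
    return {}
```

Once a user supplies any config for a backend, the default no longer applies, so private buckets keep signing their requests.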

Additional updates

FUSE interface

  • Improved FUSE operations and error reporting.
  • Implemented the readdir operation to support ls.
  • The trailing sentinel string to recognize URLs is now customizable.
  • Suppressed the creation of OS cruft files on macOS (noappledouble).
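
As a rough illustration of what readdir has to do, the sketch below turns a delimiter-based listing into directory entry names. It assumes the listing is a dict with `common_prefixes` (a list of prefix strings) and `objects` (a list of dicts with a `path` key); `listing_to_entries` is a hypothetical name, not a function from this PR.

```python
import posixpath


def listing_to_entries(listing):
    """Convert a delimiter-based listing into names for readdir.

    Common prefixes become subdirectories and objects become files.
    Only the last path component is kept, since readdir reports names
    relative to the directory being listed.
    """
    names = set()
    for prefix in listing.get("common_prefixes", []):
        name = posixpath.basename(prefix.rstrip("/"))
        if name:
            names.add(name)
    for meta in listing.get("objects", []):
        name = posixpath.basename(meta["path"].rstrip("/"))
        if name:
            names.add(name)
    return [".", ".."] + sorted(names)
```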

Modernization

  • Package infra: Migrated from setup.py/versioneer to pyproject.toml with hatchling backend, uv, and ruff.
  • Testing: Added tests covering all major components (HttpFs, caching layers, FTP store). Added a smoke test bash script for manual use as well.
  • Type checking: Added type annotations and strict mypy configuration.

Issues

  • Native background mode doesn't seem to work properly on macOS.

nvictus added 28 commits August 17, 2025 13:32
- Fix logic in _resolve_search_dir and _resolve_path functions
- Add detailed docstrings explaining prefix handling behavior
- Clarify comments about directory vs file prefix logic
- Add tests/test_caching.py with 40 tests for LRUCache and CachedStore
- Add tests/test_httpfs.py with 30 tests for FUSE operations
- Test multi-tier caching behavior and scheme-based cache isolation
- Test path_to_url helper, load_store function, and all FUSE methods
- Use obstore.MemoryStore for realistic testing scenarios
- Include proper exception handling and error condition testing
- Refactor CachedStore to use base_url parameter instead of separate scheme
- Add dedicated cache key generation methods for metadata and blocks
- Include URI scheme in cache keys to prevent collisions across storage backends
- Fix critical bug in get_range() method passing wrong path parameter
- Simplify list_with_delimiter() implementation
- Update to use obspec.exceptions for proper error handling with map_exception()
- Replace isinstance checks with proper exception mapping
- Enhance error handling in getattr(), readdir(), and read() methods
- Add proper FUSE error codes (EACCES, ENOENT, EIO) for different error types
- Update _load_cached_store() method for consistent store creation
- Improve logging and debug output formatting
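
The error code mapping mentioned in the commits above can be pictured with a small sketch. It uses Python's built-in `FileNotFoundError` and `PermissionError` as stand-ins for the store exceptions, which is an assumption; the actual code maps obspec exceptions. `to_fuse_errno` and `ERRNO_BY_EXCEPTION` are hypothetical names.

```python
import errno

# Ordered table: the first matching exception type wins.
ERRNO_BY_EXCEPTION = (
    (FileNotFoundError, errno.ENOENT),  # missing object or path
    (PermissionError, errno.EACCES),    # access denied by the backend
)


def to_fuse_errno(exc):
    """Translate a storage exception into an errno value for FUSE.

    Anything not listed in the table is reported as a generic I/O error.
    """
    for exc_type, code in ERRNO_BY_EXCEPTION:
        if isinstance(exc, exc_type):
            return code
    return errno.EIO
```

Keeping the mapping in one table means getattr(), readdir(), and read() can share the same translation instead of each handling exceptions on their own.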