[Storage] Support AWS S3 Files (NFS access to S3 buckets) #9455

@andylizf

Summary

AWS launched Amazon S3 Files on April 7, 2026. It allows mounting any S3 bucket as a native NFS v4.1+ file system without FUSE. This could benefit AWS workloads that suffer from cold cache warmup after spot preemption or need shared mount access across nodes.

Current State

SkyPilot uses goofys (x86_64) and rclone (ARM64) to FUSE-mount S3. This has two drawbacks:

  • No persistent cache across restarts: goofys caches in memory only. When a spot instance is preempted and SkyPilot recovers on a new node, the goofys process starts with a cold cache and re-fetches all data from S3. For large datasets this adds significant warmup time on each recovery.
  • Two code paths: goofys for x86_64, rclone for ARM64, with different behavior and performance characteristics.

What S3 Files Offers

S3 Files provides a kernel-level NFS mount backed by a two-tier architecture: an EFS-based cache layer for hot data, with S3 as the durable backend.

Performance specs from AWS docs:

| Metric | Spec |
|---|---|
| Per-client max read throughput | 3 GiB/s |
| Aggregate read throughput | Up to TB/s across clients |
| Read IOPS | Up to 250,000 per filesystem |
| Aggregate write throughput | 1–5 GiB/s (region-dependent) |
| Write IOPS | Up to 50,000 per filesystem |
| FS→S3 sync latency | ~60 seconds (batched) |

Files ≤128KB are automatically cached with both metadata and data on first directory access. Files >128KB cache metadata only; data is fetched from S3 on actual read. Cached data unused for 30 days is automatically evicted.
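To make the caching rule above concrete, here is a minimal sketch that models it as plain Python. It is only an illustration of the thresholds quoted in this issue, not an AWS API. The names `CacheState`, `state_after_directory_listing`, and `is_evicted` are hypothetical, and two details are assumptions: that "128KB" means 128 × 1024 bytes, and that eviction happens at exactly 30 idle days.

```python
from dataclasses import dataclass

# Assumption: "128KB" is read as 128 * 1024 bytes.
SMALL_FILE_LIMIT_BYTES = 128 * 1024
# Assumption: data idle for 30 days or more counts as evicted.
EVICTION_AFTER_DAYS = 30


@dataclass
class CacheState:
    metadata_cached: bool
    data_cached: bool


def state_after_directory_listing(size_bytes: int) -> CacheState:
    """Model the cache state of a file right after its directory is first accessed."""
    # Metadata is cached for every file; data only for small files.
    return CacheState(
        metadata_cached=True,
        data_cached=size_bytes <= SMALL_FILE_LIMIT_BYTES,
    )


def is_evicted(days_since_last_use: int) -> bool:
    """Model whether cached data would have been evicted for being unused."""
    return days_since_last_use >= EVICTION_AFTER_DAYS
```

Under this model a dataset of many small files is fully warm after a directory listing, while large shards still pay an S3 fetch on first read.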

Potential Benefits for ML Workloads

Persistent read cache across spot preemptions: S3 Files' EFS-backed cache is a managed service independent of compute instances. When a spot instance is preempted and SkyPilot recovers on a new node in the same AZ, mounting the same S3 Files filesystem gives immediate access to previously cached data without re-fetching from S3. This directly addresses the cold-cache warmup problem with goofys.

Multi-node shared access: NFS close-to-open consistency means one node can write a file, close it, and other nodes immediately see the update on open. This is useful for distributed training checkpointing (rank 0 writes, other ranks read) without application-level coordination. Current FUSE mounts have no cross-node cache coherence.
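The checkpoint pattern described above could look roughly like the sketch below. It relies only on the close-to-open behavior: the writer finishes and closes the file, and a reader that opens it afterwards sees the full contents. The function names are hypothetical, the path is assumed to live on the shared mount, and how the other ranks learn that the checkpoint is ready (for example a training barrier) is outside this sketch.

```python
import os


def write_checkpoint(path: str, payload: bytes) -> None:
    """Writer side (e.g. rank 0): write the checkpoint and close the file."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        # Push the data out of local buffers before the file is closed.
        os.fsync(f.fileno())
    # Leaving the with-block closes the file. Under close-to-open
    # consistency, a later open() on another node sees this data.


def read_checkpoint(path: str) -> bytes:
    """Reader side (other ranks): open after the writer has closed."""
    with open(path, "rb") as f:
        return f.read()
```

Nothing here is specific to S3 Files; the same code runs against a local directory, which is also how it can be tried out without any AWS setup.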

Write durability: Writes land in EFS (persistent, survives process crash) and auto-commit to S3 every ~60s. For checkpoint writes where immediate S3 visibility isn't required, this is simpler than rclone VFS cache. However, the 60-second commit window is fixed and not configurable at launch.

Considerations

Setup complexity: S3 Files requires creating a filesystem resource (aws s3files create-file-system), enabling bucket versioning + server-side encryption, configuring IAM roles with AmazonS3FilesClientFullAccess, creating per-AZ mount targets, opening TCP 2049 in security groups, and installing amazon-efs-utils. This is significantly more infrastructure than downloading a goofys binary. SkyPilot would need provisioning and lifecycle management code for these resources.
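Because a closed security group is an easy step to miss in the list above, SkyPilot could run a small preflight check before attempting the mount. The following is a sketch under stated assumptions: `nfs_port_reachable` is a hypothetical helper, and the host passed in would be the mount target address, whose format is not specified in this issue.

```python
import socket

NFS_PORT = 2049  # the TCP port named in the setup steps above


def nfs_port_reachable(host: str, port: int = NFS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A failed check would let SkyPilot report a security-group or mount-target problem directly instead of surfacing a hanging mount command.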

Cost: S3 Files incurs charges for the EFS cache layer and NFS requests on top of S3 storage fees. Users should be able to make an informed choice between FUSE (free, lower performance) and S3 Files (paid, better cache persistence).

Cold read performance: First access to uncached large files (≥1MB) streams directly from S3 through the NFS layer. This is not necessarily faster than goofys reading directly from S3 via FUSE for cold data.

S3→FS sync latency: Changes made to S3 objects outside the filesystem take 30–60 seconds to reflect in the NFS mount (independent benchmark). For workloads that need to see S3-side changes quickly, goofys' 5-second stat cache TTL actually detects changes faster.

Suggested Approach

Since S3 Files requires pre-provisioned infrastructure, an opt-in model seems appropriate: users who have set up an S3 Files filesystem for their bucket can specify it in storage config (e.g. mode: MOUNT_S3FILES or a filesystem_id parameter), and SkyPilot handles the NFS mount. The existing goofys/rclone paths remain the default.
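The opt-in selection could be sketched as follows. This is not SkyPilot's actual code: `select_mount_backend` and the returned backend names are hypothetical, and `filesystem_id` stands in for whichever config field ends up being chosen. The only behavior taken from this issue is that S3 Files is used when the user supplies a filesystem, and that the default remains goofys on x86_64 and rclone on ARM64.

```python
from typing import Optional


def select_mount_backend(machine: str, filesystem_id: Optional[str] = None) -> str:
    """Pick a mount backend name from the CPU architecture and optional opt-in."""
    # Opt-in path: the user has pre-provisioned an S3 Files filesystem.
    if filesystem_id:
        return "s3files"

    # Default path: the existing FUSE mounts, chosen by architecture.
    arch = machine.lower()
    if arch in ("x86_64", "amd64"):
        return "goofys"
    if arch in ("aarch64", "arm64"):
        return "rclone"
    raise ValueError(f"unsupported architecture: {machine}")
```

For example, `select_mount_backend("aarch64")` returns `"rclone"`, while passing any non-empty `filesystem_id` returns `"s3files"` regardless of architecture. Keeping the opt-in check first means existing storage configs behave exactly as before.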
