Skip to content

feat(download): v2 file download via /files/{fileId}#686

Open
jhagberg wants to merge 6 commits into
feat/v2-full-listfrom
feat/v2-download
Open

feat(download): v2 file download via /files/{fileId}#686
jhagberg wants to merge 6 commits into
feat/v2-full-listfrom
feat/v2-download

Conversation

@jhagberg
Copy link
Copy Markdown
Contributor

@jhagberg jhagberg commented Apr 24, 2026

Related issue(s) and PR(s)
Closes #677. Stacks on #685.

Description
Wires v2 file download through GET /files/{fileId} plus the server-provided downloadUrl plus the X-C4GH-Public-Key header, retires the legacy /s3-transfer helpers, and shrinks download/download.go to a downloadOnewriteBodyToDisk flow sitting on apiclient.Client.DownloadFile. Removes the "v2 download is not yet implemented" guard that #685 put in.

Key design points

  • Client.DownloadFile(ctx, DownloadRequest) (DownloadResult, error) is the new abstraction. DownloadResult carries the canonical File (authoritative filename — UserArg can be a bare fileId), the response body, and the server's Content-Length.
  • V2Client.DownloadFile resolves UserArg via ExactPath (paths) or ListFiles scan (bare ids), then hits the server-provided downloadUrl with X-C4GH-Public-Key. URLs go through url.Parse + ResolveReference so both relative and absolute forms work.
  • Bearer-token leakage guard: when the resolved download host differs from BaseURL, the Authorization header is dropped. A pre-signed cross-origin URL is self-authenticating; leaking the bearer to a third-party host is the kind of thing you read about in CVE writeups.
  • 403 from either resolve or GET flattens to dataset/file does not exist or access denied: <arg> — existence-leakage contract preserved from v1.
  • downloadOne uses result.File.FilePath (anonymized) for the on-disk path. Downloading by fileId now produces a meaningfully-named file instead of outdir/<fileId>. A containment check rejects any outputPath escaping outDir, so a server-provided path with ../ can't write outside the configured output directory.
  • Retires downloadFile, getFileIDURL, GetDatasets, GetFilesInfo, getBody, setupCookieJar, the download.File alias, and the cookieJar/cookiePath/appVersion package vars. The apiclient.WithV1CookieJar option also retires here in the same refactor.
  • Carries the field rename (VersionClientVersion) and int64 widening from feat(apiclient): Client interface + V1Client wrapper #679's review-amend into the new V1/V2 DownloadFile sites, and mirrors feat(download): v2 CI harness + minimal ListDatasets #681's SDA-Client-Version + User-Agent header pair on v2.

Out of scope (tracked separately)

How to test

Local:

  • go build ./..., go vet ./..., gofmt -l .: clean.
  • go test ./...: 176 pass.
  • golangci-lint run --timeout 5m: 0 issues.

Integration (build tag integration, against the live v2 dev stack):

  • TestV2_ListDatasets_Smoke, TestV2_ListFiles_Smoke, TestV2_ListFiles_ExactPath_Smoke, TestV2_ListFiles_PathPrefix_NoMatch, TestV2_DatasetInfo_Smoke, TestV2_DownloadFile_EndToEnd: all pass.

Manual, rebuilt sda-cli against the live dev stack:

  • download --api-version v2 --dataset-id EGAD00000000001 --pubkey <pem> test-file.c4gh writes the file (~1.2 kB c4gh bytes).
  • download --api-version v2 ... EGAF00000000001 (by fileId) writes test-file.c4gh — the canonical name, not the bare fileId.
  • download --api-version v2 without --pubkey errors with v2 downloads require --pubkey.
  • download --api-version v2 --dataset-id DOES_NOT_EXIST ... errors with dataset/file does not exist or access denied: <arg> (403 flattened).

Test plan

  • go test ./...: 176 pass
  • go vet ./...: clean
  • golangci-lint run: clean
  • gofmt -l .: clean
  • Integration: end-to-end v2 download passes
  • Manual: by-path, by-fileId, missing pubkey, 403 flatten
  • CI: golint, test (ubuntu + windows), integration-v2, integration (s3), codeql-analysis

jhagberg added 6 commits May 13, 2026 15:18
Interface-only change. Implementations stubbed until later tasks in
this PR.
Resolves UserArg (path or ID) via ListFiles + legacy substring match,
then GETs /s3/{dataset}/{filePath}. Returns the raw body for the
caller to stream to disk.

Substring matching preserved from download.getFileIDURL — known
imprecise but legacy-compatible. v2 uses exact filePath filter.
Resolves UserArg via exact filePath filter (paths) or list+match (ids),
follows server-provided downloadUrl, sends X-C4GH-Public-Key. 403 from
either list or download surfaces as ambiguous 'does not exist or
access denied' (existence-leakage prevention).

File gains a DownloadURL field (empty on v1). Required so downloads
can follow the server's URL instead of constructing /files/{id}
client-side.
datasetCase/recursiveCase/fileCase now call a shared downloadOne helper
that goes through client.DownloadFile, followed by the new
writeBodyToDisk helper (progress-bar + .part atomic rename) extracted
from the old downloadFile.

Removes the #679 backward-compat wrappers (download.GetDatasets,
download.GetFilesInfo, download.File alias) and the legacy helpers that
moved to apiclient in #679 (setupCookieJar, cookieJar, cookiePath,
getBody) plus getFileIDURL and downloadFile — UserArg (path or fileId)
is now resolved inside V1Client.DownloadFile / V2Client.DownloadFile.

recursiveCase uses v2's pathPrefix filter server-side when
--api-version=v2; v1 still filters client-side since it has no
server-side filter.

Adds a test for the v2-missing-pubkey guard and replaces the old
downloadFile-cleanup test with a writeBodyToDisk-cleanup test driven by
an errReader that fails mid-stream. Tests for deleted helpers
(TestGetDatasets, TestGetFilesInfo, TestFileIdUrl, TestGetBody*,
TestSetupCookiejar) are removed — their behavior is covered by
apiclient/v1_test.go (TestV1Client_ListDatasets_*,
TestV1Client_DownloadFile_*).
Pulls the dev c4gh pubkey from the reencrypt container, calls
DownloadFile for the seeded test file, verifies size and non-empty
body.
Renames the new V1Client.DownloadFile and V2Client.DownloadFile call
sites from c.cfg.Version to c.cfg.ClientVersion, mirrors the v2
"SDA-Client-Version alongside User-Agent" header pair that #681 picked
up, and updates the downloadOneWithClient test fixture.
@jhagberg jhagberg force-pushed the feat/v2-download branch from 3805aeb to 7cac102 Compare May 13, 2026 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant