Skip to content

feat(download): v2 CI harness + minimal ListDatasets#681

Merged
jhagberg merged 8 commits into
mainfrom
feat/v2-ci-and-minimal-list
May 28, 2026
Merged

feat(download): v2 CI harness + minimal ListDatasets#681
jhagberg merged 8 commits into
mainfrom
feat/v2-ci-and-minimal-list

Conversation

@jhagberg
Copy link
Copy Markdown
Contributor

@jhagberg jhagberg commented Apr 23, 2026

Related issue(s) and PR(s)
Closes #675. Stacks on #679. Part of umbrella issue #663.

Description
Wires V2Client.ListDatasets (single-page) against the real v2 server, lands the v2 wire types (v2File, response envelopes, datasetInfoResponse) that #685 and #686 consume, and adds a CI workflow that boots a pinned neicnordic/sensitive-data-archive dev stack for integration tests. ListFiles and DatasetInfo remain stubs until #685. Also restores the legacy "failed to get X, reason: ..." error-wrap prefix for the list command (accidentally dropped when the apiclient abstraction landed in #679).

Updated from review (nanjiangshu): V2Client.ListDatasets uses url.JoinPath instead of string concatenation, getJSON now sends SDA-Client-Version alongside User-Agent so v1-style log pipelines still see the version on v2, and the int(f.DecryptedSize) cast in v2File.toFile() drops out following the File.DecryptedFileSize widening to int64 in #679. The HTTP-client timeout question from review is being discussed inline — Client.Timeout would cut multi-GB downloads in #686, so the counter-suggestion is Transport.ResponseHeaderTimeout.

How to test

  • go build ./..., go vet ./..., go test ./... (174 pass) — clean.
  • Integration: go test -tags=integration ./... against a booted dev stack — TestV2_ListDatasets_Smoke passes.
  • sda-cli list --datasets --api-version v2 prints the seeded dataset IDs.
  • sda-cli list --dataset X --api-version v2 errors with the stub message until feat(download): v2 full list — pagination, filters, DatasetInfo #685.
  • CI job integration-v2 exercises the full setup on GitHub Actions.

Test plan

  • go test ./...: 174 pass
  • go vet ./...: clean
  • golangci-lint run: clean
  • gofmt -l .: clean
  • Integration: TestV2_ListDatasets_Smoke against the live dev stack
  • Manual: v2 --datasets prints EGAD00000000001; --dataset X --api-version v2 returns stub error; --api-version v9 rejected
  • CI: golint, test (ubuntu + windows), integration-v2, integration (s3), codeql-analysis

Comment thread downloadclient/v2.go
Comment thread apiclient/v2.go Outdated
Comment thread apiclient/v2.go Outdated
Comment thread apiclient/v2_types.go Outdated
@jhagberg jhagberg force-pushed the feat/v2-client-abstraction branch from 0ead8d5 to dfcea63 Compare May 13, 2026 13:02
jhagberg added a commit that referenced this pull request May 13, 2026
Three changes from review comments (nanjiangshu):

- v2.ListDatasets uses url.JoinPath(BaseURL, "datasets") instead of
  string concatenation. Handles trailing slashes correctly and surfaces
  a clear error on a malformed BaseURL.
- v2 getJSON sends SDA-Client-Version alongside the existing User-Agent
  header so log pipelines keyed on the v1 header still see the version
  on v2 traffic.
- v2_types.go toFile() drops the int(f.DecryptedSize) cast now that
  apiclient.File.DecryptedFileSize is int64 (from the #679 amend).
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from f00c5f7 to 82e21b0 Compare May 13, 2026 13:12
jhagberg added a commit that referenced this pull request May 13, 2026
Three follow-ups after rebasing on the amended #679 and #681:

- v2_test.go: int64 assertion on the widened File.DecryptedFileSize
- v2.go: blank line before paginate return (lint nlreturn)
- list.go: drop now-redundant int(info.Size) cast on DatasetInfo.Size
  (both formatFileSizeOutput and DatasetInfo.Size are int64)
jhagberg added a commit that referenced this pull request May 13, 2026
Renames the new V1Client.DownloadFile and V2Client.DownloadFile call
sites from c.cfg.Version to c.cfg.ClientVersion, mirrors the v2
"SDA-Client-Version alongside User-Agent" header pair that #681 picked
up, and updates the downloadOneWithClient test fixture.
@jhagberg jhagberg force-pushed the feat/v2-client-abstraction branch from dfcea63 to e9d7558 Compare May 19, 2026 06:26
jhagberg added a commit that referenced this pull request May 19, 2026
- Switch ListDatasets URL construction to url.JoinPath for trailing-slash
  tolerance (nanjiangshu); thread error from JoinPath through paginate.
- Send SDA-Client-Version alongside User-Agent on v2 calls so any
  downstream monitoring keyed on the SDA header still sees v2 traffic
  (nanjiangshu).
- Widen v2File.toFile() conversion to drop the int(f.DecryptedSize) cast
  that paired with the v1 int field; relies on #679 widening
  DecryptedFileSize to int64.
- Pick up #679's package rename apiclient -> downloadclient and Config
  field/parameter renames (ClientVersion, apiVersion) in v2 code paths.
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from 82e21b0 to 43c4f11 Compare May 19, 2026 06:33
jhagberg added a commit that referenced this pull request May 19, 2026
- Switch ListDatasets URL construction to url.JoinPath for trailing-slash
  tolerance (nanjiangshu); thread error from JoinPath through paginate.
- Send SDA-Client-Version alongside User-Agent on v2 calls so any
  downstream monitoring keyed on the SDA header still sees v2 traffic
  (nanjiangshu).
- Widen v2File.toFile() conversion to drop the int(f.DecryptedSize) cast
  that paired with the v1 int field; relies on #679 widening
  DecryptedFileSize to int64.
- Pick up #679's package rename apiclient -> downloadclient and Config
  field/parameter renames (ClientVersion, apiVersion) in v2 code paths.
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from 43c4f11 to ba8cb90 Compare May 19, 2026 07:03
jhagberg added a commit that referenced this pull request May 19, 2026
Three follow-ups on the rebased base:

- v2_test.go: int64 assertion on the widened File.DecryptedFileSize
- v2.go: blank line before paginate return (lint nlreturn)
- integration_v2_test.go: pick up the apiclient -> downloadclient rename
  and the ClientVersion field name (the v2 integration smoke test was
  added before #679's review-amend landed)
jhagberg added a commit that referenced this pull request May 19, 2026
Two follow-ups on the rebased base:

- v2.go DownloadFile: emit SDA-Client-Version alongside User-Agent on v2
  so monitoring keyed on the SDA header sees v2 download traffic
  (matches #681's review-amend on v2 list calls).
- download_test.go downloadOneWithClient: use the renamed Config field
  (ClientVersion, not Version) so the v1 download test fixture matches
  the post-rename API.

This commit also carries the gofmt re-sort of download/download.go and
download_test.go imports after the apiclient -> downloadclient rename
cascaded into pr4.
Copy link
Copy Markdown
Contributor

@nanjiangshu nanjiangshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, well done!

Base automatically changed from feat/v2-client-abstraction to main May 21, 2026 09:29
Copy link
Copy Markdown
Contributor

@zeidlitz zeidlitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty straight forward to me. Left a few questions for curiosity.

Comment thread list/list.go Outdated
Comment thread downloadclient/v2.go Outdated
jhagberg added a commit that referenced this pull request May 21, 2026
Carries forward fallout from #679/#681 review-amends, plus the
JoinPath consistency fix flagged by nanjiang on ListFiles and applied
to DatasetInfo for symmetry:

- v2.go ListFiles + DatasetInfo: switch from string concat to
  url.JoinPath(c.cfg.BaseURL, "datasets", url.PathEscape(datasetID), ...)
  matching ListDatasets. PathEscape stays because the SDA schema allows
  DOI-format dataset IDs (e.g. "10.1234/567") which contain "/", as
  jbygdell pointed out.
- v2_test.go: int64 assertion on the widened File.DecryptedFileSize
- v2.go: blank line before paginate return (lint nlreturn)
- integration_v2_test.go: pick up the apiclient -> downloadclient
  rename and the ClientVersion field name (the v2 integration smoke
  test was added before #679's review-amend landed)
jhagberg added a commit that referenced this pull request May 21, 2026
Two follow-ups on the rebased base:

- v2.go DownloadFile: emit SDA-Client-Version alongside User-Agent on v2
  so monitoring keyed on the SDA header sees v2 download traffic
  (matches #681's review-amend on v2 list calls).
- download_test.go downloadOneWithClient: use the renamed Config field
  (ClientVersion, not Version) so the v1 download test fixture matches
  the post-rename API.

This commit also carries the gofmt re-sort of download/download.go and
download_test.go imports after the apiclient -> downloadclient rename
cascaded into pr4.
jhagberg added a commit that referenced this pull request May 21, 2026
- Switch ListDatasets URL construction to url.JoinPath for trailing-slash
  tolerance (nanjiangshu); thread error from JoinPath through.
- Send SDA-Client-Version alongside User-Agent on v2 calls so any
  downstream monitoring keyed on the SDA header still sees v2 traffic
  (nanjiangshu).
- Widen v2File.toFile() conversion to drop the int(f.DecryptedSize) cast
  that paired with the v1 int field; relies on #679 widening
  DecryptedFileSize to int64.
- Extract const maxErrorBodyBytes = 200 and use it in v2.go's error-body
  truncation (zeidlitz); the same constant becomes useful in #685's
  chore commit (caps the body read with io.LimitReader, max+1).
- Drop the dead 'invalid base URL' string-equality blocks in list.go's
  datasetFiles and Datasets (zeidlitz noted errors.Is would be nicer);
  after #679's fail-fast validation in downloadclient.New() the new
  error is 'Config.BaseURL is required' and it fires before ListFiles
  or ListDatasets is ever called, so the special case was unreachable.
- Picks up #679's package rename apiclient -> downloadclient and Config
  field/parameter renames (ClientVersion, apiVersion) in v2 code paths.
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from ba8cb90 to 6e99807 Compare May 21, 2026 12:56
jhagberg added a commit that referenced this pull request May 21, 2026
Carries forward fallout from #679/#681 review-amends, plus the JoinPath
consistency fix flagged by nanjiang on ListFiles and applied to
DatasetInfo for symmetry, plus zeidlitz's maxErrorBodyBytes constant
adopted by the LimitReader site that this PR introduces:

- v2.go ListFiles + DatasetInfo: switch from string concat to
  url.JoinPath(c.cfg.BaseURL, "datasets", url.PathEscape(datasetID), ...)
  matching ListDatasets. PathEscape stays because the SDA schema allows
  DOI-format dataset IDs (e.g. "10.1234/567") which contain "/", as
  jbygdell pointed out.
- v2.go LimitReader: switch the literal 201 to maxErrorBodyBytes+1
  (zeidlitz; constant defined in #681's review-amend).
- v2_test.go: int64 assertion on the widened File.DecryptedFileSize.
- v2.go: blank line before paginate return (lint nlreturn).
- integration_v2_test.go: pick up the apiclient -> downloadclient rename
  and the ClientVersion field name (the v2 integration smoke test was
  added before #679's review-amend landed).
jhagberg added a commit that referenced this pull request May 21, 2026
Three follow-ups on the rebased base:

- v2.go DownloadFile: emit SDA-Client-Version alongside User-Agent on v2
  so monitoring keyed on the SDA header sees v2 download traffic
  (matches #681's review-amend on v2 list calls).
- v2.go DownloadFile error path: adopt maxErrorBodyBytes for the
  LimitReader cap and the body truncation (constant from #681).
- download_test.go downloadOneWithClient: use the renamed Config field
  (ClientVersion, not Version) so the v1 download test fixture matches
  the post-rename API.

This commit also carries the gofmt re-sort of download/download.go and
download_test.go imports after the apiclient -> downloadclient rename
cascaded into pr4.
jhagberg added a commit that referenced this pull request May 21, 2026
- Switch ListDatasets URL construction to url.JoinPath for trailing-slash
  tolerance (nanjiangshu); thread error from JoinPath through.
- Send SDA-Client-Version alongside User-Agent on v2 calls so any
  downstream monitoring keyed on the SDA header still sees v2 traffic
  (nanjiangshu).
- Widen v2File.toFile() conversion to drop the int(f.DecryptedSize) cast
  that paired with the v1 int field; relies on #679 widening
  DecryptedFileSize to int64.
- Extract const maxErrorBodyBytes = 200 and use it in v2.go's error-body
  truncation (zeidlitz); the same constant becomes useful in #685's
  chore commit (caps the body read with io.LimitReader, max+1).
- Drop the dead 'invalid base URL' string-equality blocks in list.go's
  datasetFiles and Datasets (zeidlitz noted errors.Is would be nicer);
  after #679's fail-fast validation in downloadclient.New() the new
  error is 'Config.BaseURL is required' and it fires before ListFiles
  or ListDatasets is ever called, so the special case was unreachable.
- Picks up #679's package rename apiclient -> downloadclient and Config
  field/parameter renames (ClientVersion, apiVersion) in v2 code paths.
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from 6e99807 to b834ee9 Compare May 21, 2026 14:24
jhagberg added a commit that referenced this pull request May 21, 2026
Carries forward fallout from #679/#681 review-amends, plus the JoinPath
consistency fix flagged by nanjiang on ListFiles and applied to
DatasetInfo for symmetry, plus zeidlitz's maxErrorBodyBytes constant
adopted by the LimitReader site that this PR introduces:

- v2.go ListFiles + DatasetInfo: switch from string concat to
  url.JoinPath(c.cfg.BaseURL, "datasets", url.PathEscape(datasetID), ...)
  matching ListDatasets. PathEscape stays because the SDA schema allows
  DOI-format dataset IDs (e.g. "10.1234/567") which contain "/", as
  jbygdell pointed out.
- v2.go LimitReader: switch the literal 201 to maxErrorBodyBytes+1
  (zeidlitz; constant defined in #681's review-amend).
- v2_test.go: int64 assertion on the widened File.DecryptedFileSize.
- v2.go: blank line before paginate return (lint nlreturn).
- integration_v2_test.go: pick up the apiclient -> downloadclient rename
  and the ClientVersion field name (the v2 integration smoke test was
  added before #679's review-amend landed).
jhagberg added a commit that referenced this pull request May 21, 2026
Three follow-ups on the rebased base:

- v2.go DownloadFile: emit SDA-Client-Version alongside User-Agent on v2
  so monitoring keyed on the SDA header sees v2 download traffic
  (matches #681's review-amend on v2 list calls).
- v2.go DownloadFile error path: adopt maxErrorBodyBytes for the
  LimitReader cap and the body truncation (constant from #681).
- download_test.go downloadOneWithClient: use the renamed Config field
  (ClientVersion, not Version) so the v1 download test fixture matches
  the post-rename API.

This commit also carries the gofmt re-sort of download/download.go and
download_test.go imports after the apiclient -> downloadclient rename
cascaded into pr4.
@jhagberg jhagberg requested a review from zeidlitz May 26, 2026 12:17
jhagberg added 8 commits May 26, 2026 15:15
Record the sensitive-data-archive commit that the upcoming v2
integration-CI job will check out. Pinning prevents upstream drift from
silently breaking our CI.

Pin: 608878fa453770fcb3962bf0239366905c125982 — latest origin/main
commit that directly touched dev-tools/download-v2-dev/ (Copilot-review
fixes to the dev compose added in neicnordic/sensitive-data-archive#2368).
v2's FileInfo has checksums as array and a server-provided downloadUrl.
Conversion to the shared File type happens in v2File.toFile() — maps
checksums to legacy scalar fields (prefer sha256).
First real v2 wire call. Single-page only — pagination arrives in #676
with an explicit error if the server returns a non-null nextPageToken
(prevents silent result truncation).

ListFiles and DatasetInfo are still stubs that error out with
'not implemented until #676'. The list command adapts accordingly:
--dataset <id> fails with that stub error (needs ListFiles), while
--datasets branches on version and prints just dataset IDs for v2
(per-dataset file count + size enrichment lands with DatasetInfo in #676).
download_test.go and list_test.go cover both paths.

All '#N' references in V2Client doc comments, stub-error messages, and
the boundary tests use the real sub-issue numbers so GitHub auto-links
to the intended issue rather than whatever PR happens to carry that
short number in the repository.
… error

Clarifies the Config.Version godoc so it documents both v1
(SDA-Client-Version header) and v2 (User-Agent "sda-cli/<version>")
mappings, trims the pagination-not-yet-implemented error message to the
first clause, and points the trimmed message at the real follow-up
sub-issue (#676) rather than a placeholder.
Boots dev-tools/download-v2-dev/ from neicnordic/sensitive-data-archive
at a pinned commit, fetches a dev token from mockauth, and runs tests
guarded by //go:build integration. Pin documented in
docs/v2-dev-stack-pin.md.

First integration test: V2Client.ListDatasets against the seeded
EGAD00000000001 dataset.
- Wait loop 120s → 240s covers cold-runner image-pull time.
- Wait for both download (:8085/health/ready) and mockauth (:8000/tokens)
  before the token-fetch step — mockauth's first-run pip install can lag
  behind download's readiness, making /tokens fail intermittently.
- Dump all compose services on failure, not just download (postgres,
  minio, reencrypt, or mockauth failures show up too).
- Run the log dump BEFORE teardown so the dump still has containers to
  read from (teardown removes the compose project).
After the apiclient abstraction in #679 landed, list.datasetFiles and
list.Datasets returned raw V1Client errors on HTTP / transport failures,
dropping the 'failed to get files/datasets, reason: ...' prefix that
callers saw before. This restores the legacy prefix for both commands
while leaving the bare 'invalid base URL' error untouched
(TestListDatasetNoUrl and TestListDatasetsNoUrl assert on that string
directly).

Caught by code review on #675. download already had the wrap in
getFileIDURL but list lost it during the abstraction refactor.
- Switch ListDatasets URL construction to url.JoinPath for trailing-slash
  tolerance (nanjiangshu); thread error from JoinPath through.
- Send SDA-Client-Version alongside User-Agent on v2 calls so any
  downstream monitoring keyed on the SDA header still sees v2 traffic
  (nanjiangshu).
- Widen v2File.toFile() conversion to drop the int(f.DecryptedSize) cast
  that paired with the v1 int field; relies on #679 widening
  DecryptedFileSize to int64.
- Extract const maxErrorBodyBytes = 200 and use it in v2.go's error-body
  truncation (zeidlitz); the same constant becomes useful in #685's
  chore commit (caps the body read with io.LimitReader, max+1).
- Drop the dead 'invalid base URL' string-equality blocks in list.go's
  datasetFiles and Datasets (zeidlitz noted errors.Is would be nicer);
  after #679's fail-fast validation in downloadclient.New() the new
  error is 'Config.BaseURL is required' and it fires before ListFiles
  or ListDatasets is ever called, so the special case was unreachable.
- Picks up #679's package rename apiclient -> downloadclient and Config
  field/parameter renames (ClientVersion, apiVersion) in v2 code paths.
@jhagberg jhagberg force-pushed the feat/v2-ci-and-minimal-list branch from b834ee9 to 779d968 Compare May 26, 2026 13:18
jhagberg added a commit that referenced this pull request May 26, 2026
Carries forward fallout from #679/#681 review-amends, plus the JoinPath
consistency fix flagged by nanjiang on ListFiles and applied to
DatasetInfo for symmetry, plus zeidlitz's maxErrorBodyBytes constant
adopted by the LimitReader site that this PR introduces:

- v2.go ListFiles + DatasetInfo: switch from string concat to
  url.JoinPath(c.cfg.BaseURL, "datasets", url.PathEscape(datasetID), ...)
  matching ListDatasets. PathEscape stays because the SDA schema allows
  DOI-format dataset IDs (e.g. "10.1234/567") which contain "/", as
  jbygdell pointed out.
- v2.go LimitReader: switch the literal 201 to maxErrorBodyBytes+1
  (zeidlitz; constant defined in #681's review-amend).
- v2_test.go: int64 assertion on the widened File.DecryptedFileSize.
- v2.go: blank line before paginate return (lint nlreturn).
- integration_v2_test.go: pick up the apiclient -> downloadclient rename
  and the ClientVersion field name (the v2 integration smoke test was
  added before #679's review-amend landed).
jhagberg added a commit that referenced this pull request May 26, 2026
Three follow-ups on the rebased base:

- v2.go DownloadFile: emit SDA-Client-Version alongside User-Agent on v2
  so monitoring keyed on the SDA header sees v2 download traffic
  (matches #681's review-amend on v2 list calls).
- v2.go DownloadFile error path: adopt maxErrorBodyBytes for the
  LimitReader cap and the body truncation (constant from #681).
- download_test.go downloadOneWithClient: use the renamed Config field
  (ClientVersion, not Version) so the v1 download test fixture matches
  the post-rename API.

This commit also carries the gofmt re-sort of download/download.go and
download_test.go imports after the apiclient -> downloadclient rename
cascaded into pr4.
@jhagberg jhagberg added this pull request to the merge queue May 28, 2026
Merged via the queue into main with commit e2896ab May 28, 2026
7 checks passed
@jhagberg jhagberg deleted the feat/v2-ci-and-minimal-list branch May 28, 2026 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(download): v2 API CI harness + minimal /datasets listing

3 participants