Add notebook image CI build (mdx2:1.0.2-1) and remove broken docker job #1
Merged
Reproduces diffuseproject/mdx2:test using a captured conda lockfile + mdx2 pinned to 327bf6e (PR #56 head, the source :test was built from), then adds openssh-client, openssh-sftp-server, and rsync so scp, sftp, and rsync work through the SSH gateway. The existing :1.0.0/:latest build job is left untouched.
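One way to confirm the added packages actually landed in the image is to look for their binaries from inside a running container. The sketch below is illustrative only: `missing_tools` is a hypothetical helper that is not part of this PR, and it assumes a POSIX shell is available in the container.

```shell
# Hypothetical helper (not from the PR): print every named tool that is
# not found on PATH. Prints nothing when all of them are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example use inside the container:
#   missing_tools scp sftp rsync
#   # sftp-server is not on PATH; check the install location directly.
#   [ -x /usr/lib/openssh/sftp-server ] || echo "sftp-server"
```

An empty result means every listed tool resolved, so the output can be used directly as a pass/fail signal in a smoke test.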
The pre-existing docker: job was added 2026-03-25 (PR #57's revert era) and has never successfully pushed an image to Dockerhub. Two pre-existing bugs: file: dockerfile (lowercase, doesn't exist on Linux runners), and the login step ran unconditionally with creds that weren't set until 2026-05-02. The image it would build (.github/Dockerfile, the standalone Jupyter Lab launcher tagged :latest / :1.0.0) is not used anywhere in the Diffuse deployment chain — it's only referenced by mdx2-workflows/Dockerfile as a base for local Prefect dev, and that base image already exists on Dockerhub from a manual push in March 2026. Removing the dead job leaves the workflow with one focused, working job and clears the perpetual red X on every PR. The .github/Dockerfile is kept around as a reference if anyone wants to revive :1.0.0 builds later in a focused PR.
- Drop python_stage: nothing in the image invokes /usr/local/bin/python3.10; the conda env at /root/micromamba/envs/mdx2-dev/bin supplies the canonical Python (3.10.19, matching dxtbx 3.24.1)
- Drop /opt/conda mkdir and PATH entry: the directory was never populated
- Remove .git from cloned source so `git status` inside the running pod doesn't surface stale build-context state

Image shrinks ~50 MB (4.66 GB to 4.61 GB uncompressed). Lockfile parity preserved: normalized md5 of conda list unchanged at d5efd15da8bcbddebfa20b20ef2f2ff6.
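The commit does not say how the package listing was normalized before hashing. As a rough sketch of what such a parity check could look like, the hypothetical `normalized_md5` function below assumes normalization means dropping comment and blank lines and sorting the rest; the actual procedure may differ. It relies on GNU `md5sum`.

```shell
# Hypothetical helper: read a package listing on stdin and print an md5
# that ignores comment lines, blank lines, and line order.
# The normalization rules here are an assumption, not the PR's method.
normalized_md5() {
    grep -v -e '^#' -e '^[[:space:]]*$' \
        | LC_ALL=C sort \
        | md5sum \
        | cut -d' ' -f1
}

# Example use: pipe the package listing from the old and the new image
# through the function and compare the two hashes.
#   old_hash=$(normalized_md5 < old-packages.txt)
#   new_hash=$(normalized_md5 < new-packages.txt)
#   [ "$old_hash" = "$new_hash" ] && echo "lockfile parity preserved"
```

Sorting with a fixed locale keeps the hash stable across machines whose default collation differs.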
Hoist IMAGE_NAME, IMAGE_TAG, and MDX2_COMMIT to workflow-level env so they're a single review surface for "what version are we publishing." Add a pre-push step that fails if the target tag already exists on Dockerhub, preventing silent overwrites of published immutable tags. Add a paths filter to the push trigger so only image-relevant changes trigger a build-and-push on main; pull_request stays unfiltered for predictable required-status-check behavior. Set permissions to least-privilege (contents:read for checkout, actions:write for type=gha cache).
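The pre-push guard can be pictured as a small shell step. This is a minimal sketch, not the workflow's actual step: `assert_tag_unpublished` and the `INSPECT_CMD` override are hypothetical names, the override existing only so the logic can be exercised without Docker. It also treats any `docker manifest inspect` failure (including network or auth errors) as "tag not published", which a production step may want to distinguish.

```shell
# Hypothetical guard: fail when the image reference already resolves
# in the registry. INSPECT_CMD is an illustrative override for testing.
INSPECT_CMD="${INSPECT_CMD:-docker manifest inspect}"

assert_tag_unpublished() {
    image_ref="$1"
    # The inspect command exits 0 when the registry returns a manifest.
    if $INSPECT_CMD "$image_ref" >/dev/null 2>&1; then
        echo "ERROR: $image_ref already exists; bump IMAGE_TAG" >&2
        return 1
    fi
    echo "OK: $image_ref is not published yet"
}

# Example use in a workflow step:
#   assert_tag_unpublished "${IMAGE_NAME}:${IMAGE_TAG}" || exit 1
```

Running the check before the build-and-push step means a forgotten tag bump fails fast instead of after a full image build.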
Recipe-style operational doc under docs/ covering the three upgrade scenarios (mdx2 source bump, conda env refresh, OS apt addition) with concrete commands, rollback procedure, and common pitfalls. The 1.0.2 -> 1.0.4 upgrade walkthrough doubles as the cleanup procedure for the PR-ref scaffolding (MDX2_PR_REF + git fetch origin pull/56/head), which becomes deletable as soon as we move off the 327bf6e pin. Cluster SSH endpoint is referred to as <sampleworks-host>; actual value lives in the diff-use/infra Pulumi config (private repo).
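Because the pinned commit is reachable only through a pull-request ref, a build could double-check that the ref it fetched still resolves to the expected SHA. The following is a sketch under that assumption; `sha_matches_pin` is a hypothetical helper and is not taken from the Dockerfile or the runbook.

```shell
# Hypothetical helper: succeed when the full SHA in $1 starts with the
# (possibly abbreviated) pin in $2. An empty pin never matches.
sha_matches_pin() {
    [ -n "$2" ] || return 1
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after fetching the PR ref:
#   git fetch origin pull/56/head
#   sha_matches_pin "$(git rev-parse FETCH_HEAD)" 327bf6e || exit 1
```

Once the pin moves to a commit on a regular branch or tag, this check and the PR-ref fetch can be dropped together.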
The shared diffuseproject/mdx2 Dockerhub repo currently holds two unrelated images: :1.0.0 and :latest are the standalone Jupyter Lab launcher (saada's image, base for mdx2-workflows/Dockerfile), while :test (and now :1.0.2-1) are the JupyterHub singleuser image. Sharing a repo across two semantically different images, distinguished only by tag, is the kind of latent collision that bites during incidents. Move the new image to its own repo (diffuseproject/mdx2-notebook). The legacy :1.0.0/:latest tags stay where they are; this repo's Dockerfile and docker-compose.yml continue to reference them unchanged. Provenance comments referring to the legacy diffuseproject/mdx2:test tag are kept verbatim, since they document historical fact about a real Dockerhub image that still exists at the old name.
Roll the notebook -> jhub rename through every dependent surface in one atomic commit:

- diffuseproject/mdx2-notebook -> diffuseproject/mdx2-jhub (workflow IMAGE_NAME)
- Dockerfile.notebook -> Dockerfile.jhub (git mv)
- notebook-env.lock -> jhub-env.lock (git mv)
- docs/upgrading-the-notebook-image.md -> docs/upgrading-the-jhub-image.md (git mv)
- workflow paths filter, file: arg, job key, display name, cache scope all updated to match.

Reasoning: image names that describe the operational role (jhub) are more specific than artifact-form names (notebook), and the team's verbal shorthand for this image is 'the jhub image'. Aligning the artifact name with how it is referred to in conversation removes one translation step between Slack/standup and the codebase.

Preserved verbatim: prose references to 'a notebook' (user's Jupyter session, not the image), the kubernetes container name '-c notebook' (set in the JupyterHub deployment config and not under this repo's control), and 'jupyterhub_notebook_image_tag' (a field name in the diff-use/webapp config; renaming that is a separate webapp-side decision).
Summary

Adds a CI build for `diffuseproject/mdx2-jhub:1.0.2-1`, the JupyterHub singleuser image used by the Diffuse SSH gateway. Reproduces the existing `:test` image's scientific stack via a captured conda lockfile, pins the mdx2 source to the same commit `:test` was built from, and adds `openssh-client`, `openssh-sftp-server`, and `rsync` so `scp`, `sftp`, and `rsync` work through the gateway.

Also removes the pre-existing broken `docker:` job from the workflow, strips dead weight from the new Dockerfile, and adds workflow safety guards to prevent silent tag overwrites.

Why

`ssh` into a JupyterHub notebook pod via the Diffuse SSH gateway works (shipped via webapp PRs #287–#294). But `scp`, `rsync`, and `sftp` fail because they exec the corresponding binaries inside the pod, and the running `:test` image has none of them.

Modern OpenSSH (≥9.0) defaults `scp` to the SFTP subsystem, which is why even basic `scp` started failing.

Approach

The previous `:test` image (built 2026-02-26) was produced from a now-deleted feature branch (`feat/jupyterhub-singleuser`, PR #56) by `docker build` on someone's laptop. Its source Dockerfile was never committed anywhere. To rebuild faithfully:

- Conda lockfile (`jhub-env.lock`, 385 packages from conda-forge). Captured with `micromamba env export --explicit -n mdx2-dev` from the live `:test` pod, so the new image gets byte-equivalent DIALS 3.24.1, dxtbx 3.24.1, hdf5 1.14.3, Python 3.10.19, and 381 other packages, with the same build hashes and the same versions.
- mdx2 source pinned to `327bf6e1541e3e0b63a22c8aff100b92c4aa6e39`, PR #56's head commit. mdx2/VERSION = 1.0.2 and the package directory matches the live pod exactly (including the absence of `io.py`, which only exists on `main`). Reachable via `refs/pull/56/head` only since the branch was deleted; the Dockerfile fetches that ref explicitly.
- pip pins: `jupyterhub==5.4.3` and `jupyter-vscode-proxy==0.7`.
- SSH tooling: `openssh-client`, `openssh-sftp-server`, `rsync`.

The tag `1.0.2-1` follows Debian-revision style: upstream `1.0.2` (mdx2 version) + `-1` (our first build with this upstream).

Future upgrade story
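
To make the Debian-revision tag arithmetic concrete, here is a small sketch of how the next tag could be derived. `next_tag` is a hypothetical helper, not something the workflow contains; it assumes tags always have the form `<upstream>-<revision>` with a numeric revision.

```shell
# Hypothetical helper: next_tag CURRENT_TAG [NEW_UPSTREAM]
# Same upstream    -> increment the revision suffix.
# Changed upstream -> restart the revision at 1.
next_tag() {
    current="$1"
    new_upstream="${2:-}"
    upstream="${current%-*}"    # text before the last '-'
    revision="${current##*-}"   # text after the last '-'
    if [ -n "$new_upstream" ] && [ "$new_upstream" != "$upstream" ]; then
        echo "${new_upstream}-1"
    else
        echo "${upstream}-$((revision + 1))"
    fi
}

# Example use:
#   next_tag 1.0.2-1          # rebuild on the same mdx2 version
#   next_tag 1.0.2-3 1.0.3    # move to a new mdx2 version
```

The first call yields `1.0.2-2` and the second yields `1.0.3-1`.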

The lockfile decouples mdx2 source upgrades from scientific stack upgrades:

| What changes | Resulting tags |
| --- | --- |
| `MDX2_COMMIT` env var in workflow | `1.0.3-1`, `1.0.4-1`, ... |
| `jhub-env.lock` | `1.0.2-2`, `1.0.2-3`, ... |
| | `<new-mdx2>-1` |

Lockfile regeneration: spin up a temp container, run `micromamba env export --explicit -n mdx2-dev > jhub-env.lock`, and commit. Bump the `IMAGE_TAG` env var in the workflow at the same time; the immutability check will refuse to overwrite the existing tag if you forget.

Step-by-step recipes for each scenario (with concrete commands, smoke tests, rollback procedure, and common pitfalls): see `docs/upgrading-the-jhub-image.md`.

Workflow safety

The workflow has explicit guards against accidental tag overwrites and unnecessary builds:

- `IMAGE_NAME`, `IMAGE_TAG`, `MDX2_COMMIT` are hoisted to a single review surface at the top of the workflow. Bumping a version is a single diff line.
- A pre-push step runs `docker manifest inspect` against Dockerhub. If `${IMAGE_NAME}:${IMAGE_TAG}` already exists, the workflow fails with an explicit error message telling the maintainer to bump `IMAGE_TAG`. This prevents silent overwrites of published-and-deployed tags.
- `push:` to main only fires when `Dockerfile.jhub`, `jhub-env.lock`, or the workflow itself changes. Docs-only PRs that get merged don't trigger an unnecessary rebuild. PR-CI stays unfiltered for predictable required-status-check behavior.
- Least-privilege permissions: `contents: read, actions: write` (only the latter for `type=gha` cache writes). No `packages: write` since we don't push to ghcr.

Files changed

- `Dockerfile.jhub` (71 lines). 3-stage build: `mambaorg/micromamba:1.5.5` (micromamba binary) → `debian:stable-slim` (mdx2 git clone, isolated stage so git stays out of final image) → `debian:stable-slim` (final, with conda env from lockfile + mdx2 editable + pip pins + ssh tooling).
- `jhub-env.lock` (389 lines). Conda explicit lockfile; do not edit by hand.
- `docs/upgrading-the-jhub-image.md` (304 lines). Operational runbook for the three upgrade scenarios (mdx2 source bump, conda env refresh, OS apt additions) with concrete commands, rollback procedure, and common pitfalls.
- `.github/workflows/docker.yml`. The pre-existing `docker:` job is removed. The new `notebook:` job replaces it with: env-var-driven version, paths-filtered push trigger, conditional login, immutability check, and least-privilege permissions.

Why remove the existing `docker:` job

It was added 2026-03-25 with two pre-existing bugs that prevented it from ever working: `file: dockerfile` (lowercase, fatal on Linux runners) and unconditional login (failed every PR before Dockerhub creds existed). It has never successfully pushed an image. The image it would build (`.github/Dockerfile`, the `:latest`-flavored standalone Jupyter Lab launcher with `CMD jupyter lab`) is not used in the Diffuse deployment chain; only `mdx2-workflows/Dockerfile` itself references `:1.0.0` as a base image, and that base already exists on Dockerhub from a manual push by jlee in March 2026. `.github/Dockerfile` itself is kept around as a reference if anyone wants to revive `:1.0.0` automation in a focused follow-up PR.

Pre-merge checklist

- `vars.DOCKERHUB_USERNAME` (verified working; login step succeeds in CI)
- `secrets.DOCKERHUB_TOKEN` (same)
- CI build of `Dockerfile.jhub` passes (no push, just verifies the build succeeds in a clean ubuntu-latest runner)
- Push `diffuseproject/mdx2-jhub:1.0.2-1` to Dockerhub

Verification done locally

| Check | Result |
| --- | --- |
| `docker buildx build` | `4616b2f03c42…`, 4.61 GB uncompressed (~50 MB smaller than `:test` after stripping unused python_stage and `/opt/conda`) |
| `which scp; which rsync; ls /usr/lib/openssh/sftp-server` | |
| `jupyterhub-singleuser --version` | `5.4.3` |
| `jupyterhub-singleuser` boot with realistic env | |
| `import mdx2; mdx2.__version__` | `1.0.2` (matches live pod) |
| `mdx2.io` attribute | absent (`io.py` only exists on main) |
| `.git` removed | |
| `dxtbx.__version__` | `3.24.1` (matches live) |
| Normalized md5 of conda list | `d5efd15da8bcbddebfa20b20ef2f2ff6` matches lockfile exactly |
| `mdx2.import_data --help` | |
| `sftp-server -h`, `scp` (no args), `rsync --version` | |
| `actionlint` on workflow | |

Cutover plan (after merge)

- CI pushes `diffuseproject/mdx2-jhub:1.0.2-1` to Dockerhub. (One-time: the new Dockerhub repo `diffuseproject/mdx2-jhub` needs to exist as public and the `vars.DOCKERHUB_USERNAME` account needs push access. Lazy creation on first push usually works for orgs with auto-create-on-push enabled; eager creation in the Dockerhub UI is the boring/reliable path.)
- A webapp PR (`diff-use/webapp`) updates the JupyterHub singleuser image config from `diffuseproject/mdx2:test` to `diffuseproject/mdx2-jhub:1.0.2-1`. See `app/config.py:217` plus the repo-name constant if the webapp keeps repo and tag separate.
- `imagePullPolicy: Always`.

Rollback: revert the webapp PR (single-line). Old `:test` stays parked on Dockerhub.

Notes for future maintenance

- `327bf6e` lives only at `refs/pull/56/head`. If the PR is ever purged the commit becomes unreachable. Suggested follow-up: push that SHA as a durable git tag in `diff-use/mdx2` (e.g. `singleuser-source-pin`) so future builds can pin to a tag instead of a PR ref.
- `.github/Dockerfile` is kept in the repo despite no longer being referenced by any workflow. It's the (verified) source for what produced `:1.0.0` and `:latest` on Dockerhub. Useful as a reference if the standalone Lab image needs reviving later.
- `1.0.2-1` is verified in prod, with its own rollback boundary.

Commit history

| Commit | Description |
| --- | --- |
| `1db4c76` | Add notebook image CI build (`mdx2:1.0.2-1`), the main change |
| `cdf2221` | Remove broken `docker:` job from workflow |
| `14ec2e8` | Strip dead weight (`python_stage`, `/opt/conda`, build-context `.git`), saves ~50 MB |
| `3249501` | |
| `fc51467` | `MDX2_COMMIT`: what the SHA is and when to bump it |
| `daaa8c1` | `docs/upgrading-the-jhub-image.md` |
| `7443d20` | Move image from `diffuseproject/mdx2` to `diffuseproject/mdx2-notebook` to disambiguate from the standalone Lab image at `:1.0.0`/`:latest` |
| `d002c88` | Rename `notebook` to `jhub` (image, Dockerfile, lockfile, doc, workflow refs); operational-role naming matches the team's verbal shorthand |