Update Docker across kubernetes/test-infra CI #4184

@saschagrunert

Description

Related: PRs #4183, #4185 | Issue #4180
Status: Investigation Complete | Target: 1.36+

Summary

kubernetes/release pins docker-ce-cli to 24.0.x (API 1.43) because the test-infra DinD images ship older Docker daemons. The latest docker-ce-cli is 29.0.0 (API 1.52), and running it against those daemons fails with: "client version 1.52 is too new. Maximum supported API version is 1.43"

Root Cause: the test-infra Dockerfiles don't pin Docker versions, so production images built months ago still run Docker 24.x (API 1.43), while the latest client defaults to API 1.52 and no longer works with API versions below 1.44.
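
To make the failure mode concrete, here is a minimal Python sketch of the version check behind that error. It is an illustration only, not Docker's actual implementation: the function names are hypothetical, and the message format is copied from the error quoted in this issue. Versions are compared as integer pairs so that, for example, 1.9 sorts before 1.10.

```python
def parse_api_version(text):
    """Turn '1.43' into (1, 43) so versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def check_client(client_api, daemon_max_api):
    """Return None if the daemon would accept the request, else an error
    string modeled on the message quoted in this issue (hypothetical helper)."""
    if parse_api_version(client_api) > parse_api_version(daemon_max_api):
        return (
            f"client version {client_api} is too new. "
            f"Maximum supported API version is {daemon_max_api}"
        )
    return None


# docker-ce-cli 29.0.0 (API 1.52) against a Docker 24.x daemon (API 1.43)
print(check_client("1.52", "1.43"))
# The pinned 24.0.x client (API 1.43) against a 27.x daemon (API 1.46)
print(check_client("1.43", "1.46"))
```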

Goals

  1. Pin Docker daemon version in test-infra images (recommend: 27.x for API 1.46)
  2. Remove docker-ce-cli pinning from kubernetes/release
  3. Validate the DinD-enabled jobs (1,737 jobs in 357 config files) across multiple SIGs
  4. Minimal disruption to releases

Investigation Findings

DinD Images & Scale

  • Images: bootstrap and kubekins-e2e-v2 in test-infra install docker-ce without version pinning
  • Impact: 1,737 DinD-enabled jobs across 357 config files (release, CSI, cloud providers, networking, storage, node, Cluster API, KIND)
  • Latest available: Docker 29.0.0 (API 1.52, drops support for API versions below 1.44)

Docker Version Mapping

  • Docker 24.0.x → API 1.43 (current in kubernetes/release)
  • Docker 27.x → API 1.46 ⭐ Recommended
  • Docker 29.0.0 → API 1.52 (breaks clients below API 1.44)
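
The reasoning behind the 27.x recommendation can be sketched as a range-overlap check: a daemon is a safe pin if it shares at least one API version with both the currently pinned client and the latest client. In the Python sketch below, the maximum API versions come from the mapping above and the 1.44 floor for 29.0.0 comes from the "breaks API < 1.44" note; the minimum versions for 24.0.x and 27.x are assumptions, and using the same range for a release's client and daemon is a simplification.

```python
# (minimum API, maximum API) per Docker release, as integer pairs.
# Maximums: from the mapping above. Minimum for 29.0.0: from the
# "breaks API < 1.44" note. Minimums for 24.0.x and 27.x: assumptions.
API_RANGES = {
    "24.0.x": ((1, 12), (1, 43)),
    "27.x": ((1, 24), (1, 46)),
    "29.0.0": ((1, 44), (1, 52)),
}

# Clients that must keep working: the current pin and the latest release.
CLIENTS = ("24.0.x", "29.0.0")


def compatible(client_range, daemon_range):
    """True if the client and daemon share at least one API version."""
    c_min, c_max = client_range
    d_min, d_max = daemon_range
    return max(c_min, d_min) <= min(c_max, d_max)


def daemons_for_all_clients():
    """List the daemon releases that every client in CLIENTS can talk to."""
    return [
        daemon
        for daemon, daemon_range in API_RANGES.items()
        if all(compatible(API_RANGES[c], daemon_range) for c in CLIENTS)
    ]


print(daemons_for_all_clients())
```

Under these assumptions only the 27.x daemon overlaps with both clients: the 24.0.x daemon tops out below the 29.0.0 client's floor, and the 29.0.0 daemon starts above the pinned client's ceiling.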

Key Files

  • images/bootstrap/Dockerfile & images/kubekins-e2e-v2/Dockerfile - need version pinning
  • images/bootstrap/runner.sh - DinD initialization
  • 357 job configs with preset-dind-enabled: "true"
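
A count like the one above could be reproduced with a small script that scans the job configs for the preset label. The following is a rough Python sketch: the helper names are hypothetical, the directory passed to scan() is whatever holds the job configs in a local checkout, and counting label occurrences only approximates the number of jobs, since a label could also appear in comments or shared presets.

```python
import re
from pathlib import Path

# Matches a whole line of the form:  preset-dind-enabled: "true"
DIND_LABEL = re.compile(
    r'^[ \t]*preset-dind-enabled:[ \t]*"true"[ \t]*$', re.MULTILINE
)


def count_in_text(text):
    """Count DinD preset labels in one YAML document given as a string."""
    return len(DIND_LABEL.findall(text))


def scan(root):
    """Map each *.yaml file under root to its label count, skipping zeros."""
    counts = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        hits = count_in_text(path.read_text(encoding="utf-8"))
        if hits:
            counts[str(path)] = hits
    return counts


def summarize(counts):
    """Return (number of config files, total label occurrences)."""
    return len(counts), sum(counts.values())
```

Calling summarize(scan(path_to_job_configs)) gives a file count and an occurrence count to compare against the figures in this issue.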

Implementation Plan

Phase 1: Research ✅ - Investigation complete
Phase 2: Development - Pin Docker version in Dockerfiles, build staging images
Phase 3: Testing - Validate critical jobs (release, CSI, KIND)
Phase 4: Rollout - Non-critical → critical jobs, unpin docker-ce-cli in kubernetes/release
Phase 5: Cleanup - Remove PR #4183 workaround, update docs
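
The ordering in Phase 4 (non-critical jobs first, critical jobs last) could be automated along the following lines. This Python sketch is only one possible approach: classifying jobs by keywords in their names is an assumption, the keywords mirror the critical areas named in Phase 3, and the job names in the usage example are made up.

```python
# Critical areas named in Phase 3. Matching them as substrings of the job
# name is an assumption and can over-match (for example, "kind" inside an
# unrelated word), so the result would need manual review.
CRITICAL_KEYWORDS = ("release", "csi", "kind")


def plan_waves(job_names):
    """Split jobs into two rollout waves: [non_critical, critical]."""
    non_critical, critical = [], []
    for name in sorted(job_names):
        lowered = name.lower()
        if any(keyword in lowered for keyword in CRITICAL_KEYWORDS):
            critical.append(name)
        else:
            non_critical.append(name)
    return [non_critical, critical]


# Hypothetical job names, for illustration only.
waves = plan_waves(
    ["pull-example-csi-e2e", "ci-example-unit", "ci-example-release-build"]
)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```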

Risks & Mitigations

  • Docker 29.x drops support for API versions below 1.44 → Use Docker 27.x instead
  • 1,737 jobs in 357 config files to validate → Automate testing, focus on critical paths (release, CSI)
  • Job failures during rollout → Phased rollout, maintain rollback capability
  • KIND compatibility → Validate in testing phase

Next Steps

  1. Decide Docker version: 27.x (recommended) vs 29.x
  2. Pin version in bootstrap and kubekins-e2e-v2 Dockerfiles
  3. Build staging images, test with critical jobs
  4. Phased rollout to production
  5. Remove docker-ce-cli pin from kubernetes/release

Coordination: SIG Testing, SIG Release (discuss on the mailing lists and in meetings)

Metadata

Assignees

No one assigned

    Labels

    kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
    priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
