diff --git a/upgrade-postgres-to-14/README.md b/upgrade-postgres-to-14/README.md new file mode 100644 index 0000000..21e7d02 --- /dev/null +++ b/upgrade-postgres-to-14/README.md @@ -0,0 +1,265 @@ +# CircleCI Server: Postgres 12 → 14 upgrade script + +`upgrade-postgres-to-14.sh` automates the on-disk PostgreSQL major-version upgrade inside your CircleCI Server installation. It renders and applies a one-shot Kubernetes Job that runs `pg_upgrade --link` against your existing Postgres PVC, then prints the helm values block to update and the `helm upgrade` command to run. + +The full upgrade procedure — including platform-specific guidance on snapshots, rollback, and recovery — is documented separately at **docs.circleci.com** (CircleCI Server upgrade guide). This README focuses on operating the script and the end-to-end flow at a high level. + +## What the script does + +- Discovers your PVC name, postgres password secret, and (while postgres is still running) the source-cluster encoding/locale. Every value is overridable. +- Verifies that your application-layer deployments are scaled to 0 before doing anything. +- If the postgres StatefulSet is still running, captures encoding/locale from the live cluster and **scales postgres to 0** automatically, then waits for the underlying volume to detach. +- Renders a `pg_upgrade` Job manifest, applies it, and streams the Job's logs. +- Verifies completion by waiting for the Job's `Complete` condition. +- On success, prints the helm values block to paste into your CircleCI Server values file and the exact `helm upgrade` command to run. + +## What the script does NOT do + +- **It does not take backups.** Snapshotting the PVC or pre-provisioning a PVC clone is your responsibility and platform-specific (CSI VolumeSnapshot, your cloud provider's snapshot CLI, Velero, etc.). The script assumes you have a known-good restore path before you invoke it. +- **It does not scale your application layer.** You scale `layer=application` deployments to 0 before invoking the script. The script verifies this and refuses to proceed otherwise. (It does scale the postgres StatefulSet itself — see above.) +- **It does not run `helm upgrade`.** That step is yours — the script tells you exactly what to run. The `helm upgrade` afterward restores replicas to their correct counts for both postgres and your application layer. + +## End-to-end upgrade procedure + +1. **Pre-flight.** + - Confirm your cluster can pull the target `server-postgres:14.22.x` image. By default the script pulls from CircleCI's ACR (`cciserver.azurecr.io`); use `--dockerhub` if you pull from Docker Hub. + - Decide on a rollback strategy: a point-in-time snapshot of the postgres PVC, or a pre-provisioned PVC clone, or both. *How* you take either depends on your platform — see the detailed upgrade guide on docs.circleci.com. + - Source-cluster encoding and locale: in the standard flow, the script captures these for you (it queries `template1` while postgres is still running, then scales postgres to 0 itself). You don't need to query manually. If you'd like to verify ahead of time: + ``` + PGP=$(kubectl -n get secret \ + -o jsonpath='{.data.postgres-password}' | base64 -d) + kubectl -n exec postgresql-0 -- env PGPASSWORD="$PGP" \ + psql -U postgres -tAc \ + "SELECT pg_encoding_to_char(encoding), datcollate, datctype \ + FROM pg_database WHERE datname='template1'" + ``` + The script's built-in defaults are `UTF8` / `C.UTF-8` / `C.UTF-8`. Common alternative locales are `en_US.UTF-8` for `LC_COLLATE` / `LC_CTYPE` on some Bitnami builds. If you prefer to scale postgres down yourself (instead of letting the script do it), capture these values *before* scaling and pass them to the script via: + ``` + ./upgrade-postgres-to-14.sh -n \ + --initdb-encoding \ + --initdb-lc-collate \ + --initdb-lc-ctype + ``` + A locale or encoding mismatch causes `pg_upgrade` to abort partway through its consistency checks with an error naming the offending database and value. + - Plan a maintenance window. Total downtime depends on your database size, storage performance, the number of databases and extensions, and post-upgrade `VACUUM ANALYZE` time. We recommend at least 30-60 minutes. + +2. **Take a backup.** Snapshot the PVC, or pre-provision a clone PVC, or both. The remaining steps assume you can restore from one of these if anything goes wrong. + +3. **Quiesce the application layer.** Scale your application deployments to 0. Leave the postgres StatefulSet alone — the script will scale it down itself after capturing the live cluster's encoding/locale: + ``` + kubectl -n scale deploy -l layer=application --replicas=0 + ``` + The script refuses to proceed if any `layer=application` deployment still has replicas > 0. + +4. **Run the upgrade.** With the application layer quiesced: + ``` + ./upgrade-postgres-to-14.sh -n + ``` + See [Flags](#flags) for overrides. The script will: + - Query the live postgres pod for encoding/locale. + - Prompt to scale the postgres StatefulSet to 0, then wait for the underlying volume to detach (the cloud-side detach round-trip typically takes 30–90 seconds). + - Apply the `pg_upgrade` Job and stream its logs. + - Exit 0 once `pg_upgrade` reports `Upgrade Complete` and the new PG14 cluster is in place at `/bitnami/postgresql/data`. + + If you'd rather scale postgres down yourself, do so before invoking the script — but capture encoding/locale *before* you scale, since the script can't query a stopped pod. + +5. **Update your helm values.** On success, the script prints the exact `postgresql:` block to paste into your CircleCI Server values file. The change is in the `postgresql.image` registry/repository/tag — for the default (ACR) run, the diff against the previous PG12 pin looks like: + + ```diff + postgresql: + image: + registry: cciserver.azurecr.io + repository: circleci/server-postgres + tag: 14.22.4094-4922444 + ``` + + If you ran the script with `--dockerhub`, `registry: docker.io` stays — only the `tag` changes. + +6. **Roll forward.** Run `helm upgrade` against your CircleCI Server release. This upgrades the chart, deploys the new postgres image, and restores correct replica counts for postgres and your application workloads — you don't need a separate scale-up: + ``` + helm upgrade circleci-server \ + oci://cciserver.azurecr.io/circleci-server \ + --version \ + -f + ``` + The new postgres pod mounts the upgraded data directory and starts on PG14. + +7. **Validate.** + ``` + PGP=$(kubectl -n get secret \ + -o jsonpath='{.data.postgres-password}' | base64 -d) + kubectl -n exec -it postgresql-0 -- \ + env PGPASSWORD="$PGP" psql -U postgres -c 'SELECT version();' + kubectl -n exec -it postgresql-0 -- \ + env PGPASSWORD="$PGP" vacuumdb -U postgres --all --analyze-in-stages + ``` + `pg_upgrade` does NOT carry optimizer statistics across — `vacuumdb --analyze-in-stages` rebuilds them and keeps query plans sane. Smoke-test your CircleCI install (log in, trigger a workflow) and watch the postgres logs for any startup warnings. + +8. **Clean up.** After at least 24 hours of healthy operation on PG14: + - Delete the completed upgrade Job (its pod has terminated, but the Job object stays in the namespace until you remove it): + ``` + kubectl -n delete job postgres-upgrade-12-to-14 + ``` + - Inside the postgres pod, remove the old data directory left behind by `pg_upgrade`: + ``` + kubectl -n exec -it postgresql-0 -- bash + # then, inside the pod: + bash /bitnami/postgresql/upgrade-logs/delete_old_cluster.sh + ``` + - Delete your snapshot or pre-provisioned clone PVC if you no longer need them. + +## Quick start + +```bash +# Standard install: everything auto-discovered, ACR by default +./upgrade-postgres-to-14.sh -n circleci-server + +# Pull server-postgres from Docker Hub instead of ACR +./upgrade-postgres-to-14.sh -n circleci-server --dockerhub + +# Use a newer PG14 tag than the script's default +./upgrade-postgres-to-14.sh -n circleci-server -t 14.22.5000-newsha + +# Preview the rendered Job manifest without applying +./upgrade-postgres-to-14.sh -n circleci-server --dry-run + +# Fully explicit, no confirmation prompts +./upgrade-postgres-to-14.sh -n circleci-server \ + --pvc-name data-postgresql-0 --secret-name postgresql \ + --initdb-lc-collate en_US.UTF-8 --initdb-lc-ctype en_US.UTF-8 \ + -y +``` + +## Flags + +| Flag | Default | Purpose | +|---|---|---| +| `-n, --namespace NS` | *(required)* | Namespace where your postgres StatefulSet runs. | +| `-t, --pg14-tag TAG` | `14.22.4094-4922444` | server-postgres image tag to upgrade to. | +| `--pvc-name NAME` | auto-discovered | Source PVC. Defaults to whatever has `app.kubernetes.io/name=postgresql` in your namespace. | +| `--secret-name NAME` | auto-discovered | Secret holding the postgres superuser password. | +| `--secret-key KEY` | `postgres-password` | Key within that secret. | +| `--initdb-encoding ENC` | discovered, else `UTF8` | New cluster encoding. | +| `--initdb-lc-collate LOC` | discovered, else `C.UTF-8` | New cluster `LC_COLLATE`. Must match source. | +| `--initdb-lc-ctype LOC` | discovered, else `C.UTF-8` | New cluster `LC_CTYPE`. Must match source. | +| `--dockerhub` | off | Pull `server-postgres` from Docker Hub (`circleci/server-postgres`) instead of ACR. | +| `--acr-path PATH` | `cciserver.azurecr.io/circleci/server-postgres` | Override the ACR repository path. | +| `--upgrade-job-image IMG` | `tianon/postgres-upgrade:12-to-14` | Image used by the `pg_upgrade` Job itself (always pulled from Docker Hub). | +| `-y, --yes` | off | Skip confirmation prompts. | +| `--dry-run` | off | Render the Job manifest and exit without applying. | +| `-h, --help` | — | Show usage. | + +## Prerequisites + +- `kubectl` configured for the target cluster (verify with `kubectl config current-context`). +- A standard CircleCI Server internal installation of `postgresql`. +- Network reachability from your cluster to the registry hosting the PG14 image (ACR by default), with `imagePullSecrets` configured accordingly. +- Network reachability from your cluster to Docker Hub for the `tianon/postgres-upgrade:12-to-14` utility image used by the Job. If your cluster can't reach Docker Hub, mirror that image to your own registry and pass it via `--upgrade-job-image`. +- Your application-layer deployments (`layer=application`) scaled to 0 before invocation — the script verifies this and refuses to proceed otherwise. The postgres StatefulSet does *not* need to be scaled in advance; the script handles that. + +## Auto-discovery behavior + +The script reads the cluster to fill in placeholder values, so a typical invocation needs only `--namespace`. + +- **PVC name and secret name** are looked up via the `app.kubernetes.io/name=postgresql` label. +- **Encoding and locale** are queried from the live postgres pod's `template1` catalog when the script starts. Discovery happens *before* the script scales postgres down, so it succeeds in the standard flow. If you'd prefer to scale postgres down yourself before invoking the script, you must pass the values via flags — by the time the script runs, the pod is gone and discovery has no live cluster to query. + +If discovery fails or you scaled postgres down ahead of time and didn't pass flags, the script falls back to its built-in defaults (`UTF8` / `C.UTF-8` / `C.UTF-8`) and prints a warning. A mismatch with the source cluster's actual settings causes `pg_upgrade` to abort partway through its consistency checks — see [Pre-flight](#end-to-end-upgrade-procedure) (step 1) for how to override. + +## Image source defaults + +- **server-postgres image** defaults to `cciserver.azurecr.io/circleci/server-postgres:` from CircleCI's ACR. Use `--dockerhub` to pull `circleci/server-postgres:` from Docker Hub instead, or `--acr-path` to override the ACR path. +- **pg_upgrade utility image** is `tianon/postgres-upgrade:12-to-14`, always pulled from Docker Hub regardless of `--dockerhub`. If your cluster cannot reach Docker Hub, mirror this image and pass `--upgrade-job-image` accordingly. + +## What success looks like + +The script prints the next-step block at the end of a successful run: + +``` +═══════════════════════════════════════════════════════════════════════════ +NEXT STEPS +═══════════════════════════════════════════════════════════════════════════ + +1. UPDATE your helm values file. Replace the postgresql block with: + + postgresql: + image: + registry: cciserver.azurecr.io + repository: circleci/server-postgres + tag: 14.22.4094-4922444 + +2. RUN helm upgrade ... +3. VALIDATE ... +4. AFTER ≥24h of healthy operation, clean up ... +═══════════════════════════════════════════════════════════════════════════ +``` + +After updating the values file, run `helm upgrade circleci-server oci://cciserver.azurecr.io/circleci-server --version -f ` to roll forward. This single command rolls the chart, applies the new postgres image, and restores correct replica counts for postgres and your application workloads — no separate scale-up needed. + +## Troubleshooting + +### Re-running after a failed Job + +The Job's first step refuses to re-run if `data-12`, `data-14`, or `data-12.preupgrade` exist on the PVC. To recover, run a one-shot pod that mounts the PVC and resets the state: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pg-upgrade-cleanup + namespace: +spec: + restartPolicy: Never + securityContext: + runAsUser: 0 + runAsGroup: 0 + containers: + - name: cleanup + image: busybox:latest + command: + - sh + - -c + - | + set -ex + cd /bitnami/postgresql + if [ -d data-12 ] && [ ! -e data ]; then mv data-12 data; fi + rm -rf data-14 upgrade-logs + chown -R 1001:1001 data + volumeMounts: + - name: data + mountPath: /bitnami/postgresql + volumes: + - name: data + persistentVolumeClaim: + claimName: +``` + +Apply, wait for it to reach phase `Succeeded`, delete it, then re-run `upgrade-postgres-to-14.sh`. + +### Common pg_upgrade failures + +- **Locale mismatch** — `lc_collate values for database "" do not match` aborts the consistency checks. Re-run with `--initdb-lc-collate` / `--initdb-lc-ctype` set to the source cluster's actual values. +- **Missing extension build** — if your PG12 databases use extensions (e.g. `pg_stat_statements`, `pgcrypto`, `uuid-ossp`), the target PG14 image must include them. Catalog all extensions per database during pre-flight: + ``` + for db in $(kubectl -n exec postgresql-0 -- env PGPASSWORD="$PGP" \ + psql -U postgres -tAc \ + "SELECT datname FROM pg_database WHERE datistemplate=false AND datname!='postgres'"); do + echo "=== $db ===" + kubectl -n exec postgresql-0 -- env PGPASSWORD="$PGP" \ + psql -U postgres -d "$db" -c "SELECT extname, extversion FROM pg_extension" + done + ``` +- **Authentication failure connecting to source cluster** — confirm `POSTGRES_SECRET_NAME` and `POSTGRES_SECRET_KEY` are correct by running: + ``` + PGP=$(kubectl -n get secret \ + -o jsonpath='{.data.}' | base64 -d) + kubectl -n exec postgresql-0 -- env PGPASSWORD="$PGP" \ + psql -U postgres -c 'SELECT 1' + ``` + *before* quiescing the cluster. + +For anything outside these common cases — including rollback procedures — refer to the CircleCI Server upgrade guide on docs.circleci.com. + +## Where to go next + +The detailed customer-facing upgrade documentation (with platform-specific snapshot and PVC clone procedures, rollback playbooks, and the full troubleshooting reference) is on **docs.circleci.com** under the CircleCI Server upgrade guide. diff --git a/upgrade-postgres-to-14/upgrade-postgres-to-14.sh b/upgrade-postgres-to-14/upgrade-postgres-to-14.sh new file mode 100644 index 0000000..d2f212b --- /dev/null +++ b/upgrade-postgres-to-14/upgrade-postgres-to-14.sh @@ -0,0 +1,626 @@ +#!/usr/bin/env bash +# +# Automates the on-disk Postgres 12 → 14 upgrade for CircleCI Server. +# - Auto-discovers PVC, secret, and (while postgres is up) source-cluster +# encoding/locale. Every value is overridable. +# - If the postgres StatefulSet is still running when the script starts, it +# captures encoding/locale, then scales postgres to 0 and waits for the +# underlying volume to detach. +# - Renders the pg_upgrade Job manifest, applies it, streams logs, and +# verifies completion via `kubectl wait --for=condition=Complete`. +# - On success, prints the helm values block to update and reminds the +# operator to run `helm upgrade` manually. +# +# PREREQUISITES (operator's responsibility): +# - Application-layer deployments (label `layer=application`) scaled to 0. +# The script refuses to proceed if any are still running. +# - Snapshot or PVC clone of the source PVC (platform-specific). +# +# DOES NOT do: app-layer scale-down, snapshot, validation, or post-upgrade +# cleanup. The `helm upgrade` after a successful run restores correct +# replica counts for postgres and the application layer. +# +set -euo pipefail + +SCRIPT_NAME=$(basename "$0") + +# ============================================================================ +# Defaults (override via flags) +# ============================================================================ +NAMESPACE="" +PVC_NAME="" +SECRET_NAME="" +SECRET_KEY="postgres-password" +PG14_TAG="14.22.4094-4922444" + +ACR_PATH="cciserver.azurecr.io/circleci/server-postgres" +DOCKERHUB_PATH="circleci/server-postgres" +USE_DOCKERHUB=false + +INITDB_ENCODING="" +INITDB_LC_COLLATE="" +INITDB_LC_CTYPE="" + +UPGRADE_JOB_IMAGE="tianon/postgres-upgrade:12-to-14" +JOB_NAME="postgres-upgrade-12-to-14" + +ASSUME_YES=false +DRY_RUN=false + +# ============================================================================ +# Helpers +# ============================================================================ +if [ -t 1 ]; then + C_RED='\033[1;31m'; C_YEL='\033[1;33m'; C_CYA='\033[1;36m'; C_GRN='\033[1;32m'; C_OFF='\033[0m' +else + C_RED=''; C_YEL=''; C_CYA=''; C_GRN=''; C_OFF='' +fi + +log() { printf "${C_CYA}[%s]${C_OFF} %s\n" "$SCRIPT_NAME" "$*" >&2; } +ok() { printf "${C_GRN}[%s]${C_OFF} %s\n" "$SCRIPT_NAME" "$*" >&2; } +warn() { printf "${C_YEL}[%s] WARN:${C_OFF} %s\n" "$SCRIPT_NAME" "$*" >&2; } +err() { printf "${C_RED}[%s] ERROR:${C_OFF} %s\n" "$SCRIPT_NAME" "$*" >&2; } +die() { err "$@"; exit 1; } + +confirm() { + $ASSUME_YES && return 0 + local prompt="$1" + local reply + read -r -p "$prompt [y/N] " reply + [[ "$reply" =~ ^[Yy]$ ]] +} + +# ============================================================================ +# Usage +# ============================================================================ +usage() { + cat </dev/null || true) + local count + count=$(echo "$names" | wc -w | tr -d ' ') + case "$count" in + 0) return 1 ;; + 1) echo "$names" ;; + *) err "Multiple $kind match '$selector' in '$NAMESPACE': $names" + err "Specify explicitly via the appropriate flag." + return 2 ;; + esac +} + +discover_locale() { + # Returns 0 if locale info was discovered into INITDB_* vars, 1 otherwise. + local pgp info + pgp=$(kubectl -n "$NAMESPACE" get secret "$SECRET_NAME" \ + -o jsonpath="{.data.$SECRET_KEY}" 2>/dev/null | base64 -d 2>/dev/null) || return 1 + [[ -z "$pgp" ]] && return 1 + info=$(kubectl -n "$NAMESPACE" exec postgresql-0 -- \ + env PGPASSWORD="$pgp" psql -U postgres -tAc \ + "SELECT pg_encoding_to_char(encoding) || '|' || datcollate || '|' || datctype FROM pg_database WHERE datname='template1'" \ + 2>/dev/null) || return 1 + [[ -z "$info" || "$info" != *"|"* ]] && return 1 + INITDB_ENCODING="${INITDB_ENCODING:-${info%%|*}}" + local tmp="${info#*|}" + INITDB_LC_COLLATE="${INITDB_LC_COLLATE:-${tmp%|*}}" + INITDB_LC_CTYPE="${INITDB_LC_CTYPE:-${tmp##*|}}" +} + +log "Discovering defaults from namespace '$NAMESPACE'..." + +if [[ -z "$PVC_NAME" ]]; then + PVC_NAME=$(discover_single pvc 'app.kubernetes.io/name=postgresql') \ + || die "Could not auto-discover PVC; pass --pvc-name" + log " PVC: $PVC_NAME (discovered)" +else + log " PVC: $PVC_NAME" +fi + +if [[ -z "$SECRET_NAME" ]]; then + SECRET_NAME=$(discover_single secret 'app.kubernetes.io/name=postgresql') \ + || die "Could not auto-discover secret; pass --secret-name" + log " Secret: $SECRET_NAME (discovered)" +else + log " Secret: $SECRET_NAME" +fi +log " Secret key: $SECRET_KEY" + +# Locale discovery is deferred — it depends on whether the postgres pod is +# still running when we get to the pre-check phase. If it is, we'll query +# template1 there. If it's not, we'll either use the values passed via +# flags or fall back to Job script defaults (UTF8 / C.UTF-8 / C.UTF-8). + +# ============================================================================ +# Pre-checks +# ============================================================================ +log "" +log "Pre-checks:" + +CURRENT_CTX=$(kubectl config current-context) +log " kubectl context: $CURRENT_CTX" + +kubectl get ns "$NAMESPACE" >/dev/null 2>&1 \ + || die "Namespace '$NAMESPACE' does not exist in context '$CURRENT_CTX'" + +kubectl -n "$NAMESPACE" get pvc "$PVC_NAME" >/dev/null 2>&1 \ + || die "PVC '$PVC_NAME' not found in namespace '$NAMESPACE'" + +kubectl -n "$NAMESPACE" get secret "$SECRET_NAME" >/dev/null 2>&1 \ + || die "Secret '$SECRET_NAME' not found in namespace '$NAMESPACE'" + +SECRET_HAS_KEY=$(kubectl -n "$NAMESPACE" get secret "$SECRET_NAME" \ + -o jsonpath="{.data.$SECRET_KEY}" 2>/dev/null) +[[ -z "$SECRET_HAS_KEY" ]] && die "Secret '$SECRET_NAME' has no key '$SECRET_KEY'" + +# Application layer must already be scaled to 0 — they hold the postgres +# connections that would block shutdown. The script leaves their quiesce to +# the operator, but verifies it before doing anything else. +APP_RUNNING=$(kubectl -n "$NAMESPACE" get deploy -l layer=application \ + -o jsonpath='{range .items[?(@.spec.replicas>0)]}{.metadata.name}({.spec.replicas}) {end}' \ + 2>/dev/null) +if [[ -n "$APP_RUNNING" ]]; then + if $DRY_RUN; then + log " application deployments: still running ($APP_RUNNING) (would need to be 0 for a real run; dry-run continues)" + else + err "Application deployments (layer=application) are still running:" + err " $APP_RUNNING" + err "Scale them down first:" + err " kubectl -n $NAMESPACE scale deploy -l layer=application --replicas=0" + die "Application layer must be at replicas=0 before running this script" + fi +else + log " application deployments (layer=application): all at replicas=0 (good)" +fi + +# Postgres StatefulSet state determines what happens next: +# - Already at 0 → we proceed directly to applying the Job. Locale must +# come from flags or fall back to Job-script defaults. +# - Running → capture encoding/locale from the live cluster, then scale +# postgres to 0 before applying the Job. +SS_REPLICAS=$(kubectl -n "$NAMESPACE" get sts postgresql \ + -o jsonpath='{.spec.replicas}' 2>/dev/null || echo "") +POSTGRES_NEEDS_SCALE=false +if [[ "$SS_REPLICAS" == "0" ]]; then + log " postgres StatefulSet replicas: 0 (already quiesced)" + if [[ -z "$INITDB_ENCODING" && -z "$INITDB_LC_COLLATE" && -z "$INITDB_LC_CTYPE" ]]; then + warn "postgres is already at 0 — cannot auto-discover encoding/locale at this point." + warn "Job script defaults will apply: UTF8 / C.UTF-8 / C.UTF-8." + warn "If the source cluster uses different settings, pg_upgrade will fail at its locale-compat check." + warn "Re-run with --initdb-encoding / --initdb-lc-collate / --initdb-lc-ctype to override." + else + log " initdb (explicitly set): encoding=${INITDB_ENCODING:-} lc_collate=${INITDB_LC_COLLATE:-} lc_ctype=${INITDB_LC_CTYPE:-}" + fi +else + log " postgres StatefulSet replicas: $SS_REPLICAS — script will capture locale, then scale it to 0" + POSTGRES_NEEDS_SCALE=true + if [[ -z "$INITDB_ENCODING" || -z "$INITDB_LC_COLLATE" || -z "$INITDB_LC_CTYPE" ]]; then + if discover_locale 2>/dev/null; then + log " initdb (discovered live): encoding=$INITDB_ENCODING lc_collate=$INITDB_LC_COLLATE lc_ctype=$INITDB_LC_CTYPE" + else + warn "Could not query postgresql-0 for encoding/locale even though the pod appears up." + warn "Job script defaults will apply: UTF8 / C.UTF-8 / C.UTF-8." + warn "If those don't match the source, pg_upgrade will fail at its locale-compat check." + fi + else + log " initdb (explicitly set): encoding=$INITDB_ENCODING lc_collate=$INITDB_LC_COLLATE lc_ctype=$INITDB_LC_CTYPE" + fi +fi + +# Existing Job? — only relevant when we're actually going to apply +if ! $DRY_RUN; then + if kubectl -n "$NAMESPACE" get job "$JOB_NAME" >/dev/null 2>&1; then + warn "Job '$JOB_NAME' already exists in '$NAMESPACE'." + confirm "Delete it before proceeding?" || die "Aborted" + log " Deleting existing Job..." + kubectl -n "$NAMESPACE" delete job "$JOB_NAME" --wait=true + fi +fi + +# ============================================================================ +# Scale postgres to 0 if needed +# ============================================================================ +if $POSTGRES_NEEDS_SCALE && ! $DRY_RUN; then + log "" + log "Scaling postgres StatefulSet to 0..." + confirm "Proceed with scaling postgres down? This takes the database offline." \ + || die "Aborted" + kubectl -n "$NAMESPACE" scale sts postgresql --replicas=0 + log " waiting for postgresql-0 pod to terminate..." + kubectl -n "$NAMESPACE" wait --for=delete pod/postgresql-0 --timeout=5m + + log " waiting for volume to detach (cloud-side detach round-trip)..." + PV_FOR_SCALE=$(kubectl -n "$NAMESPACE" get pvc "$PVC_NAME" \ + -o jsonpath='{.spec.volumeName}') + VA_FOR_SCALE=$(kubectl get volumeattachment \ + -o jsonpath='{range .items[?(@.spec.source.persistentVolumeName=="'"$PV_FOR_SCALE"'")]}{.metadata.name}{end}' \ + 2>/dev/null || true) + if [[ -n "$VA_FOR_SCALE" ]]; then + kubectl wait --for=delete "volumeattachment/$VA_FOR_SCALE" --timeout=3m + log " volume fully detached" + else + log " no VolumeAttachment found; volume already detached" + fi +fi + +# ============================================================================ +# Render YAML +# ============================================================================ +render_yaml() { + cat <
} lc_collate=${INITDB_LC_COLLATE:-} lc_ctype=${INITDB_LC_CTYPE:-} +--- +apiVersion: batch/v1 +kind: Job +metadata: + name: $JOB_NAME + namespace: $NAMESPACE + labels: + purpose: pg-major-upgrade + source-version: "12.16" + target-version: "14.22" +spec: + backoffLimit: 0 + template: + metadata: + labels: + app.kubernetes.io/name: postgres-upgrade + spec: + restartPolicy: Never + securityContext: + runAsUser: 0 + runAsGroup: 0 + fsGroup: 0 + containers: + - name: pg-upgrade + image: $UPGRADE_JOB_IMAGE + imagePullPolicy: IfNotPresent + env: + - name: PGPASSWORD + valueFrom: + secretKeyRef: + name: $SECRET_NAME + key: $SECRET_KEY +HEADER + + [[ -n "$INITDB_ENCODING" ]] && cat < [1/7] Verifying PG 12 cluster at $CUR" + if [[ ! -f "$CUR/PG_VERSION" ]]; then + echo "FATAL: $CUR/PG_VERSION not found."; exit 1 + fi + VER=$(cat "$CUR/PG_VERSION") + if [[ "$VER" != "12" ]]; then + echo "FATAL: expected PG_VERSION=12, found '$VER'"; exit 1 + fi + if [[ -d "$OLD" || -d "$NEW" || -d "$ARCHIVED" ]]; then + echo "FATAL: leftover dir(s) from a prior run found under $PGROOT." + ls -la "$PGROOT" || true + exit 1 + fi + + echo "==> [2/7] Staging directories and configs" + mv "$CUR" "$OLD" + mkdir -p "$NEW" "$LOGS" + chmod 0700 "$OLD" "$NEW" + + if [[ ! -f "$OLD/postgresql.conf" ]]; then + echo " injecting minimal postgresql.conf" + printf '%s\n' \ + '# Minimal postgresql.conf written by the pg_upgrade Job.' \ + '# pg_upgrade overrides connection settings via -c on pg_ctl.' \ + > "$OLD/postgresql.conf" + fi + if [[ ! -f "$OLD/pg_hba.conf" ]]; then + echo " injecting minimal pg_hba.conf" + printf '%s\n' \ + '# Minimal pg_hba.conf written by the pg_upgrade Job.' \ + '# Trust local connections so pg_upgrade can read catalogs.' \ + 'local all all trust' \ + 'host all all 127.0.0.1/32 trust' \ + 'host all all ::1/128 trust' \ + > "$OLD/pg_hba.conf" + fi + + chown -R ${UPGRADE_UID}:${UPGRADE_GID} "$OLD" "$NEW" "$LOGS" + + echo "==> [3/7] initdb new PG14 cluster" + INITDB_ENCODING="${INITDB_ENCODING:-UTF8}" + INITDB_LC_COLLATE="${INITDB_LC_COLLATE:-C.UTF-8}" + INITDB_LC_CTYPE="${INITDB_LC_CTYPE:-C.UTF-8}" + echo " encoding=$INITDB_ENCODING lc_collate=$INITDB_LC_COLLATE lc_ctype=$INITDB_LC_CTYPE" + gosu postgres /usr/lib/postgresql/14/bin/initdb \ + -D "$NEW" \ + --encoding="$INITDB_ENCODING" \ + --lc-collate="$INITDB_LC_COLLATE" \ + --lc-ctype="$INITDB_LC_CTYPE" + + echo "==> [4/7] Running pg_upgrade --link (jobs=4)" + cd "$LOGS" + gosu postgres /usr/lib/postgresql/14/bin/pg_upgrade \ + --old-bindir=/usr/lib/postgresql/12/bin \ + --new-bindir=/usr/lib/postgresql/14/bin \ + --old-datadir="$OLD" \ + --new-datadir="$NEW" \ + --link \ + --jobs=4 + + echo "==> [5/7] Verifying new cluster" + NEWVER=$(cat "$NEW/PG_VERSION") + if [[ "$NEWVER" != "14" ]]; then + echo "FATAL: new cluster PG_VERSION='$NEWVER', expected 14."; exit 1 + fi + + echo "==> [6/7] Swapping data dirs into place" + mv "$OLD" "$ARCHIVED" + mv "$NEW" "$CUR" + + echo "==> [7/7] Restoring chart's uid:gid (${CHART_UID}:${CHART_GID}) on $CUR" + chown -R ${CHART_UID}:${CHART_GID} "$CUR" + + echo "==> Done. PG 14 cluster is at $CUR." + ls -la "$PGROOT" + ls -la "$LOGS" +BASH_BODY + + cat <