Problem: runner leaks TCP connections; newHTTPClient() on every poll cycle exhausts ephemeral ports (#3941)

### Issue

  The remote runner leaks TCP connections to the server until it exhausts all ephemeral source ports, after which it can no longer poll for jobs and all tasks remain stuck in `waiting`.

  **Root cause** (in `services/runners/job_pool.go`):

  `sendProgress()` and `checkNewJobs()` each call `newHTTPClient()` on every invocation, and both are invoked **every second** by the `requestTimer` ticker in `JobPool.Run()`:

  ```go
  func (p *JobPool) sendProgress() (ok bool) {
        client := newHTTPClient()   // new http.Transport on every call
        ...
  }

  func (p *JobPool) checkNewJobs() {
        ...
        client := newHTTPClient()   // new http.Transport on every call
        ...
  }
  ```

  Each `newHTTPClient()` call creates a fresh `http.Transport` with its own keep-alive connection pool. No connection is ever reused across poll cycles, and the idle connection left behind is never closed: it stays ESTABLISHED inside an abandoned transport. With two requests per second, that is roughly two new connections per second.

  **Observed impact** (Pro v2.18.8, Kubernetes/OpenShift, remote runners):

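  After ~12h of uptime, each runner pod held **~28,200 ESTABLISHED connections** to the server service. Assuming the default Linux `net.ipv4.ip_local_port_range` of 32768-60999, that is essentially the entire ephemeral range (28,232 ports), which at ~2 new connections per second fills up in just under 4 hours: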

  ```
  $ netstat -tn | awk '{print $6}' | sort | uniq -c
    28232 ESTABLISHED
  ```

  all pointing to the server:

  ```
    28232 172.30.54.155:3000
  ```

  At that point every new outbound dial fails and the runner logs, twice per second:

  ```
  level=error msg="Put \"http://semaphore:3000/api/internal/runners\": dial tcp 172.30.54.155:3000: connect: cannot assign requested address" action="send request" context=sending_progress
  level=error msg="Get \"http://semaphore:3000/api/internal/runners\": dial tcp 172.30.54.155:3000: connect: cannot assign requested address" action="send request" context="checking new jobs"
  ```

  Tasks dispatched to the runner stay in `waiting` forever. The accumulated connections (TCP buffers plus the abandoned transports that remain in memory) also inflate memory usage over time: we initially hit OOMKills at a 768Mi limit before identifying this leak.

  **Aggravating detail:** in `checkNewJobs()`, `defer resp.Body.Close()` is placed *after* the early `return` for `resp.StatusCode >= 400`, so whenever the server answers with an error status the response body is never closed and leaks as well.

  ### Suggested fix

  Create the `http.Client` once (e.g. as a field on `JobPool`, or as a lazily initialized package-level client) and reuse it in `sendProgress()`, `checkNewJobs()`, `tryRegisterRunner()` and `Unregister()`. The default `http.Transport` keep-alive pool will then reuse a single connection instead of leaking one per request. Also move `defer resp.Body.Close()` above the status-code check in `checkNewJobs()`.
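
One possible shape for the lazily initialized package-level variant, as a sketch only. `sharedHTTPClient` is a hypothetical name, and the timeout and pool settings below are assumptions. Whatever `newHTTPClient()` configures today (TLS settings, proxy, and so on) would need to move into the initializer.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

var (
	clientOnce   sync.Once
	sharedClient *http.Client
)

// sharedHTTPClient returns one process-wide client, built on first use.
// The name and all values below are illustrative assumptions.
func sharedHTTPClient() *http.Client {
	clientOnce.Do(func() {
		sharedClient = &http.Client{
			Timeout: 30 * time.Second,
			Transport: &http.Transport{
				Proxy:               http.ProxyFromEnvironment,
				MaxIdleConnsPerHost: 2,
				// Idle connections are closed after this long, so even
				// an unused connection does not linger forever.
				IdleConnTimeout: 90 * time.Second,
			},
		}
	})
	return sharedClient
}

func main() {
	a, b := sharedHTTPClient(), sharedHTTPClient()
	fmt.Println("same client instance:", a == b)
}
```

`http.Client` and `http.Transport` are safe for concurrent use, so the polling and progress goroutines can share the one instance. A field on `JobPool` set in its constructor would work equally well and is easier to replace in tests.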

  ### Workaround

  A Kubernetes liveness probe can restart the runner pod before the ports run out (this assumes `netstat` is available in the runner image):

  ```yaml
  livenessProbe:
    exec:
      command:
        - sh
        - -c
        - '[ "$(netstat -tn 2>/dev/null | grep -c ESTABLISHED)" -lt 5000 ]'
    initialDelaySeconds: 120
    periodSeconds: 60
    failureThreshold: 3
  ```

  ### Impact

  Ansible (task execution)

  ### Installation method

  Kubernetes

  ### Database

  Postgres

  ### Semaphore Version

  Pro v2.18.8 (server and runner) — bug still present in current `develop` source

  ### Additional information

  Two independent runner pods on the same cluster exhibited identical behavior (28,232 and 28,234 connections respectively after ~12h).

### Impact

Configuration

### Installation method

Docker

### Semaphore Version

Pro v2.18.8
