
Degraded performance (latency, connection errors) when workload is in Istio Ambient (ztunnel) and protected by GlobalNetworkPolicy #11404

@dkulchinsky

Description

Hey folks 👋🏼 I'm cross-referencing an issue I opened for Istio ztunnel (ambient proxy): istio/ztunnel#1666

As I described in that issue, it wasn't immediately apparent what was causing the performance issues we were observing. We have since realized that they are caused by the combination of ztunnel (Istio Ambient) & Calico GlobalNetworkPolicy (GNP).

Expected Behavior

Using a GlobalNetworkPolicy should not affect the performance & reliability of traffic to the workload it selects.

Current Behavior

When the workload (NGINX Ingress Controller) is protected by a GNP, we see a high rate of connection errors & latency spikes.
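To compare runs with and without the GNP on the same terms, the two symptoms can be reduced to an error rate and a latency percentile. The following is a minimal sketch that assumes each request is recorded as a hypothetical `Sample(latency_ms, ok)` record. It is not Locust's own reporting, which already provides similar statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float  # end-to-end request latency
    ok: bool           # False for connection errors / failed requests


def error_rate(samples):
    """Fraction of requests that failed (0.0 for an empty run)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if not s.ok) / len(samples)


def percentile(samples, p):
    """Nearest-rank latency percentile over successful requests only."""
    values = sorted(s.latency_ms for s in samples if s.ok)
    if not values:
        raise ValueError("no successful requests")
    rank = max(1, math.ceil(p * len(values) / 100))
    return values[rank - 1]


run = [Sample(10, True), Sample(12, True), Sample(900, True), Sample(0, False)]
print(error_rate(run))      # 0.25
print(percentile(run, 50))  # 12
print(percentile(run, 99))  # 900
```

Failed requests are excluded from the latency percentile so that fast-failing connection errors do not mask latency spikes.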

Possible Solution

No solution has been found yet; however, the issue is mitigated when any of the following is true:

  1. The GNP is removed
  2. The workload (ingress controller) is removed from the Ambient mesh
  3. The source workload is in a different namespace than the ingress controllers (not 100% confirmed yet; I'm still testing, but this seems to be the case)
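The observations above can be read as a small test matrix: the issue appears only when all three factors are present, and removing any one of them mitigates it. The sketch below enumerates the eight combinations under that reading. `expected_to_reproduce` is a hypothetical helper, and the same-namespace factor is still unconfirmed.

```python
from itertools import product


def expected_to_reproduce(gnp_applied: bool, in_ambient: bool, same_namespace: bool) -> bool:
    # Based on the reported mitigations: removing any single factor avoids the issue.
    # NOTE: the same-namespace factor is not yet fully confirmed.
    return gnp_applied and in_ambient and same_namespace


# Print every combination so each can be tested explicitly.
for gnp, ambient, same_ns in product([True, False], repeat=3):
    print(f"GNP={gnp!s:5} ambient={ambient!s:5} same_ns={same_ns!s:5} "
          f"-> expect issue: {expected_to_reproduce(gnp, ambient, same_ns)}")
```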

Steps to Reproduce (for bugs)

The referenced ztunnel issue provides all the details about our test setup and how to reproduce a similar environment.

Locust (running in-cluster) --> AWS NLB (LoadBalancer Service) --> Ingress Controller --> echo-server

Note: Locust needs to run in the same namespace as the Ingress Controller for this issue to occur.

Context

This is the GNP we have in place. It is designed to ensure that requests to port 8443 (the webhook port) are only allowed from control-plane nodes, while allowing ingress on a fixed set of other ports (15008, 4191, 80, 443 and 10254), including 15008, the HBONE port used by ztunnel. All other ingress traffic is denied.

We use this GNP to mitigate security issues in the NGINX Ingress Controller admission webhook. The policy works as expected in terms of allowing and denying traffic.

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: default.int-ic-admission-webhook-restricted
spec:
  applyOnForward: true
  ingress:
  - action: Allow
    destination:
      ports:
      - 8443
    protocol: TCP
    source:
      selector: has(node-role.kubernetes.io/control-plane)
  - action: Allow
    destination:
      ports:
      - 15008
      - 4191
      - 80
      - 443
      - 10254
    protocol: TCP
  - action: Deny
  namespaceSelector: projectcalico.org/name == 'platform'
  selector: app.kubernetes.io/instance == 'int-ic'
  types:
  - Ingress
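Calico evaluates a policy's rules in order, and the first matching Allow or Deny rule decides the outcome. The following is a simplified Python model of the ingress rules above. It assumes TCP traffic only, models `has(node-role.kubernetes.io/control-plane)` as a label-presence check on the source, and assumes the destination already matches the policy's `selector` and `namespaceSelector`. The `Rule` and `evaluate` names are illustrative and are not part of any Calico API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Rule:
    action: str                            # "Allow" or "Deny"
    ports: FrozenSet[int] = frozenset()    # empty set = any destination port
    source_label: Optional[str] = None     # models has(<label>); None = any source


# Ingress rules of default.int-ic-admission-webhook-restricted, in order.
RULES = [
    Rule("Allow", frozenset({8443}), "node-role.kubernetes.io/control-plane"),
    Rule("Allow", frozenset({15008, 4191, 80, 443, 10254})),
    Rule("Deny"),
]


def evaluate(dst_port: int, source_labels: Set[str]) -> str:
    """Return the action of the first rule matching an inbound TCP connection."""
    for rule in RULES:
        if rule.ports and dst_port not in rule.ports:
            continue
        if rule.source_label is not None and rule.source_label not in source_labels:
            continue
        return rule.action
    # Not reached with these rules: the final Deny matches everything.
    return "Deny"


CONTROL_PLANE = {"node-role.kubernetes.io/control-plane"}
print(evaluate(8443, CONTROL_PLANE))  # Allow
print(evaluate(8443, set()))          # Deny
print(evaluate(15008, set()))         # Allow
```

In this model, HBONE traffic on 15008 is always allowed, which is consistent with the policy behaving correctly from a pure allow/deny perspective.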

Your Environment

  • Calico version: v3.29.4

  • Calico dataplane (bpf, nftables, iptables, windows etc.): iptables

  • Orchestrator version (e.g. kubernetes, openshift, etc.): K8s v1.32.7

  • Operating System and version: Ubuntu 24.04.3 LTS, Kernel: 6.14.0-1015-aws

  • Istio/ztunnel version: 1.28.0 (same issue observed on versions 1.27.x)

  • NGINX Ingress Controller v1.11.6

  • Backend service: jmalloc/echo-server:v0.3.7

  • Locust 2.41.6
