HandleResponseBodyModelStreaming is called one additional time after streamingEndMsg is received by the server. #1903

@BenjaminBraunDev

What happened:
While implementing new requestcontrol plugins for latency prediction, we noticed that our ResponseStreaming plugin ran one more time after our ResponseComplete plugin had finished. Both the streaming and complete hooks run inside HandleResponseBodyModelStreaming: the ResponseStreaming hooks always run first, and only then does the handler check whether streamingEndMsg was received and, if so, run the ResponseComplete hooks. Given that order, the plugins can only fire in the sequence we observed if HandleResponseBodyModelStreaming is called one more time after streamingEndMsg has been received.
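The ordering argument above can be illustrated with a small standalone simulation. This is only a sketch, not the project's code: `recorder`, `handleBodyChunk`, `simulate`, and `streamingAfterComplete` are hypothetical names, and the end marker value `data: [DONE]` is an assumption about what streamingEndMsg contains.

```go
package main

import (
	"fmt"
	"strings"
)

// endMarker is an ASSUMED value for streamingEndMsg, used only for this sketch.
const endMarker = "data: [DONE]"

// recorder is a hypothetical stand-in for plugin log output.
type recorder struct {
	events []string
}

// handleBodyChunk mimics the order described in the issue: the streaming
// hook always runs first, then the complete hook runs only if the chunk
// carries the end marker.
func handleBodyChunk(r *recorder, chunk string) {
	r.events = append(r.events, "ResponseStreaming")
	if strings.Contains(chunk, endMarker) {
		r.events = append(r.events, "ResponseComplete")
	}
}

// simulate feeds each chunk through the handler and returns the hook order.
func simulate(chunks []string) []string {
	r := &recorder{}
	for _, c := range chunks {
		handleBodyChunk(r, c)
	}
	return r.events
}

// streamingAfterComplete reports whether any ResponseStreaming event was
// recorded after a ResponseComplete event.
func streamingAfterComplete(events []string) bool {
	seenComplete := false
	for _, e := range events {
		if e == "ResponseComplete" {
			seenComplete = true
		} else if e == "ResponseStreaming" && seenComplete {
			return true
		}
	}
	return false
}

func main() {
	// End marker is the last chunk: no streaming hook after completion.
	expected := simulate([]string{"data: {}", endMarker})
	// One extra handler call after the end marker reproduces the reported order.
	observed := simulate([]string{"data: {}", endMarker, ""})

	fmt.Println(expected, streamingAfterComplete(expected))
	fmt.Println(observed, streamingAfterComplete(observed))
}
```

In this model the reported sequence appears only when the handler is invoked again after the chunk containing the end marker, which matches the reasoning in the report.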

What you expected to happen:
When streamingEndMsg is received, HandleResponseBodyModelStreaming should not be called again, because the end message is assumed to be the final one received (receiving it is also what marks the request as complete in reqCtx).

How to reproduce it (as minimally and precisely as possible):
Create a ResponseStreaming plugin and a ResponseComplete plugin that each print a log line. When you send a streamed request, you will notice that ResponseStreaming runs one more time after ResponseComplete.
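A rough sketch of such a pair of logging plugins follows. It does not use the real requestcontrol interfaces: `requestState`, `streamingLogger`, `completeLogger`, and the single `requestID` parameter are all hypothetical, chosen only to show how shared state can make a late ResponseStreaming call stand out in the logs.

```go
package main

import (
	"fmt"
	"sync"
)

// requestState is hypothetical per-request state shared by both plugins.
type requestState struct {
	mu                 sync.Mutex
	completed          bool
	lateStreamingCalls int
}

// streamingLogger stands in for a ResponseStreaming plugin.
type streamingLogger struct{ state *requestState }

// completeLogger stands in for a ResponseComplete plugin.
type completeLogger struct{ state *requestState }

// ResponseStreaming logs every call and counts calls that arrive after
// the request was already marked complete.
func (p *streamingLogger) ResponseStreaming(requestID string) {
	p.state.mu.Lock()
	defer p.state.mu.Unlock()
	if p.state.completed {
		p.state.lateStreamingCalls++
		fmt.Printf("ResponseStreaming for %s ran AFTER ResponseComplete\n", requestID)
		return
	}
	fmt.Printf("ResponseStreaming for %s\n", requestID)
}

// ResponseComplete logs the call and marks the request as complete.
func (p *completeLogger) ResponseComplete(requestID string) {
	p.state.mu.Lock()
	defer p.state.mu.Unlock()
	p.state.completed = true
	fmt.Printf("ResponseComplete for %s\n", requestID)
}

func main() {
	st := &requestState{}
	streaming := &streamingLogger{state: st}
	complete := &completeLogger{state: st}

	// Drive the hooks by hand in the order reported in this issue.
	streaming.ResponseStreaming("req-1")
	complete.ResponseComplete("req-1")
	streaming.ResponseStreaming("req-1") // the unexpected extra call

	fmt.Println("late streaming calls:", st.lateStreamingCalls)
}
```

Here `main` drives the hooks manually; in a real deployment the server would invoke them, and a non-zero late-call count would indicate the behavior described above.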

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.33.5-dispatcher
Kustomize Version: v5.6.0
Server Version: v1.33.5-gke.1162000

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
