`HandleResponseBodyModelStreaming` is called one additional time after `streamingEndMsg` is received by the server.

**What happened**:
While implementing new requestcontrol plugins for latency prediction, we noticed that our `ResponseStreaming` plugin ran one more time after our `ResponseComplete` plugin had finished. Both the streaming and complete hooks run in `HandleResponseBodyModelStreaming`: the `ResponseStreaming` hooks always run first, then the handler checks whether `streamingEndMsg` was received and, if so, runs the `ResponseComplete` hooks. Given that ordering, the only way the plugins could fire in the observed order is if `HandleResponseBodyModelStreaming` is called an additional time after `streamingEndMsg` is received.
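To make the reasoning concrete, here is a minimal sketch of the hook ordering described above. It is not the project's code: `handleChunk` and `replay` are hypothetical stand-ins, and a boolean per chunk stands in for "`streamingEndMsg` was received". It only illustrates that a `ResponseStreaming` entry can appear after `ResponseComplete` if the handler is invoked again after the end marker.

```go
package main

import "fmt"

// handleChunk mimics the ordering described in this issue: the streaming hook
// always runs first, and the complete hook runs only when the end marker was
// seen. Hypothetical stand-in, not the real HandleResponseBodyModelStreaming.
func handleChunk(endOfStream bool, log *[]string) {
	*log = append(*log, "ResponseStreaming")
	if endOfStream {
		*log = append(*log, "ResponseComplete")
	}
}

// replay feeds a sequence of chunks through handleChunk and returns the order
// in which the hooks fired. Each bool says whether that chunk carried the end
// marker.
func replay(chunks []bool) []string {
	var log []string
	for _, end := range chunks {
		handleChunk(end, &log)
	}
	return log
}

func main() {
	// Two data chunks, the end marker, then one unexpected extra call.
	for i, hook := range replay([]bool{false, false, true, false}) {
		fmt.Printf("%d: %s\n", i, hook)
	}
}
```

With the extra fourth call, the last hook recorded is `ResponseStreaming`, after `ResponseComplete`, which matches the order we observed.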

**What you expected to happen**:
When `streamingEndMsg` is received, `HandleResponseBodyModelStreaming` should not be called again, since the end token is assumed to be the final one received (this is also the point at which the request is marked as complete in `reqCtx`).

**How to reproduce it (as minimally and precisely as possible)**:
Create a `ResponseStreaming` plugin and a `ResponseComplete` plugin that each print a log line. When you send a streamed request, the logs show `ResponseStreaming` running once more after `ResponseComplete`.
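A rough sketch of the kind of logging helper that makes the problem easy to spot is below. It deliberately does not use the real requestcontrol plugin interfaces: the `orderChecker` type, its method signatures, and the string request ID are all assumptions made for illustration, and a real plugin would need to match the hook signatures in the repository.

```go
package main

import (
	"fmt"
	"sync"
)

// orderChecker is a hypothetical helper that logs hook calls and counts any
// streaming call that arrives after the same request was marked complete.
type orderChecker struct {
	mu        sync.Mutex
	completed map[string]bool
	late      map[string]int
}

func newOrderChecker() *orderChecker {
	return &orderChecker{
		completed: make(map[string]bool),
		late:      make(map[string]int),
	}
}

// ResponseStreaming logs the call and flags it if the request already completed.
func (c *orderChecker) ResponseStreaming(requestID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.completed[requestID] {
		c.late[requestID]++
		fmt.Printf("ResponseStreaming AFTER ResponseComplete request=%s\n", requestID)
		return
	}
	fmt.Printf("ResponseStreaming request=%s\n", requestID)
}

// ResponseComplete logs the call and remembers that the request finished.
func (c *orderChecker) ResponseComplete(requestID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.completed[requestID] = true
	fmt.Printf("ResponseComplete request=%s\n", requestID)
}

// LateCalls reports how many streaming calls arrived after completion.
func (c *orderChecker) LateCalls(requestID string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.late[requestID]
}

func main() {
	c := newOrderChecker()
	c.ResponseStreaming("req-1")
	c.ResponseComplete("req-1")
	c.ResponseStreaming("req-1") // simulates the extra call reported here
	fmt.Println("late streaming calls:", c.LateCalls("req-1"))
}
```

Keying the state by request ID keeps concurrent requests from producing false positives, which matters when several streamed requests are in flight at once.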

**Environment**:
- Kubernetes version (use `kubectl version`):
```
Client Version: v1.33.5-dispatcher
Kustomize Version: v5.6.0
Server Version: v1.33.5-gke.1162000
```
- Inference extension version (use `git describe --tags --dirty --always`):
`e0afb897` (commit hash)
- Cloud provider or hardware configuration:
`GKE`
- Install tools:
Helm chart via the getting started guide (with the aforementioned custom requestcontrol plugins, see #1839)

`HandleResponseBodyModelStreaming` is called one additional time after `streamingEndMsg` is received by the server. #1903
