
Remote model inference streaming #3898


Draft · wants to merge 2 commits into main from feature/streaming_1
Conversation

jngz-es (Collaborator) commented Jun 10, 2025

Description

Adds a streaming predict API for remote models. For now it supports the OpenAI v1 chat completions and Bedrock Converse Stream APIs.

Example

  1. Create an OpenAI connector.
curl -X POST "http://localhost:9200/_plugins/_ml/connectors/_create" -H "Content-Type: application/json" -d'
{
    "name": "OpenAI Chat Connector",
    "description": "The connector to public OpenAI model service for GPT 3.5",
    "version": 1,
    "protocol": "http",
    "parameters": {
        "endpoint": "api.openai.com",
        "model": "gpt-3.5-turbo"
    },
    "credential": {
        "openAI_key": "<your_token_id>"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://${parameters.endpoint}/v1/chat/completions",
            "headers": {
                "Authorization": "Bearer ${credential.openAI_key}"
            },
            "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
        }
    ]
}
'
  2. Create a model with the above connector.
curl -X POST "http://localhost:9200/_plugins/_ml/models/_register?deploy=true" -H "Content-Type: application/json" -d'
{
    "name": "openai gpt 3.5 turbo",
    "function_name": "remote",
    "description": "openai model",
    "connector_id": "<your_connector_id>"
}
'
  3. Predict with the stream API.
curl -X POST "http://localhost:9200/_plugins/_ml/models/AlI3QZcB-tBcnGSI1JlH/_predict/stream" -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" -d'
{
  "parameters": {
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you summarize Prince Hamlet of William Shakespeare in around 1000 words?"
      }
    ],
    "_llm_interface": "openai/v1/chat/completions"
  }
}
'
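The PR description does not spell out the wire format of the streamed response, but given the `_llm_interface` value of `openai/v1/chat/completions`, an OpenAI-style SSE stream (`data:` lines carrying chat-completion chunks, terminated by `[DONE]`) is a reasonable assumption. A client could reassemble the full message from such a stream like this; `collect_stream_content` and the sample lines are illustrative, not part of the plugin:

```python
import json

def collect_stream_content(sse_lines):
    """Assemble the assistant message from OpenAI-style streaming chunks.

    Each element of `sse_lines` is one line of the response body, e.g.
    'data: {"choices": [{"delta": {"content": "Hel"}}]}'.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alive blanks and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI's end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Hard-coded chunks mimicking the assumed wire format:
sample = [
    'data: {"choices": [{"delta": {"content": "Prince "}}]}',
    'data: {"choices": [{"delta": {"content": "Hamlet..."}}]}',
    "data: [DONE]",
]
print(collect_stream_content(sample))  # Prince Hamlet...
```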
  4. You can run the non-stream predict too.
curl -X POST "http://localhost:9200/_plugins/_ml/models/AlI3QZcB-tBcnGSI1JlH/_predict" -H "Content-Type: application/json" -d'
{
  "parameters": {
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you summarize Prince Hamlet of William Shakespeare in around 1000 words?"
      }
    ]
  }
}
'

Related Issues

#3630

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jngz-es (Collaborator Author) commented Jun 10, 2025

Hi team, I am publishing this PR for your early review. Meanwhile I am adding a feature flag, fixing the existing tests, and adding new tests.

llmInterface = llmInterface.trim().toLowerCase(Locale.ROOT);
llmInterface = StringEscapeUtils.unescapeJava(llmInterface);
switch (llmInterface) {
case "openai/v1/chat/completions":
Contributor:
I see you mentioned Bedrock converse support too in the description. Are we only going with Open AI for now?

Collaborator Author:

The neededStreamParameterInPayload check only determines whether we need to add "stream": true to the HTTP request body, as OpenAI requires. Bedrock ConverseStream doesn't use that parameter; it uses a dedicated URL instead (POST /model/modelId/converse-stream), which is why Converse doesn't appear here.
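As a rough illustration of that distinction (hypothetical names and URLs, not the plugin's actual code): OpenAI-style interfaces opt into streaming via a "stream": true field in the request body, while Bedrock ConverseStream leaves the body alone and switches to a dedicated endpoint path.

```python
# Interfaces that signal streaming inside the request payload (OpenAI-style).
STREAM_IN_PAYLOAD = {"openai/v1/chat/completions"}

def build_stream_request(llm_interface, url, body):
    """Return the (url, payload) pair to use for a streaming call."""
    llm_interface = llm_interface.strip().lower()
    if llm_interface in STREAM_IN_PAYLOAD:
        # OpenAI-style: same URL, add "stream": true to the body.
        return url, dict(body, stream=True)
    # Bedrock-style: body untouched, dedicated streaming endpoint,
    # e.g. POST /model/{modelId}/converse-stream.
    return url.replace("/converse", "/converse-stream"), dict(body)

url, payload = build_stream_request(
    "openai/v1/chat/completions",
    "https://api.openai.com/v1/chat/completions",
    {"model": "gpt-3.5-turbo", "messages": []},
)
print(payload["stream"])  # True
```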

settings,
STREAM_PREDICT_THREAD_POOL,
OpenSearchExecutors.allocatedProcessors(settings) * 4,
10000,
Contributor:
Just to understand, how did we come up with these numbers?

Collaborator Author:

I used the remote predict thread pool configuration as a reference when configuring the stream pool.

root.setRowCount(1);
} else {
try {
Thread.sleep(500);
Contributor:
Can we avoid Thread.sleep by using a blocking queue?
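A minimal sketch of what the reviewer is suggesting, in Python for brevity (all names illustrative): with a bounded blocking queue, the consumer blocks until the producer delivers the next item or an end-of-stream sentinel, so there is no fixed 500 ms sleep and no polling loop.

```python
import queue
import threading

DONE = object()  # sentinel marking end of stream

def producer(q):
    for token in ["To ", "be, ", "or ", "not ", "to ", "be"]:
        q.put(token)          # wakes the consumer immediately, no sleep
    q.put(DONE)

def consume(q):
    parts = []
    while True:
        item = q.get()        # blocks until an item is available
        if item is DONE:
            break
        parts.append(item)
    return "".join(parts)

q = queue.Queue(maxsize=100)  # bounded: producer blocks if consumer lags
threading.Thread(target=producer, args=(q,)).start()
print(consume(q))  # To be, or not to be
```

A bounded queue also gives natural backpressure: if the downstream channel stalls, the producer blocks on `put` instead of buffering chunks without limit.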

@jngz-es jngz-es had a problem deploying to ml-commons-cicd-env June 10, 2025 05:50 — with GitHub Actions Error
@jngz-es jngz-es force-pushed the feature/streaming_1 branch from d44ef25 to c31819c Compare June 10, 2025 19:34
@jngz-es jngz-es force-pushed the feature/streaming_1 branch from c31819c to b2d5fb7 Compare June 10, 2025 20:10
Signed-off-by: Jing Zhang <[email protected]>
@jngz-es jngz-es force-pushed the feature/streaming_1 branch from b2d5fb7 to bfeb504 Compare June 11, 2025 21:28
add feature flag

Signed-off-by: Jing Zhang <[email protected]>
@jngz-es jngz-es force-pushed the feature/streaming_1 branch from bfeb504 to 29fab5a Compare June 11, 2025 21:33
5 participants